linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [RFC PATCH 0/6] Discard __GFP_ATOMIC
@ 2022-11-29 15:16 Mel Gorman
  2022-11-29 15:16 ` [PATCH 1/6] mm/page_alloc: Rename ALLOC_HIGH to ALLOC_MIN_RESERVE Mel Gorman
                   ` (5 more replies)
  0 siblings, 6 replies; 24+ messages in thread
From: Mel Gorman @ 2022-11-29 15:16 UTC (permalink / raw)
  To: Linux-MM
  Cc: Andrew Morton, Michal Hocko, NeilBrown, Thierry Reding,
	Matthew Wilcox, Vlastimil Babka, LKML, Mel Gorman

Neil's patch has been residing in mm-unstable as commit 2fafb4fe8f7a
("mm: discard __GFP_ATOMIC") for a long time and recently brought up
again. Most recently, I was worried that __GFP_HIGH allocations could
use high-order atomic reserves which is unintentional but there was no
response so lets revisit -- this series reworks how min reserves are used,
protects highorder reserves and then finishes with Neil's patch with very
minor modifications so it fits on top.

There was a review discussion on renaming __GFP_DIRECT_RECLAIM to
__GFP_ALLOW_BLOCKING but I didn't think it was that big an issue and is
ortogonal to the removal of __GFP_ATOMIC.

There were some concerns about how the gfp flags affect the min reserves
but it never reached a solid conclusion so I made my own attempt.

The series tries to iron out some of the details on how reserves are
used.  ALLOC_HIGH becomes ALLOC_MIN_RESERVE and ALLOC_HARDER becomes
ALLOC_NON_BLOCK and documents how the reserves are affected. For example,
ALLOC_NON_BLOCK (no direct reclaim) on its own allows 25% of the min reserve.
ALLOC_MIN_RESERVE (__GFP_HIGH) allows 50% and both combined allows deeper
access again. ALLOC_OOM allows access to 75%.

High-order atomic allocations are explicitly handled with the caveat that
no __GFP_ATOMIC flag means that any high-order allocation that specifies
GFP_HIGH and cannot enter direct reclaim will be treated as if it was
GFP_ATOMIC.

-- 
2.35.3

^ permalink raw reply	[flat|nested] 24+ messages in thread

* [PATCH 1/6] mm/page_alloc: Rename ALLOC_HIGH to ALLOC_MIN_RESERVE
  2022-11-29 15:16 [RFC PATCH 0/6] Discard __GFP_ATOMIC Mel Gorman
@ 2022-11-29 15:16 ` Mel Gorman
  2022-12-08 16:12   ` Vlastimil Babka
  2022-11-29 15:16 ` [PATCH 2/6] mm/page_alloc: Treat RT tasks similar to GFP_HIGH Mel Gorman
                   ` (4 subsequent siblings)
  5 siblings, 1 reply; 24+ messages in thread
From: Mel Gorman @ 2022-11-29 15:16 UTC (permalink / raw)
  To: Linux-MM
  Cc: Andrew Morton, Michal Hocko, NeilBrown, Thierry Reding,
	Matthew Wilcox, Vlastimil Babka, LKML, Mel Gorman

__GFP_HIGH aliases to ALLOC_HIGH but the name does not really hint
what it means. As ALLOC_HIGH is internal to the allocator, rename
it to ALLOC_MIN_RESERVE to document that the min reserves can
be depleted.

Signed-off-by: Mel Gorman <mgorman@techsingularity.net>
---
 mm/internal.h   | 4 +++-
 mm/page_alloc.c | 8 ++++----
 2 files changed, 7 insertions(+), 5 deletions(-)

diff --git a/mm/internal.h b/mm/internal.h
index 6b7ef495b56d..d503e57a57a1 100644
--- a/mm/internal.h
+++ b/mm/internal.h
@@ -744,7 +744,9 @@ unsigned int reclaim_clean_pages_from_list(struct zone *zone,
 #endif
 
 #define ALLOC_HARDER		 0x10 /* try to alloc harder */
-#define ALLOC_HIGH		 0x20 /* __GFP_HIGH set */
+#define ALLOC_MIN_RESERVE	 0x20 /* __GFP_HIGH set. Allow access to 50%
+				       * of the min watermark.
+				       */
 #define ALLOC_CPUSET		 0x40 /* check for correct cpuset */
 #define ALLOC_CMA		 0x80 /* allow allocations from CMA areas */
 #ifdef CONFIG_ZONE_DMA32
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 218b28ee49ed..3b37909617bc 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -3981,7 +3981,7 @@ bool __zone_watermark_ok(struct zone *z, unsigned int order, unsigned long mark,
 	/* free_pages may go negative - that's OK */
 	free_pages -= __zone_watermark_unusable_free(z, order, alloc_flags);
 
-	if (alloc_flags & ALLOC_HIGH)
+	if (alloc_flags & ALLOC_MIN_RESERVE)
 		min -= min / 2;
 
 	if (unlikely(alloc_harder)) {
@@ -4823,18 +4823,18 @@ gfp_to_alloc_flags(gfp_t gfp_mask)
 	unsigned int alloc_flags = ALLOC_WMARK_MIN | ALLOC_CPUSET;
 
 	/*
-	 * __GFP_HIGH is assumed to be the same as ALLOC_HIGH
+	 * __GFP_HIGH is assumed to be the same as ALLOC_MIN_RESERVE
 	 * and __GFP_KSWAPD_RECLAIM is assumed to be the same as ALLOC_KSWAPD
 	 * to save two branches.
 	 */
-	BUILD_BUG_ON(__GFP_HIGH != (__force gfp_t) ALLOC_HIGH);
+	BUILD_BUG_ON(__GFP_HIGH != (__force gfp_t) ALLOC_MIN_RESERVE);
 	BUILD_BUG_ON(__GFP_KSWAPD_RECLAIM != (__force gfp_t) ALLOC_KSWAPD);
 
 	/*
 	 * The caller may dip into page reserves a bit more if the caller
 	 * cannot run direct reclaim, or if the caller has realtime scheduling
 	 * policy or is asking for __GFP_HIGH memory.  GFP_ATOMIC requests will
-	 * set both ALLOC_HARDER (__GFP_ATOMIC) and ALLOC_HIGH (__GFP_HIGH).
+	 * set both ALLOC_HARDER (__GFP_ATOMIC) and ALLOC_MIN_RESERVE(__GFP_HIGH).
 	 */
 	alloc_flags |= (__force int)
 		(gfp_mask & (__GFP_HIGH | __GFP_KSWAPD_RECLAIM));
-- 
2.35.3


^ permalink raw reply related	[flat|nested] 24+ messages in thread

* [PATCH 2/6] mm/page_alloc: Treat RT tasks similar to GFP_HIGH
  2022-11-29 15:16 [RFC PATCH 0/6] Discard __GFP_ATOMIC Mel Gorman
  2022-11-29 15:16 ` [PATCH 1/6] mm/page_alloc: Rename ALLOC_HIGH to ALLOC_MIN_RESERVE Mel Gorman
@ 2022-11-29 15:16 ` Mel Gorman
  2022-12-08 16:16   ` Vlastimil Babka
  2022-11-29 15:16 ` [PATCH 3/6] mm/page_alloc: Explicitly record high-order atomic allocations in alloc_flags Mel Gorman
                   ` (3 subsequent siblings)
  5 siblings, 1 reply; 24+ messages in thread
From: Mel Gorman @ 2022-11-29 15:16 UTC (permalink / raw)
  To: Linux-MM
  Cc: Andrew Morton, Michal Hocko, NeilBrown, Thierry Reding,
	Matthew Wilcox, Vlastimil Babka, LKML, Mel Gorman

RT tasks are allowed to dip below the min reserve but ALLOC_HARDER is
typically combined with ALLOC_MIN_RESERVE so RT tasks are a little
unusual. While there is some justification for allowing RT tasks
access to memory reserves, there is a strong chance that a RT task
that is also under memory pressure is at risk of missing deadlines
anyway. Relax how much reserves an RT task can access by treating
it the same as __GFP_HIGH allocations.

Signed-off-by: Mel Gorman <mgorman@techsingularity.net>
---
 mm/page_alloc.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 3b37909617bc..da746e9eb2cf 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -4852,7 +4852,7 @@ gfp_to_alloc_flags(gfp_t gfp_mask)
 		 */
 		alloc_flags &= ~ALLOC_CPUSET;
 	} else if (unlikely(rt_task(current)) && in_task())
-		alloc_flags |= ALLOC_HARDER;
+		alloc_flags |= ALLOC_MIN_RESERVE;
 
 	alloc_flags = gfp_to_alloc_flags_cma(gfp_mask, alloc_flags);
 
-- 
2.35.3


^ permalink raw reply related	[flat|nested] 24+ messages in thread

* [PATCH 3/6] mm/page_alloc: Explicitly record high-order atomic allocations in alloc_flags
  2022-11-29 15:16 [RFC PATCH 0/6] Discard __GFP_ATOMIC Mel Gorman
  2022-11-29 15:16 ` [PATCH 1/6] mm/page_alloc: Rename ALLOC_HIGH to ALLOC_MIN_RESERVE Mel Gorman
  2022-11-29 15:16 ` [PATCH 2/6] mm/page_alloc: Treat RT tasks similar to GFP_HIGH Mel Gorman
@ 2022-11-29 15:16 ` Mel Gorman
  2022-12-05  5:17   ` NeilBrown
  2022-12-08 16:51   ` Vlastimil Babka
  2022-11-29 15:16 ` [PATCH 4/6] mm/page_alloc: Explicitly define what alloc flags deplete min reserves Mel Gorman
                   ` (2 subsequent siblings)
  5 siblings, 2 replies; 24+ messages in thread
From: Mel Gorman @ 2022-11-29 15:16 UTC (permalink / raw)
  To: Linux-MM
  Cc: Andrew Morton, Michal Hocko, NeilBrown, Thierry Reding,
	Matthew Wilcox, Vlastimil Babka, LKML, Mel Gorman

A high-order ALLOC_HARDER allocation is assumed to be atomic. While that
is accurate, it changes later in the series. In preparation, explicitly
record high-order atomic allocations in gfp_to_alloc_flags().

Signed-off-by: Mel Gorman <mgorman@techsingularity.net>
---
 mm/internal.h   |  1 +
 mm/page_alloc.c | 19 +++++++++++++------
 2 files changed, 14 insertions(+), 6 deletions(-)

diff --git a/mm/internal.h b/mm/internal.h
index d503e57a57a1..9a9d9b5ee87f 100644
--- a/mm/internal.h
+++ b/mm/internal.h
@@ -754,6 +754,7 @@ unsigned int reclaim_clean_pages_from_list(struct zone *zone,
 #else
 #define ALLOC_NOFRAGMENT	  0x0
 #endif
+#define ALLOC_HIGHATOMIC	0x200 /* Allows access to MIGRATE_HIGHATOMIC */
 #define ALLOC_KSWAPD		0x800 /* allow waking of kswapd, __GFP_KSWAPD_RECLAIM set */
 
 enum ttu_flags;
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index da746e9eb2cf..e2b65767dda0 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -3710,7 +3710,7 @@ struct page *rmqueue_buddy(struct zone *preferred_zone, struct zone *zone,
 		 * reserved for high-order atomic allocation, so order-0
 		 * request should skip it.
 		 */
-		if (order > 0 && alloc_flags & ALLOC_HARDER)
+		if (alloc_flags & ALLOC_HIGHATOMIC)
 			page = __rmqueue_smallest(zone, order, MIGRATE_HIGHATOMIC);
 		if (!page) {
 			page = __rmqueue(zone, order, migratetype, alloc_flags);
@@ -4028,8 +4028,10 @@ bool __zone_watermark_ok(struct zone *z, unsigned int order, unsigned long mark,
 			return true;
 		}
 #endif
-		if (alloc_harder && !free_area_empty(area, MIGRATE_HIGHATOMIC))
+		if ((alloc_flags & ALLOC_HIGHATOMIC) &&
+		    !free_area_empty(area, MIGRATE_HIGHATOMIC)) {
 			return true;
+		}
 	}
 	return false;
 }
@@ -4291,7 +4293,7 @@ get_page_from_freelist(gfp_t gfp_mask, unsigned int order, int alloc_flags,
 			 * If this is a high-order atomic allocation then check
 			 * if the pageblock should be reserved for the future
 			 */
-			if (unlikely(order && (alloc_flags & ALLOC_HARDER)))
+			if (unlikely(alloc_flags & ALLOC_HIGHATOMIC))
 				reserve_highatomic_pageblock(page, zone, order);
 
 			return page;
@@ -4818,7 +4820,7 @@ static void wake_all_kswapds(unsigned int order, gfp_t gfp_mask,
 }
 
 static inline unsigned int
-gfp_to_alloc_flags(gfp_t gfp_mask)
+gfp_to_alloc_flags(gfp_t gfp_mask, unsigned int order)
 {
 	unsigned int alloc_flags = ALLOC_WMARK_MIN | ALLOC_CPUSET;
 
@@ -4844,8 +4846,13 @@ gfp_to_alloc_flags(gfp_t gfp_mask)
 		 * Not worth trying to allocate harder for __GFP_NOMEMALLOC even
 		 * if it can't schedule.
 		 */
-		if (!(gfp_mask & __GFP_NOMEMALLOC))
+		if (!(gfp_mask & __GFP_NOMEMALLOC)) {
 			alloc_flags |= ALLOC_HARDER;
+
+			if (order > 0)
+				alloc_flags |= ALLOC_HIGHATOMIC;
+		}
+
 		/*
 		 * Ignore cpuset mems for GFP_ATOMIC rather than fail, see the
 		 * comment for __cpuset_node_allowed().
@@ -5053,7 +5060,7 @@ __alloc_pages_slowpath(gfp_t gfp_mask, unsigned int order,
 	 * kswapd needs to be woken up, and to avoid the cost of setting up
 	 * alloc_flags precisely. So we do that now.
 	 */
-	alloc_flags = gfp_to_alloc_flags(gfp_mask);
+	alloc_flags = gfp_to_alloc_flags(gfp_mask, order);
 
 	/*
 	 * We need to recalculate the starting point for the zonelist iterator
-- 
2.35.3


^ permalink raw reply related	[flat|nested] 24+ messages in thread

* [PATCH 4/6] mm/page_alloc: Explicitly define what alloc flags deplete min reserves
  2022-11-29 15:16 [RFC PATCH 0/6] Discard __GFP_ATOMIC Mel Gorman
                   ` (2 preceding siblings ...)
  2022-11-29 15:16 ` [PATCH 3/6] mm/page_alloc: Explicitly record high-order atomic allocations in alloc_flags Mel Gorman
@ 2022-11-29 15:16 ` Mel Gorman
  2022-12-08 17:55   ` Vlastimil Babka
  2022-11-29 15:17 ` [PATCH 5/6] mm/page_alloc: Give GFP_ATOMIC and non-blocking allocations access to reserves Mel Gorman
  2022-11-29 15:17 ` [PATCH 6/6] mm: discard __GFP_ATOMIC Mel Gorman
  5 siblings, 1 reply; 24+ messages in thread
From: Mel Gorman @ 2022-11-29 15:16 UTC (permalink / raw)
  To: Linux-MM
  Cc: Andrew Morton, Michal Hocko, NeilBrown, Thierry Reding,
	Matthew Wilcox, Vlastimil Babka, LKML, Mel Gorman

As there are more ALLOC_ flags that affect reserves, define what flags
affect reserves and clarify the effect of each flag.

Signed-off-by: Mel Gorman <mgorman@techsingularity.net>
---
 mm/internal.h   |  3 +++
 mm/page_alloc.c | 36 +++++++++++++++++++++++-------------
 2 files changed, 26 insertions(+), 13 deletions(-)

diff --git a/mm/internal.h b/mm/internal.h
index 9a9d9b5ee87f..370500718732 100644
--- a/mm/internal.h
+++ b/mm/internal.h
@@ -757,6 +757,9 @@ unsigned int reclaim_clean_pages_from_list(struct zone *zone,
 #define ALLOC_HIGHATOMIC	0x200 /* Allows access to MIGRATE_HIGHATOMIC */
 #define ALLOC_KSWAPD		0x800 /* allow waking of kswapd, __GFP_KSWAPD_RECLAIM set */
 
+/* Flags that allow allocations below the min watermark. */
+#define ALLOC_RESERVES (ALLOC_HARDER|ALLOC_MIN_RESERVE|ALLOC_HIGHATOMIC|ALLOC_OOM)
+
 enum ttu_flags;
 struct tlbflush_unmap_batch;
 
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index e2b65767dda0..85a87d0ac57a 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -3944,15 +3944,14 @@ ALLOW_ERROR_INJECTION(should_fail_alloc_page, TRUE);
 static inline long __zone_watermark_unusable_free(struct zone *z,
 				unsigned int order, unsigned int alloc_flags)
 {
-	const bool alloc_harder = (alloc_flags & (ALLOC_HARDER|ALLOC_OOM));
 	long unusable_free = (1 << order) - 1;
 
 	/*
-	 * If the caller does not have rights to ALLOC_HARDER then subtract
-	 * the high-atomic reserves. This will over-estimate the size of the
-	 * atomic reserve but it avoids a search.
+	 * If the caller does not have rights to reserves below the min
+	 * watermark then subtract the high-atomic reserves. This will
+	 * over-estimate the size of the atomic reserve but it avoids a search.
 	 */
-	if (likely(!alloc_harder))
+	if (likely(!(alloc_flags & ALLOC_RESERVES)))
 		unusable_free += z->nr_reserved_highatomic;
 
 #ifdef CONFIG_CMA
@@ -3976,25 +3975,36 @@ bool __zone_watermark_ok(struct zone *z, unsigned int order, unsigned long mark,
 {
 	long min = mark;
 	int o;
-	const bool alloc_harder = (alloc_flags & (ALLOC_HARDER|ALLOC_OOM));
 
 	/* free_pages may go negative - that's OK */
 	free_pages -= __zone_watermark_unusable_free(z, order, alloc_flags);
 
-	if (alloc_flags & ALLOC_MIN_RESERVE)
-		min -= min / 2;
+	if (alloc_flags & ALLOC_RESERVES) {
+		/*
+		 * __GFP_HIGH allows access to 50% of the min reserve as well
+		 * as OOM.
+		 */
+		if (alloc_flags & ALLOC_MIN_RESERVE)
+			min -= min / 2;
 
-	if (unlikely(alloc_harder)) {
 		/*
-		 * OOM victims can try even harder than normal ALLOC_HARDER
+		 * Non-blocking allocations can access some of the reserve
+		 * with more access if also __GFP_HIGH. The reasoning is that
+		 * a non-blocking caller may incur a more severe penalty
+		 * if it cannot get memory quickly, particularly if it's
+		 * also __GFP_HIGH.
+		 */
+		if (alloc_flags & (ALLOC_HARDER|ALLOC_HIGHATOMIC))
+			min -= min / 4;
+
+		/*
+		 * OOM victims can try even harder than the normal reserve
 		 * users on the grounds that it's definitely going to be in
 		 * the exit path shortly and free memory. Any allocation it
 		 * makes during the free path will be small and short-lived.
 		 */
 		if (alloc_flags & ALLOC_OOM)
 			min -= min / 2;
-		else
-			min -= min / 4;
 	}
 
 	/*
@@ -5293,7 +5303,7 @@ __alloc_pages_slowpath(gfp_t gfp_mask, unsigned int order,
 		 * could deplete whole memory reserves which would just make
 		 * the situation worse
 		 */
-		page = __alloc_pages_cpuset_fallback(gfp_mask, order, ALLOC_HARDER, ac);
+		page = __alloc_pages_cpuset_fallback(gfp_mask, order, ALLOC_MIN_RESERVE|ALLOC_HARDER, ac);
 		if (page)
 			goto got_pg;
 
-- 
2.35.3


^ permalink raw reply related	[flat|nested] 24+ messages in thread

* [PATCH 5/6] mm/page_alloc: Give GFP_ATOMIC and non-blocking allocations access to reserves
  2022-11-29 15:16 [RFC PATCH 0/6] Discard __GFP_ATOMIC Mel Gorman
                   ` (3 preceding siblings ...)
  2022-11-29 15:16 ` [PATCH 4/6] mm/page_alloc: Explicitly define what alloc flags deplete min reserves Mel Gorman
@ 2022-11-29 15:17 ` Mel Gorman
  2022-12-08 18:07   ` Vlastimil Babka
  2022-11-29 15:17 ` [PATCH 6/6] mm: discard __GFP_ATOMIC Mel Gorman
  5 siblings, 1 reply; 24+ messages in thread
From: Mel Gorman @ 2022-11-29 15:17 UTC (permalink / raw)
  To: Linux-MM
  Cc: Andrew Morton, Michal Hocko, NeilBrown, Thierry Reding,
	Matthew Wilcox, Vlastimil Babka, LKML, Mel Gorman

Explicit GFP_ATOMIC allocations get flagged ALLOC_HARDER which is a bit
vague. In preparation for removing __GFP_ATOMIC, give GFP_ATOMIC and
other non-blocking allocation requests equal access to reserve.  Rename
ALLOC_HARDER to ALLOC_NON_BLOCK to make it more clear what the flag
means.

Signed-off-by: Mel Gorman <mgorman@techsingularity.net>
---
 mm/internal.h   |  7 +++++--
 mm/page_alloc.c | 23 +++++++++++++----------
 2 files changed, 18 insertions(+), 12 deletions(-)

diff --git a/mm/internal.h b/mm/internal.h
index 370500718732..98b1e526559d 100644
--- a/mm/internal.h
+++ b/mm/internal.h
@@ -743,7 +743,10 @@ unsigned int reclaim_clean_pages_from_list(struct zone *zone,
 #define ALLOC_OOM		ALLOC_NO_WATERMARKS
 #endif
 
-#define ALLOC_HARDER		 0x10 /* try to alloc harder */
+#define ALLOC_NON_BLOCK		 0x10 /* Caller cannot block. Allow access
+				       * to 25% of the min watermark or
+				       * 62.5% if __GFP_HIGH is set.
+				       */
 #define ALLOC_MIN_RESERVE	 0x20 /* __GFP_HIGH set. Allow access to 50%
 				       * of the min watermark.
 				       */
@@ -758,7 +761,7 @@ unsigned int reclaim_clean_pages_from_list(struct zone *zone,
 #define ALLOC_KSWAPD		0x800 /* allow waking of kswapd, __GFP_KSWAPD_RECLAIM set */
 
 /* Flags that allow allocations below the min watermark. */
-#define ALLOC_RESERVES (ALLOC_HARDER|ALLOC_MIN_RESERVE|ALLOC_HIGHATOMIC|ALLOC_OOM)
+#define ALLOC_RESERVES (ALLOC_NON_BLOCK|ALLOC_MIN_RESERVE|ALLOC_HIGHATOMIC|ALLOC_OOM)
 
 enum ttu_flags;
 struct tlbflush_unmap_batch;
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 85a87d0ac57a..6bee987ec9a3 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -3994,7 +3994,7 @@ bool __zone_watermark_ok(struct zone *z, unsigned int order, unsigned long mark,
 		 * if it cannot get memory quickly, particularly if it's
 		 * also __GFP_HIGH.
 		 */
-		if (alloc_flags & (ALLOC_HARDER|ALLOC_HIGHATOMIC))
+		if (alloc_flags & (ALLOC_NON_BLOCK|ALLOC_HIGHATOMIC))
 			min -= min / 4;
 
 		/*
@@ -4846,28 +4846,30 @@ gfp_to_alloc_flags(gfp_t gfp_mask, unsigned int order)
 	 * The caller may dip into page reserves a bit more if the caller
 	 * cannot run direct reclaim, or if the caller has realtime scheduling
 	 * policy or is asking for __GFP_HIGH memory.  GFP_ATOMIC requests will
-	 * set both ALLOC_HARDER (__GFP_ATOMIC) and ALLOC_MIN_RESERVE(__GFP_HIGH).
+	 * set both ALLOC_NON_BLOCK and ALLOC_MIN_RESERVE(__GFP_HIGH).
 	 */
 	alloc_flags |= (__force int)
 		(gfp_mask & (__GFP_HIGH | __GFP_KSWAPD_RECLAIM));
 
-	if (gfp_mask & __GFP_ATOMIC) {
+	if (!(gfp_mask & __GFP_RECLAIM)) {
 		/*
 		 * Not worth trying to allocate harder for __GFP_NOMEMALLOC even
 		 * if it can't schedule.
 		 */
 		if (!(gfp_mask & __GFP_NOMEMALLOC)) {
-			alloc_flags |= ALLOC_HARDER;
+			alloc_flags |= ALLOC_NON_BLOCK;
 
 			if (order > 0)
 				alloc_flags |= ALLOC_HIGHATOMIC;
 		}
 
 		/*
-		 * Ignore cpuset mems for GFP_ATOMIC rather than fail, see the
-		 * comment for __cpuset_node_allowed().
+		 * Ignore cpuset mems for non-blocking __GFP_HIGH (probably
+		 * GFP_ATOMIC) rather than fail, see the comment for
+		 * __cpuset_node_allowed().
 		 */
-		alloc_flags &= ~ALLOC_CPUSET;
+		if (alloc_flags & ALLOC_MIN_RESERVE)
+			alloc_flags &= ~ALLOC_CPUSET;
 	} else if (unlikely(rt_task(current)) && in_task())
 		alloc_flags |= ALLOC_MIN_RESERVE;
 
@@ -5299,11 +5301,12 @@ __alloc_pages_slowpath(gfp_t gfp_mask, unsigned int order,
 
 		/*
 		 * Help non-failing allocations by giving them access to memory
-		 * reserves but do not use ALLOC_NO_WATERMARKS because this
+		 * reserves normally used for high priority non-blocking
+		 * allocations but do not use ALLOC_NO_WATERMARKS because this
 		 * could deplete whole memory reserves which would just make
-		 * the situation worse
+		 * the situation worse.
 		 */
-		page = __alloc_pages_cpuset_fallback(gfp_mask, order, ALLOC_MIN_RESERVE|ALLOC_HARDER, ac);
+		page = __alloc_pages_cpuset_fallback(gfp_mask, order, ALLOC_MIN_RESERVE|ALLOC_NON_BLOCK, ac);
 		if (page)
 			goto got_pg;
 
-- 
2.35.3


^ permalink raw reply related	[flat|nested] 24+ messages in thread

* [PATCH 6/6] mm: discard __GFP_ATOMIC
  2022-11-29 15:16 [RFC PATCH 0/6] Discard __GFP_ATOMIC Mel Gorman
                   ` (4 preceding siblings ...)
  2022-11-29 15:17 ` [PATCH 5/6] mm/page_alloc: Give GFP_ATOMIC and non-blocking allocations access to reserves Mel Gorman
@ 2022-11-29 15:17 ` Mel Gorman
  2022-12-08 18:17   ` Vlastimil Babka
  2023-01-05 13:49   ` Mike Rapoport
  5 siblings, 2 replies; 24+ messages in thread
From: Mel Gorman @ 2022-11-29 15:17 UTC (permalink / raw)
  To: Linux-MM
  Cc: Andrew Morton, Michal Hocko, NeilBrown, Thierry Reding,
	Matthew Wilcox, Vlastimil Babka, LKML, Mel Gorman

From: NeilBrown <neilb@suse.de>

__GFP_ATOMIC serves little purpose.  Its main effect is to set
ALLOC_HARDER which adds a few little boosts to increase the chance of an
allocation succeeding, one of which is to lower the water-mark at which it
will succeed.

It is *always* paired with __GFP_HIGH which sets ALLOC_HIGH which also
adjusts this watermark.  It is probable that other users of __GFP_HIGH
should benefit from the other little bonuses that __GFP_ATOMIC gets.

__GFP_ATOMIC also gives a warning if used with __GFP_DIRECT_RECLAIM.
There is little point to this.  We already get a might_sleep() warning if
__GFP_DIRECT_RECLAIM is set.

__GFP_ATOMIC allows the "watermark_boost" to be side-stepped.  It is
probable that testing ALLOC_HARDER is a better fit here.

__GFP_ATOMIC is used by tegra-smmu.c to check if the allocation might
sleep.  This should test __GFP_DIRECT_RECLAIM instead.

This patch:
 - removes __GFP_ATOMIC
 - allows __GFP_HIGH allocations to ignore watermark boosting as well
   as GFP_ATOMIC requests.
 - makes other adjustments as suggested by the above.

The net result is not change to GFP_ATOMIC allocations.  Other
allocations that use __GFP_HIGH will benefit from a few different extra
privileges.  This affects:
  xen, dm, md, ntfs3
  the vermillion frame buffer
  hibernation
  ksm
  swap
all of which likely produce more benefit than cost if these selected
allocation are more likely to succeed quickly.

[mgorman: Minor adjustments to rework on top of a series]
Link: https://lkml.kernel.org/r/163712397076.13692.4727608274002939094@noble.neil.brown.name
Signed-off-by: NeilBrown <neilb@suse.de>
Signed-off-by: Mel Gorman <mgorman@techsingularity.net>
---
 Documentation/mm/balance.rst   |  2 +-
 drivers/iommu/tegra-smmu.c     |  4 ++--
 include/linux/gfp_types.h      | 12 ++++--------
 include/trace/events/mmflags.h |  1 -
 lib/test_printf.c              |  8 ++++----
 mm/internal.h                  |  2 +-
 mm/page_alloc.c                | 13 +++----------
 tools/perf/builtin-kmem.c      |  1 -
 8 files changed, 15 insertions(+), 28 deletions(-)

diff --git a/Documentation/mm/balance.rst b/Documentation/mm/balance.rst
index 6a1fadf3e173..e38e9d83c1c7 100644
--- a/Documentation/mm/balance.rst
+++ b/Documentation/mm/balance.rst
@@ -6,7 +6,7 @@ Memory Balancing
 
 Started Jan 2000 by Kanoj Sarcar <kanoj@sgi.com>
 
-Memory balancing is needed for !__GFP_ATOMIC and !__GFP_KSWAPD_RECLAIM as
+Memory balancing is needed for !__GFP_HIGH and !__GFP_KSWAPD_RECLAIM as
 well as for non __GFP_IO allocations.
 
 The first reason why a caller may avoid reclaim is that the caller can not
diff --git a/drivers/iommu/tegra-smmu.c b/drivers/iommu/tegra-smmu.c
index 5b1af40221ec..af8d0e685260 100644
--- a/drivers/iommu/tegra-smmu.c
+++ b/drivers/iommu/tegra-smmu.c
@@ -671,12 +671,12 @@ static struct page *as_get_pde_page(struct tegra_smmu_as *as,
 	 * allocate page in a sleeping context if GFP flags permit. Hence
 	 * spinlock needs to be unlocked and re-locked after allocation.
 	 */
-	if (!(gfp & __GFP_ATOMIC))
+	if (gfpflags_allow_blocking(gfp))
 		spin_unlock_irqrestore(&as->lock, *flags);
 
 	page = alloc_page(gfp | __GFP_DMA | __GFP_ZERO);
 
-	if (!(gfp & __GFP_ATOMIC))
+	if (gfpflags_allow_blocking(gfp))
 		spin_lock_irqsave(&as->lock, *flags);
 
 	/*
diff --git a/include/linux/gfp_types.h b/include/linux/gfp_types.h
index d88c46ca82e1..5088637fe5c2 100644
--- a/include/linux/gfp_types.h
+++ b/include/linux/gfp_types.h
@@ -31,7 +31,7 @@ typedef unsigned int __bitwise gfp_t;
 #define ___GFP_IO		0x40u
 #define ___GFP_FS		0x80u
 #define ___GFP_ZERO		0x100u
-#define ___GFP_ATOMIC		0x200u
+/* 0x200u unused */
 #define ___GFP_DIRECT_RECLAIM	0x400u
 #define ___GFP_KSWAPD_RECLAIM	0x800u
 #define ___GFP_WRITE		0x1000u
@@ -116,11 +116,8 @@ typedef unsigned int __bitwise gfp_t;
  *
  * %__GFP_HIGH indicates that the caller is high-priority and that granting
  * the request is necessary before the system can make forward progress.
- * For example, creating an IO context to clean pages.
- *
- * %__GFP_ATOMIC indicates that the caller cannot reclaim or sleep and is
- * high priority. Users are typically interrupt handlers. This may be
- * used in conjunction with %__GFP_HIGH
+ * For example creating an IO context to clean pages and requests
+ * from atomic context.
  *
  * %__GFP_MEMALLOC allows access to all memory. This should only be used when
  * the caller guarantees the allocation will allow more memory to be freed
@@ -135,7 +132,6 @@ typedef unsigned int __bitwise gfp_t;
  * %__GFP_NOMEMALLOC is used to explicitly forbid access to emergency reserves.
  * This takes precedence over the %__GFP_MEMALLOC flag if both are set.
  */
-#define __GFP_ATOMIC	((__force gfp_t)___GFP_ATOMIC)
 #define __GFP_HIGH	((__force gfp_t)___GFP_HIGH)
 #define __GFP_MEMALLOC	((__force gfp_t)___GFP_MEMALLOC)
 #define __GFP_NOMEMALLOC ((__force gfp_t)___GFP_NOMEMALLOC)
@@ -329,7 +325,7 @@ typedef unsigned int __bitwise gfp_t;
  * version does not attempt reclaim/compaction at all and is by default used
  * in page fault path, while the non-light is used by khugepaged.
  */
-#define GFP_ATOMIC	(__GFP_HIGH|__GFP_ATOMIC|__GFP_KSWAPD_RECLAIM)
+#define GFP_ATOMIC	(__GFP_HIGH|__GFP_KSWAPD_RECLAIM)
 #define GFP_KERNEL	(__GFP_RECLAIM | __GFP_IO | __GFP_FS)
 #define GFP_KERNEL_ACCOUNT (GFP_KERNEL | __GFP_ACCOUNT)
 #define GFP_NOWAIT	(__GFP_KSWAPD_RECLAIM)
diff --git a/include/trace/events/mmflags.h b/include/trace/events/mmflags.h
index e87cb2b80ed3..11524cda4a95 100644
--- a/include/trace/events/mmflags.h
+++ b/include/trace/events/mmflags.h
@@ -31,7 +31,6 @@
 	gfpflag_string(__GFP_HIGHMEM),		\
 	gfpflag_string(GFP_DMA32),		\
 	gfpflag_string(__GFP_HIGH),		\
-	gfpflag_string(__GFP_ATOMIC),		\
 	gfpflag_string(__GFP_IO),		\
 	gfpflag_string(__GFP_FS),		\
 	gfpflag_string(__GFP_NOWARN),		\
diff --git a/lib/test_printf.c b/lib/test_printf.c
index 4bd15a593fbd..fe13de1bed5f 100644
--- a/lib/test_printf.c
+++ b/lib/test_printf.c
@@ -686,17 +686,17 @@ flags(void)
 	gfp = GFP_ATOMIC|__GFP_DMA;
 	test("GFP_ATOMIC|GFP_DMA", "%pGg", &gfp);
 
-	gfp = __GFP_ATOMIC;
-	test("__GFP_ATOMIC", "%pGg", &gfp);
+	gfp = __GFP_HIGH;
+	test("__GFP_HIGH", "%pGg", &gfp);
 
 	/* Any flags not translated by the table should remain numeric */
 	gfp = ~__GFP_BITS_MASK;
 	snprintf(cmp_buffer, BUF_SIZE, "%#lx", (unsigned long) gfp);
 	test(cmp_buffer, "%pGg", &gfp);
 
-	snprintf(cmp_buffer, BUF_SIZE, "__GFP_ATOMIC|%#lx",
+	snprintf(cmp_buffer, BUF_SIZE, "__GFP_HIGH|%#lx",
 							(unsigned long) gfp);
-	gfp |= __GFP_ATOMIC;
+	gfp |= __GFP_HIGH;
 	test(cmp_buffer, "%pGg", &gfp);
 
 	kfree(cmp_buffer);
diff --git a/mm/internal.h b/mm/internal.h
index 98b1e526559d..48926b290cd5 100644
--- a/mm/internal.h
+++ b/mm/internal.h
@@ -24,7 +24,7 @@ struct folio_batch;
 #define GFP_RECLAIM_MASK (__GFP_RECLAIM|__GFP_HIGH|__GFP_IO|__GFP_FS|\
 			__GFP_NOWARN|__GFP_RETRY_MAYFAIL|__GFP_NOFAIL|\
 			__GFP_NORETRY|__GFP_MEMALLOC|__GFP_NOMEMALLOC|\
-			__GFP_ATOMIC|__GFP_NOLOCKDEP)
+			__GFP_NOLOCKDEP)
 
 /* The GFP flags allowed during early boot */
 #define GFP_BOOT_MASK (__GFP_BITS_MASK & ~(__GFP_RECLAIM|__GFP_IO|__GFP_FS))
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 6bee987ec9a3..ad6c4705a79d 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -4081,13 +4081,14 @@ static inline bool zone_watermark_fast(struct zone *z, unsigned int order,
 	if (__zone_watermark_ok(z, order, mark, highest_zoneidx, alloc_flags,
 					free_pages))
 		return true;
+
 	/*
-	 * Ignore watermark boosting for GFP_ATOMIC order-0 allocations
+	 * Ignore watermark boosting for GFP_HIGH order-0 allocations
 	 * when checking the min watermark. The min watermark is the
 	 * point where boosting is ignored so that kswapd is woken up
 	 * when below the low watermark.
 	 */
-	if (unlikely(!order && (gfp_mask & __GFP_ATOMIC) && z->watermark_boost
+	if (unlikely(!order && (alloc_flags & ALLOC_MIN_RESERVE) && z->watermark_boost
 		&& ((alloc_flags & ALLOC_WMARK_MASK) == WMARK_MIN))) {
 		mark = z->_watermark[WMARK_MIN];
 		return __zone_watermark_ok(z, order, mark, highest_zoneidx,
@@ -5052,14 +5053,6 @@ __alloc_pages_slowpath(gfp_t gfp_mask, unsigned int order,
 	unsigned int zonelist_iter_cookie;
 	int reserve_flags;
 
-	/*
-	 * We also sanity check to catch abuse of atomic reserves being used by
-	 * callers that are not in atomic context.
-	 */
-	if (WARN_ON_ONCE((gfp_mask & (__GFP_ATOMIC|__GFP_DIRECT_RECLAIM)) ==
-				(__GFP_ATOMIC|__GFP_DIRECT_RECLAIM)))
-		gfp_mask &= ~__GFP_ATOMIC;
-
 restart:
 	compaction_retries = 0;
 	no_progress_loops = 0;
diff --git a/tools/perf/builtin-kmem.c b/tools/perf/builtin-kmem.c
index ebfab2ca1702..4a06d83f2ac5 100644
--- a/tools/perf/builtin-kmem.c
+++ b/tools/perf/builtin-kmem.c
@@ -640,7 +640,6 @@ static const struct {
 	{ "__GFP_HIGHMEM",		"HM" },
 	{ "GFP_DMA32",			"D32" },
 	{ "__GFP_HIGH",			"H" },
-	{ "__GFP_ATOMIC",		"_A" },
 	{ "__GFP_IO",			"I" },
 	{ "__GFP_FS",			"F" },
 	{ "__GFP_NOWARN",		"NWR" },
-- 
2.35.3


^ permalink raw reply related	[flat|nested] 24+ messages in thread

* Re: [PATCH 3/6] mm/page_alloc: Explicitly record high-order atomic allocations in alloc_flags
  2022-11-29 15:16 ` [PATCH 3/6] mm/page_alloc: Explicitly record high-order atomic allocations in alloc_flags Mel Gorman
@ 2022-12-05  5:17   ` NeilBrown
  2022-12-05 10:27     ` Mel Gorman
  2022-12-08 16:51   ` Vlastimil Babka
  1 sibling, 1 reply; 24+ messages in thread
From: NeilBrown @ 2022-12-05  5:17 UTC (permalink / raw)
  To: Mel Gorman
  Cc: Linux-MM, Andrew Morton, Michal Hocko, Thierry Reding,
	Matthew Wilcox, Vlastimil Babka, LKML, Mel Gorman


Hi Mel,
 thanks a lot for doing this!
 I tried reviewing it but "HIGHATOMIC" is new to me and I quickly got
 lost :-(
 Maybe one day I'll work it out - now that several names are more
 meaningful, it will likely be easier.

Thanks,
NeilBrown

On Wed, 30 Nov 2022, Mel Gorman wrote:
> A high-order ALLOC_HARDER allocation is assumed to be atomic. While that
> is accurate, it changes later in the series. In preparation, explicitly
> record high-order atomic allocations in gfp_to_alloc_flags().
> 
> Signed-off-by: Mel Gorman <mgorman@techsingularity.net>
> ---
>  mm/internal.h   |  1 +
>  mm/page_alloc.c | 19 +++++++++++++------
>  2 files changed, 14 insertions(+), 6 deletions(-)
> 
> diff --git a/mm/internal.h b/mm/internal.h
> index d503e57a57a1..9a9d9b5ee87f 100644
> --- a/mm/internal.h
> +++ b/mm/internal.h
> @@ -754,6 +754,7 @@ unsigned int reclaim_clean_pages_from_list(struct zone *zone,
>  #else
>  #define ALLOC_NOFRAGMENT	  0x0
>  #endif
> +#define ALLOC_HIGHATOMIC	0x200 /* Allows access to MIGRATE_HIGHATOMIC */
>  #define ALLOC_KSWAPD		0x800 /* allow waking of kswapd, __GFP_KSWAPD_RECLAIM set */
>  
>  enum ttu_flags;
> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> index da746e9eb2cf..e2b65767dda0 100644
> --- a/mm/page_alloc.c
> +++ b/mm/page_alloc.c
> @@ -3710,7 +3710,7 @@ struct page *rmqueue_buddy(struct zone *preferred_zone, struct zone *zone,
>  		 * reserved for high-order atomic allocation, so order-0
>  		 * request should skip it.
>  		 */
> -		if (order > 0 && alloc_flags & ALLOC_HARDER)
> +		if (alloc_flags & ALLOC_HIGHATOMIC)
>  			page = __rmqueue_smallest(zone, order, MIGRATE_HIGHATOMIC);
>  		if (!page) {
>  			page = __rmqueue(zone, order, migratetype, alloc_flags);
> @@ -4028,8 +4028,10 @@ bool __zone_watermark_ok(struct zone *z, unsigned int order, unsigned long mark,
>  			return true;
>  		}
>  #endif
> -		if (alloc_harder && !free_area_empty(area, MIGRATE_HIGHATOMIC))
> +		if ((alloc_flags & ALLOC_HIGHATOMIC) &&
> +		    !free_area_empty(area, MIGRATE_HIGHATOMIC)) {
>  			return true;
> +		}
>  	}
>  	return false;
>  }
> @@ -4291,7 +4293,7 @@ get_page_from_freelist(gfp_t gfp_mask, unsigned int order, int alloc_flags,
>  			 * If this is a high-order atomic allocation then check
>  			 * if the pageblock should be reserved for the future
>  			 */
> -			if (unlikely(order && (alloc_flags & ALLOC_HARDER)))
> +			if (unlikely(alloc_flags & ALLOC_HIGHATOMIC))
>  				reserve_highatomic_pageblock(page, zone, order);
>  
>  			return page;
> @@ -4818,7 +4820,7 @@ static void wake_all_kswapds(unsigned int order, gfp_t gfp_mask,
>  }
>  
>  static inline unsigned int
> -gfp_to_alloc_flags(gfp_t gfp_mask)
> +gfp_to_alloc_flags(gfp_t gfp_mask, unsigned int order)
>  {
>  	unsigned int alloc_flags = ALLOC_WMARK_MIN | ALLOC_CPUSET;
>  
> @@ -4844,8 +4846,13 @@ gfp_to_alloc_flags(gfp_t gfp_mask)
>  		 * Not worth trying to allocate harder for __GFP_NOMEMALLOC even
>  		 * if it can't schedule.
>  		 */
> -		if (!(gfp_mask & __GFP_NOMEMALLOC))
> +		if (!(gfp_mask & __GFP_NOMEMALLOC)) {
>  			alloc_flags |= ALLOC_HARDER;
> +
> +			if (order > 0)
> +				alloc_flags |= ALLOC_HIGHATOMIC;
> +		}
> +
>  		/*
>  		 * Ignore cpuset mems for GFP_ATOMIC rather than fail, see the
>  		 * comment for __cpuset_node_allowed().
> @@ -5053,7 +5060,7 @@ __alloc_pages_slowpath(gfp_t gfp_mask, unsigned int order,
>  	 * kswapd needs to be woken up, and to avoid the cost of setting up
>  	 * alloc_flags precisely. So we do that now.
>  	 */
> -	alloc_flags = gfp_to_alloc_flags(gfp_mask);
> +	alloc_flags = gfp_to_alloc_flags(gfp_mask, order);
>  
>  	/*
>  	 * We need to recalculate the starting point for the zonelist iterator
> -- 
> 2.35.3
> 
> 

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH 3/6] mm/page_alloc: Explicitly record high-order atomic allocations in alloc_flags
  2022-12-05  5:17   ` NeilBrown
@ 2022-12-05 10:27     ` Mel Gorman
  0 siblings, 0 replies; 24+ messages in thread
From: Mel Gorman @ 2022-12-05 10:27 UTC (permalink / raw)
  To: NeilBrown
  Cc: Linux-MM, Andrew Morton, Michal Hocko, Thierry Reding,
	Matthew Wilcox, Vlastimil Babka, LKML

On Mon, Dec 05, 2022 at 04:17:12PM +1100, NeilBrown wrote:
> 
> Hi Mel,
>  thanks a lot for doing this!

My pleasure.

>  I tried reviewing it but "HIGHATOMIC" is new to me and I quickly got
>  lost :-(

That's ok, HIGHATOMIC reserves are obscure and internal to the
allocator. It's almost as obscure as granting access to reserves for RT
tasks with the only difference being a lack of data on what sort of RT
tasks needed extra privileges back in the early 2000's but I happened to
remember why highatomic reserves were introduced. It was introduced when
fragmentation avoidance triggered  high-order atomic allocations failures
that "happened to work" by accident before fragmentation avoidance (details
in 0aaa29a56e4f). IIRC, there were a storm of bugs, mostly on small embedded
platforms where devices without an IOMMU relied on high-order atomics
to work so a small reserve was created. I was concerned your patch would
trigger this class of bug again even though it might take a few years to
show up as embedded platforms tend to stay on old kernels for ages.

The main point of this patch is identifying these high-order atomic
allocations without relying on __GFP_ATOMIC but it should not be necessary
to understand how high-order atomic reserves work. They still work the
same way, access is just granted differently.

>  Maybe one day I'll work it out - now that several names are more
>  meaningful, it will likely be easier.
> 
> > @@ -4818,7 +4820,7 @@ static void wake_all_kswapds(unsigned int order, gfp_t gfp_mask,
> >  }
> >  
> >  static inline unsigned int
> > -gfp_to_alloc_flags(gfp_t gfp_mask)
> > +gfp_to_alloc_flags(gfp_t gfp_mask, unsigned int order)
> >  {
> >  	unsigned int alloc_flags = ALLOC_WMARK_MIN | ALLOC_CPUSET;
> >  
> > @@ -4844,8 +4846,13 @@ gfp_to_alloc_flags(gfp_t gfp_mask)
> >  		 * Not worth trying to allocate harder for __GFP_NOMEMALLOC even
> >  		 * if it can't schedule.
> >  		 */
> > -		if (!(gfp_mask & __GFP_NOMEMALLOC))
> > +		if (!(gfp_mask & __GFP_NOMEMALLOC)) {
> >  			alloc_flags |= ALLOC_HARDER;
> > +
> > +			if (order > 0)
> > +				alloc_flags |= ALLOC_HIGHATOMIC;
> > +		}
> > +
> >  		/*
> >  		 * Ignore cpuset mems for GFP_ATOMIC rather than fail, see the
> >  		 * comment for __cpuset_node_allowed().

This is the most crucial hunk two hunks of the patch. If the series is
merged and we start seeing high-order atomic allocations failure, the key
will be to look at the gfp_flags and determine if this hunk is enough to
accurately detect high-order atomic allocations. If the GFP flags look ok,
then a tracing debugging patch will be created to dump gfp_flags every
time access is granted to high-atomic reserves to determine if access is
given incorrectly and under what circumstances.

The main concern I had with your original patch was that it was too
easy to grant access to high-atomic reserves for requests that were not
high-order atomics requests which might exhaust the reserves. The rest of
the series tries to improve the naming of the ALLOC flags and what they
mean. The actual changes to your patch are minimal.  I could have started
with your patch and fixed it up but I preferred this ordering to reduce
use of __GFP_ATOMIC and then delete it. It should be bisection safe.

-- 
Mel Gorman
SUSE Labs

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH 1/6] mm/page_alloc: Rename ALLOC_HIGH to ALLOC_MIN_RESERVE
  2022-11-29 15:16 ` [PATCH 1/6] mm/page_alloc: Rename ALLOC_HIGH to ALLOC_MIN_RESERVE Mel Gorman
@ 2022-12-08 16:12   ` Vlastimil Babka
  0 siblings, 0 replies; 24+ messages in thread
From: Vlastimil Babka @ 2022-12-08 16:12 UTC (permalink / raw)
  To: Mel Gorman, Linux-MM
  Cc: Andrew Morton, Michal Hocko, NeilBrown, Thierry Reding,
	Matthew Wilcox, LKML

On 11/29/22 16:16, Mel Gorman wrote:
> __GFP_HIGH aliases to ALLOC_HIGH but the name does not really hint
> what it means. As ALLOC_HIGH is internal to the allocator, rename
> it to ALLOC_MIN_RESERVE to document that the min reserves can
> be depleted.
> 
> Signed-off-by: Mel Gorman <mgorman@techsingularity.net>

Acked-by: Vlastimil Babka <vbabka@suse.cz>



^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH 2/6] mm/page_alloc: Treat RT tasks similar to GFP_HIGH
  2022-11-29 15:16 ` [PATCH 2/6] mm/page_alloc: Treat RT tasks similar to GFP_HIGH Mel Gorman
@ 2022-12-08 16:16   ` Vlastimil Babka
  0 siblings, 0 replies; 24+ messages in thread
From: Vlastimil Babka @ 2022-12-08 16:16 UTC (permalink / raw)
  To: Mel Gorman, Linux-MM
  Cc: Andrew Morton, Michal Hocko, NeilBrown, Thierry Reding,
	Matthew Wilcox, LKML

Subject should say __GFP_HIGH as there's no GFP_HIGH?

On 11/29/22 16:16, Mel Gorman wrote:
> RT tasks are allowed to dip below the min reserve but ALLOC_HARDER is
> typically combined with ALLOC_MIN_RESERVE so RT tasks are a little
> unusual. While there is some justification for allowing RT tasks
> access to memory reserves, there is a strong chance that a RT task
> that is also under memory pressure is at risk of missing deadlines
> anyway. Relax how much reserves an RT task can access by treating
> it the same as __GFP_HIGH allocations.
> 
> Signed-off-by: Mel Gorman <mgorman@techsingularity.net>

Acked-by: Vlastimil Babka <vbabka@suse.cz>

> ---
>  mm/page_alloc.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> index 3b37909617bc..da746e9eb2cf 100644
> --- a/mm/page_alloc.c
> +++ b/mm/page_alloc.c
> @@ -4852,7 +4852,7 @@ gfp_to_alloc_flags(gfp_t gfp_mask)
>  		 */
>  		alloc_flags &= ~ALLOC_CPUSET;
>  	} else if (unlikely(rt_task(current)) && in_task())
> -		alloc_flags |= ALLOC_HARDER;
> +		alloc_flags |= ALLOC_MIN_RESERVE;
>  
>  	alloc_flags = gfp_to_alloc_flags_cma(gfp_mask, alloc_flags);
>  


^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH 3/6] mm/page_alloc: Explicitly record high-order atomic allocations in alloc_flags
  2022-11-29 15:16 ` [PATCH 3/6] mm/page_alloc: Explicitly record high-order atomic allocations in alloc_flags Mel Gorman
  2022-12-05  5:17   ` NeilBrown
@ 2022-12-08 16:51   ` Vlastimil Babka
  2023-01-04 11:45     ` Mel Gorman
  1 sibling, 1 reply; 24+ messages in thread
From: Vlastimil Babka @ 2022-12-08 16:51 UTC (permalink / raw)
  To: Mel Gorman, Linux-MM
  Cc: Andrew Morton, Michal Hocko, NeilBrown, Thierry Reding,
	Matthew Wilcox, LKML

On 11/29/22 16:16, Mel Gorman wrote:
> A high-order ALLOC_HARDER allocation is assumed to be atomic. While that
> is accurate, it changes later in the series. In preparation, explicitly
> record high-order atomic allocations in gfp_to_alloc_flags().
> 
> Signed-off-by: Mel Gorman <mgorman@techsingularity.net>
> ---
>  mm/internal.h   |  1 +
>  mm/page_alloc.c | 19 +++++++++++++------
>  2 files changed, 14 insertions(+), 6 deletions(-)
> 
> diff --git a/mm/internal.h b/mm/internal.h
> index d503e57a57a1..9a9d9b5ee87f 100644
> --- a/mm/internal.h
> +++ b/mm/internal.h
> @@ -754,6 +754,7 @@ unsigned int reclaim_clean_pages_from_list(struct zone *zone,
>  #else
>  #define ALLOC_NOFRAGMENT	  0x0
>  #endif
> +#define ALLOC_HIGHATOMIC	0x200 /* Allows access to MIGRATE_HIGHATOMIC */
>  #define ALLOC_KSWAPD		0x800 /* allow waking of kswapd, __GFP_KSWAPD_RECLAIM set */
>  
>  enum ttu_flags;
> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> index da746e9eb2cf..e2b65767dda0 100644
> --- a/mm/page_alloc.c
> +++ b/mm/page_alloc.c
> @@ -3710,7 +3710,7 @@ struct page *rmqueue_buddy(struct zone *preferred_zone, struct zone *zone,
>  		 * reserved for high-order atomic allocation, so order-0
>  		 * request should skip it.
>  		 */
> -		if (order > 0 && alloc_flags & ALLOC_HARDER)
> +		if (alloc_flags & ALLOC_HIGHATOMIC)
>  			page = __rmqueue_smallest(zone, order, MIGRATE_HIGHATOMIC);
>  		if (!page) {
>  			page = __rmqueue(zone, order, migratetype, alloc_flags);
> @@ -4028,8 +4028,10 @@ bool __zone_watermark_ok(struct zone *z, unsigned int order, unsigned long mark,
>  			return true;
>  		}
>  #endif
> -		if (alloc_harder && !free_area_empty(area, MIGRATE_HIGHATOMIC))
> +		if ((alloc_flags & ALLOC_HIGHATOMIC) &&
> +		    !free_area_empty(area, MIGRATE_HIGHATOMIC)) {
>  			return true;

alloc_harder is defined as
	(alloc_flags & (ALLOC_HARDER|ALLOC_OOM));
AFAICS this means we no longer allow ALLOC_OOM to use the highatomic
reserve. Isn't that a risk?

> +		}
>  	}
>  	return false;
>  }
> @@ -4291,7 +4293,7 @@ get_page_from_freelist(gfp_t gfp_mask, unsigned int order, int alloc_flags,
>  			 * If this is a high-order atomic allocation then check
>  			 * if the pageblock should be reserved for the future
>  			 */
> -			if (unlikely(order && (alloc_flags & ALLOC_HARDER)))
> +			if (unlikely(alloc_flags & ALLOC_HIGHATOMIC))
>  				reserve_highatomic_pageblock(page, zone, order);
>  
>  			return page;
> @@ -4818,7 +4820,7 @@ static void wake_all_kswapds(unsigned int order, gfp_t gfp_mask,
>  }
>  
>  static inline unsigned int
> -gfp_to_alloc_flags(gfp_t gfp_mask)
> +gfp_to_alloc_flags(gfp_t gfp_mask, unsigned int order)
>  {
>  	unsigned int alloc_flags = ALLOC_WMARK_MIN | ALLOC_CPUSET;
>  
> @@ -4844,8 +4846,13 @@ gfp_to_alloc_flags(gfp_t gfp_mask)
>  		 * Not worth trying to allocate harder for __GFP_NOMEMALLOC even
>  		 * if it can't schedule.
>  		 */
> -		if (!(gfp_mask & __GFP_NOMEMALLOC))
> +		if (!(gfp_mask & __GFP_NOMEMALLOC)) {
>  			alloc_flags |= ALLOC_HARDER;
> +
> +			if (order > 0)
> +				alloc_flags |= ALLOC_HIGHATOMIC;
> +		}
> +
>  		/*
>  		 * Ignore cpuset mems for GFP_ATOMIC rather than fail, see the
>  		 * comment for __cpuset_node_allowed().
> @@ -5053,7 +5060,7 @@ __alloc_pages_slowpath(gfp_t gfp_mask, unsigned int order,
>  	 * kswapd needs to be woken up, and to avoid the cost of setting up
>  	 * alloc_flags precisely. So we do that now.
>  	 */
> -	alloc_flags = gfp_to_alloc_flags(gfp_mask);
> +	alloc_flags = gfp_to_alloc_flags(gfp_mask, order);
>  
>  	/*
>  	 * We need to recalculate the starting point for the zonelist iterator


^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH 4/6] mm/page_alloc: Explicitly define what alloc flags deplete min reserves
  2022-11-29 15:16 ` [PATCH 4/6] mm/page_alloc: Explicitly define what alloc flags deplete min reserves Mel Gorman
@ 2022-12-08 17:55   ` Vlastimil Babka
  2023-01-04 12:02     ` Mel Gorman
  0 siblings, 1 reply; 24+ messages in thread
From: Vlastimil Babka @ 2022-12-08 17:55 UTC (permalink / raw)
  To: Mel Gorman, Linux-MM
  Cc: Andrew Morton, Michal Hocko, NeilBrown, Thierry Reding,
	Matthew Wilcox, LKML

On 11/29/22 16:16, Mel Gorman wrote:
> As there are more ALLOC_ flags that affect reserves, define what flags
> affect reserves and clarify the effect of each flag.

Seems to me this does more than a clarification, but also some functional
tweaks, so it could be helpful if those were spelled out in the changelog.

> Signed-off-by: Mel Gorman <mgorman@techsingularity.net>
> ---
>  mm/internal.h   |  3 +++
>  mm/page_alloc.c | 36 +++++++++++++++++++++++-------------
>  2 files changed, 26 insertions(+), 13 deletions(-)
> 
> diff --git a/mm/internal.h b/mm/internal.h
> index 9a9d9b5ee87f..370500718732 100644
> --- a/mm/internal.h
> +++ b/mm/internal.h
> @@ -757,6 +757,9 @@ unsigned int reclaim_clean_pages_from_list(struct zone *zone,
>  #define ALLOC_HIGHATOMIC	0x200 /* Allows access to MIGRATE_HIGHATOMIC */
>  #define ALLOC_KSWAPD		0x800 /* allow waking of kswapd, __GFP_KSWAPD_RECLAIM set */
>  
> +/* Flags that allow allocations below the min watermark. */
> +#define ALLOC_RESERVES (ALLOC_HARDER|ALLOC_MIN_RESERVE|ALLOC_HIGHATOMIC|ALLOC_OOM)
> +
>  enum ttu_flags;
>  struct tlbflush_unmap_batch;
>  
> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> index e2b65767dda0..85a87d0ac57a 100644
> --- a/mm/page_alloc.c
> +++ b/mm/page_alloc.c
> @@ -3944,15 +3944,14 @@ ALLOW_ERROR_INJECTION(should_fail_alloc_page, TRUE);
>  static inline long __zone_watermark_unusable_free(struct zone *z,
>  				unsigned int order, unsigned int alloc_flags)
>  {
> -	const bool alloc_harder = (alloc_flags & (ALLOC_HARDER|ALLOC_OOM));
>  	long unusable_free = (1 << order) - 1;
>  
>  	/*
> -	 * If the caller does not have rights to ALLOC_HARDER then subtract
> -	 * the high-atomic reserves. This will over-estimate the size of the
> -	 * atomic reserve but it avoids a search.
> +	 * If the caller does not have rights to reserves below the min
> +	 * watermark then subtract the high-atomic reserves. This will
> +	 * over-estimate the size of the atomic reserve but it avoids a search.
>  	 */
> -	if (likely(!alloc_harder))
> +	if (likely(!(alloc_flags & ALLOC_RESERVES)))
>  		unusable_free += z->nr_reserved_highatomic;
>  
>  #ifdef CONFIG_CMA
> @@ -3976,25 +3975,36 @@ bool __zone_watermark_ok(struct zone *z, unsigned int order, unsigned long mark,
>  {
>  	long min = mark;
>  	int o;
> -	const bool alloc_harder = (alloc_flags & (ALLOC_HARDER|ALLOC_OOM));
>  
>  	/* free_pages may go negative - that's OK */
>  	free_pages -= __zone_watermark_unusable_free(z, order, alloc_flags);
>  
> -	if (alloc_flags & ALLOC_MIN_RESERVE)
> -		min -= min / 2;
> +	if (alloc_flags & ALLOC_RESERVES) {

Do we want to keep this unlikely() as alloc_harder did before?

> +		/*
> +		 * __GFP_HIGH allows access to 50% of the min reserve as well
> +		 * as OOM.
> +		 */
> +		if (alloc_flags & ALLOC_MIN_RESERVE)
> +			min -= min / 2;
>  
> -	if (unlikely(alloc_harder)) {
>  		/*
> -		 * OOM victims can try even harder than normal ALLOC_HARDER
> +		 * Non-blocking allocations can access some of the reserve
> +		 * with more access if also __GFP_HIGH. The reasoning is that
> +		 * a non-blocking caller may incur a more severe penalty
> +		 * if it cannot get memory quickly, particularly if it's
> +		 * also __GFP_HIGH.
> +		 */
> +		if (alloc_flags & (ALLOC_HARDER|ALLOC_HIGHATOMIC))
> +			min -= min / 4;

For example this seems to change the allowed dip to reserves for
ALLOC_HIGHATOMIC.

> +
> +		/*
> +		 * OOM victims can try even harder than the normal reserve
>  		 * users on the grounds that it's definitely going to be in
>  		 * the exit path shortly and free memory. Any allocation it
>  		 * makes during the free path will be small and short-lived.
>  		 */
>  		if (alloc_flags & ALLOC_OOM)
>  			min -= min / 2;
> -		else
> -			min -= min / 4;
>  	}

(noted that this patch doesn't seem to change the concern I raised in
previous patch)

>  	/*
> @@ -5293,7 +5303,7 @@ __alloc_pages_slowpath(gfp_t gfp_mask, unsigned int order,
>  		 * could deplete whole memory reserves which would just make
>  		 * the situation worse
>  		 */
> -		page = __alloc_pages_cpuset_fallback(gfp_mask, order, ALLOC_HARDER, ac);
> +		page = __alloc_pages_cpuset_fallback(gfp_mask, order, ALLOC_MIN_RESERVE|ALLOC_HARDER, ac);

And this AFAICS seems to give __GFP_NOFAIL 3/4 of min reserves instead of
1/4, which seems like a significant change (but hopefully ok) so worth
noting at least.

>  		if (page)
>  			goto got_pg;
>  


^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH 5/6] mm/page_alloc: Give GFP_ATOMIC and non-blocking allocations access to reserves
  2022-11-29 15:17 ` [PATCH 5/6] mm/page_alloc: Give GFP_ATOMIC and non-blocking allocations access to reserves Mel Gorman
@ 2022-12-08 18:07   ` Vlastimil Babka
  2023-01-04 12:03     ` Mel Gorman
  0 siblings, 1 reply; 24+ messages in thread
From: Vlastimil Babka @ 2022-12-08 18:07 UTC (permalink / raw)
  To: Mel Gorman, Linux-MM
  Cc: Andrew Morton, Michal Hocko, NeilBrown, Thierry Reding,
	Matthew Wilcox, LKML

On 11/29/22 16:17, Mel Gorman wrote:
> Explicit GFP_ATOMIC allocations get flagged ALLOC_HARDER which is a bit
> vague. In preparation for removing __GFP_ATOMIC, give GFP_ATOMIC and
> other non-blocking allocation requests equal access to reserve.  Rename
> ALLOC_HARDER to ALLOC_NON_BLOCK to make it more clear what the flag
> means.
> 
> Signed-off-by: Mel Gorman <mgorman@techsingularity.net>
> ---
>  mm/internal.h   |  7 +++++--
>  mm/page_alloc.c | 23 +++++++++++++----------
>  2 files changed, 18 insertions(+), 12 deletions(-)
> 
> diff --git a/mm/internal.h b/mm/internal.h
> index 370500718732..98b1e526559d 100644
> --- a/mm/internal.h
> +++ b/mm/internal.h
> @@ -743,7 +743,10 @@ unsigned int reclaim_clean_pages_from_list(struct zone *zone,
>  #define ALLOC_OOM		ALLOC_NO_WATERMARKS
>  #endif
>  
> -#define ALLOC_HARDER		 0x10 /* try to alloc harder */
> +#define ALLOC_NON_BLOCK		 0x10 /* Caller cannot block. Allow access
> +				       * to 25% of the min watermark or
> +				       * 62.5% if __GFP_HIGH is set.
> +				       */
>  #define ALLOC_MIN_RESERVE	 0x20 /* __GFP_HIGH set. Allow access to 50%
>  				       * of the min watermark.
>  				       */
> @@ -758,7 +761,7 @@ unsigned int reclaim_clean_pages_from_list(struct zone *zone,
>  #define ALLOC_KSWAPD		0x800 /* allow waking of kswapd, __GFP_KSWAPD_RECLAIM set */
>  
>  /* Flags that allow allocations below the min watermark. */
> -#define ALLOC_RESERVES (ALLOC_HARDER|ALLOC_MIN_RESERVE|ALLOC_HIGHATOMIC|ALLOC_OOM)
> +#define ALLOC_RESERVES (ALLOC_NON_BLOCK|ALLOC_MIN_RESERVE|ALLOC_HIGHATOMIC|ALLOC_OOM)
>  
>  enum ttu_flags;
>  struct tlbflush_unmap_batch;
> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> index 85a87d0ac57a..6bee987ec9a3 100644
> --- a/mm/page_alloc.c
> +++ b/mm/page_alloc.c
> @@ -3994,7 +3994,7 @@ bool __zone_watermark_ok(struct zone *z, unsigned int order, unsigned long mark,
>  		 * if it cannot get memory quickly, particularly if it's
>  		 * also __GFP_HIGH.
>  		 */
> -		if (alloc_flags & (ALLOC_HARDER|ALLOC_HIGHATOMIC))
> +		if (alloc_flags & (ALLOC_NON_BLOCK|ALLOC_HIGHATOMIC))
>  			min -= min / 4;
>  
>  		/*
> @@ -4846,28 +4846,30 @@ gfp_to_alloc_flags(gfp_t gfp_mask, unsigned int order)
>  	 * The caller may dip into page reserves a bit more if the caller
>  	 * cannot run direct reclaim, or if the caller has realtime scheduling
>  	 * policy or is asking for __GFP_HIGH memory.  GFP_ATOMIC requests will
> -	 * set both ALLOC_HARDER (__GFP_ATOMIC) and ALLOC_MIN_RESERVE(__GFP_HIGH).
> +	 * set both ALLOC_NON_BLOCK and ALLOC_MIN_RESERVE(__GFP_HIGH).
>  	 */
>  	alloc_flags |= (__force int)
>  		(gfp_mask & (__GFP_HIGH | __GFP_KSWAPD_RECLAIM));
>  
> -	if (gfp_mask & __GFP_ATOMIC) {
> +	if (!(gfp_mask & __GFP_RECLAIM)) {

This is supposed to be __GFP_DIRECT_RECLAIM right? Otherwise that includes
also __GFP_KSWAPD_RECLAIM and GFP_ATOMIC sets that one...

>  		/*
>  		 * Not worth trying to allocate harder for __GFP_NOMEMALLOC even
>  		 * if it can't schedule.
>  		 */
>  		if (!(gfp_mask & __GFP_NOMEMALLOC)) {
> -			alloc_flags |= ALLOC_HARDER;
> +			alloc_flags |= ALLOC_NON_BLOCK;
>  
>  			if (order > 0)
>  				alloc_flags |= ALLOC_HIGHATOMIC;
>  		}
>  
>  		/*
> -		 * Ignore cpuset mems for GFP_ATOMIC rather than fail, see the
> -		 * comment for __cpuset_node_allowed().
> +		 * Ignore cpuset mems for non-blocking __GFP_HIGH (probably
> +		 * GFP_ATOMIC) rather than fail, see the comment for
> +		 * __cpuset_node_allowed().
>  		 */
> -		alloc_flags &= ~ALLOC_CPUSET;
> +		if (alloc_flags & ALLOC_MIN_RESERVE)
> +			alloc_flags &= ~ALLOC_CPUSET;
>  	} else if (unlikely(rt_task(current)) && in_task())
>  		alloc_flags |= ALLOC_MIN_RESERVE;
>  
> @@ -5299,11 +5301,12 @@ __alloc_pages_slowpath(gfp_t gfp_mask, unsigned int order,
>  
>  		/*
>  		 * Help non-failing allocations by giving them access to memory
> -		 * reserves but do not use ALLOC_NO_WATERMARKS because this
> +		 * reserves normally used for high priority non-blocking
> +		 * allocations but do not use ALLOC_NO_WATERMARKS because this
>  		 * could deplete whole memory reserves which would just make
> -		 * the situation worse
> +		 * the situation worse.
>  		 */
> -		page = __alloc_pages_cpuset_fallback(gfp_mask, order, ALLOC_MIN_RESERVE|ALLOC_HARDER, ac);
> +		page = __alloc_pages_cpuset_fallback(gfp_mask, order, ALLOC_MIN_RESERVE|ALLOC_NON_BLOCK, ac);
>  		if (page)
>  			goto got_pg;
>  


^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH 6/6] mm: discard __GFP_ATOMIC
  2022-11-29 15:17 ` [PATCH 6/6] mm: discard __GFP_ATOMIC Mel Gorman
@ 2022-12-08 18:17   ` Vlastimil Babka
  2023-01-04 12:04     ` Mel Gorman
  2023-01-05 13:49   ` Mike Rapoport
  1 sibling, 1 reply; 24+ messages in thread
From: Vlastimil Babka @ 2022-12-08 18:17 UTC (permalink / raw)
  To: Mel Gorman, Linux-MM
  Cc: Andrew Morton, Michal Hocko, NeilBrown, Thierry Reding,
	Matthew Wilcox, LKML

On 11/29/22 16:17, Mel Gorman wrote:
> From: NeilBrown <neilb@suse.de>
> 
> __GFP_ATOMIC serves little purpose.  Its main effect is to set
> ALLOC_HARDER which adds a few little boosts to increase the chance of an
> allocation succeeding, one of which is to lower the water-mark at which it
> will succeed.
> 
> It is *always* paired with __GFP_HIGH which sets ALLOC_HIGH which also
> adjusts this watermark.  It is probable that other users of __GFP_HIGH
> should benefit from the other little bonuses that __GFP_ATOMIC gets.
> 
> __GFP_ATOMIC also gives a warning if used with __GFP_DIRECT_RECLAIM.
> There is little point to this.  We already get a might_sleep() warning if
> __GFP_DIRECT_RECLAIM is set.
> 
> __GFP_ATOMIC allows the "watermark_boost" to be side-stepped.  It is
> probable that testing ALLOC_HARDER is a better fit here.
> 
> __GFP_ATOMIC is used by tegra-smmu.c to check if the allocation might
> sleep.  This should test __GFP_DIRECT_RECLAIM instead.
> 
> This patch:
>  - removes __GFP_ATOMIC
>  - allows __GFP_HIGH allocations to ignore watermark boosting as well
>    as GFP_ATOMIC requests.
>  - makes other adjustments as suggested by the above.
> 
> The net result is not change to GFP_ATOMIC allocations.  Other
> allocations that use __GFP_HIGH will benefit from a few different extra
> privileges.  This affects:
>   xen, dm, md, ntfs3
>   the vermillion frame buffer
>   hibernation
>   ksm
>   swap
> all of which likely produce more benefit than cost if these selected
> allocation are more likely to succeed quickly.
> 
> [mgorman: Minor adjustments to rework on top of a series]
> Link: https://lkml.kernel.org/r/163712397076.13692.4727608274002939094@noble.neil.brown.name
> Signed-off-by: NeilBrown <neilb@suse.de>
> Signed-off-by: Mel Gorman <mgorman@techsingularity.net>

Acked-by: Vlastimil Babka <vbabka@suse.cz>
Just a nit below.


> --- a/mm/page_alloc.c
> +++ b/mm/page_alloc.c
> @@ -4081,13 +4081,14 @@ static inline bool zone_watermark_fast(struct zone *z, unsigned int order,
>  	if (__zone_watermark_ok(z, order, mark, highest_zoneidx, alloc_flags,
>  					free_pages))
>  		return true;
> +
>  	/*
> -	 * Ignore watermark boosting for GFP_ATOMIC order-0 allocations
> +	 * Ignore watermark boosting for GFP_HIGH order-0 allocations

There's no GFP_HIGH. We could either keep GFP_ATOMIC here if we want to talk
about the high-level flag combo, or __GFP_HIGH if about the low-level
detail. We're deep in the page allocator implementation so the latter would
be OK.

>  	 * when checking the min watermark. The min watermark is the
>  	 * point where boosting is ignored so that kswapd is woken up
>  	 * when below the low watermark.
>  	 */
> -	if (unlikely(!order && (gfp_mask & __GFP_ATOMIC) && z->watermark_boost
> +	if (unlikely(!order && (alloc_flags & ALLOC_MIN_RESERVE) && z->watermark_boost


^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH 3/6] mm/page_alloc: Explicitly record high-order atomic allocations in alloc_flags
  2022-12-08 16:51   ` Vlastimil Babka
@ 2023-01-04 11:45     ` Mel Gorman
  0 siblings, 0 replies; 24+ messages in thread
From: Mel Gorman @ 2023-01-04 11:45 UTC (permalink / raw)
  To: Vlastimil Babka
  Cc: Linux-MM, Andrew Morton, Michal Hocko, NeilBrown, Thierry Reding,
	Matthew Wilcox, LKML

First off, sorry for the long delay getting back to you. I was sick for
a few weeks and still catching up. I'm still not 100%.

On Thu, Dec 08, 2022 at 05:51:11PM +0100, Vlastimil Babka wrote:
> On 11/29/22 16:16, Mel Gorman wrote:
> > diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> > index da746e9eb2cf..e2b65767dda0 100644
> > --- a/mm/page_alloc.c
> > +++ b/mm/page_alloc.c
> > @@ -3710,7 +3710,7 @@ struct page *rmqueue_buddy(struct zone *preferred_zone, struct zone *zone,
> >  		 * reserved for high-order atomic allocation, so order-0
> >  		 * request should skip it.
> >  		 */
> > -		if (order > 0 && alloc_flags & ALLOC_HARDER)
> > +		if (alloc_flags & ALLOC_HIGHATOMIC)
> >  			page = __rmqueue_smallest(zone, order, MIGRATE_HIGHATOMIC);
> >  		if (!page) {
> >  			page = __rmqueue(zone, order, migratetype, alloc_flags);
> > @@ -4028,8 +4028,10 @@ bool __zone_watermark_ok(struct zone *z, unsigned int order, unsigned long mark,
> >  			return true;
> >  		}
> >  #endif
> > -		if (alloc_harder && !free_area_empty(area, MIGRATE_HIGHATOMIC))
> > +		if ((alloc_flags & ALLOC_HIGHATOMIC) &&
> > +		    !free_area_empty(area, MIGRATE_HIGHATOMIC)) {
> >  			return true;
> 
> alloc_harder is defined as
> 	(alloc_flags & (ALLOC_HARDER|ALLOC_OOM));
> AFAICS this means we no longer allow ALLOC_OOM to use the highatomic
> reserve. Isn't that a risk?
> 

Yes, it is. I intend to apply the patch below on top. I didn't alter the
first check for ALLOC_HIGHATOMIC as I wanted OOM handling to only use the
high-order reserves if there was no other option. While this is a change
in behaviour, it should be a harmless one. I'll add a note in the changelog.

diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 50fc1e7cb154..0ef4f3236a5a 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -3710,6 +3710,16 @@ struct page *rmqueue_buddy(struct zone *preferred_zone, struct zone *zone,
 			page = __rmqueue_smallest(zone, order, MIGRATE_HIGHATOMIC);
 		if (!page) {
 			page = __rmqueue(zone, order, migratetype, alloc_flags);
+
+			/*
+			 * If the allocation fails, allow OOM handling access
+			 * to HIGHATOMIC reserves as failing now is worse than
+			 * failing a high-order atomic allocation in the
+			 * future.
+			 */
+			if (!page && (alloc_flags & ALLOC_OOM))
+				page = __rmqueue_smallest(zone, order, MIGRATE_HIGHATOMIC);
+
 			if (!page) {
 				spin_unlock_irqrestore(&zone->lock, flags);
 				return NULL;
@@ -4023,7 +4033,7 @@ bool __zone_watermark_ok(struct zone *z, unsigned int order, unsigned long mark,
 			return true;
 		}
 #endif
-		if ((alloc_flags & ALLOC_HIGHATOMIC) &&
+		if ((alloc_flags & (ALLOC_HIGHATOMIC|ALLOC_OOM)) &&
 		    !free_area_empty(area, MIGRATE_HIGHATOMIC)) {
 			return true;
 		}
-- 
Mel Gorman
SUSE Labs

^ permalink raw reply related	[flat|nested] 24+ messages in thread

* Re: [PATCH 4/6] mm/page_alloc: Explicitly define what alloc flags deplete min reserves
  2022-12-08 17:55   ` Vlastimil Babka
@ 2023-01-04 12:02     ` Mel Gorman
  0 siblings, 0 replies; 24+ messages in thread
From: Mel Gorman @ 2023-01-04 12:02 UTC (permalink / raw)
  To: Vlastimil Babka
  Cc: Linux-MM, Andrew Morton, Michal Hocko, NeilBrown, Thierry Reding,
	Matthew Wilcox, LKML

On Thu, Dec 08, 2022 at 06:55:00PM +0100, Vlastimil Babka wrote:
> On 11/29/22 16:16, Mel Gorman wrote:
> > As there are more ALLOC_ flags that affect reserves, define what flags
> > affect reserves and clarify the effect of each flag.
> 
> Seems to me this does more than a clarification, but also some functional
> tweaks, so it could be helpful if those were spelled out in the changelog.
> 

I will to take out the problematic parts that need clarification. There
are two, one I'll drop and the other will be split. More details below.

> > @@ -3976,25 +3975,36 @@ bool __zone_watermark_ok(struct zone *z, unsigned int order, unsigned long mark,
> >  {
> >  	long min = mark;
> >  	int o;
> > -	const bool alloc_harder = (alloc_flags & (ALLOC_HARDER|ALLOC_OOM));
> >  
> >  	/* free_pages may go negative - that's OK */
> >  	free_pages -= __zone_watermark_unusable_free(z, order, alloc_flags);
> >  
> > -	if (alloc_flags & ALLOC_MIN_RESERVE)
> > -		min -= min / 2;
> > +	if (alloc_flags & ALLOC_RESERVES) {
> 
> Do we want to keep this unlikely() as alloc_harder did before?
> 

Added back in.

> > +		/*
> > +		 * __GFP_HIGH allows access to 50% of the min reserve as well
> > +		 * as OOM.
> > +		 */
> > +		if (alloc_flags & ALLOC_MIN_RESERVE)
> > +			min -= min / 2;
> >  
> > -	if (unlikely(alloc_harder)) {
> >  		/*
> > -		 * OOM victims can try even harder than normal ALLOC_HARDER
> > +		 * Non-blocking allocations can access some of the reserve
> > +		 * with more access if also __GFP_HIGH. The reasoning is that
> > +		 * a non-blocking caller may incur a more severe penalty
> > +		 * if it cannot get memory quickly, particularly if it's
> > +		 * also __GFP_HIGH.
> > +		 */
> > +		if (alloc_flags & (ALLOC_HARDER|ALLOC_HIGHATOMIC))
> > +			min -= min / 4;
> 
> For example this seems to change the allowed dip to reserves for
> ALLOC_HIGHATOMIC.
> 

You're right and this could cause problems. If high-order atomic allocation
failures start appearing again, this change would help but it should be
a standalone patch in response to a bug. I'll drop it for now.

> > +
> > +		/*
> > +		 * OOM victims can try even harder than the normal reserve
> >  		 * users on the grounds that it's definitely going to be in
> >  		 * the exit path shortly and free memory. Any allocation it
> >  		 * makes during the free path will be small and short-lived.
> >  		 */
> >  		if (alloc_flags & ALLOC_OOM)
> >  			min -= min / 2;
> > -		else
> > -			min -= min / 4;
> >  	}
> 
> (noted that this patch doesn't seem to change the concern I raised in
> previous patch)
> 

This might be addressed now with the chjanges to the patch that caused
you concerns about OOM handling.

> >  	/*
> > @@ -5293,7 +5303,7 @@ __alloc_pages_slowpath(gfp_t gfp_mask, unsigned int order,
> >  		 * could deplete whole memory reserves which would just make
> >  		 * the situation worse
> >  		 */
> > -		page = __alloc_pages_cpuset_fallback(gfp_mask, order, ALLOC_HARDER, ac);
> > +		page = __alloc_pages_cpuset_fallback(gfp_mask, order, ALLOC_MIN_RESERVE|ALLOC_HARDER, ac);
> 
> And this AFAICS seems to give __GFP_NOFAIL 3/4 of min reserves instead of
> 1/4, which seems like a significant change (but hopefully ok) so worth
> noting at least.
> 

It deserves a standalone patch. Below is the diff I intend to apply to
this patch and the standalone patch.

diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 58e01a31492e..6f41b84a97ac 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -3984,7 +3984,7 @@ bool __zone_watermark_ok(struct zone *z, unsigned int order, unsigned long mark,
 	/* free_pages may go negative - that's OK */
 	free_pages -= __zone_watermark_unusable_free(z, order, alloc_flags);
 
-	if (alloc_flags & ALLOC_RESERVES) {
+	if (unlikely(alloc_flags & ALLOC_RESERVES)) {
 		/*
 		 * __GFP_HIGH allows access to 50% of the min reserve as well
 		 * as OOM.
@@ -3999,7 +3999,7 @@ bool __zone_watermark_ok(struct zone *z, unsigned int order, unsigned long mark,
 		 * if it cannot get memory quickly, particularly if it's
 		 * also __GFP_HIGH.
 		 */
-		if (alloc_flags & (ALLOC_HARDER|ALLOC_HIGHATOMIC))
+		if (alloc_flags & ALLOC_HARDER)
 			min -= min / 4;
 
 		/*
@@ -5308,7 +5308,7 @@ __alloc_pages_slowpath(gfp_t gfp_mask, unsigned int order,
 		 * could deplete whole memory reserves which would just make
 		 * the situation worse
 		 */
-		page = __alloc_pages_cpuset_fallback(gfp_mask, order, ALLOC_MIN_RESERVE|ALLOC_HARDER, ac);
+		page = __alloc_pages_cpuset_fallback(gfp_mask, order, ALLOC_HARDER, ac);
 		if (page)
 			goto got_pg;
 
The patch to allow __GFP_NOFAIL deeper access is this

--8<--
mm/page_alloc.c: Allow __GFP_NOFAIL requests deeper access to reserves

Currently __GFP_NOFAIL allocations without any other flags can access 25%
of the reserves but these requests imply that the system cannot make forward
progress until the allocation succeeds. Allow __GFP_NOFAIL access to 75%
of the min reserve.

Signed-off-by: Mel Gorman <mgorman@techsingularity.net>
---
 mm/page_alloc.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 6f41b84a97ac..d2df78f5baa2 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -5308,7 +5308,7 @@ __alloc_pages_slowpath(gfp_t gfp_mask, unsigned int order,
 		 * could deplete whole memory reserves which would just make
 		 * the situation worse
 		 */
-		page = __alloc_pages_cpuset_fallback(gfp_mask, order, ALLOC_HARDER, ac);
+		page = __alloc_pages_cpuset_fallback(gfp_mask, order, ALLOC_MIN_RESERVE|ALLOC_HARDER, ac);
 		if (page)
 			goto got_pg;
 

^ permalink raw reply related	[flat|nested] 24+ messages in thread

* Re: [PATCH 5/6] mm/page_alloc: Give GFP_ATOMIC and non-blocking allocations access to reserves
  2022-12-08 18:07   ` Vlastimil Babka
@ 2023-01-04 12:03     ` Mel Gorman
  0 siblings, 0 replies; 24+ messages in thread
From: Mel Gorman @ 2023-01-04 12:03 UTC (permalink / raw)
  To: Vlastimil Babka
  Cc: Linux-MM, Andrew Morton, Michal Hocko, NeilBrown, Thierry Reding,
	Matthew Wilcox, LKML

On Thu, Dec 08, 2022 at 07:07:06PM +0100, Vlastimil Babka wrote:
> On 11/29/22 16:17, Mel Gorman wrote:
> > @@ -4846,28 +4846,30 @@ gfp_to_alloc_flags(gfp_t gfp_mask, unsigned int order)
> >  	 * The caller may dip into page reserves a bit more if the caller
> >  	 * cannot run direct reclaim, or if the caller has realtime scheduling
> >  	 * policy or is asking for __GFP_HIGH memory.  GFP_ATOMIC requests will
> > -	 * set both ALLOC_HARDER (__GFP_ATOMIC) and ALLOC_MIN_RESERVE(__GFP_HIGH).
> > +	 * set both ALLOC_NON_BLOCK and ALLOC_MIN_RESERVE(__GFP_HIGH).
> >  	 */
> >  	alloc_flags |= (__force int)
> >  		(gfp_mask & (__GFP_HIGH | __GFP_KSWAPD_RECLAIM));
> >  
> > -	if (gfp_mask & __GFP_ATOMIC) {
> > +	if (!(gfp_mask & __GFP_RECLAIM)) {
> 
> This is supposed to be __GFP_DIRECT_RECLAIM right? Otherwise that includes
> also __GFP_KSWAPD_RECLAIM and GFP_ATOMIC sets that one...
> 

Yes

diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 0b2093d17b48..2217bab2dbb2 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -4856,7 +4856,7 @@ gfp_to_alloc_flags(gfp_t gfp_mask, unsigned int order)
 	alloc_flags |= (__force int)
 		(gfp_mask & (__GFP_HIGH | __GFP_KSWAPD_RECLAIM));
 
-	if (!(gfp_mask & __GFP_RECLAIM)) {
+	if (!(gfp_mask & __GFP_DIRECT_RECLAIM)) {
 		/*
 		 * Not worth trying to allocate harder for __GFP_NOMEMALLOC even
 		 * if it can't schedule.

-- 
Mel Gorman
SUSE Labs

^ permalink raw reply related	[flat|nested] 24+ messages in thread

* Re: [PATCH 6/6] mm: discard __GFP_ATOMIC
  2022-12-08 18:17   ` Vlastimil Babka
@ 2023-01-04 12:04     ` Mel Gorman
  0 siblings, 0 replies; 24+ messages in thread
From: Mel Gorman @ 2023-01-04 12:04 UTC (permalink / raw)
  To: Vlastimil Babka
  Cc: Linux-MM, Andrew Morton, Michal Hocko, NeilBrown, Thierry Reding,
	Matthew Wilcox, LKML

On Thu, Dec 08, 2022 at 07:17:48PM +0100, Vlastimil Babka wrote:
> > @@ -4081,13 +4081,14 @@ static inline bool zone_watermark_fast(struct zone *z, unsigned int order,
> >  	if (__zone_watermark_ok(z, order, mark, highest_zoneidx, alloc_flags,
> >  					free_pages))
> >  		return true;
> > +
> >  	/*
> > -	 * Ignore watermark boosting for GFP_ATOMIC order-0 allocations
> > +	 * Ignore watermark boosting for GFP_HIGH order-0 allocations
> 
> There's no GFP_HIGH. We could either keep GFP_ATOMIC here if we want to talk
> about the high-level flag combo, or __GFP_HIGH if about the low-level
> detail. We're deep in the page allocator implementation so the latter would
> be OK.
> 

Fixed

-- 
Mel Gorman
SUSE Labs

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH 6/6] mm: discard __GFP_ATOMIC
  2022-11-29 15:17 ` [PATCH 6/6] mm: discard __GFP_ATOMIC Mel Gorman
  2022-12-08 18:17   ` Vlastimil Babka
@ 2023-01-05 13:49   ` Mike Rapoport
  2023-01-05 21:53     ` NeilBrown
  2023-01-06  9:35     ` Mel Gorman
  1 sibling, 2 replies; 24+ messages in thread
From: Mike Rapoport @ 2023-01-05 13:49 UTC (permalink / raw)
  To: Mel Gorman
  Cc: Linux-MM, Andrew Morton, Michal Hocko, NeilBrown, Thierry Reding,
	Matthew Wilcox, Vlastimil Babka, LKML

Hi Mel,

On Tue, Nov 29, 2022 at 03:17:01PM +0000, Mel Gorman wrote:
> From: NeilBrown <neilb@suse.de>
> 
> __GFP_ATOMIC serves little purpose.  Its main effect is to set
> ALLOC_HARDER which adds a few little boosts to increase the chance of an
> allocation succeeding, one of which is to lower the water-mark at which it
> will succeed.
> 
> It is *always* paired with __GFP_HIGH which sets ALLOC_HIGH which also
> adjusts this watermark.  It is probable that other users of __GFP_HIGH
> should benefit from the other little bonuses that __GFP_ATOMIC gets.
> 
> __GFP_ATOMIC also gives a warning if used with __GFP_DIRECT_RECLAIM.
> There is little point to this.  We already get a might_sleep() warning if
> __GFP_DIRECT_RECLAIM is set.
> 
> __GFP_ATOMIC allows the "watermark_boost" to be side-stepped.  It is
> probable that testing ALLOC_HARDER is a better fit here.
> 
> __GFP_ATOMIC is used by tegra-smmu.c to check if the allocation might
> sleep.  This should test __GFP_DIRECT_RECLAIM instead.
> 
> This patch:
>  - removes __GFP_ATOMIC
>  - allows __GFP_HIGH allocations to ignore watermark boosting as well
>    as GFP_ATOMIC requests.
>  - makes other adjustments as suggested by the above.
> 
> The net result is not change to GFP_ATOMIC allocations.  Other
> allocations that use __GFP_HIGH will benefit from a few different extra
> privileges.  This affects:
>   xen, dm, md, ntfs3
>   the vermillion frame buffer
>   hibernation
>   ksm
>   swap
> all of which likely produce more benefit than cost if these selected
> allocation are more likely to succeed quickly.
> 
> [mgorman: Minor adjustments to rework on top of a series]
> Link: https://lkml.kernel.org/r/163712397076.13692.4727608274002939094@noble.neil.brown.name
> Signed-off-by: NeilBrown <neilb@suse.de>
> Signed-off-by: Mel Gorman <mgorman@techsingularity.net>
> ---
>  Documentation/mm/balance.rst   |  2 +-

Documentation/core-api/memory-allocation.rst needs an update as well, and
there are other mentions of GFP_ATOMIC in Documentation/

>  drivers/iommu/tegra-smmu.c     |  4 ++--
>  include/linux/gfp_types.h      | 12 ++++--------
>  include/trace/events/mmflags.h |  1 -
>  lib/test_printf.c              |  8 ++++----
>  mm/internal.h                  |  2 +-
>  mm/page_alloc.c                | 13 +++----------
>  tools/perf/builtin-kmem.c      |  1 -
>  8 files changed, 15 insertions(+), 28 deletions(-)
> 
> diff --git a/Documentation/mm/balance.rst b/Documentation/mm/balance.rst
> index 6a1fadf3e173..e38e9d83c1c7 100644
> --- a/Documentation/mm/balance.rst
> +++ b/Documentation/mm/balance.rst
> @@ -6,7 +6,7 @@ Memory Balancing
>  
>  Started Jan 2000 by Kanoj Sarcar <kanoj@sgi.com>
>  
> -Memory balancing is needed for !__GFP_ATOMIC and !__GFP_KSWAPD_RECLAIM as
> +Memory balancing is needed for !__GFP_HIGH and !__GFP_KSWAPD_RECLAIM as
>  well as for non __GFP_IO allocations.
>  
>  The first reason why a caller may avoid reclaim is that the caller can not
> diff --git a/drivers/iommu/tegra-smmu.c b/drivers/iommu/tegra-smmu.c
> index 5b1af40221ec..af8d0e685260 100644
> --- a/drivers/iommu/tegra-smmu.c
> +++ b/drivers/iommu/tegra-smmu.c
> @@ -671,12 +671,12 @@ static struct page *as_get_pde_page(struct tegra_smmu_as *as,
>  	 * allocate page in a sleeping context if GFP flags permit. Hence
>  	 * spinlock needs to be unlocked and re-locked after allocation.
>  	 */
> -	if (!(gfp & __GFP_ATOMIC))
> +	if (gfpflags_allow_blocking(gfp))
>  		spin_unlock_irqrestore(&as->lock, *flags);
>  
>  	page = alloc_page(gfp | __GFP_DMA | __GFP_ZERO);
>  
> -	if (!(gfp & __GFP_ATOMIC))
> +	if (gfpflags_allow_blocking(gfp))
>  		spin_lock_irqsave(&as->lock, *flags);
>  
>  	/*
> diff --git a/include/linux/gfp_types.h b/include/linux/gfp_types.h
> index d88c46ca82e1..5088637fe5c2 100644
> --- a/include/linux/gfp_types.h
> +++ b/include/linux/gfp_types.h
> @@ -31,7 +31,7 @@ typedef unsigned int __bitwise gfp_t;
>  #define ___GFP_IO		0x40u
>  #define ___GFP_FS		0x80u
>  #define ___GFP_ZERO		0x100u
> -#define ___GFP_ATOMIC		0x200u
> +/* 0x200u unused */
>  #define ___GFP_DIRECT_RECLAIM	0x400u
>  #define ___GFP_KSWAPD_RECLAIM	0x800u
>  #define ___GFP_WRITE		0x1000u
> @@ -116,11 +116,8 @@ typedef unsigned int __bitwise gfp_t;
>   *
>   * %__GFP_HIGH indicates that the caller is high-priority and that granting
>   * the request is necessary before the system can make forward progress.
> - * For example, creating an IO context to clean pages.
> - *
> - * %__GFP_ATOMIC indicates that the caller cannot reclaim or sleep and is
> - * high priority. Users are typically interrupt handlers. This may be
> - * used in conjunction with %__GFP_HIGH
> + * For example creating an IO context to clean pages and requests
> + * from atomic context.
>   *
>   * %__GFP_MEMALLOC allows access to all memory. This should only be used when
>   * the caller guarantees the allocation will allow more memory to be freed
> @@ -135,7 +132,6 @@ typedef unsigned int __bitwise gfp_t;
>   * %__GFP_NOMEMALLOC is used to explicitly forbid access to emergency reserves.
>   * This takes precedence over the %__GFP_MEMALLOC flag if both are set.
>   */
> -#define __GFP_ATOMIC	((__force gfp_t)___GFP_ATOMIC)
>  #define __GFP_HIGH	((__force gfp_t)___GFP_HIGH)
>  #define __GFP_MEMALLOC	((__force gfp_t)___GFP_MEMALLOC)
>  #define __GFP_NOMEMALLOC ((__force gfp_t)___GFP_NOMEMALLOC)
> @@ -329,7 +325,7 @@ typedef unsigned int __bitwise gfp_t;
>   * version does not attempt reclaim/compaction at all and is by default used
>   * in page fault path, while the non-light is used by khugepaged.
>   */
> -#define GFP_ATOMIC	(__GFP_HIGH|__GFP_ATOMIC|__GFP_KSWAPD_RECLAIM)
> +#define GFP_ATOMIC	(__GFP_HIGH|__GFP_KSWAPD_RECLAIM)
>  #define GFP_KERNEL	(__GFP_RECLAIM | __GFP_IO | __GFP_FS)
>  #define GFP_KERNEL_ACCOUNT (GFP_KERNEL | __GFP_ACCOUNT)
>  #define GFP_NOWAIT	(__GFP_KSWAPD_RECLAIM)
> diff --git a/include/trace/events/mmflags.h b/include/trace/events/mmflags.h
> index e87cb2b80ed3..11524cda4a95 100644
> --- a/include/trace/events/mmflags.h
> +++ b/include/trace/events/mmflags.h
> @@ -31,7 +31,6 @@
>  	gfpflag_string(__GFP_HIGHMEM),		\
>  	gfpflag_string(GFP_DMA32),		\
>  	gfpflag_string(__GFP_HIGH),		\
> -	gfpflag_string(__GFP_ATOMIC),		\
>  	gfpflag_string(__GFP_IO),		\
>  	gfpflag_string(__GFP_FS),		\
>  	gfpflag_string(__GFP_NOWARN),		\
> diff --git a/lib/test_printf.c b/lib/test_printf.c
> index 4bd15a593fbd..fe13de1bed5f 100644
> --- a/lib/test_printf.c
> +++ b/lib/test_printf.c
> @@ -686,17 +686,17 @@ flags(void)
>  	gfp = GFP_ATOMIC|__GFP_DMA;
>  	test("GFP_ATOMIC|GFP_DMA", "%pGg", &gfp);
>  
> -	gfp = __GFP_ATOMIC;
> -	test("__GFP_ATOMIC", "%pGg", &gfp);
> +	gfp = __GFP_HIGH;
> +	test("__GFP_HIGH", "%pGg", &gfp);
>  
>  	/* Any flags not translated by the table should remain numeric */
>  	gfp = ~__GFP_BITS_MASK;
>  	snprintf(cmp_buffer, BUF_SIZE, "%#lx", (unsigned long) gfp);
>  	test(cmp_buffer, "%pGg", &gfp);
>  
> -	snprintf(cmp_buffer, BUF_SIZE, "__GFP_ATOMIC|%#lx",
> +	snprintf(cmp_buffer, BUF_SIZE, "__GFP_HIGH|%#lx",
>  							(unsigned long) gfp);
> -	gfp |= __GFP_ATOMIC;
> +	gfp |= __GFP_HIGH;
>  	test(cmp_buffer, "%pGg", &gfp);
>  
>  	kfree(cmp_buffer);
> diff --git a/mm/internal.h b/mm/internal.h
> index 98b1e526559d..48926b290cd5 100644
> --- a/mm/internal.h
> +++ b/mm/internal.h
> @@ -24,7 +24,7 @@ struct folio_batch;
>  #define GFP_RECLAIM_MASK (__GFP_RECLAIM|__GFP_HIGH|__GFP_IO|__GFP_FS|\
>  			__GFP_NOWARN|__GFP_RETRY_MAYFAIL|__GFP_NOFAIL|\
>  			__GFP_NORETRY|__GFP_MEMALLOC|__GFP_NOMEMALLOC|\
> -			__GFP_ATOMIC|__GFP_NOLOCKDEP)
> +			__GFP_NOLOCKDEP)
>  
>  /* The GFP flags allowed during early boot */
>  #define GFP_BOOT_MASK (__GFP_BITS_MASK & ~(__GFP_RECLAIM|__GFP_IO|__GFP_FS))
> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> index 6bee987ec9a3..ad6c4705a79d 100644
> --- a/mm/page_alloc.c
> +++ b/mm/page_alloc.c
> @@ -4081,13 +4081,14 @@ static inline bool zone_watermark_fast(struct zone *z, unsigned int order,
>  	if (__zone_watermark_ok(z, order, mark, highest_zoneidx, alloc_flags,
>  					free_pages))
>  		return true;
> +
>  	/*
> -	 * Ignore watermark boosting for GFP_ATOMIC order-0 allocations
> +	 * Ignore watermark boosting for GFP_HIGH order-0 allocations
>  	 * when checking the min watermark. The min watermark is the
>  	 * point where boosting is ignored so that kswapd is woken up
>  	 * when below the low watermark.
>  	 */
> -	if (unlikely(!order && (gfp_mask & __GFP_ATOMIC) && z->watermark_boost
> +	if (unlikely(!order && (alloc_flags & ALLOC_MIN_RESERVE) && z->watermark_boost
>  		&& ((alloc_flags & ALLOC_WMARK_MASK) == WMARK_MIN))) {
>  		mark = z->_watermark[WMARK_MIN];
>  		return __zone_watermark_ok(z, order, mark, highest_zoneidx,
> @@ -5052,14 +5053,6 @@ __alloc_pages_slowpath(gfp_t gfp_mask, unsigned int order,
>  	unsigned int zonelist_iter_cookie;
>  	int reserve_flags;
>  
> -	/*
> -	 * We also sanity check to catch abuse of atomic reserves being used by
> -	 * callers that are not in atomic context.
> -	 */
> -	if (WARN_ON_ONCE((gfp_mask & (__GFP_ATOMIC|__GFP_DIRECT_RECLAIM)) ==
> -				(__GFP_ATOMIC|__GFP_DIRECT_RECLAIM)))
> -		gfp_mask &= ~__GFP_ATOMIC;
> -
>  restart:
>  	compaction_retries = 0;
>  	no_progress_loops = 0;
> diff --git a/tools/perf/builtin-kmem.c b/tools/perf/builtin-kmem.c
> index ebfab2ca1702..4a06d83f2ac5 100644
> --- a/tools/perf/builtin-kmem.c
> +++ b/tools/perf/builtin-kmem.c
> @@ -640,7 +640,6 @@ static const struct {
>  	{ "__GFP_HIGHMEM",		"HM" },
>  	{ "GFP_DMA32",			"D32" },
>  	{ "__GFP_HIGH",			"H" },
> -	{ "__GFP_ATOMIC",		"_A" },
>  	{ "__GFP_IO",			"I" },
>  	{ "__GFP_FS",			"F" },
>  	{ "__GFP_NOWARN",		"NWR" },
> -- 
> 2.35.3
> 
> 

-- 
Sincerely yours,
Mike.

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH 6/6] mm: discard __GFP_ATOMIC
  2023-01-05 13:49   ` Mike Rapoport
@ 2023-01-05 21:53     ` NeilBrown
  2023-01-06  9:35     ` Mel Gorman
  1 sibling, 0 replies; 24+ messages in thread
From: NeilBrown @ 2023-01-05 21:53 UTC (permalink / raw)
  To: Mike Rapoport
  Cc: Mel Gorman, Linux-MM, Andrew Morton, Michal Hocko,
	Thierry Reding, Matthew Wilcox, Vlastimil Babka, LKML

On Fri, 06 Jan 2023, Mike Rapoport wrote:
> Hi Mel,
> 
> On Tue, Nov 29, 2022 at 03:17:01PM +0000, Mel Gorman wrote:
> > From: NeilBrown <neilb@suse.de>
> > 
> > __GFP_ATOMIC serves little purpose.  Its main effect is to set
> > ALLOC_HARDER which adds a few little boosts to increase the chance of an
> > allocation succeeding, one of which is to lower the water-mark at which it
> > will succeed.
> > 
> > It is *always* paired with __GFP_HIGH which sets ALLOC_HIGH which also
> > adjusts this watermark.  It is probable that other users of __GFP_HIGH
> > should benefit from the other little bonuses that __GFP_ATOMIC gets.
> > 
> > __GFP_ATOMIC also gives a warning if used with __GFP_DIRECT_RECLAIM.
> > There is little point to this.  We already get a might_sleep() warning if
> > __GFP_DIRECT_RECLAIM is set.
> > 
> > __GFP_ATOMIC allows the "watermark_boost" to be side-stepped.  It is
> > probable that testing ALLOC_HARDER is a better fit here.
> > 
> > __GFP_ATOMIC is used by tegra-smmu.c to check if the allocation might
> > sleep.  This should test __GFP_DIRECT_RECLAIM instead.
> > 
> > This patch:
> >  - removes __GFP_ATOMIC
> >  - allows __GFP_HIGH allocations to ignore watermark boosting as well
> >    as GFP_ATOMIC requests.
> >  - makes other adjustments as suggested by the above.
> > 
> > The net result is not change to GFP_ATOMIC allocations.  Other
> > allocations that use __GFP_HIGH will benefit from a few different extra
> > privileges.  This affects:
> >   xen, dm, md, ntfs3
> >   the vermillion frame buffer
> >   hibernation
> >   ksm
> >   swap
> > all of which likely produce more benefit than cost if these selected
> > allocation are more likely to succeed quickly.
> > 
> > [mgorman: Minor adjustments to rework on top of a series]
> > Link: https://lkml.kernel.org/r/163712397076.13692.4727608274002939094@noble.neil.brown.name
> > Signed-off-by: NeilBrown <neilb@suse.de>
> > Signed-off-by: Mel Gorman <mgorman@techsingularity.net>
> > ---
> >  Documentation/mm/balance.rst   |  2 +-
> 
> Documentation/core-api/memory-allocation.rst needs an update as well, and
> there are other mentions of GFP_ATOMIC in Documentation/

Note that this patch removes __GFP_ATOMIC, but does not change the
behaviour of GFP_ATOMIC.  So I don't believe there is any other
documentation that need changing.

Thanks,
NeilBrown


> 
> >  drivers/iommu/tegra-smmu.c     |  4 ++--
> >  include/linux/gfp_types.h      | 12 ++++--------
> >  include/trace/events/mmflags.h |  1 -
> >  lib/test_printf.c              |  8 ++++----
> >  mm/internal.h                  |  2 +-
> >  mm/page_alloc.c                | 13 +++----------
> >  tools/perf/builtin-kmem.c      |  1 -
> >  8 files changed, 15 insertions(+), 28 deletions(-)
> > 
> > diff --git a/Documentation/mm/balance.rst b/Documentation/mm/balance.rst
> > index 6a1fadf3e173..e38e9d83c1c7 100644
> > --- a/Documentation/mm/balance.rst
> > +++ b/Documentation/mm/balance.rst
> > @@ -6,7 +6,7 @@ Memory Balancing
> >  
> >  Started Jan 2000 by Kanoj Sarcar <kanoj@sgi.com>
> >  
> > -Memory balancing is needed for !__GFP_ATOMIC and !__GFP_KSWAPD_RECLAIM as
> > +Memory balancing is needed for !__GFP_HIGH and !__GFP_KSWAPD_RECLAIM as
> >  well as for non __GFP_IO allocations.
> >  
> >  The first reason why a caller may avoid reclaim is that the caller can not
> > diff --git a/drivers/iommu/tegra-smmu.c b/drivers/iommu/tegra-smmu.c
> > index 5b1af40221ec..af8d0e685260 100644
> > --- a/drivers/iommu/tegra-smmu.c
> > +++ b/drivers/iommu/tegra-smmu.c
> > @@ -671,12 +671,12 @@ static struct page *as_get_pde_page(struct tegra_smmu_as *as,
> >  	 * allocate page in a sleeping context if GFP flags permit. Hence
> >  	 * spinlock needs to be unlocked and re-locked after allocation.
> >  	 */
> > -	if (!(gfp & __GFP_ATOMIC))
> > +	if (gfpflags_allow_blocking(gfp))
> >  		spin_unlock_irqrestore(&as->lock, *flags);
> >  
> >  	page = alloc_page(gfp | __GFP_DMA | __GFP_ZERO);
> >  
> > -	if (!(gfp & __GFP_ATOMIC))
> > +	if (gfpflags_allow_blocking(gfp))
> >  		spin_lock_irqsave(&as->lock, *flags);
> >  
> >  	/*
> > diff --git a/include/linux/gfp_types.h b/include/linux/gfp_types.h
> > index d88c46ca82e1..5088637fe5c2 100644
> > --- a/include/linux/gfp_types.h
> > +++ b/include/linux/gfp_types.h
> > @@ -31,7 +31,7 @@ typedef unsigned int __bitwise gfp_t;
> >  #define ___GFP_IO		0x40u
> >  #define ___GFP_FS		0x80u
> >  #define ___GFP_ZERO		0x100u
> > -#define ___GFP_ATOMIC		0x200u
> > +/* 0x200u unused */
> >  #define ___GFP_DIRECT_RECLAIM	0x400u
> >  #define ___GFP_KSWAPD_RECLAIM	0x800u
> >  #define ___GFP_WRITE		0x1000u
> > @@ -116,11 +116,8 @@ typedef unsigned int __bitwise gfp_t;
> >   *
> >   * %__GFP_HIGH indicates that the caller is high-priority and that granting
> >   * the request is necessary before the system can make forward progress.
> > - * For example, creating an IO context to clean pages.
> > - *
> > - * %__GFP_ATOMIC indicates that the caller cannot reclaim or sleep and is
> > - * high priority. Users are typically interrupt handlers. This may be
> > - * used in conjunction with %__GFP_HIGH
> > + * For example creating an IO context to clean pages and requests
> > + * from atomic context.
> >   *
> >   * %__GFP_MEMALLOC allows access to all memory. This should only be used when
> >   * the caller guarantees the allocation will allow more memory to be freed
> > @@ -135,7 +132,6 @@ typedef unsigned int __bitwise gfp_t;
> >   * %__GFP_NOMEMALLOC is used to explicitly forbid access to emergency reserves.
> >   * This takes precedence over the %__GFP_MEMALLOC flag if both are set.
> >   */
> > -#define __GFP_ATOMIC	((__force gfp_t)___GFP_ATOMIC)
> >  #define __GFP_HIGH	((__force gfp_t)___GFP_HIGH)
> >  #define __GFP_MEMALLOC	((__force gfp_t)___GFP_MEMALLOC)
> >  #define __GFP_NOMEMALLOC ((__force gfp_t)___GFP_NOMEMALLOC)
> > @@ -329,7 +325,7 @@ typedef unsigned int __bitwise gfp_t;
> >   * version does not attempt reclaim/compaction at all and is by default used
> >   * in page fault path, while the non-light is used by khugepaged.
> >   */
> > -#define GFP_ATOMIC	(__GFP_HIGH|__GFP_ATOMIC|__GFP_KSWAPD_RECLAIM)
> > +#define GFP_ATOMIC	(__GFP_HIGH|__GFP_KSWAPD_RECLAIM)
> >  #define GFP_KERNEL	(__GFP_RECLAIM | __GFP_IO | __GFP_FS)
> >  #define GFP_KERNEL_ACCOUNT (GFP_KERNEL | __GFP_ACCOUNT)
> >  #define GFP_NOWAIT	(__GFP_KSWAPD_RECLAIM)
> > diff --git a/include/trace/events/mmflags.h b/include/trace/events/mmflags.h
> > index e87cb2b80ed3..11524cda4a95 100644
> > --- a/include/trace/events/mmflags.h
> > +++ b/include/trace/events/mmflags.h
> > @@ -31,7 +31,6 @@
> >  	gfpflag_string(__GFP_HIGHMEM),		\
> >  	gfpflag_string(GFP_DMA32),		\
> >  	gfpflag_string(__GFP_HIGH),		\
> > -	gfpflag_string(__GFP_ATOMIC),		\
> >  	gfpflag_string(__GFP_IO),		\
> >  	gfpflag_string(__GFP_FS),		\
> >  	gfpflag_string(__GFP_NOWARN),		\
> > diff --git a/lib/test_printf.c b/lib/test_printf.c
> > index 4bd15a593fbd..fe13de1bed5f 100644
> > --- a/lib/test_printf.c
> > +++ b/lib/test_printf.c
> > @@ -686,17 +686,17 @@ flags(void)
> >  	gfp = GFP_ATOMIC|__GFP_DMA;
> >  	test("GFP_ATOMIC|GFP_DMA", "%pGg", &gfp);
> >  
> > -	gfp = __GFP_ATOMIC;
> > -	test("__GFP_ATOMIC", "%pGg", &gfp);
> > +	gfp = __GFP_HIGH;
> > +	test("__GFP_HIGH", "%pGg", &gfp);
> >  
> >  	/* Any flags not translated by the table should remain numeric */
> >  	gfp = ~__GFP_BITS_MASK;
> >  	snprintf(cmp_buffer, BUF_SIZE, "%#lx", (unsigned long) gfp);
> >  	test(cmp_buffer, "%pGg", &gfp);
> >  
> > -	snprintf(cmp_buffer, BUF_SIZE, "__GFP_ATOMIC|%#lx",
> > +	snprintf(cmp_buffer, BUF_SIZE, "__GFP_HIGH|%#lx",
> >  							(unsigned long) gfp);
> > -	gfp |= __GFP_ATOMIC;
> > +	gfp |= __GFP_HIGH;
> >  	test(cmp_buffer, "%pGg", &gfp);
> >  
> >  	kfree(cmp_buffer);
> > diff --git a/mm/internal.h b/mm/internal.h
> > index 98b1e526559d..48926b290cd5 100644
> > --- a/mm/internal.h
> > +++ b/mm/internal.h
> > @@ -24,7 +24,7 @@ struct folio_batch;
> >  #define GFP_RECLAIM_MASK (__GFP_RECLAIM|__GFP_HIGH|__GFP_IO|__GFP_FS|\
> >  			__GFP_NOWARN|__GFP_RETRY_MAYFAIL|__GFP_NOFAIL|\
> >  			__GFP_NORETRY|__GFP_MEMALLOC|__GFP_NOMEMALLOC|\
> > -			__GFP_ATOMIC|__GFP_NOLOCKDEP)
> > +			__GFP_NOLOCKDEP)
> >  
> >  /* The GFP flags allowed during early boot */
> >  #define GFP_BOOT_MASK (__GFP_BITS_MASK & ~(__GFP_RECLAIM|__GFP_IO|__GFP_FS))
> > diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> > index 6bee987ec9a3..ad6c4705a79d 100644
> > --- a/mm/page_alloc.c
> > +++ b/mm/page_alloc.c
> > @@ -4081,13 +4081,14 @@ static inline bool zone_watermark_fast(struct zone *z, unsigned int order,
> >  	if (__zone_watermark_ok(z, order, mark, highest_zoneidx, alloc_flags,
> >  					free_pages))
> >  		return true;
> > +
> >  	/*
> > -	 * Ignore watermark boosting for GFP_ATOMIC order-0 allocations
> > +	 * Ignore watermark boosting for GFP_HIGH order-0 allocations
> >  	 * when checking the min watermark. The min watermark is the
> >  	 * point where boosting is ignored so that kswapd is woken up
> >  	 * when below the low watermark.
> >  	 */
> > -	if (unlikely(!order && (gfp_mask & __GFP_ATOMIC) && z->watermark_boost
> > +	if (unlikely(!order && (alloc_flags & ALLOC_MIN_RESERVE) && z->watermark_boost
> >  		&& ((alloc_flags & ALLOC_WMARK_MASK) == WMARK_MIN))) {
> >  		mark = z->_watermark[WMARK_MIN];
> >  		return __zone_watermark_ok(z, order, mark, highest_zoneidx,
> > @@ -5052,14 +5053,6 @@ __alloc_pages_slowpath(gfp_t gfp_mask, unsigned int order,
> >  	unsigned int zonelist_iter_cookie;
> >  	int reserve_flags;
> >  
> > -	/*
> > -	 * We also sanity check to catch abuse of atomic reserves being used by
> > -	 * callers that are not in atomic context.
> > -	 */
> > -	if (WARN_ON_ONCE((gfp_mask & (__GFP_ATOMIC|__GFP_DIRECT_RECLAIM)) ==
> > -				(__GFP_ATOMIC|__GFP_DIRECT_RECLAIM)))
> > -		gfp_mask &= ~__GFP_ATOMIC;
> > -
> >  restart:
> >  	compaction_retries = 0;
> >  	no_progress_loops = 0;
> > diff --git a/tools/perf/builtin-kmem.c b/tools/perf/builtin-kmem.c
> > index ebfab2ca1702..4a06d83f2ac5 100644
> > --- a/tools/perf/builtin-kmem.c
> > +++ b/tools/perf/builtin-kmem.c
> > @@ -640,7 +640,6 @@ static const struct {
> >  	{ "__GFP_HIGHMEM",		"HM" },
> >  	{ "GFP_DMA32",			"D32" },
> >  	{ "__GFP_HIGH",			"H" },
> > -	{ "__GFP_ATOMIC",		"_A" },
> >  	{ "__GFP_IO",			"I" },
> >  	{ "__GFP_FS",			"F" },
> >  	{ "__GFP_NOWARN",		"NWR" },
> > -- 
> > 2.35.3
> > 
> > 
> 
> -- 
> Sincerely yours,
> Mike.
> 

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH 6/6] mm: discard __GFP_ATOMIC
  2023-01-05 13:49   ` Mike Rapoport
  2023-01-05 21:53     ` NeilBrown
@ 2023-01-06  9:35     ` Mel Gorman
  2023-01-08  9:30       ` Mike Rapoport
  1 sibling, 1 reply; 24+ messages in thread
From: Mel Gorman @ 2023-01-06  9:35 UTC (permalink / raw)
  To: Mike Rapoport
  Cc: Linux-MM, Andrew Morton, Michal Hocko, NeilBrown, Thierry Reding,
	Matthew Wilcox, Vlastimil Babka, LKML

On Thu, Jan 05, 2023 at 03:49:44PM +0200, Mike Rapoport wrote:
> Hi Mel,
> 
> On Tue, Nov 29, 2022 at 03:17:01PM +0000, Mel Gorman wrote:
> > From: NeilBrown <neilb@suse.de>
> > 
> > __GFP_ATOMIC serves little purpose.  Its main effect is to set
> > ALLOC_HARDER which adds a few little boosts to increase the chance of an
> > allocation succeeding, one of which is to lower the water-mark at which it
> > will succeed.
> > 
> > It is *always* paired with __GFP_HIGH which sets ALLOC_HIGH which also
> > adjusts this watermark.  It is probable that other users of __GFP_HIGH
> > should benefit from the other little bonuses that __GFP_ATOMIC gets.
> > 
> > __GFP_ATOMIC also gives a warning if used with __GFP_DIRECT_RECLAIM.
> > There is little point to this.  We already get a might_sleep() warning if
> > __GFP_DIRECT_RECLAIM is set.
> > 
> > __GFP_ATOMIC allows the "watermark_boost" to be side-stepped.  It is
> > probable that testing ALLOC_HARDER is a better fit here.
> > 
> > __GFP_ATOMIC is used by tegra-smmu.c to check if the allocation might
> > sleep.  This should test __GFP_DIRECT_RECLAIM instead.
> > 
> > This patch:
> >  - removes __GFP_ATOMIC
> >  - allows __GFP_HIGH allocations to ignore watermark boosting as well
> >    as GFP_ATOMIC requests.
> >  - makes other adjustments as suggested by the above.
> > 
> > The net result is not change to GFP_ATOMIC allocations.  Other
> > allocations that use __GFP_HIGH will benefit from a few different extra
> > privileges.  This affects:
> >   xen, dm, md, ntfs3
> >   the vermillion frame buffer
> >   hibernation
> >   ksm
> >   swap
> > all of which likely produce more benefit than cost if these selected
> > allocation are more likely to succeed quickly.
> > 
> > [mgorman: Minor adjustments to rework on top of a series]
> > Link: https://lkml.kernel.org/r/163712397076.13692.4727608274002939094@noble.neil.brown.name
> > Signed-off-by: NeilBrown <neilb@suse.de>
> > Signed-off-by: Mel Gorman <mgorman@techsingularity.net>
> > ---
> >  Documentation/mm/balance.rst   |  2 +-
> 
> Documentation/core-api/memory-allocation.rst needs an update as well, and
> there are other mentions of GFP_ATOMIC in Documentation/
> 

What part do you think needs updating in that file?

There are two references to GFP_ATOMIC in that file, one about accessing
reserves and another about non-sleeping allocations and the accuracy
does not change after the series. If anything, this statement should
change because it invites GFP_ATOMIC abuse for spurious reasons

  * If you think that accessing memory reserves is justified and the kernel
    will be stressed unless allocation succeeds, you may use ``GFP_ATOMIC``.

There are other references to GFP_ATOMIC in Documentation/ that are are a
little inaccurate but not in a way that is harmful and is not changed by
the series. For example;

	GFP_ATOMIC requests are kernel internal allocations that must
	be satisfied, immediately.  The kernel may drop some request,
	in rare cases even panic, if a GFP_ATOMIC alloc fails.

This is a stronger statement than GFP_ATOMIC deserves but it's close enough
in the given context.

-- 
Mel Gorman
SUSE Labs

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH 6/6] mm: discard __GFP_ATOMIC
  2023-01-06  9:35     ` Mel Gorman
@ 2023-01-08  9:30       ` Mike Rapoport
  0 siblings, 0 replies; 24+ messages in thread
From: Mike Rapoport @ 2023-01-08  9:30 UTC (permalink / raw)
  To: Mel Gorman
  Cc: Linux-MM, Andrew Morton, Michal Hocko, NeilBrown, Thierry Reding,
	Matthew Wilcox, Vlastimil Babka, LKML

On Fri, Jan 06, 2023 at 09:35:24AM +0000, Mel Gorman wrote:
> On Thu, Jan 05, 2023 at 03:49:44PM +0200, Mike Rapoport wrote:
> > Hi Mel,
> > 
> > On Tue, Nov 29, 2022 at 03:17:01PM +0000, Mel Gorman wrote:
> > > From: NeilBrown <neilb@suse.de>
> > > 
> > > __GFP_ATOMIC serves little purpose.  Its main effect is to set
> > > ALLOC_HARDER which adds a few little boosts to increase the chance of an
> > > allocation succeeding, one of which is to lower the water-mark at which it
> > > will succeed.
> > > 
> > > It is *always* paired with __GFP_HIGH which sets ALLOC_HIGH which also
> > > adjusts this watermark.  It is probable that other users of __GFP_HIGH
> > > should benefit from the other little bonuses that __GFP_ATOMIC gets.
> > > 
> > > __GFP_ATOMIC also gives a warning if used with __GFP_DIRECT_RECLAIM.
> > > There is little point to this.  We already get a might_sleep() warning if
> > > __GFP_DIRECT_RECLAIM is set.
> > > 
> > > __GFP_ATOMIC allows the "watermark_boost" to be side-stepped.  It is
> > > probable that testing ALLOC_HARDER is a better fit here.
> > > 
> > > __GFP_ATOMIC is used by tegra-smmu.c to check if the allocation might
> > > sleep.  This should test __GFP_DIRECT_RECLAIM instead.
> > > 
> > > This patch:
> > >  - removes __GFP_ATOMIC
> > >  - allows __GFP_HIGH allocations to ignore watermark boosting as well
> > >    as GFP_ATOMIC requests.
> > >  - makes other adjustments as suggested by the above.
> > > 
> > > The net result is not change to GFP_ATOMIC allocations.  Other
> > > allocations that use __GFP_HIGH will benefit from a few different extra
> > > privileges.  This affects:
> > >   xen, dm, md, ntfs3
> > >   the vermillion frame buffer
> > >   hibernation
> > >   ksm
> > >   swap
> > > all of which likely produce more benefit than cost if these selected
> > > allocation are more likely to succeed quickly.
> > > 
> > > [mgorman: Minor adjustments to rework on top of a series]
> > > Link: https://lkml.kernel.org/r/163712397076.13692.4727608274002939094@noble.neil.brown.name
> > > Signed-off-by: NeilBrown <neilb@suse.de>
> > > Signed-off-by: Mel Gorman <mgorman@techsingularity.net>
> > > ---
> > >  Documentation/mm/balance.rst   |  2 +-
> > 
> > Documentation/core-api/memory-allocation.rst needs an update as well, and
> > there are other mentions of GFP_ATOMIC in Documentation/
> > 
> 
> What part do you think needs updating in that file?
>
> There are two references to GFP_ATOMIC in that file, one about accessing
> reserves and another about non-sleeping allocations and the accuracy
> does not change after the series.

You are right, I got confused.

> If anything, this statement should change because it invites GFP_ATOMIC
> abuse for spurious reasons
> 
>   * If you think that accessing memory reserves is justified and the kernel
>     will be stressed unless allocation succeeds, you may use ``GFP_ATOMIC``.

Care to send a patch? ;-)
 
> There are other references to GFP_ATOMIC in Documentation/ that are are a
> little inaccurate but not in a way that is harmful and is not changed by
> the series. For example;
> 
> 	GFP_ATOMIC requests are kernel internal allocations that must
> 	be satisfied, immediately.  The kernel may drop some request,
> 	in rare cases even panic, if a GFP_ATOMIC alloc fails.
> 
> This is a stronger statement than GFP_ATOMIC deserves but it's close enough
> in the given context.
> 
> -- 
> Mel Gorman
> SUSE Labs
> 

-- 
Sincerely yours,
Mike.

^ permalink raw reply	[flat|nested] 24+ messages in thread

* [PATCH 6/6] mm: discard __GFP_ATOMIC
  2023-01-13 11:12 [PATCH 0/6 v3] Discard __GFP_ATOMIC Mel Gorman
@ 2023-01-13 11:12 ` Mel Gorman
  0 siblings, 0 replies; 24+ messages in thread
From: Mel Gorman @ 2023-01-13 11:12 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Michal Hocko, NeilBrown, Thierry Reding, Matthew Wilcox,
	Vlastimil Babka, Linux-MM, LKML, Mel Gorman

From: NeilBrown <neilb@suse.de>

__GFP_ATOMIC serves little purpose.  Its main effect is to set
ALLOC_HARDER which adds a few little boosts to increase the chance of an
allocation succeeding, one of which is to lower the water-mark at which it
will succeed.

It is *always* paired with __GFP_HIGH which sets ALLOC_HIGH which also
adjusts this watermark.  It is probable that other users of __GFP_HIGH
should benefit from the other little bonuses that __GFP_ATOMIC gets.

__GFP_ATOMIC also gives a warning if used with __GFP_DIRECT_RECLAIM.
There is little point to this.  We already get a might_sleep() warning if
__GFP_DIRECT_RECLAIM is set.

__GFP_ATOMIC allows the "watermark_boost" to be side-stepped.  It is
probable that testing ALLOC_HARDER is a better fit here.

__GFP_ATOMIC is used by tegra-smmu.c to check if the allocation might
sleep.  This should test __GFP_DIRECT_RECLAIM instead.

This patch:
 - removes __GFP_ATOMIC
 - allows __GFP_HIGH allocations to ignore watermark boosting as well
   as GFP_ATOMIC requests.
 - makes other adjustments as suggested by the above.

The net result is not change to GFP_ATOMIC allocations.  Other
allocations that use __GFP_HIGH will benefit from a few different extra
privileges.  This affects:
  xen, dm, md, ntfs3
  the vermillion frame buffer
  hibernation
  ksm
  swap
all of which likely produce more benefit than cost if these selected
allocation are more likely to succeed quickly.

[mgorman: Minor adjustments to rework on top of a series]
Link: https://lkml.kernel.org/r/163712397076.13692.4727608274002939094@noble.neil.brown.name
Signed-off-by: NeilBrown <neilb@suse.de>
Signed-off-by: Mel Gorman <mgorman@techsingularity.net>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Acked-by: Michal Hocko <mhocko@suse.com>
---
 Documentation/mm/balance.rst   |  2 +-
 drivers/iommu/tegra-smmu.c     |  4 ++--
 include/linux/gfp_types.h      | 12 ++++--------
 include/trace/events/mmflags.h |  1 -
 lib/test_printf.c              |  8 ++++----
 mm/internal.h                  |  2 +-
 mm/page_alloc.c                | 13 +++----------
 tools/perf/builtin-kmem.c      |  1 -
 8 files changed, 15 insertions(+), 28 deletions(-)

diff --git a/Documentation/mm/balance.rst b/Documentation/mm/balance.rst
index 6a1fadf3e173..e38e9d83c1c7 100644
--- a/Documentation/mm/balance.rst
+++ b/Documentation/mm/balance.rst
@@ -6,7 +6,7 @@ Memory Balancing
 
 Started Jan 2000 by Kanoj Sarcar <kanoj@sgi.com>
 
-Memory balancing is needed for !__GFP_ATOMIC and !__GFP_KSWAPD_RECLAIM as
+Memory balancing is needed for !__GFP_HIGH and !__GFP_KSWAPD_RECLAIM as
 well as for non __GFP_IO allocations.
 
 The first reason why a caller may avoid reclaim is that the caller can not
diff --git a/drivers/iommu/tegra-smmu.c b/drivers/iommu/tegra-smmu.c
index 5b1af40221ec..af8d0e685260 100644
--- a/drivers/iommu/tegra-smmu.c
+++ b/drivers/iommu/tegra-smmu.c
@@ -671,12 +671,12 @@ static struct page *as_get_pde_page(struct tegra_smmu_as *as,
 	 * allocate page in a sleeping context if GFP flags permit. Hence
 	 * spinlock needs to be unlocked and re-locked after allocation.
 	 */
-	if (!(gfp & __GFP_ATOMIC))
+	if (gfpflags_allow_blocking(gfp))
 		spin_unlock_irqrestore(&as->lock, *flags);
 
 	page = alloc_page(gfp | __GFP_DMA | __GFP_ZERO);
 
-	if (!(gfp & __GFP_ATOMIC))
+	if (gfpflags_allow_blocking(gfp))
 		spin_lock_irqsave(&as->lock, *flags);
 
 	/*
diff --git a/include/linux/gfp_types.h b/include/linux/gfp_types.h
index d88c46ca82e1..5088637fe5c2 100644
--- a/include/linux/gfp_types.h
+++ b/include/linux/gfp_types.h
@@ -31,7 +31,7 @@ typedef unsigned int __bitwise gfp_t;
 #define ___GFP_IO		0x40u
 #define ___GFP_FS		0x80u
 #define ___GFP_ZERO		0x100u
-#define ___GFP_ATOMIC		0x200u
+/* 0x200u unused */
 #define ___GFP_DIRECT_RECLAIM	0x400u
 #define ___GFP_KSWAPD_RECLAIM	0x800u
 #define ___GFP_WRITE		0x1000u
@@ -116,11 +116,8 @@ typedef unsigned int __bitwise gfp_t;
  *
  * %__GFP_HIGH indicates that the caller is high-priority and that granting
  * the request is necessary before the system can make forward progress.
- * For example, creating an IO context to clean pages.
- *
- * %__GFP_ATOMIC indicates that the caller cannot reclaim or sleep and is
- * high priority. Users are typically interrupt handlers. This may be
- * used in conjunction with %__GFP_HIGH
+ * For example creating an IO context to clean pages and requests
+ * from atomic context.
  *
  * %__GFP_MEMALLOC allows access to all memory. This should only be used when
  * the caller guarantees the allocation will allow more memory to be freed
@@ -135,7 +132,6 @@ typedef unsigned int __bitwise gfp_t;
  * %__GFP_NOMEMALLOC is used to explicitly forbid access to emergency reserves.
  * This takes precedence over the %__GFP_MEMALLOC flag if both are set.
  */
-#define __GFP_ATOMIC	((__force gfp_t)___GFP_ATOMIC)
 #define __GFP_HIGH	((__force gfp_t)___GFP_HIGH)
 #define __GFP_MEMALLOC	((__force gfp_t)___GFP_MEMALLOC)
 #define __GFP_NOMEMALLOC ((__force gfp_t)___GFP_NOMEMALLOC)
@@ -329,7 +325,7 @@ typedef unsigned int __bitwise gfp_t;
  * version does not attempt reclaim/compaction at all and is by default used
  * in page fault path, while the non-light is used by khugepaged.
  */
-#define GFP_ATOMIC	(__GFP_HIGH|__GFP_ATOMIC|__GFP_KSWAPD_RECLAIM)
+#define GFP_ATOMIC	(__GFP_HIGH|__GFP_KSWAPD_RECLAIM)
 #define GFP_KERNEL	(__GFP_RECLAIM | __GFP_IO | __GFP_FS)
 #define GFP_KERNEL_ACCOUNT (GFP_KERNEL | __GFP_ACCOUNT)
 #define GFP_NOWAIT	(__GFP_KSWAPD_RECLAIM)
diff --git a/include/trace/events/mmflags.h b/include/trace/events/mmflags.h
index 412b5a46374c..9db52bc4ce19 100644
--- a/include/trace/events/mmflags.h
+++ b/include/trace/events/mmflags.h
@@ -31,7 +31,6 @@
 	gfpflag_string(__GFP_HIGHMEM),		\
 	gfpflag_string(GFP_DMA32),		\
 	gfpflag_string(__GFP_HIGH),		\
-	gfpflag_string(__GFP_ATOMIC),		\
 	gfpflag_string(__GFP_IO),		\
 	gfpflag_string(__GFP_FS),		\
 	gfpflag_string(__GFP_NOWARN),		\
diff --git a/lib/test_printf.c b/lib/test_printf.c
index d34dc636b81c..46b4e6c414a3 100644
--- a/lib/test_printf.c
+++ b/lib/test_printf.c
@@ -674,17 +674,17 @@ flags(void)
 	gfp = GFP_ATOMIC|__GFP_DMA;
 	test("GFP_ATOMIC|GFP_DMA", "%pGg", &gfp);
 
-	gfp = __GFP_ATOMIC;
-	test("__GFP_ATOMIC", "%pGg", &gfp);
+	gfp = __GFP_HIGH;
+	test("__GFP_HIGH", "%pGg", &gfp);
 
 	/* Any flags not translated by the table should remain numeric */
 	gfp = ~__GFP_BITS_MASK;
 	snprintf(cmp_buffer, BUF_SIZE, "%#lx", (unsigned long) gfp);
 	test(cmp_buffer, "%pGg", &gfp);
 
-	snprintf(cmp_buffer, BUF_SIZE, "__GFP_ATOMIC|%#lx",
+	snprintf(cmp_buffer, BUF_SIZE, "__GFP_HIGH|%#lx",
 							(unsigned long) gfp);
-	gfp |= __GFP_ATOMIC;
+	gfp |= __GFP_HIGH;
 	test(cmp_buffer, "%pGg", &gfp);
 
 	kfree(cmp_buffer);
diff --git a/mm/internal.h b/mm/internal.h
index 23a37588073a..71b1111427f3 100644
--- a/mm/internal.h
+++ b/mm/internal.h
@@ -24,7 +24,7 @@ struct folio_batch;
 #define GFP_RECLAIM_MASK (__GFP_RECLAIM|__GFP_HIGH|__GFP_IO|__GFP_FS|\
 			__GFP_NOWARN|__GFP_RETRY_MAYFAIL|__GFP_NOFAIL|\
 			__GFP_NORETRY|__GFP_MEMALLOC|__GFP_NOMEMALLOC|\
-			__GFP_ATOMIC|__GFP_NOLOCKDEP)
+			__GFP_NOLOCKDEP)
 
 /* The GFP flags allowed during early boot */
 #define GFP_BOOT_MASK (__GFP_BITS_MASK & ~(__GFP_RECLAIM|__GFP_IO|__GFP_FS))
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index b9ae0ba0a2ab..78ffebc4798b 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -4087,13 +4087,14 @@ static inline bool zone_watermark_fast(struct zone *z, unsigned int order,
 	if (__zone_watermark_ok(z, order, mark, highest_zoneidx, alloc_flags,
 					free_pages))
 		return true;
+
 	/*
-	 * Ignore watermark boosting for GFP_ATOMIC order-0 allocations
+	 * Ignore watermark boosting for __GFP_HIGH order-0 allocations
 	 * when checking the min watermark. The min watermark is the
 	 * point where boosting is ignored so that kswapd is woken up
 	 * when below the low watermark.
 	 */
-	if (unlikely(!order && (gfp_mask & __GFP_ATOMIC) && z->watermark_boost
+	if (unlikely(!order && (alloc_flags & ALLOC_MIN_RESERVE) && z->watermark_boost
 		&& ((alloc_flags & ALLOC_WMARK_MASK) == WMARK_MIN))) {
 		mark = z->_watermark[WMARK_MIN];
 		return __zone_watermark_ok(z, order, mark, highest_zoneidx,
@@ -5058,14 +5059,6 @@ __alloc_pages_slowpath(gfp_t gfp_mask, unsigned int order,
 	unsigned int zonelist_iter_cookie;
 	int reserve_flags;
 
-	/*
-	 * We also sanity check to catch abuse of atomic reserves being used by
-	 * callers that are not in atomic context.
-	 */
-	if (WARN_ON_ONCE((gfp_mask & (__GFP_ATOMIC|__GFP_DIRECT_RECLAIM)) ==
-				(__GFP_ATOMIC|__GFP_DIRECT_RECLAIM)))
-		gfp_mask &= ~__GFP_ATOMIC;
-
 restart:
 	compaction_retries = 0;
 	no_progress_loops = 0;
diff --git a/tools/perf/builtin-kmem.c b/tools/perf/builtin-kmem.c
index e20656c431a4..173d407dce92 100644
--- a/tools/perf/builtin-kmem.c
+++ b/tools/perf/builtin-kmem.c
@@ -641,7 +641,6 @@ static const struct {
 	{ "__GFP_HIGHMEM",		"HM" },
 	{ "GFP_DMA32",			"D32" },
 	{ "__GFP_HIGH",			"H" },
-	{ "__GFP_ATOMIC",		"_A" },
 	{ "__GFP_IO",			"I" },
 	{ "__GFP_FS",			"F" },
 	{ "__GFP_NOWARN",		"NWR" },
-- 
2.35.3


^ permalink raw reply related	[flat|nested] 24+ messages in thread

end of thread, other threads:[~2023-01-13 11:20 UTC | newest]

Thread overview: 24+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2022-11-29 15:16 [RFC PATCH 0/6] Discard __GFP_ATOMIC Mel Gorman
2022-11-29 15:16 ` [PATCH 1/6] mm/page_alloc: Rename ALLOC_HIGH to ALLOC_MIN_RESERVE Mel Gorman
2022-12-08 16:12   ` Vlastimil Babka
2022-11-29 15:16 ` [PATCH 2/6] mm/page_alloc: Treat RT tasks similar to GFP_HIGH Mel Gorman
2022-12-08 16:16   ` Vlastimil Babka
2022-11-29 15:16 ` [PATCH 3/6] mm/page_alloc: Explicitly record high-order atomic allocations in alloc_flags Mel Gorman
2022-12-05  5:17   ` NeilBrown
2022-12-05 10:27     ` Mel Gorman
2022-12-08 16:51   ` Vlastimil Babka
2023-01-04 11:45     ` Mel Gorman
2022-11-29 15:16 ` [PATCH 4/6] mm/page_alloc: Explicitly define what alloc flags deplete min reserves Mel Gorman
2022-12-08 17:55   ` Vlastimil Babka
2023-01-04 12:02     ` Mel Gorman
2022-11-29 15:17 ` [PATCH 5/6] mm/page_alloc: Give GFP_ATOMIC and non-blocking allocations access to reserves Mel Gorman
2022-12-08 18:07   ` Vlastimil Babka
2023-01-04 12:03     ` Mel Gorman
2022-11-29 15:17 ` [PATCH 6/6] mm: discard __GFP_ATOMIC Mel Gorman
2022-12-08 18:17   ` Vlastimil Babka
2023-01-04 12:04     ` Mel Gorman
2023-01-05 13:49   ` Mike Rapoport
2023-01-05 21:53     ` NeilBrown
2023-01-06  9:35     ` Mel Gorman
2023-01-08  9:30       ` Mike Rapoport
2023-01-13 11:12 [PATCH 0/6 v3] Discard __GFP_ATOMIC Mel Gorman
2023-01-13 11:12 ` [PATCH 6/6] mm: discard __GFP_ATOMIC Mel Gorman

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).