* [PATCH -next 00/12] mm: page_alloc: misc cleanup and refactor
@ 2023-05-08  7:11 Kefeng Wang
  2023-05-08  7:11 ` [PATCH 01/12] mm: page_alloc: move mirrored_kernelcore into mm_init.c Kefeng Wang
                   ` (11 more replies)
  0 siblings, 12 replies; 22+ messages in thread
From: Kefeng Wang @ 2023-05-08  7:11 UTC
  To: Andrew Morton, Mike Rapoport, linux-mm
  Cc: David Hildenbrand, Oscar Salvador, Rafael J. Wysocki,
	Pavel Machek, Len Brown, Luis Chamberlain, Kees Cook,
	Iurii Zaikin, linux-kernel, linux-pm, linux-fsdevel, Kefeng Wang

This series aims to reduce the size of page_alloc.c a bit more and do
some cleanup; no functional changes intended.

This is based on next-20230508.

Kefeng Wang (12):
  mm: page_alloc: move mirrored_kernelcore into mm_init.c
  mm: page_alloc: move init_on_alloc/free() into mm_init.c
  mm: page_alloc: move set_zone_contiguous() into mm_init.c
  mm: page_alloc: collect mem statistics into show_mem.c
  mm: page_alloc: squash page_is_consistent()
  mm: page_alloc: remove alloc_contig_dump_pages() stub
  mm: page_alloc: split out FAIL_PAGE_ALLOC
  mm: page_alloc: split out DEBUG_PAGEALLOC
  mm: page_alloc: move mark_free_page() into snapshot.c
  mm: page_alloc: move pm_* functions into power
  mm: vmscan: use gfp_has_io_fs()
  mm: page_alloc: move sysctls into its own file

 include/linux/fault-inject.h   |   9 +
 include/linux/gfp.h            |  15 +-
 include/linux/memory_hotplug.h |   3 -
 include/linux/mm.h             |  87 ++--
 include/linux/mmzone.h         |  21 -
 include/linux/suspend.h        |   9 +-
 kernel/power/main.c            |  27 ++
 kernel/power/power.h           |   5 +
 kernel/power/snapshot.c        |  52 ++
 kernel/sysctl.c                |  67 ---
 lib/Makefile                   |   2 +-
 lib/show_mem.c                 |  37 --
 mm/Makefile                    |   4 +-
 mm/debug_page_alloc.c          |  59 +++
 mm/fail_page_alloc.c           |  66 +++
 mm/internal.h                  |  16 +
 mm/mm_init.c                   |  84 ++++
 mm/page_alloc.c                | 844 ++++-----------------------------
 mm/show_mem.c                  | 429 +++++++++++++++++
 mm/swapfile.c                  |   1 +
 mm/vmscan.c                    |   2 +-
 21 files changed, 902 insertions(+), 937 deletions(-)
 delete mode 100644 lib/show_mem.c
 create mode 100644 mm/debug_page_alloc.c
 create mode 100644 mm/fail_page_alloc.c
 create mode 100644 mm/show_mem.c

-- 
2.35.3


* [PATCH 01/12] mm: page_alloc: move mirrored_kernelcore into mm_init.c
  2023-05-08  7:11 [PATCH -next 00/12] mm: page_alloc: misc cleanup and refactor Kefeng Wang
@ 2023-05-08  7:11 ` Kefeng Wang
  2023-05-09 16:38   ` Mike Rapoport
  2023-05-08  7:11 ` [PATCH 02/12] mm: page_alloc: move init_on_alloc/free() " Kefeng Wang
                   ` (10 subsequent siblings)
  11 siblings, 1 reply; 22+ messages in thread
From: Kefeng Wang @ 2023-05-08  7:11 UTC
  To: Andrew Morton, Mike Rapoport, linux-mm
  Cc: David Hildenbrand, Oscar Salvador, Rafael J. Wysocki,
	Pavel Machek, Len Brown, Luis Chamberlain, Kees Cook,
	Iurii Zaikin, linux-kernel, linux-pm, linux-fsdevel, Kefeng Wang

Since commit 9420f89db2dd ("mm: move most of core MM initialization
to mm/mm_init.c"), mirrored_kernelcore should be moved into mm_init.c,
as most of the related code is already there.
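
For context, the kernelcore= parameter handling that sets this flag is
already in mm_init.c; the existing parser looks roughly like this (a
sketch for illustration only, not part of this patch):

	static int __init cmdline_parse_kernelcore(char *p)
	{
		/* "kernelcore=mirror" enables the mirrored-memory policy */
		if (parse_option_str(p, "mirror")) {
			mirrored_kernelcore = true;
			return 0;
		}

		return cmdline_parse_core(p, &required_kernelcore,
					  &required_kernelcore_percent);
	}

Moving the variable next to this parser keeps the mirrored-memory setup
in one place.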

Signed-off-by: Kefeng Wang <wangkefeng.wang@huawei.com>
---
 mm/mm_init.c    | 2 ++
 mm/page_alloc.c | 3 ---
 2 files changed, 2 insertions(+), 3 deletions(-)

diff --git a/mm/mm_init.c b/mm/mm_init.c
index 7f7f9c677854..da162b7a044c 100644
--- a/mm/mm_init.c
+++ b/mm/mm_init.c
@@ -259,6 +259,8 @@ static int __init cmdline_parse_core(char *p, unsigned long *core,
 	return 0;
 }
 
+bool mirrored_kernelcore __initdata_memblock;
+
 /*
  * kernelcore=size sets the amount of memory for use for allocations that
  * cannot be reclaimed or migrated.
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index af9c995d3c1e..d1086aeca8f2 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -23,7 +23,6 @@
 #include <linux/interrupt.h>
 #include <linux/pagemap.h>
 #include <linux/jiffies.h>
-#include <linux/memblock.h>
 #include <linux/compiler.h>
 #include <linux/kernel.h>
 #include <linux/kasan.h>
@@ -374,8 +373,6 @@ int user_min_free_kbytes = -1;
 int watermark_boost_factor __read_mostly = 15000;
 int watermark_scale_factor = 10;
 
-bool mirrored_kernelcore __initdata_memblock;
-
 /* movable_zone is the "real" zone pages in ZONE_MOVABLE are taken from */
 int movable_zone;
 EXPORT_SYMBOL(movable_zone);
-- 
2.35.3


* [PATCH 02/12] mm: page_alloc: move init_on_alloc/free() into mm_init.c
  2023-05-08  7:11 [PATCH -next 00/12] mm: page_alloc: misc cleanup and refactor Kefeng Wang
  2023-05-08  7:11 ` [PATCH 01/12] mm: page_alloc: move mirrored_kernelcore into mm_init.c Kefeng Wang
@ 2023-05-08  7:11 ` Kefeng Wang
  2023-05-09 16:38   ` Mike Rapoport
  2023-05-08  7:11 ` [PATCH 03/12] mm: page_alloc: move set_zone_contiguous() " Kefeng Wang
                   ` (9 subsequent siblings)
  11 siblings, 1 reply; 22+ messages in thread
From: Kefeng Wang @ 2023-05-08  7:11 UTC
  To: Andrew Morton, Mike Rapoport, linux-mm
  Cc: David Hildenbrand, Oscar Salvador, Rafael J. Wysocki,
	Pavel Machek, Len Brown, Luis Chamberlain, Kees Cook,
	Iurii Zaikin, linux-kernel, linux-pm, linux-fsdevel, Kefeng Wang

Since commit f2fc4b44ec2b ("mm: move init_mem_debugging_and_hardening()
to mm/mm_init.c"), the init_on_alloc and init_on_free definitions are
better moved there too.
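
For context, these static keys are consumed on the allocator fast paths
through helpers such as want_init_on_alloc(); roughly (a sketch,
assuming the include/linux/mm.h helper stays unchanged):

	static inline bool want_init_on_alloc(gfp_t flags)
	{
		if (static_branch_maybe(CONFIG_INIT_ON_ALLOC_DEFAULT_ON,
					&init_on_alloc))
			return true;
		return flags & __GFP_ZERO;
	}

Only the key definitions move; the declarations that such helpers rely
on remain in mm.h.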

Signed-off-by: Kefeng Wang <wangkefeng.wang@huawei.com>
---
 mm/mm_init.c    | 6 ++++++
 mm/page_alloc.c | 5 -----
 2 files changed, 6 insertions(+), 5 deletions(-)

diff --git a/mm/mm_init.c b/mm/mm_init.c
index da162b7a044c..15201887f8e0 100644
--- a/mm/mm_init.c
+++ b/mm/mm_init.c
@@ -2543,6 +2543,12 @@ void __init memblock_free_pages(struct page *page, unsigned long pfn,
 	__free_pages_core(page, order);
 }
 
+DEFINE_STATIC_KEY_MAYBE(CONFIG_INIT_ON_ALLOC_DEFAULT_ON, init_on_alloc);
+EXPORT_SYMBOL(init_on_alloc);
+
+DEFINE_STATIC_KEY_MAYBE(CONFIG_INIT_ON_FREE_DEFAULT_ON, init_on_free);
+EXPORT_SYMBOL(init_on_free);
+
 static bool _init_on_alloc_enabled_early __read_mostly
 				= IS_ENABLED(CONFIG_INIT_ON_ALLOC_DEFAULT_ON);
 static int __init early_init_on_alloc(char *buf)
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index d1086aeca8f2..4f094ba7c8fb 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -233,11 +233,6 @@ unsigned long totalcma_pages __read_mostly;
 
 int percpu_pagelist_high_fraction;
 gfp_t gfp_allowed_mask __read_mostly = GFP_BOOT_MASK;
-DEFINE_STATIC_KEY_MAYBE(CONFIG_INIT_ON_ALLOC_DEFAULT_ON, init_on_alloc);
-EXPORT_SYMBOL(init_on_alloc);
-
-DEFINE_STATIC_KEY_MAYBE(CONFIG_INIT_ON_FREE_DEFAULT_ON, init_on_free);
-EXPORT_SYMBOL(init_on_free);
 
 /*
  * A cached value of the page's pageblock's migratetype, used when the page is
-- 
2.35.3


* [PATCH 03/12] mm: page_alloc: move set_zone_contiguous() into mm_init.c
  2023-05-08  7:11 [PATCH -next 00/12] mm: page_alloc: misc cleanup and refactor Kefeng Wang
  2023-05-08  7:11 ` [PATCH 01/12] mm: page_alloc: move mirrored_kernelcore into mm_init.c Kefeng Wang
  2023-05-08  7:11 ` [PATCH 02/12] mm: page_alloc: move init_on_alloc/free() " Kefeng Wang
@ 2023-05-08  7:11 ` Kefeng Wang
  2023-05-08  7:12   ` Huang, Ying
  2023-05-10  8:01   ` [PATCH v2 " Kefeng Wang
  2023-05-08  7:11 ` [PATCH 04/12] mm: page_alloc: collect mem statistics into show_mem.c Kefeng Wang
                   ` (8 subsequent siblings)
  11 siblings, 2 replies; 22+ messages in thread
From: Kefeng Wang @ 2023-05-08  7:11 UTC
  To: Andrew Morton, Mike Rapoport, linux-mm
  Cc: David Hildenbrand, Oscar Salvador, Rafael J. Wysocki,
	Pavel Machek, Len Brown, Luis Chamberlain, Kees Cook,
	Iurii Zaikin, linux-kernel, linux-pm, linux-fsdevel, Kefeng Wang

set_zone_contiguous() is only used in mm init/hotplug, and
clear_zone_contiguous() is only used in hotplug; move them from
page_alloc.c to more appropriate places (mm_init.c and mm/internal.h).
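
For context, memory hotplug brackets zone resizing with this pair,
invalidating the cached flag before a zone's pfn range changes and
recomputing it afterwards; the mm/memory_hotplug.c pattern is roughly
(a sketch, not part of this patch):

	void __ref move_pfn_range_to_zone(struct zone *zone, ...)
	{
		clear_zone_contiguous(zone);	/* range is about to change */

		/* ... resize zone/node spans, initialize the memmap ... */

		set_zone_contiguous(zone);	/* re-scan for holes */
	}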

Signed-off-by: Kefeng Wang <wangkefeng.wang@huawei.com>
---
 include/linux/memory_hotplug.h |  3 --
 mm/internal.h                  |  7 +++
 mm/mm_init.c                   | 74 +++++++++++++++++++++++++++++++
 mm/page_alloc.c                | 79 ----------------------------------
 4 files changed, 81 insertions(+), 82 deletions(-)

diff --git a/include/linux/memory_hotplug.h b/include/linux/memory_hotplug.h
index 9fcbf5706595..04bc286eed42 100644
--- a/include/linux/memory_hotplug.h
+++ b/include/linux/memory_hotplug.h
@@ -326,9 +326,6 @@ static inline int remove_memory(u64 start, u64 size)
 static inline void __remove_memory(u64 start, u64 size) {}
 #endif /* CONFIG_MEMORY_HOTREMOVE */
 
-extern void set_zone_contiguous(struct zone *zone);
-extern void clear_zone_contiguous(struct zone *zone);
-
 #ifdef CONFIG_MEMORY_HOTPLUG
 extern void __ref free_area_init_core_hotplug(struct pglist_data *pgdat);
 extern int __add_memory(int nid, u64 start, u64 size, mhp_t mhp_flags);
diff --git a/mm/internal.h b/mm/internal.h
index e28442c0858a..9482862b28cc 100644
--- a/mm/internal.h
+++ b/mm/internal.h
@@ -371,6 +371,13 @@ static inline struct page *pageblock_pfn_to_page(unsigned long start_pfn,
 	return __pageblock_pfn_to_page(start_pfn, end_pfn, zone);
 }
 
+void set_zone_contiguous(struct zone *zone);
+
+static inline void clear_zone_contiguous(struct zone *zone)
+{
+	zone->contiguous = false;
+}
+
 extern int __isolate_free_page(struct page *page, unsigned int order);
 extern void __putback_isolated_page(struct page *page, unsigned int order,
 				    int mt);
diff --git a/mm/mm_init.c b/mm/mm_init.c
index 15201887f8e0..1f30b9e16577 100644
--- a/mm/mm_init.c
+++ b/mm/mm_init.c
@@ -2330,6 +2330,80 @@ void __init init_cma_reserved_pageblock(struct page *page)
 }
 #endif
 
+/*
+ * Check that the whole (or subset of) a pageblock given by the interval of
+ * [start_pfn, end_pfn) is valid and within the same zone, before scanning it
+ * with the migration or free compaction scanner.
+ *
+ * Return struct page pointer of start_pfn, or NULL if checks were not passed.
+ *
+ * It's possible on some configurations to have a setup like node0 node1 node0
+ * i.e. it's possible that all pages within a zone's range of pages do not
+ * belong to a single zone. We assume that a border between node0 and node1
+ * can occur within a single pageblock, but not a node0 node1 node0
+ * interleaving within a single pageblock. It is therefore sufficient to check
+ * the first and last page of a pageblock and avoid checking each individual
+ * page in a pageblock.
+ *
+ * Note: the function may return non-NULL struct page even for a page block
+ * which contains a memory hole (i.e. there is no physical memory for a subset
+ * of the pfn range). For example, if the pageblock order is MAX_ORDER, which
+ * will fall into 2 sub-sections, and the end pfn of the pageblock may be hole
+ * even though the start pfn is online and valid. This should be safe most of
+ * the time because struct pages are still initialized via init_unavailable_range()
+ * and pfn walkers shouldn't touch any physical memory range for which they do
+ * not recognize any specific metadata in struct pages.
+ */
+struct page *__pageblock_pfn_to_page(unsigned long start_pfn,
+				     unsigned long end_pfn, struct zone *zone)
+{
+	struct page *start_page;
+	struct page *end_page;
+
+	/* end_pfn is one past the range we are checking */
+	end_pfn--;
+
+	if (!pfn_valid(end_pfn))
+		return NULL;
+
+	start_page = pfn_to_online_page(start_pfn);
+	if (!start_page)
+		return NULL;
+
+	if (page_zone(start_page) != zone)
+		return NULL;
+
+	end_page = pfn_to_page(end_pfn);
+
+	/* This gives a shorter code than deriving page_zone(end_page) */
+	if (page_zone_id(start_page) != page_zone_id(end_page))
+		return NULL;
+
+	return start_page;
+}
+
+void set_zone_contiguous(struct zone *zone)
+{
+	unsigned long block_start_pfn = zone->zone_start_pfn;
+	unsigned long block_end_pfn;
+
+	block_end_pfn = pageblock_end_pfn(block_start_pfn);
+	for (; block_start_pfn < zone_end_pfn(zone);
+			block_start_pfn = block_end_pfn,
+			 block_end_pfn += pageblock_nr_pages) {
+
+		block_end_pfn = min(block_end_pfn, zone_end_pfn(zone));
+
+		if (!__pageblock_pfn_to_page(block_start_pfn,
+					     block_end_pfn, zone))
+			return;
+		cond_resched();
+	}
+
+	/* We confirm that there is no hole */
+	zone->contiguous = true;
+}
+
 void __init page_alloc_init_late(void)
 {
 	struct zone *zone;
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 4f094ba7c8fb..fe7c1ee5becd 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -1480,85 +1480,6 @@ void __free_pages_core(struct page *page, unsigned int order)
 	__free_pages_ok(page, order, FPI_TO_TAIL);
 }
 
-/*
- * Check that the whole (or subset of) a pageblock given by the interval of
- * [start_pfn, end_pfn) is valid and within the same zone, before scanning it
- * with the migration or free compaction scanner.
- *
- * Return struct page pointer of start_pfn, or NULL if checks were not passed.
- *
- * It's possible on some configurations to have a setup like node0 node1 node0
- * i.e. it's possible that all pages within a zone's range of pages do not
- * belong to a single zone. We assume that a border between node0 and node1
- * can occur within a single pageblock, but not a node0 node1 node0
- * interleaving within a single pageblock. It is therefore sufficient to check
- * the first and last page of a pageblock and avoid checking each individual
- * page in a pageblock.
- *
- * Note: the function may return non-NULL struct page even for a page block
- * which contains a memory hole (i.e. there is no physical memory for a subset
- * of the pfn range). For example, if the pageblock order is MAX_ORDER, which
- * will fall into 2 sub-sections, and the end pfn of the pageblock may be hole
- * even though the start pfn is online and valid. This should be safe most of
- * the time because struct pages are still initialized via init_unavailable_range()
- * and pfn walkers shouldn't touch any physical memory range for which they do
- * not recognize any specific metadata in struct pages.
- */
-struct page *__pageblock_pfn_to_page(unsigned long start_pfn,
-				     unsigned long end_pfn, struct zone *zone)
-{
-	struct page *start_page;
-	struct page *end_page;
-
-	/* end_pfn is one past the range we are checking */
-	end_pfn--;
-
-	if (!pfn_valid(end_pfn))
-		return NULL;
-
-	start_page = pfn_to_online_page(start_pfn);
-	if (!start_page)
-		return NULL;
-
-	if (page_zone(start_page) != zone)
-		return NULL;
-
-	end_page = pfn_to_page(end_pfn);
-
-	/* This gives a shorter code than deriving page_zone(end_page) */
-	if (page_zone_id(start_page) != page_zone_id(end_page))
-		return NULL;
-
-	return start_page;
-}
-
-void set_zone_contiguous(struct zone *zone)
-{
-	unsigned long block_start_pfn = zone->zone_start_pfn;
-	unsigned long block_end_pfn;
-
-	block_end_pfn = pageblock_end_pfn(block_start_pfn);
-	for (; block_start_pfn < zone_end_pfn(zone);
-			block_start_pfn = block_end_pfn,
-			 block_end_pfn += pageblock_nr_pages) {
-
-		block_end_pfn = min(block_end_pfn, zone_end_pfn(zone));
-
-		if (!__pageblock_pfn_to_page(block_start_pfn,
-					     block_end_pfn, zone))
-			return;
-		cond_resched();
-	}
-
-	/* We confirm that there is no hole */
-	zone->contiguous = true;
-}
-
-void clear_zone_contiguous(struct zone *zone)
-{
-	zone->contiguous = false;
-}
-
 /*
  * The order of subdivision here is critical for the IO subsystem.
  * Please do not alter this order without good reasons and regression
-- 
2.35.3


* [PATCH 04/12] mm: page_alloc: collect mem statistics into show_mem.c
  2023-05-08  7:11 [PATCH -next 00/12] mm: page_alloc: misc cleanup and refactor Kefeng Wang
                   ` (2 preceding siblings ...)
  2023-05-08  7:11 ` [PATCH 03/12] mm: page_alloc: move set_zone_contiguous() " Kefeng Wang
@ 2023-05-08  7:11 ` Kefeng Wang
  2023-05-11  0:04   ` kernel test robot
  2023-05-08  7:11 ` [PATCH 05/12] mm: page_alloc: squash page_is_consistent() Kefeng Wang
                   ` (7 subsequent siblings)
  11 siblings, 1 reply; 22+ messages in thread
From: Kefeng Wang @ 2023-05-08  7:11 UTC
  To: Andrew Morton, Mike Rapoport, linux-mm
  Cc: David Hildenbrand, Oscar Salvador, Rafael J. Wysocki,
	Pavel Machek, Len Brown, Luis Chamberlain, Kees Cook,
	Iurii Zaikin, linux-kernel, linux-pm, linux-fsdevel, Kefeng Wang

Let's move show_mem.c from lib to mm, as it belongs to the memory
subsystem; also split some memory-statistics-related functions out of
page_alloc.c into show_mem.c, and clean up some unneeded includes.

There is no functional change.
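
For reference, the entry points are unchanged: callers such as the OOM
killer and SysRq-m still reach this code through the show_mem() wrapper,
roughly (a sketch, assuming the include/linux/mm.h wrapper stays as-is):

	static inline void show_mem(unsigned int flags, nodemask_t *nodemask)
	{
		__show_mem(flags, nodemask, MAX_NR_ZONES - 1);
	}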

Signed-off-by: Kefeng Wang <wangkefeng.wang@huawei.com>
---
 lib/Makefile    |   2 +-
 lib/show_mem.c  |  37 -----
 mm/Makefile     |   2 +-
 mm/page_alloc.c | 402 ---------------------------------------------
 mm/show_mem.c   | 429 ++++++++++++++++++++++++++++++++++++++++++++++++
 5 files changed, 431 insertions(+), 441 deletions(-)
 delete mode 100644 lib/show_mem.c
 create mode 100644 mm/show_mem.c

diff --git a/lib/Makefile b/lib/Makefile
index 876fcdeae34e..38f23f352736 100644
--- a/lib/Makefile
+++ b/lib/Makefile
@@ -30,7 +30,7 @@ endif
 lib-y := ctype.o string.o vsprintf.o cmdline.o \
 	 rbtree.o radix-tree.o timerqueue.o xarray.o \
 	 maple_tree.o idr.o extable.o irq_regs.o argv_split.o \
-	 flex_proportions.o ratelimit.o show_mem.o \
+	 flex_proportions.o ratelimit.o \
 	 is_single_threaded.o plist.o decompress.o kobject_uevent.o \
 	 earlycpio.o seq_buf.o siphash.o dec_and_lock.o \
 	 nmi_backtrace.o win_minmax.o memcat_p.o \
diff --git a/lib/show_mem.c b/lib/show_mem.c
deleted file mode 100644
index 1485c87be935..000000000000
--- a/lib/show_mem.c
+++ /dev/null
@@ -1,37 +0,0 @@
-// SPDX-License-Identifier: GPL-2.0-only
-/*
- * Generic show_mem() implementation
- *
- * Copyright (C) 2008 Johannes Weiner <hannes@saeurebad.de>
- */
-
-#include <linux/mm.h>
-#include <linux/cma.h>
-
-void __show_mem(unsigned int filter, nodemask_t *nodemask, int max_zone_idx)
-{
-	unsigned long total = 0, reserved = 0, highmem = 0;
-	struct zone *zone;
-
-	printk("Mem-Info:\n");
-	__show_free_areas(filter, nodemask, max_zone_idx);
-
-	for_each_populated_zone(zone) {
-
-		total += zone->present_pages;
-		reserved += zone->present_pages - zone_managed_pages(zone);
-
-		if (is_highmem(zone))
-			highmem += zone->present_pages;
-	}
-
-	printk("%lu pages RAM\n", total);
-	printk("%lu pages HighMem/MovableOnly\n", highmem);
-	printk("%lu pages reserved\n", reserved);
-#ifdef CONFIG_CMA
-	printk("%lu pages cma reserved\n", totalcma_pages);
-#endif
-#ifdef CONFIG_MEMORY_FAILURE
-	printk("%lu pages hwpoisoned\n", atomic_long_read(&num_poisoned_pages));
-#endif
-}
diff --git a/mm/Makefile b/mm/Makefile
index e29afc890cde..5262ce5baa28 100644
--- a/mm/Makefile
+++ b/mm/Makefile
@@ -51,7 +51,7 @@ obj-y			:= filemap.o mempool.o oom_kill.o fadvise.o \
 			   readahead.o swap.o truncate.o vmscan.o shmem.o \
 			   util.o mmzone.o vmstat.o backing-dev.o \
 			   mm_init.o percpu.o slab_common.o \
-			   compaction.o \
+			   compaction.o show_mem.o \
 			   interval_tree.o list_lru.o workingset.o \
 			   debug.o gup.o mmap_lock.o $(mmu-y)
 
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index fe7c1ee5becd..9a85238f1140 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -18,10 +18,7 @@
 #include <linux/stddef.h>
 #include <linux/mm.h>
 #include <linux/highmem.h>
-#include <linux/swap.h>
-#include <linux/swapops.h>
 #include <linux/interrupt.h>
-#include <linux/pagemap.h>
 #include <linux/jiffies.h>
 #include <linux/compiler.h>
 #include <linux/kernel.h>
@@ -30,8 +27,6 @@
 #include <linux/module.h>
 #include <linux/suspend.h>
 #include <linux/pagevec.h>
-#include <linux/blkdev.h>
-#include <linux/slab.h>
 #include <linux/ratelimit.h>
 #include <linux/oom.h>
 #include <linux/topology.h>
@@ -40,19 +35,10 @@
 #include <linux/cpuset.h>
 #include <linux/memory_hotplug.h>
 #include <linux/nodemask.h>
-#include <linux/vmalloc.h>
 #include <linux/vmstat.h>
-#include <linux/mempolicy.h>
-#include <linux/memremap.h>
-#include <linux/stop_machine.h>
-#include <linux/random.h>
 #include <linux/sort.h>
 #include <linux/pfn.h>
-#include <linux/backing-dev.h>
 #include <linux/fault-inject.h>
-#include <linux/page-isolation.h>
-#include <linux/debugobjects.h>
-#include <linux/kmemleak.h>
 #include <linux/compaction.h>
 #include <trace/events/kmem.h>
 #include <trace/events/oom.h>
@@ -60,12 +46,9 @@
 #include <linux/mm_inline.h>
 #include <linux/mmu_notifier.h>
 #include <linux/migrate.h>
-#include <linux/hugetlb.h>
-#include <linux/sched/rt.h>
 #include <linux/sched/mm.h>
 #include <linux/page_owner.h>
 #include <linux/page_table_check.h>
-#include <linux/kthread.h>
 #include <linux/memcontrol.h>
 #include <linux/ftrace.h>
 #include <linux/lockdep.h>
@@ -73,13 +56,10 @@
 #include <linux/psi.h>
 #include <linux/khugepaged.h>
 #include <linux/delayacct.h>
-#include <asm/sections.h>
-#include <asm/tlbflush.h>
 #include <asm/div64.h>
 #include "internal.h"
 #include "shuffle.h"
 #include "page_reporting.h"
-#include "swap.h"
 
 /* Free Page Internal flags: for internal, non-pcp variants of free_pages(). */
 typedef int __bitwise fpi_t;
@@ -226,11 +206,6 @@ nodemask_t node_states[NR_NODE_STATES] __read_mostly = {
 };
 EXPORT_SYMBOL(node_states);
 
-atomic_long_t _totalram_pages __read_mostly;
-EXPORT_SYMBOL(_totalram_pages);
-unsigned long totalreserve_pages __read_mostly;
-unsigned long totalcma_pages __read_mostly;
-
 int percpu_pagelist_high_fraction;
 gfp_t gfp_allowed_mask __read_mostly = GFP_BOOT_MASK;
 
@@ -5050,383 +5025,6 @@ unsigned long nr_free_buffer_pages(void)
 }
 EXPORT_SYMBOL_GPL(nr_free_buffer_pages);
 
-static inline void show_node(struct zone *zone)
-{
-	if (IS_ENABLED(CONFIG_NUMA))
-		printk("Node %d ", zone_to_nid(zone));
-}
-
-long si_mem_available(void)
-{
-	long available;
-	unsigned long pagecache;
-	unsigned long wmark_low = 0;
-	unsigned long pages[NR_LRU_LISTS];
-	unsigned long reclaimable;
-	struct zone *zone;
-	int lru;
-
-	for (lru = LRU_BASE; lru < NR_LRU_LISTS; lru++)
-		pages[lru] = global_node_page_state(NR_LRU_BASE + lru);
-
-	for_each_zone(zone)
-		wmark_low += low_wmark_pages(zone);
-
-	/*
-	 * Estimate the amount of memory available for userspace allocations,
-	 * without causing swapping or OOM.
-	 */
-	available = global_zone_page_state(NR_FREE_PAGES) - totalreserve_pages;
-
-	/*
-	 * Not all the page cache can be freed, otherwise the system will
-	 * start swapping or thrashing. Assume at least half of the page
-	 * cache, or the low watermark worth of cache, needs to stay.
-	 */
-	pagecache = pages[LRU_ACTIVE_FILE] + pages[LRU_INACTIVE_FILE];
-	pagecache -= min(pagecache / 2, wmark_low);
-	available += pagecache;
-
-	/*
-	 * Part of the reclaimable slab and other kernel memory consists of
-	 * items that are in use, and cannot be freed. Cap this estimate at the
-	 * low watermark.
-	 */
-	reclaimable = global_node_page_state_pages(NR_SLAB_RECLAIMABLE_B) +
-		global_node_page_state(NR_KERNEL_MISC_RECLAIMABLE);
-	available += reclaimable - min(reclaimable / 2, wmark_low);
-
-	if (available < 0)
-		available = 0;
-	return available;
-}
-EXPORT_SYMBOL_GPL(si_mem_available);
-
-void si_meminfo(struct sysinfo *val)
-{
-	val->totalram = totalram_pages();
-	val->sharedram = global_node_page_state(NR_SHMEM);
-	val->freeram = global_zone_page_state(NR_FREE_PAGES);
-	val->bufferram = nr_blockdev_pages();
-	val->totalhigh = totalhigh_pages();
-	val->freehigh = nr_free_highpages();
-	val->mem_unit = PAGE_SIZE;
-}
-
-EXPORT_SYMBOL(si_meminfo);
-
-#ifdef CONFIG_NUMA
-void si_meminfo_node(struct sysinfo *val, int nid)
-{
-	int zone_type;		/* needs to be signed */
-	unsigned long managed_pages = 0;
-	unsigned long managed_highpages = 0;
-	unsigned long free_highpages = 0;
-	pg_data_t *pgdat = NODE_DATA(nid);
-
-	for (zone_type = 0; zone_type < MAX_NR_ZONES; zone_type++)
-		managed_pages += zone_managed_pages(&pgdat->node_zones[zone_type]);
-	val->totalram = managed_pages;
-	val->sharedram = node_page_state(pgdat, NR_SHMEM);
-	val->freeram = sum_zone_node_page_state(nid, NR_FREE_PAGES);
-#ifdef CONFIG_HIGHMEM
-	for (zone_type = 0; zone_type < MAX_NR_ZONES; zone_type++) {
-		struct zone *zone = &pgdat->node_zones[zone_type];
-
-		if (is_highmem(zone)) {
-			managed_highpages += zone_managed_pages(zone);
-			free_highpages += zone_page_state(zone, NR_FREE_PAGES);
-		}
-	}
-	val->totalhigh = managed_highpages;
-	val->freehigh = free_highpages;
-#else
-	val->totalhigh = managed_highpages;
-	val->freehigh = free_highpages;
-#endif
-	val->mem_unit = PAGE_SIZE;
-}
-#endif
-
-/*
- * Determine whether the node should be displayed or not, depending on whether
- * SHOW_MEM_FILTER_NODES was passed to show_free_areas().
- */
-static bool show_mem_node_skip(unsigned int flags, int nid, nodemask_t *nodemask)
-{
-	if (!(flags & SHOW_MEM_FILTER_NODES))
-		return false;
-
-	/*
-	 * no node mask - aka implicit memory numa policy. Do not bother with
-	 * the synchronization - read_mems_allowed_begin - because we do not
-	 * have to be precise here.
-	 */
-	if (!nodemask)
-		nodemask = &cpuset_current_mems_allowed;
-
-	return !node_isset(nid, *nodemask);
-}
-
-static void show_migration_types(unsigned char type)
-{
-	static const char types[MIGRATE_TYPES] = {
-		[MIGRATE_UNMOVABLE]	= 'U',
-		[MIGRATE_MOVABLE]	= 'M',
-		[MIGRATE_RECLAIMABLE]	= 'E',
-		[MIGRATE_HIGHATOMIC]	= 'H',
-#ifdef CONFIG_CMA
-		[MIGRATE_CMA]		= 'C',
-#endif
-#ifdef CONFIG_MEMORY_ISOLATION
-		[MIGRATE_ISOLATE]	= 'I',
-#endif
-	};
-	char tmp[MIGRATE_TYPES + 1];
-	char *p = tmp;
-	int i;
-
-	for (i = 0; i < MIGRATE_TYPES; i++) {
-		if (type & (1 << i))
-			*p++ = types[i];
-	}
-
-	*p = '\0';
-	printk(KERN_CONT "(%s) ", tmp);
-}
-
-static bool node_has_managed_zones(pg_data_t *pgdat, int max_zone_idx)
-{
-	int zone_idx;
-	for (zone_idx = 0; zone_idx <= max_zone_idx; zone_idx++)
-		if (zone_managed_pages(pgdat->node_zones + zone_idx))
-			return true;
-	return false;
-}
-
-/*
- * Show free area list (used inside shift_scroll-lock stuff)
- * We also calculate the percentage fragmentation. We do this by counting the
- * memory on each free list with the exception of the first item on the list.
- *
- * Bits in @filter:
- * SHOW_MEM_FILTER_NODES: suppress nodes that are not allowed by current's
- *   cpuset.
- */
-void __show_free_areas(unsigned int filter, nodemask_t *nodemask, int max_zone_idx)
-{
-	unsigned long free_pcp = 0;
-	int cpu, nid;
-	struct zone *zone;
-	pg_data_t *pgdat;
-
-	for_each_populated_zone(zone) {
-		if (zone_idx(zone) > max_zone_idx)
-			continue;
-		if (show_mem_node_skip(filter, zone_to_nid(zone), nodemask))
-			continue;
-
-		for_each_online_cpu(cpu)
-			free_pcp += per_cpu_ptr(zone->per_cpu_pageset, cpu)->count;
-	}
-
-	printk("active_anon:%lu inactive_anon:%lu isolated_anon:%lu\n"
-		" active_file:%lu inactive_file:%lu isolated_file:%lu\n"
-		" unevictable:%lu dirty:%lu writeback:%lu\n"
-		" slab_reclaimable:%lu slab_unreclaimable:%lu\n"
-		" mapped:%lu shmem:%lu pagetables:%lu\n"
-		" sec_pagetables:%lu bounce:%lu\n"
-		" kernel_misc_reclaimable:%lu\n"
-		" free:%lu free_pcp:%lu free_cma:%lu\n",
-		global_node_page_state(NR_ACTIVE_ANON),
-		global_node_page_state(NR_INACTIVE_ANON),
-		global_node_page_state(NR_ISOLATED_ANON),
-		global_node_page_state(NR_ACTIVE_FILE),
-		global_node_page_state(NR_INACTIVE_FILE),
-		global_node_page_state(NR_ISOLATED_FILE),
-		global_node_page_state(NR_UNEVICTABLE),
-		global_node_page_state(NR_FILE_DIRTY),
-		global_node_page_state(NR_WRITEBACK),
-		global_node_page_state_pages(NR_SLAB_RECLAIMABLE_B),
-		global_node_page_state_pages(NR_SLAB_UNRECLAIMABLE_B),
-		global_node_page_state(NR_FILE_MAPPED),
-		global_node_page_state(NR_SHMEM),
-		global_node_page_state(NR_PAGETABLE),
-		global_node_page_state(NR_SECONDARY_PAGETABLE),
-		global_zone_page_state(NR_BOUNCE),
-		global_node_page_state(NR_KERNEL_MISC_RECLAIMABLE),
-		global_zone_page_state(NR_FREE_PAGES),
-		free_pcp,
-		global_zone_page_state(NR_FREE_CMA_PAGES));
-
-	for_each_online_pgdat(pgdat) {
-		if (show_mem_node_skip(filter, pgdat->node_id, nodemask))
-			continue;
-		if (!node_has_managed_zones(pgdat, max_zone_idx))
-			continue;
-
-		printk("Node %d"
-			" active_anon:%lukB"
-			" inactive_anon:%lukB"
-			" active_file:%lukB"
-			" inactive_file:%lukB"
-			" unevictable:%lukB"
-			" isolated(anon):%lukB"
-			" isolated(file):%lukB"
-			" mapped:%lukB"
-			" dirty:%lukB"
-			" writeback:%lukB"
-			" shmem:%lukB"
-#ifdef CONFIG_TRANSPARENT_HUGEPAGE
-			" shmem_thp: %lukB"
-			" shmem_pmdmapped: %lukB"
-			" anon_thp: %lukB"
-#endif
-			" writeback_tmp:%lukB"
-			" kernel_stack:%lukB"
-#ifdef CONFIG_SHADOW_CALL_STACK
-			" shadow_call_stack:%lukB"
-#endif
-			" pagetables:%lukB"
-			" sec_pagetables:%lukB"
-			" all_unreclaimable? %s"
-			"\n",
-			pgdat->node_id,
-			K(node_page_state(pgdat, NR_ACTIVE_ANON)),
-			K(node_page_state(pgdat, NR_INACTIVE_ANON)),
-			K(node_page_state(pgdat, NR_ACTIVE_FILE)),
-			K(node_page_state(pgdat, NR_INACTIVE_FILE)),
-			K(node_page_state(pgdat, NR_UNEVICTABLE)),
-			K(node_page_state(pgdat, NR_ISOLATED_ANON)),
-			K(node_page_state(pgdat, NR_ISOLATED_FILE)),
-			K(node_page_state(pgdat, NR_FILE_MAPPED)),
-			K(node_page_state(pgdat, NR_FILE_DIRTY)),
-			K(node_page_state(pgdat, NR_WRITEBACK)),
-			K(node_page_state(pgdat, NR_SHMEM)),
-#ifdef CONFIG_TRANSPARENT_HUGEPAGE
-			K(node_page_state(pgdat, NR_SHMEM_THPS)),
-			K(node_page_state(pgdat, NR_SHMEM_PMDMAPPED)),
-			K(node_page_state(pgdat, NR_ANON_THPS)),
-#endif
-			K(node_page_state(pgdat, NR_WRITEBACK_TEMP)),
-			node_page_state(pgdat, NR_KERNEL_STACK_KB),
-#ifdef CONFIG_SHADOW_CALL_STACK
-			node_page_state(pgdat, NR_KERNEL_SCS_KB),
-#endif
-			K(node_page_state(pgdat, NR_PAGETABLE)),
-			K(node_page_state(pgdat, NR_SECONDARY_PAGETABLE)),
-			pgdat->kswapd_failures >= MAX_RECLAIM_RETRIES ?
-				"yes" : "no");
-	}
-
-	for_each_populated_zone(zone) {
-		int i;
-
-		if (zone_idx(zone) > max_zone_idx)
-			continue;
-		if (show_mem_node_skip(filter, zone_to_nid(zone), nodemask))
-			continue;
-
-		free_pcp = 0;
-		for_each_online_cpu(cpu)
-			free_pcp += per_cpu_ptr(zone->per_cpu_pageset, cpu)->count;
-
-		show_node(zone);
-		printk(KERN_CONT
-			"%s"
-			" free:%lukB"
-			" boost:%lukB"
-			" min:%lukB"
-			" low:%lukB"
-			" high:%lukB"
-			" reserved_highatomic:%luKB"
-			" active_anon:%lukB"
-			" inactive_anon:%lukB"
-			" active_file:%lukB"
-			" inactive_file:%lukB"
-			" unevictable:%lukB"
-			" writepending:%lukB"
-			" present:%lukB"
-			" managed:%lukB"
-			" mlocked:%lukB"
-			" bounce:%lukB"
-			" free_pcp:%lukB"
-			" local_pcp:%ukB"
-			" free_cma:%lukB"
-			"\n",
-			zone->name,
-			K(zone_page_state(zone, NR_FREE_PAGES)),
-			K(zone->watermark_boost),
-			K(min_wmark_pages(zone)),
-			K(low_wmark_pages(zone)),
-			K(high_wmark_pages(zone)),
-			K(zone->nr_reserved_highatomic),
-			K(zone_page_state(zone, NR_ZONE_ACTIVE_ANON)),
-			K(zone_page_state(zone, NR_ZONE_INACTIVE_ANON)),
-			K(zone_page_state(zone, NR_ZONE_ACTIVE_FILE)),
-			K(zone_page_state(zone, NR_ZONE_INACTIVE_FILE)),
-			K(zone_page_state(zone, NR_ZONE_UNEVICTABLE)),
-			K(zone_page_state(zone, NR_ZONE_WRITE_PENDING)),
-			K(zone->present_pages),
-			K(zone_managed_pages(zone)),
-			K(zone_page_state(zone, NR_MLOCK)),
-			K(zone_page_state(zone, NR_BOUNCE)),
-			K(free_pcp),
-			K(this_cpu_read(zone->per_cpu_pageset->count)),
-			K(zone_page_state(zone, NR_FREE_CMA_PAGES)));
-		printk("lowmem_reserve[]:");
-		for (i = 0; i < MAX_NR_ZONES; i++)
-			printk(KERN_CONT " %ld", zone->lowmem_reserve[i]);
-		printk(KERN_CONT "\n");
-	}
-
-	for_each_populated_zone(zone) {
-		unsigned int order;
-		unsigned long nr[MAX_ORDER + 1], flags, total = 0;
-		unsigned char types[MAX_ORDER + 1];
-
-		if (zone_idx(zone) > max_zone_idx)
-			continue;
-		if (show_mem_node_skip(filter, zone_to_nid(zone), nodemask))
-			continue;
-		show_node(zone);
-		printk(KERN_CONT "%s: ", zone->name);
-
-		spin_lock_irqsave(&zone->lock, flags);
-		for (order = 0; order <= MAX_ORDER; order++) {
-			struct free_area *area = &zone->free_area[order];
-			int type;
-
-			nr[order] = area->nr_free;
-			total += nr[order] << order;
-
-			types[order] = 0;
-			for (type = 0; type < MIGRATE_TYPES; type++) {
-				if (!free_area_empty(area, type))
-					types[order] |= 1 << type;
-			}
-		}
-		spin_unlock_irqrestore(&zone->lock, flags);
-		for (order = 0; order <= MAX_ORDER; order++) {
-			printk(KERN_CONT "%lu*%lukB ",
-			       nr[order], K(1UL) << order);
-			if (nr[order])
-				show_migration_types(types[order]);
-		}
-		printk(KERN_CONT "= %lukB\n", K(total));
-	}
-
-	for_each_online_node(nid) {
-		if (show_mem_node_skip(filter, nid, nodemask))
-			continue;
-		hugetlb_show_meminfo_node(nid);
-	}
-
-	printk("%ld total pagecache pages\n", global_node_page_state(NR_FILE_PAGES));
-
-	show_swap_cache_info();
-}
-
 static void zoneref_set_zone(struct zone *zone, struct zoneref *zoneref)
 {
 	zoneref->zone = zone;
diff --git a/mm/show_mem.c b/mm/show_mem.c
new file mode 100644
index 000000000000..9f1a5d8b03d1
--- /dev/null
+++ b/mm/show_mem.c
@@ -0,0 +1,429 @@
+// SPDX-License-Identifier: GPL-2.0-only
+/*
+ * Generic show_mem() implementation
+ *
+ * Copyright (C) 2008 Johannes Weiner <hannes@saeurebad.de>
+ */
+
+#include <linux/blkdev.h>
+#include <linux/cma.h>
+#include <linux/cpuset.h>
+#include <linux/highmem.h>
+#include <linux/hugetlb.h>
+#include <linux/mm.h>
+#include <linux/mmzone.h>
+#include <linux/swap.h>
+#include <linux/vmstat.h>
+
+#include "internal.h"
+#include "swap.h"
+
+atomic_long_t _totalram_pages __read_mostly;
+EXPORT_SYMBOL(_totalram_pages);
+unsigned long totalreserve_pages __read_mostly;
+unsigned long totalcma_pages __read_mostly;
+
+void __show_mem(unsigned int filter, nodemask_t *nodemask, int max_zone_idx)
+{
+	unsigned long total = 0, reserved = 0, highmem = 0;
+	struct zone *zone;
+
+	printk("Mem-Info:\n");
+	__show_free_areas(filter, nodemask, max_zone_idx);
+
+	for_each_populated_zone(zone) {
+
+		total += zone->present_pages;
+		reserved += zone->present_pages - zone_managed_pages(zone);
+
+		if (is_highmem(zone))
+			highmem += zone->present_pages;
+	}
+
+	printk("%lu pages RAM\n", total);
+	printk("%lu pages HighMem/MovableOnly\n", highmem);
+	printk("%lu pages reserved\n", reserved);
+#ifdef CONFIG_CMA
+	printk("%lu pages cma reserved\n", totalcma_pages);
+#endif
+#ifdef CONFIG_MEMORY_FAILURE
+	printk("%lu pages hwpoisoned\n", atomic_long_read(&num_poisoned_pages));
+#endif
+}
+
+static inline void show_node(struct zone *zone)
+{
+	if (IS_ENABLED(CONFIG_NUMA))
+		printk("Node %d ", zone_to_nid(zone));
+}
+
+long si_mem_available(void)
+{
+	long available;
+	unsigned long pagecache;
+	unsigned long wmark_low = 0;
+	unsigned long pages[NR_LRU_LISTS];
+	unsigned long reclaimable;
+	struct zone *zone;
+	int lru;
+
+	for (lru = LRU_BASE; lru < NR_LRU_LISTS; lru++)
+		pages[lru] = global_node_page_state(NR_LRU_BASE + lru);
+
+	for_each_zone(zone)
+		wmark_low += low_wmark_pages(zone);
+
+	/*
+	 * Estimate the amount of memory available for userspace allocations,
+	 * without causing swapping or OOM.
+	 */
+	available = global_zone_page_state(NR_FREE_PAGES) - totalreserve_pages;
+
+	/*
+	 * Not all the page cache can be freed, otherwise the system will
+	 * start swapping or thrashing. Assume at least half of the page
+	 * cache, or the low watermark worth of cache, needs to stay.
+	 */
+	pagecache = pages[LRU_ACTIVE_FILE] + pages[LRU_INACTIVE_FILE];
+	pagecache -= min(pagecache / 2, wmark_low);
+	available += pagecache;
+
+	/*
+	 * Part of the reclaimable slab and other kernel memory consists of
+	 * items that are in use, and cannot be freed. Cap this estimate at the
+	 * low watermark.
+	 */
+	reclaimable = global_node_page_state_pages(NR_SLAB_RECLAIMABLE_B) +
+		global_node_page_state(NR_KERNEL_MISC_RECLAIMABLE);
+	available += reclaimable - min(reclaimable / 2, wmark_low);
+
+	if (available < 0)
+		available = 0;
+	return available;
+}
+EXPORT_SYMBOL_GPL(si_mem_available);
+
+void si_meminfo(struct sysinfo *val)
+{
+	val->totalram = totalram_pages();
+	val->sharedram = global_node_page_state(NR_SHMEM);
+	val->freeram = global_zone_page_state(NR_FREE_PAGES);
+	val->bufferram = nr_blockdev_pages();
+	val->totalhigh = totalhigh_pages();
+	val->freehigh = nr_free_highpages();
+	val->mem_unit = PAGE_SIZE;
+}
+
+EXPORT_SYMBOL(si_meminfo);
+
+#ifdef CONFIG_NUMA
+void si_meminfo_node(struct sysinfo *val, int nid)
+{
+	int zone_type;		/* needs to be signed */
+	unsigned long managed_pages = 0;
+	unsigned long managed_highpages = 0;
+	unsigned long free_highpages = 0;
+	pg_data_t *pgdat = NODE_DATA(nid);
+
+	for (zone_type = 0; zone_type < MAX_NR_ZONES; zone_type++)
+		managed_pages += zone_managed_pages(&pgdat->node_zones[zone_type]);
+	val->totalram = managed_pages;
+	val->sharedram = node_page_state(pgdat, NR_SHMEM);
+	val->freeram = sum_zone_node_page_state(nid, NR_FREE_PAGES);
+#ifdef CONFIG_HIGHMEM
+	for (zone_type = 0; zone_type < MAX_NR_ZONES; zone_type++) {
+		struct zone *zone = &pgdat->node_zones[zone_type];
+
+		if (is_highmem(zone)) {
+			managed_highpages += zone_managed_pages(zone);
+			free_highpages += zone_page_state(zone, NR_FREE_PAGES);
+		}
+	}
+	val->totalhigh = managed_highpages;
+	val->freehigh = free_highpages;
+#else
+	val->totalhigh = managed_highpages;
+	val->freehigh = free_highpages;
+#endif
+	val->mem_unit = PAGE_SIZE;
+}
+#endif
+
+/*
+ * Determine whether the node should be displayed or not, depending on whether
+ * SHOW_MEM_FILTER_NODES was passed to show_free_areas().
+ */
+static bool show_mem_node_skip(unsigned int flags, int nid, nodemask_t *nodemask)
+{
+	if (!(flags & SHOW_MEM_FILTER_NODES))
+		return false;
+
+	/*
+	 * no node mask - aka implicit memory numa policy. Do not bother with
+	 * the synchronization - read_mems_allowed_begin - because we do not
+	 * have to be precise here.
+	 */
+	if (!nodemask)
+		nodemask = &cpuset_current_mems_allowed;
+
+	return !node_isset(nid, *nodemask);
+}
+
+static void show_migration_types(unsigned char type)
+{
+	static const char types[MIGRATE_TYPES] = {
+		[MIGRATE_UNMOVABLE]	= 'U',
+		[MIGRATE_MOVABLE]	= 'M',
+		[MIGRATE_RECLAIMABLE]	= 'E',
+		[MIGRATE_HIGHATOMIC]	= 'H',
+#ifdef CONFIG_CMA
+		[MIGRATE_CMA]		= 'C',
+#endif
+#ifdef CONFIG_MEMORY_ISOLATION
+		[MIGRATE_ISOLATE]	= 'I',
+#endif
+	};
+	char tmp[MIGRATE_TYPES + 1];
+	char *p = tmp;
+	int i;
+
+	for (i = 0; i < MIGRATE_TYPES; i++) {
+		if (type & (1 << i))
+			*p++ = types[i];
+	}
+
+	*p = '\0';
+	printk(KERN_CONT "(%s) ", tmp);
+}
+
+static bool node_has_managed_zones(pg_data_t *pgdat, int max_zone_idx)
+{
+	int zone_idx;
+	for (zone_idx = 0; zone_idx <= max_zone_idx; zone_idx++)
+		if (zone_managed_pages(pgdat->node_zones + zone_idx))
+			return true;
+	return false;
+}
+
+/*
+ * Show free area list (used inside shift_scroll-lock stuff)
+ * We also calculate the percentage fragmentation. We do this by counting the
+ * memory on each free list with the exception of the first item on the list.
+ *
+ * Bits in @filter:
+ * SHOW_MEM_FILTER_NODES: suppress nodes that are not allowed by current's
+ *   cpuset.
+ */
+void __show_free_areas(unsigned int filter, nodemask_t *nodemask, int max_zone_idx)
+{
+	unsigned long free_pcp = 0;
+	int cpu, nid;
+	struct zone *zone;
+	pg_data_t *pgdat;
+
+	for_each_populated_zone(zone) {
+		if (zone_idx(zone) > max_zone_idx)
+			continue;
+		if (show_mem_node_skip(filter, zone_to_nid(zone), nodemask))
+			continue;
+
+		for_each_online_cpu(cpu)
+			free_pcp += per_cpu_ptr(zone->per_cpu_pageset, cpu)->count;
+	}
+
+	printk("active_anon:%lu inactive_anon:%lu isolated_anon:%lu\n"
+		" active_file:%lu inactive_file:%lu isolated_file:%lu\n"
+		" unevictable:%lu dirty:%lu writeback:%lu\n"
+		" slab_reclaimable:%lu slab_unreclaimable:%lu\n"
+		" mapped:%lu shmem:%lu pagetables:%lu\n"
+		" sec_pagetables:%lu bounce:%lu\n"
+		" kernel_misc_reclaimable:%lu\n"
+		" free:%lu free_pcp:%lu free_cma:%lu\n",
+		global_node_page_state(NR_ACTIVE_ANON),
+		global_node_page_state(NR_INACTIVE_ANON),
+		global_node_page_state(NR_ISOLATED_ANON),
+		global_node_page_state(NR_ACTIVE_FILE),
+		global_node_page_state(NR_INACTIVE_FILE),
+		global_node_page_state(NR_ISOLATED_FILE),
+		global_node_page_state(NR_UNEVICTABLE),
+		global_node_page_state(NR_FILE_DIRTY),
+		global_node_page_state(NR_WRITEBACK),
+		global_node_page_state_pages(NR_SLAB_RECLAIMABLE_B),
+		global_node_page_state_pages(NR_SLAB_UNRECLAIMABLE_B),
+		global_node_page_state(NR_FILE_MAPPED),
+		global_node_page_state(NR_SHMEM),
+		global_node_page_state(NR_PAGETABLE),
+		global_node_page_state(NR_SECONDARY_PAGETABLE),
+		global_zone_page_state(NR_BOUNCE),
+		global_node_page_state(NR_KERNEL_MISC_RECLAIMABLE),
+		global_zone_page_state(NR_FREE_PAGES),
+		free_pcp,
+		global_zone_page_state(NR_FREE_CMA_PAGES));
+
+	for_each_online_pgdat(pgdat) {
+		if (show_mem_node_skip(filter, pgdat->node_id, nodemask))
+			continue;
+		if (!node_has_managed_zones(pgdat, max_zone_idx))
+			continue;
+
+		printk("Node %d"
+			" active_anon:%lukB"
+			" inactive_anon:%lukB"
+			" active_file:%lukB"
+			" inactive_file:%lukB"
+			" unevictable:%lukB"
+			" isolated(anon):%lukB"
+			" isolated(file):%lukB"
+			" mapped:%lukB"
+			" dirty:%lukB"
+			" writeback:%lukB"
+			" shmem:%lukB"
+#ifdef CONFIG_TRANSPARENT_HUGEPAGE
+			" shmem_thp: %lukB"
+			" shmem_pmdmapped: %lukB"
+			" anon_thp: %lukB"
+#endif
+			" writeback_tmp:%lukB"
+			" kernel_stack:%lukB"
+#ifdef CONFIG_SHADOW_CALL_STACK
+			" shadow_call_stack:%lukB"
+#endif
+			" pagetables:%lukB"
+			" sec_pagetables:%lukB"
+			" all_unreclaimable? %s"
+			"\n",
+			pgdat->node_id,
+			K(node_page_state(pgdat, NR_ACTIVE_ANON)),
+			K(node_page_state(pgdat, NR_INACTIVE_ANON)),
+			K(node_page_state(pgdat, NR_ACTIVE_FILE)),
+			K(node_page_state(pgdat, NR_INACTIVE_FILE)),
+			K(node_page_state(pgdat, NR_UNEVICTABLE)),
+			K(node_page_state(pgdat, NR_ISOLATED_ANON)),
+			K(node_page_state(pgdat, NR_ISOLATED_FILE)),
+			K(node_page_state(pgdat, NR_FILE_MAPPED)),
+			K(node_page_state(pgdat, NR_FILE_DIRTY)),
+			K(node_page_state(pgdat, NR_WRITEBACK)),
+			K(node_page_state(pgdat, NR_SHMEM)),
+#ifdef CONFIG_TRANSPARENT_HUGEPAGE
+			K(node_page_state(pgdat, NR_SHMEM_THPS)),
+			K(node_page_state(pgdat, NR_SHMEM_PMDMAPPED)),
+			K(node_page_state(pgdat, NR_ANON_THPS)),
+#endif
+			K(node_page_state(pgdat, NR_WRITEBACK_TEMP)),
+			node_page_state(pgdat, NR_KERNEL_STACK_KB),
+#ifdef CONFIG_SHADOW_CALL_STACK
+			node_page_state(pgdat, NR_KERNEL_SCS_KB),
+#endif
+			K(node_page_state(pgdat, NR_PAGETABLE)),
+			K(node_page_state(pgdat, NR_SECONDARY_PAGETABLE)),
+			pgdat->kswapd_failures >= MAX_RECLAIM_RETRIES ?
+				"yes" : "no");
+	}
+
+	for_each_populated_zone(zone) {
+		int i;
+
+		if (zone_idx(zone) > max_zone_idx)
+			continue;
+		if (show_mem_node_skip(filter, zone_to_nid(zone), nodemask))
+			continue;
+
+		free_pcp = 0;
+		for_each_online_cpu(cpu)
+			free_pcp += per_cpu_ptr(zone->per_cpu_pageset, cpu)->count;
+
+		show_node(zone);
+		printk(KERN_CONT
+			"%s"
+			" free:%lukB"
+			" boost:%lukB"
+			" min:%lukB"
+			" low:%lukB"
+			" high:%lukB"
+			" reserved_highatomic:%luKB"
+			" active_anon:%lukB"
+			" inactive_anon:%lukB"
+			" active_file:%lukB"
+			" inactive_file:%lukB"
+			" unevictable:%lukB"
+			" writepending:%lukB"
+			" present:%lukB"
+			" managed:%lukB"
+			" mlocked:%lukB"
+			" bounce:%lukB"
+			" free_pcp:%lukB"
+			" local_pcp:%ukB"
+			" free_cma:%lukB"
+			"\n",
+			zone->name,
+			K(zone_page_state(zone, NR_FREE_PAGES)),
+			K(zone->watermark_boost),
+			K(min_wmark_pages(zone)),
+			K(low_wmark_pages(zone)),
+			K(high_wmark_pages(zone)),
+			K(zone->nr_reserved_highatomic),
+			K(zone_page_state(zone, NR_ZONE_ACTIVE_ANON)),
+			K(zone_page_state(zone, NR_ZONE_INACTIVE_ANON)),
+			K(zone_page_state(zone, NR_ZONE_ACTIVE_FILE)),
+			K(zone_page_state(zone, NR_ZONE_INACTIVE_FILE)),
+			K(zone_page_state(zone, NR_ZONE_UNEVICTABLE)),
+			K(zone_page_state(zone, NR_ZONE_WRITE_PENDING)),
+			K(zone->present_pages),
+			K(zone_managed_pages(zone)),
+			K(zone_page_state(zone, NR_MLOCK)),
+			K(zone_page_state(zone, NR_BOUNCE)),
+			K(free_pcp),
+			K(this_cpu_read(zone->per_cpu_pageset->count)),
+			K(zone_page_state(zone, NR_FREE_CMA_PAGES)));
+		printk("lowmem_reserve[]:");
+		for (i = 0; i < MAX_NR_ZONES; i++)
+			printk(KERN_CONT " %ld", zone->lowmem_reserve[i]);
+		printk(KERN_CONT "\n");
+	}
+
+	for_each_populated_zone(zone) {
+		unsigned int order;
+		unsigned long nr[MAX_ORDER + 1], flags, total = 0;
+		unsigned char types[MAX_ORDER + 1];
+
+		if (zone_idx(zone) > max_zone_idx)
+			continue;
+		if (show_mem_node_skip(filter, zone_to_nid(zone), nodemask))
+			continue;
+		show_node(zone);
+		printk(KERN_CONT "%s: ", zone->name);
+
+		spin_lock_irqsave(&zone->lock, flags);
+		for (order = 0; order <= MAX_ORDER; order++) {
+			struct free_area *area = &zone->free_area[order];
+			int type;
+
+			nr[order] = area->nr_free;
+			total += nr[order] << order;
+
+			types[order] = 0;
+			for (type = 0; type < MIGRATE_TYPES; type++) {
+				if (!free_area_empty(area, type))
+					types[order] |= 1 << type;
+			}
+		}
+		spin_unlock_irqrestore(&zone->lock, flags);
+		for (order = 0; order <= MAX_ORDER; order++) {
+			printk(KERN_CONT "%lu*%lukB ",
+			       nr[order], K(1UL) << order);
+			if (nr[order])
+				show_migration_types(types[order]);
+		}
+		printk(KERN_CONT "= %lukB\n", K(total));
+	}
+
+	for_each_online_node(nid) {
+		if (show_mem_node_skip(filter, nid, nodemask))
+			continue;
+		hugetlb_show_meminfo_node(nid);
+	}
+
+	printk("%ld total pagecache pages\n", global_node_page_state(NR_FILE_PAGES));
+
+	show_swap_cache_info();
+}
-- 
2.35.3


* [PATCH 05/12] mm: page_alloc: squash page_is_consistent()
  2023-05-08  7:11 [PATCH -next 00/12] mm: page_alloc: misc cleanup and refactor Kefeng Wang
                   ` (3 preceding siblings ...)
  2023-05-08  7:11 ` [PATCH 04/12] mm: page_alloc: collect mem statistics into show_mem.c Kefeng Wang
@ 2023-05-08  7:11 ` Kefeng Wang
  2023-05-09 16:43   ` Mike Rapoport
  2023-05-08  7:11 ` [PATCH 06/12] mm: page_alloc: remove alloc_contig_dump_pages() stub Kefeng Wang
                   ` (6 subsequent siblings)
  11 siblings, 1 reply; 22+ messages in thread
From: Kefeng Wang @ 2023-05-08  7:11 UTC
  To: Andrew Morton, Mike Rapoport, linux-mm
  Cc: David Hildenbrand, Oscar Salvador, Rafael J. Wysocki,
	Pavel Machek, Len Brown, Luis Chamberlain, Kees Cook,
	Iurii Zaikin, linux-kernel, linux-pm, linux-fsdevel, Kefeng Wang

Squash page_is_consistent() into bad_range(), as it has only one
caller.

Signed-off-by: Kefeng Wang <wangkefeng.wang@huawei.com>
---
 mm/page_alloc.c | 9 +--------
 1 file changed, 1 insertion(+), 8 deletions(-)

diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 9a85238f1140..348dcbaca757 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -517,13 +517,6 @@ static int page_outside_zone_boundaries(struct zone *zone, struct page *page)
 	return ret;
 }
 
-static int page_is_consistent(struct zone *zone, struct page *page)
-{
-	if (zone != page_zone(page))
-		return 0;
-
-	return 1;
-}
 /*
  * Temporary debugging check for pages not lying within a given zone.
  */
@@ -531,7 +524,7 @@ static int __maybe_unused bad_range(struct zone *zone, struct page *page)
 {
 	if (page_outside_zone_boundaries(zone, page))
 		return 1;
-	if (!page_is_consistent(zone, page))
+	if (zone != page_zone(page))
 		return 1;
 
 	return 0;
-- 
2.35.3


* [PATCH 06/12] mm: page_alloc: remove alloc_contig_dump_pages() stub
  2023-05-08  7:11 [PATCH -next 00/12] mm: page_alloc: misc cleanup and refactor Kefeng Wang
                   ` (4 preceding siblings ...)
  2023-05-08  7:11 ` [PATCH 05/12] mm: page_alloc: squash page_is_consistent() Kefeng Wang
@ 2023-05-08  7:11 ` Kefeng Wang
  2023-05-09 16:48   ` Mike Rapoport
  2023-05-08  7:11 ` [PATCH 07/12] mm: page_alloc: split out FAIL_PAGE_ALLOC Kefeng Wang
                   ` (5 subsequent siblings)
  11 siblings, 1 reply; 22+ messages in thread
From: Kefeng Wang @ 2023-05-08  7:11 UTC
  To: Andrew Morton, Mike Rapoport, linux-mm
  Cc: David Hildenbrand, Oscar Salvador, Rafael J. Wysocki,
	Pavel Machek, Len Brown, Luis Chamberlain, Kees Cook,
	Iurii Zaikin, linux-kernel, linux-pm, linux-fsdevel, Kefeng Wang

DEFINE_DYNAMIC_DEBUG_METADATA and DYNAMIC_DEBUG_BRANCH already have
stub definitions when the dynamic debug feature is disabled, so remove
the unnecessary alloc_contig_dump_pages() stub.
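
With those stubs, DEFINE_DYNAMIC_DEBUG_METADATA() expands to nothing
and DYNAMIC_DEBUG_BRANCH() to a constant false when dynamic debug is
off, so the single remaining definition compiles down to an empty
function on its own; roughly (a sketch of the resulting function):

	static void alloc_contig_dump_pages(struct list_head *page_list)
	{
		DEFINE_DYNAMIC_DEBUG_METADATA(descriptor, "migration failure");

		if (DYNAMIC_DEBUG_BRANCH(descriptor)) {
			struct page *page;

			dump_stack();
			list_for_each_entry(page, page_list, lru)
				dump_page(page, "migration failure");
		}
	}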

Signed-off-by: Kefeng Wang <wangkefeng.wang@huawei.com>
---
 mm/page_alloc.c | 7 -------
 1 file changed, 7 deletions(-)

diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 348dcbaca757..bc453edbad21 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -6161,8 +6161,6 @@ int percpu_pagelist_high_fraction_sysctl_handler(struct ctl_table *table,
 }
 
 #ifdef CONFIG_CONTIG_ALLOC
-#if defined(CONFIG_DYNAMIC_DEBUG) || \
-	(defined(CONFIG_DYNAMIC_DEBUG_CORE) && defined(DYNAMIC_DEBUG_MODULE))
 /* Usage: See admin-guide/dynamic-debug-howto.rst */
 static void alloc_contig_dump_pages(struct list_head *page_list)
 {
@@ -6176,11 +6174,6 @@ static void alloc_contig_dump_pages(struct list_head *page_list)
 			dump_page(page, "migration failure");
 	}
 }
-#else
-static inline void alloc_contig_dump_pages(struct list_head *page_list)
-{
-}
-#endif
 
 /* [start, end) must belong to a single zone. */
 int __alloc_contig_migrate_range(struct compact_control *cc,
-- 
2.35.3


* [PATCH 07/12] mm: page_alloc: split out FAIL_PAGE_ALLOC
  2023-05-08  7:11 [PATCH -next 00/12] mm: page_alloc: misc cleanup and refactor Kefeng Wang
                   ` (5 preceding siblings ...)
  2023-05-08  7:11 ` [PATCH 06/12] mm: page_alloc: remove alloc_contig_dump_pages() stub Kefeng Wang
@ 2023-05-08  7:11 ` Kefeng Wang
  2023-05-08  7:11 ` [PATCH 08/12] mm: page_alloc: split out DEBUG_PAGEALLOC Kefeng Wang
                   ` (4 subsequent siblings)
  11 siblings, 0 replies; 22+ messages in thread
From: Kefeng Wang @ 2023-05-08  7:11 UTC
  To: Andrew Morton, Mike Rapoport, linux-mm
  Cc: David Hildenbrand, Oscar Salvador, Rafael J. Wysocki,
	Pavel Machek, Len Brown, Luis Chamberlain, Kees Cook,
	Iurii Zaikin, linux-kernel, linux-pm, linux-fsdevel, Kefeng Wang

... to a single file to trim page_alloc.c a bit.
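
Behaviour is unchanged; the feature is still driven by the boot
parameter and the debugfs knobs created in the new file, e.g. (an
illustrative configuration following Documentation/fault-injection,
values are examples only):

	# fail_page_alloc=<interval>,<probability>,<space>,<times>
	fail_page_alloc=1,10,0,-1

	# at runtime: only consider order >= 2 allocations for failure
	echo 2 > /sys/kernel/debug/fail_page_alloc/min-order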

Signed-off-by: Kefeng Wang <wangkefeng.wang@huawei.com>
---
 include/linux/fault-inject.h |  9 +++++
 mm/Makefile                  |  1 +
 mm/fail_page_alloc.c         | 66 ++++++++++++++++++++++++++++++++
 mm/page_alloc.c              | 74 ------------------------------------
 4 files changed, 76 insertions(+), 74 deletions(-)
 create mode 100644 mm/fail_page_alloc.c

diff --git a/include/linux/fault-inject.h b/include/linux/fault-inject.h
index 481abf530b3c..6d5edef09d45 100644
--- a/include/linux/fault-inject.h
+++ b/include/linux/fault-inject.h
@@ -93,6 +93,15 @@ struct kmem_cache;
 
 bool should_fail_alloc_page(gfp_t gfp_mask, unsigned int order);
 
+#ifdef CONFIG_FAIL_PAGE_ALLOC
+bool __should_fail_alloc_page(gfp_t gfp_mask, unsigned int order);
+#else
+static inline bool __should_fail_alloc_page(gfp_t gfp_mask, unsigned int order)
+{
+	return false;
+}
+#endif /* CONFIG_FAIL_PAGE_ALLOC */
+
 int should_failslab(struct kmem_cache *s, gfp_t gfpflags);
 #ifdef CONFIG_FAILSLAB
 extern bool __should_failslab(struct kmem_cache *s, gfp_t gfpflags);
diff --git a/mm/Makefile b/mm/Makefile
index 5262ce5baa28..0eec4bc72d3f 100644
--- a/mm/Makefile
+++ b/mm/Makefile
@@ -89,6 +89,7 @@ obj-$(CONFIG_KASAN)	+= kasan/
 obj-$(CONFIG_KFENCE) += kfence/
 obj-$(CONFIG_KMSAN)	+= kmsan/
 obj-$(CONFIG_FAILSLAB) += failslab.o
+obj-$(CONFIG_FAIL_PAGE_ALLOC) += fail_page_alloc.o
 obj-$(CONFIG_MEMTEST)		+= memtest.o
 obj-$(CONFIG_MIGRATION) += migrate.o
 obj-$(CONFIG_NUMA) += memory-tiers.o
diff --git a/mm/fail_page_alloc.c b/mm/fail_page_alloc.c
new file mode 100644
index 000000000000..b1b09cce9394
--- /dev/null
+++ b/mm/fail_page_alloc.c
@@ -0,0 +1,66 @@
+// SPDX-License-Identifier: GPL-2.0
+#include <linux/fault-inject.h>
+#include <linux/mm.h>
+
+static struct {
+	struct fault_attr attr;
+
+	bool ignore_gfp_highmem;
+	bool ignore_gfp_reclaim;
+	u32 min_order;
+} fail_page_alloc = {
+	.attr = FAULT_ATTR_INITIALIZER,
+	.ignore_gfp_reclaim = true,
+	.ignore_gfp_highmem = true,
+	.min_order = 1,
+};
+
+static int __init setup_fail_page_alloc(char *str)
+{
+	return setup_fault_attr(&fail_page_alloc.attr, str);
+}
+__setup("fail_page_alloc=", setup_fail_page_alloc);
+
+bool __should_fail_alloc_page(gfp_t gfp_mask, unsigned int order)
+{
+	int flags = 0;
+
+	if (order < fail_page_alloc.min_order)
+		return false;
+	if (gfp_mask & __GFP_NOFAIL)
+		return false;
+	if (fail_page_alloc.ignore_gfp_highmem && (gfp_mask & __GFP_HIGHMEM))
+		return false;
+	if (fail_page_alloc.ignore_gfp_reclaim &&
+			(gfp_mask & __GFP_DIRECT_RECLAIM))
+		return false;
+
+	/* See comment in __should_failslab() */
+	if (gfp_mask & __GFP_NOWARN)
+		flags |= FAULT_NOWARN;
+
+	return should_fail_ex(&fail_page_alloc.attr, 1 << order, flags);
+}
+
+#ifdef CONFIG_FAULT_INJECTION_DEBUG_FS
+
+static int __init fail_page_alloc_debugfs(void)
+{
+	umode_t mode = S_IFREG | 0600;
+	struct dentry *dir;
+
+	dir = fault_create_debugfs_attr("fail_page_alloc", NULL,
+					&fail_page_alloc.attr);
+
+	debugfs_create_bool("ignore-gfp-wait", mode, dir,
+			    &fail_page_alloc.ignore_gfp_reclaim);
+	debugfs_create_bool("ignore-gfp-highmem", mode, dir,
+			    &fail_page_alloc.ignore_gfp_highmem);
+	debugfs_create_u32("min-order", mode, dir, &fail_page_alloc.min_order);
+
+	return 0;
+}
+
+late_initcall(fail_page_alloc_debugfs);
+
+#endif /* CONFIG_FAULT_INJECTION_DEBUG_FS */
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index bc453edbad21..fce47ccbcb3a 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -2942,80 +2942,6 @@ struct page *rmqueue(struct zone *preferred_zone,
 	return page;
 }
 
-#ifdef CONFIG_FAIL_PAGE_ALLOC
-
-static struct {
-	struct fault_attr attr;
-
-	bool ignore_gfp_highmem;
-	bool ignore_gfp_reclaim;
-	u32 min_order;
-} fail_page_alloc = {
-	.attr = FAULT_ATTR_INITIALIZER,
-	.ignore_gfp_reclaim = true,
-	.ignore_gfp_highmem = true,
-	.min_order = 1,
-};
-
-static int __init setup_fail_page_alloc(char *str)
-{
-	return setup_fault_attr(&fail_page_alloc.attr, str);
-}
-__setup("fail_page_alloc=", setup_fail_page_alloc);
-
-static bool __should_fail_alloc_page(gfp_t gfp_mask, unsigned int order)
-{
-	int flags = 0;
-
-	if (order < fail_page_alloc.min_order)
-		return false;
-	if (gfp_mask & __GFP_NOFAIL)
-		return false;
-	if (fail_page_alloc.ignore_gfp_highmem && (gfp_mask & __GFP_HIGHMEM))
-		return false;
-	if (fail_page_alloc.ignore_gfp_reclaim &&
-			(gfp_mask & __GFP_DIRECT_RECLAIM))
-		return false;
-
-	/* See comment in __should_failslab() */
-	if (gfp_mask & __GFP_NOWARN)
-		flags |= FAULT_NOWARN;
-
-	return should_fail_ex(&fail_page_alloc.attr, 1 << order, flags);
-}
-
-#ifdef CONFIG_FAULT_INJECTION_DEBUG_FS
-
-static int __init fail_page_alloc_debugfs(void)
-{
-	umode_t mode = S_IFREG | 0600;
-	struct dentry *dir;
-
-	dir = fault_create_debugfs_attr("fail_page_alloc", NULL,
-					&fail_page_alloc.attr);
-
-	debugfs_create_bool("ignore-gfp-wait", mode, dir,
-			    &fail_page_alloc.ignore_gfp_reclaim);
-	debugfs_create_bool("ignore-gfp-highmem", mode, dir,
-			    &fail_page_alloc.ignore_gfp_highmem);
-	debugfs_create_u32("min-order", mode, dir, &fail_page_alloc.min_order);
-
-	return 0;
-}
-
-late_initcall(fail_page_alloc_debugfs);
-
-#endif /* CONFIG_FAULT_INJECTION_DEBUG_FS */
-
-#else /* CONFIG_FAIL_PAGE_ALLOC */
-
-static inline bool __should_fail_alloc_page(gfp_t gfp_mask, unsigned int order)
-{
-	return false;
-}
-
-#endif /* CONFIG_FAIL_PAGE_ALLOC */
-
 noinline bool should_fail_alloc_page(gfp_t gfp_mask, unsigned int order)
 {
 	return __should_fail_alloc_page(gfp_mask, order);
-- 
2.35.3


* [PATCH 08/12] mm: page_alloc: split out DEBUG_PAGEALLOC
  2023-05-08  7:11 [PATCH -next 00/12] mm: page_alloc: misc cleanup and refactor Kefeng Wang
                   ` (6 preceding siblings ...)
  2023-05-08  7:11 ` [PATCH 07/12] mm: page_alloc: split out FAIL_PAGE_ALLOC Kefeng Wang
@ 2023-05-08  7:11 ` Kefeng Wang
  2023-05-08  7:11 ` [PATCH 09/12] mm: page_alloc: move mark_free_pages() into snapshot.c Kefeng Wang
                   ` (3 subsequent siblings)
  11 siblings, 0 replies; 22+ messages in thread
From: Kefeng Wang @ 2023-05-08  7:11 UTC (permalink / raw)
  To: Andrew Morton, Mike Rapoport, linux-mm
  Cc: David Hildenbrand, Oscar Salvador, Rafael J. Wysocki,
	Pavel Machek, Len Brown, Luis Chamberlain, Kees Cook,
	Iurii Zaikin, linux-kernel, linux-pm, linux-fsdevel, Kefeng Wang

Move the DEBUG_PAGEALLOC related functions into a single file,
mm/debug_page_alloc.c, to shrink page_alloc.c a bit.
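
For reference, the main consumer of these helpers is the buddy split
path; an abridged sketch of the expand() call site in page_alloc.c
(unchanged by this patch, shown for illustration only):

	/* splitting a high-order page: guard the unused lower half */
	while (high > low) {
		high--;
		size >>= 1;
		if (set_page_guard(zone, &page[size], high, migratetype))
			continue;	/* guarded pages stay off the free lists */
		add_to_free_list(&page[size], zone, high, migratetype);
		set_buddy_order(&page[size], high);
	}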

Signed-off-by: Kefeng Wang <wangkefeng.wang@huawei.com>
---
 include/linux/mm.h    | 76 ++++++++++++++++++++++++++++---------------
 mm/Makefile           |  1 +
 mm/debug_page_alloc.c | 59 +++++++++++++++++++++++++++++++++
 mm/page_alloc.c       | 69 ---------------------------------------
 4 files changed, 109 insertions(+), 96 deletions(-)
 create mode 100644 mm/debug_page_alloc.c

diff --git a/include/linux/mm.h b/include/linux/mm.h
index e5d7b65075a0..fc8732a119cf 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -3534,9 +3534,58 @@ static inline void debug_pagealloc_unmap_pages(struct page *page, int numpages)
 	if (debug_pagealloc_enabled_static())
 		__kernel_map_pages(page, numpages, 0);
 }
+
+extern unsigned int _debug_guardpage_minorder;
+DECLARE_STATIC_KEY_FALSE(_debug_guardpage_enabled);
+
+static inline unsigned int debug_guardpage_minorder(void)
+{
+	return _debug_guardpage_minorder;
+}
+
+static inline bool debug_guardpage_enabled(void)
+{
+	return static_branch_unlikely(&_debug_guardpage_enabled);
+}
+
+static inline bool page_is_guard(struct page *page)
+{
+	if (!debug_guardpage_enabled())
+		return false;
+
+	return PageGuard(page);
+}
+
+bool __set_page_guard(struct zone *zone, struct page *page, unsigned int order,
+		      int migratetype);
+static inline bool set_page_guard(struct zone *zone, struct page *page,
+				  unsigned int order, int migratetype)
+{
+	if (!debug_guardpage_enabled())
+		return false;
+	return __set_page_guard(zone, page, order, migratetype);
+}
+
+void __clear_page_guard(struct zone *zone, struct page *page, unsigned int order,
+			int migratetype);
+static inline void clear_page_guard(struct zone *zone, struct page *page,
+				    unsigned int order, int migratetype)
+{
+	if (!debug_guardpage_enabled())
+		return;
+	__clear_page_guard(zone, page, order, migratetype);
+}
+
 #else	/* CONFIG_DEBUG_PAGEALLOC */
 static inline void debug_pagealloc_map_pages(struct page *page, int numpages) {}
 static inline void debug_pagealloc_unmap_pages(struct page *page, int numpages) {}
+static inline unsigned int debug_guardpage_minorder(void) { return 0; }
+static inline bool debug_guardpage_enabled(void) { return false; }
+static inline bool page_is_guard(struct page *page) { return false; }
+static inline bool set_page_guard(struct zone *zone, struct page *page,
+			unsigned int order, int migratetype) { return false; }
+static inline void clear_page_guard(struct zone *zone, struct page *page,
+				unsigned int order, int migratetype) {}
 #endif	/* CONFIG_DEBUG_PAGEALLOC */
 
 #ifdef __HAVE_ARCH_GATE_AREA
@@ -3775,33 +3824,6 @@ static inline bool vma_is_special_huge(const struct vm_area_struct *vma)
 
 #endif /* CONFIG_TRANSPARENT_HUGEPAGE || CONFIG_HUGETLBFS */
 
-#ifdef CONFIG_DEBUG_PAGEALLOC
-extern unsigned int _debug_guardpage_minorder;
-DECLARE_STATIC_KEY_FALSE(_debug_guardpage_enabled);
-
-static inline unsigned int debug_guardpage_minorder(void)
-{
-	return _debug_guardpage_minorder;
-}
-
-static inline bool debug_guardpage_enabled(void)
-{
-	return static_branch_unlikely(&_debug_guardpage_enabled);
-}
-
-static inline bool page_is_guard(struct page *page)
-{
-	if (!debug_guardpage_enabled())
-		return false;
-
-	return PageGuard(page);
-}
-#else
-static inline unsigned int debug_guardpage_minorder(void) { return 0; }
-static inline bool debug_guardpage_enabled(void) { return false; }
-static inline bool page_is_guard(struct page *page) { return false; }
-#endif /* CONFIG_DEBUG_PAGEALLOC */
-
 #if MAX_NUMNODES > 1
 void __init setup_nr_node_ids(void);
 #else
diff --git a/mm/Makefile b/mm/Makefile
index 0eec4bc72d3f..678530a07326 100644
--- a/mm/Makefile
+++ b/mm/Makefile
@@ -124,6 +124,7 @@ obj-$(CONFIG_SECRETMEM) += secretmem.o
 obj-$(CONFIG_CMA_SYSFS) += cma_sysfs.o
 obj-$(CONFIG_USERFAULTFD) += userfaultfd.o
 obj-$(CONFIG_IDLE_PAGE_TRACKING) += page_idle.o
+obj-$(CONFIG_DEBUG_PAGEALLOC) += debug_page_alloc.o
 obj-$(CONFIG_DEBUG_PAGE_REF) += debug_page_ref.o
 obj-$(CONFIG_DAMON) += damon/
 obj-$(CONFIG_HARDENED_USERCOPY) += usercopy.o
diff --git a/mm/debug_page_alloc.c b/mm/debug_page_alloc.c
new file mode 100644
index 000000000000..f9d145730fd1
--- /dev/null
+++ b/mm/debug_page_alloc.c
@@ -0,0 +1,59 @@
+// SPDX-License-Identifier: GPL-2.0
+#include <linux/mm.h>
+#include <linux/page-isolation.h>
+
+unsigned int _debug_guardpage_minorder;
+
+bool _debug_pagealloc_enabled_early __read_mostly
+			= IS_ENABLED(CONFIG_DEBUG_PAGEALLOC_ENABLE_DEFAULT);
+EXPORT_SYMBOL(_debug_pagealloc_enabled_early);
+DEFINE_STATIC_KEY_FALSE(_debug_pagealloc_enabled);
+EXPORT_SYMBOL(_debug_pagealloc_enabled);
+
+DEFINE_STATIC_KEY_FALSE(_debug_guardpage_enabled);
+
+static int __init early_debug_pagealloc(char *buf)
+{
+	return kstrtobool(buf, &_debug_pagealloc_enabled_early);
+}
+early_param("debug_pagealloc", early_debug_pagealloc);
+
+static int __init debug_guardpage_minorder_setup(char *buf)
+{
+	unsigned long res;
+
+	if (kstrtoul(buf, 10, &res) < 0 ||  res > MAX_ORDER / 2) {
+		pr_err("Bad debug_guardpage_minorder value\n");
+		return 0;
+	}
+	_debug_guardpage_minorder = res;
+	pr_info("Setting debug_guardpage_minorder to %lu\n", res);
+	return 0;
+}
+early_param("debug_guardpage_minorder", debug_guardpage_minorder_setup);
+
+bool __set_page_guard(struct zone *zone, struct page *page, unsigned int order,
+		      int migratetype)
+{
+	if (order >= debug_guardpage_minorder())
+		return false;
+
+	__SetPageGuard(page);
+	INIT_LIST_HEAD(&page->buddy_list);
+	set_page_private(page, order);
+	/* Guard pages are not available for any usage */
+	if (!is_migrate_isolate(migratetype))
+		__mod_zone_freepage_state(zone, -(1 << order), migratetype);
+
+	return true;
+}
+
+void __clear_page_guard(struct zone *zone, struct page *page, unsigned int order,
+		      int migratetype)
+{
+	__ClearPageGuard(page);
+
+	set_page_private(page, 0);
+	if (!is_migrate_isolate(migratetype))
+		__mod_zone_freepage_state(zone, (1 << order), migratetype);
+}
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index fce47ccbcb3a..78d8a59f2afa 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -664,75 +664,6 @@ void destroy_large_folio(struct folio *folio)
 	compound_page_dtors[dtor](&folio->page);
 }
 
-#ifdef CONFIG_DEBUG_PAGEALLOC
-unsigned int _debug_guardpage_minorder;
-
-bool _debug_pagealloc_enabled_early __read_mostly
-			= IS_ENABLED(CONFIG_DEBUG_PAGEALLOC_ENABLE_DEFAULT);
-EXPORT_SYMBOL(_debug_pagealloc_enabled_early);
-DEFINE_STATIC_KEY_FALSE(_debug_pagealloc_enabled);
-EXPORT_SYMBOL(_debug_pagealloc_enabled);
-
-DEFINE_STATIC_KEY_FALSE(_debug_guardpage_enabled);
-
-static int __init early_debug_pagealloc(char *buf)
-{
-	return kstrtobool(buf, &_debug_pagealloc_enabled_early);
-}
-early_param("debug_pagealloc", early_debug_pagealloc);
-
-static int __init debug_guardpage_minorder_setup(char *buf)
-{
-	unsigned long res;
-
-	if (kstrtoul(buf, 10, &res) < 0 ||  res > MAX_ORDER / 2) {
-		pr_err("Bad debug_guardpage_minorder value\n");
-		return 0;
-	}
-	_debug_guardpage_minorder = res;
-	pr_info("Setting debug_guardpage_minorder to %lu\n", res);
-	return 0;
-}
-early_param("debug_guardpage_minorder", debug_guardpage_minorder_setup);
-
-static inline bool set_page_guard(struct zone *zone, struct page *page,
-				unsigned int order, int migratetype)
-{
-	if (!debug_guardpage_enabled())
-		return false;
-
-	if (order >= debug_guardpage_minorder())
-		return false;
-
-	__SetPageGuard(page);
-	INIT_LIST_HEAD(&page->buddy_list);
-	set_page_private(page, order);
-	/* Guard pages are not available for any usage */
-	if (!is_migrate_isolate(migratetype))
-		__mod_zone_freepage_state(zone, -(1 << order), migratetype);
-
-	return true;
-}
-
-static inline void clear_page_guard(struct zone *zone, struct page *page,
-				unsigned int order, int migratetype)
-{
-	if (!debug_guardpage_enabled())
-		return;
-
-	__ClearPageGuard(page);
-
-	set_page_private(page, 0);
-	if (!is_migrate_isolate(migratetype))
-		__mod_zone_freepage_state(zone, (1 << order), migratetype);
-}
-#else
-static inline bool set_page_guard(struct zone *zone, struct page *page,
-			unsigned int order, int migratetype) { return false; }
-static inline void clear_page_guard(struct zone *zone, struct page *page,
-				unsigned int order, int migratetype) {}
-#endif
-
 static inline void set_buddy_order(struct page *page, unsigned int order)
 {
 	set_page_private(page, order);
-- 
2.35.3


^ permalink raw reply related	[flat|nested] 22+ messages in thread

* [PATCH 09/12] mm: page_alloc: move mark_free_pages() into snapshot.c
  2023-05-08  7:11 [PATCH -next 00/12] mm: page_alloc: misc cleanup and refactor Kefeng Wang
                   ` (7 preceding siblings ...)
  2023-05-08  7:11 ` [PATCH 08/12] mm: page_alloc: split out DEBUG_PAGEALLOC Kefeng Wang
@ 2023-05-08  7:11 ` Kefeng Wang
  2023-05-08  7:11 ` [PATCH 10/12] mm: page_alloc: move pm_* functions into power Kefeng Wang
                   ` (2 subsequent siblings)
  11 siblings, 0 replies; 22+ messages in thread
From: Kefeng Wang @ 2023-05-08  7:11 UTC (permalink / raw)
  To: Andrew Morton, Mike Rapoport, linux-mm
  Cc: David Hildenbrand, Oscar Salvador, Rafael J. Wysocki,
	Pavel Machek, Len Brown, Luis Chamberlain, Kees Cook,
	Iurii Zaikin, linux-kernel, linux-pm, linux-fsdevel, Kefeng Wang

mark_free_pages() is only used in kernel/power/snapshot.c, so move
it there and shrink page_alloc.c a bit.
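
With the move it can also become static. The existing snapshot code
drives it zone by zone, conceptually (illustrative only, not part of
the diff):

	struct zone *zone;

	for_each_populated_zone(zone)
		mark_free_pages(zone);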

Signed-off-by: Kefeng Wang <wangkefeng.wang@huawei.com>
---
 include/linux/suspend.h |  3 ---
 kernel/power/snapshot.c | 52 ++++++++++++++++++++++++++++++++++++++
 mm/page_alloc.c         | 55 -----------------------------------------
 3 files changed, 52 insertions(+), 58 deletions(-)

diff --git a/include/linux/suspend.h b/include/linux/suspend.h
index d0d4598a7b3f..3950a7bf33ae 100644
--- a/include/linux/suspend.h
+++ b/include/linux/suspend.h
@@ -364,9 +364,6 @@ struct pbe {
 	struct pbe *next;
 };
 
-/* mm/page_alloc.c */
-extern void mark_free_pages(struct zone *zone);
-
 /**
  * struct platform_hibernation_ops - hibernation platform support
  *
diff --git a/kernel/power/snapshot.c b/kernel/power/snapshot.c
index cd8b7b35f1e8..45ef0bf81c85 100644
--- a/kernel/power/snapshot.c
+++ b/kernel/power/snapshot.c
@@ -1228,6 +1228,58 @@ unsigned int snapshot_additional_pages(struct zone *zone)
 	return 2 * rtree;
 }
 
+/*
+ * Touch the watchdog for every WD_PAGE_COUNT pages.
+ */
+#define WD_PAGE_COUNT	(128*1024)
+
+static void mark_free_pages(struct zone *zone)
+{
+	unsigned long pfn, max_zone_pfn, page_count = WD_PAGE_COUNT;
+	unsigned long flags;
+	unsigned int order, t;
+	struct page *page;
+
+	if (zone_is_empty(zone))
+		return;
+
+	spin_lock_irqsave(&zone->lock, flags);
+
+	max_zone_pfn = zone_end_pfn(zone);
+	for (pfn = zone->zone_start_pfn; pfn < max_zone_pfn; pfn++)
+		if (pfn_valid(pfn)) {
+			page = pfn_to_page(pfn);
+
+			if (!--page_count) {
+				touch_nmi_watchdog();
+				page_count = WD_PAGE_COUNT;
+			}
+
+			if (page_zone(page) != zone)
+				continue;
+
+			if (!swsusp_page_is_forbidden(page))
+				swsusp_unset_page_free(page);
+		}
+
+	for_each_migratetype_order(order, t) {
+		list_for_each_entry(page,
+				&zone->free_area[order].free_list[t], buddy_list) {
+			unsigned long i;
+
+			pfn = page_to_pfn(page);
+			for (i = 0; i < (1UL << order); i++) {
+				if (!--page_count) {
+					touch_nmi_watchdog();
+					page_count = WD_PAGE_COUNT;
+				}
+				swsusp_set_page_free(pfn_to_page(pfn + i));
+			}
+		}
+	}
+	spin_unlock_irqrestore(&zone->lock, flags);
+}
+
 #ifdef CONFIG_HIGHMEM
 /**
  * count_free_highmem_pages - Compute the total number of free highmem pages.
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 78d8a59f2afa..9284edf0259b 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -2313,61 +2313,6 @@ void drain_all_pages(struct zone *zone)
 	__drain_all_pages(zone, false);
 }
 
-#ifdef CONFIG_HIBERNATION
-
-/*
- * Touch the watchdog for every WD_PAGE_COUNT pages.
- */
-#define WD_PAGE_COUNT	(128*1024)
-
-void mark_free_pages(struct zone *zone)
-{
-	unsigned long pfn, max_zone_pfn, page_count = WD_PAGE_COUNT;
-	unsigned long flags;
-	unsigned int order, t;
-	struct page *page;
-
-	if (zone_is_empty(zone))
-		return;
-
-	spin_lock_irqsave(&zone->lock, flags);
-
-	max_zone_pfn = zone_end_pfn(zone);
-	for (pfn = zone->zone_start_pfn; pfn < max_zone_pfn; pfn++)
-		if (pfn_valid(pfn)) {
-			page = pfn_to_page(pfn);
-
-			if (!--page_count) {
-				touch_nmi_watchdog();
-				page_count = WD_PAGE_COUNT;
-			}
-
-			if (page_zone(page) != zone)
-				continue;
-
-			if (!swsusp_page_is_forbidden(page))
-				swsusp_unset_page_free(page);
-		}
-
-	for_each_migratetype_order(order, t) {
-		list_for_each_entry(page,
-				&zone->free_area[order].free_list[t], buddy_list) {
-			unsigned long i;
-
-			pfn = page_to_pfn(page);
-			for (i = 0; i < (1UL << order); i++) {
-				if (!--page_count) {
-					touch_nmi_watchdog();
-					page_count = WD_PAGE_COUNT;
-				}
-				swsusp_set_page_free(pfn_to_page(pfn + i));
-			}
-		}
-	}
-	spin_unlock_irqrestore(&zone->lock, flags);
-}
-#endif /* CONFIG_PM */
-
 static bool free_unref_page_prepare(struct page *page, unsigned long pfn,
 							unsigned int order)
 {
-- 
2.35.3


^ permalink raw reply related	[flat|nested] 22+ messages in thread

* [PATCH 10/12] mm: page_alloc: move pm_* functions into power
  2023-05-08  7:11 [PATCH -next 00/12] mm: page_alloc: misc cleanup and refactor Kefeng Wang
                   ` (8 preceding siblings ...)
  2023-05-08  7:11 ` [PATCH 09/12] mm: page_alloc: move mark_free_pages() into snapshot.c Kefeng Wang
@ 2023-05-08  7:11 ` Kefeng Wang
  2023-05-08  7:11 ` [PATCH 11/12] mm: vmscan: use gfp_has_io_fs() Kefeng Wang
  2023-05-08  7:12 ` [PATCH 12/12] mm: page_alloc: move sysctls into its own file Kefeng Wang
  11 siblings, 0 replies; 22+ messages in thread
From: Kefeng Wang @ 2023-05-08  7:11 UTC (permalink / raw)
  To: Andrew Morton, Mike Rapoport, linux-mm
  Cc: David Hildenbrand, Oscar Salvador, Rafael J. Wysocki,
	Pavel Machek, Len Brown, Luis Chamberlain, Kees Cook,
	Iurii Zaikin, linux-kernel, linux-pm, linux-fsdevel, Kefeng Wang

pm_restrict_gfp_mask()/pm_restore_gfp_mask() are only used by the
power code, so move them out of page_alloc.c.

Add a general gfp_has_io_fs() helper which returns true if a gfp mask
has both __GFP_IO and __GFP_FS set, use it inside
pm_suspended_storage(), and move pm_suspended_storage() itself into
suspend.h.
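
For context, the suspend/hibernate paths bracket device suspend with
these helpers while holding system_transition_mutex, roughly
(illustrative sketch of a caller, not part of the diff):

	/* under system_transition_mutex */
	pm_restrict_gfp_mask();	/* gfp_allowed_mask &= ~(__GFP_IO | __GFP_FS) */
	/* ... suspend devices and create/write the image ... */
	pm_restore_gfp_mask();	/* put the saved mask back */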

Signed-off-by: Kefeng Wang <wangkefeng.wang@huawei.com>
---
 include/linux/gfp.h     | 15 ++++-----------
 include/linux/suspend.h |  6 ++++++
 kernel/power/main.c     | 27 +++++++++++++++++++++++++++
 kernel/power/power.h    |  5 +++++
 mm/page_alloc.c         | 38 --------------------------------------
 mm/swapfile.c           |  1 +
 6 files changed, 43 insertions(+), 49 deletions(-)

diff --git a/include/linux/gfp.h b/include/linux/gfp.h
index ed8cb537c6a7..665f06675c83 100644
--- a/include/linux/gfp.h
+++ b/include/linux/gfp.h
@@ -338,19 +338,12 @@ extern gfp_t gfp_allowed_mask;
 /* Returns true if the gfp_mask allows use of ALLOC_NO_WATERMARK */
 bool gfp_pfmemalloc_allowed(gfp_t gfp_mask);
 
-extern void pm_restrict_gfp_mask(void);
-extern void pm_restore_gfp_mask(void);
-
-extern gfp_t vma_thp_gfp_mask(struct vm_area_struct *vma);
-
-#ifdef CONFIG_PM_SLEEP
-extern bool pm_suspended_storage(void);
-#else
-static inline bool pm_suspended_storage(void)
+static inline bool gfp_has_io_fs(gfp_t gfp)
 {
-	return false;
+	return (gfp & (__GFP_IO | __GFP_FS)) == (__GFP_IO | __GFP_FS);
 }
-#endif /* CONFIG_PM_SLEEP */
+
+extern gfp_t vma_thp_gfp_mask(struct vm_area_struct *vma);
 
 #ifdef CONFIG_CONTIG_ALLOC
 /* The below functions must be run on a range from a single zone. */
diff --git a/include/linux/suspend.h b/include/linux/suspend.h
index 3950a7bf33ae..76923051c03d 100644
--- a/include/linux/suspend.h
+++ b/include/linux/suspend.h
@@ -502,6 +502,11 @@ extern void pm_report_max_hw_sleep(u64 t);
 extern bool events_check_enabled;
 extern suspend_state_t pm_suspend_target_state;
 
+static inline bool pm_suspended_storage(void)
+{
+	return !gfp_has_io_fs(gfp_allowed_mask);
+}
+
 extern bool pm_wakeup_pending(void);
 extern void pm_system_wakeup(void);
 extern void pm_system_cancel_wakeup(void);
@@ -535,6 +540,7 @@ static inline void ksys_sync_helper(void) {}
 
 #define pm_notifier(fn, pri)	do { (void)(fn); } while (0)
 
+static inline bool pm_suspended_storage(void) { return false; }
 static inline bool pm_wakeup_pending(void) { return false; }
 static inline void pm_system_wakeup(void) {}
 static inline void pm_wakeup_clear(bool reset) {}
diff --git a/kernel/power/main.c b/kernel/power/main.c
index 3113ec2f1db4..34fc8359145b 100644
--- a/kernel/power/main.c
+++ b/kernel/power/main.c
@@ -21,6 +21,33 @@
 #include "power.h"
 
 #ifdef CONFIG_PM_SLEEP
+/*
+ * The following functions are used by the suspend/hibernate code to temporarily
+ * change gfp_allowed_mask in order to avoid using I/O during memory allocations
+ * while devices are suspended.  To avoid races with the suspend/hibernate code,
+ * they should always be called with system_transition_mutex held
+ * (gfp_allowed_mask also should only be modified with system_transition_mutex
+ * held, unless the suspend/hibernate code is guaranteed not to run in parallel
+ * with that modification).
+ */
+static gfp_t saved_gfp_mask;
+
+void pm_restore_gfp_mask(void)
+{
+	WARN_ON(!mutex_is_locked(&system_transition_mutex));
+	if (saved_gfp_mask) {
+		gfp_allowed_mask = saved_gfp_mask;
+		saved_gfp_mask = 0;
+	}
+}
+
+void pm_restrict_gfp_mask(void)
+{
+	WARN_ON(!mutex_is_locked(&system_transition_mutex));
+	WARN_ON(saved_gfp_mask);
+	saved_gfp_mask = gfp_allowed_mask;
+	gfp_allowed_mask &= ~(__GFP_IO | __GFP_FS);
+}
 
 unsigned int lock_system_sleep(void)
 {
diff --git a/kernel/power/power.h b/kernel/power/power.h
index b83c8d5e188d..ac14d1b463d1 100644
--- a/kernel/power/power.h
+++ b/kernel/power/power.h
@@ -216,6 +216,11 @@ static inline void suspend_test_finish(const char *label) {}
 /* kernel/power/main.c */
 extern int pm_notifier_call_chain_robust(unsigned long val_up, unsigned long val_down);
 extern int pm_notifier_call_chain(unsigned long val);
+void pm_restrict_gfp_mask(void);
+void pm_restore_gfp_mask(void);
+#else
+static inline void pm_restrict_gfp_mask(void) {}
+static inline void pm_restore_gfp_mask(void) {}
 #endif
 
 #ifdef CONFIG_HIGHMEM
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 9284edf0259b..aa4e4af9fc88 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -227,44 +227,6 @@ static inline void set_pcppage_migratetype(struct page *page, int migratetype)
 	page->index = migratetype;
 }
 
-#ifdef CONFIG_PM_SLEEP
-/*
- * The following functions are used by the suspend/hibernate code to temporarily
- * change gfp_allowed_mask in order to avoid using I/O during memory allocations
- * while devices are suspended.  To avoid races with the suspend/hibernate code,
- * they should always be called with system_transition_mutex held
- * (gfp_allowed_mask also should only be modified with system_transition_mutex
- * held, unless the suspend/hibernate code is guaranteed not to run in parallel
- * with that modification).
- */
-
-static gfp_t saved_gfp_mask;
-
-void pm_restore_gfp_mask(void)
-{
-	WARN_ON(!mutex_is_locked(&system_transition_mutex));
-	if (saved_gfp_mask) {
-		gfp_allowed_mask = saved_gfp_mask;
-		saved_gfp_mask = 0;
-	}
-}
-
-void pm_restrict_gfp_mask(void)
-{
-	WARN_ON(!mutex_is_locked(&system_transition_mutex));
-	WARN_ON(saved_gfp_mask);
-	saved_gfp_mask = gfp_allowed_mask;
-	gfp_allowed_mask &= ~(__GFP_IO | __GFP_FS);
-}
-
-bool pm_suspended_storage(void)
-{
-	if ((gfp_allowed_mask & (__GFP_IO | __GFP_FS)) == (__GFP_IO | __GFP_FS))
-		return false;
-	return true;
-}
-#endif /* CONFIG_PM_SLEEP */
-
 #ifdef CONFIG_HUGETLB_PAGE_SIZE_VARIABLE
 unsigned int pageblock_order __read_mostly;
 #endif
diff --git a/mm/swapfile.c b/mm/swapfile.c
index 274bbf797480..c74259001d5e 100644
--- a/mm/swapfile.c
+++ b/mm/swapfile.c
@@ -41,6 +41,7 @@
 #include <linux/swap_slots.h>
 #include <linux/sort.h>
 #include <linux/completion.h>
+#include <linux/suspend.h>
 
 #include <asm/tlbflush.h>
 #include <linux/swapops.h>
-- 
2.35.3


^ permalink raw reply related	[flat|nested] 22+ messages in thread

* [PATCH 11/12] mm: vmscan: use gfp_has_io_fs()
  2023-05-08  7:11 [PATCH -next 00/12] mm: page_alloc: misc cleanup and refactor Kefeng Wang
                   ` (9 preceding siblings ...)
  2023-05-08  7:11 ` [PATCH 10/12] mm: page_alloc: move pm_* functions into power Kefeng Wang
@ 2023-05-08  7:11 ` Kefeng Wang
  2023-05-08  7:12 ` [PATCH 12/12] mm: page_alloc: move sysctls into its own file Kefeng Wang
  11 siblings, 0 replies; 22+ messages in thread
From: Kefeng Wang @ 2023-05-08  7:11 UTC (permalink / raw)
  To: Andrew Morton, Mike Rapoport, linux-mm
  Cc: David Hildenbrand, Oscar Salvador, Rafael J. Wysocki,
	Pavel Machek, Len Brown, Luis Chamberlain, Kees Cook,
	Iurii Zaikin, linux-kernel, linux-pm, linux-fsdevel, Kefeng Wang

Use gfp_has_io_fs() instead of open-coding the flag check.
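
That is, the open-coded test

	(mask & (__GFP_IO | __GFP_FS)) == (__GFP_IO | __GFP_FS)

becomes gfp_has_io_fs(mask); e.g. GFP_KERNEL satisfies it, while
GFP_NOFS and GFP_NOIO do not.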

Signed-off-by: Kefeng Wang <wangkefeng.wang@huawei.com>
---
 mm/vmscan.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/mm/vmscan.c b/mm/vmscan.c
index 6d0cd2840cf0..15efbfbb1963 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -2458,7 +2458,7 @@ static int too_many_isolated(struct pglist_data *pgdat, int file,
 	 * won't get blocked by normal direct-reclaimers, forming a circular
 	 * deadlock.
 	 */
-	if ((sc->gfp_mask & (__GFP_IO | __GFP_FS)) == (__GFP_IO | __GFP_FS))
+	if (gfp_has_io_fs(sc->gfp_mask))
 		inactive >>= 3;
 
 	too_many = isolated > inactive;
-- 
2.35.3


^ permalink raw reply related	[flat|nested] 22+ messages in thread

* [PATCH 12/12] mm: page_alloc: move sysctls into its own file
  2023-05-08  7:11 [PATCH -next 00/12] mm: page_alloc: misc cleanup and refactor Kefeng Wang
                   ` (10 preceding siblings ...)
  2023-05-08  7:11 ` [PATCH 11/12] mm: vmscan: use gfp_has_io_fs() Kefeng Wang
@ 2023-05-08  7:12 ` Kefeng Wang
  11 siblings, 0 replies; 22+ messages in thread
From: Kefeng Wang @ 2023-05-08  7:12 UTC (permalink / raw)
  To: Andrew Morton, Mike Rapoport, linux-mm
  Cc: David Hildenbrand, Oscar Salvador, Rafael J. Wysocki,
	Pavel Machek, Len Brown, Luis Chamberlain, Kees Cook,
	Iurii Zaikin, linux-kernel, linux-pm, linux-fsdevel, Kefeng Wang

Move all page allocator related sysctls into their own file as part
of the kernel/sysctl.c spring cleaning, and move some function
declarations from mm.h into internal.h.
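
The conversion follows the usual register_sysctl_init() pattern
(abridged here; the full table is in the diff below):

	static struct ctl_table page_alloc_sysctl_table[] = {
		{
			.procname	= "min_free_kbytes",
			.data		= &min_free_kbytes,
			.maxlen		= sizeof(min_free_kbytes),
			.mode		= 0644,
			.proc_handler	= min_free_kbytes_sysctl_handler,
			.extra1		= SYSCTL_ZERO,
		},
		/* ... remaining vm.* entries ... */
		{}	/* sentinel */
	};

	void __init page_alloc_sysctl_init(void)
	{
		register_sysctl_init("vm", page_alloc_sysctl_table);
	}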

Signed-off-by: Kefeng Wang <wangkefeng.wang@huawei.com>
---
 include/linux/mm.h     |  11 -----
 include/linux/mmzone.h |  21 ---------
 kernel/sysctl.c        |  67 ---------------------------
 mm/internal.h          |   9 ++++
 mm/mm_init.c           |   2 +
 mm/page_alloc.c        | 103 +++++++++++++++++++++++++++++++++++------
 6 files changed, 100 insertions(+), 113 deletions(-)

diff --git a/include/linux/mm.h b/include/linux/mm.h
index fc8732a119cf..d533ef955dd0 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -3045,12 +3045,6 @@ extern int __meminit early_pfn_to_nid(unsigned long pfn);
 #endif
 
 extern void set_dma_reserve(unsigned long new_dma_reserve);
-extern void memmap_init_range(unsigned long, int, unsigned long,
-		unsigned long, unsigned long, enum meminit_context,
-		struct vmem_altmap *, int migratetype);
-extern void setup_per_zone_wmarks(void);
-extern void calculate_min_free_kbytes(void);
-extern int __meminit init_per_zone_wmark_min(void);
 extern void mem_init(void);
 extern void __init mmap_init(void);
 
@@ -3071,11 +3065,6 @@ void warn_alloc(gfp_t gfp_mask, nodemask_t *nodemask, const char *fmt, ...);
 
 extern void setup_per_cpu_pageset(void);
 
-/* page_alloc.c */
-extern int min_free_kbytes;
-extern int watermark_boost_factor;
-extern int watermark_scale_factor;
-
 /* nommu.c */
 extern atomic_long_t mmap_pages_allocated;
 extern int nommu_shrink_inode_mappings(struct inode *, size_t, size_t);
diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h
index a4889c9d4055..3a68326c9989 100644
--- a/include/linux/mmzone.h
+++ b/include/linux/mmzone.h
@@ -1512,27 +1512,6 @@ static inline bool has_managed_dma(void)
 }
 #endif
 
-/* These two functions are used to setup the per zone pages min values */
-struct ctl_table;
-
-int min_free_kbytes_sysctl_handler(struct ctl_table *, int, void *, size_t *,
-		loff_t *);
-int watermark_scale_factor_sysctl_handler(struct ctl_table *, int, void *,
-		size_t *, loff_t *);
-extern int sysctl_lowmem_reserve_ratio[MAX_NR_ZONES];
-int lowmem_reserve_ratio_sysctl_handler(struct ctl_table *, int, void *,
-		size_t *, loff_t *);
-int percpu_pagelist_high_fraction_sysctl_handler(struct ctl_table *, int,
-		void *, size_t *, loff_t *);
-int sysctl_min_unmapped_ratio_sysctl_handler(struct ctl_table *, int,
-		void *, size_t *, loff_t *);
-int sysctl_min_slab_ratio_sysctl_handler(struct ctl_table *, int,
-		void *, size_t *, loff_t *);
-int numa_zonelist_order_handler(struct ctl_table *, int,
-		void *, size_t *, loff_t *);
-extern int percpu_pagelist_high_fraction;
-extern char numa_zonelist_order[];
-#define NUMA_ZONELIST_ORDER_LEN	16
 
 #ifndef CONFIG_NUMA
 
diff --git a/kernel/sysctl.c b/kernel/sysctl.c
index bfe53e835524..a57de67f032f 100644
--- a/kernel/sysctl.c
+++ b/kernel/sysctl.c
@@ -2119,13 +2119,6 @@ static struct ctl_table vm_table[] = {
 		.extra2		= SYSCTL_ONE,
 	},
 #endif
-	{
-		.procname	= "lowmem_reserve_ratio",
-		.data		= &sysctl_lowmem_reserve_ratio,
-		.maxlen		= sizeof(sysctl_lowmem_reserve_ratio),
-		.mode		= 0644,
-		.proc_handler	= lowmem_reserve_ratio_sysctl_handler,
-	},
 	{
 		.procname	= "drop_caches",
 		.data		= &sysctl_drop_caches,
@@ -2135,39 +2128,6 @@ static struct ctl_table vm_table[] = {
 		.extra1		= SYSCTL_ONE,
 		.extra2		= SYSCTL_FOUR,
 	},
-	{
-		.procname	= "min_free_kbytes",
-		.data		= &min_free_kbytes,
-		.maxlen		= sizeof(min_free_kbytes),
-		.mode		= 0644,
-		.proc_handler	= min_free_kbytes_sysctl_handler,
-		.extra1		= SYSCTL_ZERO,
-	},
-	{
-		.procname	= "watermark_boost_factor",
-		.data		= &watermark_boost_factor,
-		.maxlen		= sizeof(watermark_boost_factor),
-		.mode		= 0644,
-		.proc_handler	= proc_dointvec_minmax,
-		.extra1		= SYSCTL_ZERO,
-	},
-	{
-		.procname	= "watermark_scale_factor",
-		.data		= &watermark_scale_factor,
-		.maxlen		= sizeof(watermark_scale_factor),
-		.mode		= 0644,
-		.proc_handler	= watermark_scale_factor_sysctl_handler,
-		.extra1		= SYSCTL_ONE,
-		.extra2		= SYSCTL_THREE_THOUSAND,
-	},
-	{
-		.procname	= "percpu_pagelist_high_fraction",
-		.data		= &percpu_pagelist_high_fraction,
-		.maxlen		= sizeof(percpu_pagelist_high_fraction),
-		.mode		= 0644,
-		.proc_handler	= percpu_pagelist_high_fraction_sysctl_handler,
-		.extra1		= SYSCTL_ZERO,
-	},
 	{
 		.procname	= "page_lock_unfairness",
 		.data		= &sysctl_page_lock_unfairness,
@@ -2223,24 +2183,6 @@ static struct ctl_table vm_table[] = {
 		.proc_handler	= proc_dointvec_minmax,
 		.extra1		= SYSCTL_ZERO,
 	},
-	{
-		.procname	= "min_unmapped_ratio",
-		.data		= &sysctl_min_unmapped_ratio,
-		.maxlen		= sizeof(sysctl_min_unmapped_ratio),
-		.mode		= 0644,
-		.proc_handler	= sysctl_min_unmapped_ratio_sysctl_handler,
-		.extra1		= SYSCTL_ZERO,
-		.extra2		= SYSCTL_ONE_HUNDRED,
-	},
-	{
-		.procname	= "min_slab_ratio",
-		.data		= &sysctl_min_slab_ratio,
-		.maxlen		= sizeof(sysctl_min_slab_ratio),
-		.mode		= 0644,
-		.proc_handler	= sysctl_min_slab_ratio_sysctl_handler,
-		.extra1		= SYSCTL_ZERO,
-		.extra2		= SYSCTL_ONE_HUNDRED,
-	},
 #endif
 #ifdef CONFIG_SMP
 	{
@@ -2267,15 +2209,6 @@ static struct ctl_table vm_table[] = {
 		.proc_handler	= mmap_min_addr_handler,
 	},
 #endif
-#ifdef CONFIG_NUMA
-	{
-		.procname	= "numa_zonelist_order",
-		.data		= &numa_zonelist_order,
-		.maxlen		= NUMA_ZONELIST_ORDER_LEN,
-		.mode		= 0644,
-		.proc_handler	= numa_zonelist_order_handler,
-	},
-#endif
 #if (defined(CONFIG_X86_32) && !defined(CONFIG_UML))|| \
    (defined(CONFIG_SUPERH) && defined(CONFIG_VSYSCALL))
 	{
diff --git a/mm/internal.h b/mm/internal.h
index 9482862b28cc..8d8b2faebc89 100644
--- a/mm/internal.h
+++ b/mm/internal.h
@@ -213,6 +213,15 @@ static inline bool is_check_pages_enabled(void)
 	return static_branch_unlikely(&check_pages_enabled);
 }
 
+extern int min_free_kbytes;
+
+void page_alloc_sysctl_init(void);
+void setup_per_zone_wmarks(void);
+void calculate_min_free_kbytes(void);
+int __meminit init_per_zone_wmark_min(void);
+void memmap_init_range(unsigned long, int, unsigned long, unsigned long,
+		unsigned long, enum meminit_context, struct vmem_altmap *, int);
+
 /*
  * Structure for holding the mostly immutable allocation parameters passed
  * between functions involved in allocations, including the alloc_pages*
diff --git a/mm/mm_init.c b/mm/mm_init.c
index 1f30b9e16577..afa56cd50ca4 100644
--- a/mm/mm_init.c
+++ b/mm/mm_init.c
@@ -2444,6 +2444,8 @@ void __init page_alloc_init_late(void)
 	/* Initialize page ext after all struct pages are initialized. */
 	if (deferred_struct_pages)
 		page_ext_init();
+
+	page_alloc_sysctl_init();
 }
 
 #ifndef __HAVE_ARCH_RESERVED_KERNEL_PAGES
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index aa4e4af9fc88..880f08575d59 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -206,7 +206,6 @@ nodemask_t node_states[NR_NODE_STATES] __read_mostly = {
 };
 EXPORT_SYMBOL(node_states);
 
-int percpu_pagelist_high_fraction;
 gfp_t gfp_allowed_mask __read_mostly = GFP_BOOT_MASK;
 
 /*
@@ -302,8 +301,8 @@ compound_page_dtor * const compound_page_dtors[NR_COMPOUND_DTORS] = {
 
 int min_free_kbytes = 1024;
 int user_min_free_kbytes = -1;
-int watermark_boost_factor __read_mostly = 15000;
-int watermark_scale_factor = 10;
+static int watermark_boost_factor __read_mostly = 15000;
+static int watermark_scale_factor = 10;
 
 /* movable_zone is the "real" zone pages in ZONE_MOVABLE are taken from */
 int movable_zone;
@@ -4828,12 +4827,12 @@ static int __parse_numa_zonelist_order(char *s)
 	return 0;
 }
 
-char numa_zonelist_order[] = "Node";
-
+static char numa_zonelist_order[] = "Node";
+#define NUMA_ZONELIST_ORDER_LEN	16
 /*
  * sysctl handler for numa_zonelist_order
  */
-int numa_zonelist_order_handler(struct ctl_table *table, int write,
+static int numa_zonelist_order_handler(struct ctl_table *table, int write,
 		void *buffer, size_t *length, loff_t *ppos)
 {
 	if (write)
@@ -4841,7 +4840,6 @@ int numa_zonelist_order_handler(struct ctl_table *table, int write,
 	return proc_dostring(table, write, buffer, length, ppos);
 }
 
-
 static int node_load[MAX_NUMNODES];
 
 /**
@@ -5244,6 +5242,7 @@ static int zone_batchsize(struct zone *zone)
 #endif
 }
 
+static int percpu_pagelist_high_fraction;
 static int zone_highsize(struct zone *zone, int batch, int cpu_online)
 {
 #ifdef CONFIG_MMU
@@ -5773,7 +5772,7 @@ postcore_initcall(init_per_zone_wmark_min)
  *	that we can call two helper functions whenever min_free_kbytes
  *	changes.
  */
-int min_free_kbytes_sysctl_handler(struct ctl_table *table, int write,
+static int min_free_kbytes_sysctl_handler(struct ctl_table *table, int write,
 		void *buffer, size_t *length, loff_t *ppos)
 {
 	int rc;
@@ -5789,7 +5788,7 @@ int min_free_kbytes_sysctl_handler(struct ctl_table *table, int write,
 	return 0;
 }
 
-int watermark_scale_factor_sysctl_handler(struct ctl_table *table, int write,
+static int watermark_scale_factor_sysctl_handler(struct ctl_table *table, int write,
 		void *buffer, size_t *length, loff_t *ppos)
 {
 	int rc;
@@ -5819,7 +5818,7 @@ static void setup_min_unmapped_ratio(void)
 }
 
 
-int sysctl_min_unmapped_ratio_sysctl_handler(struct ctl_table *table, int write,
+static int sysctl_min_unmapped_ratio_sysctl_handler(struct ctl_table *table, int write,
 		void *buffer, size_t *length, loff_t *ppos)
 {
 	int rc;
@@ -5846,7 +5845,7 @@ static void setup_min_slab_ratio(void)
 						     sysctl_min_slab_ratio) / 100;
 }
 
-int sysctl_min_slab_ratio_sysctl_handler(struct ctl_table *table, int write,
+static int sysctl_min_slab_ratio_sysctl_handler(struct ctl_table *table, int write,
 		void *buffer, size_t *length, loff_t *ppos)
 {
 	int rc;
@@ -5870,8 +5869,8 @@ int sysctl_min_slab_ratio_sysctl_handler(struct ctl_table *table, int write,
  * minimum watermarks. The lowmem reserve ratio can only make sense
  * if in function of the boot time zone sizes.
  */
-int lowmem_reserve_ratio_sysctl_handler(struct ctl_table *table, int write,
-		void *buffer, size_t *length, loff_t *ppos)
+static int lowmem_reserve_ratio_sysctl_handler(struct ctl_table *table,
+		int write, void *buffer, size_t *length, loff_t *ppos)
 {
 	int i;
 
@@ -5891,7 +5890,7 @@ int lowmem_reserve_ratio_sysctl_handler(struct ctl_table *table, int write,
  * cpu. It is the fraction of total pages in each zone that a hot per cpu
  * pagelist can have before it gets flushed back to buddy allocator.
  */
-int percpu_pagelist_high_fraction_sysctl_handler(struct ctl_table *table,
+static int percpu_pagelist_high_fraction_sysctl_handler(struct ctl_table *table,
 		int write, void *buffer, size_t *length, loff_t *ppos)
 {
 	struct zone *zone;
@@ -5924,6 +5923,82 @@ int percpu_pagelist_high_fraction_sysctl_handler(struct ctl_table *table,
 	return ret;
 }
 
+static struct ctl_table page_alloc_sysctl_table[] = {
+	{
+		.procname	= "min_free_kbytes",
+		.data		= &min_free_kbytes,
+		.maxlen		= sizeof(min_free_kbytes),
+		.mode		= 0644,
+		.proc_handler	= min_free_kbytes_sysctl_handler,
+		.extra1		= SYSCTL_ZERO,
+	},
+	{
+		.procname	= "watermark_boost_factor",
+		.data		= &watermark_boost_factor,
+		.maxlen		= sizeof(watermark_boost_factor),
+		.mode		= 0644,
+		.proc_handler	= proc_dointvec_minmax,
+		.extra1		= SYSCTL_ZERO,
+	},
+	{
+		.procname	= "watermark_scale_factor",
+		.data		= &watermark_scale_factor,
+		.maxlen		= sizeof(watermark_scale_factor),
+		.mode		= 0644,
+		.proc_handler	= watermark_scale_factor_sysctl_handler,
+		.extra1		= SYSCTL_ONE,
+		.extra2		= SYSCTL_THREE_THOUSAND,
+	},
+	{
+		.procname	= "percpu_pagelist_high_fraction",
+		.data		= &percpu_pagelist_high_fraction,
+		.maxlen		= sizeof(percpu_pagelist_high_fraction),
+		.mode		= 0644,
+		.proc_handler	= percpu_pagelist_high_fraction_sysctl_handler,
+		.extra1		= SYSCTL_ZERO,
+	},
+	{
+		.procname	= "lowmem_reserve_ratio",
+		.data		= &sysctl_lowmem_reserve_ratio,
+		.maxlen		= sizeof(sysctl_lowmem_reserve_ratio),
+		.mode		= 0644,
+		.proc_handler	= lowmem_reserve_ratio_sysctl_handler,
+	},
+#ifdef CONFIG_NUMA
+	{
+		.procname	= "numa_zonelist_order",
+		.data		= &numa_zonelist_order,
+		.maxlen		= NUMA_ZONELIST_ORDER_LEN,
+		.mode		= 0644,
+		.proc_handler	= numa_zonelist_order_handler,
+	},
+	{
+		.procname	= "min_unmapped_ratio",
+		.data		= &sysctl_min_unmapped_ratio,
+		.maxlen		= sizeof(sysctl_min_unmapped_ratio),
+		.mode		= 0644,
+		.proc_handler	= sysctl_min_unmapped_ratio_sysctl_handler,
+		.extra1		= SYSCTL_ZERO,
+		.extra2		= SYSCTL_ONE_HUNDRED,
+	},
+	{
+		.procname	= "min_slab_ratio",
+		.data		= &sysctl_min_slab_ratio,
+		.maxlen		= sizeof(sysctl_min_slab_ratio),
+		.mode		= 0644,
+		.proc_handler	= sysctl_min_slab_ratio_sysctl_handler,
+		.extra1		= SYSCTL_ZERO,
+		.extra2		= SYSCTL_ONE_HUNDRED,
+	},
+#endif
+	{}
+};
+
+void __init page_alloc_sysctl_init(void)
+{
+	register_sysctl_init("vm", page_alloc_sysctl_table);
+}
+
 #ifdef CONFIG_CONTIG_ALLOC
 /* Usage: See admin-guide/dynamic-debug-howto.rst */
 static void alloc_contig_dump_pages(struct list_head *page_list)
-- 
2.35.3


^ permalink raw reply related	[flat|nested] 22+ messages in thread

* Re: [PATCH 03/12] mm: page_alloc: move set_zone_contiguous() into mm_init.c
  2023-05-08  7:11 ` [PATCH 03/12] mm: page_alloc: move set_zone_contiguous() " Kefeng Wang
@ 2023-05-08  7:12   ` Huang, Ying
  2023-05-08  7:27     ` Kefeng Wang
  2023-05-10  8:01   ` [PATCH v2 " Kefeng Wang
  1 sibling, 1 reply; 22+ messages in thread
From: Huang, Ying @ 2023-05-08  7:12 UTC (permalink / raw)
  To: Kefeng Wang
  Cc: Andrew Morton, Mike Rapoport, linux-mm, David Hildenbrand,
	Oscar Salvador, Rafael J. Wysocki, Pavel Machek, Len Brown,
	Luis Chamberlain, Kees Cook, Iurii Zaikin, linux-kernel,
	linux-pm, linux-fsdevel

Kefeng Wang <wangkefeng.wang@huawei.com> writes:

> set_zone_contiguous() is only used in mm init/hotplug, and
> clear_zone_contiguous() only used in hotplug, move them from
> page_alloc.c to the more appropriate file.
>
> Signed-off-by: Kefeng Wang <wangkefeng.wang@huawei.com>
> ---
>  include/linux/memory_hotplug.h |  3 --
>  mm/internal.h                  |  7 +++
>  mm/mm_init.c                   | 74 +++++++++++++++++++++++++++++++
>  mm/page_alloc.c                | 79 ----------------------------------
>  4 files changed, 81 insertions(+), 82 deletions(-)
>
> diff --git a/include/linux/memory_hotplug.h b/include/linux/memory_hotplug.h
> index 9fcbf5706595..04bc286eed42 100644
> --- a/include/linux/memory_hotplug.h
> +++ b/include/linux/memory_hotplug.h
> @@ -326,9 +326,6 @@ static inline int remove_memory(u64 start, u64 size)
>  static inline void __remove_memory(u64 start, u64 size) {}
>  #endif /* CONFIG_MEMORY_HOTREMOVE */
>  
> -extern void set_zone_contiguous(struct zone *zone);
> -extern void clear_zone_contiguous(struct zone *zone);
> -
>  #ifdef CONFIG_MEMORY_HOTPLUG
>  extern void __ref free_area_init_core_hotplug(struct pglist_data *pgdat);
>  extern int __add_memory(int nid, u64 start, u64 size, mhp_t mhp_flags);
> diff --git a/mm/internal.h b/mm/internal.h
> index e28442c0858a..9482862b28cc 100644
> --- a/mm/internal.h
> +++ b/mm/internal.h
> @@ -371,6 +371,13 @@ static inline struct page *pageblock_pfn_to_page(unsigned long start_pfn,
>  	return __pageblock_pfn_to_page(start_pfn, end_pfn, zone);
>  }
>  
> +void set_zone_contiguous(struct zone *zone);
> +
> +static inline void clear_zone_contiguous(struct zone *zone)
> +{
> +	zone->contiguous = false;
> +}
> +
>  extern int __isolate_free_page(struct page *page, unsigned int order);
>  extern void __putback_isolated_page(struct page *page, unsigned int order,
>  				    int mt);
> diff --git a/mm/mm_init.c b/mm/mm_init.c
> index 15201887f8e0..1f30b9e16577 100644
> --- a/mm/mm_init.c
> +++ b/mm/mm_init.c
> @@ -2330,6 +2330,80 @@ void __init init_cma_reserved_pageblock(struct page *page)
>  }
>  #endif
>  
> +/*
> + * Check that the whole (or subset of) a pageblock given by the interval of
> + * [start_pfn, end_pfn) is valid and within the same zone, before scanning it
> + * with the migration of free compaction scanner.
> + *
> + * Return struct page pointer of start_pfn, or NULL if checks were not passed.
> + *
> + * It's possible on some configurations to have a setup like node0 node1 node0
> + * i.e. it's possible that all pages within a zones range of pages do not
> + * belong to a single zone. We assume that a border between node0 and node1
> + * can occur within a single pageblock, but not a node0 node1 node0
> + * interleaving within a single pageblock. It is therefore sufficient to check
> + * the first and last page of a pageblock and avoid checking each individual
> + * page in a pageblock.
> + *
> + * Note: the function may return non-NULL struct page even for a page block
> + * which contains a memory hole (i.e. there is no physical memory for a subset
> + * of the pfn range). For example, if the pageblock order is MAX_ORDER, which
> + * will fall into 2 sub-sections, and the end pfn of the pageblock may be hole
> + * even though the start pfn is online and valid. This should be safe most of
> + * the time because struct pages are still initialized via init_unavailable_range()
> + * and pfn walkers shouldn't touch any physical memory range for which they do
> + * not recognize any specific metadata in struct pages.
> + */
> +struct page *__pageblock_pfn_to_page(unsigned long start_pfn,
> +				     unsigned long end_pfn, struct zone *zone)

__pageblock_pfn_to_page() is also called by the compaction code (e.g.,
isolate_freepages_range() -> pageblock_pfn_to_page() ->
__pageblock_pfn_to_page()).

So, it is used not only by initialization and hotplug?

Best Regards,
Huang, Ying

> +{
> +	struct page *start_page;
> +	struct page *end_page;
> +
> +	/* end_pfn is one past the range we are checking */
> +	end_pfn--;
> +
> +	if (!pfn_valid(end_pfn))
> +		return NULL;
> +
> +	start_page = pfn_to_online_page(start_pfn);
> +	if (!start_page)
> +		return NULL;
> +
> +	if (page_zone(start_page) != zone)
> +		return NULL;
> +
> +	end_page = pfn_to_page(end_pfn);
> +
> +	/* This gives a shorter code than deriving page_zone(end_page) */
> +	if (page_zone_id(start_page) != page_zone_id(end_page))
> +		return NULL;
> +
> +	return start_page;
> +}
> +
> +void set_zone_contiguous(struct zone *zone)
> +{
> +	unsigned long block_start_pfn = zone->zone_start_pfn;
> +	unsigned long block_end_pfn;
> +
> +	block_end_pfn = pageblock_end_pfn(block_start_pfn);
> +	for (; block_start_pfn < zone_end_pfn(zone);
> +			block_start_pfn = block_end_pfn,
> +			 block_end_pfn += pageblock_nr_pages) {
> +
> +		block_end_pfn = min(block_end_pfn, zone_end_pfn(zone));
> +
> +		if (!__pageblock_pfn_to_page(block_start_pfn,
> +					     block_end_pfn, zone))
> +			return;
> +		cond_resched();
> +	}
> +
> +	/* We confirm that there is no hole */
> +	zone->contiguous = true;
> +}
> +
>  void __init page_alloc_init_late(void)
>  {
>  	struct zone *zone;
> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> index 4f094ba7c8fb..fe7c1ee5becd 100644
> --- a/mm/page_alloc.c
> +++ b/mm/page_alloc.c
> @@ -1480,85 +1480,6 @@ void __free_pages_core(struct page *page, unsigned int order)
>  	__free_pages_ok(page, order, FPI_TO_TAIL);
>  }
>  
> -/*
> - * Check that the whole (or subset of) a pageblock given by the interval of
> - * [start_pfn, end_pfn) is valid and within the same zone, before scanning it
> - * with the migration of free compaction scanner.
> - *
> - * Return struct page pointer of start_pfn, or NULL if checks were not passed.
> - *
> - * It's possible on some configurations to have a setup like node0 node1 node0
> - * i.e. it's possible that all pages within a zones range of pages do not
> - * belong to a single zone. We assume that a border between node0 and node1
> - * can occur within a single pageblock, but not a node0 node1 node0
> - * interleaving within a single pageblock. It is therefore sufficient to check
> - * the first and last page of a pageblock and avoid checking each individual
> - * page in a pageblock.
> - *
> - * Note: the function may return non-NULL struct page even for a page block
> - * which contains a memory hole (i.e. there is no physical memory for a subset
> - * of the pfn range). For example, if the pageblock order is MAX_ORDER, which
> - * will fall into 2 sub-sections, and the end pfn of the pageblock may be hole
> - * even though the start pfn is online and valid. This should be safe most of
> - * the time because struct pages are still initialized via init_unavailable_range()
> - * and pfn walkers shouldn't touch any physical memory range for which they do
> - * not recognize any specific metadata in struct pages.
> - */
> -struct page *__pageblock_pfn_to_page(unsigned long start_pfn,
> -				     unsigned long end_pfn, struct zone *zone)
> -{
> -	struct page *start_page;
> -	struct page *end_page;
> -
> -	/* end_pfn is one past the range we are checking */
> -	end_pfn--;
> -
> -	if (!pfn_valid(end_pfn))
> -		return NULL;
> -
> -	start_page = pfn_to_online_page(start_pfn);
> -	if (!start_page)
> -		return NULL;
> -
> -	if (page_zone(start_page) != zone)
> -		return NULL;
> -
> -	end_page = pfn_to_page(end_pfn);
> -
> -	/* This gives a shorter code than deriving page_zone(end_page) */
> -	if (page_zone_id(start_page) != page_zone_id(end_page))
> -		return NULL;
> -
> -	return start_page;
> -}
> -
> -void set_zone_contiguous(struct zone *zone)
> -{
> -	unsigned long block_start_pfn = zone->zone_start_pfn;
> -	unsigned long block_end_pfn;
> -
> -	block_end_pfn = pageblock_end_pfn(block_start_pfn);
> -	for (; block_start_pfn < zone_end_pfn(zone);
> -			block_start_pfn = block_end_pfn,
> -			 block_end_pfn += pageblock_nr_pages) {
> -
> -		block_end_pfn = min(block_end_pfn, zone_end_pfn(zone));
> -
> -		if (!__pageblock_pfn_to_page(block_start_pfn,
> -					     block_end_pfn, zone))
> -			return;
> -		cond_resched();
> -	}
> -
> -	/* We confirm that there is no hole */
> -	zone->contiguous = true;
> -}
> -
> -void clear_zone_contiguous(struct zone *zone)
> -{
> -	zone->contiguous = false;
> -}
> -
>  /*
>   * The order of subdivision here is critical for the IO subsystem.
>   * Please do not alter this order without good reasons and regression

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH 03/12] mm: page_alloc: move set_zone_contiguous() into mm_init.c
  2023-05-08  7:12   ` Huang, Ying
@ 2023-05-08  7:27     ` Kefeng Wang
  0 siblings, 0 replies; 22+ messages in thread
From: Kefeng Wang @ 2023-05-08  7:27 UTC (permalink / raw)
  To: Huang, Ying
  Cc: Andrew Morton, Mike Rapoport, linux-mm, David Hildenbrand,
	Oscar Salvador, Rafael J. Wysocki, Pavel Machek, Len Brown,
	Luis Chamberlain, Kees Cook, Iurii Zaikin, linux-kernel,
	linux-pm, linux-fsdevel



On 2023/5/8 15:12, Huang, Ying wrote:
> Kefeng Wang <wangkefeng.wang@huawei.com> writes:
> 
>> set_zone_contiguous() is only used in mm init/hotplug, and
>> clear_zone_contiguous() only used in hotplug, move them from
>> page_alloc.c to the more appropriate file.
>>
>> Signed-off-by: Kefeng Wang <wangkefeng.wang@huawei.com>
>> ---
>>   include/linux/memory_hotplug.h |  3 --
>>   mm/internal.h                  |  7 +++
>>   mm/mm_init.c                   | 74 +++++++++++++++++++++++++++++++
>>   mm/page_alloc.c                | 79 ----------------------------------
>>   4 files changed, 81 insertions(+), 82 deletions(-)
>>
...
>>   
>> +/*
>> + * Check that the whole (or subset of) a pageblock given by the interval of
>> + * [start_pfn, end_pfn) is valid and within the same zone, before scanning it
>> + * with the migration of free compaction scanner.
>> + *
>> + * Return struct page pointer of start_pfn, or NULL if checks were not passed.
>> + *
>> + * It's possible on some configurations to have a setup like node0 node1 node0
>> + * i.e. it's possible that all pages within a zones range of pages do not
>> + * belong to a single zone. We assume that a border between node0 and node1
>> + * can occur within a single pageblock, but not a node0 node1 node0
>> + * interleaving within a single pageblock. It is therefore sufficient to check
>> + * the first and last page of a pageblock and avoid checking each individual
>> + * page in a pageblock.
>> + *
>> + * Note: the function may return non-NULL struct page even for a page block
>> + * which contains a memory hole (i.e. there is no physical memory for a subset
>> + * of the pfn range). For example, if the pageblock order is MAX_ORDER, which
>> + * will fall into 2 sub-sections, and the end pfn of the pageblock may be hole
>> + * even though the start pfn is online and valid. This should be safe most of
>> + * the time because struct pages are still initialized via init_unavailable_range()
>> + * and pfn walkers shouldn't touch any physical memory range for which they do
>> + * not recognize any specific metadata in struct pages.
>> + */
>> +struct page *__pageblock_pfn_to_page(unsigned long start_pfn,
>> +				     unsigned long end_pfn, struct zone *zone)
> 
> __pageblock_pfn_to_page() is also called by compaction code too (e.g.,
> isolate_freepages_range() -> pageblock_pfn_to_page() ->
> __pageblock_pfn_to_page()).
> 
> So, it is used not only by initialization and hotplug?
> 

I should drop the move of this function, thanks for your reminder.

> Best Regards,
> Huang, Ying

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH 02/12] mm: page_alloc: move init_on_alloc/free() into mm_init.c
  2023-05-08  7:11 ` [PATCH 02/12] mm: page_alloc: move init_on_alloc/free() " Kefeng Wang
@ 2023-05-09 16:38   ` Mike Rapoport
  0 siblings, 0 replies; 22+ messages in thread
From: Mike Rapoport @ 2023-05-09 16:38 UTC (permalink / raw)
  To: Kefeng Wang
  Cc: Andrew Morton, linux-mm, David Hildenbrand, Oscar Salvador,
	Rafael J. Wysocki, Pavel Machek, Len Brown, Luis Chamberlain,
	Kees Cook, Iurii Zaikin, linux-kernel, linux-pm, linux-fsdevel

On Mon, May 08, 2023 at 03:11:50PM +0800, Kefeng Wang wrote:
> Since commit f2fc4b44ec2b ("mm: move init_mem_debugging_and_hardening()
> to mm/mm_init.c"), the init_on_alloc() and init_on_free() define is
> better to move there too.
> 
> Signed-off-by: Kefeng Wang <wangkefeng.wang@huawei.com>

Reviewed-by: Mike Rapoport (IBM) <rppt@kernel.org>

> ---
>  mm/mm_init.c    | 6 ++++++
>  mm/page_alloc.c | 5 -----
>  2 files changed, 6 insertions(+), 5 deletions(-)
> 
> diff --git a/mm/mm_init.c b/mm/mm_init.c
> index da162b7a044c..15201887f8e0 100644
> --- a/mm/mm_init.c
> +++ b/mm/mm_init.c
> @@ -2543,6 +2543,12 @@ void __init memblock_free_pages(struct page *page, unsigned long pfn,
>  	__free_pages_core(page, order);
>  }
>  
> +DEFINE_STATIC_KEY_MAYBE(CONFIG_INIT_ON_ALLOC_DEFAULT_ON, init_on_alloc);
> +EXPORT_SYMBOL(init_on_alloc);
> +
> +DEFINE_STATIC_KEY_MAYBE(CONFIG_INIT_ON_FREE_DEFAULT_ON, init_on_free);
> +EXPORT_SYMBOL(init_on_free);
> +
>  static bool _init_on_alloc_enabled_early __read_mostly
>  				= IS_ENABLED(CONFIG_INIT_ON_ALLOC_DEFAULT_ON);
>  static int __init early_init_on_alloc(char *buf)
> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> index d1086aeca8f2..4f094ba7c8fb 100644
> --- a/mm/page_alloc.c
> +++ b/mm/page_alloc.c
> @@ -233,11 +233,6 @@ unsigned long totalcma_pages __read_mostly;
>  
>  int percpu_pagelist_high_fraction;
>  gfp_t gfp_allowed_mask __read_mostly = GFP_BOOT_MASK;
> -DEFINE_STATIC_KEY_MAYBE(CONFIG_INIT_ON_ALLOC_DEFAULT_ON, init_on_alloc);
> -EXPORT_SYMBOL(init_on_alloc);
> -
> -DEFINE_STATIC_KEY_MAYBE(CONFIG_INIT_ON_FREE_DEFAULT_ON, init_on_free);
> -EXPORT_SYMBOL(init_on_free);
>  
>  /*
>   * A cached value of the page's pageblock's migratetype, used when the page is
> -- 
> 2.35.3
> 

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH 01/12] mm: page_alloc: move mirrored_kernelcore into mm_init.c
  2023-05-08  7:11 ` [PATCH 01/12] mm: page_alloc: move mirrored_kernelcore into mm_init.c Kefeng Wang
@ 2023-05-09 16:38   ` Mike Rapoport
  0 siblings, 0 replies; 22+ messages in thread
From: Mike Rapoport @ 2023-05-09 16:38 UTC (permalink / raw)
  To: Kefeng Wang
  Cc: Andrew Morton, linux-mm, David Hildenbrand, Oscar Salvador,
	Rafael J. Wysocki, Pavel Machek, Len Brown, Luis Chamberlain,
	Kees Cook, Iurii Zaikin, linux-kernel, linux-pm, linux-fsdevel

On Mon, May 08, 2023 at 03:11:49PM +0800, Kefeng Wang wrote:
> Since commit 9420f89db2dd ("mm: move most of core MM initialization
> to mm/mm_init.c"), mirrored_kernelcore should be moved into mm_init.c,
> as most related codes are already there.
> 
> Signed-off-by: Kefeng Wang <wangkefeng.wang@huawei.com>

Reviewed-by: Mike Rapoport (IBM) <rppt@kernel.org>

> ---
>  mm/mm_init.c    | 2 ++
>  mm/page_alloc.c | 3 ---
>  2 files changed, 2 insertions(+), 3 deletions(-)
> 
> diff --git a/mm/mm_init.c b/mm/mm_init.c
> index 7f7f9c677854..da162b7a044c 100644
> --- a/mm/mm_init.c
> +++ b/mm/mm_init.c
> @@ -259,6 +259,8 @@ static int __init cmdline_parse_core(char *p, unsigned long *core,
>  	return 0;
>  }
>  
> +bool mirrored_kernelcore __initdata_memblock;
> +
>  /*
>   * kernelcore=size sets the amount of memory for use for allocations that
>   * cannot be reclaimed or migrated.
> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> index af9c995d3c1e..d1086aeca8f2 100644
> --- a/mm/page_alloc.c
> +++ b/mm/page_alloc.c
> @@ -23,7 +23,6 @@
>  #include <linux/interrupt.h>
>  #include <linux/pagemap.h>
>  #include <linux/jiffies.h>
> -#include <linux/memblock.h>
>  #include <linux/compiler.h>
>  #include <linux/kernel.h>
>  #include <linux/kasan.h>
> @@ -374,8 +373,6 @@ int user_min_free_kbytes = -1;
>  int watermark_boost_factor __read_mostly = 15000;
>  int watermark_scale_factor = 10;
>  
> -bool mirrored_kernelcore __initdata_memblock;
> -
>  /* movable_zone is the "real" zone pages in ZONE_MOVABLE are taken from */
>  int movable_zone;
>  EXPORT_SYMBOL(movable_zone);
> -- 
> 2.35.3
> 

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH 05/12] mm: page_alloc: squash page_is_consistent()
  2023-05-08  7:11 ` [PATCH 05/12] mm: page_alloc: squash page_is_consistent() Kefeng Wang
@ 2023-05-09 16:43   ` Mike Rapoport
  0 siblings, 0 replies; 22+ messages in thread
From: Mike Rapoport @ 2023-05-09 16:43 UTC (permalink / raw)
  To: Kefeng Wang
  Cc: Andrew Morton, linux-mm, David Hildenbrand, Oscar Salvador,
	Rafael J. Wysocki, Pavel Machek, Len Brown, Luis Chamberlain,
	Kees Cook, Iurii Zaikin, linux-kernel, linux-pm, linux-fsdevel

On Mon, May 08, 2023 at 03:11:53PM +0800, Kefeng Wang wrote:
> Squash the page_is_consistent() into bad_range() as there is
> only one caller.
> 
> Signed-off-by: Kefeng Wang <wangkefeng.wang@huawei.com>

Reviewed-by: Mike Rapoport (IBM) <rppt@kernel.org>

> ---
>  mm/page_alloc.c | 9 +--------
>  1 file changed, 1 insertion(+), 8 deletions(-)
> 
> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> index 9a85238f1140..348dcbaca757 100644
> --- a/mm/page_alloc.c
> +++ b/mm/page_alloc.c
> @@ -517,13 +517,6 @@ static int page_outside_zone_boundaries(struct zone *zone, struct page *page)
>  	return ret;
>  }
>  
> -static int page_is_consistent(struct zone *zone, struct page *page)
> -{
> -	if (zone != page_zone(page))
> -		return 0;
> -
> -	return 1;
> -}
>  /*
>   * Temporary debugging check for pages not lying within a given zone.
>   */
> @@ -531,7 +524,7 @@ static int __maybe_unused bad_range(struct zone *zone, struct page *page)
>  {
>  	if (page_outside_zone_boundaries(zone, page))
>  		return 1;
> -	if (!page_is_consistent(zone, page))
> +	if (zone != page_zone(page))
>  		return 1;
>  
>  	return 0;
> -- 
> 2.35.3
> 

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH 06/12] mm: page_alloc: remove alloc_contig_dump_pages() stub
  2023-05-08  7:11 ` [PATCH 06/12] mm: page_alloc: remove alloc_contig_dump_pages() stub Kefeng Wang
@ 2023-05-09 16:48   ` Mike Rapoport
  0 siblings, 0 replies; 22+ messages in thread
From: Mike Rapoport @ 2023-05-09 16:48 UTC (permalink / raw)
  To: Kefeng Wang
  Cc: Andrew Morton, linux-mm, David Hildenbrand, Oscar Salvador,
	Rafael J. Wysocki, Pavel Machek, Len Brown, Luis Chamberlain,
	Kees Cook, Iurii Zaikin, linux-kernel, linux-pm, linux-fsdevel

On Mon, May 08, 2023 at 03:11:54PM +0800, Kefeng Wang wrote:
> DEFINE_DYNAMIC_DEBUG_METADATA and DYNAMIC_DEBUG_BRANCH already has
> stub definitions without dynamic debug feature, remove unnecessary
> alloc_contig_dump_pages() stub.
> 
> Signed-off-by: Kefeng Wang <wangkefeng.wang@huawei.com>

Reviewed-by: Mike Rapoport (IBM) <rppt@kernel.org>

> ---
>  mm/page_alloc.c | 7 -------
>  1 file changed, 7 deletions(-)
> 
> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> index 348dcbaca757..bc453edbad21 100644
> --- a/mm/page_alloc.c
> +++ b/mm/page_alloc.c
> @@ -6161,8 +6161,6 @@ int percpu_pagelist_high_fraction_sysctl_handler(struct ctl_table *table,
>  }
>  
>  #ifdef CONFIG_CONTIG_ALLOC
> -#if defined(CONFIG_DYNAMIC_DEBUG) || \
> -	(defined(CONFIG_DYNAMIC_DEBUG_CORE) && defined(DYNAMIC_DEBUG_MODULE))
>  /* Usage: See admin-guide/dynamic-debug-howto.rst */
>  static void alloc_contig_dump_pages(struct list_head *page_list)
>  {
> @@ -6176,11 +6174,6 @@ static void alloc_contig_dump_pages(struct list_head *page_list)
>  			dump_page(page, "migration failure");
>  	}
>  }
> -#else
> -static inline void alloc_contig_dump_pages(struct list_head *page_list)
> -{
> -}
> -#endif
>  
>  /* [start, end) must belong to a single zone. */
>  int __alloc_contig_migrate_range(struct compact_control *cc,
> -- 
> 2.35.3
> 

^ permalink raw reply	[flat|nested] 22+ messages in thread

* [PATCH v2 03/12] mm: page_alloc: move set_zone_contiguous() into mm_init.c
  2023-05-08  7:11 ` [PATCH 03/12] mm: page_alloc: move set_zone_contiguous() " Kefeng Wang
  2023-05-08  7:12   ` Huang, Ying
@ 2023-05-10  8:01   ` Kefeng Wang
  1 sibling, 0 replies; 22+ messages in thread
From: Kefeng Wang @ 2023-05-10  8:01 UTC (permalink / raw)
  To: Andrew Morton, Mike Rapoport, linux-mm
  Cc: David Hildenbrand, Oscar Salvador, Rafael J. Wysocki,
	Pavel Machek, Len Brown, Luis Chamberlain, Kees Cook,
	Iurii Zaikin, linux-kernel, linux-pm, linux-fsdevel, ying.huang,
	Kefeng Wang

set_zone_contiguous() is only used during mm init and memory hotplug,
and clear_zone_contiguous() is only used during hotplug, so move them
from page_alloc.c into more appropriate files.

Signed-off-by: Kefeng Wang <wangkefeng.wang@huawei.com>
---
v2: drop move of __pageblock_pfn_to_page(), suggested by Huang Ying
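
For context (not part of the diff below): the zone->contiguous flag set
here feeds the fast path of pageblock_pfn_to_page() in mm/internal.h,
which is why __pageblock_pfn_to_page() itself stays in page_alloc.c.
Paraphrased from mm/internal.h:

	static inline struct page *pageblock_pfn_to_page(unsigned long start_pfn,
				unsigned long end_pfn, struct zone *zone)
	{
		/* A zone confirmed hole-free maps pfn to page directly. */
		if (zone->contiguous)
			return pfn_to_page(start_pfn);

		/* Otherwise take the checked walk in page_alloc.c. */
		return __pageblock_pfn_to_page(start_pfn, end_pfn, zone);
	}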

 include/linux/memory_hotplug.h |  3 ---
 mm/internal.h                  |  7 +++++++
 mm/mm_init.c                   | 22 ++++++++++++++++++++++
 mm/page_alloc.c                | 27 ---------------------------
 4 files changed, 29 insertions(+), 30 deletions(-)

diff --git a/include/linux/memory_hotplug.h b/include/linux/memory_hotplug.h
index 9fcbf5706595..04bc286eed42 100644
--- a/include/linux/memory_hotplug.h
+++ b/include/linux/memory_hotplug.h
@@ -326,9 +326,6 @@ static inline int remove_memory(u64 start, u64 size)
 static inline void __remove_memory(u64 start, u64 size) {}
 #endif /* CONFIG_MEMORY_HOTREMOVE */
 
-extern void set_zone_contiguous(struct zone *zone);
-extern void clear_zone_contiguous(struct zone *zone);
-
 #ifdef CONFIG_MEMORY_HOTPLUG
 extern void __ref free_area_init_core_hotplug(struct pglist_data *pgdat);
 extern int __add_memory(int nid, u64 start, u64 size, mhp_t mhp_flags);
diff --git a/mm/internal.h b/mm/internal.h
index e28442c0858a..9482862b28cc 100644
--- a/mm/internal.h
+++ b/mm/internal.h
@@ -371,6 +371,13 @@ static inline struct page *pageblock_pfn_to_page(unsigned long start_pfn,
 	return __pageblock_pfn_to_page(start_pfn, end_pfn, zone);
 }
 
+void set_zone_contiguous(struct zone *zone);
+
+static inline void clear_zone_contiguous(struct zone *zone)
+{
+	zone->contiguous = false;
+}
+
 extern int __isolate_free_page(struct page *page, unsigned int order);
 extern void __putback_isolated_page(struct page *page, unsigned int order,
 				    int mt);
diff --git a/mm/mm_init.c b/mm/mm_init.c
index 15201887f8e0..0fd4ddfdfb2e 100644
--- a/mm/mm_init.c
+++ b/mm/mm_init.c
@@ -2330,6 +2330,28 @@ void __init init_cma_reserved_pageblock(struct page *page)
 }
 #endif
 
+void set_zone_contiguous(struct zone *zone)
+{
+	unsigned long block_start_pfn = zone->zone_start_pfn;
+	unsigned long block_end_pfn;
+
+	block_end_pfn = pageblock_end_pfn(block_start_pfn);
+	for (; block_start_pfn < zone_end_pfn(zone);
+			block_start_pfn = block_end_pfn,
+			 block_end_pfn += pageblock_nr_pages) {
+
+		block_end_pfn = min(block_end_pfn, zone_end_pfn(zone));
+
+		if (!__pageblock_pfn_to_page(block_start_pfn,
+					     block_end_pfn, zone))
+			return;
+		cond_resched();
+	}
+
+	/* We confirm that there is no hole */
+	zone->contiguous = true;
+}
+
 void __init page_alloc_init_late(void)
 {
 	struct zone *zone;
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 4f094ba7c8fb..7bb0d6abfe3d 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -1532,33 +1532,6 @@ struct page *__pageblock_pfn_to_page(unsigned long start_pfn,
 	return start_page;
 }
 
-void set_zone_contiguous(struct zone *zone)
-{
-	unsigned long block_start_pfn = zone->zone_start_pfn;
-	unsigned long block_end_pfn;
-
-	block_end_pfn = pageblock_end_pfn(block_start_pfn);
-	for (; block_start_pfn < zone_end_pfn(zone);
-			block_start_pfn = block_end_pfn,
-			 block_end_pfn += pageblock_nr_pages) {
-
-		block_end_pfn = min(block_end_pfn, zone_end_pfn(zone));
-
-		if (!__pageblock_pfn_to_page(block_start_pfn,
-					     block_end_pfn, zone))
-			return;
-		cond_resched();
-	}
-
-	/* We confirm that there is no hole */
-	zone->contiguous = true;
-}
-
-void clear_zone_contiguous(struct zone *zone)
-{
-	zone->contiguous = false;
-}
-
 /*
  * The order of subdivision here is critical for the IO subsystem.
  * Please do not alter this order without good reasons and regression
-- 
2.35.3


^ permalink raw reply related	[flat|nested] 22+ messages in thread

* Re: [PATCH 04/12] mm: page_alloc: collect mem statistic into show_mem.c
  2023-05-08  7:11 ` [PATCH 04/12] mm: page_alloc: collect mem statistic into show_mem.c Kefeng Wang
@ 2023-05-11  0:04   ` kernel test robot
  2023-05-16  5:30     ` Kefeng Wang
  0 siblings, 1 reply; 22+ messages in thread
From: kernel test robot @ 2023-05-11  0:04 UTC (permalink / raw)
  To: Kefeng Wang, Andrew Morton, Mike Rapoport
  Cc: oe-kbuild-all, Linux Memory Management List, David Hildenbrand,
	Oscar Salvador, Rafael J. Wysocki, Pavel Machek, Len Brown,
	Luis Chamberlain, Kees Cook, Iurii Zaikin, linux-kernel,
	linux-pm, linux-fsdevel, Kefeng Wang

Hi Kefeng,

kernel test robot noticed the following build warnings:

[auto build test WARNING on akpm-mm/mm-everything]

url:    https://github.com/intel-lab-lkp/linux/commits/Kefeng-Wang/mm-page_alloc-move-mirrored_kernelcore-into-mm_init-c/20230508-145724
base:   https://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm.git mm-everything
patch link:    https://lore.kernel.org/r/20230508071200.123962-5-wangkefeng.wang%40huawei.com
patch subject: [PATCH 04/12] mm: page_alloc: collect mem statistic into show_mem.c
config: loongarch-randconfig-s051-20230509 (https://download.01.org/0day-ci/archive/20230511/202305110807.YVsoVagW-lkp@intel.com/config)
compiler: loongarch64-linux-gcc (GCC) 12.1.0
reproduce:
        wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross
        chmod +x ~/bin/make.cross
        # apt-get install sparse
        # sparse version: v0.6.4-39-gce1a6720-dirty
        # https://github.com/intel-lab-lkp/linux/commit/be69df472e4d9a6b09a17b854d3aeb9722fc2675
        git remote add linux-review https://github.com/intel-lab-lkp/linux
        git fetch --no-tags linux-review Kefeng-Wang/mm-page_alloc-move-mirrored_kernelcore-into-mm_init-c/20230508-145724
        git checkout be69df472e4d9a6b09a17b854d3aeb9722fc2675
        # save the config file
        mkdir build_dir && cp config build_dir/.config
        COMPILER_INSTALL_PATH=$HOME/0day COMPILER=gcc-12.1.0 make.cross C=1 CF='-fdiagnostic-prefix -D__CHECK_ENDIAN__' O=build_dir ARCH=loongarch olddefconfig
        COMPILER_INSTALL_PATH=$HOME/0day COMPILER=gcc-12.1.0 make.cross C=1 CF='-fdiagnostic-prefix -D__CHECK_ENDIAN__' O=build_dir ARCH=loongarch SHELL=/bin/bash

If you fix the issue, kindly add following tag where applicable
| Reported-by: kernel test robot <lkp@intel.com>
| Link: https://lore.kernel.org/oe-kbuild-all/202305110807.YVsoVagW-lkp@intel.com/

sparse warnings: (new ones prefixed by >>)
>> mm/show_mem.c:336:17: sparse: sparse: incorrect type in argument 1 (different address spaces) @@     expected void *ptr @@     got int [noderef] __percpu * @@
   mm/show_mem.c:336:17: sparse:     expected void *ptr
   mm/show_mem.c:336:17: sparse:     got int [noderef] __percpu *
>> mm/show_mem.c:336:17: sparse: sparse: incorrect type in argument 1 (different address spaces) @@     expected void *ptr @@     got int [noderef] __percpu * @@
   mm/show_mem.c:336:17: sparse:     expected void *ptr
   mm/show_mem.c:336:17: sparse:     got int [noderef] __percpu *
>> mm/show_mem.c:336:17: sparse: sparse: incorrect type in argument 1 (different address spaces) @@     expected void *ptr @@     got int [noderef] __percpu * @@
   mm/show_mem.c:336:17: sparse:     expected void *ptr
   mm/show_mem.c:336:17: sparse:     got int [noderef] __percpu *
>> mm/show_mem.c:336:17: sparse: sparse: incorrect type in argument 1 (different address spaces) @@     expected void *ptr @@     got int [noderef] __percpu * @@
   mm/show_mem.c:336:17: sparse:     expected void *ptr
   mm/show_mem.c:336:17: sparse:     got int [noderef] __percpu *

vim +336 mm/show_mem.c

   207	
   208	/*
   209	 * Show free area list (used inside shift_scroll-lock stuff)
   210	 * We also calculate the percentage fragmentation. We do this by counting the
   211	 * memory on each free list with the exception of the first item on the list.
   212	 *
   213	 * Bits in @filter:
   214	 * SHOW_MEM_FILTER_NODES: suppress nodes that are not allowed by current's
   215	 *   cpuset.
   216	 */
   217	void __show_free_areas(unsigned int filter, nodemask_t *nodemask, int max_zone_idx)
   218	{
   219		unsigned long free_pcp = 0;
   220		int cpu, nid;
   221		struct zone *zone;
   222		pg_data_t *pgdat;
   223	
   224		for_each_populated_zone(zone) {
   225			if (zone_idx(zone) > max_zone_idx)
   226				continue;
   227			if (show_mem_node_skip(filter, zone_to_nid(zone), nodemask))
   228				continue;
   229	
   230			for_each_online_cpu(cpu)
   231				free_pcp += per_cpu_ptr(zone->per_cpu_pageset, cpu)->count;
   232		}
   233	
   234		printk("active_anon:%lu inactive_anon:%lu isolated_anon:%lu\n"
   235			" active_file:%lu inactive_file:%lu isolated_file:%lu\n"
   236			" unevictable:%lu dirty:%lu writeback:%lu\n"
   237			" slab_reclaimable:%lu slab_unreclaimable:%lu\n"
   238			" mapped:%lu shmem:%lu pagetables:%lu\n"
   239			" sec_pagetables:%lu bounce:%lu\n"
   240			" kernel_misc_reclaimable:%lu\n"
   241			" free:%lu free_pcp:%lu free_cma:%lu\n",
   242			global_node_page_state(NR_ACTIVE_ANON),
   243			global_node_page_state(NR_INACTIVE_ANON),
   244			global_node_page_state(NR_ISOLATED_ANON),
   245			global_node_page_state(NR_ACTIVE_FILE),
   246			global_node_page_state(NR_INACTIVE_FILE),
   247			global_node_page_state(NR_ISOLATED_FILE),
   248			global_node_page_state(NR_UNEVICTABLE),
   249			global_node_page_state(NR_FILE_DIRTY),
   250			global_node_page_state(NR_WRITEBACK),
   251			global_node_page_state_pages(NR_SLAB_RECLAIMABLE_B),
   252			global_node_page_state_pages(NR_SLAB_UNRECLAIMABLE_B),
   253			global_node_page_state(NR_FILE_MAPPED),
   254			global_node_page_state(NR_SHMEM),
   255			global_node_page_state(NR_PAGETABLE),
   256			global_node_page_state(NR_SECONDARY_PAGETABLE),
   257			global_zone_page_state(NR_BOUNCE),
   258			global_node_page_state(NR_KERNEL_MISC_RECLAIMABLE),
   259			global_zone_page_state(NR_FREE_PAGES),
   260			free_pcp,
   261			global_zone_page_state(NR_FREE_CMA_PAGES));
   262	
   263		for_each_online_pgdat(pgdat) {
   264			if (show_mem_node_skip(filter, pgdat->node_id, nodemask))
   265				continue;
   266			if (!node_has_managed_zones(pgdat, max_zone_idx))
   267				continue;
   268	
   269			printk("Node %d"
   270				" active_anon:%lukB"
   271				" inactive_anon:%lukB"
   272				" active_file:%lukB"
   273				" inactive_file:%lukB"
   274				" unevictable:%lukB"
   275				" isolated(anon):%lukB"
   276				" isolated(file):%lukB"
   277				" mapped:%lukB"
   278				" dirty:%lukB"
   279				" writeback:%lukB"
   280				" shmem:%lukB"
   281	#ifdef CONFIG_TRANSPARENT_HUGEPAGE
   282				" shmem_thp: %lukB"
   283				" shmem_pmdmapped: %lukB"
   284				" anon_thp: %lukB"
   285	#endif
   286				" writeback_tmp:%lukB"
   287				" kernel_stack:%lukB"
   288	#ifdef CONFIG_SHADOW_CALL_STACK
   289				" shadow_call_stack:%lukB"
   290	#endif
   291				" pagetables:%lukB"
   292				" sec_pagetables:%lukB"
   293				" all_unreclaimable? %s"
   294				"\n",
   295				pgdat->node_id,
   296				K(node_page_state(pgdat, NR_ACTIVE_ANON)),
   297				K(node_page_state(pgdat, NR_INACTIVE_ANON)),
   298				K(node_page_state(pgdat, NR_ACTIVE_FILE)),
   299				K(node_page_state(pgdat, NR_INACTIVE_FILE)),
   300				K(node_page_state(pgdat, NR_UNEVICTABLE)),
   301				K(node_page_state(pgdat, NR_ISOLATED_ANON)),
   302				K(node_page_state(pgdat, NR_ISOLATED_FILE)),
   303				K(node_page_state(pgdat, NR_FILE_MAPPED)),
   304				K(node_page_state(pgdat, NR_FILE_DIRTY)),
   305				K(node_page_state(pgdat, NR_WRITEBACK)),
   306				K(node_page_state(pgdat, NR_SHMEM)),
   307	#ifdef CONFIG_TRANSPARENT_HUGEPAGE
   308				K(node_page_state(pgdat, NR_SHMEM_THPS)),
   309				K(node_page_state(pgdat, NR_SHMEM_PMDMAPPED)),
   310				K(node_page_state(pgdat, NR_ANON_THPS)),
   311	#endif
   312				K(node_page_state(pgdat, NR_WRITEBACK_TEMP)),
   313				node_page_state(pgdat, NR_KERNEL_STACK_KB),
   314	#ifdef CONFIG_SHADOW_CALL_STACK
   315				node_page_state(pgdat, NR_KERNEL_SCS_KB),
   316	#endif
   317				K(node_page_state(pgdat, NR_PAGETABLE)),
   318				K(node_page_state(pgdat, NR_SECONDARY_PAGETABLE)),
   319				pgdat->kswapd_failures >= MAX_RECLAIM_RETRIES ?
   320					"yes" : "no");
   321		}
   322	
   323		for_each_populated_zone(zone) {
   324			int i;
   325	
   326			if (zone_idx(zone) > max_zone_idx)
   327				continue;
   328			if (show_mem_node_skip(filter, zone_to_nid(zone), nodemask))
   329				continue;
   330	
   331			free_pcp = 0;
   332			for_each_online_cpu(cpu)
   333				free_pcp += per_cpu_ptr(zone->per_cpu_pageset, cpu)->count;
   334	
   335			show_node(zone);
 > 336			printk(KERN_CONT

-- 
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH 04/12] mm: page_alloc: collect mem statistic into show_mem.c
  2023-05-11  0:04   ` kernel test robot
@ 2023-05-16  5:30     ` Kefeng Wang
  0 siblings, 0 replies; 22+ messages in thread
From: Kefeng Wang @ 2023-05-16  5:30 UTC (permalink / raw)
  To: kernel test robot, Andrew Morton, Mike Rapoport
  Cc: oe-kbuild-all, Linux Memory Management List, David Hildenbrand,
	Oscar Salvador, Rafael J. Wysocki, Pavel Machek, Len Brown,
	Luis Chamberlain, Kees Cook, Iurii Zaikin, linux-kernel,
	linux-pm, linux-fsdevel



On 2023/5/11 8:04, kernel test robot wrote:
> Hi Kefeng,
> 
> kernel test robot noticed the following build warnings:
> 
> [auto build test WARNING on akpm-mm/mm-everything]
> 
> url:    https://github.com/intel-lab-lkp/linux/commits/Kefeng-Wang/mm-page_alloc-move-mirrored_kernelcore-into-mm_init-c/20230508-145724
> base:   https://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm.git mm-everything
> patch link:    https://lore.kernel.org/r/20230508071200.123962-5-wangkefeng.wang%40huawei.com
> patch subject: [PATCH 04/12] mm: page_alloc: collect mem statistic into show_mem.c
> config: loongarch-randconfig-s051-20230509 (https://download.01.org/0day-ci/archive/20230511/202305110807.YVsoVagW-lkp@intel.com/config)
> compiler: loongarch64-linux-gcc (GCC) 12.1.0
> reproduce:
>          wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross
>          chmod +x ~/bin/make.cross
>          # apt-get install sparse
>          # sparse version: v0.6.4-39-gce1a6720-dirty
>          # https://github.com/intel-lab-lkp/linux/commit/be69df472e4d9a6b09a17b854d3aeb9722fc2675
>          git remote add linux-review https://github.com/intel-lab-lkp/linux
>          git fetch --no-tags linux-review Kefeng-Wang/mm-page_alloc-move-mirrored_kernelcore-into-mm_init-c/20230508-145724
>          git checkout be69df472e4d9a6b09a17b854d3aeb9722fc2675
>          # save the config file
>          mkdir build_dir && cp config build_dir/.config
>          COMPILER_INSTALL_PATH=$HOME/0day COMPILER=gcc-12.1.0 make.cross C=1 CF='-fdiagnostic-prefix -D__CHECK_ENDIAN__' O=build_dir ARCH=loongarch olddefconfig
>          COMPILER_INSTALL_PATH=$HOME/0day COMPILER=gcc-12.1.0 make.cross C=1 CF='-fdiagnostic-prefix -D__CHECK_ENDIAN__' O=build_dir ARCH=loongarch SHELL=/bin/bash
> 
> If you fix the issue, kindly add following tag where applicable
> | Reported-by: kernel test robot <lkp@intel.com>
> | Link: https://lore.kernel.org/oe-kbuild-all/202305110807.YVsoVagW-lkp@intel.com/
> 
> sparse warnings: (new ones prefixed by >>)
>>> mm/show_mem.c:336:17: sparse: sparse: incorrect type in argument 1 (different address spaces) @@     expected void *ptr @@     got int [noderef] __percpu * @@
>     mm/show_mem.c:336:17: sparse:     expected void *ptr
>     mm/show_mem.c:336:17: sparse:     got int [noderef] __percpu *
>>> mm/show_mem.c:336:17: sparse: sparse: incorrect type in argument 1 (different address spaces) @@     expected void *ptr @@     got int [noderef] __percpu * @@
>     mm/show_mem.c:336:17: sparse:     expected void *ptr
>     mm/show_mem.c:336:17: sparse:     got int [noderef] __percpu *
>>> mm/show_mem.c:336:17: sparse: sparse: incorrect type in argument 1 (different address spaces) @@     expected void *ptr @@     got int [noderef] __percpu * @@
>     mm/show_mem.c:336:17: sparse:     expected void *ptr
>     mm/show_mem.c:336:17: sparse:     got int [noderef] __percpu *
>>> mm/show_mem.c:336:17: sparse: sparse: incorrect type in argument 1 (different address spaces) @@     expected void *ptr @@     got int [noderef] __percpu * @@
>     mm/show_mem.c:336:17: sparse:     expected void *ptr
>     mm/show_mem.c:336:17: sparse:     got int [noderef] __percpu *
> 
> vim +336 mm/show_mem.c
> 

Thanks. I won't change the __show_free_areas() function here: this patch
only moves code, so the warning is better fixed separately rather than
in this series. The sparse warning is caused by
K(this_cpu_read(zone->per_cpu_pageset->count)); maybe it should be
changed to __this_cpu_read()?
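
A minimal sketch of that suggested follow-up (hypothetical, not a
posted patch; whether it actually silences sparse would need checking
with the robot's reproducer):

	--- a/mm/show_mem.c
	+++ b/mm/show_mem.c
	@@ (in __show_free_areas())
	-		K(this_cpu_read(zone->per_cpu_pageset->count)),
	+		K(__this_cpu_read(zone->per_cpu_pageset->count)),

__this_cpu_read() is the variant that leaves preemption safety to the
caller, so a real patch would need to justify that this per-cpu count
is only a best-effort snapshot for printing.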

^ permalink raw reply	[flat|nested] 22+ messages in thread

end of thread, other threads:[~2023-05-16  5:30 UTC | newest]

Thread overview: 22+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2023-05-08  7:11 [PATCH -next 00/12] mm: page_alloc: misc cleanup and refactor Kefeng Wang
2023-05-08  7:11 ` [PATCH 01/12] mm: page_alloc: move mirrored_kernelcore into mm_init.c Kefeng Wang
2023-05-09 16:38   ` Mike Rapoport
2023-05-08  7:11 ` [PATCH 02/12] mm: page_alloc: move init_on_alloc/free() " Kefeng Wang
2023-05-09 16:38   ` Mike Rapoport
2023-05-08  7:11 ` [PATCH 03/12] mm: page_alloc: move set_zone_contiguous() " Kefeng Wang
2023-05-08  7:12   ` Huang, Ying
2023-05-08  7:27     ` Kefeng Wang
2023-05-10  8:01   ` [PATCH v2 " Kefeng Wang
2023-05-08  7:11 ` [PATCH 04/12] mm: page_alloc: collect mem statistic into show_mem.c Kefeng Wang
2023-05-11  0:04   ` kernel test robot
2023-05-16  5:30     ` Kefeng Wang
2023-05-08  7:11 ` [PATCH 05/12] mm: page_alloc: squash page_is_consistent() Kefeng Wang
2023-05-09 16:43   ` Mike Rapoport
2023-05-08  7:11 ` [PATCH 06/12] mm: page_alloc: remove alloc_contig_dump_pages() stub Kefeng Wang
2023-05-09 16:48   ` Mike Rapoport
2023-05-08  7:11 ` [PATCH 07/12] mm: page_alloc: split out FAIL_PAGE_ALLOC Kefeng Wang
2023-05-08  7:11 ` [PATCH 08/12] mm: page_alloc: split out DEBUG_PAGEALLOC Kefeng Wang
2023-05-08  7:11 ` [PATCH 09/12] mm: page_alloc: move mark_free_page() into snapshot.c Kefeng Wang
2023-05-08  7:11 ` [PATCH 10/12] mm: page_alloc: move pm_* function into power Kefeng Wang
2023-05-08  7:11 ` [PATCH 11/12] mm: vmscan: use gfp_has_io_fs() Kefeng Wang
2023-05-08  7:12 ` [PATCH 12/12] mm: page_alloc: move sysctls into their own file Kefeng Wang
