linux-kernel.vger.kernel.org archive mirror
* [PATCHv2 0/7] x86_64/mm: remove bottom-up allocation style by pushing forward the parsing of mem hotplug info
@ 2019-01-11  5:12 Pingfan Liu
  2019-01-11  5:12 ` [PATCHv2 1/7] x86/mm: concentrate the code to memblock allocator enabled Pingfan Liu
                   ` (7 more replies)
  0 siblings, 8 replies; 23+ messages in thread
From: Pingfan Liu @ 2019-01-11  5:12 UTC (permalink / raw)
  To: linux-kernel
  Cc: Pingfan Liu, Thomas Gleixner, Ingo Molnar, Borislav Petkov,
	H. Peter Anvin, Dave Hansen, Andy Lutomirski, Peter Zijlstra,
	Rafael J. Wysocki, Len Brown, Yinghai Lu, Tejun Heo, Chao Fan,
	Baoquan He, Juergen Gross, Andrew Morton, Mike Rapoport,
	Vlastimil Babka, Michal Hocko, x86, linux-acpi, linux-mm

Background
After [1], the KASLR kernel can be guaranteed to sit inside an unmovable
node. But if the kernel is located near the end of that unmovable node
(i.e. close to the boundary with a movable node), the bottom-up allocator
may create page tables which cross the boundary between the unmovable node
and the movable node. It is a probabilistic issue, depending on two
factors: -1. how big the gap is between the kernel end and the unmovable
node's end.  -2. how much memory the system owns.
An alternative way to fix this issue is to increase the gap in
boot/compressed/kaslr*. But in the scenario of PB-level memory, the page
tables will take several MB even when using 1GB pages, and different page
attributes and fragmentation make things worse. So it is hard to decide by
how much the gap should be increased.
The following figure shows the defect of the current bottom-up style:
  [startA, endA][startB, "kaslr kernel very close to" endB][startC, endC]

If nodes A and B are unmovable, while node C is movable, then
init_mem_mapping() can generate page tables on node C, which stains the
movable node.

This series makes it a certainty instead of a probability problem. It
achieves this by pushing the parsing of the memory hotplug info forward,
ahead of init_mem_mapping().
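
For orientation, the resulting ordering in setup_arch() after this series
is roughly the following (a simplified sketch; see patch 4/7 for the real
hunk):

	e820__memblock_setup();
	/* ... */
	early_acpi_parse();	/* parse SRAT/hotplug info early */
	init_mem_mapping();	/* page tables now avoid movable nodes */
	memblock_set_current_limit(0, get_max_mapped(), false);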

Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: "Rafael J. Wysocki" <rjw@rjwysocki.net>
Cc: Len Brown <lenb@kernel.org>
Cc: Yinghai Lu <yinghai@kernel.org>
Cc: Tejun Heo <tj@kernel.org>
Cc: Chao Fan <fanc.fnst@cn.fujitsu.com>
Cc: Baoquan He <bhe@redhat.com>
Cc: Juergen Gross <jgross@suse.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Mike Rapoport <rppt@linux.vnet.ibm.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Michal Hocko <mhocko@suse.com>
Cc: x86@kernel.org
Cc: linux-acpi@vger.kernel.org
Cc: linux-mm@kvack.org
Pingfan Liu (7):
  x86/mm: concentrate the code to memblock allocator enabled
  acpi: change the topo of acpi_table_upgrade()
  mm/memblock: introduce allocation boundary for tracing purpose
  x86/setup: parse acpi to get hotplug info before init_mem_mapping()
  x86/mm: set allowed range for memblock allocator
  x86/mm: remove bottom-up allocation style for x86_64
  x86/mm: isolate the bottom-up style to init_32.c

 arch/arm/mm/init.c              |   3 +-
 arch/arm/mm/mmu.c               |   4 +-
 arch/arm/mm/nommu.c             |   2 +-
 arch/arm64/kernel/setup.c       |   2 +-
 arch/csky/kernel/setup.c        |   2 +-
 arch/microblaze/mm/init.c       |   2 +-
 arch/mips/kernel/setup.c        |   2 +-
 arch/powerpc/mm/40x_mmu.c       |   6 +-
 arch/powerpc/mm/44x_mmu.c       |   2 +-
 arch/powerpc/mm/8xx_mmu.c       |   2 +-
 arch/powerpc/mm/fsl_booke_mmu.c |   5 +-
 arch/powerpc/mm/hash_utils_64.c |   4 +-
 arch/powerpc/mm/init_32.c       |   2 +-
 arch/powerpc/mm/pgtable-radix.c |   2 +-
 arch/powerpc/mm/ppc_mmu_32.c    |   8 +-
 arch/powerpc/mm/tlb_nohash.c    |   6 +-
 arch/unicore32/mm/mmu.c         |   2 +-
 arch/x86/kernel/setup.c         |  93 ++++++++++++++---------
 arch/x86/mm/init.c              | 163 +++++-----------------------------------
 arch/x86/mm/init_32.c           | 147 ++++++++++++++++++++++++++++++++++++
 arch/x86/mm/mm_internal.h       |   8 +-
 arch/xtensa/mm/init.c           |   2 +-
 drivers/acpi/tables.c           |   4 +-
 include/linux/acpi.h            |   5 +-
 include/linux/memblock.h        |  10 ++-
 mm/memblock.c                   |  23 ++++--
 26 files changed, 290 insertions(+), 221 deletions(-)

-- 
2.7.4



* [PATCHv2 1/7] x86/mm: concentrate the code to memblock allocator enabled
  2019-01-11  5:12 [PATCHv2 0/7] x86_64/mm: remove bottom-up allocation style by pushing forward the parsing of mem hotplug info Pingfan Liu
@ 2019-01-11  5:12 ` Pingfan Liu
  2019-01-11  6:12   ` Chao Fan
       [not found]   ` <96233c0c-940d-8d7c-b3be-d8863c026996@intel.com>
  2019-01-11  5:12 ` [PATCHv2 2/7] acpi: change the topo of acpi_table_upgrade() Pingfan Liu
                   ` (6 subsequent siblings)
  7 siblings, 2 replies; 23+ messages in thread
From: Pingfan Liu @ 2019-01-11  5:12 UTC (permalink / raw)
  To: linux-kernel
  Cc: Pingfan Liu, Thomas Gleixner, Ingo Molnar, Borislav Petkov,
	H. Peter Anvin, Dave Hansen, Andy Lutomirski, Peter Zijlstra,
	Rafael J. Wysocki, Len Brown, Yinghai Lu, Tejun Heo, Chao Fan,
	Baoquan He, Juergen Gross, Andrew Morton, Mike Rapoport,
	Vlastimil Babka, Michal Hocko, x86, linux-acpi, linux-mm

This patch identifies the point where memblock allocation starts. It has
no functional change.

Signed-off-by: Pingfan Liu <kernelfans@gmail.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: "Rafael J. Wysocki" <rjw@rjwysocki.net>
Cc: Len Brown <lenb@kernel.org>
Cc: Yinghai Lu <yinghai@kernel.org>
Cc: Tejun Heo <tj@kernel.org>
Cc: Chao Fan <fanc.fnst@cn.fujitsu.com>
Cc: Baoquan He <bhe@redhat.com>
Cc: Juergen Gross <jgross@suse.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Mike Rapoport <rppt@linux.vnet.ibm.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Michal Hocko <mhocko@suse.com>
Cc: x86@kernel.org
Cc: linux-acpi@vger.kernel.org
Cc: linux-mm@kvack.org
---
 arch/x86/kernel/setup.c | 54 ++++++++++++++++++++++++-------------------------
 1 file changed, 26 insertions(+), 28 deletions(-)

diff --git a/arch/x86/kernel/setup.c b/arch/x86/kernel/setup.c
index d494b9b..ac432ae 100644
--- a/arch/x86/kernel/setup.c
+++ b/arch/x86/kernel/setup.c
@@ -962,29 +962,6 @@ void __init setup_arch(char **cmdline_p)
 
 	if (efi_enabled(EFI_BOOT))
 		efi_memblock_x86_reserve_range();
-#ifdef CONFIG_MEMORY_HOTPLUG
-	/*
-	 * Memory used by the kernel cannot be hot-removed because Linux
-	 * cannot migrate the kernel pages. When memory hotplug is
-	 * enabled, we should prevent memblock from allocating memory
-	 * for the kernel.
-	 *
-	 * ACPI SRAT records all hotpluggable memory ranges. But before
-	 * SRAT is parsed, we don't know about it.
-	 *
-	 * The kernel image is loaded into memory at very early time. We
-	 * cannot prevent this anyway. So on NUMA system, we set any
-	 * node the kernel resides in as un-hotpluggable.
-	 *
-	 * Since on modern servers, one node could have double-digit
-	 * gigabytes memory, we can assume the memory around the kernel
-	 * image is also un-hotpluggable. So before SRAT is parsed, just
-	 * allocate memory near the kernel image to try the best to keep
-	 * the kernel away from hotpluggable memory.
-	 */
-	if (movable_node_is_enabled())
-		memblock_set_bottom_up(true);
-#endif
 
 	x86_report_nx();
 
@@ -1096,9 +1073,6 @@ void __init setup_arch(char **cmdline_p)
 
 	cleanup_highmap();
 
-	memblock_set_current_limit(ISA_END_ADDRESS);
-	e820__memblock_setup();
-
 	reserve_bios_regions();
 
 	if (efi_enabled(EFI_MEMMAP)) {
@@ -1113,6 +1087,8 @@ void __init setup_arch(char **cmdline_p)
 		efi_reserve_boot_services();
 	}
 
+	memblock_set_current_limit(0, ISA_END_ADDRESS, false);
+	e820__memblock_setup();
 	/* preallocate 4k for mptable mpc */
 	e820__memblock_alloc_reserved_mpc_new();
 
@@ -1130,7 +1106,31 @@ void __init setup_arch(char **cmdline_p)
 	trim_platform_memory_ranges();
 	trim_low_memory_range();
 
+#ifdef CONFIG_MEMORY_HOTPLUG
+	/*
+	 * Memory used by the kernel cannot be hot-removed because Linux
+	 * cannot migrate the kernel pages. When memory hotplug is
+	 * enabled, we should prevent memblock from allocating memory
+	 * for the kernel.
+	 *
+	 * ACPI SRAT records all hotpluggable memory ranges. But before
+	 * SRAT is parsed, we don't know about it.
+	 *
+	 * The kernel image is loaded into memory at very early time. We
+	 * cannot prevent this anyway. So on NUMA system, we set any
+	 * node the kernel resides in as un-hotpluggable.
+	 *
+	 * Since on modern servers, one node could have double-digit
+	 * gigabytes memory, we can assume the memory around the kernel
+	 * image is also un-hotpluggable. So before SRAT is parsed, just
+	 * allocate memory near the kernel image to try the best to keep
+	 * the kernel away from hotpluggable memory.
+	 */
+	if (movable_node_is_enabled())
+		memblock_set_bottom_up(true);
+#endif
 	init_mem_mapping();
+	memblock_set_current_limit(get_max_mapped());
 
 	idt_setup_early_pf();
 
@@ -1145,8 +1145,6 @@ void __init setup_arch(char **cmdline_p)
 	 */
 	mmu_cr4_features = __read_cr4() & ~X86_CR4_PCIDE;
 
-	memblock_set_current_limit(get_max_mapped());
-
 	/*
 	 * NOTE: On x86-32, only from this point on, fixmaps are ready for use.
 	 */
-- 
2.7.4



* [PATCHv2 2/7] acpi: change the topo of acpi_table_upgrade()
  2019-01-11  5:12 [PATCHv2 0/7] x86_64/mm: remove bottom-up allocation style by pushing forward the parsing of mem hotplug info Pingfan Liu
  2019-01-11  5:12 ` [PATCHv2 1/7] x86/mm: concentrate the code to memblock allocator enabled Pingfan Liu
@ 2019-01-11  5:12 ` Pingfan Liu
  2019-01-11  5:30   ` Chao Fan
  2019-01-14 23:12   ` Dave Hansen
  2019-01-11  5:12 ` [PATCHv2 3/7] mm/memblock: introduce allocation boundary for tracing purpose Pingfan Liu
                   ` (5 subsequent siblings)
  7 siblings, 2 replies; 23+ messages in thread
From: Pingfan Liu @ 2019-01-11  5:12 UTC (permalink / raw)
  To: linux-kernel
  Cc: Pingfan Liu, Thomas Gleixner, Ingo Molnar, Borislav Petkov,
	H. Peter Anvin, Dave Hansen, Andy Lutomirski, Peter Zijlstra,
	Rafael J. Wysocki, Len Brown, Yinghai Lu, Tejun Heo, Chao Fan,
	Baoquan He, Juergen Gross, Andrew Morton, Mike Rapoport,
	Vlastimil Babka, Michal Hocko, x86, linux-acpi, linux-mm

The current acpi_table_upgrade() relies on initrd_start, but this variable
is only valid after relocate_initrd(). There is a requirement to extract
the ACPI info from the initrd before the memblock allocator can work (see
[4/7]), hence acpi_table_upgrade() needs to accept the input parameters
directly.
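
With the new prototype, the caller passes the blob location explicitly.
For illustration, the x86 call site introduced later in this series
(patch 4/7) does roughly:

	/* pass the ramdisk image directly, before relocate_initrd() runs */
	acpi_table_upgrade(__va(get_ramdisk_image()), get_ramdisk_size());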

Signed-off-by: Pingfan Liu <kernelfans@gmail.com>
Acked-by: "Rafael J. Wysocki" <rjw@rjwysocki.net>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: "Rafael J. Wysocki" <rjw@rjwysocki.net>
Cc: Len Brown <lenb@kernel.org>
Cc: Yinghai Lu <yinghai@kernel.org>
Cc: Tejun Heo <tj@kernel.org>
Cc: Chao Fan <fanc.fnst@cn.fujitsu.com>
Cc: Baoquan He <bhe@redhat.com>
Cc: Juergen Gross <jgross@suse.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Mike Rapoport <rppt@linux.vnet.ibm.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Michal Hocko <mhocko@suse.com>
Cc: x86@kernel.org
Cc: linux-acpi@vger.kernel.org
Cc: linux-mm@kvack.org
---
 arch/arm64/kernel/setup.c | 2 +-
 arch/x86/kernel/setup.c   | 2 +-
 drivers/acpi/tables.c     | 4 +---
 include/linux/acpi.h      | 4 ++--
 4 files changed, 5 insertions(+), 7 deletions(-)

diff --git a/arch/arm64/kernel/setup.c b/arch/arm64/kernel/setup.c
index f4fc1e0..bc4b47d 100644
--- a/arch/arm64/kernel/setup.c
+++ b/arch/arm64/kernel/setup.c
@@ -315,7 +315,7 @@ void __init setup_arch(char **cmdline_p)
 	paging_init();
 	efi_apply_persistent_mem_reservations();
 
-	acpi_table_upgrade();
+	acpi_table_upgrade((void *)initrd_start, initrd_end - initrd_start);
 
 	/* Parse the ACPI tables for possible boot-time configuration */
 	acpi_boot_table_init();
diff --git a/arch/x86/kernel/setup.c b/arch/x86/kernel/setup.c
index ac432ae..dc8fc5d 100644
--- a/arch/x86/kernel/setup.c
+++ b/arch/x86/kernel/setup.c
@@ -1172,8 +1172,8 @@ void __init setup_arch(char **cmdline_p)
 
 	reserve_initrd();
 
-	acpi_table_upgrade();
 
+	acpi_table_upgrade((void *)initrd_start, initrd_end - initrd_start);
 	vsmp_init();
 
 	io_delay_init();
diff --git a/drivers/acpi/tables.c b/drivers/acpi/tables.c
index 61203ee..84e0a79 100644
--- a/drivers/acpi/tables.c
+++ b/drivers/acpi/tables.c
@@ -471,10 +471,8 @@ static DECLARE_BITMAP(acpi_initrd_installed, NR_ACPI_INITRD_TABLES);
 
 #define MAP_CHUNK_SIZE   (NR_FIX_BTMAPS << PAGE_SHIFT)
 
-void __init acpi_table_upgrade(void)
+void __init acpi_table_upgrade(void *data, size_t size)
 {
-	void *data = (void *)initrd_start;
-	size_t size = initrd_end - initrd_start;
 	int sig, no, table_nr = 0, total_offset = 0;
 	long offset = 0;
 	struct acpi_table_header *table;
diff --git a/include/linux/acpi.h b/include/linux/acpi.h
index ed80f14..0b6e0b6 100644
--- a/include/linux/acpi.h
+++ b/include/linux/acpi.h
@@ -1254,9 +1254,9 @@ acpi_graph_get_remote_endpoint(const struct fwnode_handle *fwnode,
 #endif
 
 #ifdef CONFIG_ACPI_TABLE_UPGRADE
-void acpi_table_upgrade(void);
+void acpi_table_upgrade(void *data, size_t size);
 #else
-static inline void acpi_table_upgrade(void) { }
+static inline void acpi_table_upgrade(void *data, size_t size) { }
 #endif
 
 #if defined(CONFIG_ACPI) && defined(CONFIG_ACPI_WATCHDOG)
-- 
2.7.4



* [PATCHv2 3/7] mm/memblock: introduce allocation boundary for tracing purpose
  2019-01-11  5:12 [PATCHv2 0/7] x86_64/mm: remove bottom-up allocation style by pushing forward the parsing of mem hotplug info Pingfan Liu
  2019-01-11  5:12 ` [PATCHv2 1/7] x86/mm: concentrate the code to memblock allocator enabled Pingfan Liu
  2019-01-11  5:12 ` [PATCHv2 2/7] acpi: change the topo of acpi_table_upgrade() Pingfan Liu
@ 2019-01-11  5:12 ` Pingfan Liu
  2019-01-14  7:51   ` Mike Rapoport
  2019-01-11  5:12 ` [PATCHv2 4/7] x86/setup: parse acpi to get hotplug info before init_mem_mapping() Pingfan Liu
                   ` (4 subsequent siblings)
  7 siblings, 1 reply; 23+ messages in thread
From: Pingfan Liu @ 2019-01-11  5:12 UTC (permalink / raw)
  To: linux-kernel
  Cc: Pingfan Liu, Thomas Gleixner, Ingo Molnar, Borislav Petkov,
	H. Peter Anvin, Dave Hansen, Andy Lutomirski, Peter Zijlstra,
	Rafael J. Wysocki, Len Brown, Yinghai Lu, Tejun Heo, Chao Fan,
	Baoquan He, Juergen Gross, Andrew Morton, Mike Rapoport,
	Vlastimil Babka, Michal Hocko, x86, linux-acpi, linux-mm

During boot, there is a requirement to tell whether a series of function
calls will consume memory or not. A temporary memory region can be lent to
those functions through the memblock allocator, but at a check point all
of the lent memory should have been returned.
A typical usage pattern (a sketch follows below):
 -1. find a usable range with memblock_find_in_range(), say [A, B]
 -2. before calling the series of functions, memblock_set_current_limit(A, B, true)
 -3. call the functions
 -4. memblock_find_in_range(A, B, B-A, 1); if this fails, some memory has
     not been returned
 -5. reset the original limit

E.g. in the case of hotpluggable memory, some ACPI routines must be
called, and they are not allowed to own any movable memory. Although at
present these functions do not consume memory, they may do so later if
changed without awareness. With the above method, such an allocation can
be detected, and a pr_warn() issued to ask people to resolve it.
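
A minimal sketch of this pattern with the new prototypes (the 16MB window
size and some_early_functions() are illustrative only):

	phys_addr_t a, orig_start, orig_end;
	bool enforcing;

	enforcing = memblock_get_current_limit(&orig_start, &orig_end);

	/* 1. find a usable scratch window [a, a + SZ_16M] */
	a = memblock_find_in_range(ISA_END_ADDRESS, max_pfn << PAGE_SHIFT,
				   SZ_16M, PAGE_SIZE);

	/* 2. force subsequent memblock allocations into that window */
	memblock_set_current_limit(a, a + SZ_16M, true);

	/* 3. call the functions which must not keep any memory */
	some_early_functions();

	/* 4. if the whole window can still be found, all memory was returned */
	if (!memblock_find_in_range(a, a + SZ_16M, SZ_16M, PAGE_SIZE))
		pr_warn("some early function kept memory from the scratch window\n");

	/* 5. restore the original limit */
	memblock_set_current_limit(orig_start, orig_end, enforcing);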

Signed-off-by: Pingfan Liu <kernelfans@gmail.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: "Rafael J. Wysocki" <rjw@rjwysocki.net>
Cc: Len Brown <lenb@kernel.org>
Cc: Yinghai Lu <yinghai@kernel.org>
Cc: Tejun Heo <tj@kernel.org>
Cc: Chao Fan <fanc.fnst@cn.fujitsu.com>
Cc: Baoquan He <bhe@redhat.com>
Cc: Juergen Gross <jgross@suse.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Mike Rapoport <rppt@linux.vnet.ibm.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Michal Hocko <mhocko@suse.com>
Cc: x86@kernel.org
Cc: linux-acpi@vger.kernel.org
Cc: linux-mm@kvack.org
---
 arch/arm/mm/init.c              |  3 ++-
 arch/arm/mm/mmu.c               |  4 ++--
 arch/arm/mm/nommu.c             |  2 +-
 arch/csky/kernel/setup.c        |  2 +-
 arch/microblaze/mm/init.c       |  2 +-
 arch/mips/kernel/setup.c        |  2 +-
 arch/powerpc/mm/40x_mmu.c       |  6 ++++--
 arch/powerpc/mm/44x_mmu.c       |  2 +-
 arch/powerpc/mm/8xx_mmu.c       |  2 +-
 arch/powerpc/mm/fsl_booke_mmu.c |  5 +++--
 arch/powerpc/mm/hash_utils_64.c |  4 ++--
 arch/powerpc/mm/init_32.c       |  2 +-
 arch/powerpc/mm/pgtable-radix.c |  2 +-
 arch/powerpc/mm/ppc_mmu_32.c    |  8 ++++++--
 arch/powerpc/mm/tlb_nohash.c    |  6 ++++--
 arch/unicore32/mm/mmu.c         |  2 +-
 arch/x86/kernel/setup.c         |  2 +-
 arch/xtensa/mm/init.c           |  2 +-
 include/linux/memblock.h        | 10 +++++++---
 mm/memblock.c                   | 23 ++++++++++++++++++-----
 20 files changed, 59 insertions(+), 32 deletions(-)

diff --git a/arch/arm/mm/init.c b/arch/arm/mm/init.c
index 32e4845..58a4342 100644
--- a/arch/arm/mm/init.c
+++ b/arch/arm/mm/init.c
@@ -93,7 +93,8 @@ __tagtable(ATAG_INITRD2, parse_tag_initrd2);
 static void __init find_limits(unsigned long *min, unsigned long *max_low,
 			       unsigned long *max_high)
 {
-	*max_low = PFN_DOWN(memblock_get_current_limit());
+	memblock_get_current_limit(NULL, max_low);
+	*max_low = PFN_DOWN(*max_low);
 	*min = PFN_UP(memblock_start_of_DRAM());
 	*max_high = PFN_DOWN(memblock_end_of_DRAM());
 }
diff --git a/arch/arm/mm/mmu.c b/arch/arm/mm/mmu.c
index f5cc1cc..9025418 100644
--- a/arch/arm/mm/mmu.c
+++ b/arch/arm/mm/mmu.c
@@ -1240,7 +1240,7 @@ void __init adjust_lowmem_bounds(void)
 		}
 	}
 
-	memblock_set_current_limit(memblock_limit);
+	memblock_set_current_limit(0, memblock_limit, false);
 }
 
 static inline void prepare_page_table(void)
@@ -1625,7 +1625,7 @@ void __init paging_init(const struct machine_desc *mdesc)
 
 	prepare_page_table();
 	map_lowmem();
-	memblock_set_current_limit(arm_lowmem_limit);
+	memblock_set_current_limit(0, arm_lowmem_limit, false);
 	dma_contiguous_remap();
 	early_fixmap_shutdown();
 	devicemaps_init(mdesc);
diff --git a/arch/arm/mm/nommu.c b/arch/arm/mm/nommu.c
index 7d67c70..721535c 100644
--- a/arch/arm/mm/nommu.c
+++ b/arch/arm/mm/nommu.c
@@ -138,7 +138,7 @@ void __init adjust_lowmem_bounds(void)
 	adjust_lowmem_bounds_mpu();
 	end = memblock_end_of_DRAM();
 	high_memory = __va(end - 1) + 1;
-	memblock_set_current_limit(end);
+	memblock_set_current_limit(0, end, false);
 }
 
 /*
diff --git a/arch/csky/kernel/setup.c b/arch/csky/kernel/setup.c
index dff8b89..e6f88bf 100644
--- a/arch/csky/kernel/setup.c
+++ b/arch/csky/kernel/setup.c
@@ -100,7 +100,7 @@ static void __init csky_memblock_init(void)
 
 	highend_pfn = max_pfn;
 #endif
-	memblock_set_current_limit(PFN_PHYS(max_low_pfn));
+	memblock_set_current_limit(0, PFN_PHYS(max_low_pfn), false);
 
 	dma_contiguous_reserve(0);
 
diff --git a/arch/microblaze/mm/init.c b/arch/microblaze/mm/init.c
index b17fd8a..cee99da 100644
--- a/arch/microblaze/mm/init.c
+++ b/arch/microblaze/mm/init.c
@@ -353,7 +353,7 @@ asmlinkage void __init mmu_init(void)
 	/* Shortly after that, the entire linear mapping will be available */
 	/* This will also cause that unflatten device tree will be allocated
 	 * inside 768MB limit */
-	memblock_set_current_limit(memory_start + lowmem_size - 1);
+	memblock_set_current_limit(0, memory_start + lowmem_size - 1, false);
 }
 
 /* This is only called until mem_init is done. */
diff --git a/arch/mips/kernel/setup.c b/arch/mips/kernel/setup.c
index 8c6c48ed..62dabe1 100644
--- a/arch/mips/kernel/setup.c
+++ b/arch/mips/kernel/setup.c
@@ -862,7 +862,7 @@ static void __init arch_mem_init(char **cmdline_p)
 	 * with memblock_reserve; memblock_alloc* can be used
 	 * only after this point
 	 */
-	memblock_set_current_limit(PFN_PHYS(max_low_pfn));
+	memblock_set_current_limit(0, PFN_PHYS(max_low_pfn), false);
 
 #ifdef CONFIG_PROC_VMCORE
 	if (setup_elfcorehdr && setup_elfcorehdr_size) {
diff --git a/arch/powerpc/mm/40x_mmu.c b/arch/powerpc/mm/40x_mmu.c
index 61ac468..427bb56 100644
--- a/arch/powerpc/mm/40x_mmu.c
+++ b/arch/powerpc/mm/40x_mmu.c
@@ -141,7 +141,7 @@ unsigned long __init mmu_mapin_ram(unsigned long top)
 	 * coverage with normal-sized pages (or other reasons) do not
 	 * attempt to allocate outside the allowed range.
 	 */
-	memblock_set_current_limit(mapped);
+	memblock_set_current_limit(0, mapped, false);
 
 	return mapped;
 }
@@ -155,5 +155,7 @@ void setup_initial_memory_limit(phys_addr_t first_memblock_base,
 	BUG_ON(first_memblock_base != 0);
 
 	/* 40x can only access 16MB at the moment (see head_40x.S) */
-	memblock_set_current_limit(min_t(u64, first_memblock_size, 0x00800000));
+	memblock_set_current_limit(0,
+		min_t(u64, first_memblock_size, 0x00800000),
+		false);
 }
diff --git a/arch/powerpc/mm/44x_mmu.c b/arch/powerpc/mm/44x_mmu.c
index 12d9251..3cf127d 100644
--- a/arch/powerpc/mm/44x_mmu.c
+++ b/arch/powerpc/mm/44x_mmu.c
@@ -225,7 +225,7 @@ void setup_initial_memory_limit(phys_addr_t first_memblock_base,
 
 	/* 44x has a 256M TLB entry pinned at boot */
 	size = (min_t(u64, first_memblock_size, PPC_PIN_SIZE));
-	memblock_set_current_limit(first_memblock_base + size);
+	memblock_set_current_limit(0, first_memblock_base + size, false);
 }
 
 #ifdef CONFIG_SMP
diff --git a/arch/powerpc/mm/8xx_mmu.c b/arch/powerpc/mm/8xx_mmu.c
index 01b7f51..c75bca6 100644
--- a/arch/powerpc/mm/8xx_mmu.c
+++ b/arch/powerpc/mm/8xx_mmu.c
@@ -135,7 +135,7 @@ unsigned long __init mmu_mapin_ram(unsigned long top)
 	 * attempt to allocate outside the allowed range.
 	 */
 	if (mapped)
-		memblock_set_current_limit(mapped);
+		memblock_set_current_limit(0, mapped, false);
 
 	block_mapped_ram = mapped;
 
diff --git a/arch/powerpc/mm/fsl_booke_mmu.c b/arch/powerpc/mm/fsl_booke_mmu.c
index 080d49b..3be24b8 100644
--- a/arch/powerpc/mm/fsl_booke_mmu.c
+++ b/arch/powerpc/mm/fsl_booke_mmu.c
@@ -252,7 +252,8 @@ void __init adjust_total_lowmem(void)
 	pr_cont("%lu Mb, residual: %dMb\n", tlbcam_sz(tlbcam_index - 1) >> 20,
 	        (unsigned int)((total_lowmem - __max_low_memory) >> 20));
 
-	memblock_set_current_limit(memstart_addr + __max_low_memory);
+	memblock_set_current_limit(0,
+		memstart_addr + __max_low_memory, false);
 }
 
 void setup_initial_memory_limit(phys_addr_t first_memblock_base,
@@ -261,7 +262,7 @@ void setup_initial_memory_limit(phys_addr_t first_memblock_base,
 	phys_addr_t limit = first_memblock_base + first_memblock_size;
 
 	/* 64M mapped initially according to head_fsl_booke.S */
-	memblock_set_current_limit(min_t(u64, limit, 0x04000000));
+	memblock_set_current_limit(0, min_t(u64, limit, 0x04000000), false);
 }
 
 #ifdef CONFIG_RELOCATABLE
diff --git a/arch/powerpc/mm/hash_utils_64.c b/arch/powerpc/mm/hash_utils_64.c
index 0cc7fbc..30fba80 100644
--- a/arch/powerpc/mm/hash_utils_64.c
+++ b/arch/powerpc/mm/hash_utils_64.c
@@ -925,7 +925,7 @@ static void __init htab_initialize(void)
 		BUG_ON(htab_bolt_mapping(base, base + size, __pa(base),
 				prot, mmu_linear_psize, mmu_kernel_ssize));
 	}
-	memblock_set_current_limit(MEMBLOCK_ALLOC_ANYWHERE);
+	memblock_set_current_limit(0, MEMBLOCK_ALLOC_ANYWHERE, false);
 
 	/*
 	 * If we have a memory_limit and we've allocated TCEs then we need to
@@ -1867,7 +1867,7 @@ void hash__setup_initial_memory_limit(phys_addr_t first_memblock_base,
 			ppc64_rma_size = min_t(u64, ppc64_rma_size, 0x40000000);
 
 		/* Finally limit subsequent allocations */
-		memblock_set_current_limit(ppc64_rma_size);
+		memblock_set_current_limit(0, ppc64_rma_size, false);
 	} else {
 		ppc64_rma_size = ULONG_MAX;
 	}
diff --git a/arch/powerpc/mm/init_32.c b/arch/powerpc/mm/init_32.c
index 3e59e5d..863d710 100644
--- a/arch/powerpc/mm/init_32.c
+++ b/arch/powerpc/mm/init_32.c
@@ -183,5 +183,5 @@ void __init MMU_init(void)
 #endif
 
 	/* Shortly after that, the entire linear mapping will be available */
-	memblock_set_current_limit(lowmem_end_addr);
+	memblock_set_current_limit(0, lowmem_end_addr, false);
 }
diff --git a/arch/powerpc/mm/pgtable-radix.c b/arch/powerpc/mm/pgtable-radix.c
index 9311560..8cd5f2d 100644
--- a/arch/powerpc/mm/pgtable-radix.c
+++ b/arch/powerpc/mm/pgtable-radix.c
@@ -603,7 +603,7 @@ void __init radix__early_init_mmu(void)
 		radix_init_pseries();
 	}
 
-	memblock_set_current_limit(MEMBLOCK_ALLOC_ANYWHERE);
+	memblock_set_current_limit(0, MEMBLOCK_ALLOC_ANYWHERE, false);
 
 	radix_init_iamr();
 	radix_init_pgtable();
diff --git a/arch/powerpc/mm/ppc_mmu_32.c b/arch/powerpc/mm/ppc_mmu_32.c
index f6f575b..80927ad 100644
--- a/arch/powerpc/mm/ppc_mmu_32.c
+++ b/arch/powerpc/mm/ppc_mmu_32.c
@@ -283,7 +283,11 @@ void setup_initial_memory_limit(phys_addr_t first_memblock_base,
 
 	/* 601 can only access 16MB at the moment */
 	if (PVR_VER(mfspr(SPRN_PVR)) == 1)
-		memblock_set_current_limit(min_t(u64, first_memblock_size, 0x01000000));
+		memblock_set_current_limit(0,
+			min_t(u64, first_memblock_size, 0x01000000),
+			false);
 	else /* Anything else has 256M mapped */
-		memblock_set_current_limit(min_t(u64, first_memblock_size, 0x10000000));
+		memblock_set_current_limit(0,
+			min_t(u64, first_memblock_size, 0x10000000),
+			false);
 }
diff --git a/arch/powerpc/mm/tlb_nohash.c b/arch/powerpc/mm/tlb_nohash.c
index ae5d568..d074362 100644
--- a/arch/powerpc/mm/tlb_nohash.c
+++ b/arch/powerpc/mm/tlb_nohash.c
@@ -735,7 +735,7 @@ static void __init early_mmu_set_memory_limit(void)
 		 * reduces the memory available to Linux.  We need to
 		 * do this because highmem is not supported on 64-bit.
 		 */
-		memblock_enforce_memory_limit(linear_map_top);
+		memblock_enforce_memory_limit(0, linear_map_top, false);
 	}
 #endif
 
@@ -792,7 +792,9 @@ void setup_initial_memory_limit(phys_addr_t first_memblock_base,
 		ppc64_rma_size = min_t(u64, first_memblock_size, 0x40000000);
 
 	/* Finally limit subsequent allocations */
-	memblock_set_current_limit(first_memblock_base + ppc64_rma_size);
+	memblock_set_current_limit(0,
+			first_memblock_base + ppc64_rma_size,
+			false);
 }
 #else /* ! CONFIG_PPC64 */
 void __init early_init_mmu(void)
diff --git a/arch/unicore32/mm/mmu.c b/arch/unicore32/mm/mmu.c
index 040a8c2..6d62529 100644
--- a/arch/unicore32/mm/mmu.c
+++ b/arch/unicore32/mm/mmu.c
@@ -286,7 +286,7 @@ static void __init sanity_check_meminfo(void)
 	int i, j;
 
 	lowmem_limit = __pa(vmalloc_min - 1) + 1;
-	memblock_set_current_limit(lowmem_limit);
+	memblock_set_current_limit(0, lowmem_limit, false);
 
 	for (i = 0, j = 0; i < meminfo.nr_banks; i++) {
 		struct membank *bank = &meminfo.bank[j];
diff --git a/arch/x86/kernel/setup.c b/arch/x86/kernel/setup.c
index dc8fc5d..a0122cd 100644
--- a/arch/x86/kernel/setup.c
+++ b/arch/x86/kernel/setup.c
@@ -1130,7 +1130,7 @@ void __init setup_arch(char **cmdline_p)
 		memblock_set_bottom_up(true);
 #endif
 	init_mem_mapping();
-	memblock_set_current_limit(get_max_mapped());
+	memblock_set_current_limit(0, get_max_mapped(), false);
 
 	idt_setup_early_pf();
 
diff --git a/arch/xtensa/mm/init.c b/arch/xtensa/mm/init.c
index 30a48bb..b924387 100644
--- a/arch/xtensa/mm/init.c
+++ b/arch/xtensa/mm/init.c
@@ -60,7 +60,7 @@ void __init bootmem_init(void)
 	max_pfn = PFN_DOWN(memblock_end_of_DRAM());
 	max_low_pfn = min(max_pfn, MAX_LOW_PFN);
 
-	memblock_set_current_limit(PFN_PHYS(max_low_pfn));
+	memblock_set_current_limit(0, PFN_PHYS(max_low_pfn), false);
 	dma_contiguous_reserve(PFN_PHYS(max_low_pfn));
 
 	memblock_dump_all();
diff --git a/include/linux/memblock.h b/include/linux/memblock.h
index aee299a..49676f0 100644
--- a/include/linux/memblock.h
+++ b/include/linux/memblock.h
@@ -88,6 +88,8 @@ struct memblock_type {
  */
 struct memblock {
 	bool bottom_up;  /* is bottom up direction? */
+	bool enforce_checking;
+	phys_addr_t start_limit;
 	phys_addr_t current_limit;
 	struct memblock_type memory;
 	struct memblock_type reserved;
@@ -482,12 +484,14 @@ static inline void memblock_dump_all(void)
  * memblock_set_current_limit - Set the current allocation limit to allow
  *                         limiting allocations to what is currently
  *                         accessible during boot
- * @limit: New limit value (physical address)
+ * [start_limit, end_limit]: New limit value (physical address)
+ * enforcing: whether check against the limit boundary or not
  */
-void memblock_set_current_limit(phys_addr_t limit);
+void memblock_set_current_limit(phys_addr_t start_limit,
+	phys_addr_t end_limit, bool enforcing);
 
 
-phys_addr_t memblock_get_current_limit(void);
+bool memblock_get_current_limit(phys_addr_t *start, phys_addr_t *end);
 
 /*
  * pfn conversion functions
diff --git a/mm/memblock.c b/mm/memblock.c
index 81ae63c..b792be0 100644
--- a/mm/memblock.c
+++ b/mm/memblock.c
@@ -116,6 +116,8 @@ struct memblock memblock __initdata_memblock = {
 #endif
 
 	.bottom_up		= false,
+	.enforce_checking	= false,
+	.start_limit		= 0,
 	.current_limit		= MEMBLOCK_ALLOC_ANYWHERE,
 };
 
@@ -261,8 +263,11 @@ phys_addr_t __init_memblock memblock_find_in_range_node(phys_addr_t size,
 {
 	phys_addr_t kernel_end, ret;
 
+	if (unlikely(memblock.enforce_checking)) {
+		start = memblock.start_limit;
+		end = memblock.current_limit;
 	/* pump up @end */
-	if (end == MEMBLOCK_ALLOC_ACCESSIBLE)
+	} else if (end == MEMBLOCK_ALLOC_ACCESSIBLE)
 		end = memblock.current_limit;
 
 	/* avoid allocating the first page */
@@ -1826,14 +1831,22 @@ void __init_memblock memblock_trim_memory(phys_addr_t align)
 	}
 }
 
-void __init_memblock memblock_set_current_limit(phys_addr_t limit)
+void __init_memblock memblock_set_current_limit(phys_addr_t start,
+	phys_addr_t end, bool enforcing)
 {
-	memblock.current_limit = limit;
+	memblock.start_limit = start;
+	memblock.current_limit = end;
+	memblock.enforce_checking = enforcing;
 }
 
-phys_addr_t __init_memblock memblock_get_current_limit(void)
+bool __init_memblock memblock_get_current_limit(phys_addr_t *start,
+	phys_addr_t *end)
 {
-	return memblock.current_limit;
+	if (start)
+		*start = memblock.start_limit;
+	if (end)
+		*end = memblock.current_limit;
+	return memblock.enforce_checking;
 }
 
 static void __init_memblock memblock_dump(struct memblock_type *type)
-- 
2.7.4



* [PATCHv2 4/7] x86/setup: parse acpi to get hotplug info before init_mem_mapping()
  2019-01-11  5:12 [PATCHv2 0/7] x86_64/mm: remove bottom-up allocation style by pushing forward the parsing of mem hotplug info Pingfan Liu
                   ` (2 preceding siblings ...)
  2019-01-11  5:12 ` [PATCHv2 3/7] mm/memblock: introduce allocation boundary for tracing purpose Pingfan Liu
@ 2019-01-11  5:12 ` Pingfan Liu
  2019-01-11  5:12 ` [PATCHv2 5/7] x86/mm: set allowed range for memblock allocator Pingfan Liu
                   ` (3 subsequent siblings)
  7 siblings, 0 replies; 23+ messages in thread
From: Pingfan Liu @ 2019-01-11  5:12 UTC (permalink / raw)
  To: linux-kernel
  Cc: Pingfan Liu, Thomas Gleixner, Ingo Molnar, Borislav Petkov,
	H. Peter Anvin, Dave Hansen, Andy Lutomirski, Peter Zijlstra,
	Rafael J. Wysocki, Len Brown, Yinghai Lu, Tejun Heo, Chao Fan,
	Baoquan He, Juergen Gross, Andrew Morton, Mike Rapoport,
	Vlastimil Babka, Michal Hocko, x86, linux-acpi, linux-mm

At present, memblock bottom-up allocation helps us avoid staining the
movable node with a very high probability. But if the hotplug info has
already been parsed, the memblock allocator can step around the movable
node by itself. This patch pushes the parsing step forward, to just before
the point where the memblock allocator starts working. For how the
memblock allocator steps around the movable node, refer to the conditional
check on memblock_is_hotpluggable() in __next_mem_range() (a paraphrased
sketch follows below).
Later in this series, the bottom-up allocation style can be removed on x86_64.
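
For reference, the check in __next_mem_range() which makes the allocator
step around hotpluggable regions looks roughly like this (paraphrased from
mm/memblock.c, not a verbatim quote):

	for_each_memblock_type(idx_a, type_a, m) {
		/* ... */

		/* skip hotpluggable memory regions if needed */
		if (movable_node_is_enabled() && memblock_is_hotpluggable(m))
			continue;

		/* ... otherwise the region is a valid allocation candidate */
	}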

Signed-off-by: Pingfan Liu <kernelfans@gmail.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: "Rafael J. Wysocki" <rjw@rjwysocki.net>
Cc: Len Brown <lenb@kernel.org>
Cc: Yinghai Lu <yinghai@kernel.org>
Cc: Tejun Heo <tj@kernel.org>
Cc: Chao Fan <fanc.fnst@cn.fujitsu.com>
Cc: Baoquan He <bhe@redhat.com>
Cc: Juergen Gross <jgross@suse.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Mike Rapoport <rppt@linux.vnet.ibm.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Michal Hocko <mhocko@suse.com>
Cc: x86@kernel.org
Cc: linux-acpi@vger.kernel.org
Cc: linux-mm@kvack.org
---
 arch/x86/kernel/setup.c | 39 ++++++++++++++++++++++++++++++---------
 include/linux/acpi.h    |  1 +
 2 files changed, 31 insertions(+), 9 deletions(-)

diff --git a/arch/x86/kernel/setup.c b/arch/x86/kernel/setup.c
index a0122cd..9b57e01 100644
--- a/arch/x86/kernel/setup.c
+++ b/arch/x86/kernel/setup.c
@@ -804,6 +804,35 @@ dump_kernel_offset(struct notifier_block *self, unsigned long v, void *p)
 	return 0;
 }
 
+static void early_acpi_parse(void)
+{
+	phys_addr_t start, end, orig_start, orig_end;
+	bool enforcing;
+
+	enforcing = memblock_get_current_limit(&orig_start, &orig_end);
+	/* find a 16MB slot for temporary usage by the following routines. */
+	start = memblock_find_in_range(ISA_END_ADDRESS,
+			max_pfn, 1 << 24, 1);
+	end = start + 1 + (1 << 24);
+	memblock_set_current_limit(start, end, true);
+#ifdef CONFIG_BLK_DEV_INITRD
+	if (get_ramdisk_size())
+		acpi_table_upgrade(__va(get_ramdisk_image()),
+			get_ramdisk_size());
+#endif
+	/*
+	 * Parse the ACPI tables for possible boot-time SMP configuration.
+	 */
+	acpi_boot_table_init();
+	early_acpi_boot_init();
+	initmem_init();
+	/* check whether memory is returned or not */
+	start = memblock_find_in_range(start, end, 1<<24, 1);
+	if (!start)
+		pr_warn("the above acpi routines change and consume memory\n");
+	memblock_set_current_limit(orig_start, orig_end, enforcing);
+}
+
 /*
  * Determine if we were loaded by an EFI loader.  If so, then we have also been
  * passed the efi memmap, systab, etc., so we should use these data structures
@@ -1129,6 +1158,7 @@ void __init setup_arch(char **cmdline_p)
 	if (movable_node_is_enabled())
 		memblock_set_bottom_up(true);
 #endif
+	early_acpi_parse();
 	init_mem_mapping();
 	memblock_set_current_limit(0, get_max_mapped(), false);
 
@@ -1173,21 +1203,12 @@ void __init setup_arch(char **cmdline_p)
 	reserve_initrd();
 
 
-	acpi_table_upgrade((void *)initrd_start, initrd_end - initrd_start);
 	vsmp_init();
 
 	io_delay_init();
 
 	early_platform_quirks();
 
-	/*
-	 * Parse the ACPI tables for possible boot-time SMP configuration.
-	 */
-	acpi_boot_table_init();
-
-	early_acpi_boot_init();
-
-	initmem_init();
 	dma_contiguous_reserve(max_pfn_mapped << PAGE_SHIFT);
 
 	/*
diff --git a/include/linux/acpi.h b/include/linux/acpi.h
index 0b6e0b6..4f6b391 100644
--- a/include/linux/acpi.h
+++ b/include/linux/acpi.h
@@ -235,6 +235,7 @@ int acpi_mps_check (void);
 int acpi_numa_init (void);
 
 int acpi_table_init (void);
+void acpi_tb_terminate(void);
 int acpi_table_parse(char *id, acpi_tbl_table_handler handler);
 int __init acpi_table_parse_entries(char *id, unsigned long table_size,
 			      int entry_id,
-- 
2.7.4



* [PATCHv2 5/7] x86/mm: set allowed range for memblock allocator
  2019-01-11  5:12 [PATCHv2 0/7] x86_64/mm: remove bottom-up allocation style by pushing forward the parsing of mem hotplug info Pingfan Liu
                   ` (3 preceding siblings ...)
  2019-01-11  5:12 ` [PATCHv2 4/7] x86/setup: parse acpi to get hotplug info before init_mem_mapping() Pingfan Liu
@ 2019-01-11  5:12 ` Pingfan Liu
  2019-01-11  5:12 ` [PATCHv2 6/7] x86/mm: remove bottom-up allocation style for x86_64 Pingfan Liu
                   ` (2 subsequent siblings)
  7 siblings, 0 replies; 23+ messages in thread
From: Pingfan Liu @ 2019-01-11  5:12 UTC (permalink / raw)
  To: linux-kernel
  Cc: Pingfan Liu, Thomas Gleixner, Ingo Molnar, Borislav Petkov,
	H. Peter Anvin, Dave Hansen, Andy Lutomirski, Peter Zijlstra,
	Rafael J. Wysocki, Len Brown, Yinghai Lu, Tejun Heo, Chao Fan,
	Baoquan He, Juergen Gross, Andrew Morton, Mike Rapoport,
	Vlastimil Babka, Michal Hocko, x86, linux-acpi, linux-mm

Due to the upcoming divergence of x86_32 and x86_64, there is a
requirement to set the allowed allocation range at the early boot stage.
This patch also includes a minor change to remove a redundant condition
check: as memblock_find_in_range_node() shows, memblock_find_in_range()
already protects itself against the case start > end.

Signed-off-by: Pingfan Liu <kernelfans@gmail.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: "Rafael J. Wysocki" <rjw@rjwysocki.net>
Cc: Len Brown <lenb@kernel.org>
Cc: Yinghai Lu <yinghai@kernel.org>
Cc: Tejun Heo <tj@kernel.org>
Cc: Chao Fan <fanc.fnst@cn.fujitsu.com>
Cc: Baoquan He <bhe@redhat.com>
Cc: Juergen Gross <jgross@suse.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Mike Rapoport <rppt@linux.vnet.ibm.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Michal Hocko <mhocko@suse.com>
Cc: x86@kernel.org
Cc: linux-acpi@vger.kernel.org
Cc: linux-mm@kvack.org
---
 arch/x86/mm/init.c | 24 +++++++++++++++++-------
 1 file changed, 17 insertions(+), 7 deletions(-)

diff --git a/arch/x86/mm/init.c b/arch/x86/mm/init.c
index ef99f38..385b9cd 100644
--- a/arch/x86/mm/init.c
+++ b/arch/x86/mm/init.c
@@ -76,6 +76,14 @@ static unsigned long min_pfn_mapped;
 
 static bool __initdata can_use_brk_pgt = true;
 
+static unsigned long min_pfn_allowed;
+static unsigned long max_pfn_allowed;
+void set_alloc_range(unsigned long low, unsigned long high)
+{
+	min_pfn_allowed = low;
+	max_pfn_allowed = high;
+}
+
 /*
  * Pages returned are already directly mapped.
  *
@@ -100,12 +108,10 @@ __ref void *alloc_low_pages(unsigned int num)
 	if ((pgt_buf_end + num) > pgt_buf_top || !can_use_brk_pgt) {
 		unsigned long ret = 0;
 
-		if (min_pfn_mapped < max_pfn_mapped) {
-			ret = memblock_find_in_range(
-					min_pfn_mapped << PAGE_SHIFT,
-					max_pfn_mapped << PAGE_SHIFT,
-					PAGE_SIZE * num , PAGE_SIZE);
-		}
+		ret = memblock_find_in_range(
+			min_pfn_allowed << PAGE_SHIFT,
+			max_pfn_allowed << PAGE_SHIFT,
+			PAGE_SIZE * num, PAGE_SIZE);
 		if (ret)
 			memblock_reserve(ret, PAGE_SIZE * num);
 		else if (can_use_brk_pgt)
@@ -588,14 +594,17 @@ static void __init memory_map_top_down(unsigned long map_start,
 			start = map_start;
 		mapped_ram_size += init_range_memory_mapping(start,
 							last_start);
+		set_alloc_range(min_pfn_mapped, max_pfn_mapped);
 		last_start = start;
 		min_pfn_mapped = last_start >> PAGE_SHIFT;
 		if (mapped_ram_size >= step_size)
 			step_size = get_new_step_size(step_size);
 	}
 
-	if (real_end < map_end)
+	if (real_end < map_end) {
 		init_range_memory_mapping(real_end, map_end);
+		set_alloc_range(min_pfn_mapped, max_pfn_mapped);
+	}
 }
 
 /**
@@ -636,6 +645,7 @@ static void __init memory_map_bottom_up(unsigned long map_start,
 		}
 
 		mapped_ram_size += init_range_memory_mapping(start, next);
+		set_alloc_range(min_pfn_mapped, max_pfn_mapped);
 		start = next;
 
 		if (mapped_ram_size >= step_size)
-- 
2.7.4



* [PATCHv2 6/7] x86/mm: remove bottom-up allocation style for x86_64
  2019-01-11  5:12 [PATCHv2 0/7] x86_64/mm: remove bottom-up allocation style by pushing forward the parsing of mem hotplug info Pingfan Liu
                   ` (4 preceding siblings ...)
  2019-01-11  5:12 ` [PATCHv2 5/7] x86/mm: set allowed range for memblock allocator Pingfan Liu
@ 2019-01-11  5:12 ` Pingfan Liu
  2019-01-14 23:27   ` Dave Hansen
  2019-01-11  5:12 ` [PATCHv2 7/7] x86/mm: isolate the bottom-up style to init_32.c Pingfan Liu
  2019-01-14 23:02 ` [PATCHv2 0/7] x86_64/mm: remove bottom-up allocation style by pushing forward the parsing of mem hotplug info Dave Hansen
  7 siblings, 1 reply; 23+ messages in thread
From: Pingfan Liu @ 2019-01-11  5:12 UTC (permalink / raw)
  To: linux-kernel
  Cc: Pingfan Liu, Thomas Gleixner, Ingo Molnar, Borislav Petkov,
	H. Peter Anvin, Dave Hansen, Andy Lutomirski, Peter Zijlstra,
	Rafael J. Wysocki, Len Brown, Yinghai Lu, Tejun Heo, Chao Fan,
	Baoquan He, Juergen Gross, Andrew Morton, Mike Rapoport,
	Vlastimil Babka, Michal Hocko, x86, linux-acpi, linux-mm

Although the KASLR kernel can avoid staining the movable node [1], the
page tables can still stain it. That is a probability problem: the
probability is low, but it exists. This patch tries to make the outcome a
certainty by allocating the page tables on an unmovable node, instead of
following the kernel end.
This patch achieves two things:
-1st. keep the subtree of page tables away from the movable node.
With the previous patch, at the point of init_mem_mapping(), the memblock
allocator can work with the knowledge of the ACPI memory hotplug info, and
avoid staining the movable node. As a result, memory_map_bottom_up() is
not needed any more.
The following figure shows the defect of the current bottom-up style:
  [startA, endA][startB, "kaslr kernel very close to" endB][startC, endC]
If nodes A and B are unmovable, while node C is movable, then
init_mem_mapping() can generate page tables on node C, which stains the
movable node.
For more lengthy background, please refer to the Background section below.
-2nd. simplify the logic of memory_map_top_down().
Thanks to early_make_pgtable(), x86_64 can directly set up the subtree of
page tables at any place, hence the careful iteration in
memory_map_top_down() can be discarded.

*Background section*
After [1], the KASLR kernel can be guaranteed to sit inside an unmovable
node. But if the kernel is located near the end of that unmovable node
(i.e. close to the boundary with a movable node), the bottom-up allocator
may create page tables which cross the boundary between the unmovable node
and the movable node. It is a probabilistic issue, depending on two
factors: -1. how big the gap is between the kernel end and the unmovable
node's end.  -2. how much memory the system owns.
An alternative way to fix this issue is to increase the gap in
boot/compressed/kaslr*. But in the scenario of PB-level memory, the page
tables will take several MB even when using 1GB pages, and different page
attributes and fragmentation make things worse. So it is hard to decide by
how much the gap should be increased.

[1]: https://lore.kernel.org/patchwork/patch/1029376/
Signed-off-by: Pingfan Liu <kernelfans@gmail.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: "Rafael J. Wysocki" <rjw@rjwysocki.net>
Cc: Len Brown <lenb@kernel.org>
Cc: Yinghai Lu <yinghai@kernel.org>
Cc: Tejun Heo <tj@kernel.org>
Cc: Chao Fan <fanc.fnst@cn.fujitsu.com>
Cc: Baoquan He <bhe@redhat.com>
Cc: Juergen Gross <jgross@suse.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Mike Rapoport <rppt@linux.vnet.ibm.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Michal Hocko <mhocko@suse.com>
Cc: x86@kernel.org
Cc: linux-acpi@vger.kernel.org
Cc: linux-mm@kvack.org

---
 arch/x86/kernel/setup.c |  4 ++--
 arch/x86/mm/init.c      | 56 ++++++++++++++++++++++++++++++-------------------
 2 files changed, 36 insertions(+), 24 deletions(-)

diff --git a/arch/x86/kernel/setup.c b/arch/x86/kernel/setup.c
index 9b57e01..00a1b84 100644
--- a/arch/x86/kernel/setup.c
+++ b/arch/x86/kernel/setup.c
@@ -827,7 +827,7 @@ static void early_acpi_parse(void)
 	early_acpi_boot_init();
 	initmem_init();
 	/* check whether memory is returned or not */
-	start = memblock_find_in_range(start, end, 1<<24, 1);
+	start = memblock_find_in_range(start, end, 1 << 24, 1);
 	if (!start)
 		pr_warn("the above acpi routines change and consume memory\n");
 	memblock_set_current_limit(orig_start, orig_end, enforcing);
@@ -1135,7 +1135,7 @@ void __init setup_arch(char **cmdline_p)
 	trim_platform_memory_ranges();
 	trim_low_memory_range();
 
-#ifdef CONFIG_MEMORY_HOTPLUG
+#if defined(CONFIG_MEMORY_HOTPLUG) && defined(CONFIG_X86_32)
 	/*
 	 * Memory used by the kernel cannot be hot-removed because Linux
 	 * cannot migrate the kernel pages. When memory hotplug is
diff --git a/arch/x86/mm/init.c b/arch/x86/mm/init.c
index 385b9cd..003ad77 100644
--- a/arch/x86/mm/init.c
+++ b/arch/x86/mm/init.c
@@ -72,8 +72,6 @@ static unsigned long __initdata pgt_buf_start;
 static unsigned long __initdata pgt_buf_end;
 static unsigned long __initdata pgt_buf_top;
 
-static unsigned long min_pfn_mapped;
-
 static bool __initdata can_use_brk_pgt = true;
 
 static unsigned long min_pfn_allowed;
@@ -532,6 +530,10 @@ static unsigned long __init init_range_memory_mapping(
 	return mapped_ram_size;
 }
 
+#ifdef CONFIG_X86_32
+
+static unsigned long min_pfn_mapped;
+
 static unsigned long __init get_new_step_size(unsigned long step_size)
 {
 	/*
@@ -653,6 +655,32 @@ static void __init memory_map_bottom_up(unsigned long map_start,
 	}
 }
 
+static unsigned long __init init_range_memory_mapping32(
+	unsigned long r_start, unsigned long r_end)
+{
+	/*
+	 * If the allocation is in bottom-up direction, we setup direct mapping
+	 * in bottom-up, otherwise we setup direct mapping in top-down.
+	 */
+	if (memblock_bottom_up()) {
+		unsigned long kernel_end = __pa_symbol(_end);
+
+		/*
+		 * we need two separate calls here. This is because we want to
+		 * allocate page tables above the kernel. So we first map
+		 * [kernel_end, end) to make memory above the kernel be mapped
+		 * as soon as possible. And then use page tables allocated above
+		 * the kernel to map [ISA_END_ADDRESS, kernel_end).
+		 */
+		memory_map_bottom_up(kernel_end, r_end);
+		memory_map_bottom_up(r_start, kernel_end);
+	} else {
+		memory_map_top_down(r_start, r_end);
+	}
+}
+
+#endif
+
 void __init init_mem_mapping(void)
 {
 	unsigned long end;
@@ -663,6 +691,8 @@ void __init init_mem_mapping(void)
 
 #ifdef CONFIG_X86_64
 	end = max_pfn << PAGE_SHIFT;
+	/* allow alloc_low_pages() to allocate from memblock */
+	set_alloc_range(ISA_END_ADDRESS, end);
 #else
 	end = max_low_pfn << PAGE_SHIFT;
 #endif
@@ -673,32 +703,14 @@ void __init init_mem_mapping(void)
 	/* Init the trampoline, possibly with KASLR memory offset */
 	init_trampoline();
 
-	/*
-	 * If the allocation is in bottom-up direction, we setup direct mapping
-	 * in bottom-up, otherwise we setup direct mapping in top-down.
-	 */
-	if (memblock_bottom_up()) {
-		unsigned long kernel_end = __pa_symbol(_end);
-
-		/*
-		 * we need two separate calls here. This is because we want to
-		 * allocate page tables above the kernel. So we first map
-		 * [kernel_end, end) to make memory above the kernel be mapped
-		 * as soon as possible. And then use page tables allocated above
-		 * the kernel to map [ISA_END_ADDRESS, kernel_end).
-		 */
-		memory_map_bottom_up(kernel_end, end);
-		memory_map_bottom_up(ISA_END_ADDRESS, kernel_end);
-	} else {
-		memory_map_top_down(ISA_END_ADDRESS, end);
-	}
-
 #ifdef CONFIG_X86_64
+	init_range_memory_mapping(ISA_END_ADDRESS, end);
 	if (max_pfn > max_low_pfn) {
 		/* can we preseve max_low_pfn ?*/
 		max_low_pfn = max_pfn;
 	}
 #else
+	init_range_memory_mapping32(ISA_END_ADDRESS, end);
 	early_ioremap_page_table_range_init();
 #endif
 
-- 
2.7.4



* [PATCHv2 7/7] x86/mm: isolate the bottom-up style to init_32.c
  2019-01-11  5:12 [PATCHv2 0/7] x86_64/mm: remove bottom-up allocation style by pushing forward the parsing of mem hotplug info Pingfan Liu
                   ` (5 preceding siblings ...)
  2019-01-11  5:12 ` [PATCHv2 6/7] x86/mm: remove bottom-up allocation style for x86_64 Pingfan Liu
@ 2019-01-11  5:12 ` Pingfan Liu
  2019-01-14 23:02 ` [PATCHv2 0/7] x86_64/mm: remove bottom-up allocation style by pushing forward the parsing of mem hotplug info Dave Hansen
  7 siblings, 0 replies; 23+ messages in thread
From: Pingfan Liu @ 2019-01-11  5:12 UTC (permalink / raw)
  To: linux-kernel
  Cc: Pingfan Liu, Thomas Gleixner, Ingo Molnar, Borislav Petkov,
	H. Peter Anvin, Dave Hansen, Andy Lutomirski, Peter Zijlstra,
	Rafael J. Wysocki, Len Brown, Yinghai Lu, Tejun Heo, Chao Fan,
	Baoquan He, Juergen Gross, Andrew Morton, Mike Rapoport,
	Vlastimil Babka, Michal Hocko, x86, linux-acpi, linux-mm

The bottom-up style is no longer needed on x86_64, so isolate it in
init_32.c. Later, it may be removed completely from x86.

Signed-off-by: Pingfan Liu <kernelfans@gmail.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: "Rafael J. Wysocki" <rjw@rjwysocki.net>
Cc: Len Brown <lenb@kernel.org>
Cc: Yinghai Lu <yinghai@kernel.org>
Cc: Tejun Heo <tj@kernel.org>
Cc: Chao Fan <fanc.fnst@cn.fujitsu.com>
Cc: Baoquan He <bhe@redhat.com>
Cc: Juergen Gross <jgross@suse.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Mike Rapoport <rppt@linux.vnet.ibm.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Michal Hocko <mhocko@suse.com>
Cc: x86@kernel.org
Cc: linux-acpi@vger.kernel.org
Cc: linux-mm@kvack.org
---
 arch/x86/mm/init.c        | 153 +---------------------------------------------
 arch/x86/mm/init_32.c     | 147 ++++++++++++++++++++++++++++++++++++++++++++
 arch/x86/mm/mm_internal.h |   8 ++-
 3 files changed, 155 insertions(+), 153 deletions(-)

diff --git a/arch/x86/mm/init.c b/arch/x86/mm/init.c
index 003ad77..6a853e4 100644
--- a/arch/x86/mm/init.c
+++ b/arch/x86/mm/init.c
@@ -502,7 +502,7 @@ unsigned long __ref init_memory_mapping(unsigned long start,
  * That range would have hole in the middle or ends, and only ram parts
  * will be mapped in init_range_memory_mapping().
  */
-static unsigned long __init init_range_memory_mapping(
+unsigned long __init init_range_memory_mapping(
 					   unsigned long r_start,
 					   unsigned long r_end)
 {
@@ -530,157 +530,6 @@ static unsigned long __init init_range_memory_mapping(
 	return mapped_ram_size;
 }
 
-#ifdef CONFIG_X86_32
-
-static unsigned long min_pfn_mapped;
-
-static unsigned long __init get_new_step_size(unsigned long step_size)
-{
-	/*
-	 * Initial mapped size is PMD_SIZE (2M).
-	 * We can not set step_size to be PUD_SIZE (1G) yet.
-	 * In worse case, when we cross the 1G boundary, and
-	 * PG_LEVEL_2M is not set, we will need 1+1+512 pages (2M + 8k)
-	 * to map 1G range with PTE. Hence we use one less than the
-	 * difference of page table level shifts.
-	 *
-	 * Don't need to worry about overflow in the top-down case, on 32bit,
-	 * when step_size is 0, round_down() returns 0 for start, and that
-	 * turns it into 0x100000000ULL.
-	 * In the bottom-up case, round_up(x, 0) returns 0 though too, which
-	 * needs to be taken into consideration by the code below.
-	 */
-	return step_size << (PMD_SHIFT - PAGE_SHIFT - 1);
-}
-
-/**
- * memory_map_top_down - Map [map_start, map_end) top down
- * @map_start: start address of the target memory range
- * @map_end: end address of the target memory range
- *
- * This function will setup direct mapping for memory range
- * [map_start, map_end) in top-down. That said, the page tables
- * will be allocated at the end of the memory, and we map the
- * memory in top-down.
- */
-static void __init memory_map_top_down(unsigned long map_start,
-				       unsigned long map_end)
-{
-	unsigned long real_end, start, last_start;
-	unsigned long step_size;
-	unsigned long addr;
-	unsigned long mapped_ram_size = 0;
-
-	/* xen has big range in reserved near end of ram, skip it at first.*/
-	addr = memblock_find_in_range(map_start, map_end, PMD_SIZE, PMD_SIZE);
-	real_end = addr + PMD_SIZE;
-
-	/* step_size need to be small so pgt_buf from BRK could cover it */
-	step_size = PMD_SIZE;
-	max_pfn_mapped = 0; /* will get exact value next */
-	min_pfn_mapped = real_end >> PAGE_SHIFT;
-	last_start = start = real_end;
-
-	/*
-	 * We start from the top (end of memory) and go to the bottom.
-	 * The memblock_find_in_range() gets us a block of RAM from the
-	 * end of RAM in [min_pfn_mapped, max_pfn_mapped) used as new pages
-	 * for page table.
-	 */
-	while (last_start > map_start) {
-		if (last_start > step_size) {
-			start = round_down(last_start - 1, step_size);
-			if (start < map_start)
-				start = map_start;
-		} else
-			start = map_start;
-		mapped_ram_size += init_range_memory_mapping(start,
-							last_start);
-		set_alloc_range(min_pfn_mapped, max_pfn_mapped);
-		last_start = start;
-		min_pfn_mapped = last_start >> PAGE_SHIFT;
-		if (mapped_ram_size >= step_size)
-			step_size = get_new_step_size(step_size);
-	}
-
-	if (real_end < map_end) {
-		init_range_memory_mapping(real_end, map_end);
-		set_alloc_range(min_pfn_mapped, max_pfn_mapped);
-	}
-}
-
-/**
- * memory_map_bottom_up - Map [map_start, map_end) bottom up
- * @map_start: start address of the target memory range
- * @map_end: end address of the target memory range
- *
- * This function will setup direct mapping for memory range
- * [map_start, map_end) in bottom-up. Since we have limited the
- * bottom-up allocation above the kernel, the page tables will
- * be allocated just above the kernel and we map the memory
- * in [map_start, map_end) in bottom-up.
- */
-static void __init memory_map_bottom_up(unsigned long map_start,
-					unsigned long map_end)
-{
-	unsigned long next, start;
-	unsigned long mapped_ram_size = 0;
-	/* step_size need to be small so pgt_buf from BRK could cover it */
-	unsigned long step_size = PMD_SIZE;
-
-	start = map_start;
-	min_pfn_mapped = start >> PAGE_SHIFT;
-
-	/*
-	 * We start from the bottom (@map_start) and go to the top (@map_end).
-	 * The memblock_find_in_range() gets us a block of RAM from the
-	 * end of RAM in [min_pfn_mapped, max_pfn_mapped) used as new pages
-	 * for page table.
-	 */
-	while (start < map_end) {
-		if (step_size && map_end - start > step_size) {
-			next = round_up(start + 1, step_size);
-			if (next > map_end)
-				next = map_end;
-		} else {
-			next = map_end;
-		}
-
-		mapped_ram_size += init_range_memory_mapping(start, next);
-		set_alloc_range(min_pfn_mapped, max_pfn_mapped);
-		start = next;
-
-		if (mapped_ram_size >= step_size)
-			step_size = get_new_step_size(step_size);
-	}
-}
-
-static unsigned long __init init_range_memory_mapping32(
-	unsigned long r_start, unsigned long r_end)
-{
-	/*
-	 * If the allocation is in bottom-up direction, we setup direct mapping
-	 * in bottom-up, otherwise we setup direct mapping in top-down.
-	 */
-	if (memblock_bottom_up()) {
-		unsigned long kernel_end = __pa_symbol(_end);
-
-		/*
-		 * we need two separate calls here. This is because we want to
-		 * allocate page tables above the kernel. So we first map
-		 * [kernel_end, end) to make memory above the kernel be mapped
-		 * as soon as possible. And then use page tables allocated above
-		 * the kernel to map [ISA_END_ADDRESS, kernel_end).
-		 */
-		memory_map_bottom_up(kernel_end, r_end);
-		memory_map_bottom_up(r_start, kernel_end);
-	} else {
-		memory_map_top_down(r_start, r_end);
-	}
-}
-
-#endif
-
 void __init init_mem_mapping(void)
 {
 	unsigned long end;
diff --git a/arch/x86/mm/init_32.c b/arch/x86/mm/init_32.c
index 49ecf5e..f802678 100644
--- a/arch/x86/mm/init_32.c
+++ b/arch/x86/mm/init_32.c
@@ -550,6 +550,153 @@ void __init early_ioremap_page_table_range_init(void)
 	early_ioremap_reset();
 }
 
+static unsigned long min_pfn_mapped;
+
+static unsigned long __init get_new_step_size(unsigned long step_size)
+{
+	/*
+	 * Initial mapped size is PMD_SIZE (2M).
+	 * We can not set step_size to be PUD_SIZE (1G) yet.
+	 * In worse case, when we cross the 1G boundary, and
+	 * PG_LEVEL_2M is not set, we will need 1+1+512 pages (2M + 8k)
+	 * to map 1G range with PTE. Hence we use one less than the
+	 * difference of page table level shifts.
+	 *
+	 * Don't need to worry about overflow in the top-down case, on 32bit,
+	 * when step_size is 0, round_down() returns 0 for start, and that
+	 * turns it into 0x100000000ULL.
+	 * In the bottom-up case, round_up(x, 0) returns 0 though too, which
+	 * needs to be taken into consideration by the code below.
+	 */
+	return step_size << (PMD_SHIFT - PAGE_SHIFT - 1);
+}
+
+/**
+ * memory_map_top_down - Map [map_start, map_end) top down
+ * @map_start: start address of the target memory range
+ * @map_end: end address of the target memory range
+ *
+ * This function will setup direct mapping for memory range
+ * [map_start, map_end) in top-down. That said, the page tables
+ * will be allocated at the end of the memory, and we map the
+ * memory in top-down.
+ */
+static void __init memory_map_top_down(unsigned long map_start,
+				       unsigned long map_end)
+{
+	unsigned long real_end, start, last_start;
+	unsigned long step_size;
+	unsigned long addr;
+	unsigned long mapped_ram_size = 0;
+
+	/* xen has big range in reserved near end of ram, skip it at first.*/
+	addr = memblock_find_in_range(map_start, map_end, PMD_SIZE, PMD_SIZE);
+	real_end = addr + PMD_SIZE;
+
+	/* step_size need to be small so pgt_buf from BRK could cover it */
+	step_size = PMD_SIZE;
+	max_pfn_mapped = 0; /* will get exact value next */
+	min_pfn_mapped = real_end >> PAGE_SHIFT;
+	last_start = start = real_end;
+
+	/*
+	 * We start from the top (end of memory) and go to the bottom.
+	 * The memblock_find_in_range() gets us a block of RAM from the
+	 * end of RAM in [min_pfn_mapped, max_pfn_mapped) used as new pages
+	 * for page table.
+	 */
+	while (last_start > map_start) {
+		if (last_start > step_size) {
+			start = round_down(last_start - 1, step_size);
+			if (start < map_start)
+				start = map_start;
+		} else
+			start = map_start;
+		mapped_ram_size += init_range_memory_mapping(start,
+							last_start);
+		set_alloc_range(min_pfn_mapped, max_pfn_mapped);
+		last_start = start;
+		min_pfn_mapped = last_start >> PAGE_SHIFT;
+		if (mapped_ram_size >= step_size)
+			step_size = get_new_step_size(step_size);
+	}
+
+	if (real_end < map_end) {
+		init_range_memory_mapping(real_end, map_end);
+		set_alloc_range(min_pfn_mapped, max_pfn_mapped);
+	}
+}
+
+/**
+ * memory_map_bottom_up - Map [map_start, map_end) bottom up
+ * @map_start: start address of the target memory range
+ * @map_end: end address of the target memory range
+ *
+ * This function will setup direct mapping for memory range
+ * [map_start, map_end) in bottom-up. Since we have limited the
+ * bottom-up allocation above the kernel, the page tables will
+ * be allocated just above the kernel and we map the memory
+ * in [map_start, map_end) in bottom-up.
+ */
+static void __init memory_map_bottom_up(unsigned long map_start,
+					unsigned long map_end)
+{
+	unsigned long next, start;
+	unsigned long mapped_ram_size = 0;
+	/* step_size need to be small so pgt_buf from BRK could cover it */
+	unsigned long step_size = PMD_SIZE;
+
+	start = map_start;
+	min_pfn_mapped = start >> PAGE_SHIFT;
+
+	/*
+	 * We start from the bottom (@map_start) and go to the top (@map_end).
+	 * The memblock_find_in_range() gets us a block of RAM from the
+	 * end of RAM in [min_pfn_mapped, max_pfn_mapped) used as new pages
+	 * for page table.
+	 */
+	while (start < map_end) {
+		if (step_size && map_end - start > step_size) {
+			next = round_up(start + 1, step_size);
+			if (next > map_end)
+				next = map_end;
+		} else {
+			next = map_end;
+		}
+
+		mapped_ram_size += init_range_memory_mapping(start, next);
+		set_alloc_range(min_pfn_mapped, max_pfn_mapped);
+		start = next;
+
+		if (mapped_ram_size >= step_size)
+			step_size = get_new_step_size(step_size);
+	}
+}
+
+void __init init_range_memory_mapping32(
+	unsigned long r_start, unsigned long r_end)
+{
+	/*
+	 * If the allocation is in bottom-up direction, we setup direct mapping
+	 * in bottom-up, otherwise we setup direct mapping in top-down.
+	 */
+	if (memblock_bottom_up()) {
+		unsigned long kernel_end = __pa_symbol(_end);
+
+		/*
+		 * we need two separate calls here. This is because we want to
+		 * allocate page tables above the kernel. So we first map
+		 * [kernel_end, end) to make memory above the kernel be mapped
+		 * as soon as possible. And then use page tables allocated above
+		 * the kernel to map [ISA_END_ADDRESS, kernel_end).
+		 */
+		memory_map_bottom_up(kernel_end, r_end);
+		memory_map_bottom_up(r_start, kernel_end);
+	} else {
+		memory_map_top_down(r_start, r_end);
+	}
+}
+
 static void __init pagetable_init(void)
 {
 	pgd_t *pgd_base = swapper_pg_dir;
diff --git a/arch/x86/mm/mm_internal.h b/arch/x86/mm/mm_internal.h
index 4e1f6e1..5ab133c 100644
--- a/arch/x86/mm/mm_internal.h
+++ b/arch/x86/mm/mm_internal.h
@@ -9,7 +9,13 @@ static inline void *alloc_low_page(void)
 }
 
 void early_ioremap_page_table_range_init(void);
-
+void init_range_memory_mapping32(
+					unsigned long r_start,
+					unsigned long r_end);
+void set_alloc_range(unsigned long low, unsigned long high);
+unsigned long __init init_range_memory_mapping(
+					unsigned long r_start,
+					unsigned long r_end);
 unsigned long kernel_physical_mapping_init(unsigned long start,
 					     unsigned long end,
 					     unsigned long page_size_mask);
-- 
2.7.4


^ permalink raw reply related	[flat|nested] 23+ messages in thread
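
A stand-alone illustration of the step-size growth implemented by
get_new_step_size() in the hunks above: the mapping window starts at
PMD_SIZE and is widened by (PMD_SHIFT - PAGE_SHIFT - 1) bits once enough
memory has been mapped to feed the page-table allocator. The snippet
below is plain user-space C written only for illustration (it is not
part of the patch); the shift values assume x86 and the "mapping" is
faked by simply advancing a counter.

#include <stdio.h>

#define PAGE_SHIFT	12
#define PMD_SHIFT	21
#define PMD_SIZE	(1ULL << PMD_SHIFT)	/* 2 MB, the initial step */

/* Mirrors get_new_step_size(): grow by (PMD_SHIFT - PAGE_SHIFT - 1) bits. */
static unsigned long long new_step(unsigned long long step)
{
	return step << (PMD_SHIFT - PAGE_SHIFT - 1);
}

int main(void)
{
	unsigned long long step = PMD_SIZE;
	unsigned long long mapped = 0;
	unsigned long long end = 4ULL << 30;	/* pretend 4 GB must be mapped */

	/* Bottom-up flavour: walk [0, end) in chunks that grow as we go. */
	while (mapped < end) {
		unsigned long long chunk = end - mapped > step ? step : end - mapped;

		mapped += chunk;
		printf("mapped %llu MB with a %llu MB step\n",
		       mapped >> 20, step >> 20);
		if (mapped >= step)
			step = new_step(step);
	}
	return 0;
}

With 4 GB of memory this prints three lines: a first 2 MB chunk, then a
512 MB step, then the rest in one go, which matches the intent of the
comments above: map just enough early on for the BRK-provided page-table
pages, then grow aggressively.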

* Re: [PATCHv2 2/7] acpi: change the topo of acpi_table_upgrade()
  2019-01-11  5:12 ` [PATCHv2 2/7] acpi: change the topo of acpi_table_upgrade() Pingfan Liu
@ 2019-01-11  5:30   ` Chao Fan
  2019-01-11 10:08     ` Pingfan Liu
  2019-01-14 23:12   ` Dave Hansen
  1 sibling, 1 reply; 23+ messages in thread
From: Chao Fan @ 2019-01-11  5:30 UTC (permalink / raw)
  To: Pingfan Liu
  Cc: linux-kernel, Thomas Gleixner, Ingo Molnar, Borislav Petkov,
	H. Peter Anvin, Dave Hansen, Andy Lutomirski, Peter Zijlstra,
	Rafael J. Wysocki, Len Brown, Yinghai Lu, Tejun Heo, Baoquan He,
	Juergen Gross, Andrew Morton, Mike Rapoport, Vlastimil Babka,
	Michal Hocko, x86, linux-acpi, linux-mm

On Fri, Jan 11, 2019 at 01:12:52PM +0800, Pingfan Liu wrote:
>The current acpi_table_upgrade() relies on initrd_start, but this
>variable is only valid after relocate_initrd(). There is a requirement
>to extract the ACPI info from the initrd before the memblock allocator
>can work (see [2/4]), hence acpi_table_upgrade() needs to accept the
>input parameters directly.
>
>Signed-off-by: Pingfan Liu <kernelfans@gmail.com>
>Acked-by: "Rafael J. Wysocki" <rjw@rjwysocki.net>
>Cc: Thomas Gleixner <tglx@linutronix.de>
>Cc: Ingo Molnar <mingo@redhat.com>
>Cc: Borislav Petkov <bp@alien8.de>
>Cc: "H. Peter Anvin" <hpa@zytor.com>
>Cc: Dave Hansen <dave.hansen@linux.intel.com>
>Cc: Andy Lutomirski <luto@kernel.org>
>Cc: Peter Zijlstra <peterz@infradead.org>
>Cc: "Rafael J. Wysocki" <rjw@rjwysocki.net>
>Cc: Len Brown <lenb@kernel.org>
>Cc: Yinghai Lu <yinghai@kernel.org>
>Cc: Tejun Heo <tj@kernel.org>
>Cc: Chao Fan <fanc.fnst@cn.fujitsu.com>
>Cc: Baoquan He <bhe@redhat.com>
>Cc: Juergen Gross <jgross@suse.com>
>Cc: Andrew Morton <akpm@linux-foundation.org>
>Cc: Mike Rapoport <rppt@linux.vnet.ibm.com>
>Cc: Vlastimil Babka <vbabka@suse.cz>
>Cc: Michal Hocko <mhocko@suse.com>
>Cc: x86@kernel.org
>Cc: linux-acpi@vger.kernel.org
>Cc: linux-mm@kvack.org
>---
> arch/arm64/kernel/setup.c | 2 +-
> arch/x86/kernel/setup.c   | 2 +-
> drivers/acpi/tables.c     | 4 +---
> include/linux/acpi.h      | 4 ++--
> 4 files changed, 5 insertions(+), 7 deletions(-)
>
>diff --git a/arch/arm64/kernel/setup.c b/arch/arm64/kernel/setup.c
>index f4fc1e0..bc4b47d 100644
>--- a/arch/arm64/kernel/setup.c
>+++ b/arch/arm64/kernel/setup.c
>@@ -315,7 +315,7 @@ void __init setup_arch(char **cmdline_p)
> 	paging_init();
> 	efi_apply_persistent_mem_reservations();
> 
>-	acpi_table_upgrade();
>+	acpi_table_upgrade((void *)initrd_start, initrd_end - initrd_start);
> 
> 	/* Parse the ACPI tables for possible boot-time configuration */
> 	acpi_boot_table_init();
>diff --git a/arch/x86/kernel/setup.c b/arch/x86/kernel/setup.c
>index ac432ae..dc8fc5d 100644
>--- a/arch/x86/kernel/setup.c
>+++ b/arch/x86/kernel/setup.c
>@@ -1172,8 +1172,8 @@ void __init setup_arch(char **cmdline_p)
> 
> 	reserve_initrd();
> 
>-	acpi_table_upgrade();
> 
I wonder whether this will leave two consecutive blank lines.

Thanks,
Chao Fan

>+	acpi_table_upgrade((void *)initrd_start, initrd_end - initrd_start);
> 	vsmp_init();
> 
> 	io_delay_init();
>diff --git a/drivers/acpi/tables.c b/drivers/acpi/tables.c
>index 61203ee..84e0a79 100644
>--- a/drivers/acpi/tables.c
>+++ b/drivers/acpi/tables.c
>@@ -471,10 +471,8 @@ static DECLARE_BITMAP(acpi_initrd_installed, NR_ACPI_INITRD_TABLES);
> 
> #define MAP_CHUNK_SIZE   (NR_FIX_BTMAPS << PAGE_SHIFT)
> 
>-void __init acpi_table_upgrade(void)
>+void __init acpi_table_upgrade(void *data, size_t size)
> {
>-	void *data = (void *)initrd_start;
>-	size_t size = initrd_end - initrd_start;
> 	int sig, no, table_nr = 0, total_offset = 0;
> 	long offset = 0;
> 	struct acpi_table_header *table;
>diff --git a/include/linux/acpi.h b/include/linux/acpi.h
>index ed80f14..0b6e0b6 100644
>--- a/include/linux/acpi.h
>+++ b/include/linux/acpi.h
>@@ -1254,9 +1254,9 @@ acpi_graph_get_remote_endpoint(const struct fwnode_handle *fwnode,
> #endif
> 
> #ifdef CONFIG_ACPI_TABLE_UPGRADE
>-void acpi_table_upgrade(void);
>+void acpi_table_upgrade(void *data, size_t size);
> #else
>-static inline void acpi_table_upgrade(void) { }
>+static inline void acpi_table_upgrade(void *data, size_t size) { }
> #endif
> 
> #if defined(CONFIG_ACPI) && defined(CONFIG_ACPI_WATCHDOG)
>-- 
>2.7.4
>
>
>



^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCHv2 1/7] x86/mm: concentrate the code to memblock allocator enabled
  2019-01-11  5:12 ` [PATCHv2 1/7] x86/mm: concentrate the code to memblock allocator enabled Pingfan Liu
@ 2019-01-11  6:12   ` Chao Fan
  2019-01-11 10:06     ` Pingfan Liu
       [not found]   ` <96233c0c-940d-8d7c-b3be-d8863c026996@intel.com>
  1 sibling, 1 reply; 23+ messages in thread
From: Chao Fan @ 2019-01-11  6:12 UTC (permalink / raw)
  To: Pingfan Liu
  Cc: linux-kernel, Thomas Gleixner, Ingo Molnar, Borislav Petkov,
	H. Peter Anvin, Dave Hansen, Andy Lutomirski, Peter Zijlstra,
	Rafael J. Wysocki, Len Brown, Yinghai Lu, Tejun Heo, Baoquan He,
	Juergen Gross, Andrew Morton, Mike Rapoport, Vlastimil Babka,
	Michal Hocko, x86, linux-acpi, linux-mm

On Fri, Jan 11, 2019 at 01:12:51PM +0800, Pingfan Liu wrote:
>This patch identifies the point where memblock allocation starts. It
>has no functional change.
[...]
>+#ifdef CONFIG_MEMORY_HOTPLUG
>+	/*
>+	 * Memory used by the kernel cannot be hot-removed because Linux
>+	 * cannot migrate the kernel pages. When memory hotplug is
>+	 * enabled, we should prevent memblock from allocating memory
>+	 * for the kernel.
>+	 *
>+	 * ACPI SRAT records all hotpluggable memory ranges. But before
>+	 * SRAT is parsed, we don't know about it.
>+	 *
>+	 * The kernel image is loaded into memory at very early time. We
>+	 * cannot prevent this anyway. So on NUMA system, we set any
>+	 * node the kernel resides in as un-hotpluggable.
>+	 *
>+	 * Since on modern servers, one node could have double-digit
>+	 * gigabytes memory, we can assume the memory around the kernel
>+	 * image is also un-hotpluggable. So before SRAT is parsed, just
>+	 * allocate memory near the kernel image to try the best to keep
>+	 * the kernel away from hotpluggable memory.
>+	 */
>+	if (movable_node_is_enabled())
>+		memblock_set_bottom_up(true);

Hi Pingfan,

In my understanding, 'movable_node' is based on the assumption that
memory near the kernel is likely to be in the same node as the kernel.

If SRAT has been parsed early, do we still need the kernel parameter
'movable_node'? Since you already have the hot-remove memory
information, I wonder if it is OK to drop 'movable_node' and, when
memory hot-remove is enabled, adjust the memblock allocation according
to SRAT.

If there is something wrong in my understanding, please let me know.

Thanks,
Chao Fan

>+#endif
> 	init_mem_mapping();
>+	memblock_set_current_limit(get_max_mapped());
> 
> 	idt_setup_early_pf();
> 
>@@ -1145,8 +1145,6 @@ void __init setup_arch(char **cmdline_p)
> 	 */
> 	mmu_cr4_features = __read_cr4() & ~X86_CR4_PCIDE;
> 
>-	memblock_set_current_limit(get_max_mapped());
>-
> 	/*
> 	 * NOTE: On x86-32, only from this point on, fixmaps are ready for use.
> 	 */
>-- 
>2.7.4
>
>
>



^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCHv2 1/7] x86/mm: concentrate the code to memblock allocator enabled
  2019-01-11  6:12   ` Chao Fan
@ 2019-01-11 10:06     ` Pingfan Liu
  0 siblings, 0 replies; 23+ messages in thread
From: Pingfan Liu @ 2019-01-11 10:06 UTC (permalink / raw)
  To: Chao Fan
  Cc: linux-kernel, Thomas Gleixner, Ingo Molnar, Borislav Petkov,
	H. Peter Anvin, Dave Hansen, Andy Lutomirski, Peter Zijlstra,
	Rafael J. Wysocki, Len Brown, Yinghai Lu, Tejun Heo, Baoquan He,
	Juergen Gross, Andrew Morton, Mike Rapoport, Vlastimil Babka,
	Michal Hocko, x86, linux-acpi, linux-mm

On Fri, Jan 11, 2019 at 2:13 PM Chao Fan <fanc.fnst@cn.fujitsu.com> wrote:
>
> On Fri, Jan 11, 2019 at 01:12:51PM +0800, Pingfan Liu wrote:
> >This patch identifies the point where memblock allocation starts. It
> >has no functional change.
> [...]
> >+#ifdef CONFIG_MEMORY_HOTPLUG
> >+      /*
> >+       * Memory used by the kernel cannot be hot-removed because Linux
> >+       * cannot migrate the kernel pages. When memory hotplug is
> >+       * enabled, we should prevent memblock from allocating memory
> >+       * for the kernel.
> >+       *
> >+       * ACPI SRAT records all hotpluggable memory ranges. But before
> >+       * SRAT is parsed, we don't know about it.
> >+       *
> >+       * The kernel image is loaded into memory at very early time. We
> >+       * cannot prevent this anyway. So on NUMA system, we set any
> >+       * node the kernel resides in as un-hotpluggable.
> >+       *
> >+       * Since on modern servers, one node could have double-digit
> >+       * gigabytes memory, we can assume the memory around the kernel
> >+       * image is also un-hotpluggable. So before SRAT is parsed, just
> >+       * allocate memory near the kernel image to try the best to keep
> >+       * the kernel away from hotpluggable memory.
> >+       */
> >+      if (movable_node_is_enabled())
> >+              memblock_set_bottom_up(true);
>
> Hi Pingfan,
>
> In my understanding, 'movable_node' is based on the assumption that
> memory near the kernel is likely to be in the same node as the kernel.
>
> If SRAT has been parsed early, do we still need the kernel parameter
> 'movable_node'? Since you already have the hot-remove memory
> information, I wonder if it is OK to drop 'movable_node' and, when
> memory hot-remove is enabled, adjust the memblock allocation according
> to SRAT.
>
x86_32 still needs this logic. Maybe it can be done later.

Thanks,
Pingfan
> If there is something wrong in my understanding, please let me know.
>
> Thanks,
> Chao Fan
>
> >+#endif
> >       init_mem_mapping();
> >+      memblock_set_current_limit(get_max_mapped());
> >
> >       idt_setup_early_pf();
> >
> >@@ -1145,8 +1145,6 @@ void __init setup_arch(char **cmdline_p)
> >        */
> >       mmu_cr4_features = __read_cr4() & ~X86_CR4_PCIDE;
> >
> >-      memblock_set_current_limit(get_max_mapped());
> >-
> >       /*
> >        * NOTE: On x86-32, only from this point on, fixmaps are ready for use.
> >        */
> >--
> >2.7.4
> >
> >
> >
>
>

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCHv2 2/7] acpi: change the topo of acpi_table_upgrade()
  2019-01-11  5:30   ` Chao Fan
@ 2019-01-11 10:08     ` Pingfan Liu
  0 siblings, 0 replies; 23+ messages in thread
From: Pingfan Liu @ 2019-01-11 10:08 UTC (permalink / raw)
  To: Chao Fan
  Cc: linux-kernel, Thomas Gleixner, Ingo Molnar, Borislav Petkov,
	H. Peter Anvin, Dave Hansen, Andy Lutomirski, Peter Zijlstra,
	Rafael J. Wysocki, Len Brown, Yinghai Lu, Tejun Heo, Baoquan He,
	Juergen Gross, Andrew Morton, Mike Rapoport, Vlastimil Babka,
	Michal Hocko, x86, linux-acpi, linux-mm

On Fri, Jan 11, 2019 at 1:31 PM Chao Fan <fanc.fnst@cn.fujitsu.com> wrote:
>
> On Fri, Jan 11, 2019 at 01:12:52PM +0800, Pingfan Liu wrote:
> >The current acpi_table_upgrade() relies on initrd_start, but this
> >variable is only valid after relocate_initrd(). There is a requirement
> >to extract the ACPI info from the initrd before the memblock allocator
> >can work (see [2/4]), hence acpi_table_upgrade() needs to accept the
> >input parameters directly.
> >
> >Signed-off-by: Pingfan Liu <kernelfans@gmail.com>
> >Acked-by: "Rafael J. Wysocki" <rjw@rjwysocki.net>
> >Cc: Thomas Gleixner <tglx@linutronix.de>
> >Cc: Ingo Molnar <mingo@redhat.com>
> >Cc: Borislav Petkov <bp@alien8.de>
> >Cc: "H. Peter Anvin" <hpa@zytor.com>
> >Cc: Dave Hansen <dave.hansen@linux.intel.com>
> >Cc: Andy Lutomirski <luto@kernel.org>
> >Cc: Peter Zijlstra <peterz@infradead.org>
> >Cc: "Rafael J. Wysocki" <rjw@rjwysocki.net>
> >Cc: Len Brown <lenb@kernel.org>
> >Cc: Yinghai Lu <yinghai@kernel.org>
> >Cc: Tejun Heo <tj@kernel.org>
> >Cc: Chao Fan <fanc.fnst@cn.fujitsu.com>
> >Cc: Baoquan He <bhe@redhat.com>
> >Cc: Juergen Gross <jgross@suse.com>
> >Cc: Andrew Morton <akpm@linux-foundation.org>
> >Cc: Mike Rapoport <rppt@linux.vnet.ibm.com>
> >Cc: Vlastimil Babka <vbabka@suse.cz>
> >Cc: Michal Hocko <mhocko@suse.com>
> >Cc: x86@kernel.org
> >Cc: linux-acpi@vger.kernel.org
> >Cc: linux-mm@kvack.org
> >---
> > arch/arm64/kernel/setup.c | 2 +-
> > arch/x86/kernel/setup.c   | 2 +-
> > drivers/acpi/tables.c     | 4 +---
> > include/linux/acpi.h      | 4 ++--
> > 4 files changed, 5 insertions(+), 7 deletions(-)
> >
> >diff --git a/arch/arm64/kernel/setup.c b/arch/arm64/kernel/setup.c
> >index f4fc1e0..bc4b47d 100644
> >--- a/arch/arm64/kernel/setup.c
> >+++ b/arch/arm64/kernel/setup.c
> >@@ -315,7 +315,7 @@ void __init setup_arch(char **cmdline_p)
> >       paging_init();
> >       efi_apply_persistent_mem_reservations();
> >
> >-      acpi_table_upgrade();
> >+      acpi_table_upgrade((void *)initrd_start, initrd_end - initrd_start);
> >
> >       /* Parse the ACPI tables for possible boot-time configuration */
> >       acpi_boot_table_init();
> >diff --git a/arch/x86/kernel/setup.c b/arch/x86/kernel/setup.c
> >index ac432ae..dc8fc5d 100644
> >--- a/arch/x86/kernel/setup.c
> >+++ b/arch/x86/kernel/setup.c
> >@@ -1172,8 +1172,8 @@ void __init setup_arch(char **cmdline_p)
> >
> >       reserve_initrd();
> >
> >-      acpi_table_upgrade();
> >
> I wonder whether this will cause two blank lines together.
>
Yes, I will fix it in the next version.

Thanks,
Pingfan
> Thanks,
> Chao Fan
>
> >+      acpi_table_upgrade((void *)initrd_start, initrd_end - initrd_start);
> >       vsmp_init();
> >
> >       io_delay_init();
> >diff --git a/drivers/acpi/tables.c b/drivers/acpi/tables.c
> >index 61203ee..84e0a79 100644
> >--- a/drivers/acpi/tables.c
> >+++ b/drivers/acpi/tables.c
> >@@ -471,10 +471,8 @@ static DECLARE_BITMAP(acpi_initrd_installed, NR_ACPI_INITRD_TABLES);
> >
> > #define MAP_CHUNK_SIZE   (NR_FIX_BTMAPS << PAGE_SHIFT)
> >
> >-void __init acpi_table_upgrade(void)
> >+void __init acpi_table_upgrade(void *data, size_t size)
> > {
> >-      void *data = (void *)initrd_start;
> >-      size_t size = initrd_end - initrd_start;
> >       int sig, no, table_nr = 0, total_offset = 0;
> >       long offset = 0;
> >       struct acpi_table_header *table;
> >diff --git a/include/linux/acpi.h b/include/linux/acpi.h
> >index ed80f14..0b6e0b6 100644
> >--- a/include/linux/acpi.h
> >+++ b/include/linux/acpi.h
> >@@ -1254,9 +1254,9 @@ acpi_graph_get_remote_endpoint(const struct fwnode_handle *fwnode,
> > #endif
> >
> > #ifdef CONFIG_ACPI_TABLE_UPGRADE
> >-void acpi_table_upgrade(void);
> >+void acpi_table_upgrade(void *data, size_t size);
> > #else
> >-static inline void acpi_table_upgrade(void) { }
> >+static inline void acpi_table_upgrade(void *data, size_t size) { }
> > #endif
> >
> > #if defined(CONFIG_ACPI) && defined(CONFIG_ACPI_WATCHDOG)
> >--
> >2.7.4
> >
> >
> >
>
>

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCHv2 3/7] mm/memblock: introduce allocation boundary for tracing purpose
  2019-01-11  5:12 ` [PATCHv2 3/7] mm/memblock: introduce allocation boundary for tracing purpose Pingfan Liu
@ 2019-01-14  7:51   ` Mike Rapoport
  2019-01-14  8:33     ` Pingfan Liu
  0 siblings, 1 reply; 23+ messages in thread
From: Mike Rapoport @ 2019-01-14  7:51 UTC (permalink / raw)
  To: Pingfan Liu
  Cc: linux-kernel, Thomas Gleixner, Ingo Molnar, Borislav Petkov,
	H. Peter Anvin, Dave Hansen, Andy Lutomirski, Peter Zijlstra,
	Rafael J. Wysocki, Len Brown, Yinghai Lu, Tejun Heo, Chao Fan,
	Baoquan He, Juergen Gross, Andrew Morton, Mike Rapoport,
	Vlastimil Babka, Michal Hocko, x86, linux-acpi, linux-mm

Hi Pingfan,

On Fri, Jan 11, 2019 at 01:12:53PM +0800, Pingfan Liu wrote:
> During boot time, there is a requirement to tell whether a series of
> function calls will consume memory or not. For some reason, a temporary
> memory resource can be lent to those functions through the memblock
> allocator, but at a checkpoint all of the borrowed memory must have
> been returned.
> A typical usage pattern:
>  -1. find a usable range with memblock_find_in_range(), say [A,B]
>  -2. before calling the series of functions, memblock_set_current_limit(A,B,true)
>  -3. call the functions
>  -4. memblock_find_in_range(A,B,B-A,1); if this fails, some memory has
>      not been returned
>  -5. reset the original limit
> 
> E.g. in the case of hotmovable memory, some ACPI routines have to be
> called, and they are not allowed to own any movable memory. Although at
> present these functions do not consume memory, they may do so later if
> changed without awareness. With the above method, such an allocation
> can be detected, and a pr_warn() asks people to resolve it.

To ensure that a sequence of function calls didn't create new memblock
allocations, you can simply check the number of reserved regions before
and after that sequence.

Still, I'm not sure it would be practical to try to track which code
called from x86::setup_arch() did memory allocation.
Probably a better approach is to verify no memory ended up in the movable
areas after their extents are known.
 
> Signed-off-by: Pingfan Liu <kernelfans@gmail.com>
> Cc: Thomas Gleixner <tglx@linutronix.de>
> Cc: Ingo Molnar <mingo@redhat.com>
> Cc: Borislav Petkov <bp@alien8.de>
> Cc: "H. Peter Anvin" <hpa@zytor.com>
> Cc: Dave Hansen <dave.hansen@linux.intel.com>
> Cc: Andy Lutomirski <luto@kernel.org>
> Cc: Peter Zijlstra <peterz@infradead.org>
> Cc: "Rafael J. Wysocki" <rjw@rjwysocki.net>
> Cc: Len Brown <lenb@kernel.org>
> Cc: Yinghai Lu <yinghai@kernel.org>
> Cc: Tejun Heo <tj@kernel.org>
> Cc: Chao Fan <fanc.fnst@cn.fujitsu.com>
> Cc: Baoquan He <bhe@redhat.com>
> Cc: Juergen Gross <jgross@suse.com>
> Cc: Andrew Morton <akpm@linux-foundation.org>
> Cc: Mike Rapoport <rppt@linux.vnet.ibm.com>
> Cc: Vlastimil Babka <vbabka@suse.cz>
> Cc: Michal Hocko <mhocko@suse.com>
> Cc: x86@kernel.org
> Cc: linux-acpi@vger.kernel.org
> Cc: linux-mm@kvack.org
> ---
>  arch/arm/mm/init.c              |  3 ++-
>  arch/arm/mm/mmu.c               |  4 ++--
>  arch/arm/mm/nommu.c             |  2 +-
>  arch/csky/kernel/setup.c        |  2 +-
>  arch/microblaze/mm/init.c       |  2 +-
>  arch/mips/kernel/setup.c        |  2 +-
>  arch/powerpc/mm/40x_mmu.c       |  6 ++++--
>  arch/powerpc/mm/44x_mmu.c       |  2 +-
>  arch/powerpc/mm/8xx_mmu.c       |  2 +-
>  arch/powerpc/mm/fsl_booke_mmu.c |  5 +++--
>  arch/powerpc/mm/hash_utils_64.c |  4 ++--
>  arch/powerpc/mm/init_32.c       |  2 +-
>  arch/powerpc/mm/pgtable-radix.c |  2 +-
>  arch/powerpc/mm/ppc_mmu_32.c    |  8 ++++++--
>  arch/powerpc/mm/tlb_nohash.c    |  6 ++++--
>  arch/unicore32/mm/mmu.c         |  2 +-
>  arch/x86/kernel/setup.c         |  2 +-
>  arch/xtensa/mm/init.c           |  2 +-
>  include/linux/memblock.h        | 10 +++++++---
>  mm/memblock.c                   | 23 ++++++++++++++++++-----
>  20 files changed, 59 insertions(+), 32 deletions(-)
> 
> diff --git a/arch/arm/mm/init.c b/arch/arm/mm/init.c
> index 32e4845..58a4342 100644
> --- a/arch/arm/mm/init.c
> +++ b/arch/arm/mm/init.c
> @@ -93,7 +93,8 @@ __tagtable(ATAG_INITRD2, parse_tag_initrd2);
>  static void __init find_limits(unsigned long *min, unsigned long *max_low,
>  			       unsigned long *max_high)
>  {
> -	*max_low = PFN_DOWN(memblock_get_current_limit());
> +	memblock_get_current_limit(NULL, max_low);
> +	*max_low = PFN_DOWN(*max_low);
>  	*min = PFN_UP(memblock_start_of_DRAM());
>  	*max_high = PFN_DOWN(memblock_end_of_DRAM());
>  }
> diff --git a/arch/arm/mm/mmu.c b/arch/arm/mm/mmu.c
> index f5cc1cc..9025418 100644
> --- a/arch/arm/mm/mmu.c
> +++ b/arch/arm/mm/mmu.c
> @@ -1240,7 +1240,7 @@ void __init adjust_lowmem_bounds(void)
>  		}
>  	}
> 
> -	memblock_set_current_limit(memblock_limit);
> +	memblock_set_current_limit(0, memblock_limit, false);
>  }
> 
>  static inline void prepare_page_table(void)
> @@ -1625,7 +1625,7 @@ void __init paging_init(const struct machine_desc *mdesc)
> 
>  	prepare_page_table();
>  	map_lowmem();
> -	memblock_set_current_limit(arm_lowmem_limit);
> +	memblock_set_current_limit(0, arm_lowmem_limit, false);
>  	dma_contiguous_remap();
>  	early_fixmap_shutdown();
>  	devicemaps_init(mdesc);
> diff --git a/arch/arm/mm/nommu.c b/arch/arm/mm/nommu.c
> index 7d67c70..721535c 100644
> --- a/arch/arm/mm/nommu.c
> +++ b/arch/arm/mm/nommu.c
> @@ -138,7 +138,7 @@ void __init adjust_lowmem_bounds(void)
>  	adjust_lowmem_bounds_mpu();
>  	end = memblock_end_of_DRAM();
>  	high_memory = __va(end - 1) + 1;
> -	memblock_set_current_limit(end);
> +	memblock_set_current_limit(0, end, false);
>  }
> 
>  /*
> diff --git a/arch/csky/kernel/setup.c b/arch/csky/kernel/setup.c
> index dff8b89..e6f88bf 100644
> --- a/arch/csky/kernel/setup.c
> +++ b/arch/csky/kernel/setup.c
> @@ -100,7 +100,7 @@ static void __init csky_memblock_init(void)
> 
>  	highend_pfn = max_pfn;
>  #endif
> -	memblock_set_current_limit(PFN_PHYS(max_low_pfn));
> +	memblock_set_current_limit(0, PFN_PHYS(max_low_pfn), false);
> 
>  	dma_contiguous_reserve(0);
> 
> diff --git a/arch/microblaze/mm/init.c b/arch/microblaze/mm/init.c
> index b17fd8a..cee99da 100644
> --- a/arch/microblaze/mm/init.c
> +++ b/arch/microblaze/mm/init.c
> @@ -353,7 +353,7 @@ asmlinkage void __init mmu_init(void)
>  	/* Shortly after that, the entire linear mapping will be available */
>  	/* This will also cause that unflatten device tree will be allocated
>  	 * inside 768MB limit */
> -	memblock_set_current_limit(memory_start + lowmem_size - 1);
> +	memblock_set_current_limit(0, memory_start + lowmem_size - 1, false);
>  }
> 
>  /* This is only called until mem_init is done. */
> diff --git a/arch/mips/kernel/setup.c b/arch/mips/kernel/setup.c
> index 8c6c48ed..62dabe1 100644
> --- a/arch/mips/kernel/setup.c
> +++ b/arch/mips/kernel/setup.c
> @@ -862,7 +862,7 @@ static void __init arch_mem_init(char **cmdline_p)
>  	 * with memblock_reserve; memblock_alloc* can be used
>  	 * only after this point
>  	 */
> -	memblock_set_current_limit(PFN_PHYS(max_low_pfn));
> +	memblock_set_current_limit(0, PFN_PHYS(max_low_pfn), false);
> 
>  #ifdef CONFIG_PROC_VMCORE
>  	if (setup_elfcorehdr && setup_elfcorehdr_size) {
> diff --git a/arch/powerpc/mm/40x_mmu.c b/arch/powerpc/mm/40x_mmu.c
> index 61ac468..427bb56 100644
> --- a/arch/powerpc/mm/40x_mmu.c
> +++ b/arch/powerpc/mm/40x_mmu.c
> @@ -141,7 +141,7 @@ unsigned long __init mmu_mapin_ram(unsigned long top)
>  	 * coverage with normal-sized pages (or other reasons) do not
>  	 * attempt to allocate outside the allowed range.
>  	 */
> -	memblock_set_current_limit(mapped);
> +	memblock_set_current_limit(0, mapped, false);
> 
>  	return mapped;
>  }
> @@ -155,5 +155,7 @@ void setup_initial_memory_limit(phys_addr_t first_memblock_base,
>  	BUG_ON(first_memblock_base != 0);
> 
>  	/* 40x can only access 16MB at the moment (see head_40x.S) */
> -	memblock_set_current_limit(min_t(u64, first_memblock_size, 0x00800000));
> +	memblock_set_current_limit(0,
> +		min_t(u64, first_memblock_size, 0x00800000),
> +		false);
>  }
> diff --git a/arch/powerpc/mm/44x_mmu.c b/arch/powerpc/mm/44x_mmu.c
> index 12d9251..3cf127d 100644
> --- a/arch/powerpc/mm/44x_mmu.c
> +++ b/arch/powerpc/mm/44x_mmu.c
> @@ -225,7 +225,7 @@ void setup_initial_memory_limit(phys_addr_t first_memblock_base,
> 
>  	/* 44x has a 256M TLB entry pinned at boot */
>  	size = (min_t(u64, first_memblock_size, PPC_PIN_SIZE));
> -	memblock_set_current_limit(first_memblock_base + size);
> +	memblock_set_current_limit(0, first_memblock_base + size, false);
>  }
> 
>  #ifdef CONFIG_SMP
> diff --git a/arch/powerpc/mm/8xx_mmu.c b/arch/powerpc/mm/8xx_mmu.c
> index 01b7f51..c75bca6 100644
> --- a/arch/powerpc/mm/8xx_mmu.c
> +++ b/arch/powerpc/mm/8xx_mmu.c
> @@ -135,7 +135,7 @@ unsigned long __init mmu_mapin_ram(unsigned long top)
>  	 * attempt to allocate outside the allowed range.
>  	 */
>  	if (mapped)
> -		memblock_set_current_limit(mapped);
> +		memblock_set_current_limit(0, mapped, false);
> 
>  	block_mapped_ram = mapped;
> 
> diff --git a/arch/powerpc/mm/fsl_booke_mmu.c b/arch/powerpc/mm/fsl_booke_mmu.c
> index 080d49b..3be24b8 100644
> --- a/arch/powerpc/mm/fsl_booke_mmu.c
> +++ b/arch/powerpc/mm/fsl_booke_mmu.c
> @@ -252,7 +252,8 @@ void __init adjust_total_lowmem(void)
>  	pr_cont("%lu Mb, residual: %dMb\n", tlbcam_sz(tlbcam_index - 1) >> 20,
>  	        (unsigned int)((total_lowmem - __max_low_memory) >> 20));
> 
> -	memblock_set_current_limit(memstart_addr + __max_low_memory);
> +	memblock_set_current_limit(0,
> +		memstart_addr + __max_low_memory, false);
>  }
> 
>  void setup_initial_memory_limit(phys_addr_t first_memblock_base,
> @@ -261,7 +262,7 @@ void setup_initial_memory_limit(phys_addr_t first_memblock_base,
>  	phys_addr_t limit = first_memblock_base + first_memblock_size;
> 
>  	/* 64M mapped initially according to head_fsl_booke.S */
> -	memblock_set_current_limit(min_t(u64, limit, 0x04000000));
> +	memblock_set_current_limit(0, min_t(u64, limit, 0x04000000), false);
>  }
> 
>  #ifdef CONFIG_RELOCATABLE
> diff --git a/arch/powerpc/mm/hash_utils_64.c b/arch/powerpc/mm/hash_utils_64.c
> index 0cc7fbc..30fba80 100644
> --- a/arch/powerpc/mm/hash_utils_64.c
> +++ b/arch/powerpc/mm/hash_utils_64.c
> @@ -925,7 +925,7 @@ static void __init htab_initialize(void)
>  		BUG_ON(htab_bolt_mapping(base, base + size, __pa(base),
>  				prot, mmu_linear_psize, mmu_kernel_ssize));
>  	}
> -	memblock_set_current_limit(MEMBLOCK_ALLOC_ANYWHERE);
> +	memblock_set_current_limit(0, MEMBLOCK_ALLOC_ANYWHERE, false);
> 
>  	/*
>  	 * If we have a memory_limit and we've allocated TCEs then we need to
> @@ -1867,7 +1867,7 @@ void hash__setup_initial_memory_limit(phys_addr_t first_memblock_base,
>  			ppc64_rma_size = min_t(u64, ppc64_rma_size, 0x40000000);
> 
>  		/* Finally limit subsequent allocations */
> -		memblock_set_current_limit(ppc64_rma_size);
> +		memblock_set_current_limit(0, ppc64_rma_size, false);
>  	} else {
>  		ppc64_rma_size = ULONG_MAX;
>  	}
> diff --git a/arch/powerpc/mm/init_32.c b/arch/powerpc/mm/init_32.c
> index 3e59e5d..863d710 100644
> --- a/arch/powerpc/mm/init_32.c
> +++ b/arch/powerpc/mm/init_32.c
> @@ -183,5 +183,5 @@ void __init MMU_init(void)
>  #endif
> 
>  	/* Shortly after that, the entire linear mapping will be available */
> -	memblock_set_current_limit(lowmem_end_addr);
> +	memblock_set_current_limit(0, lowmem_end_addr, false);
>  }
> diff --git a/arch/powerpc/mm/pgtable-radix.c b/arch/powerpc/mm/pgtable-radix.c
> index 9311560..8cd5f2d 100644
> --- a/arch/powerpc/mm/pgtable-radix.c
> +++ b/arch/powerpc/mm/pgtable-radix.c
> @@ -603,7 +603,7 @@ void __init radix__early_init_mmu(void)
>  		radix_init_pseries();
>  	}
> 
> -	memblock_set_current_limit(MEMBLOCK_ALLOC_ANYWHERE);
> +	memblock_set_current_limit(0, MEMBLOCK_ALLOC_ANYWHERE, false);
> 
>  	radix_init_iamr();
>  	radix_init_pgtable();
> diff --git a/arch/powerpc/mm/ppc_mmu_32.c b/arch/powerpc/mm/ppc_mmu_32.c
> index f6f575b..80927ad 100644
> --- a/arch/powerpc/mm/ppc_mmu_32.c
> +++ b/arch/powerpc/mm/ppc_mmu_32.c
> @@ -283,7 +283,11 @@ void setup_initial_memory_limit(phys_addr_t first_memblock_base,
> 
>  	/* 601 can only access 16MB at the moment */
>  	if (PVR_VER(mfspr(SPRN_PVR)) == 1)
> -		memblock_set_current_limit(min_t(u64, first_memblock_size, 0x01000000));
> +		memblock_set_current_limit(0,
> +			min_t(u64, first_memblock_size, 0x01000000),
> +			false);
>  	else /* Anything else has 256M mapped */
> -		memblock_set_current_limit(min_t(u64, first_memblock_size, 0x10000000));
> +		memblock_set_current_limit(0,
> +			min_t(u64, first_memblock_size, 0x10000000),
> +			false);
>  }
> diff --git a/arch/powerpc/mm/tlb_nohash.c b/arch/powerpc/mm/tlb_nohash.c
> index ae5d568..d074362 100644
> --- a/arch/powerpc/mm/tlb_nohash.c
> +++ b/arch/powerpc/mm/tlb_nohash.c
> @@ -735,7 +735,7 @@ static void __init early_mmu_set_memory_limit(void)
>  		 * reduces the memory available to Linux.  We need to
>  		 * do this because highmem is not supported on 64-bit.
>  		 */
> -		memblock_enforce_memory_limit(linear_map_top);
> +		memblock_enforce_memory_limit(0, linear_map_top, false);
>  	}
>  #endif
> 
> @@ -792,7 +792,9 @@ void setup_initial_memory_limit(phys_addr_t first_memblock_base,
>  		ppc64_rma_size = min_t(u64, first_memblock_size, 0x40000000);
> 
>  	/* Finally limit subsequent allocations */
> -	memblock_set_current_limit(first_memblock_base + ppc64_rma_size);
> +	memblock_set_current_limit(0,
> +			first_memblock_base + ppc64_rma_size,
> +			false);
>  }
>  #else /* ! CONFIG_PPC64 */
>  void __init early_init_mmu(void)
> diff --git a/arch/unicore32/mm/mmu.c b/arch/unicore32/mm/mmu.c
> index 040a8c2..6d62529 100644
> --- a/arch/unicore32/mm/mmu.c
> +++ b/arch/unicore32/mm/mmu.c
> @@ -286,7 +286,7 @@ static void __init sanity_check_meminfo(void)
>  	int i, j;
> 
>  	lowmem_limit = __pa(vmalloc_min - 1) + 1;
> -	memblock_set_current_limit(lowmem_limit);
> +	memblock_set_current_limit(0, lowmem_limit, false);
> 
>  	for (i = 0, j = 0; i < meminfo.nr_banks; i++) {
>  		struct membank *bank = &meminfo.bank[j];
> diff --git a/arch/x86/kernel/setup.c b/arch/x86/kernel/setup.c
> index dc8fc5d..a0122cd 100644
> --- a/arch/x86/kernel/setup.c
> +++ b/arch/x86/kernel/setup.c
> @@ -1130,7 +1130,7 @@ void __init setup_arch(char **cmdline_p)
>  		memblock_set_bottom_up(true);
>  #endif
>  	init_mem_mapping();
> -	memblock_set_current_limit(get_max_mapped());
> +	memblock_set_current_limit(0, get_max_mapped(), false);
> 
>  	idt_setup_early_pf();
> 
> diff --git a/arch/xtensa/mm/init.c b/arch/xtensa/mm/init.c
> index 30a48bb..b924387 100644
> --- a/arch/xtensa/mm/init.c
> +++ b/arch/xtensa/mm/init.c
> @@ -60,7 +60,7 @@ void __init bootmem_init(void)
>  	max_pfn = PFN_DOWN(memblock_end_of_DRAM());
>  	max_low_pfn = min(max_pfn, MAX_LOW_PFN);
> 
> -	memblock_set_current_limit(PFN_PHYS(max_low_pfn));
> +	memblock_set_current_limit(0, PFN_PHYS(max_low_pfn), false);
>  	dma_contiguous_reserve(PFN_PHYS(max_low_pfn));
> 
>  	memblock_dump_all();
> diff --git a/include/linux/memblock.h b/include/linux/memblock.h
> index aee299a..49676f0 100644
> --- a/include/linux/memblock.h
> +++ b/include/linux/memblock.h
> @@ -88,6 +88,8 @@ struct memblock_type {
>   */
>  struct memblock {
>  	bool bottom_up;  /* is bottom up direction? */
> +	bool enforce_checking;
> +	phys_addr_t start_limit;
>  	phys_addr_t current_limit;
>  	struct memblock_type memory;
>  	struct memblock_type reserved;
> @@ -482,12 +484,14 @@ static inline void memblock_dump_all(void)
>   * memblock_set_current_limit - Set the current allocation limit to allow
>   *                         limiting allocations to what is currently
>   *                         accessible during boot
> - * @limit: New limit value (physical address)
> + * [start_limit, end_limit]: New limit value (physical address)
> + * enforcing: whether check against the limit boundary or not
>   */
> -void memblock_set_current_limit(phys_addr_t limit);
> +void memblock_set_current_limit(phys_addr_t start_limit,
> +	phys_addr_t end_limit, bool enforcing);
> 
> 
> -phys_addr_t memblock_get_current_limit(void);
> +bool memblock_get_current_limit(phys_addr_t *start, phys_addr_t *end);
> 
>  /*
>   * pfn conversion functions
> diff --git a/mm/memblock.c b/mm/memblock.c
> index 81ae63c..b792be0 100644
> --- a/mm/memblock.c
> +++ b/mm/memblock.c
> @@ -116,6 +116,8 @@ struct memblock memblock __initdata_memblock = {
>  #endif
> 
>  	.bottom_up		= false,
> +	.enforce_checking	= false,
> +	.start_limit		= 0,
>  	.current_limit		= MEMBLOCK_ALLOC_ANYWHERE,
>  };
> 
> @@ -261,8 +263,11 @@ phys_addr_t __init_memblock memblock_find_in_range_node(phys_addr_t size,
>  {
>  	phys_addr_t kernel_end, ret;
> 
> +	if (unlikely(memblock.enforce_checking)) {
> +		start = memblock.start_limit;
> +		end = memblock.current_limit;
>  	/* pump up @end */
> -	if (end == MEMBLOCK_ALLOC_ACCESSIBLE)
> +	} else if (end == MEMBLOCK_ALLOC_ACCESSIBLE)
>  		end = memblock.current_limit;
> 
>  	/* avoid allocating the first page */
> @@ -1826,14 +1831,22 @@ void __init_memblock memblock_trim_memory(phys_addr_t align)
>  	}
>  }
> 
> -void __init_memblock memblock_set_current_limit(phys_addr_t limit)
> +void __init_memblock memblock_set_current_limit(phys_addr_t start,
> +	phys_addr_t end, bool enforcing)
>  {
> -	memblock.current_limit = limit;
> +	memblock.start_limit = start;
> +	memblock.current_limit = end;
> +	memblock.enforce_checking = enforcing;
>  }
> 
> -phys_addr_t __init_memblock memblock_get_current_limit(void)
> +bool __init_memblock memblock_get_current_limit(phys_addr_t *start,
> +	phys_addr_t *end)
>  {
> -	return memblock.current_limit;
> +	if (start)
> +		*start = memblock.start_limit;
> +	if (end)
> +		*end = memblock.current_limit;
> +	return memblock.enforce_checking;
>  }
> 
>  static void __init_memblock memblock_dump(struct memblock_type *type)
> -- 
> 2.7.4
> 

-- 
Sincerely yours,
Mike.


^ permalink raw reply	[flat|nested] 23+ messages in thread
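
The loan-and-verify pattern described in the changelog quoted above
could look roughly like the sketch below. Note that the three-argument
memblock_set_current_limit() and the pointer-based
memblock_get_current_limit() are the interfaces proposed by this series
(not the mainline ones), and that probe_acpi_tables(), the 64 MB window
size and its placement are placeholders chosen only for illustration.

static void __init check_no_unreturned_alloc(void)
{
	phys_addr_t old_start, old_end, start;
	bool old_enforcing;

	/* 1. find a usable temporary window [A, B] */
	start = memblock_find_in_range(ISA_END_ADDRESS,
				       memblock_end_of_DRAM(),
				       SZ_64M, PAGE_SIZE);
	if (!start)
		return;

	/* 2. restrict and enforce all allocations to that window */
	old_enforcing = memblock_get_current_limit(&old_start, &old_end);
	memblock_set_current_limit(start, start + SZ_64M, true);

	/* 3. call the functions that must not keep any memory */
	probe_acpi_tables();

	/* 4. the whole window must still be free afterwards */
	if (!memblock_find_in_range(start, start + SZ_64M, SZ_64M, PAGE_SIZE))
		pr_warn("early ACPI parsing kept memblock memory\n");

	/* 5. restore the original limit */
	memblock_set_current_limit(old_start, old_end, old_enforcing);
}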

* Re: [PATCHv2 3/7] mm/memblock: introduce allocation boundary for tracing purpose
  2019-01-14  7:51   ` Mike Rapoport
@ 2019-01-14  8:33     ` Pingfan Liu
  2019-01-14  8:50       ` Mike Rapoport
  0 siblings, 1 reply; 23+ messages in thread
From: Pingfan Liu @ 2019-01-14  8:33 UTC (permalink / raw)
  To: Mike Rapoport
  Cc: linux-kernel, Thomas Gleixner, Ingo Molnar, Borislav Petkov,
	H. Peter Anvin, Dave Hansen, Andy Lutomirski, Peter Zijlstra,
	Rafael J. Wysocki, Len Brown, Yinghai Lu, Tejun Heo, Chao Fan,
	Baoquan He, Juergen Gross, Andrew Morton, Mike Rapoport,
	Vlastimil Babka, Michal Hocko, x86, linux-acpi, linux-mm

On Mon, Jan 14, 2019 at 3:51 PM Mike Rapoport <rppt@linux.ibm.com> wrote:
>
> Hi Pingfan,
>
> On Fri, Jan 11, 2019 at 01:12:53PM +0800, Pingfan Liu wrote:
> > During boot time, there is a requirement to tell whether a series of
> > function calls will consume memory or not. For some reason, a temporary
> > memory resource can be lent to those functions through the memblock
> > allocator, but at a checkpoint all of the borrowed memory must have
> > been returned.
> > A typical usage pattern:
> >  -1. find a usable range with memblock_find_in_range(), say [A,B]
> >  -2. before calling the series of functions, memblock_set_current_limit(A,B,true)
> >  -3. call the functions
> >  -4. memblock_find_in_range(A,B,B-A,1); if this fails, some memory has
> >      not been returned
> >  -5. reset the original limit
> >
> > E.g. in the case of hotmovable memory, some ACPI routines have to be
> > called, and they are not allowed to own any movable memory. Although at
> > present these functions do not consume memory, they may do so later if
> > changed without awareness. With the above method, such an allocation
> > can be detected, and a pr_warn() asks people to resolve it.
>
> To ensure that a sequence of function calls didn't create new memblock
> allocations, you can simply check the number of reserved regions before
> and after that sequence.
>
Yes, thank you for pointing it out.

> Still, I'm not sure it would be practical to try to track which code
> called from x86::setup_arch() did memory allocation.
> Probably a better approach is to verify no memory ended up in the movable
> areas after their extents are known.
>
It is a matter of probability whether the allocated memory sits in
hotmovable memory or not. And a warning based on that verification is
also probabilistic, so we may miss it.

Thanks and regards,
Pingfan

> > Signed-off-by: Pingfan Liu <kernelfans@gmail.com>
> > Cc: Thomas Gleixner <tglx@linutronix.de>
> > Cc: Ingo Molnar <mingo@redhat.com>
> > Cc: Borislav Petkov <bp@alien8.de>
> > Cc: "H. Peter Anvin" <hpa@zytor.com>
> > Cc: Dave Hansen <dave.hansen@linux.intel.com>
> > Cc: Andy Lutomirski <luto@kernel.org>
> > Cc: Peter Zijlstra <peterz@infradead.org>
> > Cc: "Rafael J. Wysocki" <rjw@rjwysocki.net>
> > Cc: Len Brown <lenb@kernel.org>
> > Cc: Yinghai Lu <yinghai@kernel.org>
> > Cc: Tejun Heo <tj@kernel.org>
> > Cc: Chao Fan <fanc.fnst@cn.fujitsu.com>
> > Cc: Baoquan He <bhe@redhat.com>
> > Cc: Juergen Gross <jgross@suse.com>
> > Cc: Andrew Morton <akpm@linux-foundation.org>
> > Cc: Mike Rapoport <rppt@linux.vnet.ibm.com>
> > Cc: Vlastimil Babka <vbabka@suse.cz>
> > Cc: Michal Hocko <mhocko@suse.com>
> > Cc: x86@kernel.org
> > Cc: linux-acpi@vger.kernel.org
> > Cc: linux-mm@kvack.org
> > ---
> >  arch/arm/mm/init.c              |  3 ++-
> >  arch/arm/mm/mmu.c               |  4 ++--
> >  arch/arm/mm/nommu.c             |  2 +-
> >  arch/csky/kernel/setup.c        |  2 +-
> >  arch/microblaze/mm/init.c       |  2 +-
> >  arch/mips/kernel/setup.c        |  2 +-
> >  arch/powerpc/mm/40x_mmu.c       |  6 ++++--
> >  arch/powerpc/mm/44x_mmu.c       |  2 +-
> >  arch/powerpc/mm/8xx_mmu.c       |  2 +-
> >  arch/powerpc/mm/fsl_booke_mmu.c |  5 +++--
> >  arch/powerpc/mm/hash_utils_64.c |  4 ++--
> >  arch/powerpc/mm/init_32.c       |  2 +-
> >  arch/powerpc/mm/pgtable-radix.c |  2 +-
> >  arch/powerpc/mm/ppc_mmu_32.c    |  8 ++++++--
> >  arch/powerpc/mm/tlb_nohash.c    |  6 ++++--
> >  arch/unicore32/mm/mmu.c         |  2 +-
> >  arch/x86/kernel/setup.c         |  2 +-
> >  arch/xtensa/mm/init.c           |  2 +-
> >  include/linux/memblock.h        | 10 +++++++---
> >  mm/memblock.c                   | 23 ++++++++++++++++++-----
> >  20 files changed, 59 insertions(+), 32 deletions(-)
> >
> > diff --git a/arch/arm/mm/init.c b/arch/arm/mm/init.c
> > index 32e4845..58a4342 100644
> > --- a/arch/arm/mm/init.c
> > +++ b/arch/arm/mm/init.c
> > @@ -93,7 +93,8 @@ __tagtable(ATAG_INITRD2, parse_tag_initrd2);
> >  static void __init find_limits(unsigned long *min, unsigned long *max_low,
> >                              unsigned long *max_high)
> >  {
> > -     *max_low = PFN_DOWN(memblock_get_current_limit());
> > +     memblock_get_current_limit(NULL, max_low);
> > +     *max_low = PFN_DOWN(*max_low);
> >       *min = PFN_UP(memblock_start_of_DRAM());
> >       *max_high = PFN_DOWN(memblock_end_of_DRAM());
> >  }
> > diff --git a/arch/arm/mm/mmu.c b/arch/arm/mm/mmu.c
> > index f5cc1cc..9025418 100644
> > --- a/arch/arm/mm/mmu.c
> > +++ b/arch/arm/mm/mmu.c
> > @@ -1240,7 +1240,7 @@ void __init adjust_lowmem_bounds(void)
> >               }
> >       }
> >
> > -     memblock_set_current_limit(memblock_limit);
> > +     memblock_set_current_limit(0, memblock_limit, false);
> >  }
> >
> >  static inline void prepare_page_table(void)
> > @@ -1625,7 +1625,7 @@ void __init paging_init(const struct machine_desc *mdesc)
> >
> >       prepare_page_table();
> >       map_lowmem();
> > -     memblock_set_current_limit(arm_lowmem_limit);
> > +     memblock_set_current_limit(0, arm_lowmem_limit, false);
> >       dma_contiguous_remap();
> >       early_fixmap_shutdown();
> >       devicemaps_init(mdesc);
> > diff --git a/arch/arm/mm/nommu.c b/arch/arm/mm/nommu.c
> > index 7d67c70..721535c 100644
> > --- a/arch/arm/mm/nommu.c
> > +++ b/arch/arm/mm/nommu.c
> > @@ -138,7 +138,7 @@ void __init adjust_lowmem_bounds(void)
> >       adjust_lowmem_bounds_mpu();
> >       end = memblock_end_of_DRAM();
> >       high_memory = __va(end - 1) + 1;
> > -     memblock_set_current_limit(end);
> > +     memblock_set_current_limit(0, end, false);
> >  }
> >
> >  /*
> > diff --git a/arch/csky/kernel/setup.c b/arch/csky/kernel/setup.c
> > index dff8b89..e6f88bf 100644
> > --- a/arch/csky/kernel/setup.c
> > +++ b/arch/csky/kernel/setup.c
> > @@ -100,7 +100,7 @@ static void __init csky_memblock_init(void)
> >
> >       highend_pfn = max_pfn;
> >  #endif
> > -     memblock_set_current_limit(PFN_PHYS(max_low_pfn));
> > +     memblock_set_current_limit(0, PFN_PHYS(max_low_pfn), false);
> >
> >       dma_contiguous_reserve(0);
> >
> > diff --git a/arch/microblaze/mm/init.c b/arch/microblaze/mm/init.c
> > index b17fd8a..cee99da 100644
> > --- a/arch/microblaze/mm/init.c
> > +++ b/arch/microblaze/mm/init.c
> > @@ -353,7 +353,7 @@ asmlinkage void __init mmu_init(void)
> >       /* Shortly after that, the entire linear mapping will be available */
> >       /* This will also cause that unflatten device tree will be allocated
> >        * inside 768MB limit */
> > -     memblock_set_current_limit(memory_start + lowmem_size - 1);
> > +     memblock_set_current_limit(0, memory_start + lowmem_size - 1, false);
> >  }
> >
> >  /* This is only called until mem_init is done. */
> > diff --git a/arch/mips/kernel/setup.c b/arch/mips/kernel/setup.c
> > index 8c6c48ed..62dabe1 100644
> > --- a/arch/mips/kernel/setup.c
> > +++ b/arch/mips/kernel/setup.c
> > @@ -862,7 +862,7 @@ static void __init arch_mem_init(char **cmdline_p)
> >        * with memblock_reserve; memblock_alloc* can be used
> >        * only after this point
> >        */
> > -     memblock_set_current_limit(PFN_PHYS(max_low_pfn));
> > +     memblock_set_current_limit(0, PFN_PHYS(max_low_pfn), false);
> >
> >  #ifdef CONFIG_PROC_VMCORE
> >       if (setup_elfcorehdr && setup_elfcorehdr_size) {
> > diff --git a/arch/powerpc/mm/40x_mmu.c b/arch/powerpc/mm/40x_mmu.c
> > index 61ac468..427bb56 100644
> > --- a/arch/powerpc/mm/40x_mmu.c
> > +++ b/arch/powerpc/mm/40x_mmu.c
> > @@ -141,7 +141,7 @@ unsigned long __init mmu_mapin_ram(unsigned long top)
> >        * coverage with normal-sized pages (or other reasons) do not
> >        * attempt to allocate outside the allowed range.
> >        */
> > -     memblock_set_current_limit(mapped);
> > +     memblock_set_current_limit(0, mapped, false);
> >
> >       return mapped;
> >  }
> > @@ -155,5 +155,7 @@ void setup_initial_memory_limit(phys_addr_t first_memblock_base,
> >       BUG_ON(first_memblock_base != 0);
> >
> >       /* 40x can only access 16MB at the moment (see head_40x.S) */
> > -     memblock_set_current_limit(min_t(u64, first_memblock_size, 0x00800000));
> > +     memblock_set_current_limit(0,
> > +             min_t(u64, first_memblock_size, 0x00800000),
> > +             false);
> >  }
> > diff --git a/arch/powerpc/mm/44x_mmu.c b/arch/powerpc/mm/44x_mmu.c
> > index 12d9251..3cf127d 100644
> > --- a/arch/powerpc/mm/44x_mmu.c
> > +++ b/arch/powerpc/mm/44x_mmu.c
> > @@ -225,7 +225,7 @@ void setup_initial_memory_limit(phys_addr_t first_memblock_base,
> >
> >       /* 44x has a 256M TLB entry pinned at boot */
> >       size = (min_t(u64, first_memblock_size, PPC_PIN_SIZE));
> > -     memblock_set_current_limit(first_memblock_base + size);
> > +     memblock_set_current_limit(0, first_memblock_base + size, false);
> >  }
> >
> >  #ifdef CONFIG_SMP
> > diff --git a/arch/powerpc/mm/8xx_mmu.c b/arch/powerpc/mm/8xx_mmu.c
> > index 01b7f51..c75bca6 100644
> > --- a/arch/powerpc/mm/8xx_mmu.c
> > +++ b/arch/powerpc/mm/8xx_mmu.c
> > @@ -135,7 +135,7 @@ unsigned long __init mmu_mapin_ram(unsigned long top)
> >        * attempt to allocate outside the allowed range.
> >        */
> >       if (mapped)
> > -             memblock_set_current_limit(mapped);
> > +             memblock_set_current_limit(0, mapped, false);
> >
> >       block_mapped_ram = mapped;
> >
> > diff --git a/arch/powerpc/mm/fsl_booke_mmu.c b/arch/powerpc/mm/fsl_booke_mmu.c
> > index 080d49b..3be24b8 100644
> > --- a/arch/powerpc/mm/fsl_booke_mmu.c
> > +++ b/arch/powerpc/mm/fsl_booke_mmu.c
> > @@ -252,7 +252,8 @@ void __init adjust_total_lowmem(void)
> >       pr_cont("%lu Mb, residual: %dMb\n", tlbcam_sz(tlbcam_index - 1) >> 20,
> >               (unsigned int)((total_lowmem - __max_low_memory) >> 20));
> >
> > -     memblock_set_current_limit(memstart_addr + __max_low_memory);
> > +     memblock_set_current_limit(0,
> > +             memstart_addr + __max_low_memory, false);
> >  }
> >
> >  void setup_initial_memory_limit(phys_addr_t first_memblock_base,
> > @@ -261,7 +262,7 @@ void setup_initial_memory_limit(phys_addr_t first_memblock_base,
> >       phys_addr_t limit = first_memblock_base + first_memblock_size;
> >
> >       /* 64M mapped initially according to head_fsl_booke.S */
> > -     memblock_set_current_limit(min_t(u64, limit, 0x04000000));
> > +     memblock_set_current_limit(0, min_t(u64, limit, 0x04000000), false);
> >  }
> >
> >  #ifdef CONFIG_RELOCATABLE
> > diff --git a/arch/powerpc/mm/hash_utils_64.c b/arch/powerpc/mm/hash_utils_64.c
> > index 0cc7fbc..30fba80 100644
> > --- a/arch/powerpc/mm/hash_utils_64.c
> > +++ b/arch/powerpc/mm/hash_utils_64.c
> > @@ -925,7 +925,7 @@ static void __init htab_initialize(void)
> >               BUG_ON(htab_bolt_mapping(base, base + size, __pa(base),
> >                               prot, mmu_linear_psize, mmu_kernel_ssize));
> >       }
> > -     memblock_set_current_limit(MEMBLOCK_ALLOC_ANYWHERE);
> > +     memblock_set_current_limit(0, MEMBLOCK_ALLOC_ANYWHERE, false);
> >
> >       /*
> >        * If we have a memory_limit and we've allocated TCEs then we need to
> > @@ -1867,7 +1867,7 @@ void hash__setup_initial_memory_limit(phys_addr_t first_memblock_base,
> >                       ppc64_rma_size = min_t(u64, ppc64_rma_size, 0x40000000);
> >
> >               /* Finally limit subsequent allocations */
> > -             memblock_set_current_limit(ppc64_rma_size);
> > +             memblock_set_current_limit(0, ppc64_rma_size, false);
> >       } else {
> >               ppc64_rma_size = ULONG_MAX;
> >       }
> > diff --git a/arch/powerpc/mm/init_32.c b/arch/powerpc/mm/init_32.c
> > index 3e59e5d..863d710 100644
> > --- a/arch/powerpc/mm/init_32.c
> > +++ b/arch/powerpc/mm/init_32.c
> > @@ -183,5 +183,5 @@ void __init MMU_init(void)
> >  #endif
> >
> >       /* Shortly after that, the entire linear mapping will be available */
> > -     memblock_set_current_limit(lowmem_end_addr);
> > +     memblock_set_current_limit(0, lowmem_end_addr, false);
> >  }
> > diff --git a/arch/powerpc/mm/pgtable-radix.c b/arch/powerpc/mm/pgtable-radix.c
> > index 9311560..8cd5f2d 100644
> > --- a/arch/powerpc/mm/pgtable-radix.c
> > +++ b/arch/powerpc/mm/pgtable-radix.c
> > @@ -603,7 +603,7 @@ void __init radix__early_init_mmu(void)
> >               radix_init_pseries();
> >       }
> >
> > -     memblock_set_current_limit(MEMBLOCK_ALLOC_ANYWHERE);
> > +     memblock_set_current_limit(0, MEMBLOCK_ALLOC_ANYWHERE, false);
> >
> >       radix_init_iamr();
> >       radix_init_pgtable();
> > diff --git a/arch/powerpc/mm/ppc_mmu_32.c b/arch/powerpc/mm/ppc_mmu_32.c
> > index f6f575b..80927ad 100644
> > --- a/arch/powerpc/mm/ppc_mmu_32.c
> > +++ b/arch/powerpc/mm/ppc_mmu_32.c
> > @@ -283,7 +283,11 @@ void setup_initial_memory_limit(phys_addr_t first_memblock_base,
> >
> >       /* 601 can only access 16MB at the moment */
> >       if (PVR_VER(mfspr(SPRN_PVR)) == 1)
> > -             memblock_set_current_limit(min_t(u64, first_memblock_size, 0x01000000));
> > +             memblock_set_current_limit(0,
> > +                     min_t(u64, first_memblock_size, 0x01000000),
> > +                     false);
> >       else /* Anything else has 256M mapped */
> > -             memblock_set_current_limit(min_t(u64, first_memblock_size, 0x10000000));
> > +             memblock_set_current_limit(0,
> > +                     min_t(u64, first_memblock_size, 0x10000000),
> > +                     false);
> >  }
> > diff --git a/arch/powerpc/mm/tlb_nohash.c b/arch/powerpc/mm/tlb_nohash.c
> > index ae5d568..d074362 100644
> > --- a/arch/powerpc/mm/tlb_nohash.c
> > +++ b/arch/powerpc/mm/tlb_nohash.c
> > @@ -735,7 +735,7 @@ static void __init early_mmu_set_memory_limit(void)
> >                * reduces the memory available to Linux.  We need to
> >                * do this because highmem is not supported on 64-bit.
> >                */
> > -             memblock_enforce_memory_limit(linear_map_top);
> > +             memblock_enforce_memory_limit(0, linear_map_top, false);
> >       }
> >  #endif
> >
> > @@ -792,7 +792,9 @@ void setup_initial_memory_limit(phys_addr_t first_memblock_base,
> >               ppc64_rma_size = min_t(u64, first_memblock_size, 0x40000000);
> >
> >       /* Finally limit subsequent allocations */
> > -     memblock_set_current_limit(first_memblock_base + ppc64_rma_size);
> > +     memblock_set_current_limit(0,
> > +                     first_memblock_base + ppc64_rma_size,
> > +                     false);
> >  }
> >  #else /* ! CONFIG_PPC64 */
> >  void __init early_init_mmu(void)
> > diff --git a/arch/unicore32/mm/mmu.c b/arch/unicore32/mm/mmu.c
> > index 040a8c2..6d62529 100644
> > --- a/arch/unicore32/mm/mmu.c
> > +++ b/arch/unicore32/mm/mmu.c
> > @@ -286,7 +286,7 @@ static void __init sanity_check_meminfo(void)
> >       int i, j;
> >
> >       lowmem_limit = __pa(vmalloc_min - 1) + 1;
> > -     memblock_set_current_limit(lowmem_limit);
> > +     memblock_set_current_limit(0, lowmem_limit, false);
> >
> >       for (i = 0, j = 0; i < meminfo.nr_banks; i++) {
> >               struct membank *bank = &meminfo.bank[j];
> > diff --git a/arch/x86/kernel/setup.c b/arch/x86/kernel/setup.c
> > index dc8fc5d..a0122cd 100644
> > --- a/arch/x86/kernel/setup.c
> > +++ b/arch/x86/kernel/setup.c
> > @@ -1130,7 +1130,7 @@ void __init setup_arch(char **cmdline_p)
> >               memblock_set_bottom_up(true);
> >  #endif
> >       init_mem_mapping();
> > -     memblock_set_current_limit(get_max_mapped());
> > +     memblock_set_current_limit(0, get_max_mapped(), false);
> >
> >       idt_setup_early_pf();
> >
> > diff --git a/arch/xtensa/mm/init.c b/arch/xtensa/mm/init.c
> > index 30a48bb..b924387 100644
> > --- a/arch/xtensa/mm/init.c
> > +++ b/arch/xtensa/mm/init.c
> > @@ -60,7 +60,7 @@ void __init bootmem_init(void)
> >       max_pfn = PFN_DOWN(memblock_end_of_DRAM());
> >       max_low_pfn = min(max_pfn, MAX_LOW_PFN);
> >
> > -     memblock_set_current_limit(PFN_PHYS(max_low_pfn));
> > +     memblock_set_current_limit(0, PFN_PHYS(max_low_pfn), false);
> >       dma_contiguous_reserve(PFN_PHYS(max_low_pfn));
> >
> >       memblock_dump_all();
> > diff --git a/include/linux/memblock.h b/include/linux/memblock.h
> > index aee299a..49676f0 100644
> > --- a/include/linux/memblock.h
> > +++ b/include/linux/memblock.h
> > @@ -88,6 +88,8 @@ struct memblock_type {
> >   */
> >  struct memblock {
> >       bool bottom_up;  /* is bottom up direction? */
> > +     bool enforce_checking;
> > +     phys_addr_t start_limit;
> >       phys_addr_t current_limit;
> >       struct memblock_type memory;
> >       struct memblock_type reserved;
> > @@ -482,12 +484,14 @@ static inline void memblock_dump_all(void)
> >   * memblock_set_current_limit - Set the current allocation limit to allow
> >   *                         limiting allocations to what is currently
> >   *                         accessible during boot
> > - * @limit: New limit value (physical address)
> > + * [start_limit, end_limit]: New limit value (physical address)
> > + * enforcing: whether check against the limit boundary or not
> >   */
> > -void memblock_set_current_limit(phys_addr_t limit);
> > +void memblock_set_current_limit(phys_addr_t start_limit,
> > +     phys_addr_t end_limit, bool enforcing);
> >
> >
> > -phys_addr_t memblock_get_current_limit(void);
> > +bool memblock_get_current_limit(phys_addr_t *start, phys_addr_t *end);
> >
> >  /*
> >   * pfn conversion functions
> > diff --git a/mm/memblock.c b/mm/memblock.c
> > index 81ae63c..b792be0 100644
> > --- a/mm/memblock.c
> > +++ b/mm/memblock.c
> > @@ -116,6 +116,8 @@ struct memblock memblock __initdata_memblock = {
> >  #endif
> >
> >       .bottom_up              = false,
> > +     .enforce_checking       = false,
> > +     .start_limit            = 0,
> >       .current_limit          = MEMBLOCK_ALLOC_ANYWHERE,
> >  };
> >
> > @@ -261,8 +263,11 @@ phys_addr_t __init_memblock memblock_find_in_range_node(phys_addr_t size,
> >  {
> >       phys_addr_t kernel_end, ret;
> >
> > +     if (unlikely(memblock.enforce_checking)) {
> > +             start = memblock.start_limit;
> > +             end = memblock.current_limit;
> >       /* pump up @end */
> > -     if (end == MEMBLOCK_ALLOC_ACCESSIBLE)
> > +     } else if (end == MEMBLOCK_ALLOC_ACCESSIBLE)
> >               end = memblock.current_limit;
> >
> >       /* avoid allocating the first page */
> > @@ -1826,14 +1831,22 @@ void __init_memblock memblock_trim_memory(phys_addr_t align)
> >       }
> >  }
> >
> > -void __init_memblock memblock_set_current_limit(phys_addr_t limit)
> > +void __init_memblock memblock_set_current_limit(phys_addr_t start,
> > +     phys_addr_t end, bool enforcing)
> >  {
> > -     memblock.current_limit = limit;
> > +     memblock.start_limit = start;
> > +     memblock.current_limit = end;
> > +     memblock.enforce_checking = enforcing;
> >  }
> >
> > -phys_addr_t __init_memblock memblock_get_current_limit(void)
> > +bool __init_memblock memblock_get_current_limit(phys_addr_t *start,
> > +     phys_addr_t *end)
> >  {
> > -     return memblock.current_limit;
> > +     if (start)
> > +             *start = memblock.start_limit;
> > +     if (end)
> > +             *end = memblock.current_limit;
> > +     return memblock.enforce_checking;
> >  }
> >
> >  static void __init_memblock memblock_dump(struct memblock_type *type)
> > --
> > 2.7.4
> >
>
> --
> Sincerely yours,
> Mike.
>

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCHv2 3/7] mm/memblock: introduce allocation boundary for tracing purpose
  2019-01-14  8:33     ` Pingfan Liu
@ 2019-01-14  8:50       ` Mike Rapoport
  2019-01-14  9:13         ` Pingfan Liu
  0 siblings, 1 reply; 23+ messages in thread
From: Mike Rapoport @ 2019-01-14  8:50 UTC (permalink / raw)
  To: Pingfan Liu
  Cc: linux-kernel, Thomas Gleixner, Ingo Molnar, Borislav Petkov,
	H. Peter Anvin, Dave Hansen, Andy Lutomirski, Peter Zijlstra,
	Rafael J. Wysocki, Len Brown, Yinghai Lu, Tejun Heo, Chao Fan,
	Baoquan He, Juergen Gross, Andrew Morton, Mike Rapoport,
	Vlastimil Babka, Michal Hocko, x86, linux-acpi, linux-mm

On Mon, Jan 14, 2019 at 04:33:50PM +0800, Pingfan Liu wrote:
> On Mon, Jan 14, 2019 at 3:51 PM Mike Rapoport <rppt@linux.ibm.com> wrote:
> >
> > Hi Pingfan,
> >
> > On Fri, Jan 11, 2019 at 01:12:53PM +0800, Pingfan Liu wrote:
> > > During boot time, there is requirement to tell whether a series of func
> > > call will consume memory or not. For some reason, a temporary memory
> > > resource can be loan to those func through memblock allocator, but at a
> > > check point, all of the loan memory should be turned back.
> > > A typical using style:
> > >  -1. find a usable range by memblock_find_in_range(), said, [A,B]
> > >  -2. before calling a series of func, memblock_set_current_limit(A,B,true)
> > >  -3. call funcs
> > >  -4. memblock_find_in_range(A,B,B-A,1), if failed, then some memory is not
> > >      turned back.
> > >  -5. reset the original limit
> > >
> > > E.g. in the case of hotmovable memory, some acpi routines should be called,
> > > and they are not allowed to own some movable memory. Although at present
> > > these functions do not consume memory, but later, if changed without
> > > awareness, they may do. With the above method, the allocation can be
> > > detected, and pr_warn() to ask people to resolve it.
> >
> > To ensure there were that a sequence of function calls didn't create new
> > memblock allocations you can simply check the number of the reserved
> > regions before and after that sequence.
> >
> Yes, thank you point out it.
> 
> > Still, I'm not sure it would be practical to try tracking what code that's called
> > from x86::setup_arch() did memory allocation.
> > Probably a better approach is to verify no memory ended up in the movable
> > areas after their extents are known.
> >
> It is a probability problem whether allocated memory sit on hotmovable
> memory or not. And if warning based on the verification, then it is
> also a probability problem and maybe we will miss it.

I'm not sure I'm following you here.

After the hotmovable memory configuration is detected, it is possible to
traverse the reserved memblock regions and warn if any of them reside in
hotmovable memory.
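
Just to illustrate the idea (this is not part of the series, and the
function name is made up), such a check could look roughly like:

static void __init check_reserved_in_hotpluggable(void)
{
        struct memblock_region *res, *mem;

        /* walk every reserved range ... */
        for_each_memblock(reserved, res) {
                phys_addr_t res_end = res->base + res->size;

                /*
                 * ... and warn if it overlaps a range marked by
                 * memblock_mark_hotplug() during SRAT parsing.
                 */
                for_each_memblock(memory, mem) {
                        if (!memblock_is_hotpluggable(mem))
                                continue;
                        if (res->base < mem->base + mem->size &&
                            res_end > mem->base)
                                pr_warn("reservation %pa-%pa overlaps hotpluggable memory\n",
                                        &res->base, &res_end);
                }
        }
}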

> Thanks and regards,
> Pingfan
> 
> > > [...]

-- 
Sincerely yours,
Mike.


^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCHv2 3/7] mm/memblock: introduce allocation boundary for tracing purpose
  2019-01-14  8:50       ` Mike Rapoport
@ 2019-01-14  9:13         ` Pingfan Liu
  0 siblings, 0 replies; 23+ messages in thread
From: Pingfan Liu @ 2019-01-14  9:13 UTC (permalink / raw)
  To: Mike Rapoport
  Cc: linux-kernel, Thomas Gleixner, Ingo Molnar, Borislav Petkov,
	H. Peter Anvin, Dave Hansen, Andy Lutomirski, Peter Zijlstra,
	Rafael J. Wysocki, Len Brown, Yinghai Lu, Tejun Heo, Chao Fan,
	Baoquan He, Juergen Gross, Andrew Morton, Mike Rapoport,
	Vlastimil Babka, Michal Hocko, x86, linux-acpi, linux-mm

On Mon, Jan 14, 2019 at 4:50 PM Mike Rapoport <rppt@linux.ibm.com> wrote:
>
> On Mon, Jan 14, 2019 at 04:33:50PM +0800, Pingfan Liu wrote:
> > On Mon, Jan 14, 2019 at 3:51 PM Mike Rapoport <rppt@linux.ibm.com> wrote:
> > >
> > > Hi Pingfan,
> > >
> > > On Fri, Jan 11, 2019 at 01:12:53PM +0800, Pingfan Liu wrote:
> > > > During boot time, there is requirement to tell whether a series of func
> > > > call will consume memory or not. For some reason, a temporary memory
> > > > resource can be loan to those func through memblock allocator, but at a
> > > > check point, all of the loan memory should be turned back.
> > > > A typical using style:
> > > >  -1. find a usable range by memblock_find_in_range(), said, [A,B]
> > > >  -2. before calling a series of func, memblock_set_current_limit(A,B,true)
> > > >  -3. call funcs
> > > >  -4. memblock_find_in_range(A,B,B-A,1), if failed, then some memory is not
> > > >      turned back.
> > > >  -5. reset the original limit
> > > >
> > > > E.g. in the case of hotmovable memory, some acpi routines should be called,
> > > > and they are not allowed to own some movable memory. Although at present
> > > > these functions do not consume memory, but later, if changed without
> > > > awareness, they may do. With the above method, the allocation can be
> > > > detected, and pr_warn() to ask people to resolve it.
> > >
> > > To ensure there were that a sequence of function calls didn't create new
> > > memblock allocations you can simply check the number of the reserved
> > > regions before and after that sequence.
> > >
> > Yes, thank you point out it.
> >
> > > Still, I'm not sure it would be practical to try tracking what code that's called
> > > from x86::setup_arch() did memory allocation.
> > > Probably a better approach is to verify no memory ended up in the movable
> > > areas after their extents are known.
> > >
> > It is a probability problem whether allocated memory sit on hotmovable
> > memory or not. And if warning based on the verification, then it is
> > also a probability problem and maybe we will miss it.
>
> I'm not sure I'm following you here.
>
> After the hotmovable memory configuration is detected it is possible to
> traverse reserved memblock areas and warn if some of them reside in the
> hotmovable memory.
>
Oh, sorry that I did not explain it accurately. Let us say a machine
has nodeA/B/C, from low to high memory address. With the default
top-down allocation, at this point memory will always be allocated
from nodeC. But whether nodeC is hotmovable depends on the machine.
The verification can pass on a machine with an unmovable nodeC, but
fail on a machine with a movable nodeC. So it is still a probability
issue.

Thanks

[...]

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCHv2 0/7] x86_64/mm: remove bottom-up allocation style by pushing forward the parsing of mem hotplug info
  2019-01-11  5:12 [PATCHv2 0/7] x86_64/mm: remove bottom-up allocation style by pushing forward the parsing of mem hotplug info Pingfan Liu
                   ` (6 preceding siblings ...)
  2019-01-11  5:12 ` [PATCHv2 7/7] x86/mm: isolate the bottom-up style to init_32.c Pingfan Liu
@ 2019-01-14 23:02 ` Dave Hansen
  2019-01-15  6:06   ` Pingfan Liu
  7 siblings, 1 reply; 23+ messages in thread
From: Dave Hansen @ 2019-01-14 23:02 UTC (permalink / raw)
  To: Pingfan Liu, linux-kernel
  Cc: Thomas Gleixner, Ingo Molnar, Borislav Petkov, H. Peter Anvin,
	Dave Hansen, Andy Lutomirski, Peter Zijlstra, Rafael J. Wysocki,
	Len Brown, Yinghai Lu, Tejun Heo, Chao Fan, Baoquan He,
	Juergen Gross, Andrew Morton, Mike Rapoport, Vlastimil Babka,
	Michal Hocko, x86, linux-acpi, linux-mm

On 1/10/19 9:12 PM, Pingfan Liu wrote:
> Background
> When kaslr kernel can be guaranteed to sit inside unmovable node
> after [1].

What does this "[1]" refer to?

Also, can you clarify your terminology here a bit.  By "kaslr kernel",
do you mean the base address?

> But if kaslr kernel is located near the end of the movable node,
> then bottom-up allocator may create pagetable which crosses the boundary
> between unmovable node and movable node.

Again, I'm confused.  Do you literally mean a single page table page?  I
think you mean the page tables, but it would be nice to clarify this,
and also explicitly state which page tables these are.

>  It is a probability issue,
> two factors include -1. how big the gap between kernel end and
> unmovable node's end.  -2. how many memory does the system own.
> Alternative way to fix this issue is by increasing the gap by
> boot/compressed/kaslr*.

Oh, you mean the KASLR code in arch/x86/boot/compressed/kaslr*.[ch]?

It took me a minute to figure out you were talking about filenames.

> But taking the scenario of PB level memory, the pagetable will take
> server MB even if using 1GB page, different page attr and fragment
> will make things worse. So it is hard to decide how much should the
> gap increase.
I'm not following this.  If we move the image around, we leave holes.
Why do we need page table pages allocated to cover these holes?

> The following figure show the defection of current bottom-up style:
>   [startA, endA][startB, "kaslr kernel verly close to" endB][startC, endC]

"defection"?

> If nodeA,B is unmovable, while nodeC is movable, then init_mem_mapping()
> can generate pgtable on nodeC, which stain movable node.

Let me see if I can summarize this:
1. The kernel ASLR decompression code picks a spot to place the kernel
   image in physical memory.
2. Some page tables are dynamically allocated near (after) this spot.
3. Sometimes, based on the random ASLR location, these page tables fall
   over into the "movable node" area.  Being unmovable allocations, this
   is not cool.
4. To fix this (on 64-bit at least), we stop allocating page tables
   based on the location of the kernel image.  Instead, we allocate
   using the memblock allocator itself, which knows how to avoid the
   movable node.

> This patch makes it certainty instead of a probablity problem. It achieves
> this by pushing forward the parsing of mem hotplug info ahead of init_mem_mapping().

What does memory hotplug have to do with this?  I thought this was all
about early boot.

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCHv2 2/7] acpi: change the topo of acpi_table_upgrade()
  2019-01-11  5:12 ` [PATCHv2 2/7] acpi: change the topo of acpi_table_upgrade() Pingfan Liu
  2019-01-11  5:30   ` Chao Fan
@ 2019-01-14 23:12   ` Dave Hansen
  2019-01-15  7:28     ` Pingfan Liu
  1 sibling, 1 reply; 23+ messages in thread
From: Dave Hansen @ 2019-01-14 23:12 UTC (permalink / raw)
  To: Pingfan Liu, linux-kernel
  Cc: Thomas Gleixner, Ingo Molnar, Borislav Petkov, H. Peter Anvin,
	Dave Hansen, Andy Lutomirski, Peter Zijlstra, Rafael J. Wysocki,
	Len Brown, Yinghai Lu, Tejun Heo, Chao Fan, Baoquan He,
	Juergen Gross, Andrew Morton, Mike Rapoport, Vlastimil Babka,
	Michal Hocko, x86, linux-acpi, linux-mm

On 1/10/19 9:12 PM, Pingfan Liu wrote:
> The current acpi_table_upgrade() relies on initrd_start, but this var is

"var" meaning variable?

Could you please go back and try to ensure you spell out all the words
you are intending to write?  I think "topo" probably means "topology",
but it's a really odd word to use for changing the arguments of a
function, so I'm not sure.

There are a couple more of these in this set.

> only valid after relocate_initrd(). There is requirement to extract the
> acpi info from initrd before memblock-allocator can work(see [2/4]), hence
> acpi_table_upgrade() need to accept the input param directly.

"[2/4]"

It looks like you quickly resent this set without updating the patch
descriptions.

> diff --git a/drivers/acpi/tables.c b/drivers/acpi/tables.c
> index 61203ee..84e0a79 100644
> --- a/drivers/acpi/tables.c
> +++ b/drivers/acpi/tables.c
> @@ -471,10 +471,8 @@ static DECLARE_BITMAP(acpi_initrd_installed, NR_ACPI_INITRD_TABLES);
>  
>  #define MAP_CHUNK_SIZE   (NR_FIX_BTMAPS << PAGE_SHIFT)
>  
> -void __init acpi_table_upgrade(void)
> +void __init acpi_table_upgrade(void *data, size_t size)
>  {
> -	void *data = (void *)initrd_start;
> -	size_t size = initrd_end - initrd_start;
>  	int sig, no, table_nr = 0, total_offset = 0;
>  	long offset = 0;
>  	struct acpi_table_header *table;

I know you are just replacing some existing variables, but we have a
slightly higher standard for naming when you actually have to specify
arguments to a function.  Can you please give these proper names?


^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCHv2 6/7] x86/mm: remove bottom-up allocation style for x86_64
  2019-01-11  5:12 ` [PATCHv2 6/7] x86/mm: remove bottom-up allocation style for x86_64 Pingfan Liu
@ 2019-01-14 23:27   ` Dave Hansen
  2019-01-15  7:38     ` Pingfan Liu
  0 siblings, 1 reply; 23+ messages in thread
From: Dave Hansen @ 2019-01-14 23:27 UTC (permalink / raw)
  To: Pingfan Liu, linux-kernel
  Cc: Thomas Gleixner, Ingo Molnar, Borislav Petkov, H. Peter Anvin,
	Dave Hansen, Andy Lutomirski, Peter Zijlstra, Rafael J. Wysocki,
	Len Brown, Yinghai Lu, Tejun Heo, Chao Fan, Baoquan He,
	Juergen Gross, Andrew Morton, Mike Rapoport, Vlastimil Babka,
	Michal Hocko, x86, linux-acpi, linux-mm

On 1/10/19 9:12 PM, Pingfan Liu wrote:
> Although kaslr-kernel can avoid to stain the movable node. [1]

Can you explain what staining is, or perhaps try to use some more
standard nomenclature?  There are exactly 0 instances of the word
"stain" in arch/x86/ or mm/.

> But the
> pgtable can still stain the movable node. That is a probability problem,
> although low, but exist. This patch tries to make it certainty by
> allocating pgtable on unmovable node, instead of following kernel end.

Anyway, can you read my suggested summary in the earlier patch and see
if it fits or if I missed anything?  This description is really hard to
read.

...> +#ifdef CONFIG_X86_32
> +
> +static unsigned long min_pfn_mapped;
> +
>  static unsigned long __init get_new_step_size(unsigned long step_size)
>  {
>  	/*
> @@ -653,6 +655,32 @@ static void __init memory_map_bottom_up(unsigned long map_start,
>  	}
>  }
>  
> +static unsigned long __init init_range_memory_mapping32(
> +	unsigned long r_start, unsigned long r_end)
> +{

Why is this returning a value which is not used?

Did you compile this?  Didn't you get a warning that you're not
returning a value from a function returning non-void?

Also, I'd much rather see something like this written:

static __init
unsigned long init_range_memory_mapping32(unsigned long r_start,
					  unsigned long r_end)

than what you have above.  But, if you get rid of the 'unsigned long',
it will look much more sane in the first place.

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCHv2 0/7] x86_64/mm: remove bottom-up allocation style by pushing forward the parsing of mem hotplug info
  2019-01-14 23:02 ` [PATCHv2 0/7] x86_64/mm: remove bottom-up allocation style by pushing forward the parsing of mem hotplug info Dave Hansen
@ 2019-01-15  6:06   ` Pingfan Liu
  0 siblings, 0 replies; 23+ messages in thread
From: Pingfan Liu @ 2019-01-15  6:06 UTC (permalink / raw)
  To: Dave Hansen
  Cc: linux-kernel, Thomas Gleixner, Ingo Molnar, Borislav Petkov,
	H. Peter Anvin, Dave Hansen, Andy Lutomirski, Peter Zijlstra,
	Rafael J. Wysocki, Len Brown, Yinghai Lu, Tejun Heo, Chao Fan,
	Baoquan He, Juergen Gross, Andrew Morton, Mike Rapoport,
	Vlastimil Babka, Michal Hocko, x86, linux-acpi, linux-mm

On Tue, Jan 15, 2019 at 7:02 AM Dave Hansen <dave.hansen@intel.com> wrote:
>
> On 1/10/19 9:12 PM, Pingfan Liu wrote:
> > Background
> > When kaslr kernel can be guaranteed to sit inside unmovable node
> > after [1].
>
> What does this "[1]" refer to?
>
https://lore.kernel.org/patchwork/patch/1029376/

> Also, can you clarify your terminology here a bit.  By "kaslr kernel",
> do you mean the base address?
>
It means the randomization of the load address. I looked it up, and yes,
"base address" is the right term.

> > But if kaslr kernel is located near the end of the movable node,
> > then bottom-up allocator may create pagetable which crosses the boundary
> > between unmovable node and movable node.
>
> Again, I'm confused.  Do you literally mean a single page table page?  I
> think you mean the page tables, but it would be nice to clarify this,
> and also explicitly state which page tables these are.
>
I mean the page table pages. These page tables are built by init_mem_mapping().

> >  It is a probability issue,
> > two factors include -1. how big the gap between kernel end and
> > unmovable node's end.  -2. how many memory does the system own.
> > Alternative way to fix this issue is by increasing the gap by
> > boot/compressed/kaslr*.
>
> Oh, you mean the KASLR code in arch/x86/boot/compressed/kaslr*.[ch]?
>
Sorry, and yes, I mean the code in arch/x86/boot/compressed/kaslr_64.c and kaslr.c.

> It took me a minute to figure out you were talking about filenames.
>
> > But taking the scenario of PB level memory, the pagetable will take
> > server MB even if using 1GB page, different page attr and fragment
> > will make things worse. So it is hard to decide how much should the
> > gap increase.
> I'm not following this.  If we move the image around, we leave holes.
> Why do we need page table pages allocated to cover these holes?
>
I mean that in arch/x86/boot/compressed/kaslr.c, store_slot_info() computes
  slot_area.num = (region->size - image_size) / CONFIG_PHYSICAL_ALIGN + 1;
If we denote the size of the page tables as "X", the formula would have to
become
  slot_area.num = (region->size - image_size - X) / CONFIG_PHYSICAL_ALIGN + 1;
and it is hard to decide X because of the factors above.
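
As an illustration only (PGTABLE_GAP_ESTIMATE is a made-up name and
value, not part of this series), the change would look roughly like:

/* Rough sketch: reserve extra room after the image for page tables. */
#define PGTABLE_GAP_ESTIMATE    (16UL << 20)    /* hard to choose, see above */

static void store_slot_info(struct mem_vector *region,
                            unsigned long image_size)
{
        struct slot_area slot_area;

        if (slot_area_index == MAX_SLOT_AREA)
                return;

        /* region must hold the image plus the estimated gap */
        if (region->size < image_size + PGTABLE_GAP_ESTIMATE)
                return;

        slot_area.addr = region->start;
        /* leave a gap after the image for init_mem_mapping()'s page tables */
        slot_area.num = (region->size - image_size - PGTABLE_GAP_ESTIMATE) /
                        CONFIG_PHYSICAL_ALIGN + 1;

        if (slot_area.num > 0) {
                slot_areas[slot_area_index++] = slot_area;
                slot_max += slot_area.num;
        }
}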

> > The following figure show the defection of current bottom-up style:
> >   [startA, endA][startB, "kaslr kernel verly close to" endB][startC, endC]
>
> "defection"?
>
Oh, defect.

> > If nodeA,B is unmovable, while nodeC is movable, then init_mem_mapping()
> > can generate pgtable on nodeC, which stain movable node.
>
> Let me see if I can summarize this:
> 1. The kernel ASLR decompression code picks a spot to place the kernel
>    image in physical memory.
> 2. Some page tables are dynamically allocated near (after) this spot.
> 3. Sometimes, based on the random ASLR location, these page tables fall
>    over into the "movable node" area.  Being unmovable allocations, this
>    is not cool.
> 4. To fix this (on 64-bit at least), we stop allocating page tables
>    based on the location of the kernel image.  Instead, we allocate
>    using the memblock allocator itself, which knows how to avoid the
>    movable node.
>
Yes, you got my idea exactly. Thanks for helping to summarize it; it is
hard for me to express it clearly in English.

> > This patch makes it certainty instead of a probablity problem. It achieves
> > this by pushing forward the parsing of mem hotplug info ahead of init_mem_mapping().
>
> What does memory hotplug have to do with this?  I thought this was all
> about early boot.

The memory hotplug info is handed to the memblock allocator along the
path initmem_init()->...->acpi_numa_memory_affinity_init(), where
memblock_mark_hotplug() records it. Later, when the memblock allocator
works, __next_mem_range() checks this info through
memblock_is_hotpluggable().
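
The relevant part of the iterator looks roughly like this (paraphrased
from mm/memblock.c, with unrelated details trimmed):

        /* in __next_mem_range(): walk the candidate memory regions */
        for_each_memblock_type(idx_a, type_a, m) {
                /* only memory regions are associated with nodes, check it */
                if (nid != NUMA_NO_NODE && nid != memblock_get_region_node(m))
                        continue;

                /* skip hotpluggable memory regions if needed */
                if (movable_node_is_enabled() && memblock_is_hotpluggable(m))
                        continue;

                /* ... intersect with type_b and return the candidate ... */
        }

So once the SRAT information is in memblock before init_mem_mapping(),
the allocator can avoid handing out hotpluggable ranges for the page
tables.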

Thanks and regards,
Pingfan

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCHv2 1/7] x86/mm: concentrate the code to memblock allocator enabled
       [not found]   ` <96233c0c-940d-8d7c-b3be-d8863c026996@intel.com>
@ 2019-01-15  7:06     ` Pingfan Liu
  0 siblings, 0 replies; 23+ messages in thread
From: Pingfan Liu @ 2019-01-15  7:06 UTC (permalink / raw)
  To: Dave Hansen
  Cc: LKML, Thomas Gleixner, Ingo Molnar, Borislav Petkov,
	H. Peter Anvin, Dave Hansen, Andy Lutomirski, Peter Zijlstra,
	Rafael J. Wysocki, Len Brown, Yinghai Lu, Tejun Heo, Chao Fan,
	Baoquan He, Juergen Gross, Andrew Morton, Mike Rapoport,
	Vlastimil Babka, Michal Hocko, x86, linux-acpi, linux-mm

On Tue, Jan 15, 2019 at 7:07 AM Dave Hansen <dave.hansen@intel.com> wrote:
>
> On 1/10/19 9:12 PM, Pingfan Liu wrote:
> > This patch identifies the point where memblock alloc start. It has no
> > functional.
>
> It has no functional ... what?  Effects?
>
While reorganizing the code, it took me a long time to figure out why
memblock_set_bottom_up(true) is called here, and how far it can be
deferred. Finally, I realized that it only takes effect after
e820__memblock_setup(), the point from which the memblock allocator can
work. So this patch concentrates the related code, hopefully making
that point clear.

> > -     memblock_set_current_limit(ISA_END_ADDRESS);
> > -     e820__memblock_setup();
> > -
> >       reserve_bios_regions();
> >
> >       if (efi_enabled(EFI_MEMMAP)) {
> > @@ -1113,6 +1087,8 @@ void __init setup_arch(char **cmdline_p)
> >               efi_reserve_boot_services();
> >       }
> >
> > +     memblock_set_current_limit(0, ISA_END_ADDRESS, false);
> > +     e820__memblock_setup();
>
> It looks like you changed the arguments passed to
> memblock_set_current_limit().  How can this even compile?  Did you mean
> that this patch is not functional?
>
Sorry, during rebasing I merged a trivial fix by mistake. I will
build-test each patch in the series.

Best regards,
Pingfan

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCHv2 2/7] acpi: change the topo of acpi_table_upgrade()
  2019-01-14 23:12   ` Dave Hansen
@ 2019-01-15  7:28     ` Pingfan Liu
  0 siblings, 0 replies; 23+ messages in thread
From: Pingfan Liu @ 2019-01-15  7:28 UTC (permalink / raw)
  To: Dave Hansen
  Cc: LKML, Thomas Gleixner, Ingo Molnar, Borislav Petkov,
	H. Peter Anvin, Dave Hansen, Andy Lutomirski, Peter Zijlstra,
	Rafael J. Wysocki, Len Brown, Yinghai Lu, Tejun Heo, Chao Fan,
	Baoquan He, Juergen Gross, Andrew Morton, Mike Rapoport,
	Vlastimil Babka, Michal Hocko, x86, linux-acpi, linux-mm

On Tue, Jan 15, 2019 at 7:12 AM Dave Hansen <dave.hansen@intel.com> wrote:
>
> On 1/10/19 9:12 PM, Pingfan Liu wrote:
> > The current acpi_table_upgrade() relies on initrd_start, but this var is
>
> "var" meaning variable?
>
> Could you please go back and try to ensure you spell out all the words
> you are intending to write?  I think "topo" probably means "topology",
> but it's a really odd word to use for changing the arguments of a
> function, so I'm not sure.
>
> There are a couple more of these in this set.
>
Yes, I will do that and fix them in the next version.

> > only valid after relocate_initrd(). There is requirement to extract the
> > acpi info from initrd before memblock-allocator can work(see [2/4]), hence
> > acpi_table_upgrade() need to accept the input param directly.
>
> "[2/4]"
>
> It looks like you quickly resent this set without updating the patch
> descriptions.
>
> > diff --git a/drivers/acpi/tables.c b/drivers/acpi/tables.c
> > index 61203ee..84e0a79 100644
> > --- a/drivers/acpi/tables.c
> > +++ b/drivers/acpi/tables.c
> > @@ -471,10 +471,8 @@ static DECLARE_BITMAP(acpi_initrd_installed, NR_ACPI_INITRD_TABLES);
> >
> >  #define MAP_CHUNK_SIZE   (NR_FIX_BTMAPS << PAGE_SHIFT)
> >
> > -void __init acpi_table_upgrade(void)
> > +void __init acpi_table_upgrade(void *data, size_t size)
> >  {
> > -     void *data = (void *)initrd_start;
> > -     size_t size = initrd_end - initrd_start;
> >       int sig, no, table_nr = 0, total_offset = 0;
> >       long offset = 0;
> >       struct acpi_table_header *table;
>
> I know you are just replacing some existing variables, but we have a
> slightly higher standard for naming when you actually have to specify
> arguments to a function.  Can you please give these proper names?
>
OK, I will change it to acpi_table_upgrade(void *initrd, size_t size).
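
For illustration, an early caller might then look roughly like this
(the wrapper name is made up, and a real version would have to map the
ramdisk piecewise since early fixmap space is limited):

static void __init x86_acpi_table_upgrade_early(void)
{
        /* physical address and size reported by the boot loader */
        u64 image = get_ramdisk_image();
        u64 size = get_ramdisk_size();
        void *initrd;

        if (!size)
                return;

        initrd = early_memremap(image, size);
        acpi_table_upgrade(initrd, size);
        early_memunmap(initrd, size);
}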

Thanks,
Pingfan

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCHv2 6/7] x86/mm: remove bottom-up allocation style for x86_64
  2019-01-14 23:27   ` Dave Hansen
@ 2019-01-15  7:38     ` Pingfan Liu
  0 siblings, 0 replies; 23+ messages in thread
From: Pingfan Liu @ 2019-01-15  7:38 UTC (permalink / raw)
  To: Dave Hansen
  Cc: LKML, Thomas Gleixner, Ingo Molnar, Borislav Petkov,
	H. Peter Anvin, Dave Hansen, Andy Lutomirski, Peter Zijlstra,
	Rafael J. Wysocki, Len Brown, Yinghai Lu, Tejun Heo, Chao Fan,
	Baoquan He, Juergen Gross, Andrew Morton, Mike Rapoport,
	Vlastimil Babka, Michal Hocko, x86, linux-acpi, linux-mm

On Tue, Jan 15, 2019 at 7:27 AM Dave Hansen <dave.hansen@intel.com> wrote:
>
> On 1/10/19 9:12 PM, Pingfan Liu wrote:
> > Although kaslr-kernel can avoid to stain the movable node. [1]
>
> Can you explain what staining is, or perhaps try to use some more
> standard nomenclature?  There are exactly 0 instances of the word
> "stain" in arch/x86/ or mm/.
>
I mean that KASLR may randomly choose a base address that happens to be
located in a movable node.

> > But the
> > pgtable can still stain the movable node. That is a probability problem,
> > although low, but exist. This patch tries to make it certainty by
> > allocating pgtable on unmovable node, instead of following kernel end.
>
> Anyway, can you read my suggested summary in the earlier patch and see
> if it fits or if I missed anything?  This description is really hard to
> read.
>
Your summary in the reply to [PATCH 0/7] expresses things clearly. I
will use it to update the commit log.

> ...> +#ifdef CONFIG_X86_32
> > +
> > +static unsigned long min_pfn_mapped;
> > +
> >  static unsigned long __init get_new_step_size(unsigned long step_size)
> >  {
> >       /*
> > @@ -653,6 +655,32 @@ static void __init memory_map_bottom_up(unsigned long map_start,
> >       }
> >  }
> >
> > +static unsigned long __init init_range_memory_mapping32(
> > +     unsigned long r_start, unsigned long r_end)
> > +{
>
> Why is this returning a value which is not used?
>
> Did you compile this?  Didn't you get a warning that you're not
> returning a value from a function returning non-void?
>
It should be void. I will fix it in the next version.

> Also, I'd much rather see something like this written:
>
> static __init
> unsigned long init_range_memory_mapping32(unsigned long r_start,
>                                           unsigned long r_end)
>
> than what you have above.  But, if you get rid of the 'unsigned long',
> it will look much more sane in the first place.

Yes. Thanks for your kind review.

Best Regards,
Pingfan

^ permalink raw reply	[flat|nested] 23+ messages in thread

end of thread, other threads:[~2019-01-15  7:38 UTC | newest]

Thread overview: 23+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2019-01-11  5:12 [PATCHv2 0/7] x86_64/mm: remove bottom-up allocation style by pushing forward the parsing of mem hotplug info Pingfan Liu
2019-01-11  5:12 ` [PATCHv2 1/7] x86/mm: concentrate the code to memblock allocator enabled Pingfan Liu
2019-01-11  6:12   ` Chao Fan
2019-01-11 10:06     ` Pingfan Liu
     [not found]   ` <96233c0c-940d-8d7c-b3be-d8863c026996@intel.com>
2019-01-15  7:06     ` Pingfan Liu
2019-01-11  5:12 ` [PATCHv2 2/7] acpi: change the topo of acpi_table_upgrade() Pingfan Liu
2019-01-11  5:30   ` Chao Fan
2019-01-11 10:08     ` Pingfan Liu
2019-01-14 23:12   ` Dave Hansen
2019-01-15  7:28     ` Pingfan Liu
2019-01-11  5:12 ` [PATCHv2 3/7] mm/memblock: introduce allocation boundary for tracing purpose Pingfan Liu
2019-01-14  7:51   ` Mike Rapoport
2019-01-14  8:33     ` Pingfan Liu
2019-01-14  8:50       ` Mike Rapoport
2019-01-14  9:13         ` Pingfan Liu
2019-01-11  5:12 ` [PATCHv2 4/7] x86/setup: parse acpi to get hotplug info before init_mem_mapping() Pingfan Liu
2019-01-11  5:12 ` [PATCHv2 5/7] x86/mm: set allowed range for memblock allocator Pingfan Liu
2019-01-11  5:12 ` [PATCHv2 6/7] x86/mm: remove bottom-up allocation style for x86_64 Pingfan Liu
2019-01-14 23:27   ` Dave Hansen
2019-01-15  7:38     ` Pingfan Liu
2019-01-11  5:12 ` [PATCHv2 7/7] x86/mm: isolate the bottom-up style to init_32.c Pingfan Liu
2019-01-14 23:02 ` [PATCHv2 0/7] x86_64/mm: remove bottom-up allocation style by pushing forward the parsing of mem hotplug info Dave Hansen
2019-01-15  6:06   ` Pingfan Liu

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).