linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH] x86, mm: Reserver some memory for bootmem allocator for NO_BOOTMEM
@ 2018-08-30  9:03 Feng Tang
  2018-08-30 10:44 ` Peter Zijlstra
  0 siblings, 1 reply; 29+ messages in thread
From: Feng Tang @ 2018-08-30  9:03 UTC (permalink / raw)
  To: linux-kernel, x86, Thomas Gleixner, Ingo Molnar, H . Peter Anvin,
	Yinghai Lu, Peter Zijlstra, Dave Hansen, Andi Kleen
  Cc: Feng Tang

We hit a kernel panic when enabling earlycon for a platform, the
call trace is:

	  panic+0xd2/0x220
	  __alloc_bootmem+0x31/0x34
	  spp_getpage+0x60/0x8a
	  fill_pte+0x71/0x130
	  __set_pte_vaddr+0x1d/0x50
	  set_pte_vaddr+0x3c/0x60
	  __native_set_fixmap+0x23/0x30
	  native_set_fixmap+0x30/0x40
	  setup_earlycon+0x1e0/0x32f
	  param_setup_earlycon+0x13/0x22
	  do_early_param+0x5b/0x90
	  parse_args+0x1f7/0x300
	  parse_early_options+0x24/0x28
	  parse_early_param+0x65/0x73
	  setup_arch+0x31e/0x9f1
	  start_kernel+0x58/0x44e

The root cause is that when CONFIG_NO_BOOTMEM=y,  before
e820__memblock_setup() is called there is no memory for bootmem
to allocate, and any alloc_bootmem() in this time window will
trigger a panic. Seems this is a known issue as already
mentioned in arch/x86/kernel/setup.c:
	"
	/* after early param, so could get panic from serial */
	memblock_x86_reserve_range_setup_data();
	"
By reserving some memory in the linker script, small memory request
could be fulfilled by bootmem allocator. And this memory region
is not wasted, but usable after kernel boot.

Signed-off-by: Feng Tang <feng.tang@intel.com>
---
 arch/x86/include/asm/sections.h |  5 +++++
 arch/x86/kernel/setup.c         | 12 +++++++++++-
 arch/x86/kernel/vmlinux.lds.S   |  9 +++++++++
 3 files changed, 25 insertions(+), 1 deletion(-)

diff --git a/arch/x86/include/asm/sections.h b/arch/x86/include/asm/sections.h
index 4a911a382ade..d568829510e4 100644
--- a/arch/x86/include/asm/sections.h
+++ b/arch/x86/include/asm/sections.h
@@ -9,6 +9,11 @@ extern char __brk_base[], __brk_limit[];
 extern struct exception_table_entry __stop___ex_table[];
 extern char __end_rodata_aligned[];
 
+#ifdef CONFIG_NO_BOOTMEM
+extern char __bootmem_start[], __bootmem_end[];
+#endif
+
+
 #if defined(CONFIG_X86_64)
 extern char __end_rodata_hpage_align[];
 extern char __entry_trampoline_start[], __entry_trampoline_end[];
diff --git a/arch/x86/kernel/setup.c b/arch/x86/kernel/setup.c
index b4866badb235..a4e3d9ec1a83 100644
--- a/arch/x86/kernel/setup.c
+++ b/arch/x86/kernel/setup.c
@@ -829,6 +829,17 @@ void __init setup_arch(char **cmdline_p)
 	 */
 	memblock_reserve(0, PAGE_SIZE);
 
+#ifdef CONFIG_NO_BOOTMEM
+	/*
+	 * There is small time window till e820__memblock_setup() is
+	 * called, in which __bootmem_alloc() has no available memory
+	 * to allocate and will trigger panic. Adding this revered bootmem
+	 * can alleviate the situation.
+	 */
+	memblock_add(__pa_symbol(__bootmem_start),
+			__bootmem_end - __bootmem_start);
+#endif
+
 	early_reserve_initrd();
 
 	/*
@@ -989,7 +1000,6 @@ void __init setup_arch(char **cmdline_p)
 
 	x86_report_nx();
 
-	/* after early param, so could get panic from serial */
 	memblock_x86_reserve_range_setup_data();
 
 	if (acpi_mps_check()) {
diff --git a/arch/x86/kernel/vmlinux.lds.S b/arch/x86/kernel/vmlinux.lds.S
index 8bde0a419f86..fec1965d668f 100644
--- a/arch/x86/kernel/vmlinux.lds.S
+++ b/arch/x86/kernel/vmlinux.lds.S
@@ -367,6 +367,15 @@ SECTIONS
 		__brk_limit = .;
 	}
 
+#ifdef CONFIG_NO_BOOTMEM
+	. = ALIGN(PAGE_SIZE);
+	.bootmem : AT(ADDR(.bootmem) - LOAD_OFFSET) {
+		__bootmem_start = .;
+		. += 128 * 1024;		/* 128 KB for early boot mem */
+		__bootmem_end = .;
+	}
+#endif
+
 	. = ALIGN(PAGE_SIZE);		/* keep VO_INIT_SIZE page aligned */
 	_end = .;
 
-- 
2.14.1


^ permalink raw reply related	[flat|nested] 29+ messages in thread

* Re: [PATCH] x86, mm: Reserver some memory for bootmem allocator for NO_BOOTMEM
  2018-08-30  9:03 [PATCH] x86, mm: Reserver some memory for bootmem allocator for NO_BOOTMEM Feng Tang
@ 2018-08-30 10:44 ` Peter Zijlstra
  2018-08-30 11:12   ` Michal Hocko
  2018-08-30 12:43   ` Feng Tang
  0 siblings, 2 replies; 29+ messages in thread
From: Peter Zijlstra @ 2018-08-30 10:44 UTC (permalink / raw)
  To: Feng Tang
  Cc: linux-kernel, x86, Thomas Gleixner, Ingo Molnar, H . Peter Anvin,
	Yinghai Lu, Dave Hansen, Andi Kleen

On Thu, Aug 30, 2018 at 05:03:19PM +0800, Feng Tang wrote:
> We hit a kernel panic when enabling earlycon for a platform, the
> call trace is:
> 
> 	  panic+0xd2/0x220
> 	  __alloc_bootmem+0x31/0x34
> 	  spp_getpage+0x60/0x8a
> 	  fill_pte+0x71/0x130
> 	  __set_pte_vaddr+0x1d/0x50
> 	  set_pte_vaddr+0x3c/0x60
> 	  __native_set_fixmap+0x23/0x30
> 	  native_set_fixmap+0x30/0x40
> 	  setup_earlycon+0x1e0/0x32f
> 	  param_setup_earlycon+0x13/0x22
> 	  do_early_param+0x5b/0x90
> 	  parse_args+0x1f7/0x300
> 	  parse_early_options+0x24/0x28
> 	  parse_early_param+0x65/0x73
> 	  setup_arch+0x31e/0x9f1
> 	  start_kernel+0x58/0x44e
> 
> The root cause is that when CONFIG_NO_BOOTMEM=y,  before
> e820__memblock_setup() is called there is no memory for bootmem
> to allocate,

Which you bloody well asked for by using NO_BOOTMEM=y.

Going down this route; adding hacks for every little thing that does
want bootmem, completely defeats the purpose.

If anything, make the earlycon thing depend on NO_BOOTMEM=n. That also
solves your problem. No earlycon, no panic.

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH] x86, mm: Reserver some memory for bootmem allocator for NO_BOOTMEM
  2018-08-30 10:44 ` Peter Zijlstra
@ 2018-08-30 11:12   ` Michal Hocko
  2018-08-30 11:54     ` Thomas Gleixner
  2018-08-30 12:09     ` Peter Zijlstra
  2018-08-30 12:43   ` Feng Tang
  1 sibling, 2 replies; 29+ messages in thread
From: Michal Hocko @ 2018-08-30 11:12 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Feng Tang, linux-kernel, x86, Thomas Gleixner, Ingo Molnar,
	H . Peter Anvin, Yinghai Lu, Dave Hansen, Andi Kleen

On Thu 30-08-18 12:44:02, Peter Zijlstra wrote:
> On Thu, Aug 30, 2018 at 05:03:19PM +0800, Feng Tang wrote:
> > We hit a kernel panic when enabling earlycon for a platform, the
> > call trace is:
> > 
> > 	  panic+0xd2/0x220
> > 	  __alloc_bootmem+0x31/0x34
> > 	  spp_getpage+0x60/0x8a
> > 	  fill_pte+0x71/0x130
> > 	  __set_pte_vaddr+0x1d/0x50
> > 	  set_pte_vaddr+0x3c/0x60
> > 	  __native_set_fixmap+0x23/0x30
> > 	  native_set_fixmap+0x30/0x40
> > 	  setup_earlycon+0x1e0/0x32f
> > 	  param_setup_earlycon+0x13/0x22
> > 	  do_early_param+0x5b/0x90
> > 	  parse_args+0x1f7/0x300
> > 	  parse_early_options+0x24/0x28
> > 	  parse_early_param+0x65/0x73
> > 	  setup_arch+0x31e/0x9f1
> > 	  start_kernel+0x58/0x44e
> > 
> > The root cause is that when CONFIG_NO_BOOTMEM=y,  before
> > e820__memblock_setup() is called there is no memory for bootmem
> > to allocate,
> 
> Which you bloody well asked for by using NO_BOOTMEM=y.
> 
> Going down this route; adding hacks for every little thing that does
> want bootmem, completely defeats the purpose.
> 
> If anything, make the earlycon thing depend on NO_BOOTMEM=n. That also
> solves your problem. No earlycon, no panic.

Well, there is endeavor to remove bootmem allocator altogether. So
making earlycon depend on NO_BOOTMEM=n doesn't sound like a good fit to
me. I am not familiar with this code path but why cannot we postpone the
allocation to later or use a statically allocated storage?

-- 
Michal Hocko
SUSE Labs

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH] x86, mm: Reserver some memory for bootmem allocator for NO_BOOTMEM
  2018-08-30 11:12   ` Michal Hocko
@ 2018-08-30 11:54     ` Thomas Gleixner
  2018-08-30 12:49       ` Michal Hocko
  2018-08-30 12:09     ` Peter Zijlstra
  1 sibling, 1 reply; 29+ messages in thread
From: Thomas Gleixner @ 2018-08-30 11:54 UTC (permalink / raw)
  To: Michal Hocko
  Cc: Peter Zijlstra, Feng Tang, linux-kernel, x86, Ingo Molnar,
	H . Peter Anvin, Yinghai Lu, Dave Hansen, Andi Kleen

On Thu, 30 Aug 2018, Michal Hocko wrote:
> On Thu 30-08-18 12:44:02, Peter Zijlstra wrote:
> > On Thu, Aug 30, 2018 at 05:03:19PM +0800, Feng Tang wrote:
> > > The root cause is that when CONFIG_NO_BOOTMEM=y,  before
> > > e820__memblock_setup() is called there is no memory for bootmem
> > > to allocate,
> > 
> > Which you bloody well asked for by using NO_BOOTMEM=y.
> > 
> > Going down this route; adding hacks for every little thing that does
> > want bootmem, completely defeats the purpose.
> > 
> > If anything, make the earlycon thing depend on NO_BOOTMEM=n. That also
> > solves your problem. No earlycon, no panic.
> 
> Well, there is endeavor to remove bootmem allocator altogether. So
> making earlycon depend on NO_BOOTMEM=n doesn't sound like a good fit to

If we want to remove bootmem, then reintroducing it with a static bootmem
section doesn't make any sense at all.

Thanks,

	tglx



^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH] x86, mm: Reserver some memory for bootmem allocator for NO_BOOTMEM
  2018-08-30 11:12   ` Michal Hocko
  2018-08-30 11:54     ` Thomas Gleixner
@ 2018-08-30 12:09     ` Peter Zijlstra
  2018-08-30 12:39       ` Michal Hocko
  1 sibling, 1 reply; 29+ messages in thread
From: Peter Zijlstra @ 2018-08-30 12:09 UTC (permalink / raw)
  To: Michal Hocko
  Cc: Feng Tang, linux-kernel, x86, Thomas Gleixner, Ingo Molnar,
	H . Peter Anvin, Yinghai Lu, Dave Hansen, Andi Kleen

On Thu, Aug 30, 2018 at 01:12:07PM +0200, Michal Hocko wrote:
> On Thu 30-08-18 12:44:02, Peter Zijlstra wrote:
> > On Thu, Aug 30, 2018 at 05:03:19PM +0800, Feng Tang wrote:
> > > We hit a kernel panic when enabling earlycon for a platform, the
> > > call trace is:
> > > 
> > > 	  panic+0xd2/0x220
> > > 	  __alloc_bootmem+0x31/0x34
> > > 	  spp_getpage+0x60/0x8a
> > > 	  fill_pte+0x71/0x130
> > > 	  __set_pte_vaddr+0x1d/0x50
> > > 	  set_pte_vaddr+0x3c/0x60
> > > 	  __native_set_fixmap+0x23/0x30
> > > 	  native_set_fixmap+0x30/0x40
> > > 	  setup_earlycon+0x1e0/0x32f
> > > 	  param_setup_earlycon+0x13/0x22
> > > 	  do_early_param+0x5b/0x90
> > > 	  parse_args+0x1f7/0x300
> > > 	  parse_early_options+0x24/0x28
> > > 	  parse_early_param+0x65/0x73
> > > 	  setup_arch+0x31e/0x9f1
> > > 	  start_kernel+0x58/0x44e
> > > 
> > > The root cause is that when CONFIG_NO_BOOTMEM=y,  before
> > > e820__memblock_setup() is called there is no memory for bootmem
> > > to allocate,
> > 
> > Which you bloody well asked for by using NO_BOOTMEM=y.
> > 
> > Going down this route; adding hacks for every little thing that does
> > want bootmem, completely defeats the purpose.
> > 
> > If anything, make the earlycon thing depend on NO_BOOTMEM=n. That also
> > solves your problem. No earlycon, no panic.
> 
> Well, there is endeavor to remove bootmem allocator altogether. So

wasn't aware of that. why?

> making earlycon depend on NO_BOOTMEM=n doesn't sound like a good fit to
> me. I am not familiar with this code path but why cannot we postpone the
> allocation to later or use a statically allocated storage?

because 'early'... I suppose. Youu really want the early con up and
running asap.

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH] x86, mm: Reserver some memory for bootmem allocator for NO_BOOTMEM
  2018-08-30 12:09     ` Peter Zijlstra
@ 2018-08-30 12:39       ` Michal Hocko
  0 siblings, 0 replies; 29+ messages in thread
From: Michal Hocko @ 2018-08-30 12:39 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Feng Tang, linux-kernel, x86, Thomas Gleixner, Ingo Molnar,
	H . Peter Anvin, Yinghai Lu, Dave Hansen, Andi Kleen

On Thu 30-08-18 14:09:26, Peter Zijlstra wrote:
> On Thu, Aug 30, 2018 at 01:12:07PM +0200, Michal Hocko wrote:
> > On Thu 30-08-18 12:44:02, Peter Zijlstra wrote:
> > > On Thu, Aug 30, 2018 at 05:03:19PM +0800, Feng Tang wrote:
> > > > We hit a kernel panic when enabling earlycon for a platform, the
> > > > call trace is:
> > > > 
> > > > 	  panic+0xd2/0x220
> > > > 	  __alloc_bootmem+0x31/0x34
> > > > 	  spp_getpage+0x60/0x8a
> > > > 	  fill_pte+0x71/0x130
> > > > 	  __set_pte_vaddr+0x1d/0x50
> > > > 	  set_pte_vaddr+0x3c/0x60
> > > > 	  __native_set_fixmap+0x23/0x30
> > > > 	  native_set_fixmap+0x30/0x40
> > > > 	  setup_earlycon+0x1e0/0x32f
> > > > 	  param_setup_earlycon+0x13/0x22
> > > > 	  do_early_param+0x5b/0x90
> > > > 	  parse_args+0x1f7/0x300
> > > > 	  parse_early_options+0x24/0x28
> > > > 	  parse_early_param+0x65/0x73
> > > > 	  setup_arch+0x31e/0x9f1
> > > > 	  start_kernel+0x58/0x44e
> > > > 
> > > > The root cause is that when CONFIG_NO_BOOTMEM=y,  before
> > > > e820__memblock_setup() is called there is no memory for bootmem
> > > > to allocate,
> > > 
> > > Which you bloody well asked for by using NO_BOOTMEM=y.
> > > 
> > > Going down this route; adding hacks for every little thing that does
> > > want bootmem, completely defeats the purpose.
> > > 
> > > If anything, make the earlycon thing depend on NO_BOOTMEM=n. That also
> > > solves your problem. No earlycon, no panic.
> > 
> > Well, there is endeavor to remove bootmem allocator altogether. So
> 
> wasn't aware of that. why?

Because it is yet another allocator that we have to maintain and that is
not really needed.
 
> > making earlycon depend on NO_BOOTMEM=n doesn't sound like a good fit to
> > me. I am not familiar with this code path but why cannot we postpone the
> > allocation to later or use a statically allocated storage?
> 
> because 'early'... I suppose. Youu really want the early con up and
> running asap.

Well, the memblock allocator is initialized quite early as well... If
that is not early enough then we have to play tricks.
-- 
Michal Hocko
SUSE Labs

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH] x86, mm: Reserver some memory for bootmem allocator for NO_BOOTMEM
  2018-08-30 10:44 ` Peter Zijlstra
  2018-08-30 11:12   ` Michal Hocko
@ 2018-08-30 12:43   ` Feng Tang
  1 sibling, 0 replies; 29+ messages in thread
From: Feng Tang @ 2018-08-30 12:43 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: linux-kernel, x86, Thomas Gleixner, Ingo Molnar, H . Peter Anvin,
	Yinghai Lu, Dave Hansen, Andi Kleen

Hi Peter,

On Thu, Aug 30, 2018 at 12:44:02PM +0200, Peter Zijlstra wrote:
> On Thu, Aug 30, 2018 at 05:03:19PM +0800, Feng Tang wrote:
> > We hit a kernel panic when enabling earlycon for a platform, the
> > call trace is:
> > 
> > 	  panic+0xd2/0x220
> > 	  __alloc_bootmem+0x31/0x34
> > 	  spp_getpage+0x60/0x8a
> > 	  fill_pte+0x71/0x130
> > 	  __set_pte_vaddr+0x1d/0x50
> > 	  set_pte_vaddr+0x3c/0x60
> > 	  __native_set_fixmap+0x23/0x30
> > 	  native_set_fixmap+0x30/0x40
> > 	  setup_earlycon+0x1e0/0x32f
> > 	  param_setup_earlycon+0x13/0x22
> > 	  do_early_param+0x5b/0x90
> > 	  parse_args+0x1f7/0x300
> > 	  parse_early_options+0x24/0x28
> > 	  parse_early_param+0x65/0x73
> > 	  setup_arch+0x31e/0x9f1
> > 	  start_kernel+0x58/0x44e
> > 
> > The root cause is that when CONFIG_NO_BOOTMEM=y,  before
> > e820__memblock_setup() is called there is no memory for bootmem
> > to allocate,
> 
> Which you bloody well asked for by using NO_BOOTMEM=y.
> 
> Going down this route; adding hacks for every little thing that does
> want bootmem, completely defeats the purpose.
> 
> If anything, make the earlycon thing depend on NO_BOOTMEM=n. That also
> solves your problem. No earlycon, no panic.

In current x86, the NO_BOOTMEM is "y" by default, and almost all the
servers/desktops I could access have CONFIG_BOOTMEM=y, while earlycon
is important for enabling new platforms, as well as kernel debugging.
IMHO, it's better to keep earlycon capability. Also any alloc_bootmem()
call in that time window will trigger panic.

Thanks,
Feng


^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH] x86, mm: Reserver some memory for bootmem allocator for NO_BOOTMEM
  2018-08-30 11:54     ` Thomas Gleixner
@ 2018-08-30 12:49       ` Michal Hocko
  2018-08-30 12:59         ` Feng Tang
  0 siblings, 1 reply; 29+ messages in thread
From: Michal Hocko @ 2018-08-30 12:49 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: Peter Zijlstra, Feng Tang, linux-kernel, x86, Ingo Molnar,
	H . Peter Anvin, Yinghai Lu, Dave Hansen, Andi Kleen

On Thu 30-08-18 13:54:19, Thomas Gleixner wrote:
> On Thu, 30 Aug 2018, Michal Hocko wrote:
> > On Thu 30-08-18 12:44:02, Peter Zijlstra wrote:
> > > On Thu, Aug 30, 2018 at 05:03:19PM +0800, Feng Tang wrote:
> > > > The root cause is that when CONFIG_NO_BOOTMEM=y,  before
> > > > e820__memblock_setup() is called there is no memory for bootmem
> > > > to allocate,
> > > 
> > > Which you bloody well asked for by using NO_BOOTMEM=y.
> > > 
> > > Going down this route; adding hacks for every little thing that does
> > > want bootmem, completely defeats the purpose.
> > > 
> > > If anything, make the earlycon thing depend on NO_BOOTMEM=n. That also
> > > solves your problem. No earlycon, no panic.
> > 
> > Well, there is endeavor to remove bootmem allocator altogether. So
> > making earlycon depend on NO_BOOTMEM=n doesn't sound like a good fit to
> 
> If we want to remove bootmem, then reintroducing it with a static bootmem
> section doesn't make any sense at all.

I have actually checked the code now and see what you mean. I thought it
would be a single allocation that is needed but that is not the case so
the static buffer is not going to fly.
-- 
Michal Hocko
SUSE Labs

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH] x86, mm: Reserver some memory for bootmem allocator for NO_BOOTMEM
  2018-08-30 12:49       ` Michal Hocko
@ 2018-08-30 12:59         ` Feng Tang
  2018-08-30 13:05           ` Thomas Gleixner
  0 siblings, 1 reply; 29+ messages in thread
From: Feng Tang @ 2018-08-30 12:59 UTC (permalink / raw)
  To: Michal Hocko
  Cc: Thomas Gleixner, Peter Zijlstra, linux-kernel, x86, Ingo Molnar,
	H . Peter Anvin, Yinghai Lu, Dave Hansen, Andi Kleen

On Thu, Aug 30, 2018 at 02:49:15PM +0200, Michal Hocko wrote:
> On Thu 30-08-18 13:54:19, Thomas Gleixner wrote:
> > On Thu, 30 Aug 2018, Michal Hocko wrote:
> > > On Thu 30-08-18 12:44:02, Peter Zijlstra wrote:
> > > > On Thu, Aug 30, 2018 at 05:03:19PM +0800, Feng Tang wrote:
> > > > > The root cause is that when CONFIG_NO_BOOTMEM=y,  before
> > > > > e820__memblock_setup() is called there is no memory for bootmem
> > > > > to allocate,
> > > > 
> > > > Which you bloody well asked for by using NO_BOOTMEM=y.
> > > > 
> > > > Going down this route; adding hacks for every little thing that does
> > > > want bootmem, completely defeats the purpose.
> > > > 
> > > > If anything, make the earlycon thing depend on NO_BOOTMEM=n. That also
> > > > solves your problem. No earlycon, no panic.
> > > 
> > > Well, there is endeavor to remove bootmem allocator altogether. So
> > > making earlycon depend on NO_BOOTMEM=n doesn't sound like a good fit to
> > 
> > If we want to remove bootmem, then reintroducing it with a static bootmem
> > section doesn't make any sense at all.
> 
> I have actually checked the code now and see what you mean. I thought it
> would be a single allocation that is needed but that is not the case so
> the static buffer is not going to fly.

Exactly! I tried several ways for the static allocation, like in data, in bss
section, but they all failed, as in the very start of setup_arch():

	memblock_reserve(__pa_symbol(_text),
		(unsigned long)__bss_stop - (unsigned long)_text);

Those [_text, __bss_stop] is not able to be used by alloc_bootmem(). And I
only got this patch, and really appreciate any good suggestions.

Thanks,
Feng



^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH] x86, mm: Reserver some memory for bootmem allocator for NO_BOOTMEM
  2018-08-30 12:59         ` Feng Tang
@ 2018-08-30 13:05           ` Thomas Gleixner
  2018-08-30 13:19             ` Feng Tang
  0 siblings, 1 reply; 29+ messages in thread
From: Thomas Gleixner @ 2018-08-30 13:05 UTC (permalink / raw)
  To: Feng Tang
  Cc: Michal Hocko, Peter Zijlstra, linux-kernel, x86, Ingo Molnar,
	H . Peter Anvin, Yinghai Lu, Dave Hansen, Andi Kleen

On Thu, 30 Aug 2018, Feng Tang wrote:
> On Thu, Aug 30, 2018 at 02:49:15PM +0200, Michal Hocko wrote:
> > On Thu 30-08-18 13:54:19, Thomas Gleixner wrote:
> > > On Thu, 30 Aug 2018, Michal Hocko wrote:
> > > > On Thu 30-08-18 12:44:02, Peter Zijlstra wrote:
> > > > > On Thu, Aug 30, 2018 at 05:03:19PM +0800, Feng Tang wrote:
> > > > > > The root cause is that when CONFIG_NO_BOOTMEM=y,  before
> > > > > > e820__memblock_setup() is called there is no memory for bootmem
> > > > > > to allocate,
> > > > > 
> > > > > Which you bloody well asked for by using NO_BOOTMEM=y.
> > > > > 
> > > > > Going down this route; adding hacks for every little thing that does
> > > > > want bootmem, completely defeats the purpose.
> > > > > 
> > > > > If anything, make the earlycon thing depend on NO_BOOTMEM=n. That also
> > > > > solves your problem. No earlycon, no panic.
> > > > 
> > > > Well, there is endeavor to remove bootmem allocator altogether. So
> > > > making earlycon depend on NO_BOOTMEM=n doesn't sound like a good fit to
> > > 
> > > If we want to remove bootmem, then reintroducing it with a static bootmem
> > > section doesn't make any sense at all.
> > 
> > I have actually checked the code now and see what you mean. I thought it
> > would be a single allocation that is needed but that is not the case so
> > the static buffer is not going to fly.
> 
> Exactly! I tried several ways for the static allocation, like in data, in bss
> section, but they all failed, as in the very start of setup_arch():
> 
> 	memblock_reserve(__pa_symbol(_text),
> 		(unsigned long)__bss_stop - (unsigned long)_text);
> 
> Those [_text, __bss_stop] is not able to be used by alloc_bootmem(). And I
> only got this patch, and really appreciate any good suggestions.

And why do you want to use bootmem in the first place? If boot mem is going
away, why are you not fixing the early console crap to NOT use bootmem at
all?

Thanks,

	tglx

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH] x86, mm: Reserver some memory for bootmem allocator for NO_BOOTMEM
  2018-08-30 13:05           ` Thomas Gleixner
@ 2018-08-30 13:19             ` Feng Tang
  2018-08-30 13:25               ` Thomas Gleixner
  0 siblings, 1 reply; 29+ messages in thread
From: Feng Tang @ 2018-08-30 13:19 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: Michal Hocko, Peter Zijlstra, linux-kernel, x86, Ingo Molnar,
	H . Peter Anvin, Yinghai Lu, Dave Hansen, Andi Kleen

Hi Thomas,

On Thu, Aug 30, 2018 at 03:05:31PM +0200, Thomas Gleixner wrote:
> On Thu, 30 Aug 2018, Feng Tang wrote:
> > On Thu, Aug 30, 2018 at 02:49:15PM +0200, Michal Hocko wrote:
> > > On Thu 30-08-18 13:54:19, Thomas Gleixner wrote:
> > > > On Thu, 30 Aug 2018, Michal Hocko wrote:
> > > > > On Thu 30-08-18 12:44:02, Peter Zijlstra wrote:
> > > > > > On Thu, Aug 30, 2018 at 05:03:19PM +0800, Feng Tang wrote:
> > > > > > > The root cause is that when CONFIG_NO_BOOTMEM=y,  before
> > > > > > > e820__memblock_setup() is called there is no memory for bootmem
> > > > > > > to allocate,
> > > > > > 
> > > > > > Which you bloody well asked for by using NO_BOOTMEM=y.
> > > > > > 
> > > > > > Going down this route; adding hacks for every little thing that does
> > > > > > want bootmem, completely defeats the purpose.
> > > > > > 
> > > > > > If anything, make the earlycon thing depend on NO_BOOTMEM=n. That also
> > > > > > solves your problem. No earlycon, no panic.
> > > > > 
> > > > > Well, there is endeavor to remove bootmem allocator altogether. So
> > > > > making earlycon depend on NO_BOOTMEM=n doesn't sound like a good fit to
> > > > 
> > > > If we want to remove bootmem, then reintroducing it with a static bootmem
> > > > section doesn't make any sense at all.
> > > 
> > > I have actually checked the code now and see what you mean. I thought it
> > > would be a single allocation that is needed but that is not the case so
> > > the static buffer is not going to fly.
> > 
> > Exactly! I tried several ways for the static allocation, like in data, in bss
> > section, but they all failed, as in the very start of setup_arch():
> > 
> > 	memblock_reserve(__pa_symbol(_text),
> > 		(unsigned long)__bss_stop - (unsigned long)_text);
> > 
> > Those [_text, __bss_stop] is not able to be used by alloc_bootmem(). And I
> > only got this patch, and really appreciate any good suggestions.
> 
> And why do you want to use bootmem in the first place? If boot mem is going
> away, why are you not fixing the early console crap to NOT use bootmem at
> all?

The earlycon need use fixmap to map the mmio register, while fixmap will
need a new page for page table if the pmd/pte is not already their, as shown
in the commit log: 

	  panic+0xd2/0x220
	  __alloc_bootmem+0x31/0x34
	  spp_getpage+0x60/0x8a
	  fill_pte+0x71/0x130
	  __set_pte_vaddr+0x1d/0x50
	  set_pte_vaddr+0x3c/0x60
	  __native_set_fixmap+0x23/0x30
	  native_set_fixmap+0x30/0x40
	  setup_earlycon+0x1e0/0x32f
	  param_setup_earlycon+0x13/0x22
	  do_early_param+0x5b/0x90
	  parse_args+0x1f7/0x300
	  parse_early_options+0x24/0x28
	  parse_early_param+0x65/0x73
	  setup_arch+0x31e/0x9f1
	  start_kernel+0x58/0x44e

inside the spp_getpage():
	
	if (after_bootmem)
		ptr = (void *) get_zeroed_page(GFP_ATOMIC);
	else
		ptr = alloc_bootmem_pages(PAGE_SIZE);

Thanks,
Feng


^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH] x86, mm: Reserver some memory for bootmem allocator for NO_BOOTMEM
  2018-08-30 13:19             ` Feng Tang
@ 2018-08-30 13:25               ` Thomas Gleixner
  2018-08-30 13:55                 ` Feng Tang
  2018-08-31  6:15                 ` Feng Tang
  0 siblings, 2 replies; 29+ messages in thread
From: Thomas Gleixner @ 2018-08-30 13:25 UTC (permalink / raw)
  To: Feng Tang
  Cc: Michal Hocko, Peter Zijlstra, linux-kernel, x86, Ingo Molnar,
	H . Peter Anvin, Yinghai Lu, Dave Hansen, Andi Kleen

On Thu, 30 Aug 2018, Feng Tang wrote:
> On Thu, Aug 30, 2018 at 03:05:31PM +0200, Thomas Gleixner wrote:
> > > Those [_text, __bss_stop] is not able to be used by alloc_bootmem(). And I
> > > only got this patch, and really appreciate any good suggestions.
> > 
> > And why do you want to use bootmem in the first place? If boot mem is going
> > away, why are you not fixing the early console crap to NOT use bootmem at
> > all?
> 
> The earlycon need use fixmap to map the mmio register, while fixmap will
> need a new page for page table if the pmd/pte is not already their, as shown
> in the commit log: 
> 
> 	  panic+0xd2/0x220
> 	  __alloc_bootmem+0x31/0x34
> 	  spp_getpage+0x60/0x8a
> 	  fill_pte+0x71/0x130
> 	  __set_pte_vaddr+0x1d/0x50
> 	  set_pte_vaddr+0x3c/0x60
> 	  __native_set_fixmap+0x23/0x30
> 	  native_set_fixmap+0x30/0x40
> 	  setup_earlycon+0x1e0/0x32f
> 	  param_setup_earlycon+0x13/0x22
> 	  do_early_param+0x5b/0x90
> 	  parse_args+0x1f7/0x300
> 	  parse_early_options+0x24/0x28
> 	  parse_early_param+0x65/0x73
> 	  setup_arch+0x31e/0x9f1
> 	  start_kernel+0x58/0x44e
> 
> inside the spp_getpage():
> 	
> 	if (after_bootmem)
> 		ptr = (void *) get_zeroed_page(GFP_ATOMIC);
> 	else
> 		ptr = alloc_bootmem_pages(PAGE_SIZE);

And where is the problem? We know that we need the fixmap during early boot
anyway, so allocating the PTE page statically and setting it up early
enough is not really rocket science. No allocator required.

Thanks,

	tglx

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH] x86, mm: Reserver some memory for bootmem allocator for NO_BOOTMEM
  2018-08-30 13:25               ` Thomas Gleixner
@ 2018-08-30 13:55                 ` Feng Tang
  2018-08-31  6:15                 ` Feng Tang
  1 sibling, 0 replies; 29+ messages in thread
From: Feng Tang @ 2018-08-30 13:55 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: Michal Hocko, Peter Zijlstra, linux-kernel, x86, Ingo Molnar,
	H . Peter Anvin, Yinghai Lu, Dave Hansen, Andi Kleen

On Thu, Aug 30, 2018 at 03:25:42PM +0200, Thomas Gleixner wrote:
> On Thu, 30 Aug 2018, Feng Tang wrote:
> > On Thu, Aug 30, 2018 at 03:05:31PM +0200, Thomas Gleixner wrote:
> > > > Those [_text, __bss_stop] is not able to be used by alloc_bootmem(). And I
> > > > only got this patch, and really appreciate any good suggestions.
> > > 
> > > And why do you want to use bootmem in the first place? If boot mem is going
> > > away, why are you not fixing the early console crap to NOT use bootmem at
> > > all?
> > 
> > The earlycon need use fixmap to map the mmio register, while fixmap will
> > need a new page for page table if the pmd/pte is not already their, as shown
> > in the commit log: 
> > 
> > 	  panic+0xd2/0x220
> > 	  __alloc_bootmem+0x31/0x34
> > 	  spp_getpage+0x60/0x8a
> > 	  fill_pte+0x71/0x130
> > 	  __set_pte_vaddr+0x1d/0x50
> > 	  set_pte_vaddr+0x3c/0x60
> > 	  __native_set_fixmap+0x23/0x30
> > 	  native_set_fixmap+0x30/0x40
> > 	  setup_earlycon+0x1e0/0x32f
> > 	  param_setup_earlycon+0x13/0x22
> > 	  do_early_param+0x5b/0x90
> > 	  parse_args+0x1f7/0x300
> > 	  parse_early_options+0x24/0x28
> > 	  parse_early_param+0x65/0x73
> > 	  setup_arch+0x31e/0x9f1
> > 	  start_kernel+0x58/0x44e
> > 
> > inside the spp_getpage():
> > 	
> > 	if (after_bootmem)
> > 		ptr = (void *) get_zeroed_page(GFP_ATOMIC);
> > 	else
> > 		ptr = alloc_bootmem_pages(PAGE_SIZE);
> 
> And where is the problem? We know that we need the fixmap during early boot
> anyway, so allocating the PTE page statically and setting it up early
> enough is not really rocket science. No allocator required.

Yes, if the bootmem itslef is going way, setting up the page table for
fixmap is a cleaner solution.

Thanks,
Feng

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH] x86, mm: Reserver some memory for bootmem allocator for NO_BOOTMEM
  2018-08-30 13:25               ` Thomas Gleixner
  2018-08-30 13:55                 ` Feng Tang
@ 2018-08-31  6:15                 ` Feng Tang
  2018-08-31 11:33                   ` Thomas Gleixner
  1 sibling, 1 reply; 29+ messages in thread
From: Feng Tang @ 2018-08-31  6:15 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: Michal Hocko, Peter Zijlstra, linux-kernel, x86, Ingo Molnar,
	H . Peter Anvin, Yinghai Lu, Dave Hansen, Andi Kleen

On Thu, Aug 30, 2018 at 03:25:42PM +0200, Thomas Gleixner wrote:
 
> And where is the problem? We know that we need the fixmap during early boot
> anyway, so allocating the PTE page statically and setting it up early
> enough is not really rocket science. No allocator required.

Further check shows the page table for fixmap is already setup in [-12M, -10M],
but fixmap's address space breaks the expection under certain kernel config.
So here is another fix, please review, thanks!

----------------------------------------------------------------

From f63e4e5c6166a2f8d13ad3142b7a49b2fd1f8edf Mon Sep 17 00:00:00 2001
From: Feng Tang <feng.tang@intel.com>
Date: Fri, 31 Aug 2018 13:48:21 +0800
Subject: [PATCH] x86, mm: Make the real fixmap address consistent

We hit a kernel panic when enabling earlycon for a platform, the
call trace is:

	  panic+0xd2/0x220
	  __alloc_bootmem+0x31/0x34
	  spp_getpage+0x60/0x8a
	  fill_pte+0x71/0x130
	  __set_pte_vaddr+0x1d/0x50
	  set_pte_vaddr+0x3c/0x60
	  __native_set_fixmap+0x23/0x30
	  native_set_fixmap+0x30/0x40
	  setup_earlycon+0x1e0/0x32f
	  param_setup_earlycon+0x13/0x22
	  do_early_param+0x5b/0x90
	  parse_args+0x1f7/0x300
	  parse_early_options+0x24/0x28
	  parse_early_param+0x65/0x73
	  setup_arch+0x31e/0x9f1
	  start_kernel+0x58/0x44e

This panic happens as the earlycon's fixmap address has no
pmd/pte ready, and __set_fixmap will try to allocate memory to
setup the page table, and trigger panic due to no memory.

x86 kernel actually prepares the page table for fixmap in head_64.S:

	NEXT_PAGE(level2_fixmap_pgt)
		.fill	506,8,0
		.quad	level1_fixmap_pgt - __START_KERNEL_map + _PAGE_TABLE_NOENC
		/* 8MB reserved for vsyscalls + a 2MB hole = 4 + 1 entries */
		.fill	5,8,0

and it expects the fixmap address is in [-12M, -10M] range, but
current code in fixmap.h will break the expectation when
X86_VSYSCALL_EMULATION=n

	#ifdef CONFIG_X86_VSYSCALL_EMULATION
		VSYSCALL_PAGE = (FIXADDR_TOP - VSYSCALL_ADDR) >> PAGE_SHIFT,
	#endif

So removing the "#ifdef" will make the fixmap address space stable in
[-12M, -10M] and fix the issue.

Signed-off-by: Feng Tang <feng.tang@intel.com>
---
 arch/x86/include/asm/fixmap.h | 2 --
 1 file changed, 2 deletions(-)

diff --git a/arch/x86/include/asm/fixmap.h b/arch/x86/include/asm/fixmap.h
index e203169931c7..fcf27171f493 100644
--- a/arch/x86/include/asm/fixmap.h
+++ b/arch/x86/include/asm/fixmap.h
@@ -67,9 +67,7 @@ enum fixed_addresses {
 #ifdef CONFIG_X86_32
 	FIX_HOLE,
 #else
-#ifdef CONFIG_X86_VSYSCALL_EMULATION
 	VSYSCALL_PAGE = (FIXADDR_TOP - VSYSCALL_ADDR) >> PAGE_SHIFT,
-#endif
 #endif
 	FIX_DBGP_BASE,
 	FIX_EARLYCON_MEM_BASE,
-- 
2.14.1


^ permalink raw reply related	[flat|nested] 29+ messages in thread

* Re: [PATCH] x86, mm: Reserver some memory for bootmem allocator for NO_BOOTMEM
  2018-08-31  6:15                 ` Feng Tang
@ 2018-08-31 11:33                   ` Thomas Gleixner
  2018-08-31 13:36                     ` Feng Tang
  0 siblings, 1 reply; 29+ messages in thread
From: Thomas Gleixner @ 2018-08-31 11:33 UTC (permalink / raw)
  To: Feng Tang
  Cc: Michal Hocko, Peter Zijlstra, linux-kernel, x86, Ingo Molnar,
	H . Peter Anvin, Yinghai Lu, Dave Hansen, Andi Kleen

On Fri, 31 Aug 2018, Feng Tang wrote:
> On Thu, Aug 30, 2018 at 03:25:42PM +0200, Thomas Gleixner wrote:
> This panic happens as the earlycon's fixmap address has no
> pmd/pte ready, and __set_fixmap will try to allocate memory to
> setup the page table, and trigger panic due to no memory.
> 
> x86 kernel actually prepares the page table for fixmap in head_64.S:
> 
> 	NEXT_PAGE(level2_fixmap_pgt)
> 		.fill	506,8,0
> 		.quad	level1_fixmap_pgt - __START_KERNEL_map + _PAGE_TABLE_NOENC
> 		/* 8MB reserved for vsyscalls + a 2MB hole = 4 + 1 entries */
> 		.fill	5,8,0
> 
> and it expects the fixmap address is in [-12M, -10M] range, but
> current code in fixmap.h will break the expectation when
> X86_VSYSCALL_EMULATION=n
> 
> 	#ifdef CONFIG_X86_VSYSCALL_EMULATION
> 		VSYSCALL_PAGE = (FIXADDR_TOP - VSYSCALL_ADDR) >> PAGE_SHIFT,
> 	#endif
> 
> So removing the "#ifdef" will make the fixmap address space stable in
> [-12M, -10M] and fix the issue.

Why on earth are you not fixing the damned PTE setup which is the obvious
and correct thing to do?

Thanks,

	tglx



^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH] x86, mm: Reserver some memory for bootmem allocator for NO_BOOTMEM
  2018-08-31 11:33                   ` Thomas Gleixner
@ 2018-08-31 13:36                     ` Feng Tang
  2018-09-07  8:17                       ` Feng Tang
  0 siblings, 1 reply; 29+ messages in thread
From: Feng Tang @ 2018-08-31 13:36 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: Michal Hocko, Peter Zijlstra, linux-kernel, x86, Ingo Molnar,
	H . Peter Anvin, Yinghai Lu, Dave Hansen, Andi Kleen

On Fri, Aug 31, 2018 at 01:33:05PM +0200, Thomas Gleixner wrote:
> On Fri, 31 Aug 2018, Feng Tang wrote:
> > On Thu, Aug 30, 2018 at 03:25:42PM +0200, Thomas Gleixner wrote:
> > This panic happens as the earlycon's fixmap address has no
> > pmd/pte ready, and __set_fixmap will try to allocate memory to
> > setup the page table, and trigger panic due to no memory.
> > 
> > x86 kernel actually prepares the page table for fixmap in head_64.S:
> > 
> > 	NEXT_PAGE(level2_fixmap_pgt)
> > 		.fill	506,8,0
> > 		.quad	level1_fixmap_pgt - __START_KERNEL_map + _PAGE_TABLE_NOENC
> > 		/* 8MB reserved for vsyscalls + a 2MB hole = 4 + 1 entries */
> > 		.fill	5,8,0
> > 
> > and it expects the fixmap address is in [-12M, -10M] range, but
> > current code in fixmap.h will break the expectation when
> > X86_VSYSCALL_EMULATION=n
> > 
> > 	#ifdef CONFIG_X86_VSYSCALL_EMULATION
> > 		VSYSCALL_PAGE = (FIXADDR_TOP - VSYSCALL_ADDR) >> PAGE_SHIFT,
> > 	#endif
> > 
> > So removing the "#ifdef" will make the fixmap address space stable in
> > [-12M, -10M] and fix the issue.
> 
> Why on earth are you not fixing the damned PTE setup which is the obvious
> and correct thing to do?

Any sugestion? I can only have patch like this:

---
diff --git a/arch/x86/kernel/head_64.S b/arch/x86/kernel/head_64.S
index 15ebc2fc166e..8cdb27ccc3a3 100644
--- a/arch/x86/kernel/head_64.S
+++ b/arch/x86/kernel/head_64.S
@@ -446,11 +446,15 @@ NEXT_PAGE(level2_kernel_pgt)
 
 NEXT_PAGE(level2_fixmap_pgt)
 	.fill	506,8,0
-	.quad	level1_fixmap_pgt - __START_KERNEL_map + _PAGE_TABLE_NOENC
+	.quad	level1_fixmap_pgt0 - __START_KERNEL_map + _PAGE_TABLE_NOENC
+	.quad	level1_fixmap_pgt1 - __START_KERNEL_map + _PAGE_TABLE_NOENC
 	/* 8MB reserved for vsyscalls + a 2MB hole = 4 + 1 entries */
-	.fill	5,8,0
+	.fill	4,8,0
 
-NEXT_PAGE(level1_fixmap_pgt)
+NEXT_PAGE(level1_fixmap_pgt0)
+	.fill	512,8,0
+
+NEXT_PAGE(level1_fixmap_pgt1)
 	.fill	512,8,0
 
 #undef PMDS


Thanks,
Feng


^ permalink raw reply related	[flat|nested] 29+ messages in thread

* Re: [PATCH] x86, mm: Reserver some memory for bootmem allocator for NO_BOOTMEM
  2018-08-31 13:36                     ` Feng Tang
@ 2018-09-07  8:17                       ` Feng Tang
  2018-09-07 10:52                         ` Thomas Gleixner
  0 siblings, 1 reply; 29+ messages in thread
From: Feng Tang @ 2018-09-07  8:17 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: Michal Hocko, Peter Zijlstra, linux-kernel, x86, Ingo Molnar,
	H . Peter Anvin, Yinghai Lu, Dave Hansen, Andi Kleen

Hi Thomas,

On Fri, Aug 31, 2018 at 09:36:59PM +0800, Feng Tang wrote:
> On Fri, Aug 31, 2018 at 01:33:05PM +0200, Thomas Gleixner wrote:
> > On Fri, 31 Aug 2018, Feng Tang wrote:
> > > On Thu, Aug 30, 2018 at 03:25:42PM +0200, Thomas Gleixner wrote:
> > > This panic happens as the earlycon's fixmap address has no
> > > pmd/pte ready, and __set_fixmap will try to allocate memory to
> > > setup the page table, and trigger panic due to no memory.
> > > 
> > > x86 kernel actually prepares the page table for fixmap in head_64.S:
> > > 
> > > 	NEXT_PAGE(level2_fixmap_pgt)
> > > 		.fill	506,8,0
> > > 		.quad	level1_fixmap_pgt - __START_KERNEL_map + _PAGE_TABLE_NOENC
> > > 		/* 8MB reserved for vsyscalls + a 2MB hole = 4 + 1 entries */
> > > 		.fill	5,8,0
> > > 
> > > and it expects the fixmap address is in [-12M, -10M] range, but
> > > current code in fixmap.h will break the expectation when
> > > X86_VSYSCALL_EMULATION=n
> > > 
> > > 	#ifdef CONFIG_X86_VSYSCALL_EMULATION
> > > 		VSYSCALL_PAGE = (FIXADDR_TOP - VSYSCALL_ADDR) >> PAGE_SHIFT,
> > > 	#endif
> > > 
> > > So removing the "#ifdef" will make the fixmap address space stable in
> > > [-12M, -10M] and fix the issue.
> > 
> > Why on earth are you not fixing the damned PTE setup which is the obvious
> > and correct thing to do?
> 
> Any sugestion? I can only have patch like this:

Could you review this patch, as at this time window there is no usable memory
block or other memory allocators that I know, so I follow the exisitng static
fixmap page table code and add one more.

Thanks,
Feng

> 
> ---
> diff --git a/arch/x86/kernel/head_64.S b/arch/x86/kernel/head_64.S
> index 15ebc2fc166e..8cdb27ccc3a3 100644
> --- a/arch/x86/kernel/head_64.S
> +++ b/arch/x86/kernel/head_64.S
> @@ -446,11 +446,15 @@ NEXT_PAGE(level2_kernel_pgt)
>  
>  NEXT_PAGE(level2_fixmap_pgt)
>  	.fill	506,8,0
> -	.quad	level1_fixmap_pgt - __START_KERNEL_map + _PAGE_TABLE_NOENC
> +	.quad	level1_fixmap_pgt0 - __START_KERNEL_map + _PAGE_TABLE_NOENC
> +	.quad	level1_fixmap_pgt1 - __START_KERNEL_map + _PAGE_TABLE_NOENC
>  	/* 8MB reserved for vsyscalls + a 2MB hole = 4 + 1 entries */
> -	.fill	5,8,0
> +	.fill	4,8,0
>  
> -NEXT_PAGE(level1_fixmap_pgt)
> +NEXT_PAGE(level1_fixmap_pgt0)
> +	.fill	512,8,0
> +
> +NEXT_PAGE(level1_fixmap_pgt1)
>  	.fill	512,8,0
>  
>  #undef PMDS
> 
> 
> Thanks,
> Feng
> 

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH] x86, mm: Reserver some memory for bootmem allocator for NO_BOOTMEM
  2018-09-07  8:17                       ` Feng Tang
@ 2018-09-07 10:52                         ` Thomas Gleixner
  2018-09-10  9:39                           ` Feng Tang
  0 siblings, 1 reply; 29+ messages in thread
From: Thomas Gleixner @ 2018-09-07 10:52 UTC (permalink / raw)
  To: Feng Tang
  Cc: Michal Hocko, Peter Zijlstra, linux-kernel, x86, Ingo Molnar,
	H . Peter Anvin, Yinghai Lu, Dave Hansen, Andi Kleen

On Fri, 7 Sep 2018, Feng Tang wrote:
> On Fri, Aug 31, 2018 at 09:36:59PM +0800, Feng Tang wrote:
> > Any sugestion? I can only have patch like this:
> 
> Could you review this patch, as at this time window there is no usable memory
> block or other memory allocators that I know, so I follow the exisitng static
> fixmap page table code and add one more.

You really want to add a proper sanity check for this. The static stuff has
to cover the fixmap. Just making it work for this particular case is sloppy
at best as the next larger change of the fixmap will just bring the problem
back.

Thanks,

	tglx

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH] x86, mm: Reserver some memory for bootmem allocator for NO_BOOTMEM
  2018-09-07 10:52                         ` Thomas Gleixner
@ 2018-09-10  9:39                           ` Feng Tang
  2018-09-10  9:53                             ` Thomas Gleixner
  0 siblings, 1 reply; 29+ messages in thread
From: Feng Tang @ 2018-09-10  9:39 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: Michal Hocko, Peter Zijlstra, linux-kernel, x86, Ingo Molnar,
	H . Peter Anvin, Yinghai Lu, Dave Hansen, Andi Kleen

On Fri, Sep 07, 2018 at 12:52:17PM +0200, Thomas Gleixner wrote:
> On Fri, 7 Sep 2018, Feng Tang wrote:
> > On Fri, Aug 31, 2018 at 09:36:59PM +0800, Feng Tang wrote:
> > > Any sugestion? I can only have patch like this:
> > 
> > Could you review this patch, as at this time window there is no usable memory
> > block or other memory allocators that I know, so I follow the exisitng static
> > fixmap page table code and add one more.
> 
> You really want to add a proper sanity check for this. The static stuff has
> to cover the fixmap. Just making it work for this particular case is sloppy
> at best as the next larger change of the fixmap will just bring the problem
> back.
Yes, agree.

Since the issue happens very early before any serial console, I can only think
of a build time check.

Thanks,
Feng

---
 arch/x86/kernel/head_64.S |  6 +++++-
 arch/x86/mm/pgtable.c     | 13 +++++++++++++
 2 files changed, 18 insertions(+), 1 deletion(-)

diff --git a/arch/x86/kernel/head_64.S b/arch/x86/kernel/head_64.S
index 15ebc2fc166e..3a1d33b43c29 100644
--- a/arch/x86/kernel/head_64.S
+++ b/arch/x86/kernel/head_64.S
@@ -448,11 +448,15 @@ NEXT_PAGE(level2_fixmap_pgt)
 	.fill	506,8,0
 	.quad	level1_fixmap_pgt - __START_KERNEL_map + _PAGE_TABLE_NOENC
 	/* 8MB reserved for vsyscalls + a 2MB hole = 4 + 1 entries */
-	.fill	5,8,0
+	.quad	level1_fixmap_pgt1 - __START_KERNEL_map + _PAGE_TABLE_NOENC
+	.fill	4,8,0
 
 NEXT_PAGE(level1_fixmap_pgt)
 	.fill	512,8,0
 
+NEXT_PAGE(level1_fixmap_pgt1)
+	.fill	512,8,0
+
 #undef PMDS
 
 	.data
diff --git a/arch/x86/mm/pgtable.c b/arch/x86/mm/pgtable.c
index e848a4811785..fc172551048a 100644
--- a/arch/x86/mm/pgtable.c
+++ b/arch/x86/mm/pgtable.c
@@ -637,6 +637,19 @@ void __native_set_fixmap(enum fixed_addresses idx, pte_t pte)
 {
 	unsigned long address = __fix_to_virt(idx);
 
+#ifdef CONFIG_X86_64
+	/*
+	 * In arch/x86/kernel/head_64.S, the static page table
+	 * has been setup for 4M space [-12M, -8M]
+	 *	0xFFFFFFFFFF400000 ~ 0xFFFFFFFFFF7FFFFF
+	 * Add a sanity check whether fixed_address crosses
+	 * the boundary.
+	 */
+	#define FIXMAP_STATIC_PGTABLE_START	0xFFFFFFFFFF400000
+	BUILD_BUG_ON(__fix_to_virt(__end_of_permanent_fixed_addresses)
+			< FIXMAP_STATIC_PGTABLE_START);
+#endif
+
 	if (idx >= __end_of_fixed_addresses) {
 		BUG();
 		return;
-- 
2.14.1

> 
> Thanks,
> 
> 	tglx

^ permalink raw reply related	[flat|nested] 29+ messages in thread

* Re: [PATCH] x86, mm: Reserver some memory for bootmem allocator for NO_BOOTMEM
  2018-09-10  9:39                           ` Feng Tang
@ 2018-09-10  9:53                             ` Thomas Gleixner
  2018-09-11  6:14                               ` Feng Tang
  0 siblings, 1 reply; 29+ messages in thread
From: Thomas Gleixner @ 2018-09-10  9:53 UTC (permalink / raw)
  To: Feng Tang
  Cc: Michal Hocko, Peter Zijlstra, linux-kernel, x86, Ingo Molnar,
	H . Peter Anvin, Yinghai Lu, Dave Hansen, Andi Kleen

On Mon, 10 Sep 2018, Feng Tang wrote:
> On Fri, Sep 07, 2018 at 12:52:17PM +0200, Thomas Gleixner wrote:
> > You really want to add a proper sanity check for this. The static stuff has
> > to cover the fixmap. Just making it work for this particular case is sloppy
> > at best as the next larger change of the fixmap will just bring the problem
> > back.
> Yes, agree.
> 
> Since the issue happens very early before any serial console, I can only think
> of a build time check.

Of course. That's obvious. The size _IS_ known at build time.

> diff --git a/arch/x86/mm/pgtable.c b/arch/x86/mm/pgtable.c
> index e848a4811785..fc172551048a 100644
> --- a/arch/x86/mm/pgtable.c
> +++ b/arch/x86/mm/pgtable.c
> @@ -637,6 +637,19 @@ void __native_set_fixmap(enum fixed_addresses idx, pte_t pte)
>  {
>  	unsigned long address = __fix_to_virt(idx);
>  
> +#ifdef CONFIG_X86_64
> +	/*
> +	 * In arch/x86/kernel/head_64.S, the static page table
> +	 * has been setup for 4M space [-12M, -8M]
> +	 *	0xFFFFFFFFFF400000 ~ 0xFFFFFFFFFF7FFFFF
> +	 * Add a sanity check whether fixed_address crosses
> +	 * the boundary.
> +	 */
> +	#define FIXMAP_STATIC_PGTABLE_START	0xFFFFFFFFFF400000

Errm what? This is beyond hillarious, really. What helps that if I remove
that second FIXMAP entry in head_64.S?

The size of the fixmap is known at compile time and it is known when
building head_64.S. So the obvious check is to check right there where the
static pagetable is set up and where you know how many entries you set up
whether it's covering the full size or not and err out there.

Thanks,

	tglx

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH] x86, mm: Reserver some memory for bootmem allocator for NO_BOOTMEM
  2018-09-10  9:53                             ` Thomas Gleixner
@ 2018-09-11  6:14                               ` Feng Tang
  2018-09-15 10:29                                 ` Thomas Gleixner
  0 siblings, 1 reply; 29+ messages in thread
From: Feng Tang @ 2018-09-11  6:14 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: Michal Hocko, Peter Zijlstra, linux-kernel, x86, Ingo Molnar,
	H . Peter Anvin, Yinghai Lu, Dave Hansen, Andi Kleen

Hi Thomas,

On Mon, Sep 10, 2018 at 11:53:39AM +0200, Thomas Gleixner wrote:
> On Mon, 10 Sep 2018, Feng Tang wrote:
> > diff --git a/arch/x86/mm/pgtable.c b/arch/x86/mm/pgtable.c
> > index e848a4811785..fc172551048a 100644
> > --- a/arch/x86/mm/pgtable.c
> > +++ b/arch/x86/mm/pgtable.c
> > @@ -637,6 +637,19 @@ void __native_set_fixmap(enum fixed_addresses idx, pte_t pte)
> >  {
> >  	unsigned long address = __fix_to_virt(idx);
> >  
> > +#ifdef CONFIG_X86_64
> > +	/*
> > +	 * In arch/x86/kernel/head_64.S, the static page table
> > +	 * has been setup for 4M space [-12M, -8M]
> > +	 *	0xFFFFFFFFFF400000 ~ 0xFFFFFFFFFF7FFFFF
> > +	 * Add a sanity check whether fixed_address crosses
> > +	 * the boundary.
> > +	 */
> > +	#define FIXMAP_STATIC_PGTABLE_START	0xFFFFFFFFFF400000
> 
> Errm what? This is beyond hillarious, really. What helps that if I remove
> that second FIXMAP entry in head_64.S?
> 
> The size of the fixmap is known at compile time and it is known when
> building head_64.S. So the obvious check is to check right there where the
> static pagetable is set up and where you know how many entries you set up
> whether it's covering the full size or not and err out there.

Thanks for the suggestion, and I have 2 patches: one adds a build warning,   
the other prepares fixmap page table on demand and doesn't need warning.

But I met a problem, that the "__end_of_permanent_fixed_addresses" is
defined in fixmap.h, which is protected by #ifndef __ASSEMBLY__, also
fixmap.h reference many other header file, which makes it harder to
extract the definition out. Any suggestion on this? thanks!

- Feng

patch1:
---
diff --git a/arch/x86/kernel/head_64.S b/arch/x86/kernel/head_64.S
index 3a1d33b43c29..5863623890af 100644
--- a/arch/x86/kernel/head_64.S
+++ b/arch/x86/kernel/head_64.S
@@ -451,12 +451,20 @@ NEXT_PAGE(level2_fixmap_pgt)
 	.quad	level1_fixmap_pgt1 - __START_KERNEL_map + _PAGE_TABLE_NOENC
 	.fill	4,8,0
 
+/*
+ * Next are static page table for 4M fixmap space:
+ *	[-12M, -10M], [-10M, -8M]
+ */
 NEXT_PAGE(level1_fixmap_pgt)
 	.fill	512,8,0
 
 NEXT_PAGE(level1_fixmap_pgt1)
 	.fill	512,8,0
 
+#if __end_of_permanent_fixed_addresses > 1024
+.error "Total fixmap space exceeds the static page table capacity, please expand the page table!"
+#endif
+
 #undef PMDS
 
 	.data


Patch2:
---
diff --git a/arch/x86/kernel/head_64.S b/arch/x86/kernel/head_64.S
index 15ebc2fc166e..2b98ce234686 100644
--- a/arch/x86/kernel/head_64.S
+++ b/arch/x86/kernel/head_64.S
@@ -24,6 +24,7 @@
 #include "../entry/calling.h"
 #include <asm/export.h>
 #include <asm/nospec-branch.h>
+#include <asm/fixmap.h>
 
 #ifdef CONFIG_PARAVIRT
 #include <asm/asm-offsets.h>
@@ -444,14 +445,21 @@ NEXT_PAGE(level2_kernel_pgt)
 	PMDS(0, __PAGE_KERNEL_LARGE_EXEC,
 		KERNEL_IMAGE_SIZE/PMD_SIZE)
 
+#define FIXMAP_PMD_NUM	((__end_of_permanent_fixed_addresses + 511) / 512)
 NEXT_PAGE(level2_fixmap_pgt)
-	.fill	506,8,0
-	.quad	level1_fixmap_pgt - __START_KERNEL_map + _PAGE_TABLE_NOENC
+	.fill	(512 - 4 - FIXMAP_PMD_NUM),8,0
+	pgtno = 0
+	.rept (FIXMAP_PMD_NUM)
+	.quad level1_fixmap_pgt + (pgtno << PAGE_SHIFT) - __START_KERNEL_map  + _PAGE_TABLE_NOENC
+	pgtno = pgtno + 1
+	.endr
 	/* 8MB reserved for vsyscalls + a 2MB hole = 4 + 1 entries */
-	.fill	5,8,0
+	.fill	4,8,0
 
 NEXT_PAGE(level1_fixmap_pgt)
+	.rept (FIXMAP_PMD_NUM)
 	.fill	512,8,0
+	.endr
 
 #undef PMDS

^ permalink raw reply related	[flat|nested] 29+ messages in thread

* Re: [PATCH] x86, mm: Reserver some memory for bootmem allocator for NO_BOOTMEM
  2018-09-11  6:14                               ` Feng Tang
@ 2018-09-15 10:29                                 ` Thomas Gleixner
  2018-09-15 16:47                                   ` Feng Tang
  0 siblings, 1 reply; 29+ messages in thread
From: Thomas Gleixner @ 2018-09-15 10:29 UTC (permalink / raw)
  To: Feng Tang
  Cc: Michal Hocko, Peter Zijlstra, linux-kernel, x86, Ingo Molnar,
	H . Peter Anvin, Yinghai Lu, Dave Hansen, Andi Kleen

On Tue, 11 Sep 2018, Feng Tang wrote:
> Thanks for the suggestion, and I have 2 patches: one adds a build warning,   
> the other prepares fixmap page table on demand and doesn't need warning.

The latter please.

> But I met a problem, that the "__end_of_permanent_fixed_addresses" is
> defined in fixmap.h, which is protected by #ifndef __ASSEMBLY__, also
> fixmap.h reference many other header file, which makes it harder to
> extract the definition out. Any suggestion on this? thanks!

What prevents you from moving the enum out of the __ASSEMBLY__ protected
section aside of a bit of careful work?

Thanks,

	tglx

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH] x86, mm: Reserver some memory for bootmem allocator for NO_BOOTMEM
  2018-09-15 10:29                                 ` Thomas Gleixner
@ 2018-09-15 16:47                                   ` Feng Tang
  2018-09-15 17:28                                     ` Thomas Gleixner
  0 siblings, 1 reply; 29+ messages in thread
From: Feng Tang @ 2018-09-15 16:47 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: Michal Hocko, Peter Zijlstra, linux-kernel, x86, Ingo Molnar,
	H . Peter Anvin, Yinghai Lu, Dave Hansen, Andi Kleen

Hi Thomas,

On Sat, Sep 15, 2018 at 12:29:50PM +0200, Thomas Gleixner wrote:
> On Tue, 11 Sep 2018, Feng Tang wrote:
> > Thanks for the suggestion, and I have 2 patches: one adds a build warning,   
> > the other prepares fixmap page table on demand and doesn't need warning.
> 
> The latter please.

Okay.

> 
> > But I met a problem, that the "__end_of_permanent_fixed_addresses" is
> > defined in fixmap.h, which is protected by #ifndef __ASSEMBLY__, also
> > fixmap.h reference many other header file, which makes it harder to
> > extract the definition out. Any suggestion on this? thanks!
> 
> What prevents you from moving the enum out of the __ASSEMBLY__ protected
> section aside of a bit of careful work?

I have tried to change some header files incluing fixmap.h/apicdef.h/
vsyscall.h, and most of the .c files compile fine now, but I can not
use the "__end_of_permanent_fixed_addresses" in head_64.S as it is a 
enum type, and could not be recognized by assembly code.

Thanks,
Feng

> 
> Thanks,
> 
> 	tglx

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH] x86, mm: Reserver some memory for bootmem allocator for NO_BOOTMEM
  2018-09-15 16:47                                   ` Feng Tang
@ 2018-09-15 17:28                                     ` Thomas Gleixner
  2018-09-16 14:35                                       ` Feng Tang
  0 siblings, 1 reply; 29+ messages in thread
From: Thomas Gleixner @ 2018-09-15 17:28 UTC (permalink / raw)
  To: Feng Tang
  Cc: Michal Hocko, Peter Zijlstra, linux-kernel, x86, Ingo Molnar,
	H . Peter Anvin, Yinghai Lu, Dave Hansen, Andi Kleen

On Sun, 16 Sep 2018, Feng Tang wrote:
> I have tried to change some header files incluing fixmap.h/apicdef.h/
> vsyscall.h, and most of the .c files compile fine now, but I can not
> use the "__end_of_permanent_fixed_addresses" in head_64.S as it is a 
> enum type, and could not be recognized by assembly code.

Hrmm. I did not think about the enum. So we have two possibilities:

1) Have some preprocessing which provides the info for the assembler

2) Use a constant for the number of PMDs which is defined in a header and
   then compile time checked against the size of the fixmap in a C-file.

#1 would be preferred, but for a quick fix #2 is okay as well.

Thanks,

	tglx



^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH] x86, mm: Reserver some memory for bootmem allocator for NO_BOOTMEM
  2018-09-15 17:28                                     ` Thomas Gleixner
@ 2018-09-16 14:35                                       ` Feng Tang
  2018-09-16 14:43                                         ` Thomas Gleixner
  0 siblings, 1 reply; 29+ messages in thread
From: Feng Tang @ 2018-09-16 14:35 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: Michal Hocko, Peter Zijlstra, linux-kernel, x86, Ingo Molnar,
	H . Peter Anvin, Yinghai Lu, Dave Hansen, Andi Kleen

On Sat, Sep 15, 2018 at 07:28:03PM +0200, Thomas Gleixner wrote:
> On Sun, 16 Sep 2018, Feng Tang wrote:
> > I have tried to change some header files incluing fixmap.h/apicdef.h/
> > vsyscall.h, and most of the .c files compile fine now, but I can not
> > use the "__end_of_permanent_fixed_addresses" in head_64.S as it is a 
> > enum type, and could not be recognized by assembly code.
> 
> Hrmm. I did not think about the enum. So we have two possibilities:
> 
> 1) Have some preprocessing which provides the info for the assembler
> 
> 2) Use a constant for the number of PMDs which is defined in a header and
>    then compile time checked against the size of the fixmap in a C-file.

As there is no fixmap.c, I put the chck into __native_set_fixmap temply,
please review, thanks

- Feng

diff --git a/arch/x86/include/asm/fixmap.h b/arch/x86/include/asm/fixmap.h
index e203169931c7..ffaa0fa31044 100644
--- a/arch/x86/include/asm/fixmap.h
+++ b/arch/x86/include/asm/fixmap.h
@@ -14,6 +14,9 @@
 #ifndef _ASM_X86_FIXMAP_H
 #define _ASM_X86_FIXMAP_H
 
+/* Put here to be accessble for assembly code like head_64.S */
+#define FIXMAP_PMD_NUM	2
+
 #ifndef __ASSEMBLY__
 #include <linux/kernel.h>
 #include <asm/acpi.h>
diff --git a/arch/x86/include/asm/pgtable_64.h b/arch/x86/include/asm/pgtable_64.h
index f773d5e6c8cc..d6820d4c6e47 100644
--- a/arch/x86/include/asm/pgtable_64.h
+++ b/arch/x86/include/asm/pgtable_64.h
@@ -14,6 +14,7 @@
 #include <asm/processor.h>
 #include <linux/bitops.h>
 #include <linux/threads.h>
+#include <asm/fixmap.h>
 
 extern p4d_t level4_kernel_pgt[512];
 extern p4d_t level4_ident_pgt[512];
@@ -22,7 +23,7 @@ extern pud_t level3_ident_pgt[512];
 extern pmd_t level2_kernel_pgt[512];
 extern pmd_t level2_fixmap_pgt[512];
 extern pmd_t level2_ident_pgt[512];
-extern pte_t level1_fixmap_pgt[512];
+extern pte_t level1_fixmap_pgt[512 * FIXMAP_PMD_NUM];
 extern pgd_t init_top_pgt[];
 
 #define swapper_pg_dir init_top_pgt
diff --git a/arch/x86/kernel/head_64.S b/arch/x86/kernel/head_64.S
index 15ebc2fc166e..985eaff70792 100644
--- a/arch/x86/kernel/head_64.S
+++ b/arch/x86/kernel/head_64.S
@@ -24,6 +24,7 @@
 #include "../entry/calling.h"
 #include <asm/export.h>
 #include <asm/nospec-branch.h>
+#include <asm/fixmap.h>
 
 #ifdef CONFIG_PARAVIRT
 #include <asm/asm-offsets.h>
@@ -445,13 +446,19 @@ NEXT_PAGE(level2_kernel_pgt)
 		KERNEL_IMAGE_SIZE/PMD_SIZE)
 
 NEXT_PAGE(level2_fixmap_pgt)
-	.fill	506,8,0
-	.quad	level1_fixmap_pgt - __START_KERNEL_map + _PAGE_TABLE_NOENC
-	/* 8MB reserved for vsyscalls + a 2MB hole = 4 + 1 entries */
-	.fill	5,8,0
+	.fill	(512 - 4 - FIXMAP_PMD_NUM),8,0
+	pgtno = 0
+	.rept (FIXMAP_PMD_NUM)
+	.quad level1_fixmap_pgt + (pgtno << PAGE_SHIFT) - __START_KERNEL_map  + _PAGE_TABLE_NOENC
+	pgtno = pgtno + 1
+	.endr
+	/* 6 MB reserved space + a 2MB hole */
+	.fill	4,8,0
 
 NEXT_PAGE(level1_fixmap_pgt)
+	.rept (FIXMAP_PMD_NUM)
 	.fill	512,8,0
+	.endr
 
 #undef PMDS
 
diff --git a/arch/x86/mm/pgtable.c b/arch/x86/mm/pgtable.c
index e848a4811785..a927f5f39bee 100644
--- a/arch/x86/mm/pgtable.c
+++ b/arch/x86/mm/pgtable.c
@@ -637,6 +637,16 @@ void __native_set_fixmap(enum fixed_addresses idx, pte_t pte)
 {
 	unsigned long address = __fix_to_virt(idx);
 
+#ifdef CONFIG_X86_64
+       /*
+        * In arch/x86/kernel/head_64.S, the static page table has
+	* been setup for fixmap. Add a sanity compiling check
+	* whether fixmap space has exceeded the capacity.
+        */
+       BUILD_BUG_ON((__end_of_permanent_fixed_addresses + 511)/512
+                       > FIXMAP_PMD_NUM);
+#endif
+
 	if (idx >= __end_of_fixed_addresses) {
 		BUG();
 		return;
diff --git a/arch/x86/xen/mmu_pv.c b/arch/x86/xen/mmu_pv.c
index 45b700ac5fe7..b0aa2364a3c6 100644
--- a/arch/x86/xen/mmu_pv.c
+++ b/arch/x86/xen/mmu_pv.c
@@ -1953,7 +1953,10 @@ void __init xen_setup_kernel_pagetable(pgd_t *pgd, unsigned long max_pfn)
 	set_page_prot(level2_ident_pgt, PAGE_KERNEL_RO);
 	set_page_prot(level2_kernel_pgt, PAGE_KERNEL_RO);
 	set_page_prot(level2_fixmap_pgt, PAGE_KERNEL_RO);
-	set_page_prot(level1_fixmap_pgt, PAGE_KERNEL_RO);
+
+	for (i = 0; i < FIXMAP_PMD_NUM; i++)
+		set_page_prot(level1_fixmap_pgt + i * 512,
+				PAGE_KERNEL_RO);
 
 	/* Pin down new L4 */
 	pin_pagetable_pfn(MMUEXT_PIN_L4_TABLE,

> 
> #1 would be preferred, but for a quick fix #2 is okay as well.
> 
> Thanks,
> 
> 	tglx
> 

^ permalink raw reply related	[flat|nested] 29+ messages in thread

* Re: [PATCH] x86, mm: Reserver some memory for bootmem allocator for NO_BOOTMEM
  2018-09-16 14:35                                       ` Feng Tang
@ 2018-09-16 14:43                                         ` Thomas Gleixner
  2018-09-16 15:06                                           ` Feng Tang
  0 siblings, 1 reply; 29+ messages in thread
From: Thomas Gleixner @ 2018-09-16 14:43 UTC (permalink / raw)
  To: Feng Tang
  Cc: Michal Hocko, Peter Zijlstra, LKML, x86, Ingo Molnar,
	H . Peter Anvin, Yinghai Lu, Dave Hansen, Andi Kleen

On Sun, 16 Sep 2018, Feng Tang wrote:
> diff --git a/arch/x86/mm/pgtable.c b/arch/x86/mm/pgtable.c
> index e848a4811785..a927f5f39bee 100644
> --- a/arch/x86/mm/pgtable.c
> +++ b/arch/x86/mm/pgtable.c
> @@ -637,6 +637,16 @@ void __native_set_fixmap(enum fixed_addresses idx, pte_t pte)
>  {
>  	unsigned long address = __fix_to_virt(idx);
>  
> +#ifdef CONFIG_X86_64
> +       /*
> +        * In arch/x86/kernel/head_64.S, the static page table has
> +	* been setup for fixmap. Add a sanity compiling check
> +	* whether fixmap space has exceeded the capacity.
> +        */
> +       BUILD_BUG_ON((__end_of_permanent_fixed_addresses + 511)/512
> +                       > FIXMAP_PMD_NUM);

There exist macros for rounding up and doing this in the header file is
perfectly fine.

> +#endif
> +
>  	if (idx >= __end_of_fixed_addresses) {
>  		BUG();
>  		return;
> diff --git a/arch/x86/xen/mmu_pv.c b/arch/x86/xen/mmu_pv.c
> index 45b700ac5fe7..b0aa2364a3c6 100644
> --- a/arch/x86/xen/mmu_pv.c
> +++ b/arch/x86/xen/mmu_pv.c
> @@ -1953,7 +1953,10 @@ void __init xen_setup_kernel_pagetable(pgd_t *pgd, unsigned long max_pfn)
>  	set_page_prot(level2_ident_pgt, PAGE_KERNEL_RO);
>  	set_page_prot(level2_kernel_pgt, PAGE_KERNEL_RO);
>  	set_page_prot(level2_fixmap_pgt, PAGE_KERNEL_RO);
> -	set_page_prot(level1_fixmap_pgt, PAGE_KERNEL_RO);
> +
> +	for (i = 0; i < FIXMAP_PMD_NUM; i++)
> +		set_page_prot(level1_fixmap_pgt + i * 512,
> +				PAGE_KERNEL_RO);

The line break is there because?

Thanks,

	tglx

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH] x86, mm: Reserver some memory for bootmem allocator for NO_BOOTMEM
  2018-09-16 14:43                                         ` Thomas Gleixner
@ 2018-09-16 15:06                                           ` Feng Tang
  2018-09-17  7:01                                             ` Feng Tang
  0 siblings, 1 reply; 29+ messages in thread
From: Feng Tang @ 2018-09-16 15:06 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: Michal Hocko, Peter Zijlstra, LKML, x86, Ingo Molnar,
	H . Peter Anvin, Yinghai Lu, Dave Hansen, Andi Kleen

Hi Thomas,

On Sun, Sep 16, 2018 at 04:43:55PM +0200, Thomas Gleixner wrote:
> On Sun, 16 Sep 2018, Feng Tang wrote:
> > diff --git a/arch/x86/mm/pgtable.c b/arch/x86/mm/pgtable.c
> > index e848a4811785..a927f5f39bee 100644
> > --- a/arch/x86/mm/pgtable.c
> > +++ b/arch/x86/mm/pgtable.c
> > @@ -637,6 +637,16 @@ void __native_set_fixmap(enum fixed_addresses idx, pte_t pte)
> >  {
> >  	unsigned long address = __fix_to_virt(idx);
> >  
> > +#ifdef CONFIG_X86_64
> > +       /*
> > +        * In arch/x86/kernel/head_64.S, the static page table has
> > +	* been setup for fixmap. Add a sanity compiling check
> > +	* whether fixmap space has exceeded the capacity.
> > +        */
> > +       BUILD_BUG_ON((__end_of_permanent_fixed_addresses + 511)/512
> > +                       > FIXMAP_PMD_NUM);
> 
> There exist macros for rounding up and doing this in the header file is
> perfectly fine.

I'll find the sample for the round up.

And for the header file check, I initially put this check directly in
the fixmap.h like:

#if ((__end_of_permanent_fixed_addresses + 511)/512 > FIXMAP_PMD_NUM)
#erro "some warning...."
#endif

But failed to compile, I got the warning
 warning: "__end_of_permanent_fixed_addresses" is not defined

 I guess the header file preprocessing can't handle it.

> 
> > +#endif
> > +
> >  	if (idx >= __end_of_fixed_addresses) {
> >  		BUG();
> >  		return;
> > diff --git a/arch/x86/xen/mmu_pv.c b/arch/x86/xen/mmu_pv.c
> > index 45b700ac5fe7..b0aa2364a3c6 100644
> > --- a/arch/x86/xen/mmu_pv.c
> > +++ b/arch/x86/xen/mmu_pv.c
> > @@ -1953,7 +1953,10 @@ void __init xen_setup_kernel_pagetable(pgd_t *pgd, unsigned long max_pfn)
> >  	set_page_prot(level2_ident_pgt, PAGE_KERNEL_RO);
> >  	set_page_prot(level2_kernel_pgt, PAGE_KERNEL_RO);
> >  	set_page_prot(level2_fixmap_pgt, PAGE_KERNEL_RO);
> > -	set_page_prot(level1_fixmap_pgt, PAGE_KERNEL_RO);
> > +
> > +	for (i = 0; i < FIXMAP_PMD_NUM; i++)
> > +		set_page_prot(level1_fixmap_pgt + i * 512,
> > +				PAGE_KERNEL_RO);
> 
> The line break is there because?
Will fix it. I used a MACRO before changing to 512  and forgot to remove
the line break :) 

Thanks,
Feng

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH] x86, mm: Reserver some memory for bootmem allocator for NO_BOOTMEM
  2018-09-16 15:06                                           ` Feng Tang
@ 2018-09-17  7:01                                             ` Feng Tang
  2018-09-17  7:01                                               ` Thomas Gleixner
  0 siblings, 1 reply; 29+ messages in thread
From: Feng Tang @ 2018-09-17  7:01 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: Michal Hocko, Peter Zijlstra, LKML, x86, Ingo Molnar,
	H . Peter Anvin, Yinghai Lu, Dave Hansen, Andi Kleen

Hi Thomas,

On Sun, Sep 16, 2018 at 11:06:41PM +0800, Feng Tang wrote:
> Hi Thomas,
> 
> On Sun, Sep 16, 2018 at 04:43:55PM +0200, Thomas Gleixner wrote:
> > On Sun, 16 Sep 2018, Feng Tang wrote:
> > > diff --git a/arch/x86/mm/pgtable.c b/arch/x86/mm/pgtable.c
> > > index e848a4811785..a927f5f39bee 100644
> > > --- a/arch/x86/mm/pgtable.c
> > > +++ b/arch/x86/mm/pgtable.c
> > > @@ -637,6 +637,16 @@ void __native_set_fixmap(enum fixed_addresses idx, pte_t pte)
> > >  {
> > >  	unsigned long address = __fix_to_virt(idx);
> > >  
> > > +#ifdef CONFIG_X86_64
> > > +       /*
> > > +        * In arch/x86/kernel/head_64.S, the static page table has
> > > +	* been setup for fixmap. Add a sanity compiling check
> > > +	* whether fixmap space has exceeded the capacity.
> > > +        */
> > > +       BUILD_BUG_ON((__end_of_permanent_fixed_addresses + 511)/512
> > > +                       > FIXMAP_PMD_NUM);
> > 
> > There exist macros for rounding up and doing this in the header file is
> > perfectly fine.
> 
> I'll find the sample for the round up.
> 
> And for the header file check, I initially put this check directly in
> the fixmap.h like:
> 
> #if ((__end_of_permanent_fixed_addresses + 511)/512 > FIXMAP_PMD_NUM)
> #erro "some warning...."
> #endif
> 
> But failed to compile, I got the warning
>  warning: "__end_of_permanent_fixed_addresses" is not defined
> 
>  I guess the header file preprocessing can't handle it.

Below is a patch against 4.19-rc4, verified with earlycon
working fine, please review, thanks!

- Feng

---

From 2710b48eb29a86c8ee568d00d5b04820c3fe1f3a Mon Sep 17 00:00:00 2001
From: Feng Tang <feng.tang@intel.com>
Date: Mon, 17 Sep 2018 12:12:08 +0800
Subject: [PATCH] x86/mm: Expand static page table for fixmap space

We met a kernel panic when enabling earlycon, which is due to
the fixmap address of earlycon is not statically setup.

Currently the static fixmap setup in head_64.S only covers 2M
virtual address space, while it acutually could be in 4M space
with different kernel configurations.

So increase the static space to 4M for now by defining FIXMAP_PMD_NUM
to 2, and add a build time check for future possible overflow so that
the macro could be increased accordingly.

Suggested-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Feng Tang <feng.tang@intel.com>
---
 arch/x86/include/asm/fixmap.h     |  3 +++
 arch/x86/include/asm/pgtable_64.h |  3 ++-
 arch/x86/kernel/head_64.S         | 15 +++++++++++----
 arch/x86/mm/pgtable.c             | 10 ++++++++++
 arch/x86/xen/mmu_pv.c             |  4 +++-
 5 files changed, 29 insertions(+), 6 deletions(-)

diff --git a/arch/x86/include/asm/fixmap.h b/arch/x86/include/asm/fixmap.h
index e203169..ffaa0fa 100644
--- a/arch/x86/include/asm/fixmap.h
+++ b/arch/x86/include/asm/fixmap.h
@@ -14,6 +14,9 @@
 #ifndef _ASM_X86_FIXMAP_H
 #define _ASM_X86_FIXMAP_H
 
+/* Put here to be accessble for assembly code like head_64.S */
+#define FIXMAP_PMD_NUM	2
+
 #ifndef __ASSEMBLY__
 #include <linux/kernel.h>
 #include <asm/acpi.h>
diff --git a/arch/x86/include/asm/pgtable_64.h b/arch/x86/include/asm/pgtable_64.h
index ce2b590..9c85b54 100644
--- a/arch/x86/include/asm/pgtable_64.h
+++ b/arch/x86/include/asm/pgtable_64.h
@@ -14,6 +14,7 @@
 #include <asm/processor.h>
 #include <linux/bitops.h>
 #include <linux/threads.h>
+#include <asm/fixmap.h>
 
 extern p4d_t level4_kernel_pgt[512];
 extern p4d_t level4_ident_pgt[512];
@@ -22,7 +23,7 @@ extern pud_t level3_ident_pgt[512];
 extern pmd_t level2_kernel_pgt[512];
 extern pmd_t level2_fixmap_pgt[512];
 extern pmd_t level2_ident_pgt[512];
-extern pte_t level1_fixmap_pgt[512];
+extern pte_t level1_fixmap_pgt[512 * FIXMAP_PMD_NUM];
 extern pgd_t init_top_pgt[];
 
 #define swapper_pg_dir init_top_pgt
diff --git a/arch/x86/kernel/head_64.S b/arch/x86/kernel/head_64.S
index 15ebc2f..985eaff 100644
--- a/arch/x86/kernel/head_64.S
+++ b/arch/x86/kernel/head_64.S
@@ -24,6 +24,7 @@
 #include "../entry/calling.h"
 #include <asm/export.h>
 #include <asm/nospec-branch.h>
+#include <asm/fixmap.h>
 
 #ifdef CONFIG_PARAVIRT
 #include <asm/asm-offsets.h>
@@ -445,13 +446,19 @@ NEXT_PAGE(level2_kernel_pgt)
 		KERNEL_IMAGE_SIZE/PMD_SIZE)
 
 NEXT_PAGE(level2_fixmap_pgt)
-	.fill	506,8,0
-	.quad	level1_fixmap_pgt - __START_KERNEL_map + _PAGE_TABLE_NOENC
-	/* 8MB reserved for vsyscalls + a 2MB hole = 4 + 1 entries */
-	.fill	5,8,0
+	.fill	(512 - 4 - FIXMAP_PMD_NUM),8,0
+	pgtno = 0
+	.rept (FIXMAP_PMD_NUM)
+	.quad level1_fixmap_pgt + (pgtno << PAGE_SHIFT) - __START_KERNEL_map  + _PAGE_TABLE_NOENC
+	pgtno = pgtno + 1
+	.endr
+	/* 6 MB reserved space + a 2MB hole */
+	.fill	4,8,0
 
 NEXT_PAGE(level1_fixmap_pgt)
+	.rept (FIXMAP_PMD_NUM)
 	.fill	512,8,0
+	.endr
 
 #undef PMDS
 
diff --git a/arch/x86/mm/pgtable.c b/arch/x86/mm/pgtable.c
index ae39455..c9d94d0 100644
--- a/arch/x86/mm/pgtable.c
+++ b/arch/x86/mm/pgtable.c
@@ -637,6 +637,16 @@ void __native_set_fixmap(enum fixed_addresses idx, pte_t pte)
 {
 	unsigned long address = __fix_to_virt(idx);
 
+#ifdef CONFIG_X86_64
+       /*
+        * In arch/x86/kernel/head_64.S, the static page table has
+	* been setup for fixmap. Add a sanity compiling check
+	* whether fixmap space has exceeded the capacity.
+        */
+	BUILD_BUG_ON(__end_of_permanent_fixed_addresses
+			> (FIXMAP_PMD_NUM * PTRS_PER_PMD));
+#endif
+
 	if (idx >= __end_of_fixed_addresses) {
 		BUG();
 		return;
diff --git a/arch/x86/xen/mmu_pv.c b/arch/x86/xen/mmu_pv.c
index 2fe5c9b..19077f9 100644
--- a/arch/x86/xen/mmu_pv.c
+++ b/arch/x86/xen/mmu_pv.c
@@ -1952,7 +1952,9 @@ void __init xen_setup_kernel_pagetable(pgd_t *pgd, unsigned long max_pfn)
 	set_page_prot(level2_ident_pgt, PAGE_KERNEL_RO);
 	set_page_prot(level2_kernel_pgt, PAGE_KERNEL_RO);
 	set_page_prot(level2_fixmap_pgt, PAGE_KERNEL_RO);
-	set_page_prot(level1_fixmap_pgt, PAGE_KERNEL_RO);
+
+	for (i = 0; i < FIXMAP_PMD_NUM; i++)
+		set_page_prot(level1_fixmap_pgt + i * 512, PAGE_KERNEL_RO);
 
 	/* Pin down new L4 */
 	pin_pagetable_pfn(MMUEXT_PIN_L4_TABLE,
-- 
2.7.4


^ permalink raw reply related	[flat|nested] 29+ messages in thread

* Re: [PATCH] x86, mm: Reserver some memory for bootmem allocator for NO_BOOTMEM
  2018-09-17  7:01                                             ` Feng Tang
@ 2018-09-17  7:01                                               ` Thomas Gleixner
  0 siblings, 0 replies; 29+ messages in thread
From: Thomas Gleixner @ 2018-09-17  7:01 UTC (permalink / raw)
  To: Feng Tang
  Cc: Michal Hocko, Peter Zijlstra, LKML, x86, Ingo Molnar,
	H . Peter Anvin, Yinghai Lu, Dave Hansen, Andi Kleen

On Mon, 17 Sep 2018, Feng Tang wrote:
> 
> Below is a patch against 4.19-rc4, verified with earlycon
> working fine, please review, thanks!

Aside of the whitespace damage this looks about right. Care to send a
proper patch?

Thanks,

	tglx

^ permalink raw reply	[flat|nested] 29+ messages in thread

end of thread, other threads:[~2018-09-17  7:01 UTC | newest]

Thread overview: 29+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2018-08-30  9:03 [PATCH] x86, mm: Reserver some memory for bootmem allocator for NO_BOOTMEM Feng Tang
2018-08-30 10:44 ` Peter Zijlstra
2018-08-30 11:12   ` Michal Hocko
2018-08-30 11:54     ` Thomas Gleixner
2018-08-30 12:49       ` Michal Hocko
2018-08-30 12:59         ` Feng Tang
2018-08-30 13:05           ` Thomas Gleixner
2018-08-30 13:19             ` Feng Tang
2018-08-30 13:25               ` Thomas Gleixner
2018-08-30 13:55                 ` Feng Tang
2018-08-31  6:15                 ` Feng Tang
2018-08-31 11:33                   ` Thomas Gleixner
2018-08-31 13:36                     ` Feng Tang
2018-09-07  8:17                       ` Feng Tang
2018-09-07 10:52                         ` Thomas Gleixner
2018-09-10  9:39                           ` Feng Tang
2018-09-10  9:53                             ` Thomas Gleixner
2018-09-11  6:14                               ` Feng Tang
2018-09-15 10:29                                 ` Thomas Gleixner
2018-09-15 16:47                                   ` Feng Tang
2018-09-15 17:28                                     ` Thomas Gleixner
2018-09-16 14:35                                       ` Feng Tang
2018-09-16 14:43                                         ` Thomas Gleixner
2018-09-16 15:06                                           ` Feng Tang
2018-09-17  7:01                                             ` Feng Tang
2018-09-17  7:01                                               ` Thomas Gleixner
2018-08-30 12:09     ` Peter Zijlstra
2018-08-30 12:39       ` Michal Hocko
2018-08-30 12:43   ` Feng Tang

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).