* [patch V2 00/29] x86: Add guard pages to exception and interrupt stacks
@ 2019-04-05 15:06 Thomas Gleixner
  2019-04-05 15:06 ` [patch V2 01/29] x86/irq/64: Limit IST stack overflow check to #DB stack Thomas Gleixner
                   ` (28 more replies)
  0 siblings, 29 replies; 60+ messages in thread
From: Thomas Gleixner @ 2019-04-05 15:06 UTC (permalink / raw)
  To: LKML; +Cc: x86, Andy Lutomirski, Josh Poimboeuf, Sean Christopherson

Hi!

This is an updated version of the initial patch set which just covered the
exception (IST) stacks.

   https://lkml.kernel.org/r/20190331214020.836098943@linutronix.de

Aside from addressing the review comments, the main change in V2 is that I
picked up the WIP series from Andy which adds guard pages to the interrupt
stack. With that _all_ stacks used on x86/64 have guard pages.

Changes vs. V1:

  - Correct dumpstack off-by-one errors (Andy)

  - Split the debug stack into separate mappings with guard pages

  - Simplified the macro maze in the dumpstack speedup patch. Hopefully
    Josh likes that version better; otherwise I just sulk and reduce it to
    the quick range check.

  - Prepare for interrupt stack guards (Andy, myself)

  - Prevent crashing in the 32-bit CPU bringup code when page allocation
    fails.

  - Add guard pages to the interrupt stack (Andy, todos addressed by me)
  
  - Intermediate cleanup of the stack debugging code in irq_64 to
    address the backwards top/bottom variable names which caused me
    to get it wrong more than once.

    Kinda pointless exercise because the code is removed in the last patch,
    as all stacks now have guard pages which catch overflows reliably
    instead of heuristics which just add overhead to the interrupt fast
    path.

  - Addressed review comments

The lot is also available from tip:

   git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip.git WIP.x86/stackguards

	2bf08cce47f7 ("x86/irq/64: Remove stack overflow debug code")

Thanks,

	tglx

8<-------------
 Documentation/x86/kernel-stacks       |   13 +++-
 arch/x86/Kconfig                      |    2 
 arch/x86/entry/entry_64.S             |   16 ++---
 arch/x86/include/asm/cpu_entry_area.h |   73 +++++++++++++++++++++++--
 arch/x86/include/asm/debugreg.h       |    2 
 arch/x86/include/asm/irq.h            |    6 --
 arch/x86/include/asm/page_32_types.h  |    8 +-
 arch/x86/include/asm/page_64_types.h  |   15 ++---
 arch/x86/include/asm/processor.h      |   46 ++++++---------
 arch/x86/include/asm/smp.h            |    2 
 arch/x86/include/asm/stackprotector.h |    6 +-
 arch/x86/include/asm/stacktrace.h     |    2 
 arch/x86/kernel/asm-offsets_64.c      |    4 +
 arch/x86/kernel/cpu/common.c          |   60 +++-----------------
 arch/x86/kernel/dumpstack_32.c        |    8 +-
 arch/x86/kernel/dumpstack_64.c        |   99 +++++++++++++++++++++++-----------
 arch/x86/kernel/head_64.S             |    2 
 arch/x86/kernel/idt.c                 |   19 +++---
 arch/x86/kernel/irq_32.c              |   41 +++++++-------
 arch/x86/kernel/irq_64.c              |   88 +++++++++++++++---------------
 arch/x86/kernel/irqinit.c             |    4 -
 arch/x86/kernel/nmi.c                 |   20 ++++++
 arch/x86/kernel/setup_percpu.c        |    5 -
 arch/x86/kernel/smpboot.c             |   15 ++++-
 arch/x86/kernel/vmlinux.lds.S         |    7 +-
 arch/x86/mm/cpu_entry_area.c          |   64 +++++++++++++++------
 arch/x86/mm/fault.c                   |    3 -
 arch/x86/tools/relocs.c               |    2 
 arch/x86/xen/smp_pv.c                 |    4 +
 arch/x86/xen/xen-head.S               |   10 +--
 drivers/xen/events/events_base.c      |    1 
 31 files changed, 375 insertions(+), 272 deletions(-)





* [patch V2 01/29] x86/irq/64: Limit IST stack overflow check to #DB stack
  2019-04-05 15:06 [patch V2 00/29] x86: Add guard pages to exception and interrupt stacks Thomas Gleixner
@ 2019-04-05 15:06 ` Thomas Gleixner
  2019-04-05 15:07 ` [patch V2 02/29] x86/dumpstack: Fix off-by-one errors in stack identification Thomas Gleixner
                   ` (27 subsequent siblings)
  28 siblings, 0 replies; 60+ messages in thread
From: Thomas Gleixner @ 2019-04-05 15:06 UTC (permalink / raw)
  To: LKML
  Cc: x86, Andy Lutomirski, Josh Poimboeuf, Sean Christopherson,
	Mitsuo Hayasaka

commit 37fe6a42b343 ("x86: Check stack overflow in detail") added a broad
check which considers the full exception stack area as valid.

That's wrong in two aspects:

 1) It does not check the individual areas one by one

 2) #DF, NMI and #MCE do not enable interrupts, which means that a
    regular device interrupt cannot happen in their context. In fact, if a
    device interrupt hits one of those IST stacks, that's a bug because some
    code path enabled interrupts while handling the exception.

Limit the check to the #DB stack and consider all other IST stacks as
'overflow' or invalid.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Mitsuo Hayasaka <mitsuo.hayasaka.hu@hitachi.com>
---
 arch/x86/kernel/irq_64.c |   19 ++++++++++++++-----
 1 file changed, 14 insertions(+), 5 deletions(-)

--- a/arch/x86/kernel/irq_64.c
+++ b/arch/x86/kernel/irq_64.c
@@ -26,9 +26,18 @@ int sysctl_panic_on_stackoverflow;
 /*
  * Probabilistic stack overflow check:
  *
- * Only check the stack in process context, because everything else
- * runs on the big interrupt stacks. Checking reliably is too expensive,
- * so we just check from interrupts.
+ * Regular device interrupts can enter on the following stacks:
+ *
+ * - User stack
+ *
+ * - Kernel task stack
+ *
+ * - Interrupt stack if a device driver reenables interrupts
+ *   which should only happen in really old drivers.
+ *
+ * - Debug IST stack
+ *
+ * All other contexts are invalid.
  */
 static inline void stack_overflow_check(struct pt_regs *regs)
 {
@@ -53,8 +62,8 @@ static inline void stack_overflow_check(
 		return;
 
 	oist = this_cpu_ptr(&orig_ist);
-	estack_top = (u64)oist->ist[0] - EXCEPTION_STKSZ + STACK_TOP_MARGIN;
-	estack_bottom = (u64)oist->ist[N_EXCEPTION_STACKS - 1];
+	estack_bottom = (u64)oist->ist[DEBUG_STACK];
+	estack_top = estack_bottom - DEBUG_STKSZ + STACK_TOP_MARGIN;
 	if (regs->sp >= estack_top && regs->sp <= estack_bottom)
 		return;
 




* [patch V2 02/29] x86/dumpstack: Fix off-by-one errors in stack identification
  2019-04-05 15:06 [patch V2 00/29] x86: Add guard pages to exception and interrupt stacks Thomas Gleixner
  2019-04-05 15:06 ` [patch V2 01/29] x86/irq/64: Limit IST stack overflow check to #DB stack Thomas Gleixner
@ 2019-04-05 15:07 ` Thomas Gleixner
  2019-04-05 15:44   ` Sean Christopherson
  2019-04-05 15:07 ` [patch V2 03/29] x86/irq/64: Remove a hardcoded irq_stack_union access Thomas Gleixner
                   ` (26 subsequent siblings)
  28 siblings, 1 reply; 60+ messages in thread
From: Thomas Gleixner @ 2019-04-05 15:07 UTC (permalink / raw)
  To: LKML; +Cc: x86, Andy Lutomirski, Josh Poimboeuf, Sean Christopherson

From: Andy Lutomirski <luto@kernel.org>

The get_stack_info() function is off-by-one when checking whether an
address is on an IRQ stack or an IST stack.  This prevents an overflowed
IRQ or IST stack from being dumped properly.

[ tglx: Do the same for 32-bit ]
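
For illustration only (not part of the patch), a minimal sketch of the
boundary case for the software stacks: a stack spans [begin, end] and grows
down, so a completely full (or just overflowed) stack leaves the stack
pointer at 'begin'. The old '<=' check rejected exactly that case:

	static bool in_stack_old(unsigned long *stack, unsigned long *begin,
				 unsigned long *end)
	{
		/* misses the stack == begin case */
		return !(stack <= begin || stack > end);
	}

	static bool in_stack_fixed(unsigned long *stack, unsigned long *begin,
				   unsigned long *end)
	{
		/* stack == begin is now identified as being on this stack */
		return !(stack < begin || stack > end);
	}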

Signed-off-by: Andy Lutomirski <luto@kernel.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>

---
 arch/x86/kernel/dumpstack_32.c |    4 ++--
 arch/x86/kernel/dumpstack_64.c |    4 ++--
 2 files changed, 4 insertions(+), 4 deletions(-)

--- a/arch/x86/kernel/dumpstack_32.c
+++ b/arch/x86/kernel/dumpstack_32.c
@@ -41,7 +41,7 @@ static bool in_hardirq_stack(unsigned lo
 	 * This is a software stack, so 'end' can be a valid stack pointer.
 	 * It just means the stack is empty.
 	 */
-	if (stack <= begin || stack > end)
+	if (stack < begin || stack > end)
 		return false;
 
 	info->type	= STACK_TYPE_IRQ;
@@ -66,7 +66,7 @@ static bool in_softirq_stack(unsigned lo
 	 * This is a software stack, so 'end' can be a valid stack pointer.
 	 * It just means the stack is empty.
 	 */
-	if (stack <= begin || stack > end)
+	if (stack < begin || stack > end)
 		return false;
 
 	info->type	= STACK_TYPE_SOFTIRQ;
--- a/arch/x86/kernel/dumpstack_64.c
+++ b/arch/x86/kernel/dumpstack_64.c
@@ -65,7 +65,7 @@ static bool in_exception_stack(unsigned
 		begin = end - (exception_stack_sizes[k] / sizeof(long));
 		regs  = (struct pt_regs *)end - 1;
 
-		if (stack <= begin || stack >= end)
+		if (stack < begin || stack >= end)
 			continue;
 
 		info->type	= STACK_TYPE_EXCEPTION + k;
@@ -88,7 +88,7 @@ static bool in_irq_stack(unsigned long *
 	 * This is a software stack, so 'end' can be a valid stack pointer.
 	 * It just means the stack is empty.
 	 */
-	if (stack <= begin || stack > end)
+	if (stack < begin || stack >= end)
 		return false;
 
 	info->type	= STACK_TYPE_IRQ;




* [patch V2 03/29] x86/irq/64: Remove a hardcoded irq_stack_union access
  2019-04-05 15:06 [patch V2 00/29] x86: Add guard pages to exception and interrupt stacks Thomas Gleixner
  2019-04-05 15:06 ` [patch V2 01/29] x86/irq/64: Limit IST stack overflow check to #DB stack Thomas Gleixner
  2019-04-05 15:07 ` [patch V2 02/29] x86/dumpstack: Fix off-by-one errors in stack identification Thomas Gleixner
@ 2019-04-05 15:07 ` Thomas Gleixner
  2019-04-05 16:37   ` Sean Christopherson
  2019-04-05 15:07 ` [patch V2 04/29] x86/irq/64: Sanitize the top/bottom confusion Thomas Gleixner
                   ` (25 subsequent siblings)
  28 siblings, 1 reply; 60+ messages in thread
From: Thomas Gleixner @ 2019-04-05 15:07 UTC (permalink / raw)
  To: LKML; +Cc: x86, Andy Lutomirski, Josh Poimboeuf, Sean Christopherson

From: Andy Lutomirski <luto@kernel.org>

stack_overflow_check() is using both irq_stack_ptr and irq_stack_union to
find the IRQ stack. That's going to break when vmapped irq stacks are
introduced.

Change it to just use irq_stack_ptr.

Signed-off-by: Andy Lutomirski <luto@kernel.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>

---
 arch/x86/kernel/irq_64.c |    3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

--- a/arch/x86/kernel/irq_64.c
+++ b/arch/x86/kernel/irq_64.c
@@ -55,9 +55,8 @@ static inline void stack_overflow_check(
 	    regs->sp <= curbase + THREAD_SIZE)
 		return;
 
-	irq_stack_top = (u64)this_cpu_ptr(irq_stack_union.irq_stack) +
-			STACK_TOP_MARGIN;
 	irq_stack_bottom = (u64)__this_cpu_read(irq_stack_ptr);
+	irq_stack_top = irq_stack_bottom - IRQ_STACK_SIZE + STACK_TOP_MARGIN;
 	if (regs->sp >= irq_stack_top && regs->sp <= irq_stack_bottom)
 		return;
 




* [patch V2 04/29] x86/irq/64: Sanitize the top/bottom confusion
  2019-04-05 15:06 [patch V2 00/29] x86: Add guard pages to exception and interrupt stacks Thomas Gleixner
                   ` (2 preceding siblings ...)
  2019-04-05 15:07 ` [patch V2 03/29] x86/irq/64: Remove a hardcoded irq_stack_union access Thomas Gleixner
@ 2019-04-05 15:07 ` Thomas Gleixner
  2019-04-05 16:54   ` Sean Christopherson
  2019-04-05 15:07 ` [patch V2 05/29] x86/idt: Remove unused macro SISTG Thomas Gleixner
                   ` (24 subsequent siblings)
  28 siblings, 1 reply; 60+ messages in thread
From: Thomas Gleixner @ 2019-04-05 15:07 UTC (permalink / raw)
  To: LKML; +Cc: x86, Andy Lutomirski, Josh Poimboeuf, Sean Christopherson

On x86, stacks grow from top to bottom, but the stack overflow check uses
them the other way round, which is just confusing. Clean it up and sanitize
the warning string a bit.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
---
 arch/x86/kernel/irq_64.c |   22 +++++++++++-----------
 1 file changed, 11 insertions(+), 11 deletions(-)

--- a/arch/x86/kernel/irq_64.c
+++ b/arch/x86/kernel/irq_64.c
@@ -42,7 +42,7 @@ int sysctl_panic_on_stackoverflow;
 static inline void stack_overflow_check(struct pt_regs *regs)
 {
 #ifdef CONFIG_DEBUG_STACKOVERFLOW
-#define STACK_TOP_MARGIN	128
+#define STACK_MARGIN	128
 	struct orig_ist *oist;
 	u64 irq_stack_top, irq_stack_bottom;
 	u64 estack_top, estack_bottom;
@@ -51,25 +51,25 @@ static inline void stack_overflow_check(
 	if (user_mode(regs))
 		return;
 
-	if (regs->sp >= curbase + sizeof(struct pt_regs) + STACK_TOP_MARGIN &&
+	if (regs->sp >= curbase + sizeof(struct pt_regs) + STACK_MARGIN &&
 	    regs->sp <= curbase + THREAD_SIZE)
 		return;
 
-	irq_stack_bottom = (u64)__this_cpu_read(irq_stack_ptr);
-	irq_stack_top = irq_stack_bottom - IRQ_STACK_SIZE + STACK_TOP_MARGIN;
-	if (regs->sp >= irq_stack_top && regs->sp <= irq_stack_bottom)
+	irq_stack_top = (u64)__this_cpu_read(irq_stack_ptr);
+	irq_stack_bottom = irq_stack_top - IRQ_STACK_SIZE + STACK_MARGIN;
+	if (regs->sp >= irq_stack_bottom && regs->sp <= irq_stack_top)
 		return;
 
 	oist = this_cpu_ptr(&orig_ist);
-	estack_bottom = (u64)oist->ist[DEBUG_STACK];
-	estack_top = estack_bottom - DEBUG_STKSZ + STACK_TOP_MARGIN;
-	if (regs->sp >= estack_top && regs->sp <= estack_bottom)
+	estack_top = (u64)oist->ist[DEBUG_STACK];
+	estack_bottom = estack_top - DEBUG_STKSZ + STACK_MARGIN;
+	if (regs->sp >= estack_bottom && regs->sp <= estack_top)
 		return;
 
-	WARN_ONCE(1, "do_IRQ(): %s has overflown the kernel stack (cur:%Lx,sp:%lx,irq stk top-bottom:%Lx-%Lx,exception stk top-bottom:%Lx-%Lx,ip:%pF)\n",
+	WARN_ONCE(1, "do_IRQ(): %s has overflown the kernel stack (cur:%Lx,sp:%lx, irq stack:%Lx-%Lx, exception stack: %Lx-%Lx, ip:%pF)\n",
 		current->comm, curbase, regs->sp,
-		irq_stack_top, irq_stack_bottom,
-		estack_top, estack_bottom, (void *)regs->ip);
+		irq_stack_bottom, irq_stack_top,
+		estack_bottom, estack_top, (void *)regs->ip);
 
 	if (sysctl_panic_on_stackoverflow)
 		panic("low stack detected by irq handler - check messages\n");




* [patch V2 05/29] x86/idt: Remove unused macro SISTG
  2019-04-05 15:06 [patch V2 00/29] x86: Add guard pages to exception and interrupt stacks Thomas Gleixner
                   ` (3 preceding siblings ...)
  2019-04-05 15:07 ` [patch V2 04/29] x86/irq/64: Sanitize the top/bottom confusion Thomas Gleixner
@ 2019-04-05 15:07 ` Thomas Gleixner
  2019-04-05 15:07 ` [patch V2 06/29] x86/exceptions: Remove unused stack defines on 32bit Thomas Gleixner
                   ` (23 subsequent siblings)
  28 siblings, 0 replies; 60+ messages in thread
From: Thomas Gleixner @ 2019-04-05 15:07 UTC (permalink / raw)
  To: LKML; +Cc: x86, Andy Lutomirski, Josh Poimboeuf, Sean Christopherson

commit d8ba61ba58c8 ("x86/entry/64: Don't use IST entry for #BP stack")
removed the last user but left the macro around.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Andy Lutomirski <luto@kernel.org>
---
 arch/x86/kernel/idt.c |    4 ----
 1 file changed, 4 deletions(-)

--- a/arch/x86/kernel/idt.c
+++ b/arch/x86/kernel/idt.c
@@ -45,10 +45,6 @@ struct idt_data {
 #define ISTG(_vector, _addr, _ist)			\
 	G(_vector, _addr, _ist, GATE_INTERRUPT, DPL0, __KERNEL_CS)
 
-/* System interrupt gate with interrupt stack */
-#define SISTG(_vector, _addr, _ist)			\
-	G(_vector, _addr, _ist, GATE_INTERRUPT, DPL3, __KERNEL_CS)
-
 /* Task gate */
 #define TSKG(_vector, _gdt)				\
 	G(_vector, NULL, DEFAULT_STACK, GATE_TASK, DPL0, _gdt << 3)




* [patch V2 06/29] x86/exceptions: Remove unused stack defines on 32bit
  2019-04-05 15:06 [patch V2 00/29] x86: Add guard pages to exception and interrupt stacks Thomas Gleixner
                   ` (4 preceding siblings ...)
  2019-04-05 15:07 ` [patch V2 05/29] x86/idt: Remove unused macro SISTG Thomas Gleixner
@ 2019-04-05 15:07 ` Thomas Gleixner
  2019-04-05 15:07 ` [patch V2 07/29] x86/exceptions: Make IST index zero based Thomas Gleixner
                   ` (22 subsequent siblings)
  28 siblings, 0 replies; 60+ messages in thread
From: Thomas Gleixner @ 2019-04-05 15:07 UTC (permalink / raw)
  To: LKML; +Cc: x86, Andy Lutomirski, Josh Poimboeuf, Sean Christopherson

Nothing requires those defines for 32-bit builds.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
---
 arch/x86/include/asm/page_32_types.h |    6 +-----
 1 file changed, 1 insertion(+), 5 deletions(-)

--- a/arch/x86/include/asm/page_32_types.h
+++ b/arch/x86/include/asm/page_32_types.h
@@ -22,11 +22,7 @@
 #define THREAD_SIZE_ORDER	1
 #define THREAD_SIZE		(PAGE_SIZE << THREAD_SIZE_ORDER)
 
-#define DOUBLEFAULT_STACK 1
-#define NMI_STACK 0
-#define DEBUG_STACK 0
-#define MCE_STACK 0
-#define N_EXCEPTION_STACKS 1
+#define N_EXCEPTION_STACKS	1
 
 #ifdef CONFIG_X86_PAE
 /*




* [patch V2 07/29] x86/exceptions: Make IST index zero based
  2019-04-05 15:06 [patch V2 00/29] x86: Add guard pages to exception and interrupt stacks Thomas Gleixner
                   ` (5 preceding siblings ...)
  2019-04-05 15:07 ` [patch V2 06/29] x86/exceptions: Remove unused stack defines on 32bit Thomas Gleixner
@ 2019-04-05 15:07 ` Thomas Gleixner
  2019-04-05 18:59   ` Sean Christopherson
  2019-04-05 15:07 ` [patch V2 08/29] x86/cpu_entry_area: Cleanup setup functions Thomas Gleixner
                   ` (21 subsequent siblings)
  28 siblings, 1 reply; 60+ messages in thread
From: Thomas Gleixner @ 2019-04-05 15:07 UTC (permalink / raw)
  To: LKML; +Cc: x86, Andy Lutomirski, Josh Poimboeuf, Sean Christopherson

The defines for the exception stack (IST) array in the TSS are using the
SDM convention IST1 - IST7. That causes all sorts of code to subtract 1 for
array indices related to IST. That's confusing at best and does not provide
any value.

Make the indices zero-based and fix up the usage sites. The only code which
needs to adjust the zero-based index is the interrupt descriptor setup,
which needs to add 1 now.
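
As an illustration (not part of the patch): with the zero-based convention
the #DB entry occupies tss.ist[] slot 2, while the IST field in the IDT
gate descriptor must be 3, because an IST value of 0 means "no stack
switch". 'stack_top' below is just a placeholder:

	t->x86_tss.ist[ISTACK_DB] = stack_top;	/* array slot 2 */
	ISTG(X86_TRAP_DB, debug, ISTACK_DB);	/* descriptor IST field 3,
						   ISTG() adds the 1 */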

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
---
 Documentation/x86/kernel-stacks      |    8 ++++----
 arch/x86/entry/entry_64.S            |    4 ++--
 arch/x86/include/asm/page_64_types.h |   13 ++++++++-----
 arch/x86/kernel/cpu/common.c         |    4 ++--
 arch/x86/kernel/dumpstack_64.c       |   14 +++++++-------
 arch/x86/kernel/idt.c                |   15 +++++++++------
 arch/x86/kernel/irq_64.c             |    2 +-
 arch/x86/mm/fault.c                  |    2 +-
 8 files changed, 34 insertions(+), 28 deletions(-)

--- a/Documentation/x86/kernel-stacks
+++ b/Documentation/x86/kernel-stacks
@@ -59,7 +59,7 @@ If that assumption is ever broken then t
 
 The currently assigned IST stacks are :-
 
-* DOUBLEFAULT_STACK.  EXCEPTION_STKSZ (PAGE_SIZE).
+* ISTACK_DF.  EXCEPTION_STKSZ (PAGE_SIZE).
 
   Used for interrupt 8 - Double Fault Exception (#DF).
 
@@ -68,7 +68,7 @@ The currently assigned IST stacks are :-
   Using a separate stack allows the kernel to recover from it well enough
   in many cases to still output an oops.
 
-* NMI_STACK.  EXCEPTION_STKSZ (PAGE_SIZE).
+* ISTACK_NMI.  EXCEPTION_STKSZ (PAGE_SIZE).
 
   Used for non-maskable interrupts (NMI).
 
@@ -76,7 +76,7 @@ The currently assigned IST stacks are :-
   middle of switching stacks.  Using IST for NMI events avoids making
   assumptions about the previous state of the kernel stack.
 
-* DEBUG_STACK.  DEBUG_STKSZ
+* ISTACK_DB.  DEBUG_STKSZ
 
   Used for hardware debug interrupts (interrupt 1) and for software
   debug interrupts (INT3).
@@ -86,7 +86,7 @@ The currently assigned IST stacks are :-
   avoids making assumptions about the previous state of the kernel
   stack.
 
-* MCE_STACK.  EXCEPTION_STKSZ (PAGE_SIZE).
+* ISTACK_MCE.  EXCEPTION_STKSZ (PAGE_SIZE).
 
   Used for interrupt 18 - Machine Check Exception (#MC).
 
--- a/arch/x86/entry/entry_64.S
+++ b/arch/x86/entry/entry_64.S
@@ -841,7 +841,7 @@ apicinterrupt IRQ_WORK_VECTOR			irq_work
 /*
  * Exception entry points.
  */
-#define CPU_TSS_IST(x) PER_CPU_VAR(cpu_tss_rw) + (TSS_ist + ((x) - 1) * 8)
+#define CPU_TSS_IST(x) PER_CPU_VAR(cpu_tss_rw) + (TSS_ist + (x) * 8)
 
 /**
  * idtentry - Generate an IDT entry stub
@@ -1129,7 +1129,7 @@ apicinterrupt3 HYPERV_STIMER0_VECTOR \
 	hv_stimer0_callback_vector hv_stimer0_vector_handler
 #endif /* CONFIG_HYPERV */
 
-idtentry debug			do_debug		has_error_code=0	paranoid=1 shift_ist=DEBUG_STACK
+idtentry debug			do_debug		has_error_code=0	paranoid=1 shift_ist=ISTACK_DB
 idtentry int3			do_int3			has_error_code=0
 idtentry stack_segment		do_stack_segment	has_error_code=1
 
--- a/arch/x86/include/asm/page_64_types.h
+++ b/arch/x86/include/asm/page_64_types.h
@@ -25,11 +25,14 @@
 #define IRQ_STACK_ORDER (2 + KASAN_STACK_ORDER)
 #define IRQ_STACK_SIZE (PAGE_SIZE << IRQ_STACK_ORDER)
 
-#define DOUBLEFAULT_STACK 1
-#define NMI_STACK 2
-#define DEBUG_STACK 3
-#define MCE_STACK 4
-#define N_EXCEPTION_STACKS 4  /* hw limit: 7 */
+/*
+ * The index for the tss.ist[] array. The hardware limit is 7 entries.
+ */
+#define	ISTACK_DF		0
+#define	ISTACK_NMI		1
+#define	ISTACK_DB		2
+#define	ISTACK_MCE		3
+#define	N_EXCEPTION_STACKS	4
 
 /*
  * Set __PAGE_OFFSET to the most negative possible address +
--- a/arch/x86/kernel/cpu/common.c
+++ b/arch/x86/kernel/cpu/common.c
@@ -516,7 +516,7 @@ DEFINE_PER_CPU(struct cpu_entry_area *,
  */
 static const unsigned int exception_stack_sizes[N_EXCEPTION_STACKS] = {
 	  [0 ... N_EXCEPTION_STACKS - 1]	= EXCEPTION_STKSZ,
-	  [DEBUG_STACK - 1]			= DEBUG_STKSZ
+	  [ISTACK_DB]				= DEBUG_STKSZ
 };
 #endif
 
@@ -1760,7 +1760,7 @@ void cpu_init(void)
 			estacks += exception_stack_sizes[v];
 			oist->ist[v] = t->x86_tss.ist[v] =
 					(unsigned long)estacks;
-			if (v == DEBUG_STACK-1)
+			if (v == ISTACK_DB)
 				per_cpu(debug_stack_addr, cpu) = (unsigned long)estacks;
 		}
 	}
--- a/arch/x86/kernel/dumpstack_64.c
+++ b/arch/x86/kernel/dumpstack_64.c
@@ -18,16 +18,16 @@
 
 #include <asm/stacktrace.h>
 
-static char *exception_stack_names[N_EXCEPTION_STACKS] = {
-		[ DOUBLEFAULT_STACK-1	]	= "#DF",
-		[ NMI_STACK-1		]	= "NMI",
-		[ DEBUG_STACK-1		]	= "#DB",
-		[ MCE_STACK-1		]	= "#MC",
+static const char *exception_stack_names[N_EXCEPTION_STACKS] = {
+		[ ISTACK_DF	]	= "#DF",
+		[ ISTACK_NMI	]	= "NMI",
+		[ ISTACK_DB	]	= "#DB",
+		[ ISTACK_MCE	]	= "#MC",
 };
 
-static unsigned long exception_stack_sizes[N_EXCEPTION_STACKS] = {
+static const unsigned long exception_stack_sizes[N_EXCEPTION_STACKS] = {
 	[0 ... N_EXCEPTION_STACKS - 1]		= EXCEPTION_STKSZ,
-	[DEBUG_STACK - 1]			= DEBUG_STKSZ
+	[ISTACK_DB]				= DEBUG_STKSZ
 };
 
 const char *stack_type_name(enum stack_type type)
--- a/arch/x86/kernel/idt.c
+++ b/arch/x86/kernel/idt.c
@@ -41,9 +41,12 @@ struct idt_data {
 #define SYSG(_vector, _addr)				\
 	G(_vector, _addr, DEFAULT_STACK, GATE_INTERRUPT, DPL3, __KERNEL_CS)
 
-/* Interrupt gate with interrupt stack */
+/*
+ * Interrupt gate with interrupt stack. The _ist index is the index in
+ * the tss.ist[] array, but for the descriptor it needs to start at 1.
+ */
 #define ISTG(_vector, _addr, _ist)			\
-	G(_vector, _addr, _ist, GATE_INTERRUPT, DPL0, __KERNEL_CS)
+	G(_vector, _addr, _ist + 1, GATE_INTERRUPT, DPL0, __KERNEL_CS)
 
 /* Task gate */
 #define TSKG(_vector, _gdt)				\
@@ -180,11 +183,11 @@ gate_desc debug_idt_table[IDT_ENTRIES] _
  * cpu_init() when the TSS has been initialized.
  */
 static const __initconst struct idt_data ist_idts[] = {
-	ISTG(X86_TRAP_DB,	debug,		DEBUG_STACK),
-	ISTG(X86_TRAP_NMI,	nmi,		NMI_STACK),
-	ISTG(X86_TRAP_DF,	double_fault,	DOUBLEFAULT_STACK),
+	ISTG(X86_TRAP_DB,	debug,		ISTACK_DB),
+	ISTG(X86_TRAP_NMI,	nmi,		ISTACK_NMI),
+	ISTG(X86_TRAP_DF,	double_fault,	ISTACK_DF),
 #ifdef CONFIG_X86_MCE
-	ISTG(X86_TRAP_MC,	&machine_check,	MCE_STACK),
+	ISTG(X86_TRAP_MC,	&machine_check,	ISTACK_MCE),
 #endif
 };
 
--- a/arch/x86/kernel/irq_64.c
+++ b/arch/x86/kernel/irq_64.c
@@ -61,7 +61,7 @@ static inline void stack_overflow_check(
 		return;
 
 	oist = this_cpu_ptr(&orig_ist);
-	estack_top = (u64)oist->ist[DEBUG_STACK];
+	estack_top = (u64)oist->ist[ISTACK_DB];
 	estack_bottom = estack_top - DEBUG_STKSZ + STACK_MARGIN;
 	if (regs->sp >= estack_bottom && regs->sp <= estack_top)
 		return;
--- a/arch/x86/mm/fault.c
+++ b/arch/x86/mm/fault.c
@@ -793,7 +793,7 @@ no_context(struct pt_regs *regs, unsigne
 	if (is_vmalloc_addr((void *)address) &&
 	    (((unsigned long)tsk->stack - 1 - address < PAGE_SIZE) ||
 	     address - ((unsigned long)tsk->stack + THREAD_SIZE) < PAGE_SIZE)) {
-		unsigned long stack = this_cpu_read(orig_ist.ist[DOUBLEFAULT_STACK]) - sizeof(void *);
+		unsigned long stack = this_cpu_read(orig_ist.ist[ISTACK_DF]) - sizeof(void *);
 		/*
 		 * We're likely to be running with very little stack space
 		 * left.  It's plausible that we'd hit this condition but




* [patch V2 08/29] x86/cpu_entry_area: Cleanup setup functions
  2019-04-05 15:06 [patch V2 00/29] x86: Add guard pages to exception and interrupt stacks Thomas Gleixner
                   ` (6 preceding siblings ...)
  2019-04-05 15:07 ` [patch V2 07/29] x86/exceptions: Make IST index zero based Thomas Gleixner
@ 2019-04-05 15:07 ` Thomas Gleixner
  2019-04-05 19:25   ` Sean Christopherson
  2019-04-05 15:07 ` [patch V2 09/29] x86/exceptions: Add structs for exception stacks Thomas Gleixner
                   ` (20 subsequent siblings)
  28 siblings, 1 reply; 60+ messages in thread
From: Thomas Gleixner @ 2019-04-05 15:07 UTC (permalink / raw)
  To: LKML; +Cc: x86, Andy Lutomirski, Josh Poimboeuf, Sean Christopherson

No point in retrieving the entry area pointer over and over. Do it once and
use unsigned int for 'cpu' consistently.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
---
 arch/x86/mm/cpu_entry_area.c |   19 +++++++++----------
 1 file changed, 9 insertions(+), 10 deletions(-)

--- a/arch/x86/mm/cpu_entry_area.c
+++ b/arch/x86/mm/cpu_entry_area.c
@@ -52,10 +52,10 @@ cea_map_percpu_pages(void *cea_vaddr, vo
 		cea_set_pte(cea_vaddr, per_cpu_ptr_to_phys(ptr), prot);
 }
 
-static void __init percpu_setup_debug_store(int cpu)
+static void __init percpu_setup_debug_store(unsigned int cpu)
 {
 #ifdef CONFIG_CPU_SUP_INTEL
-	int npages;
+	unsigned int npages;
 	void *cea;
 
 	if (boot_cpu_data.x86_vendor != X86_VENDOR_INTEL)
@@ -79,8 +79,9 @@ static void __init percpu_setup_debug_st
 }
 
 /* Setup the fixmap mappings only once per-processor */
-static void __init setup_cpu_entry_area(int cpu)
+static void __init setup_cpu_entry_area(unsigned int cpu)
 {
+	struct cpu_entry_area *cea = get_cpu_entry_area(cpu);
 #ifdef CONFIG_X86_64
 	/* On 64-bit systems, we use a read-only fixmap GDT and TSS. */
 	pgprot_t gdt_prot = PAGE_KERNEL_RO;
@@ -101,10 +102,9 @@ static void __init setup_cpu_entry_area(
 	pgprot_t tss_prot = PAGE_KERNEL;
 #endif
 
-	cea_set_pte(&get_cpu_entry_area(cpu)->gdt, get_cpu_gdt_paddr(cpu),
-		    gdt_prot);
+	cea_set_pte(&cea->gdt, get_cpu_gdt_paddr(cpu), gdt_prot);
 
-	cea_map_percpu_pages(&get_cpu_entry_area(cpu)->entry_stack_page,
+	cea_map_percpu_pages(&cea->entry_stack_page,
 			     per_cpu_ptr(&entry_stack_storage, cpu), 1,
 			     PAGE_KERNEL);
 
@@ -128,19 +128,18 @@ static void __init setup_cpu_entry_area(
 	BUILD_BUG_ON((offsetof(struct tss_struct, x86_tss) ^
 		      offsetofend(struct tss_struct, x86_tss)) & PAGE_MASK);
 	BUILD_BUG_ON(sizeof(struct tss_struct) % PAGE_SIZE != 0);
-	cea_map_percpu_pages(&get_cpu_entry_area(cpu)->tss,
-			     &per_cpu(cpu_tss_rw, cpu),
+	cea_map_percpu_pages(&cea->tss, &per_cpu(cpu_tss_rw, cpu),
 			     sizeof(struct tss_struct) / PAGE_SIZE, tss_prot);
 
 #ifdef CONFIG_X86_32
-	per_cpu(cpu_entry_area, cpu) = get_cpu_entry_area(cpu);
+	per_cpu(cpu_entry_area, cpu) = cea;
 #endif
 
 #ifdef CONFIG_X86_64
 	BUILD_BUG_ON(sizeof(exception_stacks) % PAGE_SIZE != 0);
 	BUILD_BUG_ON(sizeof(exception_stacks) !=
 		     sizeof(((struct cpu_entry_area *)0)->exception_stacks));
-	cea_map_percpu_pages(&get_cpu_entry_area(cpu)->exception_stacks,
+	cea_map_percpu_pages(&cea->exception_stacks,
 			     &per_cpu(exception_stacks, cpu),
 			     sizeof(exception_stacks) / PAGE_SIZE, PAGE_KERNEL);
 #endif




* [patch V2 09/29] x86/exceptions: Add structs for exception stacks
  2019-04-05 15:06 [patch V2 00/29] x86: Add guard pages to exception and interrupt stacks Thomas Gleixner
                   ` (7 preceding siblings ...)
  2019-04-05 15:07 ` [patch V2 08/29] x86/cpu_entry_area: Cleanup setup functions Thomas Gleixner
@ 2019-04-05 15:07 ` Thomas Gleixner
  2019-04-05 20:48   ` Sean Christopherson
  2019-04-05 15:07 ` [patch V2 10/29] x86/cpu_entry_area: Prepare for IST guard pages Thomas Gleixner
                   ` (19 subsequent siblings)
  28 siblings, 1 reply; 60+ messages in thread
From: Thomas Gleixner @ 2019-04-05 15:07 UTC (permalink / raw)
  To: LKML; +Cc: x86, Andy Lutomirski, Josh Poimboeuf, Sean Christopherson

At the moment everything assumes a full linear mapping of the various
exception stacks. Adding guard pages to the cpu entry area mapping of the
exception stacks will break that assumption.

As a preparatory step convert both the real storage and the effective
mapping in the cpu entry area from character arrays to structures.

To ensure that both structures have the same ordering and the same
individual stack sizes, fill in the members with a macro. The guard size is
the only difference between the two resulting structures. For now both have
guard size 0 until the preparation of all usage sites is done.

Provide a couple of helper macros which are used in the following
conversions.
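
A usage sketch (not part of this patch; the actual conversions follow in
later patches), showing what the helpers evaluate to for the #DB stack,
given a struct cea_exception_stacks pointer 's':

	CEA_ESTACK_TOP(s, DB);	/* (unsigned long)&s->DB_stack_guard	*/
	CEA_ESTACK_BOT(s, DB);	/* (unsigned long)&s->DB_stack		*/
	CEA_ESTACK_SIZE(DB);	/* sizeof(s->DB_stack), i.e. DEBUG_STKSZ	*/
	CEA_ESTACK_OFFS(DB);	/* offset of DB_stack in the struct	*/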

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
---
 arch/x86/include/asm/cpu_entry_area.h |   51 ++++++++++++++++++++++++++++++----
 arch/x86/kernel/cpu/common.c          |    2 -
 arch/x86/mm/cpu_entry_area.c          |    8 ++---
 3 files changed, 50 insertions(+), 11 deletions(-)

--- a/arch/x86/include/asm/cpu_entry_area.h
+++ b/arch/x86/include/asm/cpu_entry_area.h
@@ -7,6 +7,50 @@
 #include <asm/processor.h>
 #include <asm/intel_ds.h>
 
+#ifdef CONFIG_X86_64
+
+/* Macro to enforce the same ordering and stack sizes */
+#define ESTACKS_MEMBERS(guardsize)		\
+	char	DF_stack[EXCEPTION_STKSZ];	\
+	char	DF_stack_guard[guardsize];	\
+	char	NMI_stack[EXCEPTION_STKSZ];	\
+	char	NMI_stack_guard[guardsize];	\
+	char	DB_stack[DEBUG_STKSZ];		\
+	char	DB_stack_guard[guardsize];	\
+	char	MCE_stack[EXCEPTION_STKSZ];	\
+	char	MCE_stack_guard[guardsize];	\
+
+/* The exception stacks linear storage. No guard pages required */
+struct exception_stacks {
+	ESTACKS_MEMBERS(0)
+};
+
+/*
+ * The effective cpu entry area mapping with guard pages. Guard size is
+ * zero until the code which makes assumptions about linear mapping is
+ * cleaned up.
+ */
+struct cea_exception_stacks {
+	ESTACKS_MEMBERS(0)
+};
+
+#define CEA_ESTACK_TOP(ceastp, st)			\
+	((unsigned long)&(ceastp)->st## _stack_guard)
+
+#define CEA_ESTACK_BOT(ceastp, st)			\
+	((unsigned long)&(ceastp)->st## _stack)
+
+#define CEA_ESTACK_OFFS(st)					\
+	offsetof(struct cea_exception_stacks, st## _stack)
+
+#define CEA_ESTACK_SIZE(st)					\
+	sizeof(((struct cea_exception_stacks *)0)->st## _stack)
+
+#define CEA_ESTACK_PAGES					\
+	(sizeof(struct cea_exception_stacks) / PAGE_SIZE)
+
+#endif
+
 /*
  * cpu_entry_area is a percpu region that contains things needed by the CPU
  * and early entry/exit code.  Real types aren't used for all fields here
@@ -32,12 +76,9 @@ struct cpu_entry_area {
 
 #ifdef CONFIG_X86_64
 	/*
-	 * Exception stacks used for IST entries.
-	 *
-	 * In the future, this should have a separate slot for each stack
-	 * with guard pages between them.
+	 * Exception stacks used for IST entries with guard pages.
 	 */
-	char exception_stacks[(N_EXCEPTION_STACKS - 1) * EXCEPTION_STKSZ + DEBUG_STKSZ];
+	struct cea_exception_stacks estacks;
 #endif
 #ifdef CONFIG_CPU_SUP_INTEL
 	/*
--- a/arch/x86/kernel/cpu/common.c
+++ b/arch/x86/kernel/cpu/common.c
@@ -1754,7 +1754,7 @@ void cpu_init(void)
 	 * set up and load the per-CPU TSS
 	 */
 	if (!oist->ist[0]) {
-		char *estacks = get_cpu_entry_area(cpu)->exception_stacks;
+		char *estacks = (char *)&get_cpu_entry_area(cpu)->estacks;
 
 		for (v = 0; v < N_EXCEPTION_STACKS; v++) {
 			estacks += exception_stack_sizes[v];
--- a/arch/x86/mm/cpu_entry_area.c
+++ b/arch/x86/mm/cpu_entry_area.c
@@ -13,8 +13,7 @@
 static DEFINE_PER_CPU_PAGE_ALIGNED(struct entry_stack_page, entry_stack_storage);
 
 #ifdef CONFIG_X86_64
-static DEFINE_PER_CPU_PAGE_ALIGNED(char, exception_stacks
-	[(N_EXCEPTION_STACKS - 1) * EXCEPTION_STKSZ + DEBUG_STKSZ]);
+static DEFINE_PER_CPU_PAGE_ALIGNED(struct exception_stacks, exception_stacks);
 #endif
 
 struct cpu_entry_area *get_cpu_entry_area(int cpu)
@@ -138,9 +137,8 @@ static void __init setup_cpu_entry_area(
 #ifdef CONFIG_X86_64
 	BUILD_BUG_ON(sizeof(exception_stacks) % PAGE_SIZE != 0);
 	BUILD_BUG_ON(sizeof(exception_stacks) !=
-		     sizeof(((struct cpu_entry_area *)0)->exception_stacks));
-	cea_map_percpu_pages(&cea->exception_stacks,
-			     &per_cpu(exception_stacks, cpu),
+		     sizeof(((struct cpu_entry_area *)0)->estacks));
+	cea_map_percpu_pages(&cea->estacks, &per_cpu(exception_stacks, cpu),
 			     sizeof(exception_stacks) / PAGE_SIZE, PAGE_KERNEL);
 #endif
 	percpu_setup_debug_store(cpu);




* [patch V2 10/29] x86/cpu_entry_area: Prepare for IST guard pages
  2019-04-05 15:06 [patch V2 00/29] x86: Add guard pages to exception and interrupt stacks Thomas Gleixner
                   ` (8 preceding siblings ...)
  2019-04-05 15:07 ` [patch V2 09/29] x86/exceptions: Add structs for exception stacks Thomas Gleixner
@ 2019-04-05 15:07 ` Thomas Gleixner
  2019-04-05 15:07 ` [patch V2 11/29] x86/cpu_entry_area: Provide exception stack accessor Thomas Gleixner
                   ` (18 subsequent siblings)
  28 siblings, 0 replies; 60+ messages in thread
From: Thomas Gleixner @ 2019-04-05 15:07 UTC (permalink / raw)
  To: LKML; +Cc: x86, Andy Lutomirski, Josh Poimboeuf, Sean Christopherson

To allow guard pages between the IST stacks, each stack needs to be mapped
individually.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
---
 arch/x86/mm/cpu_entry_area.c |   37 ++++++++++++++++++++++++++++++-------
 1 file changed, 30 insertions(+), 7 deletions(-)

--- a/arch/x86/mm/cpu_entry_area.c
+++ b/arch/x86/mm/cpu_entry_area.c
@@ -77,6 +77,34 @@ static void __init percpu_setup_debug_st
 #endif
 }
 
+#ifdef CONFIG_X86_64
+
+#define cea_map_stack(name) do {					\
+	npages = sizeof(estacks->name## _stack) / PAGE_SIZE;		\
+	cea_map_percpu_pages(cea->estacks.name## _stack,		\
+			estacks->name## _stack, npages, PAGE_KERNEL);	\
+	} while (0)
+
+static void __init percpu_setup_exception_stacks(unsigned int cpu)
+{
+	struct exception_stacks *estacks = per_cpu_ptr(&exception_stacks, cpu);
+	struct cpu_entry_area *cea = get_cpu_entry_area(cpu);
+	unsigned int npages;
+
+	BUILD_BUG_ON(sizeof(exception_stacks) % PAGE_SIZE != 0);
+	/*
+	 * The exceptions stack mappings in the per cpu area are protected
+	 * by guard pages so each stack must be mapped separately.
+	 */
+	cea_map_stack(DF);
+	cea_map_stack(NMI);
+	cea_map_stack(DB);
+	cea_map_stack(MCE);
+}
+#else
+static inline void percpu_setup_exception_stacks(unsigned int cpu) {}
+#endif
+
 /* Setup the fixmap mappings only once per-processor */
 static void __init setup_cpu_entry_area(unsigned int cpu)
 {
@@ -134,13 +162,8 @@ static void __init setup_cpu_entry_area(
 	per_cpu(cpu_entry_area, cpu) = cea;
 #endif
 
-#ifdef CONFIG_X86_64
-	BUILD_BUG_ON(sizeof(exception_stacks) % PAGE_SIZE != 0);
-	BUILD_BUG_ON(sizeof(exception_stacks) !=
-		     sizeof(((struct cpu_entry_area *)0)->estacks));
-	cea_map_percpu_pages(&cea->estacks, &per_cpu(exception_stacks, cpu),
-			     sizeof(exception_stacks) / PAGE_SIZE, PAGE_KERNEL);
-#endif
+	percpu_setup_exception_stacks(cpu);
+
 	percpu_setup_debug_store(cpu);
 }
 




* [patch V2 11/29] x86/cpu_entry_area: Provide exception stack accessor
  2019-04-05 15:06 [patch V2 00/29] x86: Add guard pages to exception and interrupt stacks Thomas Gleixner
                   ` (9 preceding siblings ...)
  2019-04-05 15:07 ` [patch V2 10/29] x86/cpu_entry_area: Prepare for IST guard pages Thomas Gleixner
@ 2019-04-05 15:07 ` Thomas Gleixner
  2019-04-05 15:07 ` [patch V2 12/29] x86/traps: Use cpu_entry_area instead of orig_ist Thomas Gleixner
                   ` (17 subsequent siblings)
  28 siblings, 0 replies; 60+ messages in thread
From: Thomas Gleixner @ 2019-04-05 15:07 UTC (permalink / raw)
  To: LKML; +Cc: x86, Andy Lutomirski, Josh Poimboeuf, Sean Christopherson

Store a pointer to the per-CPU entry area exception stack mappings to allow
fast retrieval.

This is required for converting various places from using the shadow IST
array to doing address calculations directly on the actual mapping address.
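
For illustration (not part of this patch; the conversions come later in the
series), the accessor is meant to be used like this to get the top of an
IST stack for the current CPU:

	/* Top of the current CPU's #DF stack in the cpu entry area mapping */
	unsigned long df_top = __this_cpu_ist_top_va(DF);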

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
---
 arch/x86/include/asm/cpu_entry_area.h |    4 ++++
 arch/x86/mm/cpu_entry_area.c          |    4 ++++
 2 files changed, 8 insertions(+)

--- a/arch/x86/include/asm/cpu_entry_area.h
+++ b/arch/x86/include/asm/cpu_entry_area.h
@@ -98,6 +98,7 @@ struct cpu_entry_area {
 #define CPU_ENTRY_AREA_TOT_SIZE	(CPU_ENTRY_AREA_SIZE * NR_CPUS)
 
 DECLARE_PER_CPU(struct cpu_entry_area *, cpu_entry_area);
+DECLARE_PER_CPU(struct cea_exception_stacks *, cea_exception_stacks);
 
 extern void setup_cpu_entry_areas(void);
 extern void cea_set_pte(void *cea_vaddr, phys_addr_t pa, pgprot_t flags);
@@ -117,4 +118,7 @@ static inline struct entry_stack *cpu_en
 	return &get_cpu_entry_area(cpu)->entry_stack_page.stack;
 }
 
+#define __this_cpu_ist_top_va(name)					\
+	CEA_ESTACK_TOP(__this_cpu_read(cea_exception_stacks), name)
+
 #endif
--- a/arch/x86/mm/cpu_entry_area.c
+++ b/arch/x86/mm/cpu_entry_area.c
@@ -14,6 +14,7 @@ static DEFINE_PER_CPU_PAGE_ALIGNED(struc
 
 #ifdef CONFIG_X86_64
 static DEFINE_PER_CPU_PAGE_ALIGNED(struct exception_stacks, exception_stacks);
+DEFINE_PER_CPU(struct cea_exception_stacks*, cea_exception_stacks);
 #endif
 
 struct cpu_entry_area *get_cpu_entry_area(int cpu)
@@ -92,6 +93,9 @@ static void __init percpu_setup_exceptio
 	unsigned int npages;
 
 	BUILD_BUG_ON(sizeof(exception_stacks) % PAGE_SIZE != 0);
+
+	per_cpu(cea_exception_stacks, cpu) = &cea->estacks;
+
 	/*
 	 * The exceptions stack mappings in the per cpu area are protected
 	 * by guard pages so each stack must be mapped separately.




* [patch V2 12/29] x86/traps: Use cpu_entry_area instead of orig_ist
  2019-04-05 15:06 [patch V2 00/29] x86: Add guard pages to exception and interrupt stacks Thomas Gleixner
                   ` (10 preceding siblings ...)
  2019-04-05 15:07 ` [patch V2 11/29] x86/cpu_entry_area: Provide exception stack accessor Thomas Gleixner
@ 2019-04-05 15:07 ` Thomas Gleixner
  2019-04-05 15:07 ` [patch V2 13/29] x86/irq/64: Use cpu entry area " Thomas Gleixner
                   ` (16 subsequent siblings)
  28 siblings, 0 replies; 60+ messages in thread
From: Thomas Gleixner @ 2019-04-05 15:07 UTC (permalink / raw)
  To: LKML; +Cc: x86, Andy Lutomirski, Josh Poimboeuf, Sean Christopherson

The orig_ist[] array is a shadow copy of the IST array in the TSS. The
reason why it exists is that older kernels used two TSS variants with
different pointers into the debug stack. orig_ist[] contains the real
starting points.

There is no longer any point in doing so because the same information can
be retrieved using the base address of the cpu entry area mapping and the
offsets of the various exception stacks.

No functional change. Preparation for removing orig_ist.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
---
 arch/x86/mm/fault.c |    3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

--- a/arch/x86/mm/fault.c
+++ b/arch/x86/mm/fault.c
@@ -28,6 +28,7 @@
 #include <asm/mmu_context.h>		/* vma_pkey()			*/
 #include <asm/efi.h>			/* efi_recover_from_page_fault()*/
 #include <asm/desc.h>			/* store_idt(), ...		*/
+#include <asm/cpu_entry_area.h>		/* exception stack		*/
 
 #define CREATE_TRACE_POINTS
 #include <asm/trace/exceptions.h>
@@ -793,7 +794,7 @@ no_context(struct pt_regs *regs, unsigne
 	if (is_vmalloc_addr((void *)address) &&
 	    (((unsigned long)tsk->stack - 1 - address < PAGE_SIZE) ||
 	     address - ((unsigned long)tsk->stack + THREAD_SIZE) < PAGE_SIZE)) {
-		unsigned long stack = this_cpu_read(orig_ist.ist[ISTACK_DF]) - sizeof(void *);
+		unsigned long stack = __this_cpu_ist_top_va(DF) - sizeof(void *);
 		/*
 		 * We're likely to be running with very little stack space
 		 * left.  It's plausible that we'd hit this condition but




* [patch V2 13/29] x86/irq/64: Use cpu entry area instead of orig_ist
  2019-04-05 15:06 [patch V2 00/29] x86: Add guard pages to exception and interrupt stacks Thomas Gleixner
                   ` (11 preceding siblings ...)
  2019-04-05 15:07 ` [patch V2 12/29] x86/traps: Use cpu_entry_area instead of orig_ist Thomas Gleixner
@ 2019-04-05 15:07 ` Thomas Gleixner
  2019-04-05 15:07 ` [patch V2 14/29] x86/dumpstack/64: Use cpu_entry_area " Thomas Gleixner
                   ` (15 subsequent siblings)
  28 siblings, 0 replies; 60+ messages in thread
From: Thomas Gleixner @ 2019-04-05 15:07 UTC (permalink / raw)
  To: LKML; +Cc: x86, Andy Lutomirski, Josh Poimboeuf, Sean Christopherson

The orig_ist[] array is a shadow copy of the IST array in the TSS. The
reason why it exists is that older kernels used two TSS variants with
different pointers into the debug stack. orig_ist[] contains the real
starting points.

There is no longer any point in doing so because the same information can
be retrieved using the base address of the cpu entry area mapping and the
offsets of the various exception stacks.

No functional change. Preparation for removing orig_ist.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
---
 arch/x86/kernel/irq_64.c |   13 +++++++------
 1 file changed, 7 insertions(+), 6 deletions(-)
--- a/arch/x86/kernel/irq_64.c
+++ b/arch/x86/kernel/irq_64.c
@@ -18,6 +18,8 @@
 #include <linux/uaccess.h>
 #include <linux/smp.h>
 #include <linux/sched/task_stack.h>
+
+#include <asm/cpu_entry_area.h>
 #include <asm/io_apic.h>
 #include <asm/apic.h>
 
@@ -43,10 +45,9 @@ static inline void stack_overflow_check(
 {
 #ifdef CONFIG_DEBUG_STACKOVERFLOW
 #define STACK_MARGIN	128
-	struct orig_ist *oist;
-	u64 irq_stack_top, irq_stack_bottom;
-	u64 estack_top, estack_bottom;
+	u64 irq_stack_top, irq_stack_bottom, estack_top, estack_bottom;
 	u64 curbase = (u64)task_stack_page(current);
+	struct cea_exception_stacks *estacks;
 
 	if (user_mode(regs))
 		return;
@@ -60,9 +61,9 @@ static inline void stack_overflow_check(
 	if (regs->sp >= irq_stack_bottom && regs->sp <= irq_stack_top)
 		return;
 
-	oist = this_cpu_ptr(&orig_ist);
-	estack_top = (u64)oist->ist[ISTACK_DB];
-	estack_bottom = estack_top - DEBUG_STKSZ + STACK_MARGIN;
+	estacks = __this_cpu_read(cea_exception_stacks);
+	estack_top = CEA_ESTACK_TOP(estacks, DB);
+	estack_bottom = CEA_ESTACK_BOT(estacks, DB) + STACK_MARGIN;
 	if (regs->sp >= estack_bottom && regs->sp <= estack_top)
 		return;
 




* [patch V2 14/29] x86/dumpstack/64: Use cpu_entry_area instead of orig_ist
  2019-04-05 15:06 [patch V2 00/29] x86: Add guard pages to exception and interrupt stacks Thomas Gleixner
                   ` (12 preceding siblings ...)
  2019-04-05 15:07 ` [patch V2 13/29] x86/irq/64: Use cpu entry area " Thomas Gleixner
@ 2019-04-05 15:07 ` Thomas Gleixner
  2019-04-05 15:07 ` [patch V2 15/29] x86/cpu: Prepare TSS.IST setup for guard pages Thomas Gleixner
                   ` (14 subsequent siblings)
  28 siblings, 0 replies; 60+ messages in thread
From: Thomas Gleixner @ 2019-04-05 15:07 UTC (permalink / raw)
  To: LKML; +Cc: x86, Andy Lutomirski, Josh Poimboeuf, Sean Christopherson

The orig_ist[] array is a shadow copy of the IST array in the TSS. The
reason why it exists is that older kernels used two TSS variants with
different pointers into the debug stack. orig_ist[] contains the real
starting points.

There is no longer any point in doing so because the same information can
be retrieved using the base address of the cpu entry area mapping and the
offsets of the various exception stacks.

No functional change. Preparation for removing orig_ist.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
---
 arch/x86/kernel/dumpstack_64.c |   39 +++++++++++++++++++++++++++------------
 1 file changed, 27 insertions(+), 12 deletions(-)

--- a/arch/x86/kernel/dumpstack_64.c
+++ b/arch/x86/kernel/dumpstack_64.c
@@ -16,6 +16,7 @@
 #include <linux/bug.h>
 #include <linux/nmi.h>
 
+#include <asm/cpu_entry_area.h>
 #include <asm/stacktrace.h>
 
 static const char *exception_stack_names[N_EXCEPTION_STACKS] = {
@@ -25,11 +26,6 @@ static const char *exception_stack_names
 		[ ISTACK_MCE	]	= "#MC",
 };
 
-static const unsigned long exception_stack_sizes[N_EXCEPTION_STACKS] = {
-	[0 ... N_EXCEPTION_STACKS - 1]		= EXCEPTION_STKSZ,
-	[ISTACK_DB]				= DEBUG_STKSZ
-};
-
 const char *stack_type_name(enum stack_type type)
 {
 	BUILD_BUG_ON(N_EXCEPTION_STACKS != 4);
@@ -52,25 +48,44 @@ const char *stack_type_name(enum stack_t
 	return NULL;
 }
 
+struct estack_layout {
+	unsigned int	begin;
+	unsigned int	end;
+};
+
+#define	ESTACK_ENTRY(x)	{						  \
+	.begin	= offsetof(struct cea_exception_stacks, x## _stack),	  \
+	.end	= offsetof(struct cea_exception_stacks, x## _stack_guard) \
+	}
+
+static const struct estack_layout layout[N_EXCEPTION_STACKS] = {
+	[ ISTACK_DF	]	= ESTACK_ENTRY(DF),
+	[ ISTACK_NMI	]	= ESTACK_ENTRY(NMI),
+	[ ISTACK_DB	]	= ESTACK_ENTRY(DB),
+	[ ISTACK_MCE	]	= ESTACK_ENTRY(MCE),
+};
+
 static bool in_exception_stack(unsigned long *stack, struct stack_info *info)
 {
-	unsigned long *begin, *end;
+	unsigned long estacks, begin, end, stk = (unsigned long)stack;
 	struct pt_regs *regs;
-	unsigned k;
+	unsigned int k;
 
 	BUILD_BUG_ON(N_EXCEPTION_STACKS != 4);
 
+	estacks = (unsigned long)__this_cpu_read(cea_exception_stacks);
+
 	for (k = 0; k < N_EXCEPTION_STACKS; k++) {
-		end   = (unsigned long *)raw_cpu_ptr(&orig_ist)->ist[k];
-		begin = end - (exception_stack_sizes[k] / sizeof(long));
+		begin = estacks + layout[k].begin;
+		end   = estacks + layout[k].end;
 		regs  = (struct pt_regs *)end - 1;
 
-		if (stack < begin || stack >= end)
+		if (stk < begin || stk >= end)
 			continue;
 
 		info->type	= STACK_TYPE_EXCEPTION + k;
-		info->begin	= begin;
-		info->end	= end;
+		info->begin	= (unsigned long *)begin;
+		info->end	= (unsigned long *)end;
 		info->next_sp	= (unsigned long *)regs->sp;
 
 		return true;




* [patch V2 15/29] x86/cpu: Prepare TSS.IST setup for guard pages
  2019-04-05 15:06 [patch V2 00/29] x86: Add guard pages to exception and interrupt stacks Thomas Gleixner
                   ` (13 preceding siblings ...)
  2019-04-05 15:07 ` [patch V2 14/29] x86/dumpstack/64: Use cpu_entry_area " Thomas Gleixner
@ 2019-04-05 15:07 ` Thomas Gleixner
  2019-04-05 15:07 ` [patch V2 16/29] x86/cpu: Remove orig_ist array Thomas Gleixner
                   ` (13 subsequent siblings)
  28 siblings, 0 replies; 60+ messages in thread
From: Thomas Gleixner @ 2019-04-05 15:07 UTC (permalink / raw)
  To: LKML; +Cc: x86, Andy Lutomirski, Josh Poimboeuf, Sean Christopherson

Convert the TSS.IST setup code to use the cpu entry area information
directly instead of assuming a linear mapping of the IST stacks.

The store to orig_ist[] is no longer required as there are no users
anymore.

This is the last preparatory step for IST guard pages.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
---
 arch/x86/kernel/cpu/common.c |   35 +++++++----------------------------
 1 file changed, 7 insertions(+), 28 deletions(-)

--- a/arch/x86/kernel/cpu/common.c
+++ b/arch/x86/kernel/cpu/common.c
@@ -507,19 +507,6 @@ void load_percpu_segment(int cpu)
 DEFINE_PER_CPU(struct cpu_entry_area *, cpu_entry_area);
 #endif
 
-#ifdef CONFIG_X86_64
-/*
- * Special IST stacks which the CPU switches to when it calls
- * an IST-marked descriptor entry. Up to 7 stacks (hardware
- * limit), all of them are 4K, except the debug stack which
- * is 8K.
- */
-static const unsigned int exception_stack_sizes[N_EXCEPTION_STACKS] = {
-	  [0 ... N_EXCEPTION_STACKS - 1]	= EXCEPTION_STKSZ,
-	  [ISTACK_DB]				= DEBUG_STKSZ
-};
-#endif
-
 /* Load the original GDT from the per-cpu structure */
 void load_direct_gdt(int cpu)
 {
@@ -1690,17 +1677,14 @@ static void setup_getcpu(int cpu)
  * initialized (naturally) in the bootstrap process, such as the GDT
  * and IDT. We reload them nevertheless, this function acts as a
  * 'CPU state barrier', nothing should get across.
- * A lot of state is already set up in PDA init for 64 bit
  */
 #ifdef CONFIG_X86_64
 
 void cpu_init(void)
 {
-	struct orig_ist *oist;
+	int cpu = raw_smp_processor_id();
 	struct task_struct *me;
 	struct tss_struct *t;
-	unsigned long v;
-	int cpu = raw_smp_processor_id();
 	int i;
 
 	wait_for_master_cpu(cpu);
@@ -1715,7 +1699,6 @@ void cpu_init(void)
 		load_ucode_ap();
 
 	t = &per_cpu(cpu_tss_rw, cpu);
-	oist = &per_cpu(orig_ist, cpu);
 
 #ifdef CONFIG_NUMA
 	if (this_cpu_read(numa_node) == 0 &&
@@ -1753,16 +1736,12 @@ void cpu_init(void)
 	/*
 	 * set up and load the per-CPU TSS
 	 */
-	if (!oist->ist[0]) {
-		char *estacks = (char *)&get_cpu_entry_area(cpu)->estacks;
-
-		for (v = 0; v < N_EXCEPTION_STACKS; v++) {
-			estacks += exception_stack_sizes[v];
-			oist->ist[v] = t->x86_tss.ist[v] =
-					(unsigned long)estacks;
-			if (v == ISTACK_DB)
-				per_cpu(debug_stack_addr, cpu) = (unsigned long)estacks;
-		}
+	if (!t->x86_tss.ist[0]) {
+		t->x86_tss.ist[ISTACK_DF] = __this_cpu_ist_top_va(DF);
+		t->x86_tss.ist[ISTACK_NMI] = __this_cpu_ist_top_va(NMI);
+		t->x86_tss.ist[ISTACK_DB] = __this_cpu_ist_top_va(DB);
+		t->x86_tss.ist[ISTACK_MCE] = __this_cpu_ist_top_va(MCE);
+		per_cpu(debug_stack_addr, cpu) = t->x86_tss.ist[ISTACK_DB];
 	}
 
 	t->x86_tss.io_bitmap_base = IO_BITMAP_OFFSET;




* [patch V2 16/29] x86/cpu: Remove orig_ist array
  2019-04-05 15:06 [patch V2 00/29] x86: Add guard pages to exception and interrupt stacks Thomas Gleixner
                   ` (14 preceding siblings ...)
  2019-04-05 15:07 ` [patch V2 15/29] x86/cpu: Prepare TSS.IST setup for guard pages Thomas Gleixner
@ 2019-04-05 15:07 ` Thomas Gleixner
  2019-04-05 15:07 ` [patch V2 17/29] x86/exceptions: Disconnect IST index and stack order Thomas Gleixner
                   ` (12 subsequent siblings)
  28 siblings, 0 replies; 60+ messages in thread
From: Thomas Gleixner @ 2019-04-05 15:07 UTC (permalink / raw)
  To: LKML; +Cc: x86, Andy Lutomirski, Josh Poimboeuf, Sean Christopherson

All users gone.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
---
 arch/x86/include/asm/processor.h |    9 ---------
 arch/x86/kernel/cpu/common.c     |    6 ------
 2 files changed, 15 deletions(-)

--- a/arch/x86/include/asm/processor.h
+++ b/arch/x86/include/asm/processor.h
@@ -374,16 +374,7 @@ DECLARE_PER_CPU(unsigned long, cpu_curre
 #define cpu_current_top_of_stack cpu_tss_rw.x86_tss.sp1
 #endif
 
-/*
- * Save the original ist values for checking stack pointers during debugging
- */
-struct orig_ist {
-	unsigned long		ist[7];
-};
-
 #ifdef CONFIG_X86_64
-DECLARE_PER_CPU(struct orig_ist, orig_ist);
-
 union irq_stack_union {
 	char irq_stack[IRQ_STACK_SIZE];
 	/*
--- a/arch/x86/kernel/cpu/common.c
+++ b/arch/x86/kernel/cpu/common.c
@@ -1549,12 +1549,6 @@ void syscall_init(void)
 	       X86_EFLAGS_IOPL|X86_EFLAGS_AC|X86_EFLAGS_NT);
 }
 
-/*
- * Copies of the original ist values from the tss are only accessed during
- * debugging, no special alignment required.
- */
-DEFINE_PER_CPU(struct orig_ist, orig_ist);
-
 static DEFINE_PER_CPU(unsigned long, debug_stack_addr);
 DEFINE_PER_CPU(int, debug_stack_usage);
 




* [patch V2 17/29] x86/exceptions: Disconnect IST index and stack order
  2019-04-05 15:06 [patch V2 00/29] x86: Add guard pages to exception and interrupt stacks Thomas Gleixner
                   ` (15 preceding siblings ...)
  2019-04-05 15:07 ` [patch V2 16/29] x86/cpu: Remove orig_ist array Thomas Gleixner
@ 2019-04-05 15:07 ` Thomas Gleixner
  2019-04-05 21:57   ` Josh Poimboeuf
  2019-04-05 15:07 ` [patch V2 18/29] x86/exceptions: Enable IST guard pages Thomas Gleixner
                   ` (11 subsequent siblings)
  28 siblings, 1 reply; 60+ messages in thread
From: Thomas Gleixner @ 2019-04-05 15:07 UTC (permalink / raw)
  To: LKML; +Cc: x86, Andy Lutomirski, Josh Poimboeuf, Sean Christopherson

The entry order of the TSS.IST array and the order of the stack
storage/mapping are not required to be the same.

With the upcoming split of the debug stack this is going to fall apart as
the number of TSS.IST array entries stays the same while the number of
actual stacks increases.

Make them separate so that code like dumpstack can just utilize the mapping
order. The IST index is solely required for the actual TSS.IST array
initialization.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
---
 arch/x86/entry/entry_64.S             |    2 +-
 arch/x86/include/asm/cpu_entry_area.h |   11 +++++++++++
 arch/x86/include/asm/page_64_types.h  |    9 ++++-----
 arch/x86/include/asm/stacktrace.h     |    2 ++
 arch/x86/kernel/cpu/common.c          |   10 +++++-----
 arch/x86/kernel/idt.c                 |    8 ++++----
 6 files changed, 27 insertions(+), 15 deletions(-)

--- a/arch/x86/entry/entry_64.S
+++ b/arch/x86/entry/entry_64.S
@@ -1129,7 +1129,7 @@ apicinterrupt3 HYPERV_STIMER0_VECTOR \
 	hv_stimer0_callback_vector hv_stimer0_vector_handler
 #endif /* CONFIG_HYPERV */
 
-idtentry debug			do_debug		has_error_code=0	paranoid=1 shift_ist=ISTACK_DB
+idtentry debug			do_debug		has_error_code=0	paranoid=1 shift_ist=IST_INDEX_DB
 idtentry int3			do_int3			has_error_code=0
 idtentry stack_segment		do_stack_segment	has_error_code=1
 
--- a/arch/x86/include/asm/cpu_entry_area.h
+++ b/arch/x86/include/asm/cpu_entry_area.h
@@ -34,6 +34,17 @@ struct cea_exception_stacks {
 	ESTACKS_MEMBERS(0)
 };
 
+/*
+ * The exception stack ordering in [cea_]exception_stacks
+ */
+enum exception_stack_ordering {
+	ISTACK_DF,
+	ISTACK_NMI,
+	ISTACK_DB,
+	ISTACK_MCE,
+	N_EXCEPTION_STACKS
+};
+
 #define CEA_ESTACK_TOP(ceastp, st)			\
 	((unsigned long)&(ceastp)->st## _stack_guard)
 
--- a/arch/x86/include/asm/page_64_types.h
+++ b/arch/x86/include/asm/page_64_types.h
@@ -28,11 +28,10 @@
 /*
  * The index for the tss.ist[] array. The hardware limit is 7 entries.
  */
-#define	ISTACK_DF		0
-#define	ISTACK_NMI		1
-#define	ISTACK_DB		2
-#define	ISTACK_MCE		3
-#define	N_EXCEPTION_STACKS	4
+#define	IST_INDEX_DF		0
+#define	IST_INDEX_NMI		1
+#define	IST_INDEX_DB		2
+#define	IST_INDEX_MCE		3
 
 /*
  * Set __PAGE_OFFSET to the most negative possible address +
--- a/arch/x86/include/asm/stacktrace.h
+++ b/arch/x86/include/asm/stacktrace.h
@@ -9,6 +9,8 @@
 
 #include <linux/uaccess.h>
 #include <linux/ptrace.h>
+
+#include <asm/cpu_entry_area.h>
 #include <asm/switch_to.h>
 
 enum stack_type {
--- a/arch/x86/kernel/cpu/common.c
+++ b/arch/x86/kernel/cpu/common.c
@@ -1731,11 +1731,11 @@ void cpu_init(void)
 	 * set up and load the per-CPU TSS
 	 */
 	if (!t->x86_tss.ist[0]) {
-		t->x86_tss.ist[ISTACK_DF] = __this_cpu_ist_top_va(DF);
-		t->x86_tss.ist[ISTACK_NMI] = __this_cpu_ist_top_va(NMI);
-		t->x86_tss.ist[ISTACK_DB] = __this_cpu_ist_top_va(DB);
-		t->x86_tss.ist[ISTACK_MCE] = __this_cpu_ist_top_va(MCE);
-		per_cpu(debug_stack_addr, cpu) = t->x86_tss.ist[ISTACK_DB];
+		t->x86_tss.ist[IST_INDEX_DF] = __this_cpu_ist_top_va(DF);
+		t->x86_tss.ist[IST_INDEX_NMI] = __this_cpu_ist_top_va(NMI);
+		t->x86_tss.ist[IST_INDEX_DB] = __this_cpu_ist_top_va(DB);
+		t->x86_tss.ist[IST_INDEX_MCE] = __this_cpu_ist_top_va(MCE);
+		per_cpu(debug_stack_addr, cpu) = t->x86_tss.ist[IST_INDEX_DB];
 	}
 
 	t->x86_tss.io_bitmap_base = IO_BITMAP_OFFSET;
--- a/arch/x86/kernel/idt.c
+++ b/arch/x86/kernel/idt.c
@@ -183,11 +183,11 @@ gate_desc debug_idt_table[IDT_ENTRIES] _
  * cpu_init() when the TSS has been initialized.
  */
 static const __initconst struct idt_data ist_idts[] = {
-	ISTG(X86_TRAP_DB,	debug,		ISTACK_DB),
-	ISTG(X86_TRAP_NMI,	nmi,		ISTACK_NMI),
-	ISTG(X86_TRAP_DF,	double_fault,	ISTACK_DF),
+	ISTG(X86_TRAP_DB,	debug,		IST_INDEX_DB),
+	ISTG(X86_TRAP_NMI,	nmi,		IST_INDEX_NMI),
+	ISTG(X86_TRAP_DF,	double_fault,	IST_INDEX_DF),
 #ifdef CONFIG_X86_MCE
-	ISTG(X86_TRAP_MC,	&machine_check,	ISTACK_MCE),
+	ISTG(X86_TRAP_MC,	&machine_check,	IST_INDEX_MCE),
 #endif
 };
 



^ permalink raw reply	[flat|nested] 60+ messages in thread

* [patch V2 18/29] x86/exceptions: Enable IST guard pages
  2019-04-05 15:06 [patch V2 00/29] x86: Add guard pages to exception and interrupt stacks Thomas Gleixner
                   ` (16 preceding siblings ...)
  2019-04-05 15:07 ` [patch V2 17/29] x86/exceptions: Disconnect IST index and stack order Thomas Gleixner
@ 2019-04-05 15:07 ` Thomas Gleixner
  2019-04-05 15:07 ` [patch V2 19/29] x86/exceptions: Split debug IST stack Thomas Gleixner
                   ` (10 subsequent siblings)
  28 siblings, 0 replies; 60+ messages in thread
From: Thomas Gleixner @ 2019-04-05 15:07 UTC (permalink / raw)
  To: LKML; +Cc: x86, Andy Lutomirski, Josh Poimboeuf, Sean Christopherson

All usage sites which expected the exception stacks in the CPU entry area
to be mapped linearly have been fixed up. Enable guard pages between the
IST stacks.
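
A minimal userspace sketch of how the guardsize parameter creates the holes
(illustrative 4k sizes, only two stacks shown; it relies on the same
zero-length array extension as the kernel macro): the packed variant models
the linear backing store, the guarded variant the effective mapping.

  #include <stdio.h>
  #include <stddef.h>

  #define PAGE_SZ 4096
  #define STKSZ   4096

  #define MEMBERS(guardsize)                      \
          char DF_stack[STKSZ];                   \
          char DF_stack_guard[guardsize];         \
          char NMI_stack[STKSZ];                  \
          char NMI_stack_guard[guardsize];

  struct packed_stacks  { MEMBERS(0) };           /* linear backing store */
  struct guarded_stacks { MEMBERS(PAGE_SZ) };     /* effective mapping    */

  int main(void)
  {
          printf("NMI_stack offset packed:  %zu\n",
                 offsetof(struct packed_stacks, NMI_stack));
          printf("NMI_stack offset guarded: %zu\n",
                 offsetof(struct guarded_stacks, NMI_stack));
          return 0;
  }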

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
---
 arch/x86/include/asm/cpu_entry_area.h |    8 ++------
 1 file changed, 2 insertions(+), 6 deletions(-)

--- a/arch/x86/include/asm/cpu_entry_area.h
+++ b/arch/x86/include/asm/cpu_entry_area.h
@@ -25,13 +25,9 @@ struct exception_stacks {
 	ESTACKS_MEMBERS(0)
 };
 
-/*
- * The effective cpu entry area mapping with guard pages. Guard size is
- * zero until the code which makes assumptions about linear mapping is
- * cleaned up.
- */
+/* The effective cpu entry area mapping with guard pages. */
 struct cea_exception_stacks {
-	ESTACKS_MEMBERS(0)
+	ESTACKS_MEMBERS(PAGE_SIZE)
 };
 
 /*



^ permalink raw reply	[flat|nested] 60+ messages in thread

* [patch V2 19/29] x86/exceptions: Split debug IST stack
  2019-04-05 15:06 [patch V2 00/29] x86: Add guard pages to exception and interrupt stacks Thomas Gleixner
                   ` (17 preceding siblings ...)
  2019-04-05 15:07 ` [patch V2 18/29] x86/exceptions: Enable IST guard pages Thomas Gleixner
@ 2019-04-05 15:07 ` Thomas Gleixner
  2019-04-05 20:55   ` Sean Christopherson
  2019-04-05 15:07 ` [patch V2 20/29] x86/dumpstack/64: Speedup in_exception_stack() Thomas Gleixner
                   ` (9 subsequent siblings)
  28 siblings, 1 reply; 60+ messages in thread
From: Thomas Gleixner @ 2019-04-05 15:07 UTC (permalink / raw)
  To: LKML; +Cc: x86, Andy Lutomirski, Josh Poimboeuf, Sean Christopherson

The debug IST stack is actually two separate debug stacks to handle #DB
recursion. This is required because the CPU always starts at the top of the
stack on exception entry, which means that on #DB recursion the second #DB
would overwrite the stack of the first.

The low level entry code therefore adjusts the top of stack on entry so a
secondary #DB starts from a different stack page. But the stack pages are
adjacent without a guard page between them.

Split the debug stack into 3 stacks which are separated by guard pages. The
3rd stack is never mapped into the cpu_entry_area and is only there to
catch triple #DB nesting:

      --- top of DB_stack	<- Initial stack
      --- end of DB_stack
      	  guard page

      --- top of DB1_stack	<- Top of stack after entering first #DB
      --- end of DB1_stack
      	  guard page

      --- top of DB2_stack	<- Top of stack after entering second #DB
      --- end of DB2_stack	   
      	  guard page

If DB2 did not act as the final guard hole, a second nested #DB would point
the top of the #DB stack to the area below DB1, which would be a valid stack
address and would not catch the undesired triple nesting.

The backing store does not allocate any memory for DB2 and its guard page
as it is not going to be mapped into the cpu_entry_area.
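
A small userspace sketch of the resulting arithmetic, assuming illustrative
4k stack and guard sizes: the entry code subtracts the DB/DB1 distance
(DB_STACK_OFFSET) from TSS.IST[DB] on entry and adds it back on exit, so
nesting walks from DB via DB1 into the unmapped DB2 area.

  #include <stdio.h>

  #define STKSZ  4096                    /* stands in for EXCEPTION_STKSZ */
  #define GUARD  4096                    /* one guard page */

  int main(void)
  {
          /* Bottom-up layout: DB2 (hole), guard, DB1, guard, DB */
          unsigned long db2_top = 0x1000;                 /* never mapped */
          unsigned long db1_top = db2_top + GUARD + STKSZ;
          unsigned long db_top  = db1_top + GUARD + STKSZ;
          unsigned long offset  = db_top - db1_top;       /* DB_STACK_OFFSET */

          unsigned long ist = db_top;                     /* TSS.IST[DB] */

          ist -= offset;  /* first #DB entered: a nested #DB now uses DB1 */
          printf("nested #DB starts at %#lx (top of DB1)\n", ist);

          ist -= offset;  /* nested #DB entered: a third one hits the hole */
          printf("triple #DB would start at %#lx (unmapped DB2)\n", ist);
          return 0;
  }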

 - Adjust the low level entry code so it moves the top of the #DB stack by
   the offset between the stacks instead of by the exception stack size.

 - Make the dumpstack code aware of the new stacks.

 - Adjust the in_debug_stack() implementation and move it into the NMI code
   where it belongs. As this is NMI hotpath code, it just checks the full
   area between the top of DB_stack and the bottom of DB1_stack without
   checking for the guard page. That's correct because the NMI cannot hit a
   stack pointer which points into the guard page between the DB and DB1
   stacks. Even if it did, the NMI operation itself would still be
   unaffected, but the resume of the debug exception on the topmost DB
   stack would crash by touching the guard page.

Suggested-by: Andy Lutomirski <luto@kernel.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
---
 Documentation/x86/kernel-stacks       |    7 ++++++-
 arch/x86/entry/entry_64.S             |    8 ++++----
 arch/x86/include/asm/cpu_entry_area.h |   21 ++++++++++++++++-----
 arch/x86/include/asm/debugreg.h       |    2 --
 arch/x86/include/asm/page_64_types.h  |    3 ---
 arch/x86/kernel/asm-offsets_64.c      |    2 ++
 arch/x86/kernel/cpu/common.c          |   11 -----------
 arch/x86/kernel/dumpstack_64.c        |   12 ++++++++----
 arch/x86/kernel/nmi.c                 |   20 +++++++++++++++++++-
 arch/x86/mm/cpu_entry_area.c          |    4 +++-
 10 files changed, 58 insertions(+), 32 deletions(-)

--- a/Documentation/x86/kernel-stacks
+++ b/Documentation/x86/kernel-stacks
@@ -76,7 +76,7 @@ The currently assigned IST stacks are :-
   middle of switching stacks.  Using IST for NMI events avoids making
   assumptions about the previous state of the kernel stack.
 
-* ISTACK_DB.  DEBUG_STKSZ
+* ISTACK_DB.  EXCEPTION_STKSZ (PAGE_SIZE).
 
   Used for hardware debug interrupts (interrupt 1) and for software
   debug interrupts (INT3).
@@ -86,6 +86,11 @@ The currently assigned IST stacks are :-
   avoids making assumptions about the previous state of the kernel
   stack.
 
+  To handle nested #DB correctly there exist two instances of DB stacks. On
+  #DB entry the IST stackpointer for #DB is switched to the second instance
+  so a nested #DB starts from a clean stack. The nested #DB switches the
+  IST stackpointer to a guard hole to catch triple nesting.
+
 * ISTACK_MCE.  EXCEPTION_STKSZ (PAGE_SIZE).
 
   Used for interrupt 18 - Machine Check Exception (#MC).
--- a/arch/x86/entry/entry_64.S
+++ b/arch/x86/entry/entry_64.S
@@ -879,7 +879,7 @@ apicinterrupt IRQ_WORK_VECTOR			irq_work
  * @paranoid == 2 is special: the stub will never switch stacks.  This is for
  * #DF: if the thread stack is somehow unusable, we'll still get a useful OOPS.
  */
-.macro idtentry sym do_sym has_error_code:req paranoid=0 shift_ist=-1
+.macro idtentry sym do_sym has_error_code:req paranoid=0 shift_ist=-1 ist_offset=0
 ENTRY(\sym)
 	UNWIND_HINT_IRET_REGS offset=\has_error_code*8
 
@@ -925,13 +925,13 @@ ENTRY(\sym)
 	.endif
 
 	.if \shift_ist != -1
-	subq	$EXCEPTION_STKSZ, CPU_TSS_IST(\shift_ist)
+	subq	$\ist_offset, CPU_TSS_IST(\shift_ist)
 	.endif
 
 	call	\do_sym
 
 	.if \shift_ist != -1
-	addq	$EXCEPTION_STKSZ, CPU_TSS_IST(\shift_ist)
+	addq	$\ist_offset, CPU_TSS_IST(\shift_ist)
 	.endif
 
 	/* these procedures expect "no swapgs" flag in ebx */
@@ -1129,7 +1129,7 @@ apicinterrupt3 HYPERV_STIMER0_VECTOR \
 	hv_stimer0_callback_vector hv_stimer0_vector_handler
 #endif /* CONFIG_HYPERV */
 
-idtentry debug			do_debug		has_error_code=0	paranoid=1 shift_ist=IST_INDEX_DB
+idtentry debug			do_debug		has_error_code=0	paranoid=1 shift_ist=IST_INDEX_DB ist_offset=DB_STACK_OFFSET
 idtentry int3			do_int3			has_error_code=0
 idtentry stack_segment		do_stack_segment	has_error_code=1
 
--- a/arch/x86/include/asm/cpu_entry_area.h
+++ b/arch/x86/include/asm/cpu_entry_area.h
@@ -9,25 +9,34 @@
 
 #ifdef CONFIG_X86_64
 
-/* Macro to enforce the same ordering and stack sizes */
-#define ESTACKS_MEMBERS(guardsize)		\
+/*
+ * Macro to enforce the same ordering and stack sizes.
+ *
+ * Note: DB2 stack is never mapped into the cpu_entry_area. It's there to
+ * catch triple nesting of #DB.
+ */
+#define ESTACKS_MEMBERS(guardsize, db2_holesize)\
 	char	DF_stack[EXCEPTION_STKSZ];	\
 	char	DF_stack_guard[guardsize];	\
 	char	NMI_stack[EXCEPTION_STKSZ];	\
 	char	NMI_stack_guard[guardsize];	\
-	char	DB_stack[DEBUG_STKSZ];		\
+	char	DB2_stack[db2_holesize];	\
+	char	DB2_stack_guard[guardsize];	\
+	char	DB1_stack[EXCEPTION_STKSZ];	\
+	char	DB1_stack_guard[guardsize];	\
+	char	DB_stack[EXCEPTION_STKSZ];	\
 	char	DB_stack_guard[guardsize];	\
 	char	MCE_stack[EXCEPTION_STKSZ];	\
 	char	MCE_stack_guard[guardsize];	\
 
 /* The exception stacks linear storage. No guard pages required */
 struct exception_stacks {
-	ESTACKS_MEMBERS(0)
+	ESTACKS_MEMBERS(0, 0)
 };
 
 /* The effective cpu entry area mapping with guard pages. */
 struct cea_exception_stacks {
-	ESTACKS_MEMBERS(PAGE_SIZE)
+	ESTACKS_MEMBERS(PAGE_SIZE, EXCEPTION_STKSZ)
 };
 
 /*
@@ -36,6 +45,8 @@ struct cea_exception_stacks {
 enum exception_stack_ordering {
 	ISTACK_DF,
 	ISTACK_NMI,
+	ISTACK_DB2,
+	ISTACK_DB1,
 	ISTACK_DB,
 	ISTACK_MCE,
 	N_EXCEPTION_STACKS
--- a/arch/x86/include/asm/debugreg.h
+++ b/arch/x86/include/asm/debugreg.h
@@ -104,11 +104,9 @@ static inline void debug_stack_usage_dec
 {
 	__this_cpu_dec(debug_stack_usage);
 }
-int is_debug_stack(unsigned long addr);
 void debug_stack_set_zero(void);
 void debug_stack_reset(void);
 #else /* !X86_64 */
-static inline int is_debug_stack(unsigned long addr) { return 0; }
 static inline void debug_stack_set_zero(void) { }
 static inline void debug_stack_reset(void) { }
 static inline void debug_stack_usage_inc(void) { }
--- a/arch/x86/include/asm/page_64_types.h
+++ b/arch/x86/include/asm/page_64_types.h
@@ -19,9 +19,6 @@
 #define EXCEPTION_STACK_ORDER (0 + KASAN_STACK_ORDER)
 #define EXCEPTION_STKSZ (PAGE_SIZE << EXCEPTION_STACK_ORDER)
 
-#define DEBUG_STACK_ORDER (EXCEPTION_STACK_ORDER + 1)
-#define DEBUG_STKSZ (PAGE_SIZE << DEBUG_STACK_ORDER)
-
 #define IRQ_STACK_ORDER (2 + KASAN_STACK_ORDER)
 #define IRQ_STACK_SIZE (PAGE_SIZE << IRQ_STACK_ORDER)
 
--- a/arch/x86/kernel/asm-offsets_64.c
+++ b/arch/x86/kernel/asm-offsets_64.c
@@ -68,6 +68,8 @@ int main(void)
 #undef ENTRY
 
 	OFFSET(TSS_ist, tss_struct, x86_tss.ist);
+	DEFINE(DB_STACK_OFFSET, offsetof(struct cea_exception_stacks, DB_stack) -
+	       offsetof(struct cea_exception_stacks, DB1_stack));
 	BLANK();
 
 #ifdef CONFIG_STACKPROTECTOR
--- a/arch/x86/kernel/cpu/common.c
+++ b/arch/x86/kernel/cpu/common.c
@@ -1549,17 +1549,7 @@ void syscall_init(void)
 	       X86_EFLAGS_IOPL|X86_EFLAGS_AC|X86_EFLAGS_NT);
 }
 
-static DEFINE_PER_CPU(unsigned long, debug_stack_addr);
 DEFINE_PER_CPU(int, debug_stack_usage);
-
-int is_debug_stack(unsigned long addr)
-{
-	return __this_cpu_read(debug_stack_usage) ||
-		(addr <= __this_cpu_read(debug_stack_addr) &&
-		 addr > (__this_cpu_read(debug_stack_addr) - DEBUG_STKSZ));
-}
-NOKPROBE_SYMBOL(is_debug_stack);
-
 DEFINE_PER_CPU(u32, debug_idt_ctr);
 
 void debug_stack_set_zero(void)
@@ -1735,7 +1725,6 @@ void cpu_init(void)
 		t->x86_tss.ist[IST_INDEX_NMI] = __this_cpu_ist_top_va(NMI);
 		t->x86_tss.ist[IST_INDEX_DB] = __this_cpu_ist_top_va(DB);
 		t->x86_tss.ist[IST_INDEX_MCE] = __this_cpu_ist_top_va(MCE);
-		per_cpu(debug_stack_addr, cpu) = t->x86_tss.ist[IST_INDEX_DB];
 	}
 
 	t->x86_tss.io_bitmap_base = IO_BITMAP_OFFSET;
--- a/arch/x86/kernel/dumpstack_64.c
+++ b/arch/x86/kernel/dumpstack_64.c
@@ -19,16 +19,18 @@
 #include <asm/cpu_entry_area.h>
 #include <asm/stacktrace.h>
 
-static const char *exception_stack_names[N_EXCEPTION_STACKS] = {
+static const char *exception_stack_names[] = {
 		[ ISTACK_DF	]	= "#DF",
 		[ ISTACK_NMI	]	= "NMI",
+		[ ISTACK_DB2	]	= "#DB2",
+		[ ISTACK_DB1	]	= "#DB1",
 		[ ISTACK_DB	]	= "#DB",
 		[ ISTACK_MCE	]	= "#MC",
 };
 
 const char *stack_type_name(enum stack_type type)
 {
-	BUILD_BUG_ON(N_EXCEPTION_STACKS != 4);
+	BUILD_BUG_ON(N_EXCEPTION_STACKS != 6);
 
 	if (type == STACK_TYPE_IRQ)
 		return "IRQ";
@@ -58,9 +60,11 @@ struct estack_layout {
 	.end	= offsetof(struct cea_exception_stacks, x## _stack_guard) \
 	}
 
-static const struct estack_layout layout[N_EXCEPTION_STACKS] = {
+static const struct estack_layout layout[] = {
 	[ ISTACK_DF	]	= ESTACK_ENTRY(DF),
 	[ ISTACK_NMI	]	= ESTACK_ENTRY(NMI),
+	[ ISTACK_DB2	]	= { .begin = 0, .end = 0},
+	[ ISTACK_DB1	]	= ESTACK_ENTRY(DB1),
 	[ ISTACK_DB	]	= ESTACK_ENTRY(DB),
 	[ ISTACK_MCE	]	= ESTACK_ENTRY(MCE),
 };
@@ -71,7 +75,7 @@ static bool in_exception_stack(unsigned
 	struct pt_regs *regs;
 	unsigned int k;
 
-	BUILD_BUG_ON(N_EXCEPTION_STACKS != 4);
+	BUILD_BUG_ON(N_EXCEPTION_STACKS != 6);
 
 	estacks = (unsigned long)__this_cpu_read(cea_exception_stacks);
 
--- a/arch/x86/kernel/nmi.c
+++ b/arch/x86/kernel/nmi.c
@@ -21,13 +21,14 @@
 #include <linux/ratelimit.h>
 #include <linux/slab.h>
 #include <linux/export.h>
+#include <linux/atomic.h>
 #include <linux/sched/clock.h>
 
 #if defined(CONFIG_EDAC)
 #include <linux/edac.h>
 #endif
 
-#include <linux/atomic.h>
+#include <asm/cpu_entry_area.h>
 #include <asm/traps.h>
 #include <asm/mach_traps.h>
 #include <asm/nmi.h>
@@ -487,6 +488,23 @@ static DEFINE_PER_CPU(unsigned long, nmi
  * switch back to the original IDT.
  */
 static DEFINE_PER_CPU(int, update_debug_stack);
+
+static bool notrace is_debug_stack(unsigned long addr)
+{
+	struct cea_exception_stacks *cs = __this_cpu_read(cea_exception_stacks);
+	unsigned long top = CEA_ESTACK_TOP(cs, DB);
+	unsigned long bot = CEA_ESTACK_BOT(cs, DB1);
+
+	if (__this_cpu_read(debug_stack_usage))
+		return true;
+	/*
+	 * Note, this covers the guard page between DB and DB1 as well to
+	 * avoid two checks. But by all means @addr can never point into
+	 * the guard page.
+	 */
+	return addr > bot && addr < top;
+}
+NOKPROBE_SYMBOL(is_debug_stack);
 #endif
 
 dotraplinkage notrace void
--- a/arch/x86/mm/cpu_entry_area.c
+++ b/arch/x86/mm/cpu_entry_area.c
@@ -98,10 +98,12 @@ static void __init percpu_setup_exceptio
 
 	/*
 	 * The exceptions stack mappings in the per cpu area are protected
-	 * by guard pages so each stack must be mapped separately.
+	 * by guard pages so each stack must be mapped separately. DB2 is
+	 * not mapped; it just exists to catch triple nesting of #DB.
 	 */
 	cea_map_stack(DF);
 	cea_map_stack(NMI);
+	cea_map_stack(DB1);
 	cea_map_stack(DB);
 	cea_map_stack(MCE);
 }



^ permalink raw reply	[flat|nested] 60+ messages in thread

* [patch V2 20/29] x86/dumpstack/64: Speedup in_exception_stack()
  2019-04-05 15:06 [patch V2 00/29] x86: Add guard pages to exception and interrupt stacks Thomas Gleixner
                   ` (18 preceding siblings ...)
  2019-04-05 15:07 ` [patch V2 19/29] x86/exceptions: Split debug IST stack Thomas Gleixner
@ 2019-04-05 15:07 ` Thomas Gleixner
  2019-04-05 21:55   ` Josh Poimboeuf
  2019-04-05 15:07 ` [patch V2 21/29] x86/irq/32: Define IRQ_STACK_SIZE Thomas Gleixner
                   ` (8 subsequent siblings)
  28 siblings, 1 reply; 60+ messages in thread
From: Thomas Gleixner @ 2019-04-05 15:07 UTC (permalink / raw)
  To: LKML; +Cc: x86, Andy Lutomirski, Josh Poimboeuf, Sean Christopherson

The current implementation of in_exception_stack() iterates over the
exception stacks array. Most of the time this is a useless exercise, but
even for the actual use cases (perf and ftrace) it takes at least 2
iterations to get to the NMI stack.

As the exception stacks and the guard pages are page aligned the loop can
be avoided completely.

Add an initial check whether the stack pointer is inside the full exception
stack area and leave early if not.

Create a lookup table which describes the stack area. The table index is
the page offset from the beginning of the exception stacks. So for any
given stack pointer the page offset is computed and a lookup in the
description table is performed. If it is inside a guard page, return. If
not, use the descriptor to fill in the info structure.

The table is filled at compile time. For the !KASAN case the interesting
page descriptors fit exactly into a single cache line; only the last guard
page descriptor spills into the next cache line, but that should not be
accessed in the regular case.
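
A userspace model of the lookup with a simplified layout (five 4k stacks,
each followed by a 4k guard page, DB2 omitted): guard pages keep a zeroed
descriptor, so a single bounds check plus one table lookup replaces the
loop over all stacks.

  #include <stdio.h>
  #include <stdbool.h>

  #define PG_SHIFT 12
  #define PG_SIZE  (1UL << PG_SHIFT)
  #define NPAGES   10            /* 5 stacks + 5 guard pages in this model */

  struct page_desc { unsigned int offs; unsigned short size; };

  /* Indexed by page offset from the start of the area; guards stay zero */
  static const struct page_desc table[NPAGES] = {
          [0] = { 0 * PG_SIZE, PG_SIZE },         /* DF  */
          [2] = { 2 * PG_SIZE, PG_SIZE },         /* NMI */
          [4] = { 4 * PG_SIZE, PG_SIZE },         /* DB1 */
          [6] = { 6 * PG_SIZE, PG_SIZE },         /* DB  */
          [8] = { 8 * PG_SIZE, PG_SIZE },         /* MCE */
  };

  static bool in_exception_stack(unsigned long base, unsigned long sp)
  {
          unsigned long end = base + NPAGES * PG_SIZE;
          unsigned int k;

          if (sp < base || sp >= end)     /* outside the whole area */
                  return false;
          k = (sp - base) >> PG_SHIFT;    /* page offset */
          return table[k].size != 0;      /* zero size == guard page */
  }

  int main(void)
  {
          unsigned long base = 0x100000;

          printf("%d\n", in_exception_stack(base, base + 6 * PG_SIZE + 64));
          printf("%d\n", in_exception_stack(base, base + 1 * PG_SIZE + 64));
          return 0;
  }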

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
---
V2: Simplify the macro maze
---
 arch/x86/kernel/dumpstack_64.c |   90 +++++++++++++++++++++++++----------------
 1 file changed, 55 insertions(+), 35 deletions(-)

--- a/arch/x86/kernel/dumpstack_64.c
+++ b/arch/x86/kernel/dumpstack_64.c
@@ -50,52 +50,72 @@ const char *stack_type_name(enum stack_t
 	return NULL;
 }
 
-struct estack_layout {
-	unsigned int	begin;
-	unsigned int	end;
+/**
+ * struct estack_pages - Page descriptor for exception stacks
+ * @offs:	Offset from the start of the exception stack area
+ * @size:	Size of the exception stack
+ * @type:	Type to store in the stack_info struct
+ */
+struct estack_pages {
+	u32	offs;
+	u16	size;
+	u16	type;
 };
 
-#define	ESTACK_ENTRY(x)	{						  \
-	.begin	= offsetof(struct cea_exception_stacks, x## _stack),	  \
-	.end	= offsetof(struct cea_exception_stacks, x## _stack_guard) \
-	}
-
-static const struct estack_layout layout[] = {
-	[ ISTACK_DF	]	= ESTACK_ENTRY(DF),
-	[ ISTACK_NMI	]	= ESTACK_ENTRY(NMI),
-	[ ISTACK_DB2	]	= { .begin = 0, .end = 0},
-	[ ISTACK_DB1	]	= ESTACK_ENTRY(DB1),
-	[ ISTACK_DB	]	= ESTACK_ENTRY(DB),
-	[ ISTACK_MCE	]	= ESTACK_ENTRY(MCE),
+#define EPAGERANGE(st)							\
+	[PFN_DOWN(CEA_ESTACK_OFFS(st)) ...				\
+	 PFN_DOWN(CEA_ESTACK_OFFS(st) + CEA_ESTACK_SIZE(st) - 1)] = {	\
+		.offs	= CEA_ESTACK_OFFS(st),				\
+		.size	= CEA_ESTACK_SIZE(st),				\
+		.type	= STACK_TYPE_EXCEPTION + ISTACK_ ##st, }
+
+/*
+ * Array of exception stack page descriptors. If the stack is larger than
+ * PAGE_SIZE, all pages covering a particular stack will have the same
+ * info. The guard pages including the not mapped DB2 stack are zeroed
+ * out.
+ */
+static const
+struct estack_pages estack_pages[CEA_ESTACK_PAGES] ____cacheline_aligned = {
+	EPAGERANGE(DF),
+	EPAGERANGE(NMI),
+	EPAGERANGE(DB1),
+	EPAGERANGE(DB),
+	EPAGERANGE(MCE),
 };
 
 static bool in_exception_stack(unsigned long *stack, struct stack_info *info)
 {
-	unsigned long estacks, begin, end, stk = (unsigned long)stack;
+	unsigned long begin, end, stk = (unsigned long)stack;
+	const struct estack_pages *ep;
 	struct pt_regs *regs;
 	unsigned int k;
 
 	BUILD_BUG_ON(N_EXCEPTION_STACKS != 6);
 
-	estacks = (unsigned long)__this_cpu_read(cea_exception_stacks);
-
-	for (k = 0; k < N_EXCEPTION_STACKS; k++) {
-		begin = estacks + layout[k].begin;
-		end   = estacks + layout[k].end;
-		regs  = (struct pt_regs *)end - 1;
-
-		if (stk < begin || stk >= end)
-			continue;
-
-		info->type	= STACK_TYPE_EXCEPTION + k;
-		info->begin	= (unsigned long *)begin;
-		info->end	= (unsigned long *)end;
-		info->next_sp	= (unsigned long *)regs->sp;
-
-		return true;
-	}
-
-	return false;
+	begin = (unsigned long)__this_cpu_read(cea_exception_stacks);
+	end = begin + sizeof(struct cea_exception_stacks);
+	/* Bail if @stack is outside the exception stack area. */
+	if (stk < begin || stk >= end)
+		return false;
+
+	/* Calc page offset from start of exception stacks */
+	k = (stk - begin) >> PAGE_SHIFT;
+	/* Lookup the page descriptor */
+	ep = &estack_pages[k];
+	/* Guard page? */
+	if (!ep->size)
+		return false;
+
+	begin += (unsigned long)ep->offs;
+	end = begin + (unsigned long)ep->size;
+	regs = (struct pt_regs *)end - 1;
+
+	info->type	= ep->type;
+	info->begin	= (unsigned long *)begin;
+	info->end	= (unsigned long *)end;
+	info->next_sp	= (unsigned long *)regs->sp;
+	return true;
 }
 
 static bool in_irq_stack(unsigned long *stack, struct stack_info *info)



^ permalink raw reply	[flat|nested] 60+ messages in thread

* [patch V2 21/29] x86/irq/32: Define IRQ_STACK_SIZE
  2019-04-05 15:06 [patch V2 00/29] x86: Add guard pages to exception and interrupt stacks Thomas Gleixner
                   ` (19 preceding siblings ...)
  2019-04-05 15:07 ` [patch V2 20/29] x86/dumpstack/64: Speedup in_exception_stack() Thomas Gleixner
@ 2019-04-05 15:07 ` Thomas Gleixner
  2019-04-05 15:07 ` [patch V2 22/29] x86/irq/32: Make irq stack a character array Thomas Gleixner
                   ` (7 subsequent siblings)
  28 siblings, 0 replies; 60+ messages in thread
From: Thomas Gleixner @ 2019-04-05 15:07 UTC (permalink / raw)
  To: LKML; +Cc: x86, Andy Lutomirski, Josh Poimboeuf, Sean Christopherson

On 32-bit IRQ_STACK_SIZE is the same as THREAD_SIZE.

To allow sharing struct irq_stack with 32-bit, define IRQ_STACK_SIZE for
32-bit and use it for struct irq_stack.

No functional change.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
---
 arch/x86/include/asm/page_32_types.h |    2 ++
 arch/x86/include/asm/processor.h     |    4 ++--
 2 files changed, 4 insertions(+), 2 deletions(-)

--- a/arch/x86/include/asm/page_32_types.h
+++ b/arch/x86/include/asm/page_32_types.h
@@ -22,6 +22,8 @@
 #define THREAD_SIZE_ORDER	1
 #define THREAD_SIZE		(PAGE_SIZE << THREAD_SIZE_ORDER)
 
+#define IRQ_STACK_SIZE		THREAD_SIZE
+
 #define N_EXCEPTION_STACKS	1
 
 #ifdef CONFIG_X86_PAE
--- a/arch/x86/include/asm/processor.h
+++ b/arch/x86/include/asm/processor.h
@@ -422,8 +422,8 @@ DECLARE_PER_CPU_ALIGNED(struct stack_can
  * per-CPU IRQ handling stacks
  */
 struct irq_stack {
-	u32                     stack[THREAD_SIZE/sizeof(u32)];
-} __aligned(THREAD_SIZE);
+	u32			stack[IRQ_STACK_SIZE / sizeof(u32)];
+} __aligned(IRQ_STACK_SIZE);
 
 DECLARE_PER_CPU(struct irq_stack *, hardirq_stack);
 DECLARE_PER_CPU(struct irq_stack *, softirq_stack);



^ permalink raw reply	[flat|nested] 60+ messages in thread

* [patch V2 22/29] x86/irq/32: Make irq stack a character array
  2019-04-05 15:06 [patch V2 00/29] x86: Add guard pages to exception and interrupt stacks Thomas Gleixner
                   ` (20 preceding siblings ...)
  2019-04-05 15:07 ` [patch V2 21/29] x86/irq/32: Define IRQ_STACK_SIZE Thomas Gleixner
@ 2019-04-05 15:07 ` Thomas Gleixner
  2019-04-05 15:07 ` [patch V2 23/29] x86/irq/32: Rename hard/softirq_stack to hard/softirq_stack_ptr Thomas Gleixner
                   ` (6 subsequent siblings)
  28 siblings, 0 replies; 60+ messages in thread
From: Thomas Gleixner @ 2019-04-05 15:07 UTC (permalink / raw)
  To: LKML; +Cc: x86, Andy Lutomirski, Josh Poimboeuf, Sean Christopherson

There is no reason to have a u32 array in struct irq_stack. The only
purpose of the array is to size the struct properly.

Preparatory change for sharing struct irq_stack with 64-bit.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
---
 arch/x86/include/asm/processor.h |    2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

--- a/arch/x86/include/asm/processor.h
+++ b/arch/x86/include/asm/processor.h
@@ -422,7 +422,7 @@ DECLARE_PER_CPU_ALIGNED(struct stack_can
  * per-CPU IRQ handling stacks
  */
 struct irq_stack {
-	u32			stack[IRQ_STACK_SIZE / sizeof(u32)];
+	char			stack[IRQ_STACK_SIZE];
 } __aligned(IRQ_STACK_SIZE);
 
 DECLARE_PER_CPU(struct irq_stack *, hardirq_stack);



^ permalink raw reply	[flat|nested] 60+ messages in thread

* [patch V2 23/29] x86/irq/32: Rename hard/softirq_stack to hard/softirq_stack_ptr
  2019-04-05 15:06 [patch V2 00/29] x86: Add guard pages to exception and interrupt stacks Thomas Gleixner
                   ` (21 preceding siblings ...)
  2019-04-05 15:07 ` [patch V2 22/29] x86/irq/32: Make irq stack a character array Thomas Gleixner
@ 2019-04-05 15:07 ` Thomas Gleixner
  2019-04-05 15:07 ` [patch V2 24/29] x86/irq/64: Rename irq_stack_ptr to hardirq_stack_ptr Thomas Gleixner
                   ` (5 subsequent siblings)
  28 siblings, 0 replies; 60+ messages in thread
From: Thomas Gleixner @ 2019-04-05 15:07 UTC (permalink / raw)
  To: LKML; +Cc: x86, Andy Lutomirski, Josh Poimboeuf, Sean Christopherson

The percpu storage holds a pointer to the stack, not the stack itself.
Rename it before sharing struct irq_stack with 64-bit.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
---
 arch/x86/include/asm/processor.h |    4 ++--
 arch/x86/kernel/dumpstack_32.c   |    4 ++--
 arch/x86/kernel/irq_32.c         |   19 ++++++++++---------
 3 files changed, 14 insertions(+), 13 deletions(-)

--- a/arch/x86/include/asm/processor.h
+++ b/arch/x86/include/asm/processor.h
@@ -425,8 +425,8 @@ struct irq_stack {
 	char			stack[IRQ_STACK_SIZE];
 } __aligned(IRQ_STACK_SIZE);
 
-DECLARE_PER_CPU(struct irq_stack *, hardirq_stack);
-DECLARE_PER_CPU(struct irq_stack *, softirq_stack);
+DECLARE_PER_CPU(struct irq_stack *, hardirq_stack_ptr);
+DECLARE_PER_CPU(struct irq_stack *, softirq_stack_ptr);
 #endif	/* X86_64 */
 
 extern unsigned int fpu_kernel_xstate_size;
--- a/arch/x86/kernel/dumpstack_32.c
+++ b/arch/x86/kernel/dumpstack_32.c
@@ -34,7 +34,7 @@ const char *stack_type_name(enum stack_t
 
 static bool in_hardirq_stack(unsigned long *stack, struct stack_info *info)
 {
-	unsigned long *begin = (unsigned long *)this_cpu_read(hardirq_stack);
+	unsigned long *begin = (unsigned long *)this_cpu_read(hardirq_stack_ptr);
 	unsigned long *end   = begin + (THREAD_SIZE / sizeof(long));
 
 	/*
@@ -59,7 +59,7 @@ static bool in_hardirq_stack(unsigned lo
 
 static bool in_softirq_stack(unsigned long *stack, struct stack_info *info)
 {
-	unsigned long *begin = (unsigned long *)this_cpu_read(softirq_stack);
+	unsigned long *begin = (unsigned long *)this_cpu_read(softirq_stack_ptr);
 	unsigned long *end   = begin + (THREAD_SIZE / sizeof(long));
 
 	/*
--- a/arch/x86/kernel/irq_32.c
+++ b/arch/x86/kernel/irq_32.c
@@ -51,8 +51,8 @@ static inline int check_stack_overflow(v
 static inline void print_stack_overflow(void) { }
 #endif
 
-DEFINE_PER_CPU(struct irq_stack *, hardirq_stack);
-DEFINE_PER_CPU(struct irq_stack *, softirq_stack);
+DEFINE_PER_CPU(struct irq_stack *, hardirq_stack_ptr);
+DEFINE_PER_CPU(struct irq_stack *, softirq_stack_ptr);
 
 static void call_on_stack(void *func, void *stack)
 {
@@ -76,7 +76,7 @@ static inline int execute_on_irq_stack(i
 	u32 *isp, *prev_esp, arg1;
 
 	curstk = (struct irq_stack *) current_stack();
-	irqstk = __this_cpu_read(hardirq_stack);
+	irqstk = __this_cpu_read(hardirq_stack_ptr);
 
 	/*
 	 * this is where we switch to the IRQ stack. However, if we are
@@ -113,21 +113,22 @@ void irq_ctx_init(int cpu)
 {
 	struct irq_stack *irqstk;
 
-	if (per_cpu(hardirq_stack, cpu))
+	if (per_cpu(hardirq_stack_ptr, cpu))
 		return;
 
 	irqstk = page_address(alloc_pages_node(cpu_to_node(cpu),
 					       THREADINFO_GFP,
 					       THREAD_SIZE_ORDER));
-	per_cpu(hardirq_stack, cpu) = irqstk;
+	per_cpu(hardirq_stack_ptr, cpu) = irqstk;
 
 	irqstk = page_address(alloc_pages_node(cpu_to_node(cpu),
 					       THREADINFO_GFP,
 					       THREAD_SIZE_ORDER));
-	per_cpu(softirq_stack, cpu) = irqstk;
+	per_cpu(softirq_stack_ptr, cpu) = irqstk;
 
-	printk(KERN_DEBUG "CPU %u irqstacks, hard=%p soft=%p\n",
-	       cpu, per_cpu(hardirq_stack, cpu),  per_cpu(softirq_stack, cpu));
+	pr_debug("CPU %u irqstacks, hard=%p soft=%p\n",
+		 cpu, per_cpu(hardirq_stack_ptr, cpu),
+		 per_cpu(softirq_stack_ptr, cpu));
 }
 
 void do_softirq_own_stack(void)
@@ -135,7 +136,7 @@ void do_softirq_own_stack(void)
 	struct irq_stack *irqstk;
 	u32 *isp, *prev_esp;
 
-	irqstk = __this_cpu_read(softirq_stack);
+	irqstk = __this_cpu_read(softirq_stack_ptr);
 
 	/* build the stack frame on the softirq stack */
 	isp = (u32 *) ((char *)irqstk + sizeof(*irqstk));



^ permalink raw reply	[flat|nested] 60+ messages in thread

* [patch V2 24/29] x86/irq/64: Rename irq_stack_ptr to hardirq_stack_ptr
  2019-04-05 15:06 [patch V2 00/29] x86: Add guard pages to exception and interrupt stacks Thomas Gleixner
                   ` (22 preceding siblings ...)
  2019-04-05 15:07 ` [patch V2 23/29] x86/irq/32: Rename hard/softirq_stack to hard/softirq_stack_ptr Thomas Gleixner
@ 2019-04-05 15:07 ` Thomas Gleixner
  2019-04-05 15:07 ` [patch V2 25/29] x86/irq/32: Invoke irq_ctx_init() from init_IRQ() Thomas Gleixner
                   ` (4 subsequent siblings)
  28 siblings, 0 replies; 60+ messages in thread
From: Thomas Gleixner @ 2019-04-05 15:07 UTC (permalink / raw)
  To: LKML; +Cc: x86, Andy Lutomirski, Josh Poimboeuf, Sean Christopherson

Preparatory patch to share code with 32bit.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
---
 arch/x86/entry/entry_64.S        |    2 +-
 arch/x86/include/asm/processor.h |    2 +-
 arch/x86/kernel/cpu/common.c     |    2 +-
 arch/x86/kernel/dumpstack_64.c   |    2 +-
 arch/x86/kernel/irq_64.c         |    2 +-
 arch/x86/kernel/setup_percpu.c   |    2 +-
 6 files changed, 6 insertions(+), 6 deletions(-)

--- a/arch/x86/entry/entry_64.S
+++ b/arch/x86/entry/entry_64.S
@@ -431,7 +431,7 @@ END(irq_entries_start)
 	 */
 
 	movq	\old_rsp, PER_CPU_VAR(irq_stack_union + IRQ_STACK_SIZE - 8)
-	movq	PER_CPU_VAR(irq_stack_ptr), %rsp
+	movq	PER_CPU_VAR(hardirq_stack_ptr), %rsp
 
 #ifdef CONFIG_DEBUG_ENTRY
 	/*
--- a/arch/x86/include/asm/processor.h
+++ b/arch/x86/include/asm/processor.h
@@ -396,7 +396,7 @@ static inline unsigned long cpu_kernelmo
 	return (unsigned long)per_cpu(irq_stack_union.gs_base, cpu);
 }
 
-DECLARE_PER_CPU(char *, irq_stack_ptr);
+DECLARE_PER_CPU(char *, hardirq_stack_ptr);
 DECLARE_PER_CPU(unsigned int, irq_count);
 extern asmlinkage void ignore_sysret(void);
 
--- a/arch/x86/kernel/cpu/common.c
+++ b/arch/x86/kernel/cpu/common.c
@@ -1510,7 +1510,7 @@ DEFINE_PER_CPU(struct task_struct *, cur
 	&init_task;
 EXPORT_PER_CPU_SYMBOL(current_task);
 
-DEFINE_PER_CPU(char *, irq_stack_ptr) =
+DEFINE_PER_CPU(char *, hardirq_stack_ptr) =
 	init_per_cpu_var(irq_stack_union.irq_stack) + IRQ_STACK_SIZE;
 
 DEFINE_PER_CPU(unsigned int, irq_count) __visible = -1;
--- a/arch/x86/kernel/dumpstack_64.c
+++ b/arch/x86/kernel/dumpstack_64.c
@@ -120,7 +120,7 @@ static bool in_exception_stack(unsigned
 
 static bool in_irq_stack(unsigned long *stack, struct stack_info *info)
 {
-	unsigned long *end   = (unsigned long *)this_cpu_read(irq_stack_ptr);
+	unsigned long *end   = (unsigned long *)this_cpu_read(hardirq_stack_ptr);
 	unsigned long *begin = end - (IRQ_STACK_SIZE / sizeof(long));
 
 	/*
--- a/arch/x86/kernel/irq_64.c
+++ b/arch/x86/kernel/irq_64.c
@@ -56,7 +56,7 @@ static inline void stack_overflow_check(
 	    regs->sp <= curbase + THREAD_SIZE)
 		return;
 
-	irq_stack_top = (u64)__this_cpu_read(irq_stack_ptr);
+	irq_stack_top = (u64)__this_cpu_read(hardirq_stack_ptr);
 	irq_stack_bottom = irq_stack_top - IRQ_STACK_SIZE + STACK_MARGIN;
 	if (regs->sp >= irq_stack_bottom && regs->sp <= irq_stack_top)
 		return;
--- a/arch/x86/kernel/setup_percpu.c
+++ b/arch/x86/kernel/setup_percpu.c
@@ -245,7 +245,7 @@ void __init setup_per_cpu_areas(void)
 			early_per_cpu_map(x86_cpu_to_logical_apicid, cpu);
 #endif
 #ifdef CONFIG_X86_64
-		per_cpu(irq_stack_ptr, cpu) =
+		per_cpu(hardirq_stack_ptr, cpu) =
 			per_cpu(irq_stack_union.irq_stack, cpu) +
 			IRQ_STACK_SIZE;
 #endif



^ permalink raw reply	[flat|nested] 60+ messages in thread

* [patch V2 25/29] x86/irq/32: Invoke irq_ctx_init() from init_IRQ()
  2019-04-05 15:06 [patch V2 00/29] x86: Add guard pages to exception and interrupt stacks Thomas Gleixner
                   ` (23 preceding siblings ...)
  2019-04-05 15:07 ` [patch V2 24/29] x86/irq/64: Rename irq_stack_ptr to hardirq_stack_ptr Thomas Gleixner
@ 2019-04-05 15:07 ` Thomas Gleixner
  2019-04-05 15:27   ` Juergen Gross
  2019-04-05 15:07 ` [patch V2 26/29] x86/irq/32: Handle irq stack allocation failure proper Thomas Gleixner
                   ` (3 subsequent siblings)
  28 siblings, 1 reply; 60+ messages in thread
From: Thomas Gleixner @ 2019-04-05 15:07 UTC (permalink / raw)
  To: LKML
  Cc: x86, Andy Lutomirski, Josh Poimboeuf, Sean Christopherson, Juergen Gross

irq_ctx_init() is invoked from native_init_IRQ() or from xen_init_IRQ()
code. There is no reason to have this split. The interrupt stacks must be
allocated no matter what.

Invoke it from init_IRQ() before invoking the native or XEN init
implementation.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Juergen Gross <jgross@suse.com>
---
 arch/x86/kernel/irqinit.c        |    4 ++--
 drivers/xen/events/events_base.c |    1 -
 2 files changed, 2 insertions(+), 3 deletions(-)

--- a/arch/x86/kernel/irqinit.c
+++ b/arch/x86/kernel/irqinit.c
@@ -91,6 +91,8 @@ void __init init_IRQ(void)
 	for (i = 0; i < nr_legacy_irqs(); i++)
 		per_cpu(vector_irq, 0)[ISA_IRQ_VECTOR(i)] = irq_to_desc(i);
 
+	irq_ctx_init(smp_processor_id());
+
 	x86_init.irqs.intr_init();
 }
 
@@ -104,6 +106,4 @@ void __init native_init_IRQ(void)
 
 	if (!acpi_ioapic && !of_ioapic && nr_legacy_irqs())
 		setup_irq(2, &irq2);
-
-	irq_ctx_init(smp_processor_id());
 }
--- a/drivers/xen/events/events_base.c
+++ b/drivers/xen/events/events_base.c
@@ -1687,7 +1687,6 @@ void __init xen_init_IRQ(void)
 
 #ifdef CONFIG_X86
 	if (xen_pv_domain()) {
-		irq_ctx_init(smp_processor_id());
 		if (xen_initial_domain())
 			pci_xen_initial_domain();
 	}



^ permalink raw reply	[flat|nested] 60+ messages in thread

* [patch V2 26/29] x86/irq/32: Handle irq stack allocation failure proper
  2019-04-05 15:06 [patch V2 00/29] x86: Add guard pages to exception and interrupt stacks Thomas Gleixner
                   ` (24 preceding siblings ...)
  2019-04-05 15:07 ` [patch V2 25/29] x86/irq/32: Invoke irq_ctx_init() from init_IRQ() Thomas Gleixner
@ 2019-04-05 15:07 ` Thomas Gleixner
  2019-04-05 15:07 ` [patch V2 27/29] x86/irq/64: Split the IRQ stack into its own pages Thomas Gleixner
                   ` (2 subsequent siblings)
  28 siblings, 0 replies; 60+ messages in thread
From: Thomas Gleixner @ 2019-04-05 15:07 UTC (permalink / raw)
  To: LKML; +Cc: x86, Andy Lutomirski, Josh Poimboeuf, Sean Christopherson

irq_ctx_init() crashes hard on page allocation failures. While that's ok
during early boot, it's just wrong in the CPU hotplug bringup code.

Check for page allocation failures, return -ENOMEM and handle it at the
call sites. During early boot the only way out is to BUG(), but on CPU
hotplug there is no reason to crash, so just abort the operation.

Rename the function to something more sensible while at it.
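
A userspace sketch of the error handling shape, with malloc() standing in
for the page allocator: if the second allocation fails, the first one is
released again and -ENOMEM is propagated instead of crashing.

  #include <stdlib.h>
  #include <errno.h>

  static void *hard_stack, *soft_stack;

  static int init_percpu_irqstack(size_t size)
  {
          void *ph, *ps;

          if (hard_stack)                 /* already initialized */
                  return 0;

          ph = malloc(size);
          if (!ph)
                  return -ENOMEM;
          ps = malloc(size);
          if (!ps) {
                  free(ph);               /* undo the first allocation */
                  return -ENOMEM;
          }
          hard_stack = ph;
          soft_stack = ps;
          return 0;
  }

  int main(void)
  {
          /* Early boot would BUG() here; hotplug just aborts the bringup. */
          return init_percpu_irqstack(16 * 1024) ? 1 : 0;
  }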

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
---
 arch/x86/include/asm/irq.h |    4 ++--
 arch/x86/include/asm/smp.h |    2 +-
 arch/x86/kernel/irq_32.c   |   34 +++++++++++++++++-----------------
 arch/x86/kernel/irqinit.c  |    2 +-
 arch/x86/kernel/smpboot.c  |   15 ++++++++++++---
 arch/x86/xen/smp_pv.c      |    4 +++-
 6 files changed, 36 insertions(+), 25 deletions(-)

--- a/arch/x86/include/asm/irq.h
+++ b/arch/x86/include/asm/irq.h
@@ -17,9 +17,9 @@ static inline int irq_canonicalize(int i
 }
 
 #ifdef CONFIG_X86_32
-extern void irq_ctx_init(int cpu);
+extern int irq_init_percpu_irqstack(unsigned int cpu);
 #else
-# define irq_ctx_init(cpu) do { } while (0)
+static inline int irq_init_percpu_irqstack(unsigned int cpu) { return 0; }
 #endif
 
 #define __ARCH_HAS_DO_SOFTIRQ
--- a/arch/x86/include/asm/smp.h
+++ b/arch/x86/include/asm/smp.h
@@ -131,7 +131,7 @@ void native_smp_prepare_boot_cpu(void);
 void native_smp_prepare_cpus(unsigned int max_cpus);
 void calculate_max_logical_packages(void);
 void native_smp_cpus_done(unsigned int max_cpus);
-void common_cpu_up(unsigned int cpunum, struct task_struct *tidle);
+int common_cpu_up(unsigned int cpunum, struct task_struct *tidle);
 int native_cpu_up(unsigned int cpunum, struct task_struct *tidle);
 int native_cpu_disable(void);
 int common_cpu_die(unsigned int cpu);
--- a/arch/x86/kernel/irq_32.c
+++ b/arch/x86/kernel/irq_32.c
@@ -107,28 +107,28 @@ static inline int execute_on_irq_stack(i
 }
 
 /*
- * allocate per-cpu stacks for hardirq and for softirq processing
+ * Allocate per-cpu stacks for hardirq and softirq processing
  */
-void irq_ctx_init(int cpu)
+int irq_init_percpu_irqstack(unsigned int cpu)
 {
-	struct irq_stack *irqstk;
+	int node = cpu_to_node(cpu);
+	struct page *ph, *ps;
 
 	if (per_cpu(hardirq_stack_ptr, cpu))
-		return;
+		return 0;
 
-	irqstk = page_address(alloc_pages_node(cpu_to_node(cpu),
-					       THREADINFO_GFP,
-					       THREAD_SIZE_ORDER));
-	per_cpu(hardirq_stack_ptr, cpu) = irqstk;
-
-	irqstk = page_address(alloc_pages_node(cpu_to_node(cpu),
-					       THREADINFO_GFP,
-					       THREAD_SIZE_ORDER));
-	per_cpu(softirq_stack_ptr, cpu) = irqstk;
-
-	pr_debug("CPU %u irqstacks, hard=%p soft=%p\n",
-		 cpu, per_cpu(hardirq_stack_ptr, cpu),
-		 per_cpu(softirq_stack_ptr, cpu));
+	ph = alloc_pages_node(node, THREADINFO_GFP, THREAD_SIZE_ORDER);
+	if (!ph)
+		return -ENOMEM;
+	ps = alloc_pages_node(node, THREADINFO_GFP, THREAD_SIZE_ORDER);
+	if (!ps) {
+		__free_pages(ph, THREAD_SIZE_ORDER);
+		return -ENOMEM;
+	}
+
+	per_cpu(hardirq_stack_ptr, cpu) = page_address(ph);
+	per_cpu(softirq_stack_ptr, cpu) = page_address(ps);
+	return 0;
 }
 
 void do_softirq_own_stack(void)
--- a/arch/x86/kernel/irqinit.c
+++ b/arch/x86/kernel/irqinit.c
@@ -91,7 +91,7 @@ void __init init_IRQ(void)
 	for (i = 0; i < nr_legacy_irqs(); i++)
 		per_cpu(vector_irq, 0)[ISA_IRQ_VECTOR(i)] = irq_to_desc(i);
 
-	irq_ctx_init(smp_processor_id());
+	BUG_ON(irq_init_percpu_irqstack(smp_processor_id()));
 
 	x86_init.irqs.intr_init();
 }
--- a/arch/x86/kernel/smpboot.c
+++ b/arch/x86/kernel/smpboot.c
@@ -935,20 +935,27 @@ wakeup_cpu_via_init_nmi(int cpu, unsigne
 	return boot_error;
 }
 
-void common_cpu_up(unsigned int cpu, struct task_struct *idle)
+int common_cpu_up(unsigned int cpu, struct task_struct *idle)
 {
+	int ret;
+
 	/* Just in case we booted with a single CPU. */
 	alternatives_enable_smp();
 
 	per_cpu(current_task, cpu) = idle;
 
+	/* Initialize the interrupt stack(s) */
+	ret = irq_init_percpu_irqstack(cpu);
+	if (ret)
+		return ret;
+
 #ifdef CONFIG_X86_32
 	/* Stack for startup_32 can be just as for start_secondary onwards */
-	irq_ctx_init(cpu);
 	per_cpu(cpu_current_top_of_stack, cpu) = task_top_of_stack(idle);
 #else
 	initial_gs = per_cpu_offset(cpu);
 #endif
+	return 0;
 }
 
 /*
@@ -1106,7 +1113,9 @@ int native_cpu_up(unsigned int cpu, stru
 	/* the FPU context is blank, nobody can own it */
 	per_cpu(fpu_fpregs_owner_ctx, cpu) = NULL;
 
-	common_cpu_up(cpu, tidle);
+	err = common_cpu_up(cpu, tidle);
+	if (err)
+		return err;
 
 	err = do_boot_cpu(apicid, cpu, tidle, &cpu0_nmi_registered);
 	if (err) {
--- a/arch/x86/xen/smp_pv.c
+++ b/arch/x86/xen/smp_pv.c
@@ -361,7 +361,9 @@ static int xen_pv_cpu_up(unsigned int cp
 {
 	int rc;
 
-	common_cpu_up(cpu, idle);
+	rc = common_cpu_up(cpu, idle);
+	if (rc)
+		return rc;
 
 	xen_setup_runstate_info(cpu);
 



^ permalink raw reply	[flat|nested] 60+ messages in thread

* [patch V2 27/29] x86/irq/64: Split the IRQ stack into its own pages
  2019-04-05 15:06 [patch V2 00/29] x86: Add guard pages to exception and interrupt stacks Thomas Gleixner
                   ` (25 preceding siblings ...)
  2019-04-05 15:07 ` [patch V2 26/29] x86/irq/32: Handle irq stack allocation failure proper Thomas Gleixner
@ 2019-04-05 15:07 ` Thomas Gleixner
  2019-04-05 15:07 ` [patch V2 28/29] x86/irq/64: Remap the IRQ stack with guard pages Thomas Gleixner
  2019-04-05 15:07 ` [patch V2 29/29] x86/irq/64: Remove stack overflow debug code Thomas Gleixner
  28 siblings, 0 replies; 60+ messages in thread
From: Thomas Gleixner @ 2019-04-05 15:07 UTC (permalink / raw)
  To: LKML; +Cc: x86, Andy Lutomirski, Josh Poimboeuf, Sean Christopherson

From: Andy Lutomirski <luto@kernel.org>

Currently the IRQ stack is hardcoded as the first page of the percpu area,
and the stack canary lives on the IRQ stack.  The former gets in the way of
adding an IRQ stack guard page, and the latter is a potential weakness in
the stack canary mechanism.

Split the IRQ stack into its own private percpu pages.

[ tglx: Make 64 and 32 bit share struct irq_stack ]
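
A userspace sketch (C11) of the layout constraint the split has to
preserve: GCC reads the stack protector canary from %gs:40, so whatever
object stays pinned at the start of the per-cpu area must keep the canary
at offset 40. With the IRQ stack moved out, only this small struct remains
there.

  #include <stddef.h>
  #include <assert.h>

  struct fixed_percpu_data {
          char            gs_base[40];    /* %gs points here (offset 0) */
          unsigned long   stack_canary;   /* must end up at %gs:40      */
  };

  static_assert(offsetof(struct fixed_percpu_data, stack_canary) == 40,
                "stack canary must stay at offset 40");

  int main(void)
  {
          return 0;
  }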

Signed-off-by: Andy Lutomirski <luto@kernel.org>
---
 arch/x86/entry/entry_64.S             |    4 +--
 arch/x86/include/asm/processor.h      |   35 ++++++++++++++++------------------
 arch/x86/include/asm/stackprotector.h |    6 ++---
 arch/x86/kernel/asm-offsets_64.c      |    2 -
 arch/x86/kernel/cpu/common.c          |   10 +++------
 arch/x86/kernel/head_64.S             |    2 -
 arch/x86/kernel/irq_64.c              |    2 +
 arch/x86/kernel/setup_percpu.c        |    5 +---
 arch/x86/kernel/vmlinux.lds.S         |    7 +++---
 arch/x86/tools/relocs.c               |    2 -
 arch/x86/xen/xen-head.S               |   10 ++++-----
 11 files changed, 42 insertions(+), 43 deletions(-)

--- a/arch/x86/entry/entry_64.S
+++ b/arch/x86/entry/entry_64.S
@@ -298,7 +298,7 @@ ENTRY(__switch_to_asm)
 
 #ifdef CONFIG_STACKPROTECTOR
 	movq	TASK_stack_canary(%rsi), %rbx
-	movq	%rbx, PER_CPU_VAR(irq_stack_union)+stack_canary_offset
+	movq	%rbx, PER_CPU_VAR(fixed_percpu_data) + stack_canary_offset
 #endif
 
 #ifdef CONFIG_RETPOLINE
@@ -430,7 +430,7 @@ END(irq_entries_start)
 	 * it before we actually move ourselves to the IRQ stack.
 	 */
 
-	movq	\old_rsp, PER_CPU_VAR(irq_stack_union + IRQ_STACK_SIZE - 8)
+	movq	\old_rsp, PER_CPU_VAR(irq_stack_backing_store + IRQ_STACK_SIZE - 8)
 	movq	PER_CPU_VAR(hardirq_stack_ptr), %rsp
 
 #ifdef CONFIG_DEBUG_ENTRY
--- a/arch/x86/include/asm/processor.h
+++ b/arch/x86/include/asm/processor.h
@@ -367,6 +367,13 @@ DECLARE_PER_CPU_PAGE_ALIGNED(struct tss_
 #define __KERNEL_TSS_LIMIT	\
 	(IO_BITMAP_OFFSET + IO_BITMAP_BYTES + sizeof(unsigned long) - 1)
 
+/* Per CPU interrupt stacks */
+struct irq_stack {
+	char		stack[IRQ_STACK_SIZE];
+} __aligned(IRQ_STACK_SIZE);
+
+DECLARE_PER_CPU(struct irq_stack *, hardirq_stack_ptr);
+
 #ifdef CONFIG_X86_32
 DECLARE_PER_CPU(unsigned long, cpu_current_top_of_stack);
 #else
@@ -375,28 +382,27 @@ DECLARE_PER_CPU(unsigned long, cpu_curre
 #endif
 
 #ifdef CONFIG_X86_64
-union irq_stack_union {
-	char irq_stack[IRQ_STACK_SIZE];
+struct fixed_percpu_data {
 	/*
 	 * GCC hardcodes the stack canary as %gs:40.  Since the
 	 * irq_stack is the object at %gs:0, we reserve the bottom
 	 * 48 bytes of the irq stack for the canary.
 	 */
-	struct {
-		char gs_base[40];
-		unsigned long stack_canary;
-	};
+	char		gs_base[40];
+	unsigned long	stack_canary;
 };
 
-DECLARE_PER_CPU_FIRST(union irq_stack_union, irq_stack_union) __visible;
-DECLARE_INIT_PER_CPU(irq_stack_union);
+DECLARE_PER_CPU_FIRST(struct fixed_percpu_data, fixed_percpu_data) __visible;
+DECLARE_INIT_PER_CPU(fixed_percpu_data);
+
+DECLARE_PER_CPU_PAGE_ALIGNED(struct irq_stack, irq_stack_backing_store);
+DECLARE_INIT_PER_CPU(irq_stack_backing_store);
 
 static inline unsigned long cpu_kernelmode_gs_base(int cpu)
 {
-	return (unsigned long)per_cpu(irq_stack_union.gs_base, cpu);
+	return (unsigned long)per_cpu(fixed_percpu_data.gs_base, cpu);
 }
 
-DECLARE_PER_CPU(char *, hardirq_stack_ptr);
 DECLARE_PER_CPU(unsigned int, irq_count);
 extern asmlinkage void ignore_sysret(void);
 
@@ -418,14 +424,7 @@ struct stack_canary {
 };
 DECLARE_PER_CPU_ALIGNED(struct stack_canary, stack_canary);
 #endif
-/*
- * per-CPU IRQ handling stacks
- */
-struct irq_stack {
-	char			stack[IRQ_STACK_SIZE];
-} __aligned(IRQ_STACK_SIZE);
-
-DECLARE_PER_CPU(struct irq_stack *, hardirq_stack_ptr);
+/* Per CPU softirq stack pointer */
 DECLARE_PER_CPU(struct irq_stack *, softirq_stack_ptr);
 #endif	/* X86_64 */
 
--- a/arch/x86/include/asm/stackprotector.h
+++ b/arch/x86/include/asm/stackprotector.h
@@ -13,7 +13,7 @@
  * On x86_64, %gs is shared by percpu area and stack canary.  All
  * percpu symbols are zero based and %gs points to the base of percpu
  * area.  The first occupant of the percpu area is always
- * irq_stack_union which contains stack_canary at offset 40.  Userland
+ * fixed_percpu_data which contains stack_canary at offset 40.  Userland
  * %gs is always saved and restored on kernel entry and exit using
  * swapgs, so stack protector doesn't add any complexity there.
  *
@@ -64,7 +64,7 @@ static __always_inline void boot_init_st
 	u64 tsc;
 
 #ifdef CONFIG_X86_64
-	BUILD_BUG_ON(offsetof(union irq_stack_union, stack_canary) != 40);
+	BUILD_BUG_ON(offsetof(struct fixed_percpu_data, stack_canary) != 40);
 #endif
 	/*
 	 * We both use the random pool and the current TSC as a source
@@ -79,7 +79,7 @@ static __always_inline void boot_init_st
 
 	current->stack_canary = canary;
 #ifdef CONFIG_X86_64
-	this_cpu_write(irq_stack_union.stack_canary, canary);
+	this_cpu_write(fixed_percpu_data.stack_canary, canary);
 #else
 	this_cpu_write(stack_canary.canary, canary);
 #endif
--- a/arch/x86/kernel/asm-offsets_64.c
+++ b/arch/x86/kernel/asm-offsets_64.c
@@ -73,7 +73,7 @@ int main(void)
 	BLANK();
 
 #ifdef CONFIG_STACKPROTECTOR
-	DEFINE(stack_canary_offset, offsetof(union irq_stack_union, stack_canary));
+	DEFINE(stack_canary_offset, offsetof(struct fixed_percpu_data, stack_canary));
 	BLANK();
 #endif
 
--- a/arch/x86/kernel/cpu/common.c
+++ b/arch/x86/kernel/cpu/common.c
@@ -1498,9 +1498,9 @@ static __init int setup_clearcpuid(char
 __setup("clearcpuid=", setup_clearcpuid);
 
 #ifdef CONFIG_X86_64
-DEFINE_PER_CPU_FIRST(union irq_stack_union,
-		     irq_stack_union) __aligned(PAGE_SIZE) __visible;
-EXPORT_PER_CPU_SYMBOL_GPL(irq_stack_union);
+DEFINE_PER_CPU_FIRST(struct fixed_percpu_data,
+		     fixed_percpu_data) __aligned(PAGE_SIZE) __visible;
+EXPORT_PER_CPU_SYMBOL_GPL(fixed_percpu_data);
 
 /*
  * The following percpu variables are hot.  Align current_task to
@@ -1510,9 +1510,7 @@ DEFINE_PER_CPU(struct task_struct *, cur
 	&init_task;
 EXPORT_PER_CPU_SYMBOL(current_task);
 
-DEFINE_PER_CPU(char *, hardirq_stack_ptr) =
-	init_per_cpu_var(irq_stack_union.irq_stack) + IRQ_STACK_SIZE;
-
+DEFINE_PER_CPU(struct irq_stack *, hardirq_stack_ptr);
 DEFINE_PER_CPU(unsigned int, irq_count) __visible = -1;
 
 DEFINE_PER_CPU(int, __preempt_count) = INIT_PREEMPT_COUNT;
--- a/arch/x86/kernel/head_64.S
+++ b/arch/x86/kernel/head_64.S
@@ -265,7 +265,7 @@ ENDPROC(start_cpu0)
 	GLOBAL(initial_code)
 	.quad	x86_64_start_kernel
 	GLOBAL(initial_gs)
-	.quad	INIT_PER_CPU_VAR(irq_stack_union)
+	.quad	INIT_PER_CPU_VAR(fixed_percpu_data)
 	GLOBAL(initial_stack)
 	/*
 	 * The SIZEOF_PTREGS gap is a convention which helps the in-kernel
--- a/arch/x86/kernel/irq_64.c
+++ b/arch/x86/kernel/irq_64.c
@@ -23,6 +23,8 @@
 #include <asm/io_apic.h>
 #include <asm/apic.h>
 
+DEFINE_PER_CPU_PAGE_ALIGNED(struct irq_stack, irq_stack_backing_store) __visible;
+
 int sysctl_panic_on_stackoverflow;
 
 /*
--- a/arch/x86/kernel/setup_percpu.c
+++ b/arch/x86/kernel/setup_percpu.c
@@ -245,9 +245,8 @@ void __init setup_per_cpu_areas(void)
 			early_per_cpu_map(x86_cpu_to_logical_apicid, cpu);
 #endif
 #ifdef CONFIG_X86_64
-		per_cpu(hardirq_stack_ptr, cpu) =
-			per_cpu(irq_stack_union.irq_stack, cpu) +
-			IRQ_STACK_SIZE;
+		per_cpu(hardirq_stack_ptr, cpu) = (struct irq_stack *)
+			per_cpu_ptr(&irq_stack_backing_store, cpu) + 1;
 #endif
 #ifdef CONFIG_NUMA
 		per_cpu(x86_cpu_to_node_map, cpu) =
--- a/arch/x86/kernel/vmlinux.lds.S
+++ b/arch/x86/kernel/vmlinux.lds.S
@@ -403,7 +403,8 @@ SECTIONS
  */
 #define INIT_PER_CPU(x) init_per_cpu__##x = ABSOLUTE(x) + __per_cpu_load
 INIT_PER_CPU(gdt_page);
-INIT_PER_CPU(irq_stack_union);
+INIT_PER_CPU(fixed_percpu_data);
+INIT_PER_CPU(irq_stack_backing_store);
 
 /*
  * Build-time check on the image size:
@@ -412,8 +413,8 @@ INIT_PER_CPU(irq_stack_union);
 	   "kernel image bigger than KERNEL_IMAGE_SIZE");
 
 #ifdef CONFIG_SMP
-. = ASSERT((irq_stack_union == 0),
-           "irq_stack_union is not at start of per-cpu area");
+. = ASSERT((fixed_percpu_data == 0),
+           "fixed_percpu_data is not at start of per-cpu area");
 #endif
 
 #endif /* CONFIG_X86_32 */
--- a/arch/x86/tools/relocs.c
+++ b/arch/x86/tools/relocs.c
@@ -738,7 +738,7 @@ static void percpu_init(void)
  *	__per_cpu_load
  *
  * The "gold" linker incorrectly associates:
- *	init_per_cpu__irq_stack_union
+ *	init_per_cpu__fixed_percpu_data
  *	init_per_cpu__gdt_page
  */
 static int is_percpu_sym(ElfW(Sym) *sym, const char *symname)
--- a/arch/x86/xen/xen-head.S
+++ b/arch/x86/xen/xen-head.S
@@ -40,13 +40,13 @@ ENTRY(startup_xen)
 #ifdef CONFIG_X86_64
 	/* Set up %gs.
 	 *
-	 * The base of %gs always points to the bottom of the irqstack
-	 * union.  If the stack protector canary is enabled, it is
-	 * located at %gs:40.  Note that, on SMP, the boot cpu uses
-	 * init data section till per cpu areas are set up.
+	 * The base of %gs always points to fixed_percpu_data.  If the
+	 * stack protector canary is enabled, it is located at %gs:40.
+	 * Note that, on SMP, the boot cpu uses init data section until
+	 * the per cpu areas are set up.
 	 */
 	movl	$MSR_GS_BASE,%ecx
-	movq	$INIT_PER_CPU_VAR(irq_stack_union),%rax
+	movq	$INIT_PER_CPU_VAR(fixed_percpu_data),%rax
 	cdq
 	wrmsr
 #endif



^ permalink raw reply	[flat|nested] 60+ messages in thread

* [patch V2 28/29] x86/irq/64: Remap the IRQ stack with guard pages
  2019-04-05 15:06 [patch V2 00/29] x86: Add guard pages to exception and interrupt stacks Thomas Gleixner
                   ` (26 preceding siblings ...)
  2019-04-05 15:07 ` [patch V2 27/29] x86/irq/64: Split the IRQ stack into its own pages Thomas Gleixner
@ 2019-04-05 15:07 ` Thomas Gleixner
  2019-04-07  4:56   ` Andy Lutomirski
  2019-04-05 15:07 ` [patch V2 29/29] x86/irq/64: Remove stack overflow debug code Thomas Gleixner
  28 siblings, 1 reply; 60+ messages in thread
From: Thomas Gleixner @ 2019-04-05 15:07 UTC (permalink / raw)
  To: LKML; +Cc: x86, Andy Lutomirski, Josh Poimboeuf, Sean Christopherson

From: Andy Lutomirski <luto@kernel.org>

The IRQ stack lives in percpu space, so an IRQ handler that overflows it
will overwrite other data structures.

Use vmap() to remap the IRQ stack so that it will have the usual guard
pages that vmap/vmalloc allocations have. With this the kernel will panic
immediately on an IRQ stack overflow.

[ tglx: Move the map code to a proper place and invoke it only when a CPU
  	is about to be brought online. No point in installing the map at
  	early boot for all possible CPUs. Fail the CPU bringup if the vmap
  	fails as done for all other preparatory stages in cpu hotplug. ]
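
A userspace analogue of the idea, using mmap()/mprotect() rather than the
kernel's vmap() path and purely for illustration: an inaccessible page
directly below the stack turns an overflow into an immediate fault instead
of silent corruption of adjacent data.

  #include <stdio.h>
  #include <sys/mman.h>

  #define STACK_SZ (16 * 1024)
  #define GUARD_SZ 4096

  int main(void)
  {
          /* Reserve the stack plus one extra page at the low end. */
          char *area = mmap(NULL, STACK_SZ + GUARD_SZ,
                            PROT_READ | PROT_WRITE,
                            MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
          if (area == MAP_FAILED)
                  return 1;

          /* Revoke access to the lowest page: this is the guard. */
          mprotect(area, GUARD_SZ, PROT_NONE);

          printf("stack %p..%p, guard page below %p\n",
                 (void *)(area + GUARD_SZ),
                 (void *)(area + GUARD_SZ + STACK_SZ),
                 (void *)(area + GUARD_SZ));

          /* A write below area + GUARD_SZ now faults immediately. */
          munmap(area, STACK_SZ + GUARD_SZ);
          return 0;
  }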

Signed-off-by: Andy Lutomirski <luto@kernel.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
---
 arch/x86/include/asm/irq.h     |    4 ---
 arch/x86/kernel/irq_64.c       |   45 +++++++++++++++++++++++++++++++++++++++++
 arch/x86/kernel/setup_percpu.c |    4 ---
 3 files changed, 45 insertions(+), 8 deletions(-)

--- a/arch/x86/include/asm/irq.h
+++ b/arch/x86/include/asm/irq.h
@@ -16,11 +16,7 @@ static inline int irq_canonicalize(int i
 	return ((irq == 2) ? 9 : irq);
 }
 
-#ifdef CONFIG_X86_32
 extern int irq_init_percpu_irqstack(unsigned int cpu);
-#else
-static inline int irq_init_percpu_irqstack(unsigned int cpu) { return 0; }
-#endif
 
 #define __ARCH_HAS_DO_SOFTIRQ
 
--- a/arch/x86/kernel/irq_64.c
+++ b/arch/x86/kernel/irq_64.c
@@ -89,3 +89,48 @@ bool handle_irq(struct irq_desc *desc, s
 	generic_handle_irq_desc(desc);
 	return true;
 }
+
+#ifdef CONFIG_VMAP_STACK
+/*
+ * VMAP the backing store with guard pages
+ */
+static int map_irq_stack(unsigned int cpu)
+{
+	char *stack = (char *)per_cpu_ptr(&irq_stack_backing_store, cpu);
+	struct page *pages[IRQ_STACK_SIZE / PAGE_SIZE];
+	void *va;
+	int i;
+
+	for (i = 0; i < IRQ_STACK_SIZE / PAGE_SIZE; i++) {
+		phys_addr_t pa = per_cpu_ptr_to_phys(stack + (i << PAGE_SHIFT));
+
+		pages[i] = pfn_to_page(pa >> PAGE_SHIFT);
+	}
+
+	va = vmap(pages, IRQ_STACK_SIZE / PAGE_SIZE, GFP_KERNEL, PAGE_KERNEL);
+	if (!va)
+		return -ENOMEM;
+
+	per_cpu(hardirq_stack_ptr, cpu) = va + IRQ_STACK_SIZE;
+	return 0;
+}
+#else
+/*
+ * If VMAP stacks are disabled due to KASAN, just use the per cpu
+ * backing store without guard pages.
+ */
+static int map_irq_stack(unsigned int cpu)
+{
+	void *va = per_cpu_ptr(&irq_stack_backing_store, cpu);
+
+	per_cpu(hardirq_stack_ptr, cpu) = va + IRQ_STACK_SIZE;
+	return 0;
+}
+#endif
+
+int irq_init_percpu_irqstack(unsigned int cpu)
+{
+	if (per_cpu(hardirq_stack_ptr, cpu))
+		return 0;
+	return map_irq_stack(cpu);
+}
--- a/arch/x86/kernel/setup_percpu.c
+++ b/arch/x86/kernel/setup_percpu.c
@@ -244,10 +244,6 @@ void __init setup_per_cpu_areas(void)
 		per_cpu(x86_cpu_to_logical_apicid, cpu) =
 			early_per_cpu_map(x86_cpu_to_logical_apicid, cpu);
 #endif
-#ifdef CONFIG_X86_64
-		per_cpu(hardirq_stack_ptr, cpu) = (struct irq_stack *)
-			per_cpu_ptr(&irq_stack_backing_store, cpu) + 1;
-#endif
 #ifdef CONFIG_NUMA
 		per_cpu(x86_cpu_to_node_map, cpu) =
 			early_per_cpu_map(x86_cpu_to_node_map, cpu);



^ permalink raw reply	[flat|nested] 60+ messages in thread

* [patch V2 29/29] x86/irq/64: Remove stack overflow debug code
  2019-04-05 15:06 [patch V2 00/29] x86: Add guard pages to exception and interrupt stacks Thomas Gleixner
                   ` (27 preceding siblings ...)
  2019-04-05 15:07 ` [patch V2 28/29] x86/irq/64: Remap the IRQ stack with guard pages Thomas Gleixner
@ 2019-04-05 15:07 ` Thomas Gleixner
  28 siblings, 0 replies; 60+ messages in thread
From: Thomas Gleixner @ 2019-04-05 15:07 UTC (permalink / raw)
  To: LKML; +Cc: x86, Andy Lutomirski, Josh Poimboeuf, Sean Christopherson

All stack types on x86 64-bit have guard pages now.

So there is no point in executing probabilistic overflow checks as the
guard pages provide accurate and reliable overflow prevention.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
---
 arch/x86/Kconfig         |    2 -
 arch/x86/kernel/irq_64.c |   56 -----------------------------------------------
 2 files changed, 1 insertion(+), 57 deletions(-)

--- a/arch/x86/Kconfig
+++ b/arch/x86/Kconfig
@@ -14,6 +14,7 @@ config X86_32
 	select ARCH_WANT_IPC_PARSE_VERSION
 	select CLKSRC_I8253
 	select CLONE_BACKWARDS
+	select HAVE_DEBUG_STACKOVERFLOW
 	select MODULES_USE_ELF_REL
 	select OLD_SIGACTION
 
@@ -138,7 +139,6 @@ config X86
 	select HAVE_COPY_THREAD_TLS
 	select HAVE_C_RECORDMCOUNT
 	select HAVE_DEBUG_KMEMLEAK
-	select HAVE_DEBUG_STACKOVERFLOW
 	select HAVE_DMA_CONTIGUOUS
 	select HAVE_DYNAMIC_FTRACE
 	select HAVE_DYNAMIC_FTRACE_WITH_REGS
--- a/arch/x86/kernel/irq_64.c
+++ b/arch/x86/kernel/irq_64.c
@@ -25,64 +25,8 @@
 
 DEFINE_PER_CPU_PAGE_ALIGNED(struct irq_stack, irq_stack_backing_store) __visible;
 
-int sysctl_panic_on_stackoverflow;
-
-/*
- * Probabilistic stack overflow check:
- *
- * Regular device interrupts can enter on the following stacks:
- *
- * - User stack
- *
- * - Kernel task stack
- *
- * - Interrupt stack if a device driver reenables interrupts
- *   which should only happen in really old drivers.
- *
- * - Debug IST stack
- *
- * All other contexts are invalid.
- */
-static inline void stack_overflow_check(struct pt_regs *regs)
-{
-#ifdef CONFIG_DEBUG_STACKOVERFLOW
-#define STACK_MARGIN	128
-	u64 irq_stack_top, irq_stack_bottom, estack_top, estack_bottom;
-	u64 curbase = (u64)task_stack_page(current);
-	struct cea_exception_stacks *estacks;
-
-	if (user_mode(regs))
-		return;
-
-	if (regs->sp >= curbase + sizeof(struct pt_regs) + STACK_MARGIN &&
-	    regs->sp <= curbase + THREAD_SIZE)
-		return;
-
-	irq_stack_top = (u64)__this_cpu_read(hardirq_stack_ptr);
-	irq_stack_bottom = irq_stack_top - IRQ_STACK_SIZE + STACK_MARGIN;
-	if (regs->sp >= irq_stack_bottom && regs->sp <= irq_stack_top)
-		return;
-
-	estacks = __this_cpu_read(cea_exception_stacks);
-	estack_top = CEA_ESTACK_TOP(estacks, DB);
-	estack_bottom = CEA_ESTACK_BOT(estacks, DB) + STACK_MARGIN;
-	if (regs->sp >= estack_bottom && regs->sp <= estack_top)
-		return;
-
-	WARN_ONCE(1, "do_IRQ(): %s has overflown the kernel stack (cur:%Lx,sp:%lx, irq stack:%Lx-%Lx, exception stack: %Lx-%Lx, ip:%pF)\n",
-		current->comm, curbase, regs->sp,
-		irq_stack_bottom, irq_stack_top,
-		estack_bottom, estack_top, (void *)regs->ip);
-
-	if (sysctl_panic_on_stackoverflow)
-		panic("low stack detected by irq handler - check messages\n");
-#endif
-}
-
 bool handle_irq(struct irq_desc *desc, struct pt_regs *regs)
 {
-	stack_overflow_check(regs);
-
 	if (IS_ERR_OR_NULL(desc))
 		return false;
 



^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [patch V2 25/29] x86/irq/32: Invoke irq_ctx_init() from init_IRQ()
  2019-04-05 15:07 ` [patch V2 25/29] x86/irq/32: Invoke irq_ctx_init() from init_IRQ() Thomas Gleixner
@ 2019-04-05 15:27   ` Juergen Gross
  0 siblings, 0 replies; 60+ messages in thread
From: Juergen Gross @ 2019-04-05 15:27 UTC (permalink / raw)
  To: Thomas Gleixner, LKML
  Cc: x86, Andy Lutomirski, Josh Poimboeuf, Sean Christopherson

On 05/04/2019 17:07, Thomas Gleixner wrote:
> irq_ctx_init() is invoked from native_init_IRQ() or from xen_init_IRQ()
> code. There is no reason to have this split. The interrupt stacks must be
> allocated no matter what.
> 
> Invoke it from init_IRQ() before invoking the native or XEN init
> implementation.
> 
> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
> Cc: Juergen Gross <jgross@suse.com>

Reviewed-by: Juergen Gross <jgross@suse.com>


Juergen

^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [patch V2 02/29] x86/dumpstack: Fix off-by-one errors in stack identification
  2019-04-05 15:07 ` [patch V2 02/29] x86/dumpstack: Fix off-by-one errors in stack identification Thomas Gleixner
@ 2019-04-05 15:44   ` Sean Christopherson
  0 siblings, 0 replies; 60+ messages in thread
From: Sean Christopherson @ 2019-04-05 15:44 UTC (permalink / raw)
  To: Thomas Gleixner; +Cc: LKML, x86, Andy Lutomirski, Josh Poimboeuf

On Fri, Apr 05, 2019 at 05:07:00PM +0200, Thomas Gleixner wrote:
> From: Andy Lutomirski <luto@kernel.org>
> 
> The get_stack_info() function is off-by-one when checking whether an
> address is on an IRQ stack or an IST stack.  This prevents an overflowed IRQ
> or IST stack from being dumped properly.
> 
> [ tglx: Do the same for 32-bit ]
> 
> Signed-off-by: Andy Lutomirski <luto@kernel.org>
> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
> Cc: Josh Poimboeuf <jpoimboe@redhat.com>
> 
> ---

Reviewed-by: Sean Christopherson <sean.j.christopherson@intel.com>

^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [patch V2 03/29] x86/irq/64: Remove a hardcoded irq_stack_union access
  2019-04-05 15:07 ` [patch V2 03/29] x86/irq/64: Remove a hardcoded irq_stack_union access Thomas Gleixner
@ 2019-04-05 16:37   ` Sean Christopherson
  2019-04-05 16:38     ` Sean Christopherson
  2019-04-05 17:18     ` Josh Poimboeuf
  0 siblings, 2 replies; 60+ messages in thread
From: Sean Christopherson @ 2019-04-05 16:37 UTC (permalink / raw)
  To: Thomas Gleixner; +Cc: LKML, x86, Andy Lutomirski, Josh Poimboeuf

On Fri, Apr 05, 2019 at 05:07:01PM +0200, Thomas Gleixner wrote:
> From: Andy Lutomirski <luto@kernel.org>
> 
> stack_overflow_check() is using both irq_stack_ptr and irq_stack_union to
> find the IRQ stack. That's going to break when vmapped irq stacks are
> introduced.
> 
> Change it to just use irq_stack_ptr.
> 
> Signed-off-by: Andy Lutomirski <luto@kernel.org>
> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
> 
> ---
>  arch/x86/kernel/irq_64.c |    3 +--
>  1 file changed, 1 insertion(+), 2 deletions(-)
> 
> --- a/arch/x86/kernel/irq_64.c
> +++ b/arch/x86/kernel/irq_64.c
> @@ -55,9 +55,8 @@ static inline void stack_overflow_check(
>  	    regs->sp <= curbase + THREAD_SIZE)
>  		return;
>  
> -	irq_stack_top = (u64)this_cpu_ptr(irq_stack_union.irq_stack) +
> -			STACK_TOP_MARGIN;
>  	irq_stack_bottom = (u64)__this_cpu_read(irq_stack_ptr);
> +	irq_stack_top = irq_stack_bottom - IRQ_STACK_SIZE + STACK_TOP_MARGIN;

Not introduced in this patch, but the names for top and bottom are flipped,
both for irq_stack and estack.  STACK_TOP_MARGIN should also be
STACK_BOTTOM_MARGIN.  The actual checks are functionally correct, but holy
hell does it make reading the code confusing, and the WARN prints backwards
information.

E.g.:

  swapper/10 has overflown the kernel stack

    cur:ffffc900000bc000,sp:ffff888277b03f78
    irq stk top-bottom:ffff888277b00080-ffff888277b04000
    exception stk top-bottom:fffffe00001b4080-fffffe00001b9000


Printing out top-bottom for "cur" would also probably be helpful.
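
For illustration, a minimal sketch of the check with conventional naming,
following the STACK_BOTTOM_MARGIN rename suggested above (illustrative only,
not taken from the series):

	irq_stack_top    = (u64)__this_cpu_read(irq_stack_ptr);
	irq_stack_bottom = irq_stack_top - IRQ_STACK_SIZE + STACK_BOTTOM_MARGIN;

	/* "top" is now the highest address, "bottom" the lowest */
	if (regs->sp >= irq_stack_bottom && regs->sp <= irq_stack_top)
		return;

This appears to be roughly what the follow-up "Sanitize the top/bottom
confusion" patch ends up doing.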

Let me know if you'd like me to send a patch, or if you'll fold a change
into this series.


For this patch,

Reviewed-by: Sean Christopherson <sean.j.christopherson@intel.com>

>  	if (regs->sp >= irq_stack_top && regs->sp <= irq_stack_bottom)
>  		return;
>  
> 
> 

^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [patch V2 03/29] x86/irq/64: Remove a hardcoded irq_stack_union access
  2019-04-05 16:37   ` Sean Christopherson
@ 2019-04-05 16:38     ` Sean Christopherson
  2019-04-05 17:18     ` Josh Poimboeuf
  1 sibling, 0 replies; 60+ messages in thread
From: Sean Christopherson @ 2019-04-05 16:38 UTC (permalink / raw)
  To: Thomas Gleixner; +Cc: LKML, x86, Andy Lutomirski, Josh Poimboeuf

On Fri, Apr 05, 2019 at 09:37:27AM -0700, Sean Christopherson wrote:
> On Fri, Apr 05, 2019 at 05:07:01PM +0200, Thomas Gleixner wrote:
> > From: Andy Lutomirski <luto@kernel.org>
> > 
> > stack_overflow_check() is using both irq_stack_ptr and irq_stack_union to
> > find the IRQ stack. That's going to break when vmapped irq stacks are
> > introduced.
> > 
> > Change it to just use irq_stack_ptr.
> > 
> > Signed-off-by: Andy Lutomirski <luto@kernel.org>
> > Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
> > 
> > ---
> >  arch/x86/kernel/irq_64.c |    3 +--
> >  1 file changed, 1 insertion(+), 2 deletions(-)
> > 
> > --- a/arch/x86/kernel/irq_64.c
> > +++ b/arch/x86/kernel/irq_64.c
> > @@ -55,9 +55,8 @@ static inline void stack_overflow_check(
> >  	    regs->sp <= curbase + THREAD_SIZE)
> >  		return;
> >  
> > -	irq_stack_top = (u64)this_cpu_ptr(irq_stack_union.irq_stack) +
> > -			STACK_TOP_MARGIN;
> >  	irq_stack_bottom = (u64)__this_cpu_read(irq_stack_ptr);
> > +	irq_stack_top = irq_stack_bottom - IRQ_STACK_SIZE + STACK_TOP_MARGIN;
> 
> Not introduced in this patch, but the names for top and bottom are flipped,
> both for irq_stack and estack.  STACK_TOP_MARGIN should also be
> STACK_BOTTOM_MARGIN.  The actual checks are functionally correct, but holy
> hell does it make reading the code confusing, and the WARN prints backwards
> information.
> 
> E.g.:
> 
>   swapper/10 has overflown the kernel stack
> 
>     cur:ffffc900000bc000,sp:ffff888277b03f78
>     irq stk top-bottom:ffff888277b00080-ffff888277b04000
>     exception stk top-bottom:fffffe00001b4080-fffffe00001b9000
> 
> 
> Printing out top-bottom for "cur" would also probably be helpful.
> 
> Let me know if you'd like me to send a patch, or if you'll fold a change
> into this series.

And hello patch 03/29.  *sigh*

^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [patch V2 04/29] x86/irq/64: Sanitize the top/bottom confusion
  2019-04-05 15:07 ` [patch V2 04/29] x86/irq/64: Sanitize the top/bottom confusion Thomas Gleixner
@ 2019-04-05 16:54   ` Sean Christopherson
  0 siblings, 0 replies; 60+ messages in thread
From: Sean Christopherson @ 2019-04-05 16:54 UTC (permalink / raw)
  To: Thomas Gleixner; +Cc: LKML, x86, Andy Lutomirski, Josh Poimboeuf

On Fri, Apr 05, 2019 at 05:07:02PM +0200, Thomas Gleixner wrote:
> On x86 stacks go top to bottom, but the stack overflow check uses it the
> other way round, which is just confusing. Clean it up and sanitize the
> warning string a bit.
> 
> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
> ---

Reviewed-by: Sean Christopherson <sean.j.christopherson@intel.com>

^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [patch V2 03/29] x86/irq/64: Remove a hardcoded irq_stack_union access
  2019-04-05 16:37   ` Sean Christopherson
  2019-04-05 16:38     ` Sean Christopherson
@ 2019-04-05 17:18     ` Josh Poimboeuf
  2019-04-05 17:47       ` Thomas Gleixner
  1 sibling, 1 reply; 60+ messages in thread
From: Josh Poimboeuf @ 2019-04-05 17:18 UTC (permalink / raw)
  To: Sean Christopherson; +Cc: Thomas Gleixner, LKML, x86, Andy Lutomirski

On Fri, Apr 05, 2019 at 09:37:27AM -0700, Sean Christopherson wrote:
> On Fri, Apr 05, 2019 at 05:07:01PM +0200, Thomas Gleixner wrote:
> > From: Andy Lutomirski <luto@kernel.org>
> > 
> > stack_overflow_check() is using both irq_stack_ptr and irq_stack_union to
> > find the IRQ stack. That's going to break when vmapped irq stacks are
> > introduced.
> > 
> > Change it to just use irq_stack_ptr.
> > 
> > Signed-off-by: Andy Lutomirski <luto@kernel.org>
> > Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
> > 
> > ---
> >  arch/x86/kernel/irq_64.c |    3 +--
> >  1 file changed, 1 insertion(+), 2 deletions(-)
> > 
> > --- a/arch/x86/kernel/irq_64.c
> > +++ b/arch/x86/kernel/irq_64.c
> > @@ -55,9 +55,8 @@ static inline void stack_overflow_check(
> >  	    regs->sp <= curbase + THREAD_SIZE)
> >  		return;
> >  
> > -	irq_stack_top = (u64)this_cpu_ptr(irq_stack_union.irq_stack) +
> > -			STACK_TOP_MARGIN;
> >  	irq_stack_bottom = (u64)__this_cpu_read(irq_stack_ptr);
> > +	irq_stack_top = irq_stack_bottom - IRQ_STACK_SIZE + STACK_TOP_MARGIN;
> 
> Not introduced in this patch, but the names for top and bottom are flipped,
> both for irq_stack and estack.  STACK_TOP_MARGIN should also be
> STACK_BOTTOM_MARGIN.  The actual checks are functionally correct, but holy
> hell does it make reading the code confusing, and the WARN prints backwards
> information.

I agree, but... one man's top is another man's bottom.  Especially when
stacks grow physically down (as defined by Intel) but conceptually up
(as defined by every CS algorithms class ever).

-- 
Josh

^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [patch V2 03/29] x86/irq/64: Remove a hardcoded irq_stack_union access
  2019-04-05 17:18     ` Josh Poimboeuf
@ 2019-04-05 17:47       ` Thomas Gleixner
  0 siblings, 0 replies; 60+ messages in thread
From: Thomas Gleixner @ 2019-04-05 17:47 UTC (permalink / raw)
  To: Josh Poimboeuf; +Cc: Sean Christopherson, LKML, x86, Andy Lutomirski

On Fri, 5 Apr 2019, Josh Poimboeuf wrote:
> On Fri, Apr 05, 2019 at 09:37:27AM -0700, Sean Christopherson wrote:
> > On Fri, Apr 05, 2019 at 05:07:01PM +0200, Thomas Gleixner wrote:
> > > -	irq_stack_top = (u64)this_cpu_ptr(irq_stack_union.irq_stack) +
> > > -			STACK_TOP_MARGIN;
> > >  	irq_stack_bottom = (u64)__this_cpu_read(irq_stack_ptr);
> > > +	irq_stack_top = irq_stack_bottom - IRQ_STACK_SIZE + STACK_TOP_MARGIN;
> > 
> > Not introduced in this patch, but the names for top and bottom are flipped,
> > both for irq_stack and estack.  STACK_TOP_MARGIN should also be
> > STACK_BOTTOM_MARGIN.  The actual checks are functionally correct, but holy
> > hell does it make reading the code confusing, and the WARN prints backwards
> > information.
> 
> I agree, but... one man's top is another man's bottom.  Especially when
> stacks grow physically down (as defined by Intel) but conceptually up
> (as defined by every CS algorithms class ever).

That's where they explain that stack concept with stacking of porcelain
plates, which causes the association of shards when the stack is turned
around. Right?

Thankfully I never got exposed to that.

/me runs

^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [patch V2 07/29] x86/exceptions: Make IST index zero based
  2019-04-05 15:07 ` [patch V2 07/29] x86/exceptions: Make IST index zero based Thomas Gleixner
@ 2019-04-05 18:59   ` Sean Christopherson
  0 siblings, 0 replies; 60+ messages in thread
From: Sean Christopherson @ 2019-04-05 18:59 UTC (permalink / raw)
  To: Thomas Gleixner; +Cc: LKML, x86, Andy Lutomirski, Josh Poimboeuf

On Fri, Apr 05, 2019 at 05:07:05PM +0200, Thomas Gleixner wrote:
> The defines for the exception stack (IST) array in the TSS are using the
> SDM convention IST1 - IST7. That causes all sorts of code to subtract 1 for
> array indices related to IST. That's confusing at best and does not provide
> any value.
> 
> Make the indices zero based and fixup the usage sites. The only code which
> needs to adjust the 0 based index is the interrupt descriptor setup which
> needs to add 1 now.
> 
> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
> ---

Reviewed-by: Sean Christopherson <sean.j.christopherson@intel.com>

^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [patch V2 08/29] x86/cpu_entry_area: Cleanup setup functions
  2019-04-05 15:07 ` [patch V2 08/29] x86/cpu_entry_area: Cleanup setup functions Thomas Gleixner
@ 2019-04-05 19:25   ` Sean Christopherson
  0 siblings, 0 replies; 60+ messages in thread
From: Sean Christopherson @ 2019-04-05 19:25 UTC (permalink / raw)
  To: Thomas Gleixner; +Cc: LKML, x86, Andy Lutomirski, Josh Poimboeuf

On Fri, Apr 05, 2019 at 05:07:06PM +0200, Thomas Gleixner wrote:
> No point in retrieving the entry area pointer over and over. Do it once and
> use unsigned int for 'cpu' consistently.
> 
> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
> ---

"consistent" might be a bit of a stretch for 'unsigned int cpu' vs 'int cpu' :-)

Reviewed-by: Sean Christopherson <sean.j.christopherson@intel.com>

^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [patch V2 09/29] x86/exceptions: Add structs for exception stacks
  2019-04-05 15:07 ` [patch V2 09/29] x86/exceptions: Add structs for exception stacks Thomas Gleixner
@ 2019-04-05 20:48   ` Sean Christopherson
  2019-04-05 20:50     ` Sean Christopherson
  2019-04-05 21:00     ` Thomas Gleixner
  0 siblings, 2 replies; 60+ messages in thread
From: Sean Christopherson @ 2019-04-05 20:48 UTC (permalink / raw)
  To: Thomas Gleixner; +Cc: LKML, x86, Andy Lutomirski, Josh Poimboeuf

On Fri, Apr 05, 2019 at 05:07:07PM +0200, Thomas Gleixner wrote:
> At the moment everything assumes a full linear mapping of the various
> exception stacks. Adding guard pages to the cpu entry area mapping of the
> exception stacks will break that assumption.
> 
> As a preparatory step convert both the real storage and the effective
> mapping in the cpu entry area from character arrays to structures.
> 
> To ensure that both arrays have the same ordering and the same size of the
> individual stacks, fill the members with a macro. The guard size is the only
> difference between the two resulting structures. For now both have guard
> size 0 until the preparation of all usage sites is done.
> 
> Provide a couple of helper macros which are used in the following
> conversions.
> 
> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
> ---
>  arch/x86/include/asm/cpu_entry_area.h |   51 ++++++++++++++++++++++++++++++----
>  arch/x86/kernel/cpu/common.c          |    2 -
>  arch/x86/mm/cpu_entry_area.c          |    8 ++---
>  3 files changed, 50 insertions(+), 11 deletions(-)
> 
> --- a/arch/x86/include/asm/cpu_entry_area.h
> +++ b/arch/x86/include/asm/cpu_entry_area.h
> @@ -7,6 +7,50 @@
>  #include <asm/processor.h>
>  #include <asm/intel_ds.h>
>  
> +#ifdef CONFIG_X86_64
> +
> +/* Macro to enforce the same ordering and stack sizes */
> +#define ESTACKS_MEMBERS(guardsize)		\
> +	char	DF_stack[EXCEPTION_STKSZ];	\
> +	char	DF_stack_guard[guardsize];	\
> +	char	NMI_stack[EXCEPTION_STKSZ];	\
> +	char	NMI_stack_guard[guardsize];	\
> +	char	DB_stack[DEBUG_STKSZ];		\
> +	char	DB_stack_guard[guardsize];	\
> +	char	MCE_stack[EXCEPTION_STKSZ];	\
> +	char	MCE_stack_guard[guardsize];	\

Conceptually, shouldn't the stack guard precede its associated stack
since the stacks grow down?  And don't we want a guard page below the
DF_stack?  There could still be a guard page above MCE_stack,
e.g. IST_stack_guard or something.

E.g. the example in patch "Speedup in_exception_stack()" also suggests
that "guard page" is associated with the stack physical above it:

      --- top of DB_stack       <- Initial stack
      --- end of DB_stack
          guard page

      --- top of DB1_stack      <- Top of stack after entering first #DB
      --- end of DB1_stack
          guard page

      --- top of DB2_stack      <- Top of stack after entering second #DB
      --- end of DB2_stack
          guard page
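
For illustration, one possible layout along those lines (a sketch only;
IST_top_guard is a placeholder name, not something from the series):

	#define ESTACKS_MEMBERS(guardsize)		\
		char	DF_stack_guard[guardsize];	\
		char	DF_stack[EXCEPTION_STKSZ];	\
		char	NMI_stack_guard[guardsize];	\
		char	NMI_stack[EXCEPTION_STKSZ];	\
		char	DB_stack_guard[guardsize];	\
		char	DB_stack[DEBUG_STKSZ];		\
		char	MCE_stack_guard[guardsize];	\
		char	MCE_stack[EXCEPTION_STKSZ];	\
		char	IST_top_guard[guardsize];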

> +
> +/* The exception stacks linear storage. No guard pages required */
> +struct exception_stacks {
> +	ESTACKS_MEMBERS(0)
> +};
> +
> +/*
> + * The effective cpu entry area mapping with guard pages. Guard size is
> + * zero until the code which makes assumptions about linear mapping is
> + * cleaned up.
> + */
> +struct cea_exception_stacks {
> +	ESTACKS_MEMBERS(0)
> +};
> +
> +#define CEA_ESTACK_TOP(ceastp, st)			\
> +	((unsigned long)&(ceastp)->st## _stack_guard)

IMO, using the stack guard to define the top of stack is unnecessarily
confusing and fragile, e.g. reordering the names of the stack guards
would break this macro.

What about:

#define CEA_ESTACK_TOP(ceastp, st)			\
	(CEA_ESTACK_BOT(ceastp, st) + CEA_ESTACK_SIZE(st))

> +#define CEA_ESTACK_BOT(ceastp, st)			\
> +	((unsigned long)&(ceastp)->st## _stack)
> +
> +#define CEA_ESTACK_OFFS(st)					\
> +	offsetof(struct cea_exception_stacks, st## _stack)
> +
> +#define CEA_ESTACK_SIZE(st)					\
> +	sizeof(((struct cea_exception_stacks *)0)->st## _stack)
> +
> +#define CEA_ESTACK_PAGES					\
> +	(sizeof(struct cea_exception_stacks) / PAGE_SIZE)
> +
> +#endif
> +
>  /*
>   * cpu_entry_area is a percpu region that contains things needed by the CPU
>   * and early entry/exit code.  Real types aren't used for all fields here
> @@ -32,12 +76,9 @@ struct cpu_entry_area {
>  
>  #ifdef CONFIG_X86_64
>  	/*
> -	 * Exception stacks used for IST entries.
> -	 *
> -	 * In the future, this should have a separate slot for each stack
> -	 * with guard pages between them.
> +	 * Exception stacks used for IST entries with guard pages.
>  	 */
> -	char exception_stacks[(N_EXCEPTION_STACKS - 1) * EXCEPTION_STKSZ + DEBUG_STKSZ];
> +	struct cea_exception_stacks estacks;
>  #endif
>  #ifdef CONFIG_CPU_SUP_INTEL
>  	/*
> --- a/arch/x86/kernel/cpu/common.c
> +++ b/arch/x86/kernel/cpu/common.c
> @@ -1754,7 +1754,7 @@ void cpu_init(void)
>  	 * set up and load the per-CPU TSS
>  	 */
>  	if (!oist->ist[0]) {
> -		char *estacks = get_cpu_entry_area(cpu)->exception_stacks;
> +		char *estacks = (char *)&get_cpu_entry_area(cpu)->estacks;
>  
>  		for (v = 0; v < N_EXCEPTION_STACKS; v++) {
>  			estacks += exception_stack_sizes[v];
> --- a/arch/x86/mm/cpu_entry_area.c
> +++ b/arch/x86/mm/cpu_entry_area.c
> @@ -13,8 +13,7 @@
>  static DEFINE_PER_CPU_PAGE_ALIGNED(struct entry_stack_page, entry_stack_storage);
>  
>  #ifdef CONFIG_X86_64
> -static DEFINE_PER_CPU_PAGE_ALIGNED(char, exception_stacks
> -	[(N_EXCEPTION_STACKS - 1) * EXCEPTION_STKSZ + DEBUG_STKSZ]);
> +static DEFINE_PER_CPU_PAGE_ALIGNED(struct exception_stacks, exception_stacks);
>  #endif
>  
>  struct cpu_entry_area *get_cpu_entry_area(int cpu)
> @@ -138,9 +137,8 @@ static void __init setup_cpu_entry_area(
>  #ifdef CONFIG_X86_64
>  	BUILD_BUG_ON(sizeof(exception_stacks) % PAGE_SIZE != 0);
>  	BUILD_BUG_ON(sizeof(exception_stacks) !=
> -		     sizeof(((struct cpu_entry_area *)0)->exception_stacks));
> -	cea_map_percpu_pages(&cea->exception_stacks,
> -			     &per_cpu(exception_stacks, cpu),
> +		     sizeof(((struct cpu_entry_area *)0)->estacks));
> +	cea_map_percpu_pages(&cea->estacks, &per_cpu(exception_stacks, cpu),
>  			     sizeof(exception_stacks) / PAGE_SIZE, PAGE_KERNEL);
>  #endif
>  	percpu_setup_debug_store(cpu);
> 
> 

^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [patch V2 09/29] x86/exceptions: Add structs for exception stacks
  2019-04-05 20:48   ` Sean Christopherson
@ 2019-04-05 20:50     ` Sean Christopherson
  2019-04-05 21:00     ` Thomas Gleixner
  1 sibling, 0 replies; 60+ messages in thread
From: Sean Christopherson @ 2019-04-05 20:50 UTC (permalink / raw)
  To: Thomas Gleixner; +Cc: LKML, x86, Andy Lutomirski, Josh Poimboeuf

On Fri, Apr 05, 2019 at 01:48:38PM -0700, Sean Christopherson wrote:
> On Fri, Apr 05, 2019 at 05:07:07PM +0200, Thomas Gleixner wrote:
> > At the moment everything assumes a full linear mapping of the various
> > exception stacks. Adding guard pages to the cpu entry area mapping of the
> > exception stacks will break that assumption.
> > 
> > As a preparatory step convert both the real storage and the effective
> > mapping in the cpu entry area from character arrays to structures.
> > 
> > To ensure that both arrays have the same ordering and the same size of the
> > individual stacks, fill the members with a macro. The guard size is the only
> > difference between the two resulting structures. For now both have guard
> > size 0 until the preparation of all usage sites is done.
> > 
> > Provide a couple of helper macros which are used in the following
> > conversions.
> > 
> > Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
> > ---
> >  arch/x86/include/asm/cpu_entry_area.h |   51 ++++++++++++++++++++++++++++++----
> >  arch/x86/kernel/cpu/common.c          |    2 -
> >  arch/x86/mm/cpu_entry_area.c          |    8 ++---
> >  3 files changed, 50 insertions(+), 11 deletions(-)
> > 
> > --- a/arch/x86/include/asm/cpu_entry_area.h
> > +++ b/arch/x86/include/asm/cpu_entry_area.h
> > @@ -7,6 +7,50 @@
> >  #include <asm/processor.h>
> >  #include <asm/intel_ds.h>
> >  
> > +#ifdef CONFIG_X86_64
> > +
> > +/* Macro to enforce the same ordering and stack sizes */
> > +#define ESTACKS_MEMBERS(guardsize)		\
> > +	char	DF_stack[EXCEPTION_STKSZ];	\
> > +	char	DF_stack_guard[guardsize];	\
> > +	char	NMI_stack[EXCEPTION_STKSZ];	\
> > +	char	NMI_stack_guard[guardsize];	\
> > +	char	DB_stack[DEBUG_STKSZ];		\
> > +	char	DB_stack_guard[guardsize];	\
> > +	char	MCE_stack[EXCEPTION_STKSZ];	\
> > +	char	MCE_stack_guard[guardsize];	\
> 
> Conceptually, shouldn't the stack guard precede its associated stack
> since the stacks grow down?  And don't we want a guard page below the
> DF_stack?  There could still be a guard page above MCE_stack,
> e.g. IST_stack_guard or something.
> 
> E.g. the example in patch "Speedup in_exception_stack()" also suggests

Gah, the example is from "x86/exceptions: Split debug IST stack".

> that "guard page" is associated with the stack physical above it:
> 
>       --- top of DB_stack       <- Initial stack
>       --- end of DB_stack
>           guard page
> 
>       --- top of DB1_stack      <- Top of stack after entering first #DB
>       --- end of DB1_stack
>           guard page
> 
>       --- top of DB2_stack      <- Top of stack after entering second #DB
>       --- end of DB2_stack
>           guard page
> 
> > +
> > +/* The exception stacks linear storage. No guard pages required */
> > +struct exception_stacks {
> > +	ESTACKS_MEMBERS(0)
> > +};
> > +
> > +/*
> > + * The effective cpu entry area mapping with guard pages. Guard size is
> > + * zero until the code which makes assumptions about linear mapping is
> > + * cleaned up.
> > + */
> > +struct cea_exception_stacks {
> > +	ESTACKS_MEMBERS(0)
> > +};
> > +
> > +#define CEA_ESTACK_TOP(ceastp, st)			\
> > +	((unsigned long)&(ceastp)->st## _stack_guard)
> 
> IMO, using the stack guard to define the top of stack is unnecessarily
> confusing and fragile, e.g. reordering the names of the stack guards
> would break this macro.
> 
> What about:
> 
> #define CEA_ESTACK_TOP(ceastp, st)			\
> 	(CEA_ESTACK_BOT(ceastp, st) + CEA_ESTACK_SIZE(st))
> 
> > +#define CEA_ESTACK_BOT(ceastp, st)			\
> > +	((unsigned long)&(ceastp)->st## _stack)
> > +
> > +#define CEA_ESTACK_OFFS(st)					\
> > +	offsetof(struct cea_exception_stacks, st## _stack)
> > +
> > +#define CEA_ESTACK_SIZE(st)					\
> > +	sizeof(((struct cea_exception_stacks *)0)->st## _stack)
> > +
> > +#define CEA_ESTACK_PAGES					\
> > +	(sizeof(struct cea_exception_stacks) / PAGE_SIZE)
> > +
> > +#endif
> > +
> >  /*
> >   * cpu_entry_area is a percpu region that contains things needed by the CPU
> >   * and early entry/exit code.  Real types aren't used for all fields here
> > @@ -32,12 +76,9 @@ struct cpu_entry_area {
> >  
> >  #ifdef CONFIG_X86_64
> >  	/*
> > -	 * Exception stacks used for IST entries.
> > -	 *
> > -	 * In the future, this should have a separate slot for each stack
> > -	 * with guard pages between them.
> > +	 * Exception stacks used for IST entries with guard pages.
> >  	 */
> > -	char exception_stacks[(N_EXCEPTION_STACKS - 1) * EXCEPTION_STKSZ + DEBUG_STKSZ];
> > +	struct cea_exception_stacks estacks;
> >  #endif
> >  #ifdef CONFIG_CPU_SUP_INTEL
> >  	/*
> > --- a/arch/x86/kernel/cpu/common.c
> > +++ b/arch/x86/kernel/cpu/common.c
> > @@ -1754,7 +1754,7 @@ void cpu_init(void)
> >  	 * set up and load the per-CPU TSS
> >  	 */
> >  	if (!oist->ist[0]) {
> > -		char *estacks = get_cpu_entry_area(cpu)->exception_stacks;
> > +		char *estacks = (char *)&get_cpu_entry_area(cpu)->estacks;
> >  
> >  		for (v = 0; v < N_EXCEPTION_STACKS; v++) {
> >  			estacks += exception_stack_sizes[v];
> > --- a/arch/x86/mm/cpu_entry_area.c
> > +++ b/arch/x86/mm/cpu_entry_area.c
> > @@ -13,8 +13,7 @@
> >  static DEFINE_PER_CPU_PAGE_ALIGNED(struct entry_stack_page, entry_stack_storage);
> >  
> >  #ifdef CONFIG_X86_64
> > -static DEFINE_PER_CPU_PAGE_ALIGNED(char, exception_stacks
> > -	[(N_EXCEPTION_STACKS - 1) * EXCEPTION_STKSZ + DEBUG_STKSZ]);
> > +static DEFINE_PER_CPU_PAGE_ALIGNED(struct exception_stacks, exception_stacks);
> >  #endif
> >  
> >  struct cpu_entry_area *get_cpu_entry_area(int cpu)
> > @@ -138,9 +137,8 @@ static void __init setup_cpu_entry_area(
> >  #ifdef CONFIG_X86_64
> >  	BUILD_BUG_ON(sizeof(exception_stacks) % PAGE_SIZE != 0);
> >  	BUILD_BUG_ON(sizeof(exception_stacks) !=
> > -		     sizeof(((struct cpu_entry_area *)0)->exception_stacks));
> > -	cea_map_percpu_pages(&cea->exception_stacks,
> > -			     &per_cpu(exception_stacks, cpu),
> > +		     sizeof(((struct cpu_entry_area *)0)->estacks));
> > +	cea_map_percpu_pages(&cea->estacks, &per_cpu(exception_stacks, cpu),
> >  			     sizeof(exception_stacks) / PAGE_SIZE, PAGE_KERNEL);
> >  #endif
> >  	percpu_setup_debug_store(cpu);
> > 
> > 

^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [patch V2 19/29] x86/exceptions: Split debug IST stack
  2019-04-05 15:07 ` [patch V2 19/29] x86/exceptions: Split debug IST stack Thomas Gleixner
@ 2019-04-05 20:55   ` Sean Christopherson
  2019-04-05 21:01     ` Thomas Gleixner
  0 siblings, 1 reply; 60+ messages in thread
From: Sean Christopherson @ 2019-04-05 20:55 UTC (permalink / raw)
  To: Thomas Gleixner; +Cc: LKML, x86, Andy Lutomirski, Josh Poimboeuf

On Fri, Apr 05, 2019 at 05:07:17PM +0200, Thomas Gleixner wrote:
> The debug IST stack is actually two separate debug stacks to handle #DB
> recursion. This is required because the CPU always starts at the top of the
> stack on exception entry, which means on #DB recursion the second #DB would
> overwrite the stack of the first.
> 
> The low level entry code therefore adjusts the top of stack on entry so a
> secondary #DB starts from a different stack page. But the stack pages are
> adjacent without a guard page between them.
> 
> Split the debug stack into 3 stacks which are separated by guard pages. The
> 3rd stack is never mapped into the cpu_entry_area and is only there to
> catch triple #DB nesting:
> 
>       --- top of DB_stack	<- Initial stack
>       --- end of DB_stack
>       	  guard page
> 
>       --- top of DB1_stack	<- Top of stack after entering first #DB
>       --- end of DB1_stack
>       	  guard page
> 
>       --- top of DB2_stack	<- Top of stack after entering second #DB
>       --- end of DB2_stack	   
>       	  guard page
> 
> If DB2 did not act as the final guard hole, a second #DB would point the
> top of the #DB stack to the stack below #DB1, which would be valid and not
> catch the undesired triple nesting.
> 
> The backing store does not allocate any memory for DB2 and its guard page
> as it is not going to be mapped into the cpu_entry_area.
> 
>  - Adjust the low level entry code so it adjusts top of #DB with the offset
>    between the stacks instead of exception stack size.
> 
>  - Make the dumpstack code aware of the new stacks.
> 
>  - Adjust the in_debug_stack() implementation and move it into the NMI code
>    where it belongs. As this is NMI hotpath code, it just checks the full
>    area between top of DB_stack and bottom of DB1_stack without checking
>    for the guard page. That's correct because the NMI cannot hit a
>    stackpointer pointing to the guard page between DB and DB1 stack.  Even
>    if it would, then the NMI operation still is unaffected, but the resume
>    of the debug exception on the topmost DB stack will crash by touching
>    the guard page.
> 
> Suggested-by: Andy Lutomirski <luto@kernel.org>
> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
> ---

...

> +static bool notrace is_debug_stack(unsigned long addr)
> +{
> +	struct cea_exception_stacks *cs = __this_cpu_read(cea_exception_stacks);
> +	unsigned long top = CEA_ESTACK_TOP(cs, DB);
> +	unsigned long bot = CEA_ESTACK_BOT(cs, DB1);
> +
> +	if (__this_cpu_read(debug_stack_usage))
> +		return true;
> +	/*
> +	 * Note, this covers the guard page between DB and DB1 as well to
> +	 * avoid two checks. But by all means @addr can never point into
> +	 * the guard page.
> +	 */
> +	return addr > bot && addr < top;

Isn't this an off by one error?  I.e. "return addr >= bot && addr < top".
%rsp == bot is technically still in the DB1 stack even though the next
PUSH/CALL will explode on the guard page.


> +}
> +NOKPROBE_SYMBOL(is_debug_stack);
>  #endif
>  
>  dotraplinkage notrace void
> --- a/arch/x86/mm/cpu_entry_area.c
> +++ b/arch/x86/mm/cpu_entry_area.c
> @@ -98,10 +98,12 @@ static void __init percpu_setup_exceptio
>  
>  	/*
>  	 * The exceptions stack mappings in the per cpu area are protected
> -	 * by guard pages so each stack must be mapped separately.
> +	 * by guard pages so each stack must be mapped separately. DB2 is
> +	 * not mapped; it just exists to catch triple nesting of #DB.
>  	 */
>  	cea_map_stack(DF);
>  	cea_map_stack(NMI);
> +	cea_map_stack(DB1);
>  	cea_map_stack(DB);
>  	cea_map_stack(MCE);
>  }
> 
> 

^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [patch V2 09/29] x86/exceptions: Add structs for exception stacks
  2019-04-05 20:48   ` Sean Christopherson
  2019-04-05 20:50     ` Sean Christopherson
@ 2019-04-05 21:00     ` Thomas Gleixner
  1 sibling, 0 replies; 60+ messages in thread
From: Thomas Gleixner @ 2019-04-05 21:00 UTC (permalink / raw)
  To: Sean Christopherson; +Cc: LKML, x86, Andy Lutomirski, Josh Poimboeuf

On Fri, 5 Apr 2019, Sean Christopherson wrote:
> On Fri, Apr 05, 2019 at 05:07:07PM +0200, Thomas Gleixner wrote:
> > +#ifdef CONFIG_X86_64
> > +
> > +/* Macro to enforce the same ordering and stack sizes */
> > +#define ESTACKS_MEMBERS(guardsize)		\
> > +	char	DF_stack[EXCEPTION_STKSZ];	\
> > +	char	DF_stack_guard[guardsize];	\
> > +	char	NMI_stack[EXCEPTION_STKSZ];	\
> > +	char	NMI_stack_guard[guardsize];	\
> > +	char	DB_stack[DEBUG_STKSZ];		\
> > +	char	DB_stack_guard[guardsize];	\
> > +	char	MCE_stack[EXCEPTION_STKSZ];	\
> > +	char	MCE_stack_guard[guardsize];	\
> 
> Conceptually, shouldn't the stack guard precede its associated stack
> since the stacks grow down?  And don't we want a guard page below the
> DF_stack?  There could still be a guard page above MCE_stack,
> e.g. IST_stack_guard or something.

Yes and no. :)

De facto we already have a guard page below #DF. See struct
cpu_entry_area. And because I come from 8 bit microcontrollers, it's just
an instinct to spare/share stuff wherever it's possible.

But yes, it looks a bit odd and we can reorder that and have an extra guard
page below the first stack.

> > +#define CEA_ESTACK_TOP(ceastp, st)			\
> > +	((unsigned long)&(ceastp)->st## _stack_guard)
> 
> IMO, using the stack guard to define the top of stack is unnecessarily
> confusing and fragile, e.g. reordering the names of the stack guards
> would break this macro.

For me it's obvious, obviously :)

> What about:
> 
> #define CEA_ESTACK_TOP(ceastp, st)			\
> 	(CEA_ESTACK_BOT(ceastp, st) + CEA_ESTACK_SIZE(st))

Yeah. No problem.

Thanks,

	tglx

^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [patch V2 19/29] x86/exceptions: Split debug IST stack
  2019-04-05 20:55   ` Sean Christopherson
@ 2019-04-05 21:01     ` Thomas Gleixner
  0 siblings, 0 replies; 60+ messages in thread
From: Thomas Gleixner @ 2019-04-05 21:01 UTC (permalink / raw)
  To: Sean Christopherson; +Cc: LKML, x86, Andy Lutomirski, Josh Poimboeuf

On Fri, 5 Apr 2019, Sean Christopherson wrote:
> On Fri, Apr 05, 2019 at 05:07:17PM +0200, Thomas Gleixner wrote:
> > +	/*
> > +	 * Note, this covers the guard page between DB and DB1 as well to
> > +	 * avoid two checks. But by all means @addr can never point into
> > +	 * the guard page.
> > +	 */
> > +	return addr > bot && addr < top;
> 
> Isn't this an off by one error?  I.e. "return addr >= bot && addr < top".
> %rsp == bot is technically still in the DB1 stack even though the next
> PUSH/CALL will explode on the guard page.

Right you are.


^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [patch V2 20/29] x86/dumpstack/64: Speedup in_exception_stack()
  2019-04-05 15:07 ` [patch V2 20/29] x86/dumpstack/64: Speedup in_exception_stack() Thomas Gleixner
@ 2019-04-05 21:55   ` Josh Poimboeuf
  2019-04-05 21:56     ` Thomas Gleixner
  0 siblings, 1 reply; 60+ messages in thread
From: Josh Poimboeuf @ 2019-04-05 21:55 UTC (permalink / raw)
  To: Thomas Gleixner; +Cc: LKML, x86, Andy Lutomirski, Sean Christopherson

On Fri, Apr 05, 2019 at 05:07:18PM +0200, Thomas Gleixner wrote:
> The current implementation of in_exception_stack() iterates over the
> exception stacks array. Most of the time this is a useless exercise, but
> even for the actual use cases (perf and ftrace) it takes at least 2
> iterations to get to the NMI stack.
> 
> As the exception stacks and the guard pages are page aligned the loop can
> be avoided completely.
> 
> Add an initial check whether the stack pointer is inside the full exception
> stack area and leave early if not.
> 
> Create a lookup table which describes the stack area. The table index is
> the page offset from the beginning of the exception stacks. So for any
> given stack pointer the page offset is computed and a lookup in the
> description table is performed. If it is inside a guard page, return. If
> not, use the descriptor to fill in the info structure.
> 
> The table is filled at compile time and for the !KASAN case the interesting
> page descriptors exactly fit into a single cache line. Just the last guard
> page descriptor is in the next cacheline, but that should not be accessed
> in the regular case.
> 
> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
> ---
> V2: Simplify the macro maze

This is indeed a little better.  It's friday, have an ack.

Acked-by: Josh Poimboeuf <jpoimboe@redhat.com>

I don't like the clever ISTACK_* macro naming however...

-- 
Josh
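
For readers following along, a minimal sketch of the lookup-table scheme
described in the quoted changelog (field names, types and the exact layout
are illustrative, not taken from the patch):

	struct estack_page_desc {
		u32	offs;	/* offset of the stack in the estack area */
		u16	size;	/* stack size, 0 for a guard page */
		u16	type;	/* stack type handed back to the dumper */
	};

	/* One descriptor per page of the exception stack area, compile time */
	static const struct estack_page_desc descs[CEA_ESTACK_PAGES];

	static bool in_exception_stack(unsigned long *sp, struct stack_info *info)
	{
		unsigned long begin, stk = (unsigned long)sp;
		const struct estack_page_desc *d;

		begin = (unsigned long)__this_cpu_read(cea_exception_stacks);
		if (stk < begin || stk >= begin + sizeof(struct cea_exception_stacks))
			return false;

		/* The page offset from 'begin' selects the descriptor */
		d = &descs[(stk - begin) >> PAGE_SHIFT];
		if (!d->size)
			return false;	/* guard page */

		info->type  = d->type;
		info->begin = (unsigned long *)(begin + d->offs);
		info->end   = (unsigned long *)(begin + d->offs + d->size);
		return true;
	}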

^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [patch V2 20/29] x86/dumpstack/64: Speedup in_exception_stack()
  2019-04-05 21:55   ` Josh Poimboeuf
@ 2019-04-05 21:56     ` Thomas Gleixner
  0 siblings, 0 replies; 60+ messages in thread
From: Thomas Gleixner @ 2019-04-05 21:56 UTC (permalink / raw)
  To: Josh Poimboeuf; +Cc: LKML, x86, Andy Lutomirski, Sean Christopherson

On Fri, 5 Apr 2019, Josh Poimboeuf wrote:
> On Fri, Apr 05, 2019 at 05:07:18PM +0200, Thomas Gleixner wrote:
> > The current implementation of in_exception_stack() iterates over the
> > exception stacks array. Most of the time this is a useless exercise, but
> > even for the actual use cases (perf and ftrace) it takes at least 2
> > iterations to get to the NMI stack.
> > 
> > As the exception stacks and the guard pages are page aligned the loop can
> > be avoided completely.
> > 
> > Add an initial check whether the stack pointer is inside the full exception
> > stack area and leave early if not.
> > 
> > Create a lookup table which describes the stack area. The table index is
> > the page offset from the beginning of the exception stacks. So for any
> > given stack pointer the page offset is computed and a lookup in the
> > description table is performed. If it is inside a guard page, return. If
> > not, use the descriptor to fill in the info structure.
> > 
> > The table is filled at compile time and for the !KASAN case the interesting
> > page descriptors exactly fit into a single cache line. Just the last guard
> > page descriptor is in the next cacheline, but that should not be accessed
> > in the regular case.
> > 
> > Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
> > ---
> > V2: Simplify the macro maze
> 
> This is indeed a little better.  It's friday, have an ack.
> 
> Acked-by: Josh Poimboeuf <jpoimboe@redhat.com>
> 
> I don't like the clever ISTACK_* macro naming however...

Come on it's friday, time for a little bikeshedding :)

^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [patch V2 17/29] x86/exceptions: Disconnect IST index and stack order
  2019-04-05 15:07 ` [patch V2 17/29] x86/exceptions: Disconnect IST index and stack order Thomas Gleixner
@ 2019-04-05 21:57   ` Josh Poimboeuf
  2019-04-05 22:00     ` Thomas Gleixner
  0 siblings, 1 reply; 60+ messages in thread
From: Josh Poimboeuf @ 2019-04-05 21:57 UTC (permalink / raw)
  To: Thomas Gleixner; +Cc: LKML, x86, Andy Lutomirski, Sean Christopherson

On Fri, Apr 05, 2019 at 05:07:15PM +0200, Thomas Gleixner wrote:
> +/*
> + * The exception stack ordering in [cea_]exception_stacks
> + */
> +enum exception_stack_ordering {
> +	ISTACK_DF,
> +	ISTACK_NMI,
> +	ISTACK_DB,
> +	ISTACK_MCE,
> +	N_EXCEPTION_STACKS
> +};

While clever, it reads as "interrupt stack" to me.  ESTACK or IST_STACK
would be infinitely better.

-- 
Josh

^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [patch V2 17/29] x86/exceptions: Disconnect IST index and stack order
  2019-04-05 21:57   ` Josh Poimboeuf
@ 2019-04-05 22:00     ` Thomas Gleixner
  0 siblings, 0 replies; 60+ messages in thread
From: Thomas Gleixner @ 2019-04-05 22:00 UTC (permalink / raw)
  To: Josh Poimboeuf; +Cc: LKML, x86, Andy Lutomirski, Sean Christopherson

On Fri, 5 Apr 2019, Josh Poimboeuf wrote:

> On Fri, Apr 05, 2019 at 05:07:15PM +0200, Thomas Gleixner wrote:
> > +/*
> > + * The exception stack ordering in [cea_]exception_stacks
> > + */
> > +enum exception_stack_ordering {
> > +	ISTACK_DF,
> > +	ISTACK_NMI,
> > +	ISTACK_DB,
> > +	ISTACK_MCE,
> > +	N_EXCEPTION_STACKS
> > +};
> 
> While clever, it reads as "interrupt stack" to me.  ESTACK or IST_STACK
> would be infinitely better.

Fair enough.

^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [patch V2 28/29] x86/irq/64: Remap the IRQ stack with guard pages
  2019-04-05 15:07 ` [patch V2 28/29] x86/irq/64: Remap the IRQ stack with guard pages Thomas Gleixner
@ 2019-04-07  4:56   ` Andy Lutomirski
  2019-04-07  6:08     ` Thomas Gleixner
  2019-04-07 22:44     ` Thomas Gleixner
  0 siblings, 2 replies; 60+ messages in thread
From: Andy Lutomirski @ 2019-04-07  4:56 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: LKML, X86 ML, Andy Lutomirski, Josh Poimboeuf, Sean Christopherson

[-- Attachment #1: Type: text/plain, Size: 1095 bytes --]

On Fri, Apr 5, 2019 at 8:11 AM Thomas Gleixner <tglx@linutronix.de> wrote:
>
> From: Andy Lutomirski <luto@kernel.org>
>
> The IRQ stack lives in percpu space, so an IRQ handler that overflows it
> will overwrite other data structures.
>
> Use vmap() to remap the IRQ stack so that it will have the usual guard
> pages that vmap/vmalloc allocations have. With this the kernel will panic
> immediately on an IRQ stack overflow.

The 0day bot noticed that this dies with DEBUG_PAGEALLOC on.  This is
because the store_stackinfo() function is utter garbage and this patch
correctly detects just how broken it is.  The attached patch "fixes"
it.  (It also contains a reliability improvement that should probably
get folded in, but is otherwise unrelated.)

A real fix would remove the generic kstack_end() function entirely
along with __HAVE_ARCH_KSTACK_END and would optionally replace
store_stackinfo() with something useful.  Josh, do we have a generic
API to do a little stack walk like this?  Otherwise, I don't think it
would be the end of the world to just remove the offending code.

--Andy
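
For illustration, one plausible shape of the vmap() based remap described in
the quoted changelog - a sketch only, assuming the percpu
irq_stack_backing_store from earlier in the series provides the backing
pages; the tail of the real function is visible in the diff below:

	static int map_irq_stack(unsigned int cpu)
	{
		char *stack = (char *)per_cpu_ptr(&irq_stack_backing_store, cpu);
		struct page *pages[IRQ_STACK_SIZE / PAGE_SIZE];
		void *va;
		int i;

		/* Collect the pages backing the percpu storage ... */
		for (i = 0; i < IRQ_STACK_SIZE / PAGE_SIZE; i++) {
			phys_addr_t pa = per_cpu_ptr_to_phys(stack + (i << PAGE_SHIFT));

			pages[i] = pfn_to_page(pa >> PAGE_SHIFT);
		}

		/* ... and remap them into vmalloc space to get guard pages */
		va = vmap(pages, IRQ_STACK_SIZE / PAGE_SIZE, VM_MAP, PAGE_KERNEL);
		if (!va)
			return -ENOMEM;

		/* The stack grows down, so hand the entry code the top */
		per_cpu(hardirq_stack_ptr, cpu) = va + IRQ_STACK_SIZE;
		return 0;
	}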

[-- Attachment #2: fix.diff --]
[-- Type: text/x-patch, Size: 1968 bytes --]

diff --git a/arch/x86/kernel/cpu/common.c b/arch/x86/kernel/cpu/common.c
index 801c6f040faa..eb8939d28f96 100644
--- a/arch/x86/kernel/cpu/common.c
+++ b/arch/x86/kernel/cpu/common.c
@@ -1510,6 +1510,12 @@ DEFINE_PER_CPU(struct task_struct *, current_task) ____cacheline_aligned =
 	&init_task;
 EXPORT_PER_CPU_SYMBOL(current_task);
 
+/*
+ * The initial hardirq_stack_ptr value of NULL is invalid.  To prevent it
+ * from being used if an IRQ happens too early, we initialize irq_count to 1,
+ * which effectively disables ENTER_IRQ_STACK.  The code that maps the IRQ
+ * stack will reset irq_count to -1.
+ */
 DEFINE_PER_CPU(struct irq_stack *, hardirq_stack_ptr);
 DEFINE_PER_CPU(unsigned int, irq_count) __visible = -1;
 
diff --git a/arch/x86/kernel/irq_64.c b/arch/x86/kernel/irq_64.c
index 48caa3d31662..61c691889362 100644
--- a/arch/x86/kernel/irq_64.c
+++ b/arch/x86/kernel/irq_64.c
@@ -56,6 +56,7 @@ static int map_irq_stack(unsigned int cpu)
 		return -ENOMEM;
 
 	per_cpu(hardirq_stack_ptr, cpu) = va + IRQ_STACK_SIZE;
+	pr_err("*** CPU %u: hardirq_stack_ptr = 0x%lx\n", cpu, (unsigned long)(va + IRQ_STACK_SIZE));
 	return 0;
 }
 #else
@@ -74,7 +75,14 @@ static int map_irq_stack(unsigned int cpu)
 
 int irq_init_percpu_irqstack(unsigned int cpu)
 {
+	int ret;
+
 	if (per_cpu(hardirq_stack_ptr, cpu))
 		return 0;
-	return map_irq_stack(cpu);
+	ret = map_irq_stack(cpu);
+	if (ret)
+		return ret;
+
+	per_cpu(irq_count, cpu) = -1;
+	return 0;
 }
diff --git a/mm/slab.c b/mm/slab.c
index 329bfe67f2ca..198e9948a874 100644
--- a/mm/slab.c
+++ b/mm/slab.c
@@ -1481,6 +1481,7 @@ static void store_stackinfo(struct kmem_cache *cachep, unsigned long *addr,
 	*addr++ = caller;
 	*addr++ = smp_processor_id();
 	size -= 3 * sizeof(unsigned long);
+	/*
 	{
 		unsigned long *sptr = &caller;
 		unsigned long svalue;
@@ -1496,6 +1497,7 @@ static void store_stackinfo(struct kmem_cache *cachep, unsigned long *addr,
 		}
 
 	}
+	*/
 	*addr++ = 0x87654321;
 }
 

^ permalink raw reply related	[flat|nested] 60+ messages in thread

* Re: [patch V2 28/29] x86/irq/64: Remap the IRQ stack with guard pages
  2019-04-07  4:56   ` Andy Lutomirski
@ 2019-04-07  6:08     ` Thomas Gleixner
  2019-04-07  9:28       ` Andy Lutomirski
  2019-04-07 22:44     ` Thomas Gleixner
  1 sibling, 1 reply; 60+ messages in thread
From: Thomas Gleixner @ 2019-04-07  6:08 UTC (permalink / raw)
  To: Andy Lutomirski; +Cc: LKML, X86 ML, Josh Poimboeuf, Sean Christopherson

On Sat, 6 Apr 2019, Andy Lutomirski wrote:
> On Fri, Apr 5, 2019 at 8:11 AM Thomas Gleixner <tglx@linutronix.de> wrote:
> >
> > From: Andy Lutomirski <luto@kernel.org>
> >
> > The IRQ stack lives in percpu space, so an IRQ handler that overflows it
> > will overwrite other data structures.
> >
> > Use vmap() to remap the IRQ stack so that it will have the usual guard
> > pages that vmap/vmalloc allocations have. With this the kernel will panic
> > immediately on an IRQ stack overflow.
> 
> The 0day bot noticed that this dies with DEBUG_PAGEALLOC on.  This is
> because the store_stackinfo() function is utter garbage and this patch
> correctly detects just how broken it is.  The attached patch "fixes"
> it.  (It also contains a reliability improvement that should probably
> get folded in, but is otherwise unrelated.)
> 
> A real fix would remove the generic kstack_end() function entirely
> along with __HAVE_ARCH_KSTACK_END and would optionally replace
> store_stackinfo() with something useful.  Josh, do we have a generic
> API to do a little stack walk like this?  Otherwise, I don't think it
> would be the end of the world to just remove the offending code.

Yes, I found the same yesterday before heading out. It's already broken
with the percpu stack because there is no guarantee that the per cpu stack
is thread size aligned. It's guaranteed to be page aligned not more.

I'm all for removing that nonsense, but the real question is whether there
is more code which assumes THREAD_SIZE aligned stacks aside of the thread
stack itself.
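
For context, the generic helper in question looks roughly like this (quoted
from memory, so treat it as a sketch): it declares "end of stack" purely via
THREAD_SIZE alignment, which is exactly the assumption that does not hold for
a stack that is only page aligned:

	#ifndef __HAVE_ARCH_KSTACK_END
	static inline int kstack_end(void *addr)
	{
		/* Only valid for THREAD_SIZE aligned stacks, i.e. task stacks */
		return !(((unsigned long)addr + sizeof(void *) - 1) &
			 (THREAD_SIZE - sizeof(void *)));
	}
	#endif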

Thanks,

	tglx



^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [patch V2 28/29] x86/irq/64: Remap the IRQ stack with guard pages
  2019-04-07  6:08     ` Thomas Gleixner
@ 2019-04-07  9:28       ` Andy Lutomirski
  2019-04-07  9:34         ` Thomas Gleixner
  0 siblings, 1 reply; 60+ messages in thread
From: Andy Lutomirski @ 2019-04-07  9:28 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: Andy Lutomirski, LKML, X86 ML, Josh Poimboeuf, Sean Christopherson



> On Apr 6, 2019, at 11:08 PM, Thomas Gleixner <tglx@linutronix.de> wrote:
> 
>> On Sat, 6 Apr 2019, Andy Lutomirski wrote:
>>> On Fri, Apr 5, 2019 at 8:11 AM Thomas Gleixner <tglx@linutronix.de> wrote:
>>> 
>>> From: Andy Lutomirski <luto@kernel.org>
>>> 
>>> The IRQ stack lives in percpu space, so an IRQ handler that overflows it
>>> will overwrite other data structures.
>>> 
>>> Use vmap() to remap the IRQ stack so that it will have the usual guard
>>> pages that vmap/vmalloc allocations have. With this the kernel will panic
>>> immediately on an IRQ stack overflow.
>> 
>> The 0day bot noticed that this dies with DEBUG_PAGEALLOC on.  This is
>> because the store_stackinfo() function is utter garbage and this patch
>> correctly detects just how broken it is.  The attached patch "fixes"
>> it.  (It also contains a reliability improvement that should probably
>> get folded in, but is otherwise unrelated.)
>> 
>> A real fix would remove the generic kstack_end() function entirely
>> along with __HAVE_ARCH_KSTACK_END and would optionally replace
>> store_stackinfo() with something useful.  Josh, do we have a generic
>> API to do a little stack walk like this?  Otherwise, I don't think it
>> would be the end of the world to just remove the offending code.
> 
> Yes, I found the same yesterday before heading out. It's already broken
> with the percpu stack because there is no guarantee that the per cpu stack
> is thread size aligned. It's guaranteed to be page aligned not more.
> 
> I'm all for removing that nonsense, but the real question is whether there
> is more code which assumes THREAD_SIZE aligned stacks aside of the thread
> stack itself.
> 
> 

Well, any code like this is already busted, since the stacks alignment doesn’t really change with these patches applied.

^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [patch V2 28/29] x86/irq/64: Remap the IRQ stack with guard pages
  2019-04-07  9:28       ` Andy Lutomirski
@ 2019-04-07  9:34         ` Thomas Gleixner
  2019-04-07 14:03           ` Andy Lutomirski
  0 siblings, 1 reply; 60+ messages in thread
From: Thomas Gleixner @ 2019-04-07  9:34 UTC (permalink / raw)
  To: Andy Lutomirski
  Cc: Andy Lutomirski, LKML, X86 ML, Josh Poimboeuf, Sean Christopherson

[-- Attachment #1: Type: text/plain, Size: 2115 bytes --]

On Sun, 7 Apr 2019, Andy Lutomirski wrote:
> > On Apr 6, 2019, at 11:08 PM, Thomas Gleixner <tglx@linutronix.de> wrote:
> > 
> >> On Sat, 6 Apr 2019, Andy Lutomirski wrote:
> >>> On Fri, Apr 5, 2019 at 8:11 AM Thomas Gleixner <tglx@linutronix.de> wrote:
> >>> 
> >>> From: Andy Lutomirski <luto@kernel.org>
> >>> 
> >>> The IRQ stack lives in percpu space, so an IRQ handler that overflows it
> >>> will overwrite other data structures.
> >>> 
> >>> Use vmap() to remap the IRQ stack so that it will have the usual guard
> >>> pages that vmap/vmalloc allocations have. With this the kernel will panic
> >>> immediately on an IRQ stack overflow.
> >> 
> >> The 0day bot noticed that this dies with DEBUG_PAGEALLOC on.  This is
> >> because the store_stackinfo() function is utter garbage and this patch
> >> correctly detects just how broken it is.  The attached patch "fixes"
> >> it.  (It also contains a reliability improvement that should probably
> >> get folded in, but is otherwise unrelated.)
> >> 
> >> A real fix would remove the generic kstack_end() function entirely
> >> along with __HAVE_ARCH_KSTACK_END and would optionally replace
> >> store_stackinfo() with something useful.  Josh, do we have a generic
> >> API to do a little stack walk like this?  Otherwise, I don't think it
> >> would be the end of the world to just remove the offending code.
> > 
> > Yes, I found the same yesterday before heading out. It's already broken
> > with the percpu stack because there is no guarantee that the per cpu stack
> > is thread size aligned. It's guaranteed to be page aligned not more.
> > 
> > I'm all for removing that nonsense, but the real question is whether there
> > is more code which assumes THREAD_SIZE aligned stacks aside of the thread
> > stack itself.
> > 
> > 
> Well, any code like this is already busted, since the stacks alignment
> doesn’t really change with these patches applied.

It does. The existing code has it aligned by chance because the irq stack
is the first entry in the per cpu area. But yes, there is no reason to require
that alignment for irqstacks.

Thanks,

	tglx

^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [patch V2 28/29] x86/irq/64: Remap the IRQ stack with guard pages
  2019-04-07  9:34         ` Thomas Gleixner
@ 2019-04-07 14:03           ` Andy Lutomirski
  0 siblings, 0 replies; 60+ messages in thread
From: Andy Lutomirski @ 2019-04-07 14:03 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: Andy Lutomirski, LKML, X86 ML, Josh Poimboeuf, Sean Christopherson


> On Apr 7, 2019, at 2:34 AM, Thomas Gleixner <tglx@linutronix.de> wrote:
> 
> On Sun, 7 Apr 2019, Andy Lutomirski wrote:
>>> On Apr 6, 2019, at 11:08 PM, Thomas Gleixner <tglx@linutronix.de> wrote:
>>> 
>>>>> On Sat, 6 Apr 2019, Andy Lutomirski wrote:
>>>>> On Fri, Apr 5, 2019 at 8:11 AM Thomas Gleixner <tglx@linutronix.de> wrote:
>>>>> 
>>>>> From: Andy Lutomirski <luto@kernel.org>
>>>>> 
>>>>> The IRQ stack lives in percpu space, so an IRQ handler that overflows it
>>>>> will overwrite other data structures.
>>>>> 
>>>>> Use vmap() to remap the IRQ stack so that it will have the usual guard
>>>>> pages that vmap/vmalloc allocations have. With this the kernel will panic
>>>>> immediately on an IRQ stack overflow.
>>>> 
>>>> The 0day bot noticed that this dies with DEBUG_PAGEALLOC on.  This is
>>>> because the store_stackinfo() function is utter garbage and this patch
>>>> correctly detects just how broken it is.  The attached patch "fixes"
>>>> it.  (It also contains a reliability improvement that should probably
>>>> get folded in, but is otherwise unrelated.)
>>>> 
>>>> A real fix would remove the generic kstack_end() function entirely
>>>> along with __HAVE_ARCH_KSTACK_END and would optionally replace
>>>> store_stackinfo() with something useful.  Josh, do we have a generic
>>>> API to do a little stack walk like this?  Otherwise, I don't think it
>>>> would be the end of the world to just remove the offending code.
>>> 
>>> Yes, I found the same yesterday before heading out. It's already broken
>>> with the percpu stack because there is no guarantee that the per cpu stack
>>> is thread size aligned. It's guaranteed to be page aligned not more.
>>> 
>>> I'm all for removing that nonsense, but the real question is whether there
>>> is more code which assumes THREAD_SIZE aligned stacks aside of the thread
>>> stack itself.
>>> 
>>> 
>> Well, any code like this is already busted, since the stacks alignment
>> doesn’t really change with these patches applied.
> 
> It does. The existing code has it aligned by chance because the irq stack
> is the first entry in the per cpu area. But yes, there is no reason to require
> that alignment for irqstacks.
> 

Isn’t it the first entry in the percpu area (the normal one, not cea)?  Is that aligned beyond PAGE_SIZE?

^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [patch V2 28/29] x86/irq/64: Remap the IRQ stack with guard pages
  2019-04-07  4:56   ` Andy Lutomirski
  2019-04-07  6:08     ` Thomas Gleixner
@ 2019-04-07 22:44     ` Thomas Gleixner
  2019-04-08  2:23       ` Andy Lutomirski
  1 sibling, 1 reply; 60+ messages in thread
From: Thomas Gleixner @ 2019-04-07 22:44 UTC (permalink / raw)
  To: Andy Lutomirski; +Cc: LKML, X86 ML, Josh Poimboeuf, Sean Christopherson

On Sat, 6 Apr 2019, Andy Lutomirski wrote:
> On Fri, Apr 5, 2019 at 8:11 AM Thomas Gleixner <tglx@linutronix.de> wrote:
> >
> > From: Andy Lutomirski <luto@kernel.org>
> >
> > The IRQ stack lives in percpu space, so an IRQ handler that overflows it
> > will overwrite other data structures.
> >
> > Use vmap() to remap the IRQ stack so that it will have the usual guard
> > pages that vmap/vmalloc allocations have. With this the kernel will panic
> > immediately on an IRQ stack overflow.
> 
> The 0day bot noticed that this dies with DEBUG_PAGEALLOC on.  This is
> because the store_stackinfo() function is utter garbage and this patch
> correctly detects just how broken it is.  The attached patch "fixes"
> it.  (It also contains a reliability improvement that should probably
> get folded in, but is otherwise unrelated.)
> 
> A real fix would remove the generic kstack_end() function entirely
> along with __HAVE_ARCH_KSTACK_END and would optionally replace
> store_stackinfo() with something useful.  Josh, do we have a generic
> API to do a little stack walk like this?  Otherwise, I don't think it
> would be the end of the world to just remove the offending code.

Actually we have: save_stack_trace()


^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [patch V2 28/29] x86/irq/64: Remap the IRQ stack with guard pages
  2019-04-07 22:44     ` Thomas Gleixner
@ 2019-04-08  2:23       ` Andy Lutomirski
  2019-04-08  6:46         ` Thomas Gleixner
  0 siblings, 1 reply; 60+ messages in thread
From: Andy Lutomirski @ 2019-04-08  2:23 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: Andy Lutomirski, LKML, X86 ML, Josh Poimboeuf, Sean Christopherson

On Sun, Apr 7, 2019 at 3:44 PM Thomas Gleixner <tglx@linutronix.de> wrote:
>
> On Sat, 6 Apr 2019, Andy Lutomirski wrote:
> > On Fri, Apr 5, 2019 at 8:11 AM Thomas Gleixner <tglx@linutronix.de> wrote:
> > >
> > > From: Andy Lutomirski <luto@kernel.org>
> > >
> > > The IRQ stack lives in percpu space, so an IRQ handler that overflows it
> > > will overwrite other data structures.
> > >
> > > Use vmap() to remap the IRQ stack so that it will have the usual guard
> > > pages that vmap/vmalloc allocations have. With this the kernel will panic
> > > immediately on an IRQ stack overflow.
> >
> > The 0day bot noticed that this dies with DEBUG_PAGEALLOC on.  This is
> > because the store_stackinfo() function is utter garbage and this patch
> > correctly detects just how broken it is.  The attached patch "fixes"
> > it.  (It also contains a reliability improvement that should probably
> > get folded in, but is otherwise unrelated.)
> >
> > A real fix would remove the generic kstack_end() function entirely
> > along with __HAVE_ARCH_KSTACK_END and would optionally replace
> > store_stackinfo() with something useful.  Josh, do we have a generic
> > API to do a little stack walk like this?  Otherwise, I don't think it
> > would be the end of the world to just remove the offending code.
>
> Actually we have: save_stack_trace()
>

Like I did here:

https://git.kernel.org/pub/scm/linux/kernel/git/luto/linux.git/log/?h=WIP.x86/stackguards

(Link is bad right now but will hopefully be okay when you read it.
I'm still fiddling with the other patches in there -- I'd like to kill
kstack_end() entirely.)

^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [patch V2 28/29] x86/irq/64: Remap the IRQ stack with guard pages
  2019-04-08  2:23       ` Andy Lutomirski
@ 2019-04-08  6:46         ` Thomas Gleixner
  2019-04-08 16:18           ` Andy Lutomirski
  0 siblings, 1 reply; 60+ messages in thread
From: Thomas Gleixner @ 2019-04-08  6:46 UTC (permalink / raw)
  To: Andy Lutomirski; +Cc: LKML, X86 ML, Josh Poimboeuf, Sean Christopherson

On Sun, 7 Apr 2019, Andy Lutomirski wrote:
> On Sun, Apr 7, 2019 at 3:44 PM Thomas Gleixner <tglx@linutronix.de> wrote:
> > Actually we have: save_stack_trace()
> >
> 
> Like I did here:
> 
> https://git.kernel.org/pub/scm/linux/kernel/git/luto/linux.git/log/?h=WIP.x86/stackguards

Kinda, but what that code wants is to skip any entry before 'caller'. So we
either add something like save_stack_trace_from() which is trivial on x86
because unwind_start() already has an argument to hand in the start of
stack or we filter out the entries up to 'caller' in that code.

Btw, your patch will explode badly because stack_trace::entries is just a
pointer. It does not provide a storage array :)
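
To make the storage point concrete, a minimal usage sketch of the
stack_trace API as it exists here - the caller has to supply the array that
->entries points to:

	unsigned long entries[16];
	struct stack_trace trace = {
		.entries	= entries,		/* caller provided storage */
		.max_entries	= ARRAY_SIZE(entries),
		.skip		= 0,
	};

	save_stack_trace(&trace);
	/* trace.nr_entries entries of entries[] are now valid */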

Thanks,

	tglx



^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [patch V2 28/29] x86/irq/64: Remap the IRQ stack with guard pages
  2019-04-08  6:46         ` Thomas Gleixner
@ 2019-04-08 16:18           ` Andy Lutomirski
  2019-04-08 16:36             ` Josh Poimboeuf
  2019-04-08 16:44             ` Thomas Gleixner
  0 siblings, 2 replies; 60+ messages in thread
From: Andy Lutomirski @ 2019-04-08 16:18 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: Andy Lutomirski, LKML, X86 ML, Josh Poimboeuf, Sean Christopherson

On Sun, Apr 7, 2019 at 11:46 PM Thomas Gleixner <tglx@linutronix.de> wrote:
>
> On Sun, 7 Apr 2019, Andy Lutomirski wrote:
> > On Sun, Apr 7, 2019 at 3:44 PM Thomas Gleixner <tglx@linutronix.de> wrote:
> > > Actually we have: save_stack_trace()
> > >
> >
> > Like I did here:
> >
> > https://git.kernel.org/pub/scm/linux/kernel/git/luto/linux.git/log/?h=WIP.x86/stackguards
>
> Kinda, but what that code wants is to skip any entry before 'caller'. So we
> either add something like save_stack_trace_from() which is trivial on x86
> because unwind_start() already has an argument to hand in the start of
> stack or we filter out the entries up to 'caller' in that code.
>
>

Whoops!

I could add a save_stack_trace_from() or I could add a "caller"
argument to struct stack_trace.  Any preference as to which looks
better?  The latter seems a little nicer to me.
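
Neither variant exists yet; spelled out, the two shapes being weighed up
would look roughly like this (purely illustrative, not proposed code):

/* Option 1: a new entry point that takes an explicit starting point */
void save_stack_trace_from(unsigned long *stack, struct stack_trace *trace);

/* Option 2: grow the existing struct instead */
struct stack_trace {
	unsigned int nr_entries, max_entries;
	unsigned long *entries;
	int skip;
	unsigned long caller;	/* new: record nothing before this is seen */
};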

^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [patch V2 28/29] x86/irq/64: Remap the IRQ stack with guard pages
  2019-04-08 16:18           ` Andy Lutomirski
@ 2019-04-08 16:36             ` Josh Poimboeuf
  2019-04-08 16:44             ` Thomas Gleixner
  1 sibling, 0 replies; 60+ messages in thread
From: Josh Poimboeuf @ 2019-04-08 16:36 UTC (permalink / raw)
  To: Andy Lutomirski; +Cc: Thomas Gleixner, LKML, X86 ML, Sean Christopherson

On Mon, Apr 08, 2019 at 09:18:00AM -0700, Andy Lutomirski wrote:
> On Sun, Apr 7, 2019 at 11:46 PM Thomas Gleixner <tglx@linutronix.de> wrote:
> >
> > On Sun, 7 Apr 2019, Andy Lutomirski wrote:
> > > On Sun, Apr 7, 2019 at 3:44 PM Thomas Gleixner <tglx@linutronix.de> wrote:
> > > > Actually we have: save_stack_trace()
> > > >
> > >
> > > Like I did here:
> > >
> > > https://git.kernel.org/pub/scm/linux/kernel/git/luto/linux.git/log/?h=WIP.x86/stackguards
> >
> > Kinda, but what that code wants is to skip any entry before 'caller'. So we
> > either add something like save_stack_trace_from() which is trivial on x86
> > because unwind_start() already has an argument to hand in the start of
> > stack or we filter out the entries up to 'caller' in that code.
> >
> >
> 
> Whoops!
> 
> I could add a save_stack_trace_from() or I could add a "caller"
> argument to struct stack_trace.  Any preference as to which looks
> better?  The latter seems a little nicer to me.

The current official way to do that is to use the stack_trace "skip"
field.  That's a hack though because it relies on inlining decisions.

It would be nicer if the skip interface were pointer-based like your
suggestion.
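
Concretely, hiding a caller's own frames today means guessing a frame count,
along these lines (sketch; the hard-coded count is exactly what goes stale
when inlining decisions change):

	unsigned long entries[16];
	struct stack_trace trace = {
		.entries	= entries,
		.max_entries	= ARRAY_SIZE(entries),
		.skip		= 2,	/* drop the first two recorded frames - a guess */
	};

	save_stack_trace(&trace);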

-- 
Josh

^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [patch V2 28/29] x86/irq/64: Remap the IRQ stack with guard pages
  2019-04-08 16:18           ` Andy Lutomirski
  2019-04-08 16:36             ` Josh Poimboeuf
@ 2019-04-08 16:44             ` Thomas Gleixner
  2019-04-08 18:19               ` Thomas Gleixner
  1 sibling, 1 reply; 60+ messages in thread
From: Thomas Gleixner @ 2019-04-08 16:44 UTC (permalink / raw)
  To: Andy Lutomirski; +Cc: LKML, X86 ML, Josh Poimboeuf, Sean Christopherson

On Mon, 8 Apr 2019, Andy Lutomirski wrote:
> On Sun, Apr 7, 2019 at 11:46 PM Thomas Gleixner <tglx@linutronix.de> wrote:
> >
> > On Sun, 7 Apr 2019, Andy Lutomirski wrote:
> > > On Sun, Apr 7, 2019 at 3:44 PM Thomas Gleixner <tglx@linutronix.de> wrote:
> > > > Actually we have: save_stack_trace()
> > > >
> > >
> > > Like I did here:
> > >
> > > https://git.kernel.org/pub/scm/linux/kernel/git/luto/linux.git/log/?h=WIP.x86/stackguards
> >
> > Kinda, but what that code wants is to skip any entry before 'caller'. So we
> > either add something like save_stack_trace_from() which is trivial on x86
> > because unwind_start() already has an argument to hand in the start of
> > stack or we filter out the entries up to 'caller' in that code.
> >
> >
> Whoops!
> 
> I could add a save_stack_trace_from() or I could add a "caller"
> argument to struct stack_trace.  Any preference as to which looks
> better?  The latter seems a little nicer to me.

The whole interface with struct stack_trace sucks. Why are 'skip' and
'max_entries' in that struct and not arguments? I went through all the call
sites and it just makes me shudder. Terminating the trace with ULONG_MAX is
another horrible hack which is then undone at several call sites
again. Before we add more hacky stuff to it, let's clean up that whole mess
first.
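
For reference, the interface being criticized looks roughly like this; the
snippet is reconstructed from memory of the pre-cleanup code, so treat the
details as approximate:

	/* include/linux/stacktrace.h, roughly */
	struct stack_trace {
		unsigned int nr_entries, max_entries;	/* output count / input limit */
		unsigned long *entries;			/* caller-supplied storage */
		int skip;				/* input: frames to drop */
	};

	void save_stack_trace(struct stack_trace *trace);

	/* Arch code terminates the trace if there is room left over ... */
	if (trace->nr_entries < trace->max_entries)
		trace->entries[trace->nr_entries++] = ULONG_MAX;

	/* ... and several users then strip that marker right back off: */
	if (trace->nr_entries != 0 &&
	    trace->entries[trace->nr_entries - 1] == ULONG_MAX)
		trace->nr_entries--;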

Thanks,

	tglx




^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [patch V2 28/29] x86/irq/64: Remap the IRQ stack with guard pages
  2019-04-08 16:44             ` Thomas Gleixner
@ 2019-04-08 18:19               ` Thomas Gleixner
  0 siblings, 0 replies; 60+ messages in thread
From: Thomas Gleixner @ 2019-04-08 18:19 UTC (permalink / raw)
  To: Andy Lutomirski; +Cc: LKML, X86 ML, Josh Poimboeuf, Sean Christopherson

On Mon, 8 Apr 2019, Thomas Gleixner wrote:

> On Mon, 8 Apr 2019, Andy Lutomirski wrote:
> > On Sun, Apr 7, 2019 at 11:46 PM Thomas Gleixner <tglx@linutronix.de> wrote:
> > >
> > > On Sun, 7 Apr 2019, Andy Lutomirski wrote:
> > > > On Sun, Apr 7, 2019 at 3:44 PM Thomas Gleixner <tglx@linutronix.de> wrote:
> > > > > Actually we have: save_stack_trace()
> > > > >
> > > >
> > > > Like I did here:
> > > >
> > > > https://git.kernel.org/pub/scm/linux/kernel/git/luto/linux.git/log/?h=WIP.x86/stackguards
> > >
> > > Kinda, but what that code wants is to skip any entry before 'caller'. So we
> > > either add something like save_stack_trace_from() which is trivial on x86
> > > because unwind_start() already has an argument to hand in the start of
> > > stack or we filter out the entries up to 'caller' in that code.
> > >
> > >
> > Whoops!
> > 
> > I could add a save_stack_trace_from() or I could add a "caller"
> > argument to struct stack_trace.  Any preference as to which looks
> > better?  The latter seems a little nicer to me.

Bah, all that sucks, because 'caller' comes from _RET_IP_ and is not a
pointer to the stack. So this really needs to be a filter which prevents
storing an entry _before_ caller is seen on the stack.
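
A filter along those lines would have to match on return addresses rather
than stack slots, i.e. something like this hypothetical helper (nothing of
the sort exists in the tree today):

	/* Drop everything recorded before 'caller' (a return IP, e.g. _RET_IP_) */
	static void stack_trace_skip_until(struct stack_trace *trace,
					   unsigned long caller)
	{
		unsigned int i;

		for (i = 0; i < trace->nr_entries; i++) {
			if (trace->entries[i] == caller)
				break;
		}
		if (i == trace->nr_entries)
			return;		/* caller not found, keep the trace as-is */

		trace->nr_entries -= i;
		memmove(trace->entries, trace->entries + i,
			trace->nr_entries * sizeof(trace->entries[0]));
	}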

ftrace and kasan do some post-stacktrace filtering as well, just differently.

That's all bonkers. 

       tglx



^ permalink raw reply	[flat|nested] 60+ messages in thread

end of thread, other threads:[~2019-04-08 18:19 UTC | newest]

Thread overview: 60+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2019-04-05 15:06 [patch V2 00/29] x86: Add guard pages to exception and interrupt stacks Thomas Gleixner
2019-04-05 15:06 ` [patch V2 01/29] x86/irq/64: Limit IST stack overflow check to #DB stack Thomas Gleixner
2019-04-05 15:07 ` [patch V2 02/29] x86/dumpstack: Fix off-by-one errors in stack identification Thomas Gleixner
2019-04-05 15:44   ` Sean Christopherson
2019-04-05 15:07 ` [patch V2 03/29] x86/irq/64: Remove a hardcoded irq_stack_union access Thomas Gleixner
2019-04-05 16:37   ` Sean Christopherson
2019-04-05 16:38     ` Sean Christopherson
2019-04-05 17:18     ` Josh Poimboeuf
2019-04-05 17:47       ` Thomas Gleixner
2019-04-05 15:07 ` [patch V2 04/29] x86/irq/64: Sanitize the top/bottom confusion Thomas Gleixner
2019-04-05 16:54   ` Sean Christopherson
2019-04-05 15:07 ` [patch V2 05/29] x86/idt: Remove unused macro SISTG Thomas Gleixner
2019-04-05 15:07 ` [patch V2 06/29] x86/exceptions: Remove unused stack defines on 32bit Thomas Gleixner
2019-04-05 15:07 ` [patch V2 07/29] x86/exceptions: Make IST index zero based Thomas Gleixner
2019-04-05 18:59   ` Sean Christopherson
2019-04-05 15:07 ` [patch V2 08/29] x86/cpu_entry_area: Cleanup setup functions Thomas Gleixner
2019-04-05 19:25   ` Sean Christopherson
2019-04-05 15:07 ` [patch V2 09/29] x86/exceptions: Add structs for exception stacks Thomas Gleixner
2019-04-05 20:48   ` Sean Christopherson
2019-04-05 20:50     ` Sean Christopherson
2019-04-05 21:00     ` Thomas Gleixner
2019-04-05 15:07 ` [patch V2 10/29] x86/cpu_entry_area: Prepare for IST guard pages Thomas Gleixner
2019-04-05 15:07 ` [patch V2 11/29] x86/cpu_entry_area: Provide exception stack accessor Thomas Gleixner
2019-04-05 15:07 ` [patch V2 12/29] x86/traps: Use cpu_entry_area instead of orig_ist Thomas Gleixner
2019-04-05 15:07 ` [patch V2 13/29] x86/irq/64: Use cpu entry area " Thomas Gleixner
2019-04-05 15:07 ` [patch V2 14/29] x86/dumpstack/64: Use cpu_entry_area " Thomas Gleixner
2019-04-05 15:07 ` [patch V2 15/29] x86/cpu: Prepare TSS.IST setup for guard pages Thomas Gleixner
2019-04-05 15:07 ` [patch V2 16/29] x86/cpu: Remove orig_ist array Thomas Gleixner
2019-04-05 15:07 ` [patch V2 17/29] x86/exceptions: Disconnect IST index and stack order Thomas Gleixner
2019-04-05 21:57   ` Josh Poimboeuf
2019-04-05 22:00     ` Thomas Gleixner
2019-04-05 15:07 ` [patch V2 18/29] x86/exceptions: Enable IST guard pages Thomas Gleixner
2019-04-05 15:07 ` [patch V2 19/29] x86/exceptions: Split debug IST stack Thomas Gleixner
2019-04-05 20:55   ` Sean Christopherson
2019-04-05 21:01     ` Thomas Gleixner
2019-04-05 15:07 ` [patch V2 20/29] x86/dumpstack/64: Speedup in_exception_stack() Thomas Gleixner
2019-04-05 21:55   ` Josh Poimboeuf
2019-04-05 21:56     ` Thomas Gleixner
2019-04-05 15:07 ` [patch V2 21/29] x86/irq/32: Define IRQ_STACK_SIZE Thomas Gleixner
2019-04-05 15:07 ` [patch V2 22/29] x86/irq/32: Make irq stack a character array Thomas Gleixner
2019-04-05 15:07 ` [patch V2 23/29] x86/irq/32: Rename hard/softirq_stack to hard/softirq_stack_ptr Thomas Gleixner
2019-04-05 15:07 ` [patch V2 24/29] x86/irq/64: Rename irq_stack_ptr to hardirq_stack_ptr Thomas Gleixner
2019-04-05 15:07 ` [patch V2 25/29] x86/irq/32: Invoke irq_ctx_init() from init_IRQ() Thomas Gleixner
2019-04-05 15:27   ` Juergen Gross
2019-04-05 15:07 ` [patch V2 26/29] x86/irq/32: Handle irq stack allocation failure proper Thomas Gleixner
2019-04-05 15:07 ` [patch V2 27/29] x86/irq/64: Split the IRQ stack into its own pages Thomas Gleixner
2019-04-05 15:07 ` [patch V2 28/29] x86/irq/64: Remap the IRQ stack with guard pages Thomas Gleixner
2019-04-07  4:56   ` Andy Lutomirski
2019-04-07  6:08     ` Thomas Gleixner
2019-04-07  9:28       ` Andy Lutomirski
2019-04-07  9:34         ` Thomas Gleixner
2019-04-07 14:03           ` Andy Lutomirski
2019-04-07 22:44     ` Thomas Gleixner
2019-04-08  2:23       ` Andy Lutomirski
2019-04-08  6:46         ` Thomas Gleixner
2019-04-08 16:18           ` Andy Lutomirski
2019-04-08 16:36             ` Josh Poimboeuf
2019-04-08 16:44             ` Thomas Gleixner
2019-04-08 18:19               ` Thomas Gleixner
2019-04-05 15:07 ` [patch V2 29/29] x86/irq/64: Remove stack overflow debug code Thomas Gleixner

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.