* [patch V3 00/32] x86: Add guard pages to exception and interrupt stacks
@ 2019-04-14 15:59 Thomas Gleixner
  2019-04-14 15:59 ` [patch V3 01/32] mm/slab: Fix broken stack trace storage Thomas Gleixner
                   ` (31 more replies)
  0 siblings, 32 replies; 89+ messages in thread
From: Thomas Gleixner @ 2019-04-14 15:59 UTC (permalink / raw)
  To: LKML; +Cc: x86, Andy Lutomirski, Josh Poimboeuf, Sean Christopherson

Hi!

This is an updated version of the guard page series:

 V1:   https://lkml.kernel.org/r/20190331214020.836098943@linutronix.de
 V2:   https://lkml.kernel.org/r/20190405150658.237064784@linutronix.de

Changes vs. V2:

  - Fixed the broken stack trace storage in slab

  - Adjusted the guard page ordering and fixed the top macro (Sean)

  - Fixed an off-by-one in the debug stack check (Sean)

  - Renamed the ISTACK_ prefix to ESTACK_ (Josh)

  - Collected Acked/Reviewed-by tags

The lot is also available from tip:

   git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip.git WIP.x86/stackguards

     310a7f5b0b42 ("x86/irq/64: Remove stack overflow debug code")

Thanks,

        tglx

8<-------------
 Documentation/x86/kernel-stacks       |   13 +++-
 arch/x86/Kconfig                      |    2 
 arch/x86/entry/entry_64.S             |   16 ++---
 arch/x86/include/asm/cpu_entry_area.h |   69 +++++++++++++++++++++--
 arch/x86/include/asm/debugreg.h       |    2 
 arch/x86/include/asm/irq.h            |    6 --
 arch/x86/include/asm/page_32_types.h  |    8 +-
 arch/x86/include/asm/page_64_types.h  |   16 ++---
 arch/x86/include/asm/processor.h      |   43 +++++---------
 arch/x86/include/asm/smp.h            |    2 
 arch/x86/include/asm/stackprotector.h |    6 +-
 arch/x86/include/asm/stacktrace.h     |    2 
 arch/x86/kernel/asm-offsets_64.c      |    4 +
 arch/x86/kernel/cpu/common.c          |   60 +++-----------------
 arch/x86/kernel/dumpstack_32.c        |    8 +-
 arch/x86/kernel/dumpstack_64.c        |   99 +++++++++++++++++++++++-----------
 arch/x86/kernel/head_64.S             |    2 
 arch/x86/kernel/idt.c                 |   19 +++---
 arch/x86/kernel/irq_32.c              |   41 +++++++-------
 arch/x86/kernel/irq_64.c              |   89 +++++++++++++++---------------
 arch/x86/kernel/irqinit.c             |    4 -
 arch/x86/kernel/nmi.c                 |   20 ++++++
 arch/x86/kernel/setup_percpu.c        |    5 -
 arch/x86/kernel/smpboot.c             |   15 ++++-
 arch/x86/kernel/vmlinux.lds.S         |    7 +-
 arch/x86/mm/cpu_entry_area.c          |   64 +++++++++++++++------
 arch/x86/mm/fault.c                   |    3 -
 arch/x86/tools/relocs.c               |    2 
 arch/x86/xen/smp_pv.c                 |    4 +
 arch/x86/xen/xen-head.S               |   10 +--
 drivers/xen/events/events_base.c      |    1 
 mm/slab.c                             |   30 ++++------
 32 files changed, 382 insertions(+), 290 deletions(-)




* [patch V3 01/32] mm/slab: Fix broken stack trace storage
  2019-04-14 15:59 [patch V3 00/32] x86: Add guard pages to exception and interrupt stacks Thomas Gleixner
@ 2019-04-14 15:59 ` Thomas Gleixner
  2019-04-14 16:16     ` Andy Lutomirski
  2019-04-14 15:59 ` [patch V3 02/32] x86/irq/64: Limit IST stack overflow check to #DB stack Thomas Gleixner
                   ` (30 subsequent siblings)
  31 siblings, 1 reply; 89+ messages in thread
From: Thomas Gleixner @ 2019-04-14 15:59 UTC (permalink / raw)
  To: LKML
  Cc: x86, Andy Lutomirski, Josh Poimboeuf, Sean Christopherson,
	Andrew Morton, Pekka Enberg, linux-mm

kstack_end() is broken on interrupt stacks as they are not guaranteed to be
THREAD_SIZE sized or THREAD_SIZE aligned.

Use the generic stack trace interface (save_stack_trace()) instead. Remove
the pointless pointer increment at the end of the function while at it.
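
[ Illustration only, not part of the patch: a minimal user-space sketch
  of the debug redzone layout which the new code writes, with glibc's
  backtrace(3) standing in for save_stack_trace(). All names and sizes
  in the sketch are made up. ]

  #include <execinfo.h>
  #include <stdio.h>

  #define SLOTS 16

  /* Layout: start marker, caller, cpu id, trace entries, end marker */
  static void store_stackinfo_demo(unsigned long *addr, int size,
                                   unsigned long caller)
  {
          if (size < 5)
                  return;

          *addr++ = 0x12345678;           /* start marker */
          *addr++ = caller;
          *addr++ = 0;                    /* smp_processor_id() stand-in */

          /* backtrace(3) plays the role of save_stack_trace() */
          int nr = backtrace((void **)addr, size - 4);

          addr += nr;
          *addr = 0x87654321;             /* end marker */
  }

  int main(void)
  {
          unsigned long buf[SLOTS] = { 0 };
          int i;

          store_stackinfo_demo(buf, SLOTS, (unsigned long)main);
          for (i = 0; i < SLOTS; i++)
                  printf("%2d: %#lx\n", i, buf[i]);
          return 0;
  }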

Fixes: 98eb235b7feb ("[PATCH] page unmapping debug") - History tree
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Pekka Enberg <penberg@kernel.org>
Cc: linux-mm@kvack.org
---
 mm/slab.c |   28 ++++++++++++----------------
 1 file changed, 12 insertions(+), 16 deletions(-)

--- a/mm/slab.c
+++ b/mm/slab.c
@@ -1470,33 +1470,29 @@ static bool is_debug_pagealloc_cache(str
 static void store_stackinfo(struct kmem_cache *cachep, unsigned long *addr,
 			    unsigned long caller)
 {
-	int size = cachep->object_size;
+	int size = cachep->object_size / sizeof(unsigned long);
 
 	addr = (unsigned long *)&((char *)addr)[obj_offset(cachep)];
 
-	if (size < 5 * sizeof(unsigned long))
+	if (size < 5)
 		return;
 
 	*addr++ = 0x12345678;
 	*addr++ = caller;
 	*addr++ = smp_processor_id();
-	size -= 3 * sizeof(unsigned long);
+#ifdef CONFIG_STACKTRACE
 	{
-		unsigned long *sptr = &caller;
-		unsigned long svalue;
-
-		while (!kstack_end(sptr)) {
-			svalue = *sptr++;
-			if (kernel_text_address(svalue)) {
-				*addr++ = svalue;
-				size -= sizeof(unsigned long);
-				if (size <= sizeof(unsigned long))
-					break;
-			}
-		}
+		struct stack_trace trace = {
+			.max_entries	= size - 4,
+			.entries	= addr,
+			.skip		= 3,
+		};
 
+		save_stack_trace(&trace);
+		addr += trace.nr_entries;
 	}
-	*addr++ = 0x87654321;
+#endif
+	*addr = 0x87654321;
 }
 
 static void slab_kernel_map(struct kmem_cache *cachep, void *objp,




* [patch V3 02/32] x86/irq/64: Limit IST stack overflow check to #DB stack
  2019-04-14 15:59 [patch V3 00/32] x86: Add guard pages to exception and interrupt stacks Thomas Gleixner
  2019-04-14 15:59 ` [patch V3 01/32] mm/slab: Fix broken stack trace storage Thomas Gleixner
@ 2019-04-14 15:59 ` Thomas Gleixner
  2019-04-17 14:02   ` [tip:x86/irq] " tip-bot for Thomas Gleixner
  2019-04-14 15:59 ` [patch V3 03/32] x86/dumpstack: Fix off-by-one errors in stack identification Thomas Gleixner
                   ` (29 subsequent siblings)
  31 siblings, 1 reply; 89+ messages in thread
From: Thomas Gleixner @ 2019-04-14 15:59 UTC (permalink / raw)
  To: LKML
  Cc: x86, Andy Lutomirski, Josh Poimboeuf, Sean Christopherson,
	Mitsuo Hayasaka

commit 37fe6a42b343 ("x86: Check stack overflow in detail") added a broad
check which considers the full exception stack area as valid.

That's wrong in two aspects:

 1) It does not check the individual IST stacks one by one

 2) #DF, NMI and #MCE do not enable interrupts, which means that a
    regular device interrupt cannot happen in their context. In fact, if a
    device interrupt hits one of those IST stacks, that's a bug because some
    code path enabled interrupts while handling the exception.

Limit the check to the #DB stack and consider all other IST stacks as
'overflow' or invalid.
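
[ Illustration only: a stand-alone sketch of the resulting range check.
  'stack_end' is the highest address of the #DB IST stack as programmed
  into the IST entry; the function name and test values are made up. ]

  #include <stdio.h>

  #define STACK_MARGIN    128

  /* The stack grows down from stack_end for stksz bytes. The margin
   * leaves room for the interrupt handler itself. */
  static int on_debug_stack(unsigned long sp, unsigned long stack_end,
                            unsigned long stksz)
  {
          unsigned long lowest_valid = stack_end - stksz + STACK_MARGIN;

          return sp >= lowest_valid && sp <= stack_end;
  }

  int main(void)
  {
          /* 8K #DB stack with its highest address at 0x10000 */
          printf("%d %d\n", on_debug_stack(0xff00, 0x10000, 8192),
                 on_debug_stack(0xe000, 0x10000, 8192));
          return 0;
  }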

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Mitsuo Hayasaka <mitsuo.hayasaka.hu@hitachi.com>
---
 arch/x86/kernel/irq_64.c |   19 ++++++++++++++-----
 1 file changed, 14 insertions(+), 5 deletions(-)

--- a/arch/x86/kernel/irq_64.c
+++ b/arch/x86/kernel/irq_64.c
@@ -26,9 +26,18 @@ int sysctl_panic_on_stackoverflow;
 /*
  * Probabilistic stack overflow check:
  *
- * Only check the stack in process context, because everything else
- * runs on the big interrupt stacks. Checking reliably is too expensive,
- * so we just check from interrupts.
+ * Regular device interrupts can enter on the following stacks:
+ *
+ * - User stack
+ *
+ * - Kernel task stack
+ *
+ * - Interrupt stack if a device driver reenables interrupts
+ *   which should only happen in really old drivers.
+ *
+ * - Debug IST stack
+ *
+ * All other contexts are invalid.
  */
 static inline void stack_overflow_check(struct pt_regs *regs)
 {
@@ -53,8 +62,8 @@ static inline void stack_overflow_check(
 		return;
 
 	oist = this_cpu_ptr(&orig_ist);
-	estack_top = (u64)oist->ist[0] - EXCEPTION_STKSZ + STACK_TOP_MARGIN;
-	estack_bottom = (u64)oist->ist[N_EXCEPTION_STACKS - 1];
+	estack_bottom = (u64)oist->ist[DEBUG_STACK];
+	estack_top = estack_bottom - DEBUG_STKSZ + STACK_TOP_MARGIN;
 	if (regs->sp >= estack_top && regs->sp <= estack_bottom)
 		return;
 




* [patch V3 03/32] x86/dumpstack: Fix off-by-one errors in stack identification
  2019-04-14 15:59 [patch V3 00/32] x86: Add guard pages to exception and interrupt stacks Thomas Gleixner
  2019-04-14 15:59 ` [patch V3 01/32] mm/slab: Fix broken stack trace storage Thomas Gleixner
  2019-04-14 15:59 ` [patch V3 02/32] x86/irq/64: Limit IST stack overflow check to #DB stack Thomas Gleixner
@ 2019-04-14 15:59 ` Thomas Gleixner
  2019-04-17 14:03   ` [tip:x86/irq] " tip-bot for Andy Lutomirski
  2019-04-14 15:59 ` [patch V3 04/32] x86/irq/64: Remove a hardcoded irq_stack_union access Thomas Gleixner
                   ` (28 subsequent siblings)
  31 siblings, 1 reply; 89+ messages in thread
From: Thomas Gleixner @ 2019-04-14 15:59 UTC (permalink / raw)
  To: LKML; +Cc: x86, Andy Lutomirski, Josh Poimboeuf, Sean Christopherson

From: Andy Lutomirski <luto@kernel.org>

The get_stack_info() function is off-by-one when checking whether an
address is on an IRQ stack or an IST stack.  This prevents an overflowed
IRQ or IST stack from being dumped properly.

[ tglx: Do the same for 32-bit ]
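
[ Illustration only: the boundary behaviour of the two predicates. A
  stack which is completely full has its stack pointer at 'begin'; the
  old check wrongly classified that as off-stack. Values are made up. ]

  #include <stdio.h>

  int main(void)
  {
          unsigned long begin = 0x1000, end = 0x2000;
          unsigned long sp = begin;               /* fully used stack */

          int old = !(sp <= begin || sp > end);   /* 0: rejected */
          int new = !(sp < begin  || sp > end);   /* 1: accepted */

          printf("old=%d new=%d\n", old, new);
          return 0;
  }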

Signed-off-by: Andy Lutomirski <luto@kernel.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Sean Christopherson <sean.j.christopherson@intel.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>

---
 arch/x86/kernel/dumpstack_32.c |    4 ++--
 arch/x86/kernel/dumpstack_64.c |    4 ++--
 2 files changed, 4 insertions(+), 4 deletions(-)

--- a/arch/x86/kernel/dumpstack_32.c
+++ b/arch/x86/kernel/dumpstack_32.c
@@ -41,7 +41,7 @@ static bool in_hardirq_stack(unsigned lo
 	 * This is a software stack, so 'end' can be a valid stack pointer.
 	 * It just means the stack is empty.
 	 */
-	if (stack <= begin || stack > end)
+	if (stack < begin || stack > end)
 		return false;
 
 	info->type	= STACK_TYPE_IRQ;
@@ -66,7 +66,7 @@ static bool in_softirq_stack(unsigned lo
 	 * This is a software stack, so 'end' can be a valid stack pointer.
 	 * It just means the stack is empty.
 	 */
-	if (stack <= begin || stack > end)
+	if (stack < begin || stack > end)
 		return false;
 
 	info->type	= STACK_TYPE_SOFTIRQ;
--- a/arch/x86/kernel/dumpstack_64.c
+++ b/arch/x86/kernel/dumpstack_64.c
@@ -65,7 +65,7 @@ static bool in_exception_stack(unsigned
 		begin = end - (exception_stack_sizes[k] / sizeof(long));
 		regs  = (struct pt_regs *)end - 1;
 
-		if (stack <= begin || stack >= end)
+		if (stack < begin || stack >= end)
 			continue;
 
 		info->type	= STACK_TYPE_EXCEPTION + k;
@@ -88,7 +88,7 @@ static bool in_irq_stack(unsigned long *
 	 * This is a software stack, so 'end' can be a valid stack pointer.
 	 * It just means the stack is empty.
 	 */
-	if (stack <= begin || stack > end)
+	if (stack < begin || stack >= end)
 		return false;
 
 	info->type	= STACK_TYPE_IRQ;




* [patch V3 04/32] x86/irq/64: Remove a hardcoded irq_stack_union access
  2019-04-14 15:59 [patch V3 00/32] x86: Add guard pages to exception and interrupt stacks Thomas Gleixner
                   ` (2 preceding siblings ...)
  2019-04-14 15:59 ` [patch V3 03/32] x86/dumpstack: Fix off-by-one errors in stack identification Thomas Gleixner
@ 2019-04-14 15:59 ` Thomas Gleixner
  2019-04-17 14:03   ` [tip:x86/irq] " tip-bot for Andy Lutomirski
  2019-04-14 15:59 ` [patch V3 05/32] x86/irq/64: Sanitize the top/bottom confusion Thomas Gleixner
                   ` (27 subsequent siblings)
  31 siblings, 1 reply; 89+ messages in thread
From: Thomas Gleixner @ 2019-04-14 15:59 UTC (permalink / raw)
  To: LKML; +Cc: x86, Andy Lutomirski, Josh Poimboeuf, Sean Christopherson

From: Andy Lutomirski <luto@kernel.org>

stack_overflow_check() is using both irq_stack_ptr and irq_stack_union to
find the IRQ stack. That's going to break when vmapped irq stacks are
introduced.

Change it to just use irq_stack_ptr.

Signed-off-by: Andy Lutomirski <luto@kernel.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Sean Christopherson <sean.j.christopherson@intel.com>

---
 arch/x86/kernel/irq_64.c |    3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

--- a/arch/x86/kernel/irq_64.c
+++ b/arch/x86/kernel/irq_64.c
@@ -55,9 +55,8 @@ static inline void stack_overflow_check(
 	    regs->sp <= curbase + THREAD_SIZE)
 		return;
 
-	irq_stack_top = (u64)this_cpu_ptr(irq_stack_union.irq_stack) +
-			STACK_TOP_MARGIN;
 	irq_stack_bottom = (u64)__this_cpu_read(irq_stack_ptr);
+	irq_stack_top = irq_stack_bottom - IRQ_STACK_SIZE + STACK_TOP_MARGIN;
 	if (regs->sp >= irq_stack_top && regs->sp <= irq_stack_bottom)
 		return;
 




* [patch V3 05/32] x86/irq/64: Sanitize the top/bottom confusion
  2019-04-14 15:59 [patch V3 00/32] x86: Add guard pages to exception and interrupt stacks Thomas Gleixner
                   ` (3 preceding siblings ...)
  2019-04-14 15:59 ` [patch V3 04/32] x86/irq/64: Remove a hardcoded irq_stack_union access Thomas Gleixner
@ 2019-04-14 15:59 ` Thomas Gleixner
  2019-04-17 14:04   ` [tip:x86/irq] " tip-bot for Thomas Gleixner
  2019-04-14 15:59 ` [patch V3 06/32] x86/idt: Remove unused macro SISTG Thomas Gleixner
                   ` (26 subsequent siblings)
  31 siblings, 1 reply; 89+ messages in thread
From: Thomas Gleixner @ 2019-04-14 15:59 UTC (permalink / raw)
  To: LKML; +Cc: x86, Andy Lutomirski, Josh Poimboeuf, Sean Christopherson

On x86, stacks grow from top to bottom, but the stack overflow check uses
the terms the other way round, which is just confusing. Clean it up and
sanitize the warning string a bit.
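
[ Illustration only: the naming convention after this patch in a
  stand-alone sketch. 'top' is the highest address of the stack,
  'bottom' the lowest; the function name and values are made up. ]

  #include <stdio.h>

  static int in_stack_range(unsigned long sp, unsigned long top,
                            unsigned long size, unsigned long margin)
  {
          unsigned long bottom = top - size + margin;

          return sp >= bottom && sp <= top;
  }

  int main(void)
  {
          printf("%d\n", in_stack_range(0x9000, 0xa000, 0x4000, 128));
          return 0;
  }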

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Sean Christopherson <sean.j.christopherson@intel.com>
---
 arch/x86/kernel/irq_64.c |   22 +++++++++++-----------
 1 file changed, 11 insertions(+), 11 deletions(-)

--- a/arch/x86/kernel/irq_64.c
+++ b/arch/x86/kernel/irq_64.c
@@ -42,7 +42,7 @@ int sysctl_panic_on_stackoverflow;
 static inline void stack_overflow_check(struct pt_regs *regs)
 {
 #ifdef CONFIG_DEBUG_STACKOVERFLOW
-#define STACK_TOP_MARGIN	128
+#define STACK_MARGIN	128
 	struct orig_ist *oist;
 	u64 irq_stack_top, irq_stack_bottom;
 	u64 estack_top, estack_bottom;
@@ -51,25 +51,25 @@ static inline void stack_overflow_check(
 	if (user_mode(regs))
 		return;
 
-	if (regs->sp >= curbase + sizeof(struct pt_regs) + STACK_TOP_MARGIN &&
+	if (regs->sp >= curbase + sizeof(struct pt_regs) + STACK_MARGIN &&
 	    regs->sp <= curbase + THREAD_SIZE)
 		return;
 
-	irq_stack_bottom = (u64)__this_cpu_read(irq_stack_ptr);
-	irq_stack_top = irq_stack_bottom - IRQ_STACK_SIZE + STACK_TOP_MARGIN;
-	if (regs->sp >= irq_stack_top && regs->sp <= irq_stack_bottom)
+	irq_stack_top = (u64)__this_cpu_read(irq_stack_ptr);
+	irq_stack_bottom = irq_stack_top - IRQ_STACK_SIZE + STACK_MARGIN;
+	if (regs->sp >= irq_stack_bottom && regs->sp <= irq_stack_top)
 		return;
 
 	oist = this_cpu_ptr(&orig_ist);
-	estack_bottom = (u64)oist->ist[DEBUG_STACK];
-	estack_top = estack_bottom - DEBUG_STKSZ + STACK_TOP_MARGIN;
-	if (regs->sp >= estack_top && regs->sp <= estack_bottom)
+	estack_top = (u64)oist->ist[DEBUG_STACK];
+	estack_bottom = estack_top - DEBUG_STKSZ + STACK_MARGIN;
+	if (regs->sp >= estack_bottom && regs->sp <= estack_top)
 		return;
 
-	WARN_ONCE(1, "do_IRQ(): %s has overflown the kernel stack (cur:%Lx,sp:%lx,irq stk top-bottom:%Lx-%Lx,exception stk top-bottom:%Lx-%Lx,ip:%pF)\n",
+	WARN_ONCE(1, "do_IRQ(): %s has overflown the kernel stack (cur:%Lx,sp:%lx, irq stack:%Lx-%Lx, exception stack: %Lx-%Lx, ip:%pF)\n",
 		current->comm, curbase, regs->sp,
-		irq_stack_top, irq_stack_bottom,
-		estack_top, estack_bottom, (void *)regs->ip);
+		irq_stack_bottom, irq_stack_top,
+		estack_bottom, estack_top, (void *)regs->ip);
 
 	if (sysctl_panic_on_stackoverflow)
 		panic("low stack detected by irq handler - check messages\n");




* [patch V3 06/32] x86/idt: Remove unused macro SISTG
  2019-04-14 15:59 [patch V3 00/32] x86: Add guard pages to exception and interrupt stacks Thomas Gleixner
                   ` (4 preceding siblings ...)
  2019-04-14 15:59 ` [patch V3 05/32] x86/irq/64: Sanitize the top/bottom confusion Thomas Gleixner
@ 2019-04-14 15:59 ` Thomas Gleixner
  2019-04-17 14:05   ` [tip:x86/irq] " tip-bot for Thomas Gleixner
  2019-04-14 15:59 ` [patch V3 07/32] x86/64: Remove stale CURRENT_MASK Thomas Gleixner
                   ` (25 subsequent siblings)
  31 siblings, 1 reply; 89+ messages in thread
From: Thomas Gleixner @ 2019-04-14 15:59 UTC (permalink / raw)
  To: LKML; +Cc: x86, Andy Lutomirski, Josh Poimboeuf, Sean Christopherson

commit d8ba61ba58c8 ("x86/entry/64: Don't use IST entry for #BP stack")
removed the last user but left the macro around.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Andy Lutomirski <luto@kernel.org>
---
 arch/x86/kernel/idt.c |    4 ----
 1 file changed, 4 deletions(-)

--- a/arch/x86/kernel/idt.c
+++ b/arch/x86/kernel/idt.c
@@ -45,10 +45,6 @@ struct idt_data {
 #define ISTG(_vector, _addr, _ist)			\
 	G(_vector, _addr, _ist, GATE_INTERRUPT, DPL0, __KERNEL_CS)
 
-/* System interrupt gate with interrupt stack */
-#define SISTG(_vector, _addr, _ist)			\
-	G(_vector, _addr, _ist, GATE_INTERRUPT, DPL3, __KERNEL_CS)
-
 /* Task gate */
 #define TSKG(_vector, _gdt)				\
 	G(_vector, NULL, DEFAULT_STACK, GATE_TASK, DPL0, _gdt << 3)




* [patch V3 07/32] x86/64: Remove stale CURRENT_MASK
  2019-04-14 15:59 [patch V3 00/32] x86: Add guard pages to exception and interrupt stacks Thomas Gleixner
                   ` (5 preceding siblings ...)
  2019-04-14 15:59 ` [patch V3 06/32] x86/idt: Remove unused macro SISTG Thomas Gleixner
@ 2019-04-14 15:59 ` Thomas Gleixner
  2019-04-17 14:06   ` [tip:x86/irq] " tip-bot for Thomas Gleixner
  2019-04-14 15:59 ` [patch V3 08/32] x86/exceptions: Remove unused stack defines on 32bit Thomas Gleixner
                   ` (24 subsequent siblings)
  31 siblings, 1 reply; 89+ messages in thread
From: Thomas Gleixner @ 2019-04-14 15:59 UTC (permalink / raw)
  To: LKML; +Cc: x86, Andy Lutomirski, Josh Poimboeuf, Sean Christopherson

Nothing uses that and before people get the wrong ideas, get rid of it.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
---
 arch/x86/include/asm/page_64_types.h |    1 -
 1 file changed, 1 deletion(-)

--- a/arch/x86/include/asm/page_64_types.h
+++ b/arch/x86/include/asm/page_64_types.h
@@ -14,7 +14,6 @@
 
 #define THREAD_SIZE_ORDER	(2 + KASAN_STACK_ORDER)
 #define THREAD_SIZE  (PAGE_SIZE << THREAD_SIZE_ORDER)
-#define CURRENT_MASK (~(THREAD_SIZE - 1))
 
 #define EXCEPTION_STACK_ORDER (0 + KASAN_STACK_ORDER)
 #define EXCEPTION_STKSZ (PAGE_SIZE << EXCEPTION_STACK_ORDER)




* [patch V3 08/32] x86/exceptions: Remove unused stack defines on 32bit
  2019-04-14 15:59 [patch V3 00/32] x86: Add guard pages to exception and interrupt stacks Thomas Gleixner
                   ` (6 preceding siblings ...)
  2019-04-14 15:59 ` [patch V3 07/32] x86/64: Remove stale CURRENT_MASK Thomas Gleixner
@ 2019-04-14 15:59 ` Thomas Gleixner
  2019-04-17 14:06   ` [tip:x86/irq] " tip-bot for Thomas Gleixner
  2019-04-14 15:59 ` [patch V3 09/32] x86/exceptions: Make IST index zero based Thomas Gleixner
                   ` (23 subsequent siblings)
  31 siblings, 1 reply; 89+ messages in thread
From: Thomas Gleixner @ 2019-04-14 15:59 UTC (permalink / raw)
  To: LKML; +Cc: x86, Andy Lutomirski, Josh Poimboeuf, Sean Christopherson

Nothing requires those for 32bit builds.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
---
 arch/x86/include/asm/page_32_types.h |    6 +-----
 1 file changed, 1 insertion(+), 5 deletions(-)

--- a/arch/x86/include/asm/page_32_types.h
+++ b/arch/x86/include/asm/page_32_types.h
@@ -22,11 +22,7 @@
 #define THREAD_SIZE_ORDER	1
 #define THREAD_SIZE		(PAGE_SIZE << THREAD_SIZE_ORDER)
 
-#define DOUBLEFAULT_STACK 1
-#define NMI_STACK 0
-#define DEBUG_STACK 0
-#define MCE_STACK 0
-#define N_EXCEPTION_STACKS 1
+#define N_EXCEPTION_STACKS	1
 
 #ifdef CONFIG_X86_PAE
 /*




* [patch V3 09/32] x86/exceptions: Make IST index zero based
  2019-04-14 15:59 [patch V3 00/32] x86: Add guard pages to exception and interrupt stacks Thomas Gleixner
                   ` (7 preceding siblings ...)
  2019-04-14 15:59 ` [patch V3 08/32] x86/exceptions: Remove unused stack defines on 32bit Thomas Gleixner
@ 2019-04-14 15:59 ` Thomas Gleixner
  2019-04-17 14:07   ` [tip:x86/irq] " tip-bot for Thomas Gleixner
  2019-04-14 15:59 ` [patch V3 10/32] x86/cpu_entry_area: Cleanup setup functions Thomas Gleixner
                   ` (22 subsequent siblings)
  31 siblings, 1 reply; 89+ messages in thread
From: Thomas Gleixner @ 2019-04-14 15:59 UTC (permalink / raw)
  To: LKML; +Cc: x86, Andy Lutomirski, Josh Poimboeuf, Sean Christopherson

The defines for the exception stack (IST) array in the TSS are using the
SDM convention IST1 - IST7. That causes all sorts of code to subtract 1 for
array indices related to IST. That's confusing at best and does not provide
any value.

Make the indices zero based and fix up the usage sites. The only code which
needs to adjust the 0 based index is the interrupt descriptor setup which
needs to add 1 now.
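
[ Illustration only: zero based software indices vs. the 1-based
  hardware IST field, with made-up sizes. Only the descriptor setup
  converts; IST_FIELD() is a name invented for the sketch. ]

  #include <stdio.h>

  enum { ESTACK_DF, ESTACK_NMI, ESTACK_DB, ESTACK_MCE, N_EXCEPTION_STACKS };

  static const unsigned long stack_size[N_EXCEPTION_STACKS] = {
          [ESTACK_DF]     = 4096,         /* EXCEPTION_STKSZ */
          [ESTACK_NMI]    = 4096,
          [ESTACK_DB]     = 8192,         /* DEBUG_STKSZ */
          [ESTACK_MCE]    = 4096,
  };

  /* The only place converting to the 1-based hardware convention */
  #define IST_FIELD(index)        ((index) + 1)

  int main(void)
  {
          printf("#DB: size %lu, IST field %d\n",
                 stack_size[ESTACK_DB], IST_FIELD(ESTACK_DB));
          return 0;
  }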

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Sean Christopherson <sean.j.christopherson@intel.com>
---
 Documentation/x86/kernel-stacks      |    8 ++++----
 arch/x86/entry/entry_64.S            |    4 ++--
 arch/x86/include/asm/page_64_types.h |   13 ++++++++-----
 arch/x86/kernel/cpu/common.c         |    4 ++--
 arch/x86/kernel/dumpstack_64.c       |   14 +++++++-------
 arch/x86/kernel/idt.c                |   15 +++++++++------
 arch/x86/kernel/irq_64.c             |    2 +-
 arch/x86/mm/fault.c                  |    2 +-
 8 files changed, 34 insertions(+), 28 deletions(-)

--- a/Documentation/x86/kernel-stacks
+++ b/Documentation/x86/kernel-stacks
@@ -59,7 +59,7 @@ If that assumption is ever broken then t
 
 The currently assigned IST stacks are :-
 
-* DOUBLEFAULT_STACK.  EXCEPTION_STKSZ (PAGE_SIZE).
+* ESTACK_DF.  EXCEPTION_STKSZ (PAGE_SIZE).
 
   Used for interrupt 8 - Double Fault Exception (#DF).
 
@@ -68,7 +68,7 @@ The currently assigned IST stacks are :-
   Using a separate stack allows the kernel to recover from it well enough
   in many cases to still output an oops.
 
-* NMI_STACK.  EXCEPTION_STKSZ (PAGE_SIZE).
+* ESTACK_NMI.  EXCEPTION_STKSZ (PAGE_SIZE).
 
   Used for non-maskable interrupts (NMI).
 
@@ -76,7 +76,7 @@ The currently assigned IST stacks are :-
   middle of switching stacks.  Using IST for NMI events avoids making
   assumptions about the previous state of the kernel stack.
 
-* DEBUG_STACK.  DEBUG_STKSZ
+* ESTACK_DB.  DEBUG_STKSZ
 
   Used for hardware debug interrupts (interrupt 1) and for software
   debug interrupts (INT3).
@@ -86,7 +86,7 @@ The currently assigned IST stacks are :-
   avoids making assumptions about the previous state of the kernel
   stack.
 
-* MCE_STACK.  EXCEPTION_STKSZ (PAGE_SIZE).
+* ESTACK_MCE.  EXCEPTION_STKSZ (PAGE_SIZE).
 
   Used for interrupt 18 - Machine Check Exception (#MC).
 
--- a/arch/x86/entry/entry_64.S
+++ b/arch/x86/entry/entry_64.S
@@ -841,7 +841,7 @@ apicinterrupt IRQ_WORK_VECTOR			irq_work
 /*
  * Exception entry points.
  */
-#define CPU_TSS_IST(x) PER_CPU_VAR(cpu_tss_rw) + (TSS_ist + ((x) - 1) * 8)
+#define CPU_TSS_IST(x) PER_CPU_VAR(cpu_tss_rw) + (TSS_ist + (x) * 8)
 
 /**
  * idtentry - Generate an IDT entry stub
@@ -1129,7 +1129,7 @@ apicinterrupt3 HYPERV_STIMER0_VECTOR \
 	hv_stimer0_callback_vector hv_stimer0_vector_handler
 #endif /* CONFIG_HYPERV */
 
-idtentry debug			do_debug		has_error_code=0	paranoid=1 shift_ist=DEBUG_STACK
+idtentry debug			do_debug		has_error_code=0	paranoid=1 shift_ist=ESTACK_DB
 idtentry int3			do_int3			has_error_code=0
 idtentry stack_segment		do_stack_segment	has_error_code=1
 
--- a/arch/x86/include/asm/page_64_types.h
+++ b/arch/x86/include/asm/page_64_types.h
@@ -24,11 +24,14 @@
 #define IRQ_STACK_ORDER (2 + KASAN_STACK_ORDER)
 #define IRQ_STACK_SIZE (PAGE_SIZE << IRQ_STACK_ORDER)
 
-#define DOUBLEFAULT_STACK 1
-#define NMI_STACK 2
-#define DEBUG_STACK 3
-#define MCE_STACK 4
-#define N_EXCEPTION_STACKS 4  /* hw limit: 7 */
+/*
+ * The index for the tss.ist[] array. The hardware limit is 7 entries.
+ */
+#define	ESTACK_DF		0
+#define	ESTACK_NMI		1
+#define	ESTACK_DB		2
+#define	ESTACK_MCE		3
+#define	N_EXCEPTION_STACKS	4
 
 /*
  * Set __PAGE_OFFSET to the most negative possible address +
--- a/arch/x86/kernel/cpu/common.c
+++ b/arch/x86/kernel/cpu/common.c
@@ -516,7 +516,7 @@ DEFINE_PER_CPU(struct cpu_entry_area *,
  */
 static const unsigned int exception_stack_sizes[N_EXCEPTION_STACKS] = {
 	  [0 ... N_EXCEPTION_STACKS - 1]	= EXCEPTION_STKSZ,
-	  [DEBUG_STACK - 1]			= DEBUG_STKSZ
+	  [ESTACK_DB]				= DEBUG_STKSZ
 };
 #endif
 
@@ -1760,7 +1760,7 @@ void cpu_init(void)
 			estacks += exception_stack_sizes[v];
 			oist->ist[v] = t->x86_tss.ist[v] =
 					(unsigned long)estacks;
-			if (v == DEBUG_STACK-1)
+			if (v == ESTACK_DB)
 				per_cpu(debug_stack_addr, cpu) = (unsigned long)estacks;
 		}
 	}
--- a/arch/x86/kernel/dumpstack_64.c
+++ b/arch/x86/kernel/dumpstack_64.c
@@ -18,16 +18,16 @@
 
 #include <asm/stacktrace.h>
 
-static char *exception_stack_names[N_EXCEPTION_STACKS] = {
-		[ DOUBLEFAULT_STACK-1	]	= "#DF",
-		[ NMI_STACK-1		]	= "NMI",
-		[ DEBUG_STACK-1		]	= "#DB",
-		[ MCE_STACK-1		]	= "#MC",
+static const char *exception_stack_names[N_EXCEPTION_STACKS] = {
+		[ ESTACK_DF	]	= "#DF",
+		[ ESTACK_NMI	]	= "NMI",
+		[ ESTACK_DB	]	= "#DB",
+		[ ESTACK_MCE	]	= "#MC",
 };
 
-static unsigned long exception_stack_sizes[N_EXCEPTION_STACKS] = {
+static const unsigned long exception_stack_sizes[N_EXCEPTION_STACKS] = {
 	[0 ... N_EXCEPTION_STACKS - 1]		= EXCEPTION_STKSZ,
-	[DEBUG_STACK - 1]			= DEBUG_STKSZ
+	[ESTACK_DB]				= DEBUG_STKSZ
 };
 
 const char *stack_type_name(enum stack_type type)
--- a/arch/x86/kernel/idt.c
+++ b/arch/x86/kernel/idt.c
@@ -41,9 +41,12 @@ struct idt_data {
 #define SYSG(_vector, _addr)				\
 	G(_vector, _addr, DEFAULT_STACK, GATE_INTERRUPT, DPL3, __KERNEL_CS)
 
-/* Interrupt gate with interrupt stack */
+/*
+ * Interrupt gate with interrupt stack. The _ist index is the index in
+ * the tss.ist[] array, but for the descriptor it needs to start at 1.
+ */
 #define ISTG(_vector, _addr, _ist)			\
-	G(_vector, _addr, _ist, GATE_INTERRUPT, DPL0, __KERNEL_CS)
+	G(_vector, _addr, _ist + 1, GATE_INTERRUPT, DPL0, __KERNEL_CS)
 
 /* Task gate */
 #define TSKG(_vector, _gdt)				\
@@ -180,11 +183,11 @@ gate_desc debug_idt_table[IDT_ENTRIES] _
  * cpu_init() when the TSS has been initialized.
  */
 static const __initconst struct idt_data ist_idts[] = {
-	ISTG(X86_TRAP_DB,	debug,		DEBUG_STACK),
-	ISTG(X86_TRAP_NMI,	nmi,		NMI_STACK),
-	ISTG(X86_TRAP_DF,	double_fault,	DOUBLEFAULT_STACK),
+	ISTG(X86_TRAP_DB,	debug,		ESTACK_DB),
+	ISTG(X86_TRAP_NMI,	nmi,		ESTACK_NMI),
+	ISTG(X86_TRAP_DF,	double_fault,	ESTACK_DF),
 #ifdef CONFIG_X86_MCE
-	ISTG(X86_TRAP_MC,	&machine_check,	MCE_STACK),
+	ISTG(X86_TRAP_MC,	&machine_check,	ESTACK_MCE),
 #endif
 };
 
--- a/arch/x86/kernel/irq_64.c
+++ b/arch/x86/kernel/irq_64.c
@@ -61,7 +61,7 @@ static inline void stack_overflow_check(
 		return;
 
 	oist = this_cpu_ptr(&orig_ist);
-	estack_top = (u64)oist->ist[DEBUG_STACK];
+	estack_top = (u64)oist->ist[ESTACK_DB];
 	estack_bottom = estack_top - DEBUG_STKSZ + STACK_MARGIN;
 	if (regs->sp >= estack_bottom && regs->sp <= estack_top)
 		return;
--- a/arch/x86/mm/fault.c
+++ b/arch/x86/mm/fault.c
@@ -793,7 +793,7 @@ no_context(struct pt_regs *regs, unsigne
 	if (is_vmalloc_addr((void *)address) &&
 	    (((unsigned long)tsk->stack - 1 - address < PAGE_SIZE) ||
 	     address - ((unsigned long)tsk->stack + THREAD_SIZE) < PAGE_SIZE)) {
-		unsigned long stack = this_cpu_read(orig_ist.ist[DOUBLEFAULT_STACK]) - sizeof(void *);
+		unsigned long stack = this_cpu_read(orig_ist.ist[ESTACK_DF]) - sizeof(void *);
 		/*
 		 * We're likely to be running with very little stack space
 		 * left.  It's plausible that we'd hit this condition but




* [patch V3 10/32] x86/cpu_entry_area: Cleanup setup functions
  2019-04-14 15:59 [patch V3 00/32] x86: Add guard pages to exception and interrupt stacks Thomas Gleixner
                   ` (8 preceding siblings ...)
  2019-04-14 15:59 ` [patch V3 09/32] x86/exceptions: Make IST index zero based Thomas Gleixner
@ 2019-04-14 15:59 ` Thomas Gleixner
  2019-04-17 14:08   ` [tip:x86/irq] " tip-bot for Thomas Gleixner
  2019-04-14 15:59 ` [patch V3 11/32] x86/exceptions: Add structs for exception stacks Thomas Gleixner
                   ` (21 subsequent siblings)
  31 siblings, 1 reply; 89+ messages in thread
From: Thomas Gleixner @ 2019-04-14 15:59 UTC (permalink / raw)
  To: LKML; +Cc: x86, Andy Lutomirski, Josh Poimboeuf, Sean Christopherson

No point in retrieving the entry area pointer over and over. Do it once and
use unsigned int for 'cpu' everywhere.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Sean Christopherson <sean.j.christopherson@intel.com>
---
 arch/x86/mm/cpu_entry_area.c |   19 +++++++++----------
 1 file changed, 9 insertions(+), 10 deletions(-)

--- a/arch/x86/mm/cpu_entry_area.c
+++ b/arch/x86/mm/cpu_entry_area.c
@@ -52,10 +52,10 @@ cea_map_percpu_pages(void *cea_vaddr, vo
 		cea_set_pte(cea_vaddr, per_cpu_ptr_to_phys(ptr), prot);
 }
 
-static void __init percpu_setup_debug_store(int cpu)
+static void __init percpu_setup_debug_store(unsigned int cpu)
 {
 #ifdef CONFIG_CPU_SUP_INTEL
-	int npages;
+	unsigned int npages;
 	void *cea;
 
 	if (boot_cpu_data.x86_vendor != X86_VENDOR_INTEL)
@@ -79,8 +79,9 @@ static void __init percpu_setup_debug_st
 }
 
 /* Setup the fixmap mappings only once per-processor */
-static void __init setup_cpu_entry_area(int cpu)
+static void __init setup_cpu_entry_area(unsigned int cpu)
 {
+	struct cpu_entry_area *cea = get_cpu_entry_area(cpu);
 #ifdef CONFIG_X86_64
 	/* On 64-bit systems, we use a read-only fixmap GDT and TSS. */
 	pgprot_t gdt_prot = PAGE_KERNEL_RO;
@@ -101,10 +102,9 @@ static void __init setup_cpu_entry_area(
 	pgprot_t tss_prot = PAGE_KERNEL;
 #endif
 
-	cea_set_pte(&get_cpu_entry_area(cpu)->gdt, get_cpu_gdt_paddr(cpu),
-		    gdt_prot);
+	cea_set_pte(&cea->gdt, get_cpu_gdt_paddr(cpu), gdt_prot);
 
-	cea_map_percpu_pages(&get_cpu_entry_area(cpu)->entry_stack_page,
+	cea_map_percpu_pages(&cea->entry_stack_page,
 			     per_cpu_ptr(&entry_stack_storage, cpu), 1,
 			     PAGE_KERNEL);
 
@@ -128,19 +128,18 @@ static void __init setup_cpu_entry_area(
 	BUILD_BUG_ON((offsetof(struct tss_struct, x86_tss) ^
 		      offsetofend(struct tss_struct, x86_tss)) & PAGE_MASK);
 	BUILD_BUG_ON(sizeof(struct tss_struct) % PAGE_SIZE != 0);
-	cea_map_percpu_pages(&get_cpu_entry_area(cpu)->tss,
-			     &per_cpu(cpu_tss_rw, cpu),
+	cea_map_percpu_pages(&cea->tss, &per_cpu(cpu_tss_rw, cpu),
 			     sizeof(struct tss_struct) / PAGE_SIZE, tss_prot);
 
 #ifdef CONFIG_X86_32
-	per_cpu(cpu_entry_area, cpu) = get_cpu_entry_area(cpu);
+	per_cpu(cpu_entry_area, cpu) = cea;
 #endif
 
 #ifdef CONFIG_X86_64
 	BUILD_BUG_ON(sizeof(exception_stacks) % PAGE_SIZE != 0);
 	BUILD_BUG_ON(sizeof(exception_stacks) !=
 		     sizeof(((struct cpu_entry_area *)0)->exception_stacks));
-	cea_map_percpu_pages(&get_cpu_entry_area(cpu)->exception_stacks,
+	cea_map_percpu_pages(&cea->exception_stacks,
 			     &per_cpu(exception_stacks, cpu),
 			     sizeof(exception_stacks) / PAGE_SIZE, PAGE_KERNEL);
 #endif




* [patch V3 11/32] x86/exceptions: Add structs for exception stacks
  2019-04-14 15:59 [patch V3 00/32] x86: Add guard pages to exception and interrupt stacks Thomas Gleixner
                   ` (9 preceding siblings ...)
  2019-04-14 15:59 ` [patch V3 10/32] x86/cpu_entry_area: Cleanup setup functions Thomas Gleixner
@ 2019-04-14 15:59 ` Thomas Gleixner
  2019-04-16 18:20   ` Sean Christopherson
  2019-04-17 14:08   ` [tip:x86/irq] " tip-bot for Thomas Gleixner
  2019-04-14 15:59 ` [patch V3 12/32] x86/cpu_entry_area: Prepare for IST guard pages Thomas Gleixner
                   ` (20 subsequent siblings)
  31 siblings, 2 replies; 89+ messages in thread
From: Thomas Gleixner @ 2019-04-14 15:59 UTC (permalink / raw)
  To: LKML; +Cc: x86, Andy Lutomirski, Josh Poimboeuf, Sean Christopherson

At the moment everything assumes a full linear mapping of the various
exception stacks. Adding guard pages to the cpu entry area mapping of the
exception stacks will break that assumption.

As a preparatory step convert both the real storage and the effective
mapping in the cpu entry area from character arrays to structures.

To ensure that both structs have the same ordering and the same individual
stack sizes, fill in the members via a common macro. The guard size is the
only difference between the two resulting structures. For now both have
guard size 0 until the preparation of all usage sites is done.

Provide a couple of helper macros which are used in the following
conversions.
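
[ Illustration only: the macro trick in a compile-and-run form. Both
  structs share ordering and stack sizes; only the guard size differs.
  Zero-length arrays are a GNU C extension, as used in the kernel; the
  member names and sizes here are shortened for the sketch. ]

  #include <stdio.h>
  #include <stddef.h>

  #define STKSZ 4096

  #define MEMBERS(guardsize)              \
          char DF_guard[guardsize];       \
          char DF[STKSZ];                 \
          char NMI_guard[guardsize];      \
          char NMI[STKSZ];                \
          char top_guard[guardsize];

  struct storage { MEMBERS(0) };          /* linear backing store */
  struct mapping { MEMBERS(4096) };       /* cea mapping with guards */

  int main(void)
  {
          printf("storage: %zu bytes, NMI at offset %zu\n",
                 sizeof(struct storage), offsetof(struct storage, NMI));
          printf("mapping: %zu bytes, NMI at offset %zu\n",
                 sizeof(struct mapping), offsetof(struct mapping, NMI));
          return 0;
  }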

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
---
V2 -> V3: Move the guards below the stacks and add a top guard. Fix the
      	  TOP() macro.
---
 arch/x86/include/asm/cpu_entry_area.h |   52 ++++++++++++++++++++++++++++++----
 arch/x86/kernel/cpu/common.c          |    2 -
 arch/x86/mm/cpu_entry_area.c          |    8 +----
 3 files changed, 51 insertions(+), 11 deletions(-)

--- a/arch/x86/include/asm/cpu_entry_area.h
+++ b/arch/x86/include/asm/cpu_entry_area.h
@@ -7,6 +7,51 @@
 #include <asm/processor.h>
 #include <asm/intel_ds.h>
 
+#ifdef CONFIG_X86_64
+
+/* Macro to enforce the same ordering and stack sizes */
+#define ESTACKS_MEMBERS(guardsize)		\
+	char	DF_stack_guard[guardsize];	\
+	char	DF_stack[EXCEPTION_STKSZ];	\
+	char	NMI_stack_guard[guardsize];	\
+	char	NMI_stack[EXCEPTION_STKSZ];	\
+	char	DB_stack_guard[guardsize];	\
+	char	DB_stack[DEBUG_STKSZ];		\
+	char	MCE_stack_guard[guardsize];	\
+	char	MCE_stack[EXCEPTION_STKSZ];	\
+	char	IST_top_guard[guardsize];	\
+
+/* The exception stacks linear storage. No guard pages required */
+struct exception_stacks {
+	ESTACKS_MEMBERS(0)
+};
+
+/*
+ * The effective cpu entry area mapping with guard pages. Guard size is
+ * zero until the code which makes assumptions about linear mappings is
+ * cleaned up.
+ */
+struct cea_exception_stacks {
+	ESTACKS_MEMBERS(0)
+};
+
+#define CEA_ESTACK_SIZE(st)					\
+	sizeof(((struct cea_exception_stacks *)0)->st## _stack)
+
+#define CEA_ESTACK_BOT(ceastp, st)				\
+	((unsigned long)&(ceastp)->st## _stack)
+
+#define CEA_ESTACK_TOP(ceastp, st)				\
+	(CEA_ESTACK_BOT(ceastp, st) + CEA_ESTACK_SIZE(st))
+
+#define CEA_ESTACK_OFFS(st)					\
+	offsetof(struct cea_exception_stacks, st## _stack)
+
+#define CEA_ESTACK_PAGES					\
+	(sizeof(struct cea_exception_stacks) / PAGE_SIZE)
+
+#endif
+
 /*
  * cpu_entry_area is a percpu region that contains things needed by the CPU
  * and early entry/exit code.  Real types aren't used for all fields here
@@ -32,12 +77,9 @@ struct cpu_entry_area {
 
 #ifdef CONFIG_X86_64
 	/*
-	 * Exception stacks used for IST entries.
-	 *
-	 * In the future, this should have a separate slot for each stack
-	 * with guard pages between them.
+	 * Exception stacks used for IST entries with guard pages.
 	 */
-	char exception_stacks[(N_EXCEPTION_STACKS - 1) * EXCEPTION_STKSZ + DEBUG_STKSZ];
+	struct cea_exception_stacks estacks;
 #endif
 #ifdef CONFIG_CPU_SUP_INTEL
 	/*
--- a/arch/x86/kernel/cpu/common.c
+++ b/arch/x86/kernel/cpu/common.c
@@ -1754,7 +1754,7 @@ void cpu_init(void)
 	 * set up and load the per-CPU TSS
 	 */
 	if (!oist->ist[0]) {
-		char *estacks = get_cpu_entry_area(cpu)->exception_stacks;
+		char *estacks = (char *)&get_cpu_entry_area(cpu)->estacks;
 
 		for (v = 0; v < N_EXCEPTION_STACKS; v++) {
 			estacks += exception_stack_sizes[v];
--- a/arch/x86/mm/cpu_entry_area.c
+++ b/arch/x86/mm/cpu_entry_area.c
@@ -13,8 +13,7 @@
 static DEFINE_PER_CPU_PAGE_ALIGNED(struct entry_stack_page, entry_stack_storage);
 
 #ifdef CONFIG_X86_64
-static DEFINE_PER_CPU_PAGE_ALIGNED(char, exception_stacks
-	[(N_EXCEPTION_STACKS - 1) * EXCEPTION_STKSZ + DEBUG_STKSZ]);
+static DEFINE_PER_CPU_PAGE_ALIGNED(struct exception_stacks, exception_stacks);
 #endif
 
 struct cpu_entry_area *get_cpu_entry_area(int cpu)
@@ -138,9 +137,8 @@ static void __init setup_cpu_entry_area(
 #ifdef CONFIG_X86_64
 	BUILD_BUG_ON(sizeof(exception_stacks) % PAGE_SIZE != 0);
 	BUILD_BUG_ON(sizeof(exception_stacks) !=
-		     sizeof(((struct cpu_entry_area *)0)->exception_stacks));
-	cea_map_percpu_pages(&cea->exception_stacks,
-			     &per_cpu(exception_stacks, cpu),
+		     sizeof(((struct cpu_entry_area *)0)->estacks));
+	cea_map_percpu_pages(&cea->estacks, &per_cpu(exception_stacks, cpu),
 			     sizeof(exception_stacks) / PAGE_SIZE, PAGE_KERNEL);
 #endif
 	percpu_setup_debug_store(cpu);




* [patch V3 12/32] x86/cpu_entry_area: Prepare for IST guard pages
  2019-04-14 15:59 [patch V3 00/32] x86: Add guard pages to exception and interrupt stacks Thomas Gleixner
                   ` (10 preceding siblings ...)
  2019-04-14 15:59 ` [patch V3 11/32] x86/exceptions: Add structs for exception stacks Thomas Gleixner
@ 2019-04-14 15:59 ` Thomas Gleixner
  2019-04-17 14:09   ` [tip:x86/irq] " tip-bot for Thomas Gleixner
  2019-04-14 15:59 ` [patch V3 13/32] x86/cpu_entry_area: Provide exception stack accessor Thomas Gleixner
                   ` (19 subsequent siblings)
  31 siblings, 1 reply; 89+ messages in thread
From: Thomas Gleixner @ 2019-04-14 15:59 UTC (permalink / raw)
  To: LKML; +Cc: x86, Andy Lutomirski, Josh Poimboeuf, Sean Christopherson

To allow guard pages between the IST stacks, each stack needs to be mapped
individually.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
---
 arch/x86/mm/cpu_entry_area.c |   37 ++++++++++++++++++++++++++++++-------
 1 file changed, 30 insertions(+), 7 deletions(-)

--- a/arch/x86/mm/cpu_entry_area.c
+++ b/arch/x86/mm/cpu_entry_area.c
@@ -77,6 +77,34 @@ static void __init percpu_setup_debug_st
 #endif
 }
 
+#ifdef CONFIG_X86_64
+
+#define cea_map_stack(name) do {					\
+	npages = sizeof(estacks->name## _stack) / PAGE_SIZE;		\
+	cea_map_percpu_pages(cea->estacks.name## _stack,		\
+			estacks->name## _stack, npages, PAGE_KERNEL);	\
+	} while (0)
+
+static void __init percpu_setup_exception_stacks(unsigned int cpu)
+{
+	struct exception_stacks *estacks = per_cpu_ptr(&exception_stacks, cpu);
+	struct cpu_entry_area *cea = get_cpu_entry_area(cpu);
+	unsigned int npages;
+
+	BUILD_BUG_ON(sizeof(exception_stacks) % PAGE_SIZE != 0);
+	/*
+	 * The exception stack mappings in the per cpu area are protected
+	 * by guard pages so each stack must be mapped separately.
+	 */
+	cea_map_stack(DF);
+	cea_map_stack(NMI);
+	cea_map_stack(DB);
+	cea_map_stack(MCE);
+}
+#else
+static inline void percpu_setup_exception_stacks(unsigned int cpu) {}
+#endif
+
 /* Setup the fixmap mappings only once per-processor */
 static void __init setup_cpu_entry_area(unsigned int cpu)
 {
@@ -134,13 +162,8 @@ static void __init setup_cpu_entry_area(
 	per_cpu(cpu_entry_area, cpu) = cea;
 #endif
 
-#ifdef CONFIG_X86_64
-	BUILD_BUG_ON(sizeof(exception_stacks) % PAGE_SIZE != 0);
-	BUILD_BUG_ON(sizeof(exception_stacks) !=
-		     sizeof(((struct cpu_entry_area *)0)->estacks));
-	cea_map_percpu_pages(&cea->estacks, &per_cpu(exception_stacks, cpu),
-			     sizeof(exception_stacks) / PAGE_SIZE, PAGE_KERNEL);
-#endif
+	percpu_setup_exception_stacks(cpu);
+
 	percpu_setup_debug_store(cpu);
 }
 




* [patch V3 13/32] x86/cpu_entry_area: Provide exception stack accessor
  2019-04-14 15:59 [patch V3 00/32] x86: Add guard pages to exception and interrupt stacks Thomas Gleixner
                   ` (11 preceding siblings ...)
  2019-04-14 15:59 ` [patch V3 12/32] x86/cpu_entry_area: Prepare for IST guard pages Thomas Gleixner
@ 2019-04-14 15:59 ` Thomas Gleixner
  2019-04-17 14:10   ` [tip:x86/irq] " tip-bot for Thomas Gleixner
  2019-04-14 15:59 ` [patch V3 14/32] x86/traps: Use cpu_entry_area instead of orig_ist Thomas Gleixner
                   ` (18 subsequent siblings)
  31 siblings, 1 reply; 89+ messages in thread
From: Thomas Gleixner @ 2019-04-14 15:59 UTC (permalink / raw)
  To: LKML; +Cc: x86, Andy Lutomirski, Josh Poimboeuf, Sean Christopherson

Store a pointer to the per cpu entry area exception stack mappings to allow
fast retrieval.

Required for converting various places from using the shadow IST array to
directly doing address calculations on the actual mapping address.
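
[ Illustration only: the address arithmetic behind the accessor, on a
  toy struct with made-up sizes. The top of a stack is its base address
  in the mapping plus its size, exactly as CEA_ESTACK_TOP() computes. ]

  #include <stdio.h>
  #include <stddef.h>

  struct stacks {
          char DF_stack[4096];
          char DB_stack[8192];
  };

  #define ESTACK_BOT(p, st)  ((unsigned long)&(p)->st ## _stack)
  #define ESTACK_SIZE(st)    sizeof(((struct stacks *)0)->st ## _stack)
  #define ESTACK_TOP(p, st)  (ESTACK_BOT(p, st) + ESTACK_SIZE(st))

  int main(void)
  {
          static struct stacks s;

          printf("DB stack: %#lx - %#lx\n",
                 ESTACK_BOT(&s, DB), ESTACK_TOP(&s, DB));
          return 0;
  }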

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
---
 arch/x86/include/asm/cpu_entry_area.h |    4 ++++
 arch/x86/mm/cpu_entry_area.c          |    4 ++++
 2 files changed, 8 insertions(+)

--- a/arch/x86/include/asm/cpu_entry_area.h
+++ b/arch/x86/include/asm/cpu_entry_area.h
@@ -98,6 +98,7 @@ struct cpu_entry_area {
 #define CPU_ENTRY_AREA_TOT_SIZE	(CPU_ENTRY_AREA_SIZE * NR_CPUS)
 
 DECLARE_PER_CPU(struct cpu_entry_area *, cpu_entry_area);
+DECLARE_PER_CPU(struct cea_exception_stacks *, cea_exception_stacks);
 
 extern void setup_cpu_entry_areas(void);
 extern void cea_set_pte(void *cea_vaddr, phys_addr_t pa, pgprot_t flags);
@@ -117,4 +118,7 @@ static inline struct entry_stack *cpu_en
 	return &get_cpu_entry_area(cpu)->entry_stack_page.stack;
 }
 
+#define __this_cpu_ist_top_va(name)					\
+	CEA_ESTACK_TOP(__this_cpu_read(cea_exception_stacks), name)
+
 #endif
--- a/arch/x86/mm/cpu_entry_area.c
+++ b/arch/x86/mm/cpu_entry_area.c
@@ -14,6 +14,7 @@ static DEFINE_PER_CPU_PAGE_ALIGNED(struc
 
 #ifdef CONFIG_X86_64
 static DEFINE_PER_CPU_PAGE_ALIGNED(struct exception_stacks, exception_stacks);
+DEFINE_PER_CPU(struct cea_exception_stacks*, cea_exception_stacks);
 #endif
 
 struct cpu_entry_area *get_cpu_entry_area(int cpu)
@@ -92,6 +93,9 @@ static void __init percpu_setup_exceptio
 	unsigned int npages;
 
 	BUILD_BUG_ON(sizeof(exception_stacks) % PAGE_SIZE != 0);
+
+	per_cpu(cea_exception_stacks, cpu) = &cea->estacks;
+
 	/*
 	 * The exception stack mappings in the per cpu area are protected
 	 * by guard pages so each stack must be mapped separately.




* [patch V3 14/32] x86/traps: Use cpu_entry_area instead of orig_ist
  2019-04-14 15:59 [patch V3 00/32] x86: Add guard pages to exception and interrupt stacks Thomas Gleixner
                   ` (12 preceding siblings ...)
  2019-04-14 15:59 ` [patch V3 13/32] x86/cpu_entry_area: Provide exception stack accessor Thomas Gleixner
@ 2019-04-14 15:59 ` Thomas Gleixner
  2019-04-17 14:10   ` [tip:x86/irq] " tip-bot for Thomas Gleixner
  2019-04-14 15:59 ` [patch V3 15/32] x86/irq/64: Use cpu entry area " Thomas Gleixner
                   ` (17 subsequent siblings)
  31 siblings, 1 reply; 89+ messages in thread
From: Thomas Gleixner @ 2019-04-14 15:59 UTC (permalink / raw)
  To: LKML; +Cc: x86, Andy Lutomirski, Josh Poimboeuf, Sean Christopherson

The orig_ist[] array is a shadow copy of the IST array in the TSS. The
reason why it exists is that older kernels used two TSS variants with
different pointers into the debug stack. orig_ist[] contains the real
starting points.

There is no point anymore to do so because the same information can be
retrieved using the base address of the cpu entry area mapping and the
offsets of the various exception stacks.

No functional change. Preparation for removing orig_ist.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
---
 arch/x86/mm/fault.c |    3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

--- a/arch/x86/mm/fault.c
+++ b/arch/x86/mm/fault.c
@@ -28,6 +28,7 @@
 #include <asm/mmu_context.h>		/* vma_pkey()			*/
 #include <asm/efi.h>			/* efi_recover_from_page_fault()*/
 #include <asm/desc.h>			/* store_idt(), ...		*/
+#include <asm/cpu_entry_area.h>		/* exception stack		*/
 
 #define CREATE_TRACE_POINTS
 #include <asm/trace/exceptions.h>
@@ -793,7 +794,7 @@ no_context(struct pt_regs *regs, unsigne
 	if (is_vmalloc_addr((void *)address) &&
 	    (((unsigned long)tsk->stack - 1 - address < PAGE_SIZE) ||
 	     address - ((unsigned long)tsk->stack + THREAD_SIZE) < PAGE_SIZE)) {
-		unsigned long stack = this_cpu_read(orig_ist.ist[ESTACK_DF]) - sizeof(void *);
+		unsigned long stack = __this_cpu_ist_top_va(DF) - sizeof(void *);
 		/*
 		 * We're likely to be running with very little stack space
 		 * left.  It's plausible that we'd hit this condition but




* [patch V3 15/32] x86/irq/64: Use cpu entry area instead of orig_ist
  2019-04-14 15:59 [patch V3 00/32] x86: Add guard pages to exception and interrupt stacks Thomas Gleixner
                   ` (13 preceding siblings ...)
  2019-04-14 15:59 ` [patch V3 14/32] x86/traps: Use cpu_entry_area instead of orig_ist Thomas Gleixner
@ 2019-04-14 15:59 ` Thomas Gleixner
  2019-04-17 14:11   ` [tip:x86/irq] " tip-bot for Thomas Gleixner
  2019-04-14 15:59 ` [patch V3 16/32] x86/dumpstack/64: Use cpu_entry_area " Thomas Gleixner
                   ` (16 subsequent siblings)
  31 siblings, 1 reply; 89+ messages in thread
From: Thomas Gleixner @ 2019-04-14 15:59 UTC (permalink / raw)
  To: LKML; +Cc: x86, Andy Lutomirski, Josh Poimboeuf, Sean Christopherson

The orig_ist[] array is a shadow copy of the IST array in the TSS. The
reason why it exists is that older kernels used two TSS variants with
different pointers into the debug stack. orig_ist[] contains the real
starting points.

There is no point anymore to do so because the same information can be
retrieved using the base address of the cpu entry area mapping and the
offsets of the various exception stacks.

No functional change. Preparation for removing orig_ist.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
---
 arch/x86/kernel/irq_64.c |   13 +++++++------
 1 file changed, 7 insertions(+), 6 deletions(-)

--- a/arch/x86/kernel/irq_64.c
+++ b/arch/x86/kernel/irq_64.c
@@ -18,6 +18,8 @@
 #include <linux/uaccess.h>
 #include <linux/smp.h>
 #include <linux/sched/task_stack.h>
+
+#include <asm/cpu_entry_area.h>
 #include <asm/io_apic.h>
 #include <asm/apic.h>
 
@@ -43,10 +45,9 @@ static inline void stack_overflow_check(
 {
 #ifdef CONFIG_DEBUG_STACKOVERFLOW
 #define STACK_MARGIN	128
-	struct orig_ist *oist;
-	u64 irq_stack_top, irq_stack_bottom;
-	u64 estack_top, estack_bottom;
+	u64 irq_stack_top, irq_stack_bottom, estack_top, estack_bottom;
 	u64 curbase = (u64)task_stack_page(current);
+	struct cea_exception_stacks *estacks;
 
 	if (user_mode(regs))
 		return;
@@ -60,9 +61,9 @@ static inline void stack_overflow_check(
 	if (regs->sp >= irq_stack_bottom && regs->sp <= irq_stack_top)
 		return;
 
-	oist = this_cpu_ptr(&orig_ist);
-	estack_top = (u64)oist->ist[ESTACK_DB];
-	estack_bottom = estack_top - DEBUG_STKSZ + STACK_MARGIN;
+	estacks = __this_cpu_read(cea_exception_stacks);
+	estack_top = CEA_ESTACK_TOP(estacks, DB);
+	estack_bottom = CEA_ESTACK_BOT(estacks, DB) + STACK_MARGIN;
 	if (regs->sp >= estack_bottom && regs->sp <= estack_top)
 		return;
 




* [patch V3 16/32] x86/dumpstack/64: Use cpu_entry_area instead of orig_ist
  2019-04-14 15:59 [patch V3 00/32] x86: Add guard pages to exception and interrupt stacks Thomas Gleixner
                   ` (14 preceding siblings ...)
  2019-04-14 15:59 ` [patch V3 15/32] x86/irq/64: Use cpu entry area " Thomas Gleixner
@ 2019-04-14 15:59 ` Thomas Gleixner
  2019-04-17 14:12   ` [tip:x86/irq] " tip-bot for Thomas Gleixner
  2019-04-14 15:59 ` [patch V3 17/32] x86/cpu: Prepare TSS.IST setup for guard pages Thomas Gleixner
                   ` (15 subsequent siblings)
  31 siblings, 1 reply; 89+ messages in thread
From: Thomas Gleixner @ 2019-04-14 15:59 UTC (permalink / raw)
  To: LKML; +Cc: x86, Andy Lutomirski, Josh Poimboeuf, Sean Christopherson

The orig_ist[] array is a shadow copy of the IST array in the TSS. The
reason why it exists is that older kernels used two TSS variants with
different pointers into the debug stack. orig_ist[] contains the real
starting points.

There is no point anymore to do so because the same information can be
retrieved using the base address of the cpu entry area mapping and the
offsets of the various exception stacks.

No functional change. Preparation for removing orig_ist.
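
[ Illustration only: the offset-table lookup which replaces the
  orig_ist based search, reduced to two toy stacks in user space. The
  names and sizes are made up; which_stack() mirrors the loop in
  in_exception_stack(). ]

  #include <stdio.h>
  #include <stddef.h>

  struct stacks { char DF_stack[4096]; char NMI_stack[4096]; };

  struct layout { unsigned int begin, end; };

  static const struct layout lay[2] = {
          { offsetof(struct stacks, DF_stack),
            offsetof(struct stacks, DF_stack) + 4096 },
          { offsetof(struct stacks, NMI_stack),
            offsetof(struct stacks, NMI_stack) + 4096 },
  };

  /* Return the stack index for 'stk' or -1. 'base' is the base
   * address of the mapping. */
  static int which_stack(unsigned long base, unsigned long stk)
  {
          unsigned int k;

          for (k = 0; k < 2; k++) {
                  if (stk >= base + lay[k].begin && stk < base + lay[k].end)
                          return k;
          }
          return -1;
  }

  int main(void)
  {
          static struct stacks s;
          unsigned long base = (unsigned long)&s;

          printf("%d\n", which_stack(base, (unsigned long)&s.NMI_stack[16]));
          return 0;
  }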

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
---
 arch/x86/kernel/dumpstack_64.c |   39 +++++++++++++++++++++++++++------------
 1 file changed, 27 insertions(+), 12 deletions(-)

--- a/arch/x86/kernel/dumpstack_64.c
+++ b/arch/x86/kernel/dumpstack_64.c
@@ -16,6 +16,7 @@
 #include <linux/bug.h>
 #include <linux/nmi.h>
 
+#include <asm/cpu_entry_area.h>
 #include <asm/stacktrace.h>
 
 static const char *exception_stack_names[N_EXCEPTION_STACKS] = {
@@ -25,11 +26,6 @@ static const char *exception_stack_names
 		[ ESTACK_MCE	]	= "#MC",
 };
 
-static const unsigned long exception_stack_sizes[N_EXCEPTION_STACKS] = {
-	[0 ... N_EXCEPTION_STACKS - 1]		= EXCEPTION_STKSZ,
-	[ESTACK_DB]				= DEBUG_STKSZ
-};
-
 const char *stack_type_name(enum stack_type type)
 {
 	BUILD_BUG_ON(N_EXCEPTION_STACKS != 4);
@@ -52,25 +48,44 @@ const char *stack_type_name(enum stack_t
 	return NULL;
 }
 
+struct estack_layout {
+	unsigned int	begin;
+	unsigned int	end;
+};
+
+#define	ESTACK_ENTRY(x)	{						  \
+	.begin	= offsetof(struct cea_exception_stacks, x## _stack),	  \
+	.end	= offsetofend(struct cea_exception_stacks, x## _stack)	  \
+	}
+
+static const struct estack_layout layout[N_EXCEPTION_STACKS] = {
+	[ ESTACK_DF	]	= ESTACK_ENTRY(DF),
+	[ ESTACK_NMI	]	= ESTACK_ENTRY(NMI),
+	[ ESTACK_DB	]	= ESTACK_ENTRY(DB),
+	[ ESTACK_MCE	]	= ESTACK_ENTRY(MCE),
+};
+
 static bool in_exception_stack(unsigned long *stack, struct stack_info *info)
 {
-	unsigned long *begin, *end;
+	unsigned long estacks, begin, end, stk = (unsigned long)stack;
 	struct pt_regs *regs;
-	unsigned k;
+	unsigned int k;
 
 	BUILD_BUG_ON(N_EXCEPTION_STACKS != 4);
 
+	estacks = (unsigned long)__this_cpu_read(cea_exception_stacks);
+
 	for (k = 0; k < N_EXCEPTION_STACKS; k++) {
-		end   = (unsigned long *)raw_cpu_ptr(&orig_ist)->ist[k];
-		begin = end - (exception_stack_sizes[k] / sizeof(long));
+		begin = estacks + layout[k].begin;
+		end   = estacks + layout[k].end;
 		regs  = (struct pt_regs *)end - 1;
 
-		if (stack < begin || stack >= end)
+		if (stk < begin || stk >= end)
 			continue;
 
 		info->type	= STACK_TYPE_EXCEPTION + k;
-		info->begin	= begin;
-		info->end	= end;
+		info->begin	= (unsigned long *)begin;
+		info->end	= (unsigned long *)end;
 		info->next_sp	= (unsigned long *)regs->sp;
 
 		return true;




* [patch V3 17/32] x86/cpu: Prepare TSS.IST setup for guard pages
  2019-04-14 15:59 [patch V3 00/32] x86: Add guard pages to exception and interrupt stacks Thomas Gleixner
                   ` (15 preceding siblings ...)
  2019-04-14 15:59 ` [patch V3 16/32] x86/dumpstack/64: Use cpu_entry_area " Thomas Gleixner
@ 2019-04-14 15:59 ` Thomas Gleixner
  2019-04-17 14:13   ` [tip:x86/irq] " tip-bot for Thomas Gleixner
  2019-04-14 15:59 ` [patch V3 18/32] x86/cpu: Remove orig_ist array Thomas Gleixner
                   ` (14 subsequent siblings)
  31 siblings, 1 reply; 89+ messages in thread
From: Thomas Gleixner @ 2019-04-14 15:59 UTC (permalink / raw)
  To: LKML; +Cc: x86, Andy Lutomirski, Josh Poimboeuf, Sean Christopherson

Convert the TSS.IST setup code to use the cpu entry area information
directly instead of assuming a linear mapping of the IST stacks.

The store to orig_ist[] is no longer required as there are no users
anymore.

This is the last preparatory step for IST guard pages.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
---
 arch/x86/kernel/cpu/common.c |   35 +++++++----------------------------
 1 file changed, 7 insertions(+), 28 deletions(-)

--- a/arch/x86/kernel/cpu/common.c
+++ b/arch/x86/kernel/cpu/common.c
@@ -507,19 +507,6 @@ void load_percpu_segment(int cpu)
 DEFINE_PER_CPU(struct cpu_entry_area *, cpu_entry_area);
 #endif
 
-#ifdef CONFIG_X86_64
-/*
- * Special IST stacks which the CPU switches to when it calls
- * an IST-marked descriptor entry. Up to 7 stacks (hardware
- * limit), all of them are 4K, except the debug stack which
- * is 8K.
- */
-static const unsigned int exception_stack_sizes[N_EXCEPTION_STACKS] = {
-	  [0 ... N_EXCEPTION_STACKS - 1]	= EXCEPTION_STKSZ,
-	  [ESTACK_DB]				= DEBUG_STKSZ
-};
-#endif
-
 /* Load the original GDT from the per-cpu structure */
 void load_direct_gdt(int cpu)
 {
@@ -1690,17 +1677,14 @@ static void setup_getcpu(int cpu)
  * initialized (naturally) in the bootstrap process, such as the GDT
  * and IDT. We reload them nevertheless, this function acts as a
  * 'CPU state barrier', nothing should get across.
- * A lot of state is already set up in PDA init for 64 bit
  */
 #ifdef CONFIG_X86_64
 
 void cpu_init(void)
 {
-	struct orig_ist *oist;
+	int cpu = raw_smp_processor_id();
 	struct task_struct *me;
 	struct tss_struct *t;
-	unsigned long v;
-	int cpu = raw_smp_processor_id();
 	int i;
 
 	wait_for_master_cpu(cpu);
@@ -1715,7 +1699,6 @@ void cpu_init(void)
 		load_ucode_ap();
 
 	t = &per_cpu(cpu_tss_rw, cpu);
-	oist = &per_cpu(orig_ist, cpu);
 
 #ifdef CONFIG_NUMA
 	if (this_cpu_read(numa_node) == 0 &&
@@ -1753,16 +1736,12 @@ void cpu_init(void)
 	/*
 	 * set up and load the per-CPU TSS
 	 */
-	if (!oist->ist[0]) {
-		char *estacks = (char *)&get_cpu_entry_area(cpu)->estacks;
-
-		for (v = 0; v < N_EXCEPTION_STACKS; v++) {
-			estacks += exception_stack_sizes[v];
-			oist->ist[v] = t->x86_tss.ist[v] =
-					(unsigned long)estacks;
-			if (v == ESTACK_DB)
-				per_cpu(debug_stack_addr, cpu) = (unsigned long)estacks;
-		}
+	if (!t->x86_tss.ist[0]) {
+		t->x86_tss.ist[ESTACK_DF] = __this_cpu_ist_top_va(DF);
+		t->x86_tss.ist[ESTACK_NMI] = __this_cpu_ist_top_va(NMI);
+		t->x86_tss.ist[ESTACK_DB] = __this_cpu_ist_top_va(DB);
+		t->x86_tss.ist[ESTACK_MCE] = __this_cpu_ist_top_va(MCE);
+		per_cpu(debug_stack_addr, cpu) = t->x86_tss.ist[ESTACK_DB];
 	}
 
 	t->x86_tss.io_bitmap_base = IO_BITMAP_OFFSET;




* [patch V3 18/32] x86/cpu: Remove orig_ist array
  2019-04-14 15:59 [patch V3 00/32] x86: Add guard pages to exception and interrupt stacks Thomas Gleixner
                   ` (16 preceding siblings ...)
  2019-04-14 15:59 ` [patch V3 17/32] x86/cpu: Prepare TSS.IST setup for guard pages Thomas Gleixner
@ 2019-04-14 15:59 ` Thomas Gleixner
  2019-04-17 14:13   ` [tip:x86/irq] " tip-bot for Thomas Gleixner
  2019-04-14 15:59 ` [patch V3 19/32] x86/exceptions: Disconnect IST index and stack order Thomas Gleixner
                   ` (13 subsequent siblings)
  31 siblings, 1 reply; 89+ messages in thread
From: Thomas Gleixner @ 2019-04-14 15:59 UTC (permalink / raw)
  To: LKML; +Cc: x86, Andy Lutomirski, Josh Poimboeuf, Sean Christopherson

All users gone.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
---
 arch/x86/include/asm/processor.h |    9 ---------
 arch/x86/kernel/cpu/common.c     |    6 ------
 2 files changed, 15 deletions(-)

--- a/arch/x86/include/asm/processor.h
+++ b/arch/x86/include/asm/processor.h
@@ -374,16 +374,7 @@ DECLARE_PER_CPU(unsigned long, cpu_curre
 #define cpu_current_top_of_stack cpu_tss_rw.x86_tss.sp1
 #endif
 
-/*
- * Save the original ist values for checking stack pointers during debugging
- */
-struct orig_ist {
-	unsigned long		ist[7];
-};
-
 #ifdef CONFIG_X86_64
-DECLARE_PER_CPU(struct orig_ist, orig_ist);
-
 union irq_stack_union {
 	char irq_stack[IRQ_STACK_SIZE];
 	/*
--- a/arch/x86/kernel/cpu/common.c
+++ b/arch/x86/kernel/cpu/common.c
@@ -1549,12 +1549,6 @@ void syscall_init(void)
 	       X86_EFLAGS_IOPL|X86_EFLAGS_AC|X86_EFLAGS_NT);
 }
 
-/*
- * Copies of the original ist values from the tss are only accessed during
- * debugging, no special alignment required.
- */
-DEFINE_PER_CPU(struct orig_ist, orig_ist);
-
 static DEFINE_PER_CPU(unsigned long, debug_stack_addr);
 DEFINE_PER_CPU(int, debug_stack_usage);
 




* [patch V3 19/32] x86/exceptions: Disconnect IST index and stack order
  2019-04-14 15:59 [patch V3 00/32] x86: Add guard pages to exception and interrupt stacks Thomas Gleixner
                   ` (17 preceding siblings ...)
  2019-04-14 15:59 ` [patch V3 18/32] x86/cpu: Remove orig_ist array Thomas Gleixner
@ 2019-04-14 15:59 ` Thomas Gleixner
  2019-04-17 14:14   ` [tip:x86/irq] " tip-bot for Thomas Gleixner
  2019-04-14 15:59 ` [patch V3 20/32] x86/exceptions: Enable IST guard pages Thomas Gleixner
                   ` (12 subsequent siblings)
  31 siblings, 1 reply; 89+ messages in thread
From: Thomas Gleixner @ 2019-04-14 15:59 UTC (permalink / raw)
  To: LKML; +Cc: x86, Andy Lutomirski, Josh Poimboeuf, Sean Christopherson

The entry order of the TSS.IST array and the order of the stack
storage/mapping are not required to be the same.

With the upcoming split of the debug stack this is going to fall apart, as
the number of TSS.IST array entries stays the same while the number of
actual stacks increases.

Make them separate so that code like dumpstack can just utilize the mapping
order. The IST index is solely required for the actual TSS.IST array
initialization.
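
For illustration only, a standalone C sketch (not part of the patch) of the
resulting split: the enum describes the mapping/dump order, the defines name
the hardware TSS.IST slots, and the two can now grow independently. The
values mirror the diff below.

#include <stdio.h>

/* Order of the stacks in [cea_]exception_stacks, used by dumpstack etc. */
enum exception_stack_ordering {
	ESTACK_DF, ESTACK_NMI, ESTACK_DB, ESTACK_MCE, N_EXCEPTION_STACKS
};

/* Hardware TSS.IST slots; the architectural limit is 7 entries */
#define IST_INDEX_DF	0
#define IST_INDEX_NMI	1
#define IST_INDEX_DB	2
#define IST_INDEX_MCE	3

int main(void)
{
	/* New ESTACK_ entries can be added without consuming IST slots */
	printf("mapped stacks: %d, IST slots used: 4 of 7\n",
	       N_EXCEPTION_STACKS);
	return 0;
}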

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
---
V2 -> V3: s/ISTACK_/ESTACK/
---
 arch/x86/entry/entry_64.S             |    2 +-
 arch/x86/include/asm/cpu_entry_area.h |   11 +++++++++++
 arch/x86/include/asm/page_64_types.h  |    9 ++++-----
 arch/x86/include/asm/stacktrace.h     |    2 ++
 arch/x86/kernel/cpu/common.c          |   10 +++++-----
 arch/x86/kernel/idt.c                 |    8 ++++----
 6 files changed, 27 insertions(+), 15 deletions(-)

--- a/arch/x86/entry/entry_64.S
+++ b/arch/x86/entry/entry_64.S
@@ -1129,7 +1129,7 @@ apicinterrupt3 HYPERV_STIMER0_VECTOR \
 	hv_stimer0_callback_vector hv_stimer0_vector_handler
 #endif /* CONFIG_HYPERV */
 
-idtentry debug			do_debug		has_error_code=0	paranoid=1 shift_ist=ESTACK_DB
+idtentry debug			do_debug		has_error_code=0	paranoid=1 shift_ist=IST_INDEX_DB
 idtentry int3			do_int3			has_error_code=0
 idtentry stack_segment		do_stack_segment	has_error_code=1
 
--- a/arch/x86/include/asm/cpu_entry_area.h
+++ b/arch/x86/include/asm/cpu_entry_area.h
@@ -35,6 +35,17 @@ struct cea_exception_stacks {
 	ESTACKS_MEMBERS(0)
 };
 
+/*
+ * The exception stack ordering in [cea_]exception_stacks
+ */
+enum exception_stack_ordering {
+	ESTACK_DF,
+	ESTACK_NMI,
+	ESTACK_DB,
+	ESTACK_MCE,
+	N_EXCEPTION_STACKS
+};
+
 #define CEA_ESTACK_SIZE(st)					\
 	sizeof(((struct cea_exception_stacks *)0)->st## _stack)
 
--- a/arch/x86/include/asm/page_64_types.h
+++ b/arch/x86/include/asm/page_64_types.h
@@ -27,11 +27,10 @@
 /*
  * The index for the tss.ist[] array. The hardware limit is 7 entries.
  */
-#define	ESTACK_DF		0
-#define	ESTACK_NMI		1
-#define	ESTACK_DB		2
-#define	ESTACK_MCE		3
-#define	N_EXCEPTION_STACKS	4
+#define	IST_INDEX_DF		0
+#define	IST_INDEX_NMI		1
+#define	IST_INDEX_DB		2
+#define	IST_INDEX_MCE		3
 
 /*
  * Set __PAGE_OFFSET to the most negative possible address +
--- a/arch/x86/include/asm/stacktrace.h
+++ b/arch/x86/include/asm/stacktrace.h
@@ -9,6 +9,8 @@
 
 #include <linux/uaccess.h>
 #include <linux/ptrace.h>
+
+#include <asm/cpu_entry_area.h>
 #include <asm/switch_to.h>
 
 enum stack_type {
--- a/arch/x86/kernel/cpu/common.c
+++ b/arch/x86/kernel/cpu/common.c
@@ -1731,11 +1731,11 @@ void cpu_init(void)
 	 * set up and load the per-CPU TSS
 	 */
 	if (!t->x86_tss.ist[0]) {
-		t->x86_tss.ist[ESTACK_DF] = __this_cpu_ist_top_va(DF);
-		t->x86_tss.ist[ESTACK_NMI] = __this_cpu_ist_top_va(NMI);
-		t->x86_tss.ist[ESTACK_DB] = __this_cpu_ist_top_va(DB);
-		t->x86_tss.ist[ESTACK_MCE] = __this_cpu_ist_top_va(MCE);
-		per_cpu(debug_stack_addr, cpu) = t->x86_tss.ist[ESTACK_DB];
+		t->x86_tss.ist[IST_INDEX_DF] = __this_cpu_ist_top_va(DF);
+		t->x86_tss.ist[IST_INDEX_NMI] = __this_cpu_ist_top_va(NMI);
+		t->x86_tss.ist[IST_INDEX_DB] = __this_cpu_ist_top_va(DB);
+		t->x86_tss.ist[IST_INDEX_MCE] = __this_cpu_ist_top_va(MCE);
+		per_cpu(debug_stack_addr, cpu) = t->x86_tss.ist[IST_INDEX_DB];
 	}
 
 	t->x86_tss.io_bitmap_base = IO_BITMAP_OFFSET;
--- a/arch/x86/kernel/idt.c
+++ b/arch/x86/kernel/idt.c
@@ -183,11 +183,11 @@ gate_desc debug_idt_table[IDT_ENTRIES] _
  * cpu_init() when the TSS has been initialized.
  */
 static const __initconst struct idt_data ist_idts[] = {
-	ISTG(X86_TRAP_DB,	debug,		ESTACK_DB),
-	ISTG(X86_TRAP_NMI,	nmi,		ESTACK_NMI),
-	ISTG(X86_TRAP_DF,	double_fault,	ESTACK_DF),
+	ISTG(X86_TRAP_DB,	debug,		IST_INDEX_DB),
+	ISTG(X86_TRAP_NMI,	nmi,		IST_INDEX_NMI),
+	ISTG(X86_TRAP_DF,	double_fault,	IST_INDEX_DF),
 #ifdef CONFIG_X86_MCE
-	ISTG(X86_TRAP_MC,	&machine_check,	ESTACK_MCE),
+	ISTG(X86_TRAP_MC,	&machine_check,	IST_INDEX_MCE),
 #endif
 };
 



^ permalink raw reply	[flat|nested] 89+ messages in thread

* [patch V3 20/32] x86/exceptions: Enable IST guard pages
  2019-04-14 15:59 [patch V3 00/32] x86: Add guard pages to exception and interrupt stacks Thomas Gleixner
                   ` (18 preceding siblings ...)
  2019-04-14 15:59 ` [patch V3 19/32] x86/exceptions: Disconnect IST index and stack order Thomas Gleixner
@ 2019-04-14 15:59 ` Thomas Gleixner
  2019-04-17 14:15   ` [tip:x86/irq] " tip-bot for Thomas Gleixner
  2019-04-14 15:59 ` [patch V3 21/32] x86/exceptions: Split debug IST stack Thomas Gleixner
                   ` (11 subsequent siblings)
  31 siblings, 1 reply; 89+ messages in thread
From: Thomas Gleixner @ 2019-04-14 15:59 UTC (permalink / raw)
  To: LKML; +Cc: x86, Andy Lutomirski, Josh Poimboeuf, Sean Christopherson

All usage sites which expected the exception stacks in the CPU entry area
to be mapped linearly have been fixed up. Enable guard pages between the
IST stacks.
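
As a standalone model (not kernel code; zero-length arrays for guardsize 0
are a GCC extension, as in the kernel headers), the effect of the guardsize
parameter is visible by comparing member offsets of the two structs. The
macro below is a truncated copy; the real one covers all IST stacks.

#include <stdio.h>
#include <stddef.h>

#define PAGE_SIZE	4096
#define EXCEPTION_STKSZ	4096

#define ESTACKS_MEMBERS(guardsize)		\
	char	DF_stack_guard[guardsize];	\
	char	DF_stack[EXCEPTION_STKSZ];	\
	char	NMI_stack_guard[guardsize];	\
	char	NMI_stack[EXCEPTION_STKSZ];

struct exception_stacks     { ESTACKS_MEMBERS(0) };		/* storage */
struct cea_exception_stacks { ESTACKS_MEMBERS(PAGE_SIZE) };	/* mapping */

int main(void)
{
	/* Each guard page shifts the subsequent stacks up by one page */
	printf("storage: NMI_stack at offset %zu\n",
	       offsetof(struct exception_stacks, NMI_stack));
	printf("mapping: NMI_stack at offset %zu\n",
	       offsetof(struct cea_exception_stacks, NMI_stack));
	return 0;
}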

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
---
 arch/x86/include/asm/cpu_entry_area.h |    8 ++------
 1 file changed, 2 insertions(+), 6 deletions(-)

--- a/arch/x86/include/asm/cpu_entry_area.h
+++ b/arch/x86/include/asm/cpu_entry_area.h
@@ -26,13 +26,9 @@ struct exception_stacks {
 	ESTACKS_MEMBERS(0)
 };
 
-/*
- * The effective cpu entry area mapping with guard pages. Guard size is
- * zero until the code which makes assumptions about linear mappings is
- * cleaned up.
- */
+/* The effective cpu entry area mapping with guard pages. */
 struct cea_exception_stacks {
-	ESTACKS_MEMBERS(0)
+	ESTACKS_MEMBERS(PAGE_SIZE)
 };
 
 /*



^ permalink raw reply	[flat|nested] 89+ messages in thread

* [patch V3 21/32] x86/exceptions: Split debug IST stack
  2019-04-14 15:59 [patch V3 00/32] x86: Add guard pages to exception and interrupt stacks Thomas Gleixner
                   ` (19 preceding siblings ...)
  2019-04-14 15:59 ` [patch V3 20/32] x86/exceptions: Enable IST guard pages Thomas Gleixner
@ 2019-04-14 15:59 ` Thomas Gleixner
  2019-04-16 22:07   ` Sean Christopherson
  2019-04-17 14:15   ` [tip:x86/irq] " tip-bot for Thomas Gleixner
  2019-04-14 15:59 ` [patch V3 22/32] x86/dumpstack/64: Speedup in_exception_stack() Thomas Gleixner
                   ` (10 subsequent siblings)
  31 siblings, 2 replies; 89+ messages in thread
From: Thomas Gleixner @ 2019-04-14 15:59 UTC (permalink / raw)
  To: LKML; +Cc: x86, Andy Lutomirski, Josh Poimboeuf, Sean Christopherson

The debug IST stack is actually two separate debug stacks to handle #DB
recursion. This is required because the CPU always starts at the top of
the stack on exception entry, which means that on #DB recursion the second
#DB would overwrite the stack of the first.

The low level entry code therefore adjusts the top of stack on entry so a
secondary #DB starts from a different stack page. But the stack pages are
adjacent without a guard page between them.

Split the debug stack into 3 stacks which are separated by guard pages. The
3rd stack is never mapped into the cpu_entry_area and is only there to
catch triple #DB nesting:

      --- top of DB_stack	<- Initial stack
      --- end of DB_stack
      	  guard page

      --- top of DB1_stack	<- Top of stack after entering first #DB
      --- end of DB1_stack
      	  guard page

      --- top of DB2_stack	<- Top of stack after entering second #DB
      --- end of DB2_stack	   
      	  guard page

If DB2 did not act as the final guard hole, the second #DB would point the
top of the #DB stack to the page below DB1, which would be a valid mapping
and would not catch the undesired triple nesting.

The backing store does not allocate any memory for DB2 and its guard page
as it is not going to be mapped into the cpu_entry_area.
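
A standalone model (illustrative addresses, not kernel code) of how the
entry code walks the IST pointer down through the stacks. The shift value
matches the DB_STACK_OFFSET definition added below, which in the cea layout
works out to one stack plus one guard page:

#include <stdio.h>

#define PAGE_SIZE	4096UL
#define EXCEPTION_STKSZ	PAGE_SIZE
/* Distance between the tops of adjacent DB stacks: stack plus guard page */
#define DB_STACK_OFFSET	(EXCEPTION_STKSZ + PAGE_SIZE)

int main(void)
{
	/* Illustrative top of DB_stack */
	unsigned long ist = 3 * DB_STACK_OFFSET;
	int nest;

	for (nest = 0; nest < 3; nest++) {
		printf("#DB nesting level %d starts at top %#lx\n", nest, ist);
		/* entry code: subq $DB_STACK_OFFSET, CPU_TSS_IST(DB) */
		ist -= DB_STACK_OFFSET;
	}
	/* Level 2 starts in the unmapped DB2 hole: triple nesting crashes */
	return 0;
}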

 - Adjust the low level entry code so it shifts the top of the #DB stack
   by the offset between the stacks instead of by the exception stack size.

 - Make the dumpstack code aware of the new stacks.

 - Adjust the in_debug_stack() implementation and move it into the NMI code
   where it belongs. As this is NMI hotpath code, it just checks the full
   area between top of DB_stack and bottom of DB1_stack without checking
   for the guard page. That's correct because the NMI cannot hit a
   stackpointer pointing to the guard page between the DB and DB1 stacks.
   Even if it did, the NMI operation would still be unaffected, but the
   resume of the debug exception on the topmost DB stack would crash by
   touching the guard page.

Suggested-by: Andy Lutomirski <luto@kernel.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
---
V2 -> V3: Fix off by one in in_debug_stack()
---
 Documentation/x86/kernel-stacks       |    7 ++++++-
 arch/x86/entry/entry_64.S             |    8 ++++----
 arch/x86/include/asm/cpu_entry_area.h |   14 ++++++++++----
 arch/x86/include/asm/debugreg.h       |    2 --
 arch/x86/include/asm/page_64_types.h  |    3 ---
 arch/x86/kernel/asm-offsets_64.c      |    2 ++
 arch/x86/kernel/cpu/common.c          |   11 -----------
 arch/x86/kernel/dumpstack_64.c        |   12 ++++++++----
 arch/x86/kernel/nmi.c                 |   20 +++++++++++++++++++-
 arch/x86/mm/cpu_entry_area.c          |    4 +++-
 10 files changed, 52 insertions(+), 31 deletions(-)

--- a/Documentation/x86/kernel-stacks
+++ b/Documentation/x86/kernel-stacks
@@ -76,7 +76,7 @@ The currently assigned IST stacks are :-
   middle of switching stacks.  Using IST for NMI events avoids making
   assumptions about the previous state of the kernel stack.
 
-* ESTACK_DB.  DEBUG_STKSZ
+* ESTACK_DB.  EXCEPTION_STKSZ (PAGE_SIZE).
 
   Used for hardware debug interrupts (interrupt 1) and for software
   debug interrupts (INT3).
@@ -86,6 +86,11 @@ The currently assigned IST stacks are :-
   avoids making assumptions about the previous state of the kernel
   stack.
 
+  To handle nested #DB correctly there exist two instances of DB stacks. On
+  #DB entry the IST stackpointer for #DB is switched to the second instance
+  so a nested #DB starts from a clean stack. The nested #DB switches the
+  IST stackpointer to a guard hole to catch triple nesting.
+
 * ESTACK_MCE.  EXCEPTION_STKSZ (PAGE_SIZE).
 
   Used for interrupt 18 - Machine Check Exception (#MC).
--- a/arch/x86/entry/entry_64.S
+++ b/arch/x86/entry/entry_64.S
@@ -879,7 +879,7 @@ apicinterrupt IRQ_WORK_VECTOR			irq_work
  * @paranoid == 2 is special: the stub will never switch stacks.  This is for
  * #DF: if the thread stack is somehow unusable, we'll still get a useful OOPS.
  */
-.macro idtentry sym do_sym has_error_code:req paranoid=0 shift_ist=-1
+.macro idtentry sym do_sym has_error_code:req paranoid=0 shift_ist=-1 ist_offset=0
 ENTRY(\sym)
 	UNWIND_HINT_IRET_REGS offset=\has_error_code*8
 
@@ -925,13 +925,13 @@ ENTRY(\sym)
 	.endif
 
 	.if \shift_ist != -1
-	subq	$EXCEPTION_STKSZ, CPU_TSS_IST(\shift_ist)
+	subq	$\ist_offset, CPU_TSS_IST(\shift_ist)
 	.endif
 
 	call	\do_sym
 
 	.if \shift_ist != -1
-	addq	$EXCEPTION_STKSZ, CPU_TSS_IST(\shift_ist)
+	addq	$\ist_offset, CPU_TSS_IST(\shift_ist)
 	.endif
 
 	/* these procedures expect "no swapgs" flag in ebx */
@@ -1129,7 +1129,7 @@ apicinterrupt3 HYPERV_STIMER0_VECTOR \
 	hv_stimer0_callback_vector hv_stimer0_vector_handler
 #endif /* CONFIG_HYPERV */
 
-idtentry debug			do_debug		has_error_code=0	paranoid=1 shift_ist=IST_INDEX_DB
+idtentry debug			do_debug		has_error_code=0	paranoid=1 shift_ist=IST_INDEX_DB ist_offset=DB_STACK_OFFSET
 idtentry int3			do_int3			has_error_code=0
 idtentry stack_segment		do_stack_segment	has_error_code=1
 
--- a/arch/x86/include/asm/cpu_entry_area.h
+++ b/arch/x86/include/asm/cpu_entry_area.h
@@ -10,25 +10,29 @@
 #ifdef CONFIG_X86_64
 
 /* Macro to enforce the same ordering and stack sizes */
-#define ESTACKS_MEMBERS(guardsize)		\
+#define ESTACKS_MEMBERS(guardsize, db2_holesize)\
 	char	DF_stack_guard[guardsize];	\
 	char	DF_stack[EXCEPTION_STKSZ];	\
 	char	NMI_stack_guard[guardsize];	\
 	char	NMI_stack[EXCEPTION_STKSZ];	\
+	char	DB2_stack_guard[guardsize];	\
+	char	DB2_stack[db2_holesize];	\
+	char	DB1_stack_guard[guardsize];	\
+	char	DB1_stack[EXCEPTION_STKSZ];	\
 	char	DB_stack_guard[guardsize];	\
-	char	DB_stack[DEBUG_STKSZ];		\
+	char	DB_stack[EXCEPTION_STKSZ];	\
 	char	MCE_stack_guard[guardsize];	\
 	char	MCE_stack[EXCEPTION_STKSZ];	\
 	char	IST_top_guard[guardsize];	\
 
 /* The exception stacks linear storage. No guard pages required */
 struct exception_stacks {
-	ESTACKS_MEMBERS(0)
+	ESTACKS_MEMBERS(0, 0)
 };
 
 /* The effective cpu entry area mapping with guard pages. */
 struct cea_exception_stacks {
-	ESTACKS_MEMBERS(PAGE_SIZE)
+	ESTACKS_MEMBERS(PAGE_SIZE, EXCEPTION_STKSZ)
 };
 
 /*
@@ -37,6 +41,8 @@ struct cea_exception_stacks {
 enum exception_stack_ordering {
 	ESTACK_DF,
 	ESTACK_NMI,
+	ESTACK_DB2,
+	ESTACK_DB1,
 	ESTACK_DB,
 	ESTACK_MCE,
 	N_EXCEPTION_STACKS
--- a/arch/x86/include/asm/debugreg.h
+++ b/arch/x86/include/asm/debugreg.h
@@ -104,11 +104,9 @@ static inline void debug_stack_usage_dec
 {
 	__this_cpu_dec(debug_stack_usage);
 }
-int is_debug_stack(unsigned long addr);
 void debug_stack_set_zero(void);
 void debug_stack_reset(void);
 #else /* !X86_64 */
-static inline int is_debug_stack(unsigned long addr) { return 0; }
 static inline void debug_stack_set_zero(void) { }
 static inline void debug_stack_reset(void) { }
 static inline void debug_stack_usage_inc(void) { }
--- a/arch/x86/include/asm/page_64_types.h
+++ b/arch/x86/include/asm/page_64_types.h
@@ -18,9 +18,6 @@
 #define EXCEPTION_STACK_ORDER (0 + KASAN_STACK_ORDER)
 #define EXCEPTION_STKSZ (PAGE_SIZE << EXCEPTION_STACK_ORDER)
 
-#define DEBUG_STACK_ORDER (EXCEPTION_STACK_ORDER + 1)
-#define DEBUG_STKSZ (PAGE_SIZE << DEBUG_STACK_ORDER)
-
 #define IRQ_STACK_ORDER (2 + KASAN_STACK_ORDER)
 #define IRQ_STACK_SIZE (PAGE_SIZE << IRQ_STACK_ORDER)
 
--- a/arch/x86/kernel/asm-offsets_64.c
+++ b/arch/x86/kernel/asm-offsets_64.c
@@ -68,6 +68,8 @@ int main(void)
 #undef ENTRY
 
 	OFFSET(TSS_ist, tss_struct, x86_tss.ist);
+	DEFINE(DB_STACK_OFFSET, offsetof(struct cea_exception_stacks, DB_stack) -
+	       offsetof(struct cea_exception_stacks, DB1_stack));
 	BLANK();
 
 #ifdef CONFIG_STACKPROTECTOR
--- a/arch/x86/kernel/cpu/common.c
+++ b/arch/x86/kernel/cpu/common.c
@@ -1549,17 +1549,7 @@ void syscall_init(void)
 	       X86_EFLAGS_IOPL|X86_EFLAGS_AC|X86_EFLAGS_NT);
 }
 
-static DEFINE_PER_CPU(unsigned long, debug_stack_addr);
 DEFINE_PER_CPU(int, debug_stack_usage);
-
-int is_debug_stack(unsigned long addr)
-{
-	return __this_cpu_read(debug_stack_usage) ||
-		(addr <= __this_cpu_read(debug_stack_addr) &&
-		 addr > (__this_cpu_read(debug_stack_addr) - DEBUG_STKSZ));
-}
-NOKPROBE_SYMBOL(is_debug_stack);
-
 DEFINE_PER_CPU(u32, debug_idt_ctr);
 
 void debug_stack_set_zero(void)
@@ -1735,7 +1725,6 @@ void cpu_init(void)
 		t->x86_tss.ist[IST_INDEX_NMI] = __this_cpu_ist_top_va(NMI);
 		t->x86_tss.ist[IST_INDEX_DB] = __this_cpu_ist_top_va(DB);
 		t->x86_tss.ist[IST_INDEX_MCE] = __this_cpu_ist_top_va(MCE);
-		per_cpu(debug_stack_addr, cpu) = t->x86_tss.ist[IST_INDEX_DB];
 	}
 
 	t->x86_tss.io_bitmap_base = IO_BITMAP_OFFSET;
--- a/arch/x86/kernel/dumpstack_64.c
+++ b/arch/x86/kernel/dumpstack_64.c
@@ -19,16 +19,18 @@
 #include <asm/cpu_entry_area.h>
 #include <asm/stacktrace.h>
 
-static const char *exception_stack_names[N_EXCEPTION_STACKS] = {
+static const char *exception_stack_names[] = {
 		[ ESTACK_DF	]	= "#DF",
 		[ ESTACK_NMI	]	= "NMI",
+		[ ESTACK_DB2	]	= "#DB2",
+		[ ESTACK_DB1	]	= "#DB1",
 		[ ESTACK_DB	]	= "#DB",
 		[ ESTACK_MCE	]	= "#MC",
 };
 
 const char *stack_type_name(enum stack_type type)
 {
-	BUILD_BUG_ON(N_EXCEPTION_STACKS != 4);
+	BUILD_BUG_ON(N_EXCEPTION_STACKS != 6);
 
 	if (type == STACK_TYPE_IRQ)
 		return "IRQ";
@@ -58,9 +60,11 @@ struct estack_layout {
 	.end	= offsetof(struct cea_exception_stacks, x## _stack_guard) \
 	}
 
-static const struct estack_layout layout[N_EXCEPTION_STACKS] = {
+static const struct estack_layout layout[] = {
 	[ ESTACK_DF	]	= ESTACK_ENTRY(DF),
 	[ ESTACK_NMI	]	= ESTACK_ENTRY(NMI),
+	[ ESTACK_DB2	]	= { .begin = 0, .end = 0},
+	[ ESTACK_DB1	]	= ESTACK_ENTRY(DB1),
 	[ ESTACK_DB	]	= ESTACK_ENTRY(DB),
 	[ ESTACK_MCE	]	= ESTACK_ENTRY(MCE),
 };
@@ -71,7 +75,7 @@ static bool in_exception_stack(unsigned
 	struct pt_regs *regs;
 	unsigned int k;
 
-	BUILD_BUG_ON(N_EXCEPTION_STACKS != 4);
+	BUILD_BUG_ON(N_EXCEPTION_STACKS != 6);
 
 	estacks = (unsigned long)__this_cpu_read(cea_exception_stacks);
 
--- a/arch/x86/kernel/nmi.c
+++ b/arch/x86/kernel/nmi.c
@@ -21,13 +21,14 @@
 #include <linux/ratelimit.h>
 #include <linux/slab.h>
 #include <linux/export.h>
+#include <linux/atomic.h>
 #include <linux/sched/clock.h>
 
 #if defined(CONFIG_EDAC)
 #include <linux/edac.h>
 #endif
 
-#include <linux/atomic.h>
+#include <asm/cpu_entry_area.h>
 #include <asm/traps.h>
 #include <asm/mach_traps.h>
 #include <asm/nmi.h>
@@ -487,6 +488,23 @@ static DEFINE_PER_CPU(unsigned long, nmi
  * switch back to the original IDT.
  */
 static DEFINE_PER_CPU(int, update_debug_stack);
+
+static bool notrace is_debug_stack(unsigned long addr)
+{
+	struct cea_exception_stacks *cs = __this_cpu_read(cea_exception_stacks);
+	unsigned long top = CEA_ESTACK_TOP(cs, DB);
+	unsigned long bot = CEA_ESTACK_BOT(cs, DB1);
+
+	if (__this_cpu_read(debug_stack_usage))
+		return true;
+	/*
+	 * Note, this covers the guard page between DB and DB1 as well to
+	 * avoid two checks. But by design @addr can never point into
+	 * the guard page.
+	 */
+	return addr >= bot && addr < top;
+}
+NOKPROBE_SYMBOL(is_debug_stack);
 #endif
 
 dotraplinkage notrace void
--- a/arch/x86/mm/cpu_entry_area.c
+++ b/arch/x86/mm/cpu_entry_area.c
@@ -98,10 +98,12 @@ static void __init percpu_setup_exceptio
 
 	/*
 	 * The exceptions stack mappings in the per cpu area are protected
-	 * by guard pages so each stack must be mapped separately.
+	 * by guard pages so each stack must be mapped separately. DB2 is
+	 * not mapped; it just exists to catch triple nesting of #DB.
 	 */
 	cea_map_stack(DF);
 	cea_map_stack(NMI);
+	cea_map_stack(DB1);
 	cea_map_stack(DB);
 	cea_map_stack(MCE);
 }



^ permalink raw reply	[flat|nested] 89+ messages in thread

* [patch V3 22/32] x86/dumpstack/64: Speedup in_exception_stack()
  2019-04-14 15:59 [patch V3 00/32] x86: Add guard pages to exception and interrupt stacks Thomas Gleixner
                   ` (20 preceding siblings ...)
  2019-04-14 15:59 ` [patch V3 21/32] x86/exceptions: Split debug IST stack Thomas Gleixner
@ 2019-04-14 15:59 ` Thomas Gleixner
  2019-04-17 14:16   ` [tip:x86/irq] " tip-bot for Thomas Gleixner
  2019-04-14 15:59 ` [patch V3 23/32] x86/irq/32: Define IRQ_STACK_SIZE Thomas Gleixner
                   ` (9 subsequent siblings)
  31 siblings, 1 reply; 89+ messages in thread
From: Thomas Gleixner @ 2019-04-14 15:59 UTC (permalink / raw)
  To: LKML; +Cc: x86, Andy Lutomirski, Josh Poimboeuf, Sean Christopherson

The current implementation of in_exception_stack() iterates over the
exception stacks array. Most of the time this is a useless exercise, but
even for the actual use cases (perf and ftrace) it takes at least 2
iterations to get to the NMI stack.

As the exception stacks and the guard pages are page aligned, the loop can
be avoided completely.

Add an initial check whether the stack pointer is inside the full exception
stack area and bail out early if not.

Create a lookup table which describes the stack area. The table index is
the page offset from the beginning of the exception stacks. So for any
given stack pointer the page offset is computed and a lookup in the
description table is performed. If it is inside a guard page, return. If
not, use the descriptor to fill in the info structure.

The table is filled at compile time, and for the !KASAN case the interesting
page descriptors fit exactly into a single cache line. Only the last guard
page descriptor spills into the next cache line, but that should not be
accessed in the regular case.
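
A toy model (standalone, illustrative offsets and hypothetical type ids) of
the resulting O(1) lookup: one subtraction, one shift, one table index.
Zeroed entries denote guard pages:

#include <stdio.h>

#define PAGE_SHIFT	12
#define AREA_PAGES	8

struct estack_pages {
	unsigned int	offs;	/* offset of the stack from the area start */
	unsigned short	size;	/* stack size, 0 for guard pages */
	unsigned short	type;	/* illustrative stack type id */
};

/* Pages 1 and 3 are stacks; the zeroed entries model guard pages */
static const struct estack_pages table[AREA_PAGES] = {
	[1] = { .offs = 1 << PAGE_SHIFT, .size = 1 << PAGE_SHIFT, .type = 1 },
	[3] = { .offs = 3 << PAGE_SHIFT, .size = 1 << PAGE_SHIFT, .type = 2 },
};

int main(void)
{
	unsigned long begin = 0x1000000UL;	/* illustrative area start */
	unsigned long stk   = begin + 0x3010;	/* stack pointer to classify */
	unsigned int k = (stk - begin) >> PAGE_SHIFT;

	if (!table[k].size)
		printf("guard page -> not on an exception stack\n");
	else
		printf("stack type %u starting at %#lx\n",
		       table[k].type, begin + table[k].offs);
	return 0;
}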

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Josh Poimboeuf <jpoimboe@redhat.com>
---
V2: Simplify the macro maze
---
 arch/x86/kernel/dumpstack_64.c |   90 +++++++++++++++++++++++++----------------
 1 file changed, 55 insertions(+), 35 deletions(-)

--- a/arch/x86/kernel/dumpstack_64.c
+++ b/arch/x86/kernel/dumpstack_64.c
@@ -50,52 +50,72 @@ const char *stack_type_name(enum stack_t
 	return NULL;
 }
 
-struct estack_layout {
-	unsigned int	begin;
-	unsigned int	end;
+/**
+ * struct estack_pages - Page descriptor for exception stacks
+ * @offs:	Offset from the start of the exception stack area
+ * @size:	Size of the exception stack
+ * @type:	Type to store in the stack_info struct
+ */
+struct estack_pages {
+	u32	offs;
+	u16	size;
+	u16	type;
 };
 
-#define	ESTACK_ENTRY(x)	{						  \
-	.begin	= offsetof(struct cea_exception_stacks, x## _stack),	  \
-	.end	= offsetof(struct cea_exception_stacks, x## _stack_guard) \
-	}
-
-static const struct estack_layout layout[] = {
-	[ ESTACK_DF	]	= ESTACK_ENTRY(DF),
-	[ ESTACK_NMI	]	= ESTACK_ENTRY(NMI),
-	[ ESTACK_DB2	]	= { .begin = 0, .end = 0},
-	[ ESTACK_DB1	]	= ESTACK_ENTRY(DB1),
-	[ ESTACK_DB	]	= ESTACK_ENTRY(DB),
-	[ ESTACK_MCE	]	= ESTACK_ENTRY(MCE),
+#define EPAGERANGE(st)							\
+	[PFN_DOWN(CEA_ESTACK_OFFS(st)) ...				\
+	 PFN_DOWN(CEA_ESTACK_OFFS(st) + CEA_ESTACK_SIZE(st) - 1)] = {	\
+		.offs	= CEA_ESTACK_OFFS(st),				\
+		.size	= CEA_ESTACK_SIZE(st),				\
+		.type	= STACK_TYPE_EXCEPTION + ESTACK_ ##st, }
+
+/*
+ * Array of exception stack page descriptors. If the stack is larger than
+ * PAGE_SIZE, all pages covering a particular stack will have the same
+ * info. The guard pages including the not mapped DB2 stack are zeroed
+ * out.
+ */
+static const
+struct estack_pages estack_pages[CEA_ESTACK_PAGES] ____cacheline_aligned = {
+	EPAGERANGE(DF),
+	EPAGERANGE(NMI),
+	EPAGERANGE(DB1),
+	EPAGERANGE(DB),
+	EPAGERANGE(MCE),
 };
 
 static bool in_exception_stack(unsigned long *stack, struct stack_info *info)
 {
-	unsigned long estacks, begin, end, stk = (unsigned long)stack;
+	unsigned long begin, end, stk = (unsigned long)stack;
+	const struct estack_pages *ep;
 	struct pt_regs *regs;
 	unsigned int k;
 
 	BUILD_BUG_ON(N_EXCEPTION_STACKS != 6);
 
-	estacks = (unsigned long)__this_cpu_read(cea_exception_stacks);
-
-	for (k = 0; k < N_EXCEPTION_STACKS; k++) {
-		begin = estacks + layout[k].begin;
-		end   = estacks + layout[k].end;
-		regs  = (struct pt_regs *)end - 1;
-
-		if (stk < begin || stk >= end)
-			continue;
-
-		info->type	= STACK_TYPE_EXCEPTION + k;
-		info->begin	= (unsigned long *)begin;
-		info->end	= (unsigned long *)end;
-		info->next_sp	= (unsigned long *)regs->sp;
-
-		return true;
-	}
-
-	return false;
+	begin = (unsigned long)__this_cpu_read(cea_exception_stacks);
+	end = begin + sizeof(struct cea_exception_stacks);
+	/* Bail if @stack is outside the exception stack area. */
+	if (stk < begin || stk >= end)
+		return false;
+
+	/* Calc page offset from start of exception stacks */
+	k = (stk - begin) >> PAGE_SHIFT;
+	/* Lookup the page descriptor */
+	ep = &estack_pages[k];
+	/* Guard page? */
+	if (!ep->size)
+		return false;
+
+	begin += (unsigned long)ep->offs;
+	end = begin + (unsigned long)ep->size;
+	regs = (struct pt_regs *)end - 1;
+
+	info->type	= ep->type;
+	info->begin	= (unsigned long *)begin;
+	info->end	= (unsigned long *)end;
+	info->next_sp	= (unsigned long *)regs->sp;
+	return true;
 }
 
 static bool in_irq_stack(unsigned long *stack, struct stack_info *info)



^ permalink raw reply	[flat|nested] 89+ messages in thread

* [patch V3 23/32] x86/irq/32: Define IRQ_STACK_SIZE
  2019-04-14 15:59 [patch V3 00/32] x86: Add guard pages to exception and interrupt stacks Thomas Gleixner
                   ` (21 preceding siblings ...)
  2019-04-14 15:59 ` [patch V3 22/32] x86/dumpstack/64: Speedup in_exception_stack() Thomas Gleixner
@ 2019-04-14 15:59 ` Thomas Gleixner
  2019-04-17 14:17   ` [tip:x86/irq] " tip-bot for Thomas Gleixner
  2019-04-14 16:00 ` [patch V3 24/32] x86/irq/32: Make irq stack a character array Thomas Gleixner
                   ` (8 subsequent siblings)
  31 siblings, 1 reply; 89+ messages in thread
From: Thomas Gleixner @ 2019-04-14 15:59 UTC (permalink / raw)
  To: LKML; +Cc: x86, Andy Lutomirski, Josh Poimboeuf, Sean Christopherson

On 32-bit IRQ_STACK_SIZE is the same as THREAD_SIZE.

To allow sharing struct irq_stack with 32-bit, define IRQ_STACK_SIZE for
32-bit and use it for struct irq_stack.

No functional change.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
---
 arch/x86/include/asm/page_32_types.h |    2 ++
 arch/x86/include/asm/processor.h     |    4 ++--
 2 files changed, 4 insertions(+), 2 deletions(-)

--- a/arch/x86/include/asm/page_32_types.h
+++ b/arch/x86/include/asm/page_32_types.h
@@ -22,6 +22,8 @@
 #define THREAD_SIZE_ORDER	1
 #define THREAD_SIZE		(PAGE_SIZE << THREAD_SIZE_ORDER)
 
+#define IRQ_STACK_SIZE		THREAD_SIZE
+
 #define N_EXCEPTION_STACKS	1
 
 #ifdef CONFIG_X86_PAE
--- a/arch/x86/include/asm/processor.h
+++ b/arch/x86/include/asm/processor.h
@@ -422,8 +422,8 @@ DECLARE_PER_CPU_ALIGNED(struct stack_can
  * per-CPU IRQ handling stacks
  */
 struct irq_stack {
-	u32                     stack[THREAD_SIZE/sizeof(u32)];
-} __aligned(THREAD_SIZE);
+	u32			stack[IRQ_STACK_SIZE / sizeof(u32)];
+} __aligned(IRQ_STACK_SIZE);
 
 DECLARE_PER_CPU(struct irq_stack *, hardirq_stack);
 DECLARE_PER_CPU(struct irq_stack *, softirq_stack);



^ permalink raw reply	[flat|nested] 89+ messages in thread

* [patch V3 24/32] x86/irq/32: Make irq stack a character array
  2019-04-14 15:59 [patch V3 00/32] x86: Add guard pages to exception and interrupt stacks Thomas Gleixner
                   ` (22 preceding siblings ...)
  2019-04-14 15:59 ` [patch V3 23/32] x86/irq/32: Define IRQ_STACK_SIZE Thomas Gleixner
@ 2019-04-14 16:00 ` Thomas Gleixner
  2019-04-17 14:18   ` [tip:x86/irq] " tip-bot for Thomas Gleixner
  2019-04-14 16:00 ` [patch V3 25/32] x86/irq/32: Rename hard/softirq_stack to hard/softirq_stack_ptr Thomas Gleixner
                   ` (7 subsequent siblings)
  31 siblings, 1 reply; 89+ messages in thread
From: Thomas Gleixner @ 2019-04-14 16:00 UTC (permalink / raw)
  To: LKML; +Cc: x86, Andy Lutomirski, Josh Poimboeuf, Sean Christopherson

There is no reason to have a u32 array in struct irq_stack. The only
purpose of the array is to size the struct properly.

Preparatory change for sharing struct irq_stack with 64-bit.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
---
 arch/x86/include/asm/processor.h |    2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

--- a/arch/x86/include/asm/processor.h
+++ b/arch/x86/include/asm/processor.h
@@ -422,7 +422,7 @@ DECLARE_PER_CPU_ALIGNED(struct stack_can
  * per-CPU IRQ handling stacks
  */
 struct irq_stack {
-	u32			stack[IRQ_STACK_SIZE / sizeof(u32)];
+	char			stack[IRQ_STACK_SIZE];
 } __aligned(IRQ_STACK_SIZE);
 
 DECLARE_PER_CPU(struct irq_stack *, hardirq_stack);



^ permalink raw reply	[flat|nested] 89+ messages in thread

* [patch V3 25/32] x86/irq/32: Rename hard/softirq_stack to hard/softirq_stack_ptr
  2019-04-14 15:59 [patch V3 00/32] x86: Add guard pages to exception and interrupt stacks Thomas Gleixner
                   ` (23 preceding siblings ...)
  2019-04-14 16:00 ` [patch V3 24/32] x86/irq/32: Make irq stack a character array Thomas Gleixner
@ 2019-04-14 16:00 ` Thomas Gleixner
  2019-04-17 14:18   ` [tip:x86/irq] " tip-bot for Thomas Gleixner
  2019-04-14 16:00 ` [patch V3 26/32] x86/irq/64: Rename irq_stack_ptr to hardirq_stack_ptr Thomas Gleixner
                   ` (6 subsequent siblings)
  31 siblings, 1 reply; 89+ messages in thread
From: Thomas Gleixner @ 2019-04-14 16:00 UTC (permalink / raw)
  To: LKML; +Cc: x86, Andy Lutomirski, Josh Poimboeuf, Sean Christopherson

The percpu storage holds a pointer to the stack, not the stack
itself. Rename it before sharing struct irq_stack with 64-bit.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
---
 arch/x86/include/asm/processor.h |    4 ++--
 arch/x86/kernel/dumpstack_32.c   |    4 ++--
 arch/x86/kernel/irq_32.c         |   19 ++++++++++---------
 3 files changed, 14 insertions(+), 13 deletions(-)

--- a/arch/x86/include/asm/processor.h
+++ b/arch/x86/include/asm/processor.h
@@ -425,8 +425,8 @@ struct irq_stack {
 	char			stack[IRQ_STACK_SIZE];
 } __aligned(IRQ_STACK_SIZE);
 
-DECLARE_PER_CPU(struct irq_stack *, hardirq_stack);
-DECLARE_PER_CPU(struct irq_stack *, softirq_stack);
+DECLARE_PER_CPU(struct irq_stack *, hardirq_stack_ptr);
+DECLARE_PER_CPU(struct irq_stack *, softirq_stack_ptr);
 #endif	/* X86_64 */
 
 extern unsigned int fpu_kernel_xstate_size;
--- a/arch/x86/kernel/dumpstack_32.c
+++ b/arch/x86/kernel/dumpstack_32.c
@@ -34,7 +34,7 @@ const char *stack_type_name(enum stack_t
 
 static bool in_hardirq_stack(unsigned long *stack, struct stack_info *info)
 {
-	unsigned long *begin = (unsigned long *)this_cpu_read(hardirq_stack);
+	unsigned long *begin = (unsigned long *)this_cpu_read(hardirq_stack_ptr);
 	unsigned long *end   = begin + (THREAD_SIZE / sizeof(long));
 
 	/*
@@ -59,7 +59,7 @@ static bool in_hardirq_stack(unsigned lo
 
 static bool in_softirq_stack(unsigned long *stack, struct stack_info *info)
 {
-	unsigned long *begin = (unsigned long *)this_cpu_read(softirq_stack);
+	unsigned long *begin = (unsigned long *)this_cpu_read(softirq_stack_ptr);
 	unsigned long *end   = begin + (THREAD_SIZE / sizeof(long));
 
 	/*
--- a/arch/x86/kernel/irq_32.c
+++ b/arch/x86/kernel/irq_32.c
@@ -51,8 +51,8 @@ static inline int check_stack_overflow(v
 static inline void print_stack_overflow(void) { }
 #endif
 
-DEFINE_PER_CPU(struct irq_stack *, hardirq_stack);
-DEFINE_PER_CPU(struct irq_stack *, softirq_stack);
+DEFINE_PER_CPU(struct irq_stack *, hardirq_stack_ptr);
+DEFINE_PER_CPU(struct irq_stack *, softirq_stack_ptr);
 
 static void call_on_stack(void *func, void *stack)
 {
@@ -76,7 +76,7 @@ static inline int execute_on_irq_stack(i
 	u32 *isp, *prev_esp, arg1;
 
 	curstk = (struct irq_stack *) current_stack();
-	irqstk = __this_cpu_read(hardirq_stack);
+	irqstk = __this_cpu_read(hardirq_stack_ptr);
 
 	/*
 	 * this is where we switch to the IRQ stack. However, if we are
@@ -113,21 +113,22 @@ void irq_ctx_init(int cpu)
 {
 	struct irq_stack *irqstk;
 
-	if (per_cpu(hardirq_stack, cpu))
+	if (per_cpu(hardirq_stack_ptr, cpu))
 		return;
 
 	irqstk = page_address(alloc_pages_node(cpu_to_node(cpu),
 					       THREADINFO_GFP,
 					       THREAD_SIZE_ORDER));
-	per_cpu(hardirq_stack, cpu) = irqstk;
+	per_cpu(hardirq_stack_ptr, cpu) = irqstk;
 
 	irqstk = page_address(alloc_pages_node(cpu_to_node(cpu),
 					       THREADINFO_GFP,
 					       THREAD_SIZE_ORDER));
-	per_cpu(softirq_stack, cpu) = irqstk;
+	per_cpu(softirq_stack_ptr, cpu) = irqstk;
 
-	printk(KERN_DEBUG "CPU %u irqstacks, hard=%p soft=%p\n",
-	       cpu, per_cpu(hardirq_stack, cpu),  per_cpu(softirq_stack, cpu));
+	pr_debug("CPU %u irqstacks, hard=%p soft=%p\n",
+		 cpu, per_cpu(hardirq_stack_ptr, cpu),
+		 per_cpu(softirq_stack_ptr, cpu));
 }
 
 void do_softirq_own_stack(void)
@@ -135,7 +136,7 @@ void do_softirq_own_stack(void)
 	struct irq_stack *irqstk;
 	u32 *isp, *prev_esp;
 
-	irqstk = __this_cpu_read(softirq_stack);
+	irqstk = __this_cpu_read(softirq_stack_ptr);
 
 	/* build the stack frame on the softirq stack */
 	isp = (u32 *) ((char *)irqstk + sizeof(*irqstk));



^ permalink raw reply	[flat|nested] 89+ messages in thread

* [patch V3 26/32] x86/irq/64: Rename irq_stack_ptr to hardirq_stack_ptr
  2019-04-14 15:59 [patch V3 00/32] x86: Add guard pages to exception and interrupt stacks Thomas Gleixner
                   ` (24 preceding siblings ...)
  2019-04-14 16:00 ` [patch V3 25/32] x86/irq/32: Rename hard/softirq_stack to hard/softirq_stack_ptr Thomas Gleixner
@ 2019-04-14 16:00 ` Thomas Gleixner
  2019-04-17 14:19   ` [tip:x86/irq] " tip-bot for Thomas Gleixner
  2019-04-14 16:00 ` [patch V3 27/32] x86/irq/32: Invoke irq_ctx_init() from init_IRQ() Thomas Gleixner
                   ` (5 subsequent siblings)
  31 siblings, 1 reply; 89+ messages in thread
From: Thomas Gleixner @ 2019-04-14 16:00 UTC (permalink / raw)
  To: LKML; +Cc: x86, Andy Lutomirski, Josh Poimboeuf, Sean Christopherson

Preparatory patch to share code with 32bit.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
---
 arch/x86/entry/entry_64.S        |    2 +-
 arch/x86/include/asm/processor.h |    2 +-
 arch/x86/kernel/cpu/common.c     |    2 +-
 arch/x86/kernel/dumpstack_64.c   |    2 +-
 arch/x86/kernel/irq_64.c         |    2 +-
 arch/x86/kernel/setup_percpu.c   |    2 +-
 6 files changed, 6 insertions(+), 6 deletions(-)

--- a/arch/x86/entry/entry_64.S
+++ b/arch/x86/entry/entry_64.S
@@ -431,7 +431,7 @@ END(irq_entries_start)
 	 */
 
 	movq	\old_rsp, PER_CPU_VAR(irq_stack_union + IRQ_STACK_SIZE - 8)
-	movq	PER_CPU_VAR(irq_stack_ptr), %rsp
+	movq	PER_CPU_VAR(hardirq_stack_ptr), %rsp
 
 #ifdef CONFIG_DEBUG_ENTRY
 	/*
--- a/arch/x86/include/asm/processor.h
+++ b/arch/x86/include/asm/processor.h
@@ -396,7 +396,7 @@ static inline unsigned long cpu_kernelmo
 	return (unsigned long)per_cpu(irq_stack_union.gs_base, cpu);
 }
 
-DECLARE_PER_CPU(char *, irq_stack_ptr);
+DECLARE_PER_CPU(char *, hardirq_stack_ptr);
 DECLARE_PER_CPU(unsigned int, irq_count);
 extern asmlinkage void ignore_sysret(void);
 
--- a/arch/x86/kernel/cpu/common.c
+++ b/arch/x86/kernel/cpu/common.c
@@ -1510,7 +1510,7 @@ DEFINE_PER_CPU(struct task_struct *, cur
 	&init_task;
 EXPORT_PER_CPU_SYMBOL(current_task);
 
-DEFINE_PER_CPU(char *, irq_stack_ptr) =
+DEFINE_PER_CPU(char *, hardirq_stack_ptr) =
 	init_per_cpu_var(irq_stack_union.irq_stack) + IRQ_STACK_SIZE;
 
 DEFINE_PER_CPU(unsigned int, irq_count) __visible = -1;
--- a/arch/x86/kernel/dumpstack_64.c
+++ b/arch/x86/kernel/dumpstack_64.c
@@ -120,7 +120,7 @@ static bool in_exception_stack(unsigned
 
 static bool in_irq_stack(unsigned long *stack, struct stack_info *info)
 {
-	unsigned long *end   = (unsigned long *)this_cpu_read(irq_stack_ptr);
+	unsigned long *end   = (unsigned long *)this_cpu_read(hardirq_stack_ptr);
 	unsigned long *begin = end - (IRQ_STACK_SIZE / sizeof(long));
 
 	/*
--- a/arch/x86/kernel/irq_64.c
+++ b/arch/x86/kernel/irq_64.c
@@ -56,7 +56,7 @@ static inline void stack_overflow_check(
 	    regs->sp <= curbase + THREAD_SIZE)
 		return;
 
-	irq_stack_top = (u64)__this_cpu_read(irq_stack_ptr);
+	irq_stack_top = (u64)__this_cpu_read(hardirq_stack_ptr);
 	irq_stack_bottom = irq_stack_top - IRQ_STACK_SIZE + STACK_MARGIN;
 	if (regs->sp >= irq_stack_bottom && regs->sp <= irq_stack_top)
 		return;
--- a/arch/x86/kernel/setup_percpu.c
+++ b/arch/x86/kernel/setup_percpu.c
@@ -245,7 +245,7 @@ void __init setup_per_cpu_areas(void)
 			early_per_cpu_map(x86_cpu_to_logical_apicid, cpu);
 #endif
 #ifdef CONFIG_X86_64
-		per_cpu(irq_stack_ptr, cpu) =
+		per_cpu(hardirq_stack_ptr, cpu) =
 			per_cpu(irq_stack_union.irq_stack, cpu) +
 			IRQ_STACK_SIZE;
 #endif



^ permalink raw reply	[flat|nested] 89+ messages in thread

* [patch V3 27/32] x86/irq/32: Invoke irq_ctx_init() from init_IRQ()
  2019-04-14 15:59 [patch V3 00/32] x86: Add guard pages to exception and interrupt stacks Thomas Gleixner
                   ` (25 preceding siblings ...)
  2019-04-14 16:00 ` [patch V3 26/32] x86/irq/64: Rename irq_stack_ptr to hardirq_stack_ptr Thomas Gleixner
@ 2019-04-14 16:00 ` Thomas Gleixner
  2019-04-17 14:20   ` [tip:x86/irq] " tip-bot for Thomas Gleixner
  2019-04-14 16:00 ` [patch V3 28/32] x86/irq/32: Handle irq stack allocation failure proper Thomas Gleixner
                   ` (4 subsequent siblings)
  31 siblings, 1 reply; 89+ messages in thread
From: Thomas Gleixner @ 2019-04-14 16:00 UTC (permalink / raw)
  To: LKML
  Cc: x86, Andy Lutomirski, Josh Poimboeuf, Sean Christopherson, Juergen Gross

irq_ctx_init() is invoked from native_init_IRQ() or from xen_init_IRQ()
code. There is no reason to have this split. The interrupt stacks must be
allocated no matter what.

Invoke it from init_IRQ() before invoking the native or XEN init
implementation.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Juergen Gross <jgross@suse.com>
---
 arch/x86/kernel/irqinit.c        |    4 ++--
 drivers/xen/events/events_base.c |    1 -
 2 files changed, 2 insertions(+), 3 deletions(-)

--- a/arch/x86/kernel/irqinit.c
+++ b/arch/x86/kernel/irqinit.c
@@ -91,6 +91,8 @@ void __init init_IRQ(void)
 	for (i = 0; i < nr_legacy_irqs(); i++)
 		per_cpu(vector_irq, 0)[ISA_IRQ_VECTOR(i)] = irq_to_desc(i);
 
+	irq_ctx_init(smp_processor_id());
+
 	x86_init.irqs.intr_init();
 }
 
@@ -104,6 +106,4 @@ void __init native_init_IRQ(void)
 
 	if (!acpi_ioapic && !of_ioapic && nr_legacy_irqs())
 		setup_irq(2, &irq2);
-
-	irq_ctx_init(smp_processor_id());
 }
--- a/drivers/xen/events/events_base.c
+++ b/drivers/xen/events/events_base.c
@@ -1687,7 +1687,6 @@ void __init xen_init_IRQ(void)
 
 #ifdef CONFIG_X86
 	if (xen_pv_domain()) {
-		irq_ctx_init(smp_processor_id());
 		if (xen_initial_domain())
 			pci_xen_initial_domain();
 	}



^ permalink raw reply	[flat|nested] 89+ messages in thread

* [patch V3 28/32] x86/irq/32: Handle irq stack allocation failure proper
  2019-04-14 15:59 [patch V3 00/32] x86: Add guard pages to exception and interrupt stacks Thomas Gleixner
                   ` (26 preceding siblings ...)
  2019-04-14 16:00 ` [patch V3 27/32] x86/irq/32: Invoke irq_ctx_init() from init_IRQ() Thomas Gleixner
@ 2019-04-14 16:00 ` Thomas Gleixner
  2019-04-17 14:20   ` [tip:x86/irq] " tip-bot for Thomas Gleixner
  2019-04-14 16:00 ` [patch V3 29/32] x86/irq/64: Init hardirq_stack_ptr during CPU hotplug Thomas Gleixner
                   ` (3 subsequent siblings)
  31 siblings, 1 reply; 89+ messages in thread
From: Thomas Gleixner @ 2019-04-14 16:00 UTC (permalink / raw)
  To: LKML; +Cc: x86, Andy Lutomirski, Josh Poimboeuf, Sean Christopherson

irq_ctx_init() crashes hard on page allocation failures. While that's ok
during early boot, it's just wrong in the CPU hotplug bringup code.

Check for page allocation failure, return -ENOMEM and handle it at the
call sites. During early boot the only way out is to BUG(), but on CPU
hotplug there is no reason to crash, so just abort the operation.

Rename the function to something more sensible while at it.
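
A userspace model of the unwind-on-partial-failure pattern the patch
introduces, with plain malloc()/free() standing in for the page allocator
and a hypothetical helper name:

#include <stdlib.h>

/* Allocate both stacks or none; unwind the first if the second fails */
static int init_two_stacks(void **hard, void **soft, size_t size)
{
	void *ph = malloc(size);
	void *ps;

	if (!ph)
		return -1;		/* -ENOMEM in the patch */
	ps = malloc(size);
	if (!ps) {
		free(ph);		/* do not leak the first stack */
		return -1;
	}
	*hard = ph;
	*soft = ps;
	return 0;
}

int main(void)
{
	void *hard, *soft;

	return init_two_stacks(&hard, &soft, 4096) ? 1 : 0;
}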

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
---
 arch/x86/include/asm/irq.h |    4 ++--
 arch/x86/include/asm/smp.h |    2 +-
 arch/x86/kernel/irq_32.c   |   34 +++++++++++++++++-----------------
 arch/x86/kernel/irqinit.c  |    2 +-
 arch/x86/kernel/smpboot.c  |   15 ++++++++++++---
 arch/x86/xen/smp_pv.c      |    4 +++-
 6 files changed, 36 insertions(+), 25 deletions(-)

--- a/arch/x86/include/asm/irq.h
+++ b/arch/x86/include/asm/irq.h
@@ -17,9 +17,9 @@ static inline int irq_canonicalize(int i
 }
 
 #ifdef CONFIG_X86_32
-extern void irq_ctx_init(int cpu);
+extern int irq_init_percpu_irqstack(unsigned int cpu);
 #else
-# define irq_ctx_init(cpu) do { } while (0)
+static inline int irq_init_percpu_irqstack(unsigned int cpu) { return 0; }
 #endif
 
 #define __ARCH_HAS_DO_SOFTIRQ
--- a/arch/x86/include/asm/smp.h
+++ b/arch/x86/include/asm/smp.h
@@ -131,7 +131,7 @@ void native_smp_prepare_boot_cpu(void);
 void native_smp_prepare_cpus(unsigned int max_cpus);
 void calculate_max_logical_packages(void);
 void native_smp_cpus_done(unsigned int max_cpus);
-void common_cpu_up(unsigned int cpunum, struct task_struct *tidle);
+int common_cpu_up(unsigned int cpunum, struct task_struct *tidle);
 int native_cpu_up(unsigned int cpunum, struct task_struct *tidle);
 int native_cpu_disable(void);
 int common_cpu_die(unsigned int cpu);
--- a/arch/x86/kernel/irq_32.c
+++ b/arch/x86/kernel/irq_32.c
@@ -107,28 +107,28 @@ static inline int execute_on_irq_stack(i
 }
 
 /*
- * allocate per-cpu stacks for hardirq and for softirq processing
+ * Allocate per-cpu stacks for hardirq and softirq processing
  */
-void irq_ctx_init(int cpu)
+int irq_init_percpu_irqstack(unsigned int cpu)
 {
-	struct irq_stack *irqstk;
+	int node = cpu_to_node(cpu);
+	struct page *ph, *ps;
 
 	if (per_cpu(hardirq_stack_ptr, cpu))
-		return;
+		return 0;
 
-	irqstk = page_address(alloc_pages_node(cpu_to_node(cpu),
-					       THREADINFO_GFP,
-					       THREAD_SIZE_ORDER));
-	per_cpu(hardirq_stack_ptr, cpu) = irqstk;
-
-	irqstk = page_address(alloc_pages_node(cpu_to_node(cpu),
-					       THREADINFO_GFP,
-					       THREAD_SIZE_ORDER));
-	per_cpu(softirq_stack_ptr, cpu) = irqstk;
-
-	pr_debug("CPU %u irqstacks, hard=%p soft=%p\n",
-		 cpu, per_cpu(hardirq_stack_ptr, cpu),
-		 per_cpu(softirq_stack_ptr, cpu));
+	ph = alloc_pages_node(node, THREADINFO_GFP, THREAD_SIZE_ORDER);
+	if (!ph)
+		return -ENOMEM;
+	ps = alloc_pages_node(node, THREADINFO_GFP, THREAD_SIZE_ORDER);
+	if (!ps) {
+		__free_pages(ph, THREAD_SIZE_ORDER);
+		return -ENOMEM;
+	}
+
+	per_cpu(hardirq_stack_ptr, cpu) = page_address(ph);
+	per_cpu(softirq_stack_ptr, cpu) = page_address(ps);
+	return 0;
 }
 
 void do_softirq_own_stack(void)
--- a/arch/x86/kernel/irqinit.c
+++ b/arch/x86/kernel/irqinit.c
@@ -91,7 +91,7 @@ void __init init_IRQ(void)
 	for (i = 0; i < nr_legacy_irqs(); i++)
 		per_cpu(vector_irq, 0)[ISA_IRQ_VECTOR(i)] = irq_to_desc(i);
 
-	irq_ctx_init(smp_processor_id());
+	BUG_ON(irq_init_percpu_irqstack(smp_processor_id()));
 
 	x86_init.irqs.intr_init();
 }
--- a/arch/x86/kernel/smpboot.c
+++ b/arch/x86/kernel/smpboot.c
@@ -935,20 +935,27 @@ wakeup_cpu_via_init_nmi(int cpu, unsigne
 	return boot_error;
 }
 
-void common_cpu_up(unsigned int cpu, struct task_struct *idle)
+int common_cpu_up(unsigned int cpu, struct task_struct *idle)
 {
+	int ret;
+
 	/* Just in case we booted with a single CPU. */
 	alternatives_enable_smp();
 
 	per_cpu(current_task, cpu) = idle;
 
+	/* Initialize the interrupt stack(s) */
+	ret = irq_init_percpu_irqstack(cpu);
+	if (ret)
+		return ret;
+
 #ifdef CONFIG_X86_32
 	/* Stack for startup_32 can be just as for start_secondary onwards */
-	irq_ctx_init(cpu);
 	per_cpu(cpu_current_top_of_stack, cpu) = task_top_of_stack(idle);
 #else
 	initial_gs = per_cpu_offset(cpu);
 #endif
+	return 0;
 }
 
 /*
@@ -1106,7 +1113,9 @@ int native_cpu_up(unsigned int cpu, stru
 	/* the FPU context is blank, nobody can own it */
 	per_cpu(fpu_fpregs_owner_ctx, cpu) = NULL;
 
-	common_cpu_up(cpu, tidle);
+	err = common_cpu_up(cpu, tidle);
+	if (err)
+		return err;
 
 	err = do_boot_cpu(apicid, cpu, tidle, &cpu0_nmi_registered);
 	if (err) {
--- a/arch/x86/xen/smp_pv.c
+++ b/arch/x86/xen/smp_pv.c
@@ -361,7 +361,9 @@ static int xen_pv_cpu_up(unsigned int cp
 {
 	int rc;
 
-	common_cpu_up(cpu, idle);
+	rc = common_cpu_up(cpu, idle);
+	if (rc)
+		return rc;
 
 	xen_setup_runstate_info(cpu);
 



^ permalink raw reply	[flat|nested] 89+ messages in thread

* [patch V3 29/32] x86/irq/64: Init hardirq_stack_ptr during CPU hotplug
  2019-04-14 15:59 [patch V3 00/32] x86: Add guard pages to exception and interrupt stacks Thomas Gleixner
                   ` (27 preceding siblings ...)
  2019-04-14 16:00 ` [patch V3 28/32] x86/irq/32: Handle irq stack allocation failure proper Thomas Gleixner
@ 2019-04-14 16:00 ` Thomas Gleixner
  2019-04-17 14:21   ` [tip:x86/irq] " tip-bot for Thomas Gleixner
  2019-04-14 16:00 ` [patch V3 30/32] x86/irq/64: Split the IRQ stack into its own pages Thomas Gleixner
                   ` (2 subsequent siblings)
  31 siblings, 1 reply; 89+ messages in thread
From: Thomas Gleixner @ 2019-04-14 16:00 UTC (permalink / raw)
  To: LKML; +Cc: x86, Andy Lutomirski, Josh Poimboeuf, Sean Christopherson

Preparatory change for disentangling the irq stack union as a prerequisite
for irq stacks with guard pages.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
---
 arch/x86/include/asm/irq.h   |    4 ----
 arch/x86/kernel/cpu/common.c |    4 +---
 arch/x86/kernel/irq_64.c     |   15 +++++++++++++++
 3 files changed, 16 insertions(+), 7 deletions(-)

--- a/arch/x86/include/asm/irq.h
+++ b/arch/x86/include/asm/irq.h
@@ -16,11 +16,7 @@ static inline int irq_canonicalize(int i
 	return ((irq == 2) ? 9 : irq);
 }
 
-#ifdef CONFIG_X86_32
 extern int irq_init_percpu_irqstack(unsigned int cpu);
-#else
-static inline int irq_init_percpu_irqstack(unsigned int cpu) { return 0; }
-#endif
 
 #define __ARCH_HAS_DO_SOFTIRQ
 
--- a/arch/x86/kernel/cpu/common.c
+++ b/arch/x86/kernel/cpu/common.c
@@ -1510,9 +1510,7 @@ DEFINE_PER_CPU(struct task_struct *, cur
 	&init_task;
 EXPORT_PER_CPU_SYMBOL(current_task);
 
-DEFINE_PER_CPU(char *, hardirq_stack_ptr) =
-	init_per_cpu_var(irq_stack_union.irq_stack) + IRQ_STACK_SIZE;
-
+DEFINE_PER_CPU(char *, hardirq_stack_ptr);
 DEFINE_PER_CPU(unsigned int, irq_count) __visible = -1;
 
 DEFINE_PER_CPU(int, __preempt_count) = INIT_PREEMPT_COUNT;
--- a/arch/x86/kernel/irq_64.c
+++ b/arch/x86/kernel/irq_64.c
@@ -87,3 +87,18 @@ bool handle_irq(struct irq_desc *desc, s
 	generic_handle_irq_desc(desc);
 	return true;
 }
+
+static int map_irq_stack(unsigned int cpu)
+{
+	void *va = per_cpu_ptr(irq_stack_union.irq_stack, cpu);
+
+	per_cpu(hardirq_stack_ptr, cpu) = va + IRQ_STACK_SIZE;
+	return 0;
+}
+
+int irq_init_percpu_irqstack(unsigned int cpu)
+{
+	if (per_cpu(hardirq_stack_ptr, cpu))
+		return 0;
+	return map_irq_stack(cpu);
+}



^ permalink raw reply	[flat|nested] 89+ messages in thread

* [patch V3 30/32] x86/irq/64: Split the IRQ stack into its own pages
  2019-04-14 15:59 [patch V3 00/32] x86: Add guard pages to exception and interrupt stacks Thomas Gleixner
                   ` (28 preceding siblings ...)
  2019-04-14 16:00 ` [patch V3 29/32] x86/irq/64: Init hardirq_stack_ptr during CPU hotplug Thomas Gleixner
@ 2019-04-14 16:00 ` Thomas Gleixner
  2019-04-17 14:22   ` [tip:x86/irq] " tip-bot for Andy Lutomirski
  2019-04-14 16:00 ` [patch V3 31/32] x86/irq/64: Remap the IRQ stack with guard pages Thomas Gleixner
  2019-04-14 16:00 ` [patch V3 32/32] x86/irq/64: Remove stack overflow debug code Thomas Gleixner
  31 siblings, 1 reply; 89+ messages in thread
From: Thomas Gleixner @ 2019-04-14 16:00 UTC (permalink / raw)
  To: LKML; +Cc: x86, Andy Lutomirski, Josh Poimboeuf, Sean Christopherson

From: Andy Lutomirski <luto@kernel.org>

Currently the IRQ stack is hardcoded as the first page of the percpu area,
and the stack canary lives on the IRQ stack.  The former gets in the way of
adding an IRQ stack guard page, and the latter is a potential weakness in
the stack canary mechanism.

Split the IRQ stack into its own private percpu pages.
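
For illustration, a standalone compile-time check (mirroring the struct in
the diff below) of the invariant the split must preserve: GCC hardcodes the
stack canary address as %gs:40, so the new first percpu object has to keep
the canary at that offset:

#include <stddef.h>

struct fixed_percpu_data {
	char		gs_base[40];	/* reserved so the canary lands at 40 */
	unsigned long	stack_canary;
};

_Static_assert(offsetof(struct fixed_percpu_data, stack_canary) == 40,
	       "stack canary must stay at %gs:40");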

[ tglx: Make 64 and 32 bit share struct irq_stack ]

Signed-off-by: Andy Lutomirski <luto@kernel.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
---
 arch/x86/entry/entry_64.S             |    4 ++--
 arch/x86/include/asm/processor.h      |   32 ++++++++++++++------------------
 arch/x86/include/asm/stackprotector.h |    6 +++---
 arch/x86/kernel/asm-offsets_64.c      |    2 +-
 arch/x86/kernel/cpu/common.c          |    8 ++++----
 arch/x86/kernel/head_64.S             |    2 +-
 arch/x86/kernel/irq_64.c              |    5 ++++-
 arch/x86/kernel/setup_percpu.c        |    5 -----
 arch/x86/kernel/vmlinux.lds.S         |    7 ++++---
 arch/x86/tools/relocs.c               |    2 +-
 arch/x86/xen/xen-head.S               |   10 +++++-----
 11 files changed, 39 insertions(+), 44 deletions(-)

--- a/arch/x86/entry/entry_64.S
+++ b/arch/x86/entry/entry_64.S
@@ -298,7 +298,7 @@ ENTRY(__switch_to_asm)
 
 #ifdef CONFIG_STACKPROTECTOR
 	movq	TASK_stack_canary(%rsi), %rbx
-	movq	%rbx, PER_CPU_VAR(irq_stack_union)+stack_canary_offset
+	movq	%rbx, PER_CPU_VAR(fixed_percpu_data) + stack_canary_offset
 #endif
 
 #ifdef CONFIG_RETPOLINE
@@ -430,7 +430,7 @@ END(irq_entries_start)
 	 * it before we actually move ourselves to the IRQ stack.
 	 */
 
-	movq	\old_rsp, PER_CPU_VAR(irq_stack_union + IRQ_STACK_SIZE - 8)
+	movq	\old_rsp, PER_CPU_VAR(irq_stack_backing_store + IRQ_STACK_SIZE - 8)
 	movq	PER_CPU_VAR(hardirq_stack_ptr), %rsp
 
 #ifdef CONFIG_DEBUG_ENTRY
--- a/arch/x86/include/asm/processor.h
+++ b/arch/x86/include/asm/processor.h
@@ -367,6 +367,13 @@ DECLARE_PER_CPU_PAGE_ALIGNED(struct tss_
 #define __KERNEL_TSS_LIMIT	\
 	(IO_BITMAP_OFFSET + IO_BITMAP_BYTES + sizeof(unsigned long) - 1)
 
+/* Per CPU interrupt stacks */
+struct irq_stack {
+	char		stack[IRQ_STACK_SIZE];
+} __aligned(IRQ_STACK_SIZE);
+
+DECLARE_PER_CPU(struct irq_stack *, hardirq_stack_ptr);
+
 #ifdef CONFIG_X86_32
 DECLARE_PER_CPU(unsigned long, cpu_current_top_of_stack);
 #else
@@ -375,28 +382,24 @@ DECLARE_PER_CPU(unsigned long, cpu_curre
 #endif
 
 #ifdef CONFIG_X86_64
-union irq_stack_union {
-	char irq_stack[IRQ_STACK_SIZE];
+struct fixed_percpu_data {
 	/*
 	 * GCC hardcodes the stack canary as %gs:40.  Since the
 	 * irq_stack is the object at %gs:0, we reserve the bottom
 	 * 48 bytes of the irq stack for the canary.
 	 */
-	struct {
-		char gs_base[40];
-		unsigned long stack_canary;
-	};
+	char		gs_base[40];
+	unsigned long	stack_canary;
 };
 
-DECLARE_PER_CPU_FIRST(union irq_stack_union, irq_stack_union) __visible;
-DECLARE_INIT_PER_CPU(irq_stack_union);
+DECLARE_PER_CPU_FIRST(struct fixed_percpu_data, fixed_percpu_data) __visible;
+DECLARE_INIT_PER_CPU(fixed_percpu_data);
 
 static inline unsigned long cpu_kernelmode_gs_base(int cpu)
 {
-	return (unsigned long)per_cpu(irq_stack_union.gs_base, cpu);
+	return (unsigned long)per_cpu(fixed_percpu_data.gs_base, cpu);
 }
 
-DECLARE_PER_CPU(char *, hardirq_stack_ptr);
 DECLARE_PER_CPU(unsigned int, irq_count);
 extern asmlinkage void ignore_sysret(void);
 
@@ -418,14 +421,7 @@ struct stack_canary {
 };
 DECLARE_PER_CPU_ALIGNED(struct stack_canary, stack_canary);
 #endif
-/*
- * per-CPU IRQ handling stacks
- */
-struct irq_stack {
-	char			stack[IRQ_STACK_SIZE];
-} __aligned(IRQ_STACK_SIZE);
-
-DECLARE_PER_CPU(struct irq_stack *, hardirq_stack_ptr);
+/* Per CPU softirq stack pointer */
 DECLARE_PER_CPU(struct irq_stack *, softirq_stack_ptr);
 #endif	/* X86_64 */
 
--- a/arch/x86/include/asm/stackprotector.h
+++ b/arch/x86/include/asm/stackprotector.h
@@ -13,7 +13,7 @@
  * On x86_64, %gs is shared by percpu area and stack canary.  All
  * percpu symbols are zero based and %gs points to the base of percpu
  * area.  The first occupant of the percpu area is always
- * irq_stack_union which contains stack_canary at offset 40.  Userland
+ * fixed_percpu_data which contains stack_canary at offset 40.  Userland
  * %gs is always saved and restored on kernel entry and exit using
  * swapgs, so stack protector doesn't add any complexity there.
  *
@@ -64,7 +64,7 @@ static __always_inline void boot_init_st
 	u64 tsc;
 
 #ifdef CONFIG_X86_64
-	BUILD_BUG_ON(offsetof(union irq_stack_union, stack_canary) != 40);
+	BUILD_BUG_ON(offsetof(struct fixed_percpu_data, stack_canary) != 40);
 #endif
 	/*
 	 * We both use the random pool and the current TSC as a source
@@ -79,7 +79,7 @@ static __always_inline void boot_init_st
 
 	current->stack_canary = canary;
 #ifdef CONFIG_X86_64
-	this_cpu_write(irq_stack_union.stack_canary, canary);
+	this_cpu_write(fixed_percpu_data.stack_canary, canary);
 #else
 	this_cpu_write(stack_canary.canary, canary);
 #endif
--- a/arch/x86/kernel/asm-offsets_64.c
+++ b/arch/x86/kernel/asm-offsets_64.c
@@ -73,7 +73,7 @@ int main(void)
 	BLANK();
 
 #ifdef CONFIG_STACKPROTECTOR
-	DEFINE(stack_canary_offset, offsetof(union irq_stack_union, stack_canary));
+	DEFINE(stack_canary_offset, offsetof(struct fixed_percpu_data, stack_canary));
 	BLANK();
 #endif
 
--- a/arch/x86/kernel/cpu/common.c
+++ b/arch/x86/kernel/cpu/common.c
@@ -1498,9 +1498,9 @@ static __init int setup_clearcpuid(char
 __setup("clearcpuid=", setup_clearcpuid);
 
 #ifdef CONFIG_X86_64
-DEFINE_PER_CPU_FIRST(union irq_stack_union,
-		     irq_stack_union) __aligned(PAGE_SIZE) __visible;
-EXPORT_PER_CPU_SYMBOL_GPL(irq_stack_union);
+DEFINE_PER_CPU_FIRST(struct fixed_percpu_data,
+		     fixed_percpu_data) __aligned(PAGE_SIZE) __visible;
+EXPORT_PER_CPU_SYMBOL_GPL(fixed_percpu_data);
 
 /*
  * The following percpu variables are hot.  Align current_task to
@@ -1510,7 +1510,7 @@ DEFINE_PER_CPU(struct task_struct *, cur
 	&init_task;
 EXPORT_PER_CPU_SYMBOL(current_task);
 
-DEFINE_PER_CPU(char *, hardirq_stack_ptr);
+DEFINE_PER_CPU(struct irq_stack *, hardirq_stack_ptr);
 DEFINE_PER_CPU(unsigned int, irq_count) __visible = -1;
 
 DEFINE_PER_CPU(int, __preempt_count) = INIT_PREEMPT_COUNT;
--- a/arch/x86/kernel/head_64.S
+++ b/arch/x86/kernel/head_64.S
@@ -265,7 +265,7 @@ ENDPROC(start_cpu0)
 	GLOBAL(initial_code)
 	.quad	x86_64_start_kernel
 	GLOBAL(initial_gs)
-	.quad	INIT_PER_CPU_VAR(irq_stack_union)
+	.quad	INIT_PER_CPU_VAR(fixed_percpu_data)
 	GLOBAL(initial_stack)
 	/*
 	 * The SIZEOF_PTREGS gap is a convention which helps the in-kernel
--- a/arch/x86/kernel/irq_64.c
+++ b/arch/x86/kernel/irq_64.c
@@ -23,6 +23,9 @@
 #include <asm/io_apic.h>
 #include <asm/apic.h>
 
+DEFINE_PER_CPU_PAGE_ALIGNED(struct irq_stack, irq_stack_backing_store) __visible;
+DECLARE_INIT_PER_CPU(irq_stack_backing_store);
+
 int sysctl_panic_on_stackoverflow;
 
 /*
@@ -90,7 +93,7 @@ bool handle_irq(struct irq_desc *desc, s
 
 static int map_irq_stack(unsigned int cpu)
 {
-	void *va = per_cpu_ptr(irq_stack_union.irq_stack, cpu);
+	void *va = per_cpu_ptr(&irq_stack_backing_store, cpu);
 
 	per_cpu(hardirq_stack_ptr, cpu) = va + IRQ_STACK_SIZE;
 	return 0;
--- a/arch/x86/kernel/setup_percpu.c
+++ b/arch/x86/kernel/setup_percpu.c
@@ -244,11 +244,6 @@ void __init setup_per_cpu_areas(void)
 		per_cpu(x86_cpu_to_logical_apicid, cpu) =
 			early_per_cpu_map(x86_cpu_to_logical_apicid, cpu);
 #endif
-#ifdef CONFIG_X86_64
-		per_cpu(hardirq_stack_ptr, cpu) =
-			per_cpu(irq_stack_union.irq_stack, cpu) +
-			IRQ_STACK_SIZE;
-#endif
 #ifdef CONFIG_NUMA
 		per_cpu(x86_cpu_to_node_map, cpu) =
 			early_per_cpu_map(x86_cpu_to_node_map, cpu);
--- a/arch/x86/kernel/vmlinux.lds.S
+++ b/arch/x86/kernel/vmlinux.lds.S
@@ -403,7 +403,8 @@ SECTIONS
  */
 #define INIT_PER_CPU(x) init_per_cpu__##x = ABSOLUTE(x) + __per_cpu_load
 INIT_PER_CPU(gdt_page);
-INIT_PER_CPU(irq_stack_union);
+INIT_PER_CPU(fixed_percpu_data);
+INIT_PER_CPU(irq_stack_backing_store);
 
 /*
  * Build-time check on the image size:
@@ -412,8 +413,8 @@ INIT_PER_CPU(irq_stack_union);
 	   "kernel image bigger than KERNEL_IMAGE_SIZE");
 
 #ifdef CONFIG_SMP
-. = ASSERT((irq_stack_union == 0),
-           "irq_stack_union is not at start of per-cpu area");
+. = ASSERT((fixed_percpu_data == 0),
+           "fixed_percpu_data is not at start of per-cpu area");
 #endif
 
 #endif /* CONFIG_X86_32 */
--- a/arch/x86/tools/relocs.c
+++ b/arch/x86/tools/relocs.c
@@ -738,7 +738,7 @@ static void percpu_init(void)
  *	__per_cpu_load
  *
  * The "gold" linker incorrectly associates:
- *	init_per_cpu__irq_stack_union
+ *	init_per_cpu__fixed_percpu_data
  *	init_per_cpu__gdt_page
  */
 static int is_percpu_sym(ElfW(Sym) *sym, const char *symname)
--- a/arch/x86/xen/xen-head.S
+++ b/arch/x86/xen/xen-head.S
@@ -40,13 +40,13 @@ ENTRY(startup_xen)
 #ifdef CONFIG_X86_64
 	/* Set up %gs.
 	 *
-	 * The base of %gs always points to the bottom of the irqstack
-	 * union.  If the stack protector canary is enabled, it is
-	 * located at %gs:40.  Note that, on SMP, the boot cpu uses
-	 * init data section till per cpu areas are set up.
+	 * The base of %gs always points to fixed_percpu_data.  If the
+	 * stack protector canary is enabled, it is located at %gs:40.
+	 * Note that, on SMP, the boot cpu uses init data section until
+	 * the per cpu areas are set up.
 	 */
 	movl	$MSR_GS_BASE,%ecx
-	movq	$INIT_PER_CPU_VAR(irq_stack_union),%rax
+	movq	$INIT_PER_CPU_VAR(fixed_percpu_data),%rax
 	cdq
 	wrmsr
 #endif



^ permalink raw reply	[flat|nested] 89+ messages in thread

* [patch V3 31/32] x86/irq/64: Remap the IRQ stack with guard pages
  2019-04-14 15:59 [patch V3 00/32] x86: Add guard pages to exception and interrupt stacks Thomas Gleixner
                   ` (29 preceding siblings ...)
  2019-04-14 16:00 ` [patch V3 30/32] x86/irq/64: Split the IRQ stack into its own pages Thomas Gleixner
@ 2019-04-14 16:00 ` Thomas Gleixner
  2019-04-17 14:22   ` [tip:x86/irq] " tip-bot for Andy Lutomirski
  2019-04-14 16:00 ` [patch V3 32/32] x86/irq/64: Remove stack overflow debug code Thomas Gleixner
  31 siblings, 1 reply; 89+ messages in thread
From: Thomas Gleixner @ 2019-04-14 16:00 UTC (permalink / raw)
  To: LKML; +Cc: x86, Andy Lutomirski, Josh Poimboeuf, Sean Christopherson

From: Andy Lutomirski <luto@kernel.org>

The IRQ stack lives in percpu space, so an IRQ handler that overflows it
will overwrite other data structures.

Use vmap() to remap the IRQ stack so that it will have the usual guard
pages that vmap/vmalloc allocations have. With this the kernel will panic
immediately on an IRQ stack overflow.
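
The resulting layout is roughly:

      --- top of IRQ stack    <- hardirq_stack_ptr
          ...
      --- end of IRQ stack
          guard page          <- unmapped, so an overflowing push faults
                                 here instead of silently corrupting the
                                 adjacent percpu data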

[ tglx: Move the map code to a proper place and invoke it only when a CPU
  	is about to be brought online. No point in installing the map at
  	early boot for all possible CPUs. Fail the CPU bringup if the vmap
  	fails as done for all other preparatory stages in cpu hotplug. ]

Signed-off-by: Andy Lutomirski <luto@kernel.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
---
 arch/x86/kernel/irq_64.c |   30 ++++++++++++++++++++++++++++++
 1 file changed, 30 insertions(+)

--- a/arch/x86/kernel/irq_64.c
+++ b/arch/x86/kernel/irq_64.c
@@ -91,6 +91,35 @@ bool handle_irq(struct irq_desc *desc, s
 	return true;
 }
 
+#ifdef CONFIG_VMAP_STACK
+/*
+ * VMAP the backing store with guard pages
+ */
+static int map_irq_stack(unsigned int cpu)
+{
+	char *stack = (char *)per_cpu_ptr(&irq_stack_backing_store, cpu);
+	struct page *pages[IRQ_STACK_SIZE / PAGE_SIZE];
+	void *va;
+	int i;
+
+	for (i = 0; i < IRQ_STACK_SIZE / PAGE_SIZE; i++) {
+		phys_addr_t pa = per_cpu_ptr_to_phys(stack + (i << PAGE_SHIFT));
+
+		pages[i] = pfn_to_page(pa >> PAGE_SHIFT);
+	}
+
+	va = vmap(pages, IRQ_STACK_SIZE / PAGE_SIZE, GFP_KERNEL, PAGE_KERNEL);
+	if (!va)
+		return -ENOMEM;
+
+	per_cpu(hardirq_stack_ptr, cpu) = va + IRQ_STACK_SIZE;
+	return 0;
+}
+#else
+/*
+ * If VMAP stacks are disabled due to KASAN, just use the per cpu
+ * backing store without guard pages.
+ */
 static int map_irq_stack(unsigned int cpu)
 {
 	void *va = per_cpu_ptr(&irq_stack_backing_store, cpu);
@@ -98,6 +127,7 @@ static int map_irq_stack(unsigned int cp
 	per_cpu(hardirq_stack_ptr, cpu) = va + IRQ_STACK_SIZE;
 	return 0;
 }
+#endif
 
 int irq_init_percpu_irqstack(unsigned int cpu)
 {



^ permalink raw reply	[flat|nested] 89+ messages in thread

* [patch V3 32/32] x86/irq/64: Remove stack overflow debug code
  2019-04-14 15:59 [patch V3 00/32] x86: Add guard pages to exception and interrupt stacks Thomas Gleixner
                   ` (30 preceding siblings ...)
  2019-04-14 16:00 ` [patch V3 31/32] x86/irq/64: Remap the IRQ stack with guard pages Thomas Gleixner
@ 2019-04-14 16:00 ` Thomas Gleixner
  2019-04-17 14:23   ` [tip:x86/irq] " tip-bot for Thomas Gleixner
  31 siblings, 1 reply; 89+ messages in thread
From: Thomas Gleixner @ 2019-04-14 16:00 UTC (permalink / raw)
  To: LKML; +Cc: x86, Andy Lutomirski, Josh Poimboeuf, Sean Christopherson

All stack types on x86 64-bit have guard pages now.

So there is no point in executing probabilistic overflow checks as the
guard pages are an accurate and reliable overflow prevention.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
---
 arch/x86/Kconfig         |    2 -
 arch/x86/kernel/irq_64.c |   56 -----------------------------------------------
 2 files changed, 1 insertion(+), 57 deletions(-)

--- a/arch/x86/Kconfig
+++ b/arch/x86/Kconfig
@@ -14,6 +14,7 @@ config X86_32
 	select ARCH_WANT_IPC_PARSE_VERSION
 	select CLKSRC_I8253
 	select CLONE_BACKWARDS
+	select HAVE_DEBUG_STACKOVERFLOW
 	select MODULES_USE_ELF_REL
 	select OLD_SIGACTION
 
@@ -138,7 +139,6 @@ config X86
 	select HAVE_COPY_THREAD_TLS
 	select HAVE_C_RECORDMCOUNT
 	select HAVE_DEBUG_KMEMLEAK
-	select HAVE_DEBUG_STACKOVERFLOW
 	select HAVE_DMA_CONTIGUOUS
 	select HAVE_DYNAMIC_FTRACE
 	select HAVE_DYNAMIC_FTRACE_WITH_REGS
--- a/arch/x86/kernel/irq_64.c
+++ b/arch/x86/kernel/irq_64.c
@@ -26,64 +26,8 @@
 DEFINE_PER_CPU_PAGE_ALIGNED(struct irq_stack, irq_stack_backing_store) __visible;
 DECLARE_INIT_PER_CPU(irq_stack_backing_store);
 
-int sysctl_panic_on_stackoverflow;
-
-/*
- * Probabilistic stack overflow check:
- *
- * Regular device interrupts can enter on the following stacks:
- *
- * - User stack
- *
- * - Kernel task stack
- *
- * - Interrupt stack if a device driver reenables interrupts
- *   which should only happen in really old drivers.
- *
- * - Debug IST stack
- *
- * All other contexts are invalid.
- */
-static inline void stack_overflow_check(struct pt_regs *regs)
-{
-#ifdef CONFIG_DEBUG_STACKOVERFLOW
-#define STACK_MARGIN	128
-	u64 irq_stack_top, irq_stack_bottom, estack_top, estack_bottom;
-	u64 curbase = (u64)task_stack_page(current);
-	struct cea_exception_stacks *estacks;
-
-	if (user_mode(regs))
-		return;
-
-	if (regs->sp >= curbase + sizeof(struct pt_regs) + STACK_MARGIN &&
-	    regs->sp <= curbase + THREAD_SIZE)
-		return;
-
-	irq_stack_top = (u64)__this_cpu_read(hardirq_stack_ptr);
-	irq_stack_bottom = irq_stack_top - IRQ_STACK_SIZE + STACK_MARGIN;
-	if (regs->sp >= irq_stack_bottom && regs->sp <= irq_stack_top)
-		return;
-
-	estacks = __this_cpu_read(cea_exception_stacks);
-	estack_top = CEA_ESTACK_TOP(estacks, DB);
-	estack_bottom = CEA_ESTACK_BOT(estacks, DB) + STACK_MARGIN;
-	if (regs->sp >= estack_bottom && regs->sp <= estack_top)
-		return;
-
-	WARN_ONCE(1, "do_IRQ(): %s has overflown the kernel stack (cur:%Lx,sp:%lx, irq stack:%Lx-%Lx, exception stack: %Lx-%Lx, ip:%pF)\n",
-		current->comm, curbase, regs->sp,
-		irq_stack_bottom, irq_stack_top,
-		estack_bottom, estack_top, (void *)regs->ip);
-
-	if (sysctl_panic_on_stackoverflow)
-		panic("low stack detected by irq handler - check messages\n");
-#endif
-}
-
 bool handle_irq(struct irq_desc *desc, struct pt_regs *regs)
 {
-	stack_overflow_check(regs);
-
 	if (IS_ERR_OR_NULL(desc))
 		return false;
 



^ permalink raw reply	[flat|nested] 89+ messages in thread

* Re: [patch V3 01/32] mm/slab: Fix broken stack trace storage
  2019-04-14 15:59 ` [patch V3 01/32] mm/slab: Fix broken stack trace storage Thomas Gleixner
@ 2019-04-14 16:16     ` Andy Lutomirski
  0 siblings, 0 replies; 89+ messages in thread
From: Andy Lutomirski @ 2019-04-14 16:16 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: LKML, X86 ML, Andy Lutomirski, Josh Poimboeuf,
	Sean Christopherson, Andrew Morton, Pekka Enberg, Linux-MM

On Sun, Apr 14, 2019 at 9:02 AM Thomas Gleixner <tglx@linutronix.de> wrote:
>
> kstack_end() is broken on interrupt stacks as they are not guaranteed to be
> sized THREAD_SIZE and THREAD_SIZE aligned.
>
> Use the stack tracer instead. Remove the pointless pointer increment at the
> end of the function while at it.
>
> Fixes: 98eb235b7feb ("[PATCH] page unmapping debug") - History tree
> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
> Cc: Andrew Morton <akpm@linux-foundation.org>
> Cc: Pekka Enberg <penberg@kernel.org>
> Cc: linux-mm@kvack.org
> ---
>  mm/slab.c |   28 ++++++++++++----------------
>  1 file changed, 12 insertions(+), 16 deletions(-)
>
> --- a/mm/slab.c
> +++ b/mm/slab.c
> @@ -1470,33 +1470,29 @@ static bool is_debug_pagealloc_cache(str
>  static void store_stackinfo(struct kmem_cache *cachep, unsigned long *addr,
>                             unsigned long caller)
>  {
> -       int size = cachep->object_size;
> +       int size = cachep->object_size / sizeof(unsigned long);
>
>         addr = (unsigned long *)&((char *)addr)[obj_offset(cachep)];
>
> -       if (size < 5 * sizeof(unsigned long))
> +       if (size < 5)
>                 return;
>
>         *addr++ = 0x12345678;
>         *addr++ = caller;
>         *addr++ = smp_processor_id();
> -       size -= 3 * sizeof(unsigned long);
> +#ifdef CONFIG_STACKTRACE
>         {
> -               unsigned long *sptr = &caller;
> -               unsigned long svalue;
> -
> -               while (!kstack_end(sptr)) {
> -                       svalue = *sptr++;
> -                       if (kernel_text_address(svalue)) {
> -                               *addr++ = svalue;
> -                               size -= sizeof(unsigned long);
> -                               if (size <= sizeof(unsigned long))
> -                                       break;
> -                       }
> -               }
> +               struct stack_trace trace = {
> +                       .max_entries    = size - 4;
> +                       .entries        = addr;
> +                       .skip           = 3;
> +               };

This looks correct, but I think that it would have been clearer if you
left the size -= 3 above.  You're still incrementing addr, but you're
not decrementing size, so they're out of sync and the resulting code
is hard to follow.

--Andy

^ permalink raw reply	[flat|nested] 89+ messages in thread

* Re: [patch V3 01/32] mm/slab: Fix broken stack trace storage
  2019-04-14 16:16     ` Andy Lutomirski
@ 2019-04-14 16:34       ` Thomas Gleixner
  -1 siblings, 0 replies; 89+ messages in thread
From: Thomas Gleixner @ 2019-04-14 16:34 UTC (permalink / raw)
  To: Andy Lutomirski
  Cc: LKML, X86 ML, Josh Poimboeuf, Sean Christopherson, Andrew Morton,
	Pekka Enberg, Linux-MM

On Sun, 14 Apr 2019, Andy Lutomirski wrote:
> > +               struct stack_trace trace = {
> > +                       .max_entries    = size - 4;
> > +                       .entries        = addr;
> > +                       .skip           = 3;
> > +               };
> 
> This looks correct, but I think that it would have been clearer if you
> left the size -= 3 above.  You're still incrementing addr, but you're
> not decrementing size, so they're out of sync and the resulting code
> is hard to follow.

What about the below?

--- a/mm/slab.c
+++ b/mm/slab.c
@@ -1480,10 +1480,12 @@ static void store_stackinfo(struct kmem_
 	*addr++ = 0x12345678;
 	*addr++ = caller;
 	*addr++ = smp_processor_id();
+	size -= 3;
 #ifdef CONFIG_STACKTRACE
 	{
 		struct stack_trace trace = {
-			.max_entries	= size - 4;
+			/* Leave one for the end marker below */
+			.max_entries	= size - 1;
 			.entries	= addr;
 			.skip		= 3;
 		};

^ permalink raw reply	[flat|nested] 89+ messages in thread

* [patch V4 01/32] mm/slab: Fix broken stack trace storage
  2019-04-14 16:34       ` Thomas Gleixner
@ 2019-04-15  9:02         ` Thomas Gleixner
  -1 siblings, 0 replies; 89+ messages in thread
From: Thomas Gleixner @ 2019-04-15  9:02 UTC (permalink / raw)
  To: Andy Lutomirski
  Cc: LKML, X86 ML, Josh Poimboeuf, Sean Christopherson, Andrew Morton,
	Pekka Enberg, Linux-MM

kstack_end() is broken on interrupt stacks as they are not guaranteed to be
sized THREAD_SIZE and THREAD_SIZE aligned.

Use the stack tracer instead. Remove the pointless pointer increment at the
end of the function while at it.

Fixes: 98eb235b7feb ("[PATCH] page unmapping debug") - History tree
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Pekka Enberg <penberg@kernel.org>
Cc: linux-mm@kvack.org
---
V4: Made the code simpler to understand (Andy) and made it actually compile
---
 mm/slab.c |   30 ++++++++++++++----------------
 1 file changed, 14 insertions(+), 16 deletions(-)

--- a/mm/slab.c
+++ b/mm/slab.c
@@ -1470,33 +1470,31 @@ static bool is_debug_pagealloc_cache(str
 static void store_stackinfo(struct kmem_cache *cachep, unsigned long *addr,
 			    unsigned long caller)
 {
-	int size = cachep->object_size;
+	int size = cachep->object_size / sizeof(unsigned long);
 
 	addr = (unsigned long *)&((char *)addr)[obj_offset(cachep)];
 
-	if (size < 5 * sizeof(unsigned long))
+	if (size < 5)
 		return;
 
 	*addr++ = 0x12345678;
 	*addr++ = caller;
 	*addr++ = smp_processor_id();
-	size -= 3 * sizeof(unsigned long);
+	size -= 3;
+#ifdef CONFIG_STACKTRACE
 	{
-		unsigned long *sptr = &caller;
-		unsigned long svalue;
-
-		while (!kstack_end(sptr)) {
-			svalue = *sptr++;
-			if (kernel_text_address(svalue)) {
-				*addr++ = svalue;
-				size -= sizeof(unsigned long);
-				if (size <= sizeof(unsigned long))
-					break;
-			}
-		}
+		struct stack_trace trace = {
+			/* Leave one for the end marker below */
+			.max_entries	= size - 1,
+			.entries	= addr,
+			.skip		= 3,
+		};
 
+		save_stack_trace(&trace);
+		addr += trace.nr_entries;
 	}
-	*addr++ = 0x87654321;
+#endif
+	*addr = 0x87654321;
 }
 
 static void slab_kernel_map(struct kmem_cache *cachep, void *objp,

^ permalink raw reply	[flat|nested] 89+ messages in thread

* Re: [patch V4 01/32] mm/slab: Fix broken stack trace storage
  2019-04-15  9:02         ` Thomas Gleixner
  (?)
@ 2019-04-15 13:23         ` Josh Poimboeuf
  2019-04-15 16:07             ` Thomas Gleixner
  -1 siblings, 1 reply; 89+ messages in thread
From: Josh Poimboeuf @ 2019-04-15 13:23 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: Andy Lutomirski, LKML, X86 ML, Sean Christopherson,
	Andrew Morton, Pekka Enberg, Linux-MM

On Mon, Apr 15, 2019 at 11:02:58AM +0200, Thomas Gleixner wrote:
> kstack_end() is broken on interrupt stacks as they are not guaranteed to be
> sized THREAD_SIZE and THREAD_SIZE aligned.
> 
> Use the stack tracer instead. Remove the pointless pointer increment at the
> end of the function while at it.
> 
> Fixes: 98eb235b7feb ("[PATCH] page unmapping debug") - History tree
> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
> Cc: Andrew Morton <akpm@linux-foundation.org>
> Cc: Pekka Enberg <penberg@kernel.org>
> Cc: linux-mm@kvack.org
> ---
> V4: Made the code simpler to understand (Andy) and made it actually compile
> ---
>  mm/slab.c |   30 ++++++++++++++----------------
>  1 file changed, 14 insertions(+), 16 deletions(-)
> 
> --- a/mm/slab.c
> +++ b/mm/slab.c
> @@ -1470,33 +1470,31 @@ static bool is_debug_pagealloc_cache(str
>  static void store_stackinfo(struct kmem_cache *cachep, unsigned long *addr,
>  			    unsigned long caller)
>  {
> -	int size = cachep->object_size;
> +	int size = cachep->object_size / sizeof(unsigned long);
>  
>  	addr = (unsigned long *)&((char *)addr)[obj_offset(cachep)];
>  
> -	if (size < 5 * sizeof(unsigned long))
> +	if (size < 5)
>  		return;
>  
>  	*addr++ = 0x12345678;
>  	*addr++ = caller;
>  	*addr++ = smp_processor_id();
> -	size -= 3 * sizeof(unsigned long);
> +	size -= 3;
> +#ifdef CONFIG_STACKTRACE
>  	{
> -		unsigned long *sptr = &caller;
> -		unsigned long svalue;
> -
> -		while (!kstack_end(sptr)) {
> -			svalue = *sptr++;
> -			if (kernel_text_address(svalue)) {
> -				*addr++ = svalue;
> -				size -= sizeof(unsigned long);
> -				if (size <= sizeof(unsigned long))
> -					break;
> -			}
> -		}
> +		struct stack_trace trace = {
> +			/* Leave one for the end marker below */
> +			.max_entries	= size - 1,
> +			.entries	= addr,
> +			.skip		= 3,
> +		};
>  
> +		save_stack_trace(&trace);
> +		addr += trace.nr_entries;
>  	}
> -	*addr++ = 0x87654321;
> +#endif
> +	*addr = 0x87654321;

Looks like stack_trace.nr_entries isn't initialized?  (though this code
gets eventually replaced by a later patch)

Who actually reads this stack trace?  I couldn't find a consumer.

-- 
Josh

^ permalink raw reply	[flat|nested] 89+ messages in thread

* Re: [patch V4 01/32] mm/slab: Fix broken stack trace storage
  2019-04-15 13:23         ` Josh Poimboeuf
@ 2019-04-15 16:07             ` Thomas Gleixner
  0 siblings, 0 replies; 89+ messages in thread
From: Thomas Gleixner @ 2019-04-15 16:07 UTC (permalink / raw)
  To: Josh Poimboeuf
  Cc: Andy Lutomirski, LKML, X86 ML, Sean Christopherson,
	Andrew Morton, Pekka Enberg, Linux-MM

On Mon, 15 Apr 2019, Josh Poimboeuf wrote:
> On Mon, Apr 15, 2019 at 11:02:58AM +0200, Thomas Gleixner wrote:
> >  	addr = (unsigned long *)&((char *)addr)[obj_offset(cachep)];
> >  
> > -	if (size < 5 * sizeof(unsigned long))
> > +	if (size < 5)
> >  		return;
> >  
> >  	*addr++ = 0x12345678;
> >  	*addr++ = caller;
> >  	*addr++ = smp_processor_id();
> > -	size -= 3 * sizeof(unsigned long);
> > +	size -= 3;
> > +#ifdef CONFIG_STACKTRACE
> >  	{
> > -		unsigned long *sptr = &caller;
> > -		unsigned long svalue;
> > -
> > -		while (!kstack_end(sptr)) {
> > -			svalue = *sptr++;
> > -			if (kernel_text_address(svalue)) {
> > -				*addr++ = svalue;
> > -				size -= sizeof(unsigned long);
> > -				if (size <= sizeof(unsigned long))
> > -					break;
> > -			}
> > -		}
> > +		struct stack_trace trace = {
> > +			/* Leave one for the end marker below */
> > +			.max_entries	= size - 1,
> > +			.entries	= addr,
> > +			.skip		= 3,
> > +		};
> >  
> > +		save_stack_trace(&trace);
> > +		addr += trace.nr_entries;
> >  	}
> > -	*addr++ = 0x87654321;
> > +#endif
> > +	*addr = 0x87654321;
> 
> Looks like stack_trace.nr_entries isn't initialized?  (though this code
> gets eventually replaced by a later patch)

The struct initializer initializes the non-mentioned fields to 0, if I'm
not totally mistaken.

> Who actually reads this stack trace?  I couldn't find a consumer.

It's stored directly in the memory pointed to by @addr and that's the freed
cache memory. If that is used later (UAF) then the stack trace can be
printed to see where it was freed.
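
I.e. after the free the object carries a layout like this (a sketch of
what store_stackinfo() leaves behind, not verbatim kernel output):

	obj[0]    = 0x12345678		/* start marker */
	obj[1]    = caller		/* who freed it */
	obj[2]    = smp_processor_id()	/* cpu it was freed on */
	obj[3..]  = stack trace entries
	obj[last] = 0x87654321		/* end marker */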

Thanks,

	tglx

^ permalink raw reply	[flat|nested] 89+ messages in thread

* Re: [patch V4 01/32] mm/slab: Fix broken stack trace storage
  2019-04-15 16:07             ` Thomas Gleixner
  (?)
@ 2019-04-15 16:16             ` Josh Poimboeuf
  2019-04-15 17:05                 ` Andy Lutomirski
  2019-04-15 21:20                 ` Thomas Gleixner
  -1 siblings, 2 replies; 89+ messages in thread
From: Josh Poimboeuf @ 2019-04-15 16:16 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: Andy Lutomirski, LKML, X86 ML, Sean Christopherson,
	Andrew Morton, Pekka Enberg, Linux-MM

On Mon, Apr 15, 2019 at 06:07:44PM +0200, Thomas Gleixner wrote:
> On Mon, 15 Apr 2019, Josh Poimboeuf wrote:
> > On Mon, Apr 15, 2019 at 11:02:58AM +0200, Thomas Gleixner wrote:
> > >  	addr = (unsigned long *)&((char *)addr)[obj_offset(cachep)];
> > >  
> > > -	if (size < 5 * sizeof(unsigned long))
> > > +	if (size < 5)
> > >  		return;
> > >  
> > >  	*addr++ = 0x12345678;
> > >  	*addr++ = caller;
> > >  	*addr++ = smp_processor_id();
> > > -	size -= 3 * sizeof(unsigned long);
> > > +	size -= 3;
> > > +#ifdef CONFIG_STACKTRACE
> > >  	{
> > > -		unsigned long *sptr = &caller;
> > > -		unsigned long svalue;
> > > -
> > > -		while (!kstack_end(sptr)) {
> > > -			svalue = *sptr++;
> > > -			if (kernel_text_address(svalue)) {
> > > -				*addr++ = svalue;
> > > -				size -= sizeof(unsigned long);
> > > -				if (size <= sizeof(unsigned long))
> > > -					break;
> > > -			}
> > > -		}
> > > +		struct stack_trace trace = {
> > > +			/* Leave one for the end marker below */
> > > +			.max_entries	= size - 1,
> > > +			.entries	= addr,
> > > +			.skip		= 3,
> > > +		};
> > >  
> > > +		save_stack_trace(&trace);
> > > +		addr += trace.nr_entries;
> > >  	}
> > > -	*addr++ = 0x87654321;
> > > +#endif
> > > +	*addr = 0x87654321;
> > 
> > Looks like stack_trace.nr_entries isn't initialized?  (though this code
> > gets eventually replaced by a later patch)
> 
> The struct initializer initializes the non-mentioned fields to 0, if I'm
> not totally mistaken.

Hm, it seems you are correct.  And I thought I knew C.

> > Who actually reads this stack trace?  I couldn't find a consumer.
> 
> It's stored directly in the memory pointed to by @addr and that's the freed
> cache memory. If that is used later (UAF) then the stack trace can be
> printed to see where it was freed.

Right... but who reads it?

-- 
Josh

^ permalink raw reply	[flat|nested] 89+ messages in thread

* Re: [patch V4 01/32] mm/slab: Fix broken stack trace storage
  2019-04-15 16:07             ` Thomas Gleixner
  (?)
  (?)
@ 2019-04-15 16:21             ` Peter Zijlstra
  -1 siblings, 0 replies; 89+ messages in thread
From: Peter Zijlstra @ 2019-04-15 16:21 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: Josh Poimboeuf, Andy Lutomirski, LKML, X86 ML,
	Sean Christopherson, Andrew Morton, Pekka Enberg, Linux-MM

On Mon, Apr 15, 2019 at 06:07:44PM +0200, Thomas Gleixner wrote:
> On Mon, 15 Apr 2019, Josh Poimboeuf wrote:
> > > +		struct stack_trace trace = {
> > > +			/* Leave one for the end marker below */
> > > +			.max_entries	= size - 1,
> > > +			.entries	= addr,
> > > +			.skip		= 3,
> > > +		};

> > Looks like stack_trace.nr_entries isn't initialized?  (though this code
> > gets eventually replaced by a later patch)
> 
> The struct initializer initializes the non-mentioned fields to 0, if I'm
> not totally mistaken.

Correct.
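
A self-contained illustration of the rule, using a hypothetical struct
rather than the kernel's stack_trace:

	#include <stdio.h>

	struct trace {
		unsigned int	nr_entries;
		unsigned int	max_entries;
		unsigned long	*entries;
		int		skip;
	};

	int main(void)
	{
		/*
		 * Fields not named in a designated initializer are
		 * implicitly zero-initialized, as if the object had
		 * static storage duration (C99 6.7.8).
		 */
		struct trace t = {
			.max_entries	= 16,
			.skip		= 3,
		};

		/* Prints 0 and a null pointer */
		printf("%u %p\n", t.nr_entries, (void *)t.entries);
		return 0;
	}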

^ permalink raw reply	[flat|nested] 89+ messages in thread

* Re: [patch V3 01/32] mm/slab: Fix broken stack trace storage
  2019-04-14 16:34       ` Thomas Gleixner
@ 2019-04-15 16:58         ` Andy Lutomirski
  -1 siblings, 0 replies; 89+ messages in thread
From: Andy Lutomirski @ 2019-04-15 16:58 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: Andy Lutomirski, LKML, X86 ML, Josh Poimboeuf,
	Sean Christopherson, Andrew Morton, Pekka Enberg, Linux-MM

On Sun, Apr 14, 2019 at 9:34 AM Thomas Gleixner <tglx@linutronix.de> wrote:
>
> On Sun, 14 Apr 2019, Andy Lutomirski wrote:
> > > +               struct stack_trace trace = {
> > > +                       .max_entries    = size - 4;
> > > +                       .entries        = addr;
> > > +                       .skip           = 3;
> > > +               };
> >
> > This looks correct, but I think that it would have been clearer if you
> > left the size -= 3 above.  You're still incrementing addr, but you're
> > not decrementing size, so they're out of sync and the resulting code
> > is hard to follow.
>
> What about the below?
>
> --- a/mm/slab.c
> +++ b/mm/slab.c
> @@ -1480,10 +1480,12 @@ static void store_stackinfo(struct kmem_
>         *addr++ = 0x12345678;
>         *addr++ = caller;
>         *addr++ = smp_processor_id();
> +       size -= 3;
>  #ifdef CONFIG_STACKTRACE
>         {
>                 struct stack_trace trace = {
> -                       .max_entries    = size - 4;
> +                       /* Leave one for the end marker below */
> +                       .max_entries    = size - 1;
>                         .entries        = addr;
>                         .skip           = 3;
>                 };

Looks good to me.

^ permalink raw reply	[flat|nested] 89+ messages in thread

* Re: [patch V4 01/32] mm/slab: Fix broken stack trace storage
  2019-04-15 16:16             ` Josh Poimboeuf
@ 2019-04-15 17:05                 ` Andy Lutomirski
  2019-04-15 21:20                 ` Thomas Gleixner
  1 sibling, 0 replies; 89+ messages in thread
From: Andy Lutomirski @ 2019-04-15 17:05 UTC (permalink / raw)
  To: Josh Poimboeuf
  Cc: Thomas Gleixner, Andy Lutomirski, LKML, X86 ML,
	Sean Christopherson, Andrew Morton, Pekka Enberg, Linux-MM

On Mon, Apr 15, 2019 at 9:17 AM Josh Poimboeuf <jpoimboe@redhat.com> wrote:
>
> On Mon, Apr 15, 2019 at 06:07:44PM +0200, Thomas Gleixner wrote:
> > On Mon, 15 Apr 2019, Josh Poimboeuf wrote:
> > > On Mon, Apr 15, 2019 at 11:02:58AM +0200, Thomas Gleixner wrote:
> > > >   addr = (unsigned long *)&((char *)addr)[obj_offset(cachep)];
> > > >
> > > > - if (size < 5 * sizeof(unsigned long))
> > > > + if (size < 5)
> > > >           return;
> > > >
> > > >   *addr++ = 0x12345678;
> > > >   *addr++ = caller;
> > > >   *addr++ = smp_processor_id();
> > > > - size -= 3 * sizeof(unsigned long);
> > > > + size -= 3;
> > > > +#ifdef CONFIG_STACKTRACE
> > > >   {
> > > > -         unsigned long *sptr = &caller;
> > > > -         unsigned long svalue;
> > > > -
> > > > -         while (!kstack_end(sptr)) {
> > > > -                 svalue = *sptr++;
> > > > -                 if (kernel_text_address(svalue)) {
> > > > -                         *addr++ = svalue;
> > > > -                         size -= sizeof(unsigned long);
> > > > -                         if (size <= sizeof(unsigned long))
> > > > -                                 break;
> > > > -                 }
> > > > -         }
> > > > +         struct stack_trace trace = {
> > > > +                 /* Leave one for the end marker below */
> > > > +                 .max_entries    = size - 1,
> > > > +                 .entries        = addr,
> > > > +                 .skip           = 3,
> > > > +         };
> > > >
> > > > +         save_stack_trace(&trace);
> > > > +         addr += trace.nr_entries;
> > > >   }
> > > > - *addr++ = 0x87654321;
> > > > +#endif
> > > > + *addr = 0x87654321;
> > >
> > > Looks like stack_trace.nr_entries isn't initialized?  (though this code
> > > gets eventually replaced by a later patch)
> >
> > The struct initializer initializes the non-mentioned fields to 0, if
> > I'm not totally mistaken.
>
> Hm, it seems you are correct.  And I thought I knew C.
>
> > > Who actually reads this stack trace?  I couldn't find a consumer.
> >
> > It's stored directly in the memory pointed to by @addr and that's the freed
> > cache memory. If that is used later (UAF) then the stack trace can be
> > printed to see where it was freed.
>
> Right... but who reads it?

That seems like a reasonable question.  After some grepping and some
git searching, it looks like there might not be any users.  I found
SLAB_STORE_USER, but that seems to be independent.

So maybe the whole mess should just be deleted.  If anyone ever
notices, they can re-add it better.

--Andy

^ permalink raw reply	[flat|nested] 89+ messages in thread

* Re: [patch V4 01/32] mm/slab: Fix broken stack trace storage
  2019-04-15 16:16             ` Josh Poimboeuf
@ 2019-04-15 21:20                 ` Thomas Gleixner
  2019-04-15 21:20                 ` Thomas Gleixner
  1 sibling, 0 replies; 89+ messages in thread
From: Thomas Gleixner @ 2019-04-15 21:20 UTC (permalink / raw)
  To: Josh Poimboeuf
  Cc: Andy Lutomirski, LKML, X86 ML, Sean Christopherson,
	Andrew Morton, Pekka Enberg, Linux-MM

On Mon, 15 Apr 2019, Josh Poimboeuf wrote:
> On Mon, Apr 15, 2019 at 06:07:44PM +0200, Thomas Gleixner wrote:
> > > 
> > > Looks like stack_trace.nr_entries isn't initialized?  (though this code
> > > gets eventually replaced by a later patch)
> > 
> > The struct initializer initializes the non-mentioned fields to 0, if
> > I'm not totally mistaken.
> 
> Hm, it seems you are correct.  And I thought I knew C.

:)

> > > Who actually reads this stack trace?  I couldn't find a consumer.
> > 
> > It's stored directly in the memory pointed to by @addr and that's the freed
> > cache memory. If that is used later (UAF) then the stack trace can be
> > printed to see where it was freed.
> 
> Right... but who reads it?

Indeed. I didn't check but I know that I saw that info printed at least a
decade ago. Looks like that debug magic in slab.c has seen major changes
since then.

Thanks,

	tglx

^ permalink raw reply	[flat|nested] 89+ messages in thread

* Re: [patch V4 01/32] mm/slab: Fix broken stack trace storage
  2019-04-15 17:05                 ` Andy Lutomirski
@ 2019-04-15 21:22                   ` Thomas Gleixner
  -1 siblings, 0 replies; 89+ messages in thread
From: Thomas Gleixner @ 2019-04-15 21:22 UTC (permalink / raw)
  To: Andy Lutomirski
  Cc: Josh Poimboeuf, LKML, X86 ML, Sean Christopherson, Andrew Morton,
	Pekka Enberg, Linux-MM

On Mon, 15 Apr 2019, Andy Lutomirski wrote:
> On Mon, Apr 15, 2019 at 9:17 AM Josh Poimboeuf <jpoimboe@redhat.com> wrote:
> > On Mon, Apr 15, 2019 at 06:07:44PM +0200, Thomas Gleixner wrote:
> > > > Looks like stack_trace.nr_entries isn't initialized?  (though this code
> > > > gets eventually replaced by a later patch)
> > >
> > > The struct initializer initializes the non-mentioned fields to 0,
> > > if I'm not totally mistaken.
> >
> > Hm, it seems you are correct.  And I thought I knew C.
> >
> > > > Who actually reads this stack trace?  I couldn't find a consumer.
> > >
> > > It's stored directly in the memory pointed to by @addr and that's the freed
> > > cache memory. If that is used later (UAF) then the stack trace can be
> > > printed to see where it was freed.
> >
> > Right... but who reads it?
> 
> That seems like a reasonable question.  After some grepping and some
> git searching, it looks like there might not be any users.  I found

Anymore. There was something 10y+ ago...

> SLAB_STORE_USER, but that seems to be independent.
> 
> So maybe the whole mess should just be deleted.  If anyone ever
> notices, they can re-add it better.

No objections from my side, but the mm people might have opinions.

Thanks,

	tglx

^ permalink raw reply	[flat|nested] 89+ messages in thread

* Re: [patch V4 01/32] mm/slab: Fix broken stack trace storage
  2019-04-15 21:22                   ` Thomas Gleixner
  (?)
@ 2019-04-16 11:37                   ` Vlastimil Babka
  2019-04-16 14:10                       ` Thomas Gleixner
  -1 siblings, 1 reply; 89+ messages in thread
From: Vlastimil Babka @ 2019-04-16 11:37 UTC (permalink / raw)
  To: Thomas Gleixner, Andy Lutomirski
  Cc: Josh Poimboeuf, LKML, X86 ML, Sean Christopherson, Andrew Morton,
	Pekka Enberg, Linux-MM, David Rientjes

On 4/15/19 11:22 PM, Thomas Gleixner wrote:
> On Mon, 15 Apr 2019, Andy Lutomirski wrote:
>> On Mon, Apr 15, 2019 at 9:17 AM Josh Poimboeuf <jpoimboe@redhat.com> wrote:
>>> On Mon, Apr 15, 2019 at 06:07:44PM +0200, Thomas Gleixner wrote:
>>>>> Looks like stack_trace.nr_entries isn't initialized?  (though this code
>>>>> gets eventually replaced by a later patch)
>>>>
>>>> The struct initializer initializes the non-mentioned fields to 0, if
>>>> I'm not totally mistaken.
>>>
>>> Hm, it seems you are correct.  And I thought I knew C.
>>>
>>>>> Who actually reads this stack trace?  I couldn't find a consumer.
>>>>
>>>> It's stored directly in the memory pointed to by @addr and that's the freed
>>>> cache memory. If that is used later (UAF) then the stack trace can be
>>>> printed to see where it was freed.
>>>
>>> Right... but who reads it?
>>
>> That seems like a reasonable question.  After some grepping and some
>> git searching, it looks like there might not be any users.  I found
> 
> Anymore. There was something 10y+ ago.

In theory it can be useful in a crash dump. But I don't see any related
debugging check that would trigger a panic in order to get one.

>> SLAB_STORE_USER, but that seems to be independent.
>>
>> So maybe the whole mess should just be deleted.  If anyone ever
>> notices, they can re-add it better.
> 
> No objections from my side, but the mm people might have opinions.

Anyone who wants to debug wrong slab usage probably uses SLUB anyway, so
I don't think it's a problem to remove broken SLAB debugging. Perhaps
even SLAB itself will be removed soon if there's performance data
supporting it [1].

[1]
https://lore.kernel.org/linux-mm/20190412112816.GD18914@techsingularity.net/T/#u

> Thanks,
> 
> 	tglx
> 


^ permalink raw reply	[flat|nested] 89+ messages in thread

* [patch V5 01/32] mm/slab: Remove broken stack trace storage
  2019-04-16 11:37                   ` Vlastimil Babka
@ 2019-04-16 14:10                       ` Thomas Gleixner
  0 siblings, 0 replies; 89+ messages in thread
From: Thomas Gleixner @ 2019-04-16 14:10 UTC (permalink / raw)
  To: Vlastimil Babka
  Cc: Andy Lutomirski, Josh Poimboeuf, LKML, X86 ML,
	Sean Christopherson, Andrew Morton, Pekka Enberg, Linux-MM,
	David Rientjes

kstack_end() is broken on interrupt stacks as they are not guaranteed to be
sized THREAD_SIZE and THREAD_SIZE aligned.
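
For reference, the generic implementation is roughly:

	static inline int kstack_end(void *addr)
	{
		/*
		 * The mask arithmetic can only detect the end of a
		 * stack which is THREAD_SIZE sized and THREAD_SIZE
		 * aligned; on an interrupt stack it can walk right
		 * past the actual stack end.
		 */
		return !(((unsigned long)addr + sizeof(void *) - 1) &
			 (THREAD_SIZE - sizeof(void *)));
	}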

As SLAB seems not to be used much with debugging enabled and might just go
away completely according to:

  https://lkml.kernel.org/r/612f9b99-a75b-6aeb-cf92-7dc5421cd950@suse.cz

just remove the bogus code instead of trying to fix it.

Fixes: 98eb235b7feb ("[PATCH] page unmapping debug") - History tree
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Pekka Enberg <penberg@kernel.org>
Cc: linux-mm@kvack.org
---
V5: Remove the cruft.
V4: Make it actually work
V2: Made the code simpler to understand (Andy)
---
 mm/slab.c |   22 +++-------------------
 1 file changed, 3 insertions(+), 19 deletions(-)

--- a/mm/slab.c
+++ b/mm/slab.c
@@ -1470,33 +1470,17 @@ static bool is_debug_pagealloc_cache(str
 static void store_stackinfo(struct kmem_cache *cachep, unsigned long *addr,
 			    unsigned long caller)
 {
-	int size = cachep->object_size;
+	int size = cachep->object_size / sizeof(unsigned long);
 
 	addr = (unsigned long *)&((char *)addr)[obj_offset(cachep)];
 
-	if (size < 5 * sizeof(unsigned long))
+	if (size < 4)
 		return;
 
 	*addr++ = 0x12345678;
 	*addr++ = caller;
 	*addr++ = smp_processor_id();
-	size -= 3 * sizeof(unsigned long);
-	{
-		unsigned long *sptr = &caller;
-		unsigned long svalue;
-
-		while (!kstack_end(sptr)) {
-			svalue = *sptr++;
-			if (kernel_text_address(svalue)) {
-				*addr++ = svalue;
-				size -= sizeof(unsigned long);
-				if (size <= sizeof(unsigned long))
-					break;
-			}
-		}
-
-	}
-	*addr++ = 0x87654321;
+	*addr = 0x87654321;
 }
 
 static void slab_kernel_map(struct kmem_cache *cachep, void *objp,

^ permalink raw reply	[flat|nested] 89+ messages in thread

* Re: [patch V5 01/32] mm/slab: Remove broken stack trace storage
  2019-04-16 14:10                       ` Thomas Gleixner
  (?)
@ 2019-04-16 15:16                       ` Vlastimil Babka
  -1 siblings, 0 replies; 89+ messages in thread
From: Vlastimil Babka @ 2019-04-16 15:16 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: Andy Lutomirski, Josh Poimboeuf, LKML, X86 ML,
	Sean Christopherson, Andrew Morton, Pekka Enberg, Linux-MM,
	David Rientjes

On 4/16/19 4:10 PM, Thomas Gleixner wrote:
> kstack_end() is broken on interrupt stacks as they are not guaranteed to be
> sized THREAD_SIZE and THREAD_SIZE aligned.
> 
> As SLAB seems not to be used much with debugging enabled and might just go
> away completely according to:
> 
>   https://lkml.kernel.org/r/612f9b99-a75b-6aeb-cf92-7dc5421cd950@suse.cz
> 
> just remove the bogus code instead of trying to fix it.
> 
> Fixes: 98eb235b7feb ("[PATCH] page unmapping debug") - History tree
> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
> Cc: Andrew Morton <akpm@linux-foundation.org>
> Cc: Pekka Enberg <penberg@kernel.org>
> Cc: linux-mm@kvack.org

Acked-by: Vlastimil Babka <vbabka@suse.cz>

Thanks.

> ---
> V5: Remove the cruft.
> V4: Make it actually work
> V2: Made the code simpler to understand (Andy)
> ---
>  mm/slab.c |   22 +++-------------------
>  1 file changed, 3 insertions(+), 19 deletions(-)
> 
> --- a/mm/slab.c
> +++ b/mm/slab.c
> @@ -1470,33 +1470,17 @@ static bool is_debug_pagealloc_cache(str
>  static void store_stackinfo(struct kmem_cache *cachep, unsigned long *addr,
>  			    unsigned long caller)
>  {
> -	int size = cachep->object_size;
> +	int size = cachep->object_size / sizeof(unsigned long);
>  
>  	addr = (unsigned long *)&((char *)addr)[obj_offset(cachep)];
>  
> -	if (size < 5 * sizeof(unsigned long))
> +	if (size < 4)
>  		return;
>  
>  	*addr++ = 0x12345678;
>  	*addr++ = caller;
>  	*addr++ = smp_processor_id();
> -	size -= 3 * sizeof(unsigned long);
> -	{
> -		unsigned long *sptr = &caller;
> -		unsigned long svalue;
> -
> -		while (!kstack_end(sptr)) {
> -			svalue = *sptr++;
> -			if (kernel_text_address(svalue)) {
> -				*addr++ = svalue;
> -				size -= sizeof(unsigned long);
> -				if (size <= sizeof(unsigned long))
> -					break;
> -			}
> -		}
> -
> -	}
> -	*addr++ = 0x87654321;
> +	*addr = 0x87654321;
>  }
>  
>  static void slab_kernel_map(struct kmem_cache *cachep, void *objp,
> 


^ permalink raw reply	[flat|nested] 89+ messages in thread

* Re: [patch V3 11/32] x86/exceptions: Add structs for exception stacks
  2019-04-14 15:59 ` [patch V3 11/32] x86/exceptions: Add structs for exception stacks Thomas Gleixner
@ 2019-04-16 18:20   ` Sean Christopherson
  2019-04-17 14:08   ` [tip:x86/irq] " tip-bot for Thomas Gleixner
  1 sibling, 0 replies; 89+ messages in thread
From: Sean Christopherson @ 2019-04-16 18:20 UTC (permalink / raw)
  To: Thomas Gleixner; +Cc: LKML, x86, Andy Lutomirski, Josh Poimboeuf

On Sun, Apr 14, 2019 at 05:59:47PM +0200, Thomas Gleixner wrote:
> At the moment everything assumes a full linear mapping of the various
> exception stacks. Adding guard pages to the cpu entry area mapping of the
> exception stacks will break that assumption.
> 
> As a preparatory step convert both the real storage and the effective
> mapping in the cpu entry area from character arrays to structures.
> 
> To ensure that both arrays have the same ordering and the same sizes for
> the individual stacks, fill the members with a macro. The guard size is
> the only difference between the two resulting structures. For now both
> have guard size 0 until the preparation of all usage sites is done.
> 
> Provide a couple of helper macros which are used in the following
> conversions.
> 
> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
> ---

One comment nit below, but other than that:

Reviewed-by: Sean Christopherson <sean.j.christopherson@intel.com>

> V2 -> V3: Move the guards below the stacks and add a top guard. Fix the
>       	  TOP() macro.
> ---
>  arch/x86/include/asm/cpu_entry_area.h |   52 ++++++++++++++++++++++++++++++----
>  arch/x86/kernel/cpu/common.c          |    2 -
>  arch/x86/mm/cpu_entry_area.c          |    8 +----
>  3 files changed, 51 insertions(+), 11 deletions(-)
> 
> --- a/arch/x86/include/asm/cpu_entry_area.h
> +++ b/arch/x86/include/asm/cpu_entry_area.h
> @@ -7,6 +7,51 @@
>  #include <asm/processor.h>
>  #include <asm/intel_ds.h>
>  
> +#ifdef CONFIG_X86_64
> +
> +/* Macro to enforce the same ordering and stack sizes */
> +#define ESTACKS_MEMBERS(guardsize)		\
> +	char	DF_stack_guard[guardsize];	\
> +	char	DF_stack[EXCEPTION_STKSZ];	\
> +	char	NMI_stack_guard[guardsize];	\
> +	char	NMI_stack[EXCEPTION_STKSZ];	\
> +	char	DB_stack_guard[guardsize];	\
> +	char	DB_stack[DEBUG_STKSZ];		\
> +	char	MCE_stack_guard[guardsize];	\
> +	char	MCE_stack[EXCEPTION_STKSZ];	\
> +	char	IST_top_guard[guardsize];	\
> +
> +/* The exception stacks linear storage. No guard pages required */

How about s/linear/physical?  "linear" is confusing for us Intel wonks
(or maybe just me) because in Intel's parlance it means virtual addresses
for all intents and purposes.  And an apostrophe to show possession would
help readability, e.g.:

  /* The exception stacks' physical storage. No guard pages required */

> +struct exception_stacks {
> +	ESTACKS_MEMBERS(0)
> +};
> 
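
[ Per the changelog above, the effective cpu_entry_area mapping is built
  from the same macro with a real guard size -- the only difference
  between the two resulting structures. A sketch of the counterpart, not
  the full hunk: ]

	struct cea_exception_stacks {
		ESTACKS_MEMBERS(PAGE_SIZE)
	};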

^ permalink raw reply	[flat|nested] 89+ messages in thread

* Re: [patch V3 21/32] x86/exceptions: Split debug IST stack
  2019-04-14 15:59 ` [patch V3 21/32] x86/exceptions: Split debug IST stack Thomas Gleixner
@ 2019-04-16 22:07   ` Sean Christopherson
  2019-04-17 14:15   ` [tip:x86/irq] " tip-bot for Thomas Gleixner
  1 sibling, 0 replies; 89+ messages in thread
From: Sean Christopherson @ 2019-04-16 22:07 UTC (permalink / raw)
  To: Thomas Gleixner; +Cc: LKML, x86, Andy Lutomirski, Josh Poimboeuf

On Sun, Apr 14, 2019 at 05:59:57PM +0200, Thomas Gleixner wrote:
> The debug IST stack is actually two separate debug stacks to handle #DB
> recursion. This is required because the CPU always starts at the top of
> the stack on exception entry, which means on #DB recursion the second
> #DB would overwrite the stack of the first.
> 
> The low level entry code therefore adjusts the top of stack on entry so a
> secondary #DB starts from a different stack page. But the stack pages are
> adjacent without a guard page between them.
> 
> Split the debug stack into 3 stacks which are separated by guard pages. The
> 3rd stack is never mapped into the cpu_entry_area and is only there to
> catch triple #DB nesting:
> 
>       --- top of DB_stack	<- Initial stack
>       --- end of DB_stack
>       	  guard page
> 
>       --- top of DB1_stack	<- Top of stack after entering first #DB
>       --- end of DB1_stack
>       	  guard page
> 
>       --- top of DB2_stack	<- Top of stack after entering second #DB
>       --- end of DB2_stack	   
>       	  guard page
> 
> If DB2 did not act as the final guard hole, the second nested #DB would
> point the top of the #DB stack to the valid area below DB1 and the
> undesired triple nesting would go uncaught.
> 
> The backing store does not allocate any memory for DB2 and its guard page
> as it is not going to be mapped into the cpu_entry_area.
> 
>  - Adjust the low level entry code so it shifts the top of the #DB stack by
>    the offset between the stacks instead of by the exception stack size.
> 
>  - Make the dumpstack code aware of the new stacks.
> 
>  - Adjust the in_debug_stack() implementation and move it into the NMI code
>    where it belongs. As this is NMI hotpath code, it just checks the full
>    area between top of DB_stack and bottom of DB1_stack without checking
>    for the guard page. That's correct because the NMI cannot hit a
>    stackpointer pointing to the guard page between the DB and DB1 stacks.
>    Even if it did, the NMI operation itself would still be unaffected, but
>    the resume of the debug exception on the topmost DB stack would crash by
>    touching the guard page.
> 
> Suggested-by: Andy Lutomirski <luto@kernel.org>
> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
> ---

One nit below on the docs, otherwise:

Reviewed-by: Sean Christopherson <sean.j.christopherson@intel.com>

> V2 -> V3: Fix off by one in in_debug_stack()
> ---
>  Documentation/x86/kernel-stacks       |    7 ++++++-
>  arch/x86/entry/entry_64.S             |    8 ++++----
>  arch/x86/include/asm/cpu_entry_area.h |   14 ++++++++++----
>  arch/x86/include/asm/debugreg.h       |    2 --
>  arch/x86/include/asm/page_64_types.h  |    3 ---
>  arch/x86/kernel/asm-offsets_64.c      |    2 ++
>  arch/x86/kernel/cpu/common.c          |   11 -----------
>  arch/x86/kernel/dumpstack_64.c        |   12 ++++++++----
>  arch/x86/kernel/nmi.c                 |   20 +++++++++++++++++++-
>  arch/x86/mm/cpu_entry_area.c          |    4 +++-
>  10 files changed, 52 insertions(+), 31 deletions(-)
> 
> --- a/Documentation/x86/kernel-stacks
> +++ b/Documentation/x86/kernel-stacks
> @@ -76,7 +76,7 @@ The currently assigned IST stacks are :-
>    middle of switching stacks.  Using IST for NMI events avoids making
>    assumptions about the previous state of the kernel stack.
>  
> -* ESTACK_DB.  DEBUG_STKSZ
> +* ESTACK_DB.  EXCEPTION_STKSZ (PAGE_SIZE).
>  
>    Used for hardware debug interrupts (interrupt 1) and for software
>    debug interrupts (INT3).
> @@ -86,6 +86,11 @@ The currently assigned IST stacks are :-
>    avoids making assumptions about the previous state of the kernel
>    stack.
>  
> +  To handle nested #DB correctly there exist two instances of DB stacks. On
> +  #DB entry the IST stackpointer for #DB is switched to the second instance
> +  so a nested #DB starts from a clean stack. The nested #DB switches to

Pretty sure the "to" at the end is unwanted.

> +  the IST stackpointer to a guard hole to catch triple nesting.
> +
>  * ESTACK_MCE.  EXCEPTION_STKSZ (PAGE_SIZE).
>  
>    Used for interrupt 18 - Machine Check Exception (#MC).

^ permalink raw reply	[flat|nested] 89+ messages in thread
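
The stack shift arithmetic above can be checked with a small stand-alone C
sketch. Everything in it -- struct layout, guard size, stack size -- is an
illustrative assumption modeled on the changelog, not the kernel's actual
definitions (those are generated from ESTACKS_MEMBERS() in
asm/cpu_entry_area.h):

/*
 * Illustrative user-space sketch -- NOT kernel code. Sizes and layout
 * are assumptions; lowest address comes first, stacks grow downwards.
 */
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

#define PAGE_SIZE	4096
#define EXCEPTION_STKSZ	PAGE_SIZE	/* assumed per-stack size */
#define GUARD_SIZE	PAGE_SIZE	/* assumed guard page size */

struct db_stacks {
	char DB2_stack_guard[GUARD_SIZE];
	char DB2_stack[EXCEPTION_STKSZ];
	char DB1_stack_guard[GUARD_SIZE];
	char DB1_stack[EXCEPTION_STKSZ];
	char DB_stack_guard[GUARD_SIZE];
	char DB_stack[EXCEPTION_STKSZ];
};

int main(void)
{
	/*
	 * The offset the entry code subtracts from the IST pointer on
	 * each #DB entry: top of DB -> top of DB1 -> top of DB2.
	 */
	size_t shift = offsetof(struct db_stacks, DB_stack) -
		       offsetof(struct db_stacks, DB1_stack);

	assert(shift == EXCEPTION_STKSZ + GUARD_SIZE);
	printf("per-nesting IST shift: %zu bytes\n", shift);
	return 0;
}

Subtracting 'shift' once lands on top of DB1, twice on top of DB2; since
the kernel never maps DB2, a third #DB touching it faults, which is the
triple-nesting catch described above.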

* [tip:x86/irq] x86/irq/64: Limit IST stack overflow check to #DB stack
  2019-04-14 15:59 ` [patch V3 02/32] x86/irq/64: Limit IST stack overflow check to #DB stack Thomas Gleixner
@ 2019-04-17 14:02   ` tip-bot for Thomas Gleixner
  0 siblings, 0 replies; 89+ messages in thread
From: tip-bot for Thomas Gleixner @ 2019-04-17 14:02 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: luto, linux-kernel, jpoimboe, tglx, sean.j.christopherson,
	nstange, bp, mitsuo.hayasaka.hu, x86, hpa, mingo, mingo

Commit-ID:  7dbcf2b0b770eeb803a416ee8dcbef78e6389d40
Gitweb:     https://git.kernel.org/tip/7dbcf2b0b770eeb803a416ee8dcbef78e6389d40
Author:     Thomas Gleixner <tglx@linutronix.de>
AuthorDate: Sun, 14 Apr 2019 17:59:38 +0200
Committer:  Borislav Petkov <bp@suse.de>
CommitDate: Wed, 17 Apr 2019 12:06:56 +0200

x86/irq/64: Limit IST stack overflow check to #DB stack

Commit

  37fe6a42b343 ("x86: Check stack overflow in detail")

added a broad check which treats the full exception stack area as valid.

That's wrong in two aspects:

 1) It does not check the individual areas one by one

 2) #DF, NMI and #MCE do not enable interrupts, which means that a
    regular device interrupt cannot happen in their context. In fact, if a
    device interrupt hits one of those IST stacks, that's a bug because some
    code path enabled interrupts while handling the exception.

Limit the check to the #DB stack and consider all other IST stacks as
'overflow' or invalid.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Mitsuo Hayasaka <mitsuo.hayasaka.hu@hitachi.com>
Cc: Nicolai Stange <nstange@suse.de>
Cc: Sean Christopherson <sean.j.christopherson@intel.com>
Cc: x86-ml <x86@kernel.org>
Link: https://lkml.kernel.org/r/20190414160143.682135110@linutronix.de
---
 arch/x86/kernel/irq_64.c | 19 ++++++++++++++-----
 1 file changed, 14 insertions(+), 5 deletions(-)

diff --git a/arch/x86/kernel/irq_64.c b/arch/x86/kernel/irq_64.c
index 0469cd078db1..b50ac9c7397b 100644
--- a/arch/x86/kernel/irq_64.c
+++ b/arch/x86/kernel/irq_64.c
@@ -26,9 +26,18 @@ int sysctl_panic_on_stackoverflow;
 /*
  * Probabilistic stack overflow check:
  *
- * Only check the stack in process context, because everything else
- * runs on the big interrupt stacks. Checking reliably is too expensive,
- * so we just check from interrupts.
+ * Regular device interrupts can enter on the following stacks:
+ *
+ * - User stack
+ *
+ * - Kernel task stack
+ *
+ * - Interrupt stack if a device driver reenables interrupts
+ *   which should only happen in really old drivers.
+ *
+ * - Debug IST stack
+ *
+ * All other contexts are invalid.
  */
 static inline void stack_overflow_check(struct pt_regs *regs)
 {
@@ -53,8 +62,8 @@ static inline void stack_overflow_check(struct pt_regs *regs)
 		return;
 
 	oist = this_cpu_ptr(&orig_ist);
-	estack_top = (u64)oist->ist[0] - EXCEPTION_STKSZ + STACK_TOP_MARGIN;
-	estack_bottom = (u64)oist->ist[N_EXCEPTION_STACKS - 1];
+	estack_bottom = (u64)oist->ist[DEBUG_STACK];
+	estack_top = estack_bottom - DEBUG_STKSZ + STACK_TOP_MARGIN;
 	if (regs->sp >= estack_top && regs->sp <= estack_bottom)
 		return;
 

^ permalink raw reply related	[flat|nested] 89+ messages in thread
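
The narrowed check boils down to a simple closed-range test against the
#DB stack. A self-contained sketch with an assumed stack size and a
hypothetical base address (the kernel reads the real bottom from
oist->ist[DEBUG_STACK] as in the hunk above):

#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

#define DEBUG_STKSZ		4096	/* assumed #DB stack size */
#define STACK_TOP_MARGIN	128

static bool sp_on_db_stack(uint64_t sp, uint64_t estack_bottom)
{
	/* x86 stacks grow down: 'top' is the lowest acceptable address */
	uint64_t estack_top = estack_bottom - DEBUG_STKSZ + STACK_TOP_MARGIN;

	return sp >= estack_top && sp <= estack_bottom;
}

int main(void)
{
	uint64_t bottom = 0xfffffe0000010000ULL;	/* hypothetical */

	printf("%d\n", sp_on_db_stack(bottom - 256, bottom));	/* 1 */
	printf("%d\n", sp_on_db_stack(bottom - DEBUG_STKSZ, bottom)); /* 0 */
	return 0;
}

The second call returns 0 because the pointer sits inside the 128-byte
margin; note the top/bottom naming here still follows the old inverted
convention, which a later patch in the series sanitizes.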

* [tip:x86/irq] x86/dumpstack: Fix off-by-one errors in stack identification
  2019-04-14 15:59 ` [patch V3 03/32] x86/dumpstack: Fix off-by-one errors in stack identification Thomas Gleixner
@ 2019-04-17 14:03   ` tip-bot for Andy Lutomirski
  0 siblings, 0 replies; 89+ messages in thread
From: tip-bot for Andy Lutomirski @ 2019-04-17 14:03 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: linux-kernel, mingo, sean.j.christopherson, tglx, x86, luto,
	jpoimboe, bp, mingo, hpa

Commit-ID:  fa33215422fd415a07ec2a00e9f1acdaf0fa8e94
Gitweb:     https://git.kernel.org/tip/fa33215422fd415a07ec2a00e9f1acdaf0fa8e94
Author:     Andy Lutomirski <luto@kernel.org>
AuthorDate: Sun, 14 Apr 2019 17:59:39 +0200
Committer:  Borislav Petkov <bp@suse.de>
CommitDate: Wed, 17 Apr 2019 12:26:50 +0200

x86/dumpstack: Fix off-by-one errors in stack identification

The get_stack_info() function is off by one when checking whether an
address is on an IRQ stack or an IST stack. This prevents an overflowed
IRQ or IST stack from being dumped properly.

[ tglx: Do the same for 32-bit ]

Signed-off-by: Andy Lutomirski <luto@kernel.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Borislav Petkov <bp@suse.de>
Reviewed-by: Sean Christopherson <sean.j.christopherson@intel.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: x86-ml <x86@kernel.org>
Link: https://lkml.kernel.org/r/20190414160143.785651055@linutronix.de
---
 arch/x86/kernel/dumpstack_32.c | 4 ++--
 arch/x86/kernel/dumpstack_64.c | 4 ++--
 2 files changed, 4 insertions(+), 4 deletions(-)

diff --git a/arch/x86/kernel/dumpstack_32.c b/arch/x86/kernel/dumpstack_32.c
index cd53f3030e40..d305440ebe9c 100644
--- a/arch/x86/kernel/dumpstack_32.c
+++ b/arch/x86/kernel/dumpstack_32.c
@@ -41,7 +41,7 @@ static bool in_hardirq_stack(unsigned long *stack, struct stack_info *info)
 	 * This is a software stack, so 'end' can be a valid stack pointer.
 	 * It just means the stack is empty.
 	 */
-	if (stack <= begin || stack > end)
+	if (stack < begin || stack > end)
 		return false;
 
 	info->type	= STACK_TYPE_IRQ;
@@ -66,7 +66,7 @@ static bool in_softirq_stack(unsigned long *stack, struct stack_info *info)
 	 * This is a software stack, so 'end' can be a valid stack pointer.
 	 * It just means the stack is empty.
 	 */
-	if (stack <= begin || stack > end)
+	if (stack < begin || stack > end)
 		return false;
 
 	info->type	= STACK_TYPE_SOFTIRQ;
diff --git a/arch/x86/kernel/dumpstack_64.c b/arch/x86/kernel/dumpstack_64.c
index 5cdb9e84da57..90f0fa88cbb3 100644
--- a/arch/x86/kernel/dumpstack_64.c
+++ b/arch/x86/kernel/dumpstack_64.c
@@ -65,7 +65,7 @@ static bool in_exception_stack(unsigned long *stack, struct stack_info *info)
 		begin = end - (exception_stack_sizes[k] / sizeof(long));
 		regs  = (struct pt_regs *)end - 1;
 
-		if (stack <= begin || stack >= end)
+		if (stack < begin || stack >= end)
 			continue;
 
 		info->type	= STACK_TYPE_EXCEPTION + k;
@@ -88,7 +88,7 @@ static bool in_irq_stack(unsigned long *stack, struct stack_info *info)
 	 * This is a software stack, so 'end' can be a valid stack pointer.
 	 * It just means the stack is empty.
 	 */
-	if (stack <= begin || stack > end)
+	if (stack < begin || stack >= end)
 		return false;
 
 	info->type	= STACK_TYPE_IRQ;

^ permalink raw reply related	[flat|nested] 89+ messages in thread
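
The effect of these one-character changes is easy to demonstrate in
isolation with a mock stack area (size assumed):

#include <stdio.h>

#define NWORDS 512

static int old_check(unsigned long *stack, unsigned long *begin,
		     unsigned long *end)
{
	return !(stack <= begin || stack >= end);	/* rejects 'begin' */
}

static int new_check(unsigned long *stack, unsigned long *begin,
		     unsigned long *end)
{
	return !(stack < begin || stack >= end);	/* 'begin' is valid */
}

int main(void)
{
	static unsigned long stk[NWORDS];
	unsigned long *begin = stk, *end = stk + NWORDS;

	/*
	 * The lowest word of the area is exactly where a fully overflowed
	 * stack pointer ends up, and the old test wrongly rejected it:
	 */
	printf("old: %d new: %d\n", old_check(begin, begin, end),
	       new_check(begin, begin, end));	/* prints "old: 0 new: 1" */
	return 0;
}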

* [tip:x86/irq] x86/irq/64: Remove a hardcoded irq_stack_union access
  2019-04-14 15:59 ` [patch V3 04/32] x86/irq/64: Remove a hardcoded irq_stack_union access Thomas Gleixner
@ 2019-04-17 14:03   ` tip-bot for Andy Lutomirski
  0 siblings, 0 replies; 89+ messages in thread
From: tip-bot for Andy Lutomirski @ 2019-04-17 14:03 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: nstange, mingo, tglx, jpoimboe, luto, mingo, hpa, bp,
	linux-kernel, x86, sean.j.christopherson

Commit-ID:  4f44b8f0b33b7111216f0fad353315f796b81617
Gitweb:     https://git.kernel.org/tip/4f44b8f0b33b7111216f0fad353315f796b81617
Author:     Andy Lutomirski <luto@kernel.org>
AuthorDate: Sun, 14 Apr 2019 17:59:40 +0200
Committer:  Borislav Petkov <bp@suse.de>
CommitDate: Wed, 17 Apr 2019 12:31:38 +0200

x86/irq/64: Remove a hardcoded irq_stack_union access

stack_overflow_check() is using both irq_stack_ptr and irq_stack_union
to find the IRQ stack. That's going to break when vmapped irq stacks are
introduced.

Change it to just use irq_stack_ptr.

Signed-off-by: Andy Lutomirski <luto@kernel.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Borislav Petkov <bp@suse.de>
Reviewed-by: Sean Christopherson <sean.j.christopherson@intel.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Nicolai Stange <nstange@suse.de>
Cc: x86-ml <x86@kernel.org>
Link: https://lkml.kernel.org/r/20190414160143.872549191@linutronix.de
---
 arch/x86/kernel/irq_64.c | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/arch/x86/kernel/irq_64.c b/arch/x86/kernel/irq_64.c
index b50ac9c7397b..f6dcc8fea5c0 100644
--- a/arch/x86/kernel/irq_64.c
+++ b/arch/x86/kernel/irq_64.c
@@ -55,9 +55,8 @@ static inline void stack_overflow_check(struct pt_regs *regs)
 	    regs->sp <= curbase + THREAD_SIZE)
 		return;
 
-	irq_stack_top = (u64)this_cpu_ptr(irq_stack_union.irq_stack) +
-			STACK_TOP_MARGIN;
 	irq_stack_bottom = (u64)__this_cpu_read(irq_stack_ptr);
+	irq_stack_top = irq_stack_bottom - IRQ_STACK_SIZE + STACK_TOP_MARGIN;
 	if (regs->sp >= irq_stack_top && regs->sp <= irq_stack_bottom)
 		return;
 

^ permalink raw reply related	[flat|nested] 89+ messages in thread

* [tip:x86/irq] x86/irq/64: Sanitize the top/bottom confusion
  2019-04-14 15:59 ` [patch V3 05/32] x86/irq/64: Sanitize the top/bottom confusion Thomas Gleixner
@ 2019-04-17 14:04   ` tip-bot for Thomas Gleixner
  0 siblings, 0 replies; 89+ messages in thread
From: tip-bot for Thomas Gleixner @ 2019-04-17 14:04 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: x86, bp, luto, linux-kernel, sean.j.christopherson, nstange,
	mingo, tglx, hpa, jpoimboe, mingo

Commit-ID:  df835e7083bee33e98635aca26b39b63ebc6cca7
Gitweb:     https://git.kernel.org/tip/df835e7083bee33e98635aca26b39b63ebc6cca7
Author:     Thomas Gleixner <tglx@linutronix.de>
AuthorDate: Sun, 14 Apr 2019 17:59:41 +0200
Committer:  Borislav Petkov <bp@suse.de>
CommitDate: Wed, 17 Apr 2019 12:34:49 +0200

x86/irq/64: Sanitize the top/bottom confusion

On x86, stacks grow from top to bottom, but the stack overflow check uses
the top/bottom names the other way round, which is just confusing. Clean it
up and sanitize
the warning string a bit.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Borislav Petkov <bp@suse.de>
Reviewed-by: Sean Christopherson <sean.j.christopherson@intel.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Nicolai Stange <nstange@suse.de>
Cc: x86-ml <x86@kernel.org>
Link: https://lkml.kernel.org/r/20190414160143.961241397@linutronix.de
---
 arch/x86/kernel/irq_64.c | 22 +++++++++++-----------
 1 file changed, 11 insertions(+), 11 deletions(-)

diff --git a/arch/x86/kernel/irq_64.c b/arch/x86/kernel/irq_64.c
index f6dcc8fea5c0..cf200466d5c8 100644
--- a/arch/x86/kernel/irq_64.c
+++ b/arch/x86/kernel/irq_64.c
@@ -42,7 +42,7 @@ int sysctl_panic_on_stackoverflow;
 static inline void stack_overflow_check(struct pt_regs *regs)
 {
 #ifdef CONFIG_DEBUG_STACKOVERFLOW
-#define STACK_TOP_MARGIN	128
+#define STACK_MARGIN	128
 	struct orig_ist *oist;
 	u64 irq_stack_top, irq_stack_bottom;
 	u64 estack_top, estack_bottom;
@@ -51,25 +51,25 @@ static inline void stack_overflow_check(struct pt_regs *regs)
 	if (user_mode(regs))
 		return;
 
-	if (regs->sp >= curbase + sizeof(struct pt_regs) + STACK_TOP_MARGIN &&
+	if (regs->sp >= curbase + sizeof(struct pt_regs) + STACK_MARGIN &&
 	    regs->sp <= curbase + THREAD_SIZE)
 		return;
 
-	irq_stack_bottom = (u64)__this_cpu_read(irq_stack_ptr);
-	irq_stack_top = irq_stack_bottom - IRQ_STACK_SIZE + STACK_TOP_MARGIN;
-	if (regs->sp >= irq_stack_top && regs->sp <= irq_stack_bottom)
+	irq_stack_top = (u64)__this_cpu_read(irq_stack_ptr);
+	irq_stack_bottom = irq_stack_top - IRQ_STACK_SIZE + STACK_MARGIN;
+	if (regs->sp >= irq_stack_bottom && regs->sp <= irq_stack_top)
 		return;
 
 	oist = this_cpu_ptr(&orig_ist);
-	estack_bottom = (u64)oist->ist[DEBUG_STACK];
-	estack_top = estack_bottom - DEBUG_STKSZ + STACK_TOP_MARGIN;
-	if (regs->sp >= estack_top && regs->sp <= estack_bottom)
+	estack_top = (u64)oist->ist[DEBUG_STACK];
+	estack_bottom = estack_top - DEBUG_STKSZ + STACK_MARGIN;
+	if (regs->sp >= estack_bottom && regs->sp <= estack_top)
 		return;
 
-	WARN_ONCE(1, "do_IRQ(): %s has overflown the kernel stack (cur:%Lx,sp:%lx,irq stk top-bottom:%Lx-%Lx,exception stk top-bottom:%Lx-%Lx,ip:%pF)\n",
+	WARN_ONCE(1, "do_IRQ(): %s has overflown the kernel stack (cur:%Lx,sp:%lx, irq stack:%Lx-%Lx, exception stack: %Lx-%Lx, ip:%pF)\n",
 		current->comm, curbase, regs->sp,
-		irq_stack_top, irq_stack_bottom,
-		estack_top, estack_bottom, (void *)regs->ip);
+		irq_stack_bottom, irq_stack_top,
+		estack_bottom, estack_top, (void *)regs->ip);
 
 	if (sysctl_panic_on_stackoverflow)
 		panic("low stack detected by irq handler - check messages\n");

^ permalink raw reply related	[flat|nested] 89+ messages in thread

* [tip:x86/irq] x86/idt: Remove unused macro SISTG
  2019-04-14 15:59 ` [patch V3 06/32] x86/idt: Remove unused macro SISTG Thomas Gleixner
@ 2019-04-17 14:05   ` tip-bot for Thomas Gleixner
  0 siblings, 0 replies; 89+ messages in thread
From: tip-bot for Thomas Gleixner @ 2019-04-17 14:05 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: hpa, x86, mingo, douly.fnst, tglx, sean.j.christopherson,
	nstange, linux-kernel, mingo, bp, jpoimboe, luto

Commit-ID:  99d334511b337884cadbdfae28da912a4edb1001
Gitweb:     https://git.kernel.org/tip/99d334511b337884cadbdfae28da912a4edb1001
Author:     Thomas Gleixner <tglx@linutronix.de>
AuthorDate: Sun, 14 Apr 2019 17:59:42 +0200
Committer:  Borislav Petkov <bp@suse.de>
CommitDate: Wed, 17 Apr 2019 12:36:19 +0200

x86/idt: Remove unused macro SISTG

Commit

  d8ba61ba58c8 ("x86/entry/64: Don't use IST entry for #BP stack")

removed the last user but left the macro around.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Borislav Petkov <bp@suse.de>
Reviewed-by: Andy Lutomirski <luto@kernel.org>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Dou Liyang <douly.fnst@cn.fujitsu.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Nicolai Stange <nstange@suse.de>
Cc: Sean Christopherson <sean.j.christopherson@intel.com>
Cc: x86-ml <x86@kernel.org>
Link: https://lkml.kernel.org/r/20190414160144.050689789@linutronix.de
---
 arch/x86/kernel/idt.c | 4 ----
 1 file changed, 4 deletions(-)

diff --git a/arch/x86/kernel/idt.c b/arch/x86/kernel/idt.c
index 01adea278a71..2877606e97de 100644
--- a/arch/x86/kernel/idt.c
+++ b/arch/x86/kernel/idt.c
@@ -45,10 +45,6 @@ struct idt_data {
 #define ISTG(_vector, _addr, _ist)			\
 	G(_vector, _addr, _ist, GATE_INTERRUPT, DPL0, __KERNEL_CS)
 
-/* System interrupt gate with interrupt stack */
-#define SISTG(_vector, _addr, _ist)			\
-	G(_vector, _addr, _ist, GATE_INTERRUPT, DPL3, __KERNEL_CS)
-
 /* Task gate */
 #define TSKG(_vector, _gdt)				\
 	G(_vector, NULL, DEFAULT_STACK, GATE_TASK, DPL0, _gdt << 3)

^ permalink raw reply related	[flat|nested] 89+ messages in thread

* [tip:x86/irq] x86/64: Remove stale CURRENT_MASK
  2019-04-14 15:59 ` [patch V3 07/32] x86/64: Remove stale CURRENT_MASK Thomas Gleixner
@ 2019-04-17 14:06   ` tip-bot for Thomas Gleixner
  0 siblings, 0 replies; 89+ messages in thread
From: tip-bot for Thomas Gleixner @ 2019-04-17 14:06 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: hpa, bp, jpoimboe, cai, kirill.shutemov, mingo, mingo,
	linux-kernel, x86, bhe, sean.j.christopherson, luto, tglx

Commit-ID:  6f36bd8d2e8c221eaaf4ce5b0ebbb11c00b0ac98
Gitweb:     https://git.kernel.org/tip/6f36bd8d2e8c221eaaf4ce5b0ebbb11c00b0ac98
Author:     Thomas Gleixner <tglx@linutronix.de>
AuthorDate: Sun, 14 Apr 2019 17:59:43 +0200
Committer:  Borislav Petkov <bp@suse.de>
CommitDate: Wed, 17 Apr 2019 12:38:36 +0200

x86/64: Remove stale CURRENT_MASK

Nothing uses that and before people get the wrong ideas, get rid of it.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Baoquan He <bhe@redhat.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Qian Cai <cai@lca.pw>
Cc: Sean Christopherson <sean.j.christopherson@intel.com>
Cc: x86-ml <x86@kernel.org>
Link: https://lkml.kernel.org/r/20190414160144.139284839@linutronix.de
---
 arch/x86/include/asm/page_64_types.h | 1 -
 1 file changed, 1 deletion(-)

diff --git a/arch/x86/include/asm/page_64_types.h b/arch/x86/include/asm/page_64_types.h
index 8f657286d599..bcd8c0518604 100644
--- a/arch/x86/include/asm/page_64_types.h
+++ b/arch/x86/include/asm/page_64_types.h
@@ -14,7 +14,6 @@
 
 #define THREAD_SIZE_ORDER	(2 + KASAN_STACK_ORDER)
 #define THREAD_SIZE  (PAGE_SIZE << THREAD_SIZE_ORDER)
-#define CURRENT_MASK (~(THREAD_SIZE - 1))
 
 #define EXCEPTION_STACK_ORDER (0 + KASAN_STACK_ORDER)
 #define EXCEPTION_STKSZ (PAGE_SIZE << EXCEPTION_STACK_ORDER)

^ permalink raw reply related	[flat|nested] 89+ messages in thread

* [tip:x86/irq] x86/exceptions: Remove unused stack defines on 32bit
  2019-04-14 15:59 ` [patch V3 08/32] x86/exceptions: Remove unused stack defines on 32bit Thomas Gleixner
@ 2019-04-17 14:06   ` tip-bot for Thomas Gleixner
  0 siblings, 0 replies; 89+ messages in thread
From: tip-bot for Thomas Gleixner @ 2019-04-17 14:06 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: jpoimboe, mhocko, bp, dave.hansen, sean.j.christopherson, mingo,
	x86, tglx, luto, mingo, linux-kernel, hpa

Commit-ID:  30842211506e376b76394a9cb4e6d0c9d258b8d4
Gitweb:     https://git.kernel.org/tip/30842211506e376b76394a9cb4e6d0c9d258b8d4
Author:     Thomas Gleixner <tglx@linutronix.de>
AuthorDate: Sun, 14 Apr 2019 17:59:44 +0200
Committer:  Borislav Petkov <bp@suse.de>
CommitDate: Wed, 17 Apr 2019 12:45:29 +0200

x86/exceptions: Remove unused stack defines on 32bit

Nothing requires those for 32bit builds.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Sean Christopherson <sean.j.christopherson@intel.com>
Cc: x86-ml <x86@kernel.org>
Link: https://lkml.kernel.org/r/20190414160144.227822695@linutronix.de
---
 arch/x86/include/asm/page_32_types.h | 6 +-----
 1 file changed, 1 insertion(+), 5 deletions(-)

diff --git a/arch/x86/include/asm/page_32_types.h b/arch/x86/include/asm/page_32_types.h
index 0d5c739eebd7..57b5dc15ca7e 100644
--- a/arch/x86/include/asm/page_32_types.h
+++ b/arch/x86/include/asm/page_32_types.h
@@ -22,11 +22,7 @@
 #define THREAD_SIZE_ORDER	1
 #define THREAD_SIZE		(PAGE_SIZE << THREAD_SIZE_ORDER)
 
-#define DOUBLEFAULT_STACK 1
-#define NMI_STACK 0
-#define DEBUG_STACK 0
-#define MCE_STACK 0
-#define N_EXCEPTION_STACKS 1
+#define N_EXCEPTION_STACKS	1
 
 #ifdef CONFIG_X86_PAE
 /*

^ permalink raw reply related	[flat|nested] 89+ messages in thread

* [tip:x86/irq] x86/exceptions: Make IST index zero based
  2019-04-14 15:59 ` [patch V3 09/32] x86/exceptions: Make IST index zero based Thomas Gleixner
@ 2019-04-17 14:07   ` tip-bot for Thomas Gleixner
  0 siblings, 0 replies; 89+ messages in thread
From: tip-bot for Thomas Gleixner @ 2019-04-17 14:07 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: corbet, dave.hansen, bhe, mingo, cai, douly.fnst, x86, jpoimboe,
	nstange, tglx, bp, luto, peterz, konrad.wilk, linux-kernel,
	chang.seok.bae, mingo, sean.j.christopherson, kirill.shutemov,
	linux, hpa

Commit-ID:  8f34c5b5afce91d171bb0802631197484cb69b8b
Gitweb:     https://git.kernel.org/tip/8f34c5b5afce91d171bb0802631197484cb69b8b
Author:     Thomas Gleixner <tglx@linutronix.de>
AuthorDate: Sun, 14 Apr 2019 17:59:45 +0200
Committer:  Borislav Petkov <bp@suse.de>
CommitDate: Wed, 17 Apr 2019 12:48:00 +0200

x86/exceptions: Make IST index zero based

The defines for the exception stack (IST) array in the TSS are using the
SDM convention IST1 - IST7. That causes all sorts of code to subtract 1 for
array indices related to IST. That's confusing at best and does not provide
any value.

Make the indices zero-based and fix up the usage sites. The only code which
needs to adjust the zero-based index is the interrupt descriptor setup, which
needs to add 1 now.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Borislav Petkov <bp@suse.de>
Reviewed-by: Sean Christopherson <sean.j.christopherson@intel.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Baoquan He <bhe@redhat.com>
Cc: "Chang S. Bae" <chang.seok.bae@intel.com>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Dominik Brodowski <linux@dominikbrodowski.net>
Cc: Dou Liyang <douly.fnst@cn.fujitsu.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Cc: linux-doc@vger.kernel.org
Cc: Nicolai Stange <nstange@suse.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Qian Cai <cai@lca.pw>
Cc: x86-ml <x86@kernel.org>
Link: https://lkml.kernel.org/r/20190414160144.331772825@linutronix.de
---
 Documentation/x86/kernel-stacks      |  8 ++++----
 arch/x86/entry/entry_64.S            |  4 ++--
 arch/x86/include/asm/page_64_types.h | 13 ++++++++-----
 arch/x86/kernel/cpu/common.c         |  4 ++--
 arch/x86/kernel/dumpstack_64.c       | 14 +++++++-------
 arch/x86/kernel/idt.c                | 15 +++++++++------
 arch/x86/kernel/irq_64.c             |  2 +-
 arch/x86/mm/fault.c                  |  2 +-
 8 files changed, 34 insertions(+), 28 deletions(-)

diff --git a/Documentation/x86/kernel-stacks b/Documentation/x86/kernel-stacks
index 9a0aa4d3a866..1b04596caea9 100644
--- a/Documentation/x86/kernel-stacks
+++ b/Documentation/x86/kernel-stacks
@@ -59,7 +59,7 @@ If that assumption is ever broken then the stacks will become corrupt.
 
 The currently assigned IST stacks are :-
 
-* DOUBLEFAULT_STACK.  EXCEPTION_STKSZ (PAGE_SIZE).
+* ESTACK_DF.  EXCEPTION_STKSZ (PAGE_SIZE).
 
   Used for interrupt 8 - Double Fault Exception (#DF).
 
@@ -68,7 +68,7 @@ The currently assigned IST stacks are :-
   Using a separate stack allows the kernel to recover from it well enough
   in many cases to still output an oops.
 
-* NMI_STACK.  EXCEPTION_STKSZ (PAGE_SIZE).
+* ESTACK_NMI.  EXCEPTION_STKSZ (PAGE_SIZE).
 
   Used for non-maskable interrupts (NMI).
 
@@ -76,7 +76,7 @@ The currently assigned IST stacks are :-
   middle of switching stacks.  Using IST for NMI events avoids making
   assumptions about the previous state of the kernel stack.
 
-* DEBUG_STACK.  DEBUG_STKSZ
+* ESTACK_DB.  DEBUG_STKSZ
 
   Used for hardware debug interrupts (interrupt 1) and for software
   debug interrupts (INT3).
@@ -86,7 +86,7 @@ The currently assigned IST stacks are :-
   avoids making assumptions about the previous state of the kernel
   stack.
 
-* MCE_STACK.  EXCEPTION_STKSZ (PAGE_SIZE).
+* ESTACK_MCE.  EXCEPTION_STKSZ (PAGE_SIZE).
 
   Used for interrupt 18 - Machine Check Exception (#MC).
 
diff --git a/arch/x86/entry/entry_64.S b/arch/x86/entry/entry_64.S
index 1f0efdb7b629..fd0a50452cb3 100644
--- a/arch/x86/entry/entry_64.S
+++ b/arch/x86/entry/entry_64.S
@@ -841,7 +841,7 @@ apicinterrupt IRQ_WORK_VECTOR			irq_work_interrupt		smp_irq_work_interrupt
 /*
  * Exception entry points.
  */
-#define CPU_TSS_IST(x) PER_CPU_VAR(cpu_tss_rw) + (TSS_ist + ((x) - 1) * 8)
+#define CPU_TSS_IST(x) PER_CPU_VAR(cpu_tss_rw) + (TSS_ist + (x) * 8)
 
 /**
  * idtentry - Generate an IDT entry stub
@@ -1129,7 +1129,7 @@ apicinterrupt3 HYPERV_STIMER0_VECTOR \
 	hv_stimer0_callback_vector hv_stimer0_vector_handler
 #endif /* CONFIG_HYPERV */
 
-idtentry debug			do_debug		has_error_code=0	paranoid=1 shift_ist=DEBUG_STACK
+idtentry debug			do_debug		has_error_code=0	paranoid=1 shift_ist=ESTACK_DB
 idtentry int3			do_int3			has_error_code=0
 idtentry stack_segment		do_stack_segment	has_error_code=1
 
diff --git a/arch/x86/include/asm/page_64_types.h b/arch/x86/include/asm/page_64_types.h
index bcd8c0518604..6ab2c54c1bf9 100644
--- a/arch/x86/include/asm/page_64_types.h
+++ b/arch/x86/include/asm/page_64_types.h
@@ -24,11 +24,14 @@
 #define IRQ_STACK_ORDER (2 + KASAN_STACK_ORDER)
 #define IRQ_STACK_SIZE (PAGE_SIZE << IRQ_STACK_ORDER)
 
-#define DOUBLEFAULT_STACK 1
-#define NMI_STACK 2
-#define DEBUG_STACK 3
-#define MCE_STACK 4
-#define N_EXCEPTION_STACKS 4  /* hw limit: 7 */
+/*
+ * The index for the tss.ist[] array. The hardware limit is 7 entries.
+ */
+#define	ESTACK_DF		0
+#define	ESTACK_NMI		1
+#define	ESTACK_DB		2
+#define	ESTACK_MCE		3
+#define	N_EXCEPTION_STACKS	4
 
 /*
  * Set __PAGE_OFFSET to the most negative possible address +
diff --git a/arch/x86/kernel/cpu/common.c b/arch/x86/kernel/cpu/common.c
index cb28e98a0659..0e4cb718fc4a 100644
--- a/arch/x86/kernel/cpu/common.c
+++ b/arch/x86/kernel/cpu/common.c
@@ -516,7 +516,7 @@ DEFINE_PER_CPU(struct cpu_entry_area *, cpu_entry_area);
  */
 static const unsigned int exception_stack_sizes[N_EXCEPTION_STACKS] = {
 	  [0 ... N_EXCEPTION_STACKS - 1]	= EXCEPTION_STKSZ,
-	  [DEBUG_STACK - 1]			= DEBUG_STKSZ
+	  [ESTACK_DB]				= DEBUG_STKSZ
 };
 #endif
 
@@ -1760,7 +1760,7 @@ void cpu_init(void)
 			estacks += exception_stack_sizes[v];
 			oist->ist[v] = t->x86_tss.ist[v] =
 					(unsigned long)estacks;
-			if (v == DEBUG_STACK-1)
+			if (v == ESTACK_DB)
 				per_cpu(debug_stack_addr, cpu) = (unsigned long)estacks;
 		}
 	}
diff --git a/arch/x86/kernel/dumpstack_64.c b/arch/x86/kernel/dumpstack_64.c
index 90f0fa88cbb3..455b47ef9250 100644
--- a/arch/x86/kernel/dumpstack_64.c
+++ b/arch/x86/kernel/dumpstack_64.c
@@ -18,16 +18,16 @@
 
 #include <asm/stacktrace.h>
 
-static char *exception_stack_names[N_EXCEPTION_STACKS] = {
-		[ DOUBLEFAULT_STACK-1	]	= "#DF",
-		[ NMI_STACK-1		]	= "NMI",
-		[ DEBUG_STACK-1		]	= "#DB",
-		[ MCE_STACK-1		]	= "#MC",
+static const char *exception_stack_names[N_EXCEPTION_STACKS] = {
+		[ ESTACK_DF	]	= "#DF",
+		[ ESTACK_NMI	]	= "NMI",
+		[ ESTACK_DB	]	= "#DB",
+		[ ESTACK_MCE	]	= "#MC",
 };
 
-static unsigned long exception_stack_sizes[N_EXCEPTION_STACKS] = {
+static const unsigned long exception_stack_sizes[N_EXCEPTION_STACKS] = {
 	[0 ... N_EXCEPTION_STACKS - 1]		= EXCEPTION_STKSZ,
-	[DEBUG_STACK - 1]			= DEBUG_STKSZ
+	[ESTACK_DB]				= DEBUG_STKSZ
 };
 
 const char *stack_type_name(enum stack_type type)
diff --git a/arch/x86/kernel/idt.c b/arch/x86/kernel/idt.c
index 2877606e97de..2188f734ec61 100644
--- a/arch/x86/kernel/idt.c
+++ b/arch/x86/kernel/idt.c
@@ -41,9 +41,12 @@ struct idt_data {
 #define SYSG(_vector, _addr)				\
 	G(_vector, _addr, DEFAULT_STACK, GATE_INTERRUPT, DPL3, __KERNEL_CS)
 
-/* Interrupt gate with interrupt stack */
+/*
+ * Interrupt gate with interrupt stack. The _ist index is the index in
+ * the tss.ist[] array, but for the descriptor it needs to start at 1.
+ */
 #define ISTG(_vector, _addr, _ist)			\
-	G(_vector, _addr, _ist, GATE_INTERRUPT, DPL0, __KERNEL_CS)
+	G(_vector, _addr, _ist + 1, GATE_INTERRUPT, DPL0, __KERNEL_CS)
 
 /* Task gate */
 #define TSKG(_vector, _gdt)				\
@@ -180,11 +183,11 @@ gate_desc debug_idt_table[IDT_ENTRIES] __page_aligned_bss;
  * cpu_init() when the TSS has been initialized.
  */
 static const __initconst struct idt_data ist_idts[] = {
-	ISTG(X86_TRAP_DB,	debug,		DEBUG_STACK),
-	ISTG(X86_TRAP_NMI,	nmi,		NMI_STACK),
-	ISTG(X86_TRAP_DF,	double_fault,	DOUBLEFAULT_STACK),
+	ISTG(X86_TRAP_DB,	debug,		ESTACK_DB),
+	ISTG(X86_TRAP_NMI,	nmi,		ESTACK_NMI),
+	ISTG(X86_TRAP_DF,	double_fault,	ESTACK_DF),
 #ifdef CONFIG_X86_MCE
-	ISTG(X86_TRAP_MC,	&machine_check,	MCE_STACK),
+	ISTG(X86_TRAP_MC,	&machine_check,	ESTACK_MCE),
 #endif
 };
 
diff --git a/arch/x86/kernel/irq_64.c b/arch/x86/kernel/irq_64.c
index cf200466d5c8..182e8b245e06 100644
--- a/arch/x86/kernel/irq_64.c
+++ b/arch/x86/kernel/irq_64.c
@@ -61,7 +61,7 @@ static inline void stack_overflow_check(struct pt_regs *regs)
 		return;
 
 	oist = this_cpu_ptr(&orig_ist);
-	estack_top = (u64)oist->ist[DEBUG_STACK];
+	estack_top = (u64)oist->ist[ESTACK_DB];
 	estack_bottom = estack_top - DEBUG_STKSZ + STACK_MARGIN;
 	if (regs->sp >= estack_bottom && regs->sp <= estack_top)
 		return;
diff --git a/arch/x86/mm/fault.c b/arch/x86/mm/fault.c
index 667f1da36208..0524e1d74f24 100644
--- a/arch/x86/mm/fault.c
+++ b/arch/x86/mm/fault.c
@@ -793,7 +793,7 @@ no_context(struct pt_regs *regs, unsigned long error_code,
 	if (is_vmalloc_addr((void *)address) &&
 	    (((unsigned long)tsk->stack - 1 - address < PAGE_SIZE) ||
 	     address - ((unsigned long)tsk->stack + THREAD_SIZE) < PAGE_SIZE)) {
-		unsigned long stack = this_cpu_read(orig_ist.ist[DOUBLEFAULT_STACK]) - sizeof(void *);
+		unsigned long stack = this_cpu_read(orig_ist.ist[ESTACK_DF]) - sizeof(void *);
 		/*
 		 * We're likely to be running with very little stack space
 		 * left.  It's plausible that we'd hit this condition but

^ permalink raw reply related	[flat|nested] 89+ messages in thread
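
The new convention in a nutshell -- a user-space sketch (the tss.ist[]
array is mocked and the address is hypothetical):

#include <assert.h>
#include <stdio.h>

enum { ESTACK_DF, ESTACK_NMI, ESTACK_DB, ESTACK_MCE, N_EXCEPTION_STACKS };

/* Mimics what ISTG() encodes: hardware IST numbering starts at 1 */
static unsigned int gate_ist_field(unsigned int idx)
{
	return idx + 1;
}

int main(void)
{
	unsigned long ist[N_EXCEPTION_STACKS] = { 0 };	/* tss.ist[] mock */

	/* Array accesses no longer need a "- 1" adjustment ... */
	ist[ESTACK_DB] = 0xfffffe0000003000UL;		/* hypothetical */

	/* ... only the gate descriptor encoding adds one: */
	assert(gate_ist_field(ESTACK_DF) == 1);
	assert(gate_ist_field(ESTACK_MCE) == N_EXCEPTION_STACKS);
	printf("#DB stack %#lx, gate IST field %u\n",
	       ist[ESTACK_DB], gate_ist_field(ESTACK_DB));
	return 0;
}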

* [tip:x86/irq] x86/cpu_entry_area: Cleanup setup functions
  2019-04-14 15:59 ` [patch V3 10/32] x86/cpu_entry_area: Cleanup setup functions Thomas Gleixner
@ 2019-04-17 14:08   ` tip-bot for Thomas Gleixner
  0 siblings, 0 replies; 89+ messages in thread
From: tip-bot for Thomas Gleixner @ 2019-04-17 14:08 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: sean.j.christopherson, mingo, hpa, mingo, bp, x86, luto,
	linux-kernel, peterz, dave.hansen, tglx, jpoimboe

Commit-ID:  881a463cf21dbf83aab2cf6c9a359f34f88c2491
Gitweb:     https://git.kernel.org/tip/881a463cf21dbf83aab2cf6c9a359f34f88c2491
Author:     Thomas Gleixner <tglx@linutronix.de>
AuthorDate: Sun, 14 Apr 2019 17:59:46 +0200
Committer:  Borislav Petkov <bp@suse.de>
CommitDate: Wed, 17 Apr 2019 12:50:46 +0200

x86/cpu_entry_area: Cleanup setup functions

No point in retrieving the entry area pointer over and over. Do it once
and use unsigned int for 'cpu' everywhere.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Borislav Petkov <bp@suse.de>
Reviewed-by: Sean Christopherson <sean.j.christopherson@intel.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: x86-ml <x86@kernel.org>
Link: https://lkml.kernel.org/r/20190414160144.419653165@linutronix.de
---
 arch/x86/mm/cpu_entry_area.c | 19 +++++++++----------
 1 file changed, 9 insertions(+), 10 deletions(-)

diff --git a/arch/x86/mm/cpu_entry_area.c b/arch/x86/mm/cpu_entry_area.c
index 19c6abf9ea31..c2a54f75d335 100644
--- a/arch/x86/mm/cpu_entry_area.c
+++ b/arch/x86/mm/cpu_entry_area.c
@@ -52,10 +52,10 @@ cea_map_percpu_pages(void *cea_vaddr, void *ptr, int pages, pgprot_t prot)
 		cea_set_pte(cea_vaddr, per_cpu_ptr_to_phys(ptr), prot);
 }
 
-static void __init percpu_setup_debug_store(int cpu)
+static void __init percpu_setup_debug_store(unsigned int cpu)
 {
 #ifdef CONFIG_CPU_SUP_INTEL
-	int npages;
+	unsigned int npages;
 	void *cea;
 
 	if (boot_cpu_data.x86_vendor != X86_VENDOR_INTEL)
@@ -79,8 +79,9 @@ static void __init percpu_setup_debug_store(int cpu)
 }
 
 /* Setup the fixmap mappings only once per-processor */
-static void __init setup_cpu_entry_area(int cpu)
+static void __init setup_cpu_entry_area(unsigned int cpu)
 {
+	struct cpu_entry_area *cea = get_cpu_entry_area(cpu);
 #ifdef CONFIG_X86_64
 	/* On 64-bit systems, we use a read-only fixmap GDT and TSS. */
 	pgprot_t gdt_prot = PAGE_KERNEL_RO;
@@ -101,10 +102,9 @@ static void __init setup_cpu_entry_area(int cpu)
 	pgprot_t tss_prot = PAGE_KERNEL;
 #endif
 
-	cea_set_pte(&get_cpu_entry_area(cpu)->gdt, get_cpu_gdt_paddr(cpu),
-		    gdt_prot);
+	cea_set_pte(&cea->gdt, get_cpu_gdt_paddr(cpu), gdt_prot);
 
-	cea_map_percpu_pages(&get_cpu_entry_area(cpu)->entry_stack_page,
+	cea_map_percpu_pages(&cea->entry_stack_page,
 			     per_cpu_ptr(&entry_stack_storage, cpu), 1,
 			     PAGE_KERNEL);
 
@@ -128,19 +128,18 @@ static void __init setup_cpu_entry_area(int cpu)
 	BUILD_BUG_ON((offsetof(struct tss_struct, x86_tss) ^
 		      offsetofend(struct tss_struct, x86_tss)) & PAGE_MASK);
 	BUILD_BUG_ON(sizeof(struct tss_struct) % PAGE_SIZE != 0);
-	cea_map_percpu_pages(&get_cpu_entry_area(cpu)->tss,
-			     &per_cpu(cpu_tss_rw, cpu),
+	cea_map_percpu_pages(&cea->tss, &per_cpu(cpu_tss_rw, cpu),
 			     sizeof(struct tss_struct) / PAGE_SIZE, tss_prot);
 
 #ifdef CONFIG_X86_32
-	per_cpu(cpu_entry_area, cpu) = get_cpu_entry_area(cpu);
+	per_cpu(cpu_entry_area, cpu) = cea;
 #endif
 
 #ifdef CONFIG_X86_64
 	BUILD_BUG_ON(sizeof(exception_stacks) % PAGE_SIZE != 0);
 	BUILD_BUG_ON(sizeof(exception_stacks) !=
 		     sizeof(((struct cpu_entry_area *)0)->exception_stacks));
-	cea_map_percpu_pages(&get_cpu_entry_area(cpu)->exception_stacks,
+	cea_map_percpu_pages(&cea->exception_stacks,
 			     &per_cpu(exception_stacks, cpu),
 			     sizeof(exception_stacks) / PAGE_SIZE, PAGE_KERNEL);
 #endif

^ permalink raw reply related	[flat|nested] 89+ messages in thread

* [tip:x86/irq] x86/exceptions: Add structs for exception stacks
  2019-04-14 15:59 ` [patch V3 11/32] x86/exceptions: Add structs for exception stacks Thomas Gleixner
  2019-04-16 18:20   ` Sean Christopherson
@ 2019-04-17 14:08   ` tip-bot for Thomas Gleixner
  1 sibling, 0 replies; 89+ messages in thread
From: tip-bot for Thomas Gleixner @ 2019-04-17 14:08 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: sean.j.christopherson, peterz, tglx, luto, mingo, dave.hansen,
	chang.seok.bae, x86, bp, hpa, linux-kernel, jpoimboe, linux,
	mingo, konrad.wilk

Commit-ID:  019b17b3ffe48100e52f609ca1c6ed6e5a40cba1
Gitweb:     https://git.kernel.org/tip/019b17b3ffe48100e52f609ca1c6ed6e5a40cba1
Author:     Thomas Gleixner <tglx@linutronix.de>
AuthorDate: Sun, 14 Apr 2019 17:59:47 +0200
Committer:  Borislav Petkov <bp@suse.de>
CommitDate: Wed, 17 Apr 2019 12:55:18 +0200

x86/exceptions: Add structs for exception stacks

At the moment everything assumes a full linear mapping of the various
exception stacks. Adding guard pages to the cpu entry area mapping of the
exception stacks will break that assumption.

As a preparatory step convert both the real storage and the effective
mapping in the cpu entry area from character arrays to structures.

To ensure that both arrays have the same ordering and the same size of the
individual stacks, fill the members with a macro. The guard size is the only
difference between the two resulting structures. For now both have guard
size 0 until the preparation of all usage sites is done.

Provide a couple of helper macros which are used in the following
conversions.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Borislav Petkov <bp@suse.de>
Reviewed-by: Sean Christopherson <sean.j.christopherson@intel.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: "Chang S. Bae" <chang.seok.bae@intel.com>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Dominik Brodowski <linux@dominikbrodowski.net>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: x86-ml <x86@kernel.org>
Link: https://lkml.kernel.org/r/20190414160144.506807893@linutronix.de
---
 arch/x86/include/asm/cpu_entry_area.h | 52 +++++++++++++++++++++++++++++++----
 arch/x86/kernel/cpu/common.c          |  2 +-
 arch/x86/mm/cpu_entry_area.c          |  8 ++----
 3 files changed, 51 insertions(+), 11 deletions(-)

diff --git a/arch/x86/include/asm/cpu_entry_area.h b/arch/x86/include/asm/cpu_entry_area.h
index 29c706415443..af8c312673de 100644
--- a/arch/x86/include/asm/cpu_entry_area.h
+++ b/arch/x86/include/asm/cpu_entry_area.h
@@ -7,6 +7,51 @@
 #include <asm/processor.h>
 #include <asm/intel_ds.h>
 
+#ifdef CONFIG_X86_64
+
+/* Macro to enforce the same ordering and stack sizes */
+#define ESTACKS_MEMBERS(guardsize)		\
+	char	DF_stack_guard[guardsize];	\
+	char	DF_stack[EXCEPTION_STKSZ];	\
+	char	NMI_stack_guard[guardsize];	\
+	char	NMI_stack[EXCEPTION_STKSZ];	\
+	char	DB_stack_guard[guardsize];	\
+	char	DB_stack[DEBUG_STKSZ];		\
+	char	MCE_stack_guard[guardsize];	\
+	char	MCE_stack[EXCEPTION_STKSZ];	\
+	char	IST_top_guard[guardsize];	\
+
+/* The exception stacks' physical storage. No guard pages required */
+struct exception_stacks {
+	ESTACKS_MEMBERS(0)
+};
+
+/*
+ * The effective cpu entry area mapping with guard pages. Guard size is
+ * zero until the code which makes assumptions about linear mappings is
+ * cleaned up.
+ */
+struct cea_exception_stacks {
+	ESTACKS_MEMBERS(0)
+};
+
+#define CEA_ESTACK_SIZE(st)					\
+	sizeof(((struct cea_exception_stacks *)0)->st## _stack)
+
+#define CEA_ESTACK_BOT(ceastp, st)				\
+	((unsigned long)&(ceastp)->st## _stack)
+
+#define CEA_ESTACK_TOP(ceastp, st)				\
+	(CEA_ESTACK_BOT(ceastp, st) + CEA_ESTACK_SIZE(st))
+
+#define CEA_ESTACK_OFFS(st)					\
+	offsetof(struct cea_exception_stacks, st## _stack)
+
+#define CEA_ESTACK_PAGES					\
+	(sizeof(struct cea_exception_stacks) / PAGE_SIZE)
+
+#endif
+
 /*
  * cpu_entry_area is a percpu region that contains things needed by the CPU
  * and early entry/exit code.  Real types aren't used for all fields here
@@ -32,12 +77,9 @@ struct cpu_entry_area {
 
 #ifdef CONFIG_X86_64
 	/*
-	 * Exception stacks used for IST entries.
-	 *
-	 * In the future, this should have a separate slot for each stack
-	 * with guard pages between them.
+	 * Exception stacks used for IST entries with guard pages.
 	 */
-	char exception_stacks[(N_EXCEPTION_STACKS - 1) * EXCEPTION_STKSZ + DEBUG_STKSZ];
+	struct cea_exception_stacks estacks;
 #endif
 #ifdef CONFIG_CPU_SUP_INTEL
 	/*
diff --git a/arch/x86/kernel/cpu/common.c b/arch/x86/kernel/cpu/common.c
index 0e4cb718fc4a..24b801ea7522 100644
--- a/arch/x86/kernel/cpu/common.c
+++ b/arch/x86/kernel/cpu/common.c
@@ -1754,7 +1754,7 @@ void cpu_init(void)
 	 * set up and load the per-CPU TSS
 	 */
 	if (!oist->ist[0]) {
-		char *estacks = get_cpu_entry_area(cpu)->exception_stacks;
+		char *estacks = (char *)&get_cpu_entry_area(cpu)->estacks;
 
 		for (v = 0; v < N_EXCEPTION_STACKS; v++) {
 			estacks += exception_stack_sizes[v];
diff --git a/arch/x86/mm/cpu_entry_area.c b/arch/x86/mm/cpu_entry_area.c
index c2a54f75d335..6a09b84c13fe 100644
--- a/arch/x86/mm/cpu_entry_area.c
+++ b/arch/x86/mm/cpu_entry_area.c
@@ -13,8 +13,7 @@
 static DEFINE_PER_CPU_PAGE_ALIGNED(struct entry_stack_page, entry_stack_storage);
 
 #ifdef CONFIG_X86_64
-static DEFINE_PER_CPU_PAGE_ALIGNED(char, exception_stacks
-	[(N_EXCEPTION_STACKS - 1) * EXCEPTION_STKSZ + DEBUG_STKSZ]);
+static DEFINE_PER_CPU_PAGE_ALIGNED(struct exception_stacks, exception_stacks);
 #endif
 
 struct cpu_entry_area *get_cpu_entry_area(int cpu)
@@ -138,9 +137,8 @@ static void __init setup_cpu_entry_area(unsigned int cpu)
 #ifdef CONFIG_X86_64
 	BUILD_BUG_ON(sizeof(exception_stacks) % PAGE_SIZE != 0);
 	BUILD_BUG_ON(sizeof(exception_stacks) !=
-		     sizeof(((struct cpu_entry_area *)0)->exception_stacks));
-	cea_map_percpu_pages(&cea->exception_stacks,
-			     &per_cpu(exception_stacks, cpu),
+		     sizeof(((struct cpu_entry_area *)0)->estacks));
+	cea_map_percpu_pages(&cea->estacks, &per_cpu(exception_stacks, cpu),
 			     sizeof(exception_stacks) / PAGE_SIZE, PAGE_KERNEL);
 #endif
 	percpu_setup_debug_store(cpu);

^ permalink raw reply related	[flat|nested] 89+ messages in thread
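
The helper macros can be exercised stand-alone. The zero-length guard
arrays rely on a GNU C extension (fine for the kernel; build the sketch
with gcc), and the sizes below are assumed values:

#include <stddef.h>
#include <stdio.h>

#define PAGE_SIZE	4096
#define EXCEPTION_STKSZ	PAGE_SIZE
#define DEBUG_STKSZ	(2 * PAGE_SIZE)	/* assumed pre-split #DB size */

#define ESTACKS_MEMBERS(guardsize)		\
	char	DF_stack_guard[guardsize];	\
	char	DF_stack[EXCEPTION_STKSZ];	\
	char	NMI_stack_guard[guardsize];	\
	char	NMI_stack[EXCEPTION_STKSZ];	\
	char	DB_stack_guard[guardsize];	\
	char	DB_stack[DEBUG_STKSZ];		\
	char	MCE_stack_guard[guardsize];	\
	char	MCE_stack[EXCEPTION_STKSZ];	\
	char	IST_top_guard[guardsize];

struct cea_exception_stacks {
	ESTACKS_MEMBERS(0)	/* guardsize 0, as in this patch */
};

#define CEA_ESTACK_SIZE(st)					\
	sizeof(((struct cea_exception_stacks *)0)->st##_stack)
#define CEA_ESTACK_OFFS(st)					\
	offsetof(struct cea_exception_stacks, st##_stack)

int main(void)
{
	/* With guardsize 0 the stacks pack back to back, DF lowest */
	printf("DB: size %zu at offset %zu\n",
	       CEA_ESTACK_SIZE(DB), CEA_ESTACK_OFFS(DB));
	printf("total: %zu bytes\n", sizeof(struct cea_exception_stacks));
	return 0;
}

Once the guard size becomes a full page later in the series, only the
ESTACKS_MEMBERS() argument changes; every user of the offset and size
macros picks up the new layout automatically.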

* [tip:x86/irq] x86/cpu_entry_area: Prepare for IST guard pages
  2019-04-14 15:59 ` [patch V3 12/32] x86/cpu_entry_area: Prepare for IST guard pages Thomas Gleixner
@ 2019-04-17 14:09   ` tip-bot for Thomas Gleixner
  0 siblings, 0 replies; 89+ messages in thread
From: tip-bot for Thomas Gleixner @ 2019-04-17 14:09 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: peterz, dave.hansen, jpoimboe, mingo, hpa, sean.j.christopherson,
	mingo, tglx, bp, x86, linux-kernel, luto

Commit-ID:  a4af767ae59cc579569bbfe49513a0037d5989ee
Gitweb:     https://git.kernel.org/tip/a4af767ae59cc579569bbfe49513a0037d5989ee
Author:     Thomas Gleixner <tglx@linutronix.de>
AuthorDate: Sun, 14 Apr 2019 17:59:48 +0200
Committer:  Borislav Petkov <bp@suse.de>
CommitDate: Wed, 17 Apr 2019 12:58:21 +0200

x86/cpu_entry_area: Prepare for IST guard pages

To allow guard pages between the IST stacks, each stack needs to be
mapped individually.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Sean Christopherson <sean.j.christopherson@intel.com>
Cc: x86-ml <x86@kernel.org>
Link: https://lkml.kernel.org/r/20190414160144.592691557@linutronix.de
---
 arch/x86/mm/cpu_entry_area.c | 37 ++++++++++++++++++++++++++++++-------
 1 file changed, 30 insertions(+), 7 deletions(-)

diff --git a/arch/x86/mm/cpu_entry_area.c b/arch/x86/mm/cpu_entry_area.c
index 6a09b84c13fe..2b1407662a6d 100644
--- a/arch/x86/mm/cpu_entry_area.c
+++ b/arch/x86/mm/cpu_entry_area.c
@@ -77,6 +77,34 @@ static void __init percpu_setup_debug_store(unsigned int cpu)
 #endif
 }
 
+#ifdef CONFIG_X86_64
+
+#define cea_map_stack(name) do {					\
+	npages = sizeof(estacks->name## _stack) / PAGE_SIZE;		\
+	cea_map_percpu_pages(cea->estacks.name## _stack,		\
+			estacks->name## _stack, npages, PAGE_KERNEL);	\
+	} while (0)
+
+static void __init percpu_setup_exception_stacks(unsigned int cpu)
+{
+	struct exception_stacks *estacks = per_cpu_ptr(&exception_stacks, cpu);
+	struct cpu_entry_area *cea = get_cpu_entry_area(cpu);
+	unsigned int npages;
+
+	BUILD_BUG_ON(sizeof(exception_stacks) % PAGE_SIZE != 0);
+	/*
+	 * The exceptions stack mappings in the per cpu area are protected
+	 * by guard pages so each stack must be mapped separately.
+	 */
+	cea_map_stack(DF);
+	cea_map_stack(NMI);
+	cea_map_stack(DB);
+	cea_map_stack(MCE);
+}
+#else
+static inline void percpu_setup_exception_stacks(unsigned int cpu) {}
+#endif
+
 /* Setup the fixmap mappings only once per-processor */
 static void __init setup_cpu_entry_area(unsigned int cpu)
 {
@@ -134,13 +162,8 @@ static void __init setup_cpu_entry_area(unsigned int cpu)
 	per_cpu(cpu_entry_area, cpu) = cea;
 #endif
 
-#ifdef CONFIG_X86_64
-	BUILD_BUG_ON(sizeof(exception_stacks) % PAGE_SIZE != 0);
-	BUILD_BUG_ON(sizeof(exception_stacks) !=
-		     sizeof(((struct cpu_entry_area *)0)->estacks));
-	cea_map_percpu_pages(&cea->estacks, &per_cpu(exception_stacks, cpu),
-			     sizeof(exception_stacks) / PAGE_SIZE, PAGE_KERNEL);
-#endif
+	percpu_setup_exception_stacks(cpu);
+
 	percpu_setup_debug_store(cpu);
 }
 

^ permalink raw reply related	[flat|nested] 89+ messages in thread
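
The token-pasting in cea_map_stack() is what keeps the backing store and
the cpu_entry_area mapping in lockstep: one macro argument selects the
same-named member on both sides. A mock sketch (the structures and the
map function are stand-ins, not the kernel interfaces):

#include <stdio.h>

#define PAGE_SIZE 4096

struct stacks {
	char DF_stack[PAGE_SIZE];
	char NMI_stack[PAGE_SIZE];
};

static struct stacks storage, cea;	/* stand-ins for percpu/cea data */

static void map_pages(void *cea_vaddr, void *ptr, unsigned int npages)
{
	printf("map %u page(s) %p -> %p\n", npages, ptr, cea_vaddr);
}

#define cea_map_stack(name) do {				\
	map_pages(cea.name##_stack, storage.name##_stack,	\
		  sizeof(storage.name##_stack) / PAGE_SIZE);	\
} while (0)

int main(void)
{
	cea_map_stack(DF);
	cea_map_stack(NMI);
	return 0;
}

A typo like cea_map_stack(DFF) fails to compile instead of silently
mapping the wrong stack, which is the point of the idiom.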

* [tip:x86/irq] x86/cpu_entry_area: Provide exception stack accessor
  2019-04-14 15:59 ` [patch V3 13/32] x86/cpu_entry_area: Provide exception stack accessor Thomas Gleixner
@ 2019-04-17 14:10   ` tip-bot for Thomas Gleixner
  0 siblings, 0 replies; 89+ messages in thread
From: tip-bot for Thomas Gleixner @ 2019-04-17 14:10 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: peterz, tglx, mingo, dave.hansen, mingo, sean.j.christopherson,
	x86, luto, linux-kernel, hpa, bp, jpoimboe

Commit-ID:  7623f37e411156e6e09b95cf5c76e509c5fda640
Gitweb:     https://git.kernel.org/tip/7623f37e411156e6e09b95cf5c76e509c5fda640
Author:     Thomas Gleixner <tglx@linutronix.de>
AuthorDate: Sun, 14 Apr 2019 17:59:49 +0200
Committer:  Borislav Petkov <bp@suse.de>
CommitDate: Wed, 17 Apr 2019 13:00:22 +0200

x86/cpu_entry_area: Provide exception stack accessor

Store a pointer to the per cpu entry area exception stack mappings to allow
fast retrieval.

Required for converting various places from using the shadow IST array to
directly doing address calculations on the actual mapping address.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Sean Christopherson <sean.j.christopherson@intel.com>
Cc: x86-ml <x86@kernel.org>
Link: https://lkml.kernel.org/r/20190414160144.680960459@linutronix.de
---
 arch/x86/include/asm/cpu_entry_area.h | 4 ++++
 arch/x86/mm/cpu_entry_area.c          | 4 ++++
 2 files changed, 8 insertions(+)

diff --git a/arch/x86/include/asm/cpu_entry_area.h b/arch/x86/include/asm/cpu_entry_area.h
index af8c312673de..9b406f067ecf 100644
--- a/arch/x86/include/asm/cpu_entry_area.h
+++ b/arch/x86/include/asm/cpu_entry_area.h
@@ -99,6 +99,7 @@ struct cpu_entry_area {
 #define CPU_ENTRY_AREA_TOT_SIZE	(CPU_ENTRY_AREA_SIZE * NR_CPUS)
 
 DECLARE_PER_CPU(struct cpu_entry_area *, cpu_entry_area);
+DECLARE_PER_CPU(struct cea_exception_stacks *, cea_exception_stacks);
 
 extern void setup_cpu_entry_areas(void);
 extern void cea_set_pte(void *cea_vaddr, phys_addr_t pa, pgprot_t flags);
@@ -118,4 +119,7 @@ static inline struct entry_stack *cpu_entry_stack(int cpu)
 	return &get_cpu_entry_area(cpu)->entry_stack_page.stack;
 }
 
+#define __this_cpu_ist_top_va(name)					\
+	CEA_ESTACK_TOP(__this_cpu_read(cea_exception_stacks), name)
+
 #endif
diff --git a/arch/x86/mm/cpu_entry_area.c b/arch/x86/mm/cpu_entry_area.c
index 2b1407662a6d..a00d0d059c8a 100644
--- a/arch/x86/mm/cpu_entry_area.c
+++ b/arch/x86/mm/cpu_entry_area.c
@@ -14,6 +14,7 @@ static DEFINE_PER_CPU_PAGE_ALIGNED(struct entry_stack_page, entry_stack_storage)
 
 #ifdef CONFIG_X86_64
 static DEFINE_PER_CPU_PAGE_ALIGNED(struct exception_stacks, exception_stacks);
+DEFINE_PER_CPU(struct cea_exception_stacks*, cea_exception_stacks);
 #endif
 
 struct cpu_entry_area *get_cpu_entry_area(int cpu)
@@ -92,6 +93,9 @@ static void __init percpu_setup_exception_stacks(unsigned int cpu)
 	unsigned int npages;
 
 	BUILD_BUG_ON(sizeof(exception_stacks) % PAGE_SIZE != 0);
+
+	per_cpu(cea_exception_stacks, cpu) = &cea->estacks;
+
 	/*
 	 * The exceptions stack mappings in the per cpu area are protected
 	 * by guard pages so each stack must be mapped separately.

^ permalink raw reply related	[flat|nested] 89+ messages in thread
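
What __this_cpu_ist_top_va() boils down to, with the per-CPU machinery
mocked away (the struct and sizes are assumptions, guard members omitted):

#include <stdio.h>

struct cea_exception_stacks {		/* mock; guards omitted */
	char DF_stack[4096];
	char NMI_stack[4096];
	char DB_stack[8192];
	char MCE_stack[4096];
};

#define CEA_ESTACK_SIZE(st)					\
	sizeof(((struct cea_exception_stacks *)0)->st##_stack)
#define CEA_ESTACK_BOT(ceastp, st)				\
	((unsigned long)&(ceastp)->st##_stack)
#define CEA_ESTACK_TOP(ceastp, st)				\
	(CEA_ESTACK_BOT(ceastp, st) + CEA_ESTACK_SIZE(st))

int main(void)
{
	static struct cea_exception_stacks cea;	/* stand-in mapping */

	/* The stack grows down from TOP towards BOT: */
	printf("DB stack %#lx - %#lx\n",
	       CEA_ESTACK_BOT(&cea, DB), CEA_ESTACK_TOP(&cea, DB));
	return 0;
}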

* [tip:x86/irq] x86/traps: Use cpu_entry_area instead of orig_ist
  2019-04-14 15:59 ` [patch V3 14/32] x86/traps: Use cpu_entry_area instead of orig_ist Thomas Gleixner
@ 2019-04-17 14:10   ` tip-bot for Thomas Gleixner
  0 siblings, 0 replies; 89+ messages in thread
From: tip-bot for Thomas Gleixner @ 2019-04-17 14:10 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: dave.hansen, sean.j.christopherson, mingo, jpoimboe, peterz, bp,
	hpa, mingo, x86, tglx, luto, linux-kernel

Commit-ID:  d876b67343a648f3613506c7dbfed088fa0c875b
Gitweb:     https://git.kernel.org/tip/d876b67343a648f3613506c7dbfed088fa0c875b
Author:     Thomas Gleixner <tglx@linutronix.de>
AuthorDate: Sun, 14 Apr 2019 17:59:50 +0200
Committer:  Borislav Petkov <bp@suse.de>
CommitDate: Wed, 17 Apr 2019 13:01:59 +0200

x86/traps: Use cpu_entry_area instead of orig_ist

The orig_ist[] array is a shadow copy of the IST array in the TSS. The
reason why it exists is that older kernels used two TSS variants with
different pointers into the debug stack. orig_ist[] contains the real
starting points.

There is no longer any point in doing so because the same information can be
retrieved using the base address of the cpu entry area mapping and the
offsets of the various exception stacks.

No functional change. Preparation for removing orig_ist.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Sean Christopherson <sean.j.christopherson@intel.com>
Cc: x86-ml <x86@kernel.org>
Link: https://lkml.kernel.org/r/20190414160144.784487230@linutronix.de
---
 arch/x86/mm/fault.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/arch/x86/mm/fault.c b/arch/x86/mm/fault.c
index 0524e1d74f24..06c089513d39 100644
--- a/arch/x86/mm/fault.c
+++ b/arch/x86/mm/fault.c
@@ -28,6 +28,7 @@
 #include <asm/mmu_context.h>		/* vma_pkey()			*/
 #include <asm/efi.h>			/* efi_recover_from_page_fault()*/
 #include <asm/desc.h>			/* store_idt(), ...		*/
+#include <asm/cpu_entry_area.h>		/* exception stack		*/
 
 #define CREATE_TRACE_POINTS
 #include <asm/trace/exceptions.h>
@@ -793,7 +794,7 @@ no_context(struct pt_regs *regs, unsigned long error_code,
 	if (is_vmalloc_addr((void *)address) &&
 	    (((unsigned long)tsk->stack - 1 - address < PAGE_SIZE) ||
 	     address - ((unsigned long)tsk->stack + THREAD_SIZE) < PAGE_SIZE)) {
-		unsigned long stack = this_cpu_read(orig_ist.ist[ESTACK_DF]) - sizeof(void *);
+		unsigned long stack = __this_cpu_ist_top_va(DF) - sizeof(void *);
 		/*
 		 * We're likely to be running with very little stack space
 		 * left.  It's plausible that we'd hit this condition but

^ permalink raw reply related	[flat|nested] 89+ messages in thread

* [tip:x86/irq] x86/irq/64: Use cpu entry area instead of orig_ist
  2019-04-14 15:59 ` [patch V3 15/32] x86/irq/64: Use cpu entry area " Thomas Gleixner
@ 2019-04-17 14:11   ` tip-bot for Thomas Gleixner
  0 siblings, 0 replies; 89+ messages in thread
From: tip-bot for Thomas Gleixner @ 2019-04-17 14:11 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: nstange, sean.j.christopherson, bp, mingo, linux-kernel, luto,
	jpoimboe, x86, hpa, mingo, tglx

Commit-ID:  bf5882abab773afd1277415e2f826b21de28f30d
Gitweb:     https://git.kernel.org/tip/bf5882abab773afd1277415e2f826b21de28f30d
Author:     Thomas Gleixner <tglx@linutronix.de>
AuthorDate: Sun, 14 Apr 2019 17:59:51 +0200
Committer:  Borislav Petkov <bp@suse.de>
CommitDate: Wed, 17 Apr 2019 13:02:48 +0200

x86/irq/64: Use cpu entry area instead of orig_ist

The orig_ist[] array is a shadow copy of the IST array in the TSS. The
reason why it exists is that older kernels used two TSS variants with
different pointers into the debug stack. orig_ist[] contains the real
starting points.

There is no longer any point in doing so because the same information can be
retrieved using the base address of the cpu entry area mapping and the
offsets of the various exception stacks.

No functional change. Preparation for removing orig_ist.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Nicolai Stange <nstange@suse.de>
Cc: Sean Christopherson <sean.j.christopherson@intel.com>
Cc: x86-ml <x86@kernel.org>
Link: https://lkml.kernel.org/r/20190414160144.885741626@linutronix.de
---
 arch/x86/kernel/irq_64.c | 13 +++++++------
 1 file changed, 7 insertions(+), 6 deletions(-)

diff --git a/arch/x86/kernel/irq_64.c b/arch/x86/kernel/irq_64.c
index 182e8b245e06..7eb6f8d11bfd 100644
--- a/arch/x86/kernel/irq_64.c
+++ b/arch/x86/kernel/irq_64.c
@@ -18,6 +18,8 @@
 #include <linux/uaccess.h>
 #include <linux/smp.h>
 #include <linux/sched/task_stack.h>
+
+#include <asm/cpu_entry_area.h>
 #include <asm/io_apic.h>
 #include <asm/apic.h>
 
@@ -43,10 +45,9 @@ static inline void stack_overflow_check(struct pt_regs *regs)
 {
 #ifdef CONFIG_DEBUG_STACKOVERFLOW
 #define STACK_MARGIN	128
-	struct orig_ist *oist;
-	u64 irq_stack_top, irq_stack_bottom;
-	u64 estack_top, estack_bottom;
+	u64 irq_stack_top, irq_stack_bottom, estack_top, estack_bottom;
 	u64 curbase = (u64)task_stack_page(current);
+	struct cea_exception_stacks *estacks;
 
 	if (user_mode(regs))
 		return;
@@ -60,9 +61,9 @@ static inline void stack_overflow_check(struct pt_regs *regs)
 	if (regs->sp >= irq_stack_bottom && regs->sp <= irq_stack_top)
 		return;
 
-	oist = this_cpu_ptr(&orig_ist);
-	estack_top = (u64)oist->ist[ESTACK_DB];
-	estack_bottom = estack_top - DEBUG_STKSZ + STACK_MARGIN;
+	estacks = __this_cpu_read(cea_exception_stacks);
+	estack_top = CEA_ESTACK_TOP(estacks, DB);
+	estack_bottom = CEA_ESTACK_BOT(estacks, DB) + STACK_MARGIN;
 	if (regs->sp >= estack_bottom && regs->sp <= estack_top)
 		return;
 

^ permalink raw reply related	[flat|nested] 89+ messages in thread

* [tip:x86/irq] x86/dumpstack/64: Use cpu_entry_area instead of orig_ist
  2019-04-14 15:59 ` [patch V3 16/32] x86/dumpstack/64: Use cpu_entry_area " Thomas Gleixner
@ 2019-04-17 14:12   ` tip-bot for Thomas Gleixner
  0 siblings, 0 replies; 89+ messages in thread
From: tip-bot for Thomas Gleixner @ 2019-04-17 14:12 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: linux-kernel, mingo, luto, mingo, sean.j.christopherson, hpa,
	jpoimboe, bp, tglx, x86

Commit-ID:  afcd21dad88b68d646e91ed36948117d58b4c197
Gitweb:     https://git.kernel.org/tip/afcd21dad88b68d646e91ed36948117d58b4c197
Author:     Thomas Gleixner <tglx@linutronix.de>
AuthorDate: Sun, 14 Apr 2019 17:59:52 +0200
Committer:  Borislav Petkov <bp@suse.de>
CommitDate: Wed, 17 Apr 2019 13:05:06 +0200

x86/dumpstack/64: Use cpu_entry_area instead of orig_ist

The orig_ist[] array is a shadow copy of the IST array in the TSS. It
exists because older kernels used two TSS variants with different pointers
into the debug stack; orig_ist[] contains the real starting points.

There is no point in keeping it anymore, because the same information can
be retrieved using the base address of the cpu entry area mapping and the
offsets of the various exception stacks.

No functional change. Preparation for removing orig_ist.

Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Sean Christopherson <sean.j.christopherson@intel.com>
Cc: x86-ml <x86@kernel.org>
Link: https://lkml.kernel.org/r/20190414160144.974900463@linutronix.de
---
 arch/x86/kernel/dumpstack_64.c | 39 +++++++++++++++++++++++++++------------
 1 file changed, 27 insertions(+), 12 deletions(-)

diff --git a/arch/x86/kernel/dumpstack_64.c b/arch/x86/kernel/dumpstack_64.c
index 455b47ef9250..f6fbd0438f9e 100644
--- a/arch/x86/kernel/dumpstack_64.c
+++ b/arch/x86/kernel/dumpstack_64.c
@@ -16,6 +16,7 @@
 #include <linux/bug.h>
 #include <linux/nmi.h>
 
+#include <asm/cpu_entry_area.h>
 #include <asm/stacktrace.h>
 
 static const char *exception_stack_names[N_EXCEPTION_STACKS] = {
@@ -25,11 +26,6 @@ static const char *exception_stack_names[N_EXCEPTION_STACKS] = {
 		[ ESTACK_MCE	]	= "#MC",
 };
 
-static const unsigned long exception_stack_sizes[N_EXCEPTION_STACKS] = {
-	[0 ... N_EXCEPTION_STACKS - 1]		= EXCEPTION_STKSZ,
-	[ESTACK_DB]				= DEBUG_STKSZ
-};
-
 const char *stack_type_name(enum stack_type type)
 {
 	BUILD_BUG_ON(N_EXCEPTION_STACKS != 4);
@@ -52,25 +48,44 @@ const char *stack_type_name(enum stack_type type)
 	return NULL;
 }
 
+struct estack_layout {
+	unsigned int	begin;
+	unsigned int	end;
+};
+
+#define	ESTACK_ENTRY(x)	{						  \
+	.begin	= offsetof(struct cea_exception_stacks, x## _stack),	  \
+	.end	= offsetof(struct cea_exception_stacks, x## _stack_guard) \
+	}
+
+static const struct estack_layout layout[N_EXCEPTION_STACKS] = {
+	[ ESTACK_DF	]	= ESTACK_ENTRY(DF),
+	[ ESTACK_NMI	]	= ESTACK_ENTRY(NMI),
+	[ ESTACK_DB	]	= ESTACK_ENTRY(DB),
+	[ ESTACK_MCE	]	= ESTACK_ENTRY(MCE),
+};
+
 static bool in_exception_stack(unsigned long *stack, struct stack_info *info)
 {
-	unsigned long *begin, *end;
+	unsigned long estacks, begin, end, stk = (unsigned long)stack;
 	struct pt_regs *regs;
-	unsigned k;
+	unsigned int k;
 
 	BUILD_BUG_ON(N_EXCEPTION_STACKS != 4);
 
+	estacks = (unsigned long)__this_cpu_read(cea_exception_stacks);
+
 	for (k = 0; k < N_EXCEPTION_STACKS; k++) {
-		end   = (unsigned long *)raw_cpu_ptr(&orig_ist)->ist[k];
-		begin = end - (exception_stack_sizes[k] / sizeof(long));
+		begin = estacks + layout[k].begin;
+		end   = estacks + layout[k].end;
 		regs  = (struct pt_regs *)end - 1;
 
-		if (stack < begin || stack >= end)
+		if (stk < begin || stk >= end)
 			continue;
 
 		info->type	= STACK_TYPE_EXCEPTION + k;
-		info->begin	= begin;
-		info->end	= end;
+		info->begin	= (unsigned long *)begin;
+		info->end	= (unsigned long *)end;
 		info->next_sp	= (unsigned long *)regs->sp;
 
 		return true;

^ permalink raw reply related	[flat|nested] 89+ messages in thread

* [tip:x86/irq] x86/cpu: Prepare TSS.IST setup for guard pages
  2019-04-14 15:59 ` [patch V3 17/32] x86/cpu: Prepare TSS.IST setup for guard pages Thomas Gleixner
@ 2019-04-17 14:13   ` tip-bot for Thomas Gleixner
  0 siblings, 0 replies; 89+ messages in thread
From: tip-bot for Thomas Gleixner @ 2019-04-17 14:13 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: jpoimboe, peterz, hpa, x86, chang.seok.bae, mingo, mingo, linux,
	bp, tglx, konrad.wilk, sean.j.christopherson, linux-kernel, luto

Commit-ID:  f6ef73224a0f0400c3979c8bc68b383f9d2eb9d8
Gitweb:     https://git.kernel.org/tip/f6ef73224a0f0400c3979c8bc68b383f9d2eb9d8
Author:     Thomas Gleixner <tglx@linutronix.de>
AuthorDate: Sun, 14 Apr 2019 17:59:53 +0200
Committer:  Borislav Petkov <bp@suse.de>
CommitDate: Wed, 17 Apr 2019 14:29:22 +0200

x86/cpu: Prepare TSS.IST setup for guard pages

Convert the TSS.IST setup code to use the cpu entry area information
directly instead of assuming a linear mapping of the IST stacks.

The store to orig_ist[] is no longer required, as it has no remaining
users.

This is the last preparatory step towards IST guard pages.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: "Chang S. Bae" <chang.seok.bae@intel.com>
Cc: Dominik Brodowski <linux@dominikbrodowski.net>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Sean Christopherson <sean.j.christopherson@intel.com>
Cc: x86-ml <x86@kernel.org>
Link: https://lkml.kernel.org/r/20190414160145.061686012@linutronix.de
---
 arch/x86/kernel/cpu/common.c | 35 +++++++----------------------------
 1 file changed, 7 insertions(+), 28 deletions(-)

diff --git a/arch/x86/kernel/cpu/common.c b/arch/x86/kernel/cpu/common.c
index 24b801ea7522..4b01b71415f5 100644
--- a/arch/x86/kernel/cpu/common.c
+++ b/arch/x86/kernel/cpu/common.c
@@ -507,19 +507,6 @@ void load_percpu_segment(int cpu)
 DEFINE_PER_CPU(struct cpu_entry_area *, cpu_entry_area);
 #endif
 
-#ifdef CONFIG_X86_64
-/*
- * Special IST stacks which the CPU switches to when it calls
- * an IST-marked descriptor entry. Up to 7 stacks (hardware
- * limit), all of them are 4K, except the debug stack which
- * is 8K.
- */
-static const unsigned int exception_stack_sizes[N_EXCEPTION_STACKS] = {
-	  [0 ... N_EXCEPTION_STACKS - 1]	= EXCEPTION_STKSZ,
-	  [ESTACK_DB]				= DEBUG_STKSZ
-};
-#endif
-
 /* Load the original GDT from the per-cpu structure */
 void load_direct_gdt(int cpu)
 {
@@ -1690,17 +1677,14 @@ static void setup_getcpu(int cpu)
  * initialized (naturally) in the bootstrap process, such as the GDT
  * and IDT. We reload them nevertheless, this function acts as a
  * 'CPU state barrier', nothing should get across.
- * A lot of state is already set up in PDA init for 64 bit
  */
 #ifdef CONFIG_X86_64
 
 void cpu_init(void)
 {
-	struct orig_ist *oist;
+	int cpu = raw_smp_processor_id();
 	struct task_struct *me;
 	struct tss_struct *t;
-	unsigned long v;
-	int cpu = raw_smp_processor_id();
 	int i;
 
 	wait_for_master_cpu(cpu);
@@ -1715,7 +1699,6 @@ void cpu_init(void)
 		load_ucode_ap();
 
 	t = &per_cpu(cpu_tss_rw, cpu);
-	oist = &per_cpu(orig_ist, cpu);
 
 #ifdef CONFIG_NUMA
 	if (this_cpu_read(numa_node) == 0 &&
@@ -1753,16 +1736,12 @@ void cpu_init(void)
 	/*
 	 * set up and load the per-CPU TSS
 	 */
-	if (!oist->ist[0]) {
-		char *estacks = (char *)&get_cpu_entry_area(cpu)->estacks;
-
-		for (v = 0; v < N_EXCEPTION_STACKS; v++) {
-			estacks += exception_stack_sizes[v];
-			oist->ist[v] = t->x86_tss.ist[v] =
-					(unsigned long)estacks;
-			if (v == ESTACK_DB)
-				per_cpu(debug_stack_addr, cpu) = (unsigned long)estacks;
-		}
+	if (!t->x86_tss.ist[0]) {
+		t->x86_tss.ist[ESTACK_DF] = __this_cpu_ist_top_va(DF);
+		t->x86_tss.ist[ESTACK_NMI] = __this_cpu_ist_top_va(NMI);
+		t->x86_tss.ist[ESTACK_DB] = __this_cpu_ist_top_va(DB);
+		t->x86_tss.ist[ESTACK_MCE] = __this_cpu_ist_top_va(MCE);
+		per_cpu(debug_stack_addr, cpu) = t->x86_tss.ist[ESTACK_DB];
 	}
 
 	t->x86_tss.io_bitmap_base = IO_BITMAP_OFFSET;

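The __this_cpu_ist_top_va() helper used for the TSS.IST entries is, in
essence, the same cpu entry area arithmetic applied to the local CPU. A
minimal sketch, assuming it composes CEA_ESTACK_TOP() with the per-CPU
cea_exception_stacks pointer:

	/* Sketch: top-of-stack VA of exception stack 'name' on this CPU */
	#define __this_cpu_ist_top_va(name)				\
		CEA_ESTACK_TOP(__this_cpu_read(cea_exception_stacks), name)

This is why the loop over exception_stack_sizes[] can go away: each IST
entry is derived independently from the (guard page aware) layout instead
of being accumulated from adjacent stack sizes.
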
^ permalink raw reply related	[flat|nested] 89+ messages in thread

* [tip:x86/irq] x86/cpu: Remove orig_ist array
  2019-04-14 15:59 ` [patch V3 18/32] x86/cpu: Remove orig_ist array Thomas Gleixner
@ 2019-04-17 14:13   ` tip-bot for Thomas Gleixner
  0 siblings, 0 replies; 89+ messages in thread
From: tip-bot for Thomas Gleixner @ 2019-04-17 14:13 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: hpa, jpoimboe, x86, chang.seok.bae, bp, konrad.wilk, tglx, mingo,
	jgross, kernelfans, luto, ndesaulniers, peterz,
	sean.j.christopherson, linux-kernel, akpm, vbabka, puwen,
	jkosina, mingo, linux, dave.hansen

Commit-ID:  4d68c3d0ecd5fcba8876e8a58ac41ffb360de43e
Gitweb:     https://git.kernel.org/tip/4d68c3d0ecd5fcba8876e8a58ac41ffb360de43e
Author:     Thomas Gleixner <tglx@linutronix.de>
AuthorDate: Sun, 14 Apr 2019 17:59:54 +0200
Committer:  Borislav Petkov <bp@suse.de>
CommitDate: Wed, 17 Apr 2019 14:44:17 +0200

x86/cpu: Remove orig_ist array

All users gone.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: "Chang S. Bae" <chang.seok.bae@intel.com>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Dominik Brodowski <linux@dominikbrodowski.net>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jiri Kosina <jkosina@suse.cz>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Juergen Gross <jgross@suse.com>
Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Cc: Nick Desaulniers <ndesaulniers@google.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Pingfan Liu <kernelfans@gmail.com>
Cc: Pu Wen <puwen@hygon.cn>
Cc: Sean Christopherson <sean.j.christopherson@intel.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: x86-ml <x86@kernel.org>
Link: https://lkml.kernel.org/r/20190414160145.151435667@linutronix.de
---
 arch/x86/include/asm/processor.h | 9 ---------
 arch/x86/kernel/cpu/common.c     | 6 ------
 2 files changed, 15 deletions(-)

diff --git a/arch/x86/include/asm/processor.h b/arch/x86/include/asm/processor.h
index 2bb3a648fc12..8fcfcd1a8375 100644
--- a/arch/x86/include/asm/processor.h
+++ b/arch/x86/include/asm/processor.h
@@ -374,16 +374,7 @@ DECLARE_PER_CPU(unsigned long, cpu_current_top_of_stack);
 #define cpu_current_top_of_stack cpu_tss_rw.x86_tss.sp1
 #endif
 
-/*
- * Save the original ist values for checking stack pointers during debugging
- */
-struct orig_ist {
-	unsigned long		ist[7];
-};
-
 #ifdef CONFIG_X86_64
-DECLARE_PER_CPU(struct orig_ist, orig_ist);
-
 union irq_stack_union {
 	char irq_stack[IRQ_STACK_SIZE];
 	/*
diff --git a/arch/x86/kernel/cpu/common.c b/arch/x86/kernel/cpu/common.c
index 4b01b71415f5..8243f198fb7f 100644
--- a/arch/x86/kernel/cpu/common.c
+++ b/arch/x86/kernel/cpu/common.c
@@ -1549,12 +1549,6 @@ void syscall_init(void)
 	       X86_EFLAGS_IOPL|X86_EFLAGS_AC|X86_EFLAGS_NT);
 }
 
-/*
- * Copies of the original ist values from the tss are only accessed during
- * debugging, no special alignment required.
- */
-DEFINE_PER_CPU(struct orig_ist, orig_ist);
-
 static DEFINE_PER_CPU(unsigned long, debug_stack_addr);
 DEFINE_PER_CPU(int, debug_stack_usage);
 

^ permalink raw reply related	[flat|nested] 89+ messages in thread

* [tip:x86/irq] x86/exceptions: Disconnect IST index and stack order
  2019-04-14 15:59 ` [patch V3 19/32] x86/exceptions: Disconnect IST index and stack order Thomas Gleixner
@ 2019-04-17 14:14   ` tip-bot for Thomas Gleixner
  0 siblings, 0 replies; 89+ messages in thread
From: tip-bot for Thomas Gleixner @ 2019-04-17 14:14 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: tglx, cai, mingo, peterz, mingo, konrad.wilk, douly.fnst, luto,
	sean.j.christopherson, x86, nstange, linux, chang.seok.bae,
	kirill.shutemov, jpoimboe, linux-kernel, bp, keescook, jannh,
	bhe, hpa

Commit-ID:  3207426925d2b4da390be8068df1d1c2b36e5918
Gitweb:     https://git.kernel.org/tip/3207426925d2b4da390be8068df1d1c2b36e5918
Author:     Thomas Gleixner <tglx@linutronix.de>
AuthorDate: Sun, 14 Apr 2019 17:59:55 +0200
Committer:  Borislav Petkov <bp@suse.de>
CommitDate: Wed, 17 Apr 2019 15:01:09 +0200

x86/exceptions: Disconnect IST index and stack order

The entry order of the TSS.IST array and the order of the stack
storage/mapping are not required to be the same.

With the upcoming split of the debug stack this is going to fall apart, as
the number of TSS.IST array entries stays the same while the number of
actual stacks increases.

Make them separate so that code like dumpstack can just utilize the mapping
order. The IST index is solely required for the actual TSS.IST array
initialization.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Baoquan He <bhe@redhat.com>
Cc: "Chang S. Bae" <chang.seok.bae@intel.com>
Cc: Dominik Brodowski <linux@dominikbrodowski.net>
Cc: Dou Liyang <douly.fnst@cn.fujitsu.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jann Horn <jannh@google.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Cc: Nicolai Stange <nstange@suse.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Qian Cai <cai@lca.pw>
Cc: Sean Christopherson <sean.j.christopherson@intel.com>
Cc: x86-ml <x86@kernel.org>
Link: https://lkml.kernel.org/r/20190414160145.241588113@linutronix.de
---
 arch/x86/entry/entry_64.S             |  2 +-
 arch/x86/include/asm/cpu_entry_area.h | 11 +++++++++++
 arch/x86/include/asm/page_64_types.h  |  9 ++++-----
 arch/x86/include/asm/stacktrace.h     |  2 ++
 arch/x86/kernel/cpu/common.c          | 10 +++++-----
 arch/x86/kernel/idt.c                 |  8 ++++----
 6 files changed, 27 insertions(+), 15 deletions(-)

diff --git a/arch/x86/entry/entry_64.S b/arch/x86/entry/entry_64.S
index fd0a50452cb3..5c0348504a4b 100644
--- a/arch/x86/entry/entry_64.S
+++ b/arch/x86/entry/entry_64.S
@@ -1129,7 +1129,7 @@ apicinterrupt3 HYPERV_STIMER0_VECTOR \
 	hv_stimer0_callback_vector hv_stimer0_vector_handler
 #endif /* CONFIG_HYPERV */
 
-idtentry debug			do_debug		has_error_code=0	paranoid=1 shift_ist=ESTACK_DB
+idtentry debug			do_debug		has_error_code=0	paranoid=1 shift_ist=IST_INDEX_DB
 idtentry int3			do_int3			has_error_code=0
 idtentry stack_segment		do_stack_segment	has_error_code=1
 
diff --git a/arch/x86/include/asm/cpu_entry_area.h b/arch/x86/include/asm/cpu_entry_area.h
index 9b406f067ecf..310eeb62d418 100644
--- a/arch/x86/include/asm/cpu_entry_area.h
+++ b/arch/x86/include/asm/cpu_entry_area.h
@@ -35,6 +35,17 @@ struct cea_exception_stacks {
 	ESTACKS_MEMBERS(0)
 };
 
+/*
+ * The exception stack ordering in [cea_]exception_stacks
+ */
+enum exception_stack_ordering {
+	ESTACK_DF,
+	ESTACK_NMI,
+	ESTACK_DB,
+	ESTACK_MCE,
+	N_EXCEPTION_STACKS
+};
+
 #define CEA_ESTACK_SIZE(st)					\
 	sizeof(((struct cea_exception_stacks *)0)->st## _stack)
 
diff --git a/arch/x86/include/asm/page_64_types.h b/arch/x86/include/asm/page_64_types.h
index 6ab2c54c1bf9..056de887b220 100644
--- a/arch/x86/include/asm/page_64_types.h
+++ b/arch/x86/include/asm/page_64_types.h
@@ -27,11 +27,10 @@
 /*
  * The index for the tss.ist[] array. The hardware limit is 7 entries.
  */
-#define	ESTACK_DF		0
-#define	ESTACK_NMI		1
-#define	ESTACK_DB		2
-#define	ESTACK_MCE		3
-#define	N_EXCEPTION_STACKS	4
+#define	IST_INDEX_DF		0
+#define	IST_INDEX_NMI		1
+#define	IST_INDEX_DB		2
+#define	IST_INDEX_MCE		3
 
 /*
  * Set __PAGE_OFFSET to the most negative possible address +
diff --git a/arch/x86/include/asm/stacktrace.h b/arch/x86/include/asm/stacktrace.h
index f335aad404a4..d6d758a187b6 100644
--- a/arch/x86/include/asm/stacktrace.h
+++ b/arch/x86/include/asm/stacktrace.h
@@ -9,6 +9,8 @@
 
 #include <linux/uaccess.h>
 #include <linux/ptrace.h>
+
+#include <asm/cpu_entry_area.h>
 #include <asm/switch_to.h>
 
 enum stack_type {
diff --git a/arch/x86/kernel/cpu/common.c b/arch/x86/kernel/cpu/common.c
index 8243f198fb7f..143aceaf9a9a 100644
--- a/arch/x86/kernel/cpu/common.c
+++ b/arch/x86/kernel/cpu/common.c
@@ -1731,11 +1731,11 @@ void cpu_init(void)
 	 * set up and load the per-CPU TSS
 	 */
 	if (!t->x86_tss.ist[0]) {
-		t->x86_tss.ist[ESTACK_DF] = __this_cpu_ist_top_va(DF);
-		t->x86_tss.ist[ESTACK_NMI] = __this_cpu_ist_top_va(NMI);
-		t->x86_tss.ist[ESTACK_DB] = __this_cpu_ist_top_va(DB);
-		t->x86_tss.ist[ESTACK_MCE] = __this_cpu_ist_top_va(MCE);
-		per_cpu(debug_stack_addr, cpu) = t->x86_tss.ist[ESTACK_DB];
+		t->x86_tss.ist[IST_INDEX_DF] = __this_cpu_ist_top_va(DF);
+		t->x86_tss.ist[IST_INDEX_NMI] = __this_cpu_ist_top_va(NMI);
+		t->x86_tss.ist[IST_INDEX_DB] = __this_cpu_ist_top_va(DB);
+		t->x86_tss.ist[IST_INDEX_MCE] = __this_cpu_ist_top_va(MCE);
+		per_cpu(debug_stack_addr, cpu) = t->x86_tss.ist[IST_INDEX_DB];
 	}
 
 	t->x86_tss.io_bitmap_base = IO_BITMAP_OFFSET;
diff --git a/arch/x86/kernel/idt.c b/arch/x86/kernel/idt.c
index 2188f734ec61..6d8917875f44 100644
--- a/arch/x86/kernel/idt.c
+++ b/arch/x86/kernel/idt.c
@@ -183,11 +183,11 @@ gate_desc debug_idt_table[IDT_ENTRIES] __page_aligned_bss;
  * cpu_init() when the TSS has been initialized.
  */
 static const __initconst struct idt_data ist_idts[] = {
-	ISTG(X86_TRAP_DB,	debug,		ESTACK_DB),
-	ISTG(X86_TRAP_NMI,	nmi,		ESTACK_NMI),
-	ISTG(X86_TRAP_DF,	double_fault,	ESTACK_DF),
+	ISTG(X86_TRAP_DB,	debug,		IST_INDEX_DB),
+	ISTG(X86_TRAP_NMI,	nmi,		IST_INDEX_NMI),
+	ISTG(X86_TRAP_DF,	double_fault,	IST_INDEX_DF),
 #ifdef CONFIG_X86_MCE
-	ISTG(X86_TRAP_MC,	&machine_check,	ESTACK_MCE),
+	ISTG(X86_TRAP_MC,	&machine_check,	IST_INDEX_MCE),
 #endif
 };
 

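To illustrate the disconnect, a hypothetical usage sketch (not part of the
patch): IST_INDEX_* selects a hardware slot in tss.ist[], while ESTACK_*
describes the memory/mapping order in cea_exception_stacks, and the two no
longer have to match:

	/* Hardware view: which of the (max 7) TSS.IST slots #DB uses */
	tss->x86_tss.ist[IST_INDEX_DB] = __this_cpu_ist_top_va(DB);

	/* Mapping view: where the #DB stack sits in the stack ordering */
	info->type = STACK_TYPE_EXCEPTION + ESTACK_DB;

Adding more stacks to the mapping (as the upcoming debug stack split does)
then only grows the ESTACK_* enum; the four TSS.IST slots stay untouched.
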
^ permalink raw reply related	[flat|nested] 89+ messages in thread

* [tip:x86/irq] x86/exceptions: Enable IST guard pages
  2019-04-14 15:59 ` [patch V3 20/32] x86/exceptions: Enable IST guard pages Thomas Gleixner
@ 2019-04-17 14:15   ` tip-bot for Thomas Gleixner
  0 siblings, 0 replies; 89+ messages in thread
From: tip-bot for Thomas Gleixner @ 2019-04-17 14:15 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: sean.j.christopherson, mingo, x86, mingo, bp, linux-kernel, tglx,
	luto, hpa, jpoimboe

Commit-ID:  1bdb67e5aa2d5d43c48cb7d93393fcba276c9e71
Gitweb:     https://git.kernel.org/tip/1bdb67e5aa2d5d43c48cb7d93393fcba276c9e71
Author:     Thomas Gleixner <tglx@linutronix.de>
AuthorDate: Sun, 14 Apr 2019 17:59:56 +0200
Committer:  Borislav Petkov <bp@suse.de>
CommitDate: Wed, 17 Apr 2019 15:05:32 +0200

x86/exceptions: Enable IST guard pages

All usage sites which expected the exception stacks in the CPU entry area
to be mapped linearly have been fixed up. Enable guard pages between the
IST stacks.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Sean Christopherson <sean.j.christopherson@intel.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: x86-ml <x86@kernel.org>
Link: https://lkml.kernel.org/r/20190414160145.349862042@linutronix.de
---
 arch/x86/include/asm/cpu_entry_area.h | 8 ++------
 1 file changed, 2 insertions(+), 6 deletions(-)

diff --git a/arch/x86/include/asm/cpu_entry_area.h b/arch/x86/include/asm/cpu_entry_area.h
index 310eeb62d418..9c96406e6d2b 100644
--- a/arch/x86/include/asm/cpu_entry_area.h
+++ b/arch/x86/include/asm/cpu_entry_area.h
@@ -26,13 +26,9 @@ struct exception_stacks {
 	ESTACKS_MEMBERS(0)
 };
 
-/*
- * The effective cpu entry area mapping with guard pages. Guard size is
- * zero until the code which makes assumptions about linear mappings is
- * cleaned up.
- */
+/* The effective cpu entry area mapping with guard pages. */
 struct cea_exception_stacks {
-	ESTACKS_MEMBERS(0)
+	ESTACKS_MEMBERS(PAGE_SIZE)
 };
 
 /*

^ permalink raw reply related	[flat|nested] 89+ messages in thread

* [tip:x86/irq] x86/exceptions: Split debug IST stack
  2019-04-14 15:59 ` [patch V3 21/32] x86/exceptions: Split debug IST stack Thomas Gleixner
  2019-04-16 22:07   ` Sean Christopherson
@ 2019-04-17 14:15   ` tip-bot for Thomas Gleixner
  1 sibling, 0 replies; 89+ messages in thread
From: tip-bot for Thomas Gleixner @ 2019-04-17 14:15 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: yamada.masahiro, hpa, chang.seok.bae, cai, peterz, mingo,
	konrad.wilk, dave.hansen, bhe, x86, jroedel, jgross, linux, bp,
	linux-kernel, kirill.shutemov, corbet, tglx, luto, jpoimboe,
	sean.j.christopherson, mingo

Commit-ID:  2a594d4ccf3f10f80b77d71bd3dad10813ac0137
Gitweb:     https://git.kernel.org/tip/2a594d4ccf3f10f80b77d71bd3dad10813ac0137
Author:     Thomas Gleixner <tglx@linutronix.de>
AuthorDate: Sun, 14 Apr 2019 17:59:57 +0200
Committer:  Borislav Petkov <bp@suse.de>
CommitDate: Wed, 17 Apr 2019 15:14:28 +0200

x86/exceptions: Split debug IST stack

The debug IST stack is actually two separate debug stacks to handle #DB
recursion. This is required because the CPU always starts at the top of the
stack on exception entry, which means that on #DB recursion the second #DB
would overwrite the stack of the first.

The low level entry code therefore adjusts the top of stack on entry so a
secondary #DB starts from a different stack page. But the stack pages are
adjacent, without a guard page between them.

Split the debug stack into three stacks separated by guard pages. The third
stack is never mapped into the cpu_entry_area and exists only to catch
triple #DB nesting:

      --- top of DB_stack	<- Initial stack
      --- end of DB_stack
      	  guard page

      --- top of DB1_stack	<- Top of stack after entering first #DB
      --- end of DB1_stack
      	  guard page

      --- top of DB2_stack	<- Top of stack after entering second #DB
      --- end of DB2_stack
      	  guard page

If DB2 did not act as the final guard hole, a second nested #DB would point
the top of the #DB stack at the stack below DB1, which would be a valid
stack and would not catch the undesired triple nesting.

The backing store does not allocate any memory for DB2 and its guard page
as it is not going to be mapped into the cpu_entry_area.

 - Adjust the low level entry code so it moves the top of the #DB stack by
   the offset between the stacks instead of by the exception stack size.

 - Make the dumpstack code aware of the new stacks.

 - Adjust the in_debug_stack() implementation and move it into the NMI code
   where it belongs. As this is NMI hotpath code, it just checks the full
   area between the top of DB_stack and the bottom of DB1_stack without
   checking for the guard page. That's correct because the NMI cannot hit a
   stack pointer pointing into the guard page between the DB and DB1
   stacks. Even if it did, the NMI operation itself would still be
   unaffected, but the resume of the debug exception on the topmost DB
   stack would crash by touching the guard page.

  [ bp: Make exception_stack_names static const char * const ]

Suggested-by: Andy Lutomirski <luto@kernel.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Borislav Petkov <bp@suse.de>
Reviewed-by: Sean Christopherson <sean.j.christopherson@intel.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Baoquan He <bhe@redhat.com>
Cc: "Chang S. Bae" <chang.seok.bae@intel.com>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Dominik Brodowski <linux@dominikbrodowski.net>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Joerg Roedel <jroedel@suse.de>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Juergen Gross <jgross@suse.com>
Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Cc: linux-doc@vger.kernel.org
Cc: Masahiro Yamada <yamada.masahiro@socionext.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Qian Cai <cai@lca.pw>
Cc: Sean Christopherson <sean.j.christopherson@intel.com>
Cc: x86-ml <x86@kernel.org>
Link: https://lkml.kernel.org/r/20190414160145.439944544@linutronix.de
---
 Documentation/x86/kernel-stacks       |  7 ++++++-
 arch/x86/entry/entry_64.S             |  8 ++++----
 arch/x86/include/asm/cpu_entry_area.h | 14 ++++++++++----
 arch/x86/include/asm/debugreg.h       |  2 --
 arch/x86/include/asm/page_64_types.h  |  3 ---
 arch/x86/kernel/asm-offsets_64.c      |  2 ++
 arch/x86/kernel/cpu/common.c          | 11 -----------
 arch/x86/kernel/dumpstack_64.c        | 12 ++++++++----
 arch/x86/kernel/nmi.c                 | 20 +++++++++++++++++++-
 arch/x86/mm/cpu_entry_area.c          |  4 +++-
 10 files changed, 52 insertions(+), 31 deletions(-)

diff --git a/Documentation/x86/kernel-stacks b/Documentation/x86/kernel-stacks
index 1b04596caea9..d1bfb0b95ee0 100644
--- a/Documentation/x86/kernel-stacks
+++ b/Documentation/x86/kernel-stacks
@@ -76,7 +76,7 @@ The currently assigned IST stacks are :-
   middle of switching stacks.  Using IST for NMI events avoids making
   assumptions about the previous state of the kernel stack.
 
-* ESTACK_DB.  DEBUG_STKSZ
+* ESTACK_DB.  EXCEPTION_STKSZ (PAGE_SIZE).
 
   Used for hardware debug interrupts (interrupt 1) and for software
   debug interrupts (INT3).
@@ -86,6 +86,11 @@ The currently assigned IST stacks are :-
   avoids making assumptions about the previous state of the kernel
   stack.
 
+  To handle nested #DB correctly there exist two instances of DB stacks. On
+  #DB entry the IST stackpointer for #DB is switched to the second instance
+  so a nested #DB starts from a clean stack. The nested #DB switches
+  the IST stackpointer to a guard hole to catch triple nesting.
+
 * ESTACK_MCE.  EXCEPTION_STKSZ (PAGE_SIZE).
 
   Used for interrupt 18 - Machine Check Exception (#MC).
diff --git a/arch/x86/entry/entry_64.S b/arch/x86/entry/entry_64.S
index 5c0348504a4b..ee649f1f279e 100644
--- a/arch/x86/entry/entry_64.S
+++ b/arch/x86/entry/entry_64.S
@@ -879,7 +879,7 @@ apicinterrupt IRQ_WORK_VECTOR			irq_work_interrupt		smp_irq_work_interrupt
  * @paranoid == 2 is special: the stub will never switch stacks.  This is for
  * #DF: if the thread stack is somehow unusable, we'll still get a useful OOPS.
  */
-.macro idtentry sym do_sym has_error_code:req paranoid=0 shift_ist=-1
+.macro idtentry sym do_sym has_error_code:req paranoid=0 shift_ist=-1 ist_offset=0
 ENTRY(\sym)
 	UNWIND_HINT_IRET_REGS offset=\has_error_code*8
 
@@ -925,13 +925,13 @@ ENTRY(\sym)
 	.endif
 
 	.if \shift_ist != -1
-	subq	$EXCEPTION_STKSZ, CPU_TSS_IST(\shift_ist)
+	subq	$\ist_offset, CPU_TSS_IST(\shift_ist)
 	.endif
 
 	call	\do_sym
 
 	.if \shift_ist != -1
-	addq	$EXCEPTION_STKSZ, CPU_TSS_IST(\shift_ist)
+	addq	$\ist_offset, CPU_TSS_IST(\shift_ist)
 	.endif
 
 	/* these procedures expect "no swapgs" flag in ebx */
@@ -1129,7 +1129,7 @@ apicinterrupt3 HYPERV_STIMER0_VECTOR \
 	hv_stimer0_callback_vector hv_stimer0_vector_handler
 #endif /* CONFIG_HYPERV */
 
-idtentry debug			do_debug		has_error_code=0	paranoid=1 shift_ist=IST_INDEX_DB
+idtentry debug			do_debug		has_error_code=0	paranoid=1 shift_ist=IST_INDEX_DB ist_offset=DB_STACK_OFFSET
 idtentry int3			do_int3			has_error_code=0
 idtentry stack_segment		do_stack_segment	has_error_code=1
 
diff --git a/arch/x86/include/asm/cpu_entry_area.h b/arch/x86/include/asm/cpu_entry_area.h
index 9c96406e6d2b..cff3f3f3bfe0 100644
--- a/arch/x86/include/asm/cpu_entry_area.h
+++ b/arch/x86/include/asm/cpu_entry_area.h
@@ -10,25 +10,29 @@
 #ifdef CONFIG_X86_64
 
 /* Macro to enforce the same ordering and stack sizes */
-#define ESTACKS_MEMBERS(guardsize)		\
+#define ESTACKS_MEMBERS(guardsize, db2_holesize)\
 	char	DF_stack_guard[guardsize];	\
 	char	DF_stack[EXCEPTION_STKSZ];	\
 	char	NMI_stack_guard[guardsize];	\
 	char	NMI_stack[EXCEPTION_STKSZ];	\
+	char	DB2_stack_guard[guardsize];	\
+	char	DB2_stack[db2_holesize];	\
+	char	DB1_stack_guard[guardsize];	\
+	char	DB1_stack[EXCEPTION_STKSZ];	\
 	char	DB_stack_guard[guardsize];	\
-	char	DB_stack[DEBUG_STKSZ];		\
+	char	DB_stack[EXCEPTION_STKSZ];	\
 	char	MCE_stack_guard[guardsize];	\
 	char	MCE_stack[EXCEPTION_STKSZ];	\
 	char	IST_top_guard[guardsize];	\
 
 /* The exception stacks' physical storage. No guard pages required */
 struct exception_stacks {
-	ESTACKS_MEMBERS(0)
+	ESTACKS_MEMBERS(0, 0)
 };
 
 /* The effective cpu entry area mapping with guard pages. */
 struct cea_exception_stacks {
-	ESTACKS_MEMBERS(PAGE_SIZE)
+	ESTACKS_MEMBERS(PAGE_SIZE, EXCEPTION_STKSZ)
 };
 
 /*
@@ -37,6 +41,8 @@ struct cea_exception_stacks {
 enum exception_stack_ordering {
 	ESTACK_DF,
 	ESTACK_NMI,
+	ESTACK_DB2,
+	ESTACK_DB1,
 	ESTACK_DB,
 	ESTACK_MCE,
 	N_EXCEPTION_STACKS
diff --git a/arch/x86/include/asm/debugreg.h b/arch/x86/include/asm/debugreg.h
index 9e5ca30738e5..1a8609a15856 100644
--- a/arch/x86/include/asm/debugreg.h
+++ b/arch/x86/include/asm/debugreg.h
@@ -104,11 +104,9 @@ static inline void debug_stack_usage_dec(void)
 {
 	__this_cpu_dec(debug_stack_usage);
 }
-int is_debug_stack(unsigned long addr);
 void debug_stack_set_zero(void);
 void debug_stack_reset(void);
 #else /* !X86_64 */
-static inline int is_debug_stack(unsigned long addr) { return 0; }
 static inline void debug_stack_set_zero(void) { }
 static inline void debug_stack_reset(void) { }
 static inline void debug_stack_usage_inc(void) { }
diff --git a/arch/x86/include/asm/page_64_types.h b/arch/x86/include/asm/page_64_types.h
index 056de887b220..793c14c372cb 100644
--- a/arch/x86/include/asm/page_64_types.h
+++ b/arch/x86/include/asm/page_64_types.h
@@ -18,9 +18,6 @@
 #define EXCEPTION_STACK_ORDER (0 + KASAN_STACK_ORDER)
 #define EXCEPTION_STKSZ (PAGE_SIZE << EXCEPTION_STACK_ORDER)
 
-#define DEBUG_STACK_ORDER (EXCEPTION_STACK_ORDER + 1)
-#define DEBUG_STKSZ (PAGE_SIZE << DEBUG_STACK_ORDER)
-
 #define IRQ_STACK_ORDER (2 + KASAN_STACK_ORDER)
 #define IRQ_STACK_SIZE (PAGE_SIZE << IRQ_STACK_ORDER)
 
diff --git a/arch/x86/kernel/asm-offsets_64.c b/arch/x86/kernel/asm-offsets_64.c
index ddced33184b5..f5281567e28e 100644
--- a/arch/x86/kernel/asm-offsets_64.c
+++ b/arch/x86/kernel/asm-offsets_64.c
@@ -68,6 +68,8 @@ int main(void)
 #undef ENTRY
 
 	OFFSET(TSS_ist, tss_struct, x86_tss.ist);
+	DEFINE(DB_STACK_OFFSET, offsetof(struct cea_exception_stacks, DB_stack) -
+	       offsetof(struct cea_exception_stacks, DB1_stack));
 	BLANK();
 
 #ifdef CONFIG_STACKPROTECTOR
diff --git a/arch/x86/kernel/cpu/common.c b/arch/x86/kernel/cpu/common.c
index 143aceaf9a9a..88cab45707a9 100644
--- a/arch/x86/kernel/cpu/common.c
+++ b/arch/x86/kernel/cpu/common.c
@@ -1549,17 +1549,7 @@ void syscall_init(void)
 	       X86_EFLAGS_IOPL|X86_EFLAGS_AC|X86_EFLAGS_NT);
 }
 
-static DEFINE_PER_CPU(unsigned long, debug_stack_addr);
 DEFINE_PER_CPU(int, debug_stack_usage);
-
-int is_debug_stack(unsigned long addr)
-{
-	return __this_cpu_read(debug_stack_usage) ||
-		(addr <= __this_cpu_read(debug_stack_addr) &&
-		 addr > (__this_cpu_read(debug_stack_addr) - DEBUG_STKSZ));
-}
-NOKPROBE_SYMBOL(is_debug_stack);
-
 DEFINE_PER_CPU(u32, debug_idt_ctr);
 
 void debug_stack_set_zero(void)
@@ -1735,7 +1725,6 @@ void cpu_init(void)
 		t->x86_tss.ist[IST_INDEX_NMI] = __this_cpu_ist_top_va(NMI);
 		t->x86_tss.ist[IST_INDEX_DB] = __this_cpu_ist_top_va(DB);
 		t->x86_tss.ist[IST_INDEX_MCE] = __this_cpu_ist_top_va(MCE);
-		per_cpu(debug_stack_addr, cpu) = t->x86_tss.ist[IST_INDEX_DB];
 	}
 
 	t->x86_tss.io_bitmap_base = IO_BITMAP_OFFSET;
diff --git a/arch/x86/kernel/dumpstack_64.c b/arch/x86/kernel/dumpstack_64.c
index f6fbd0438f9e..fca97bd3d8ae 100644
--- a/arch/x86/kernel/dumpstack_64.c
+++ b/arch/x86/kernel/dumpstack_64.c
@@ -19,16 +19,18 @@
 #include <asm/cpu_entry_area.h>
 #include <asm/stacktrace.h>
 
-static const char *exception_stack_names[N_EXCEPTION_STACKS] = {
+static const char * const exception_stack_names[] = {
 		[ ESTACK_DF	]	= "#DF",
 		[ ESTACK_NMI	]	= "NMI",
+		[ ESTACK_DB2	]	= "#DB2",
+		[ ESTACK_DB1	]	= "#DB1",
 		[ ESTACK_DB	]	= "#DB",
 		[ ESTACK_MCE	]	= "#MC",
 };
 
 const char *stack_type_name(enum stack_type type)
 {
-	BUILD_BUG_ON(N_EXCEPTION_STACKS != 4);
+	BUILD_BUG_ON(N_EXCEPTION_STACKS != 6);
 
 	if (type == STACK_TYPE_IRQ)
 		return "IRQ";
@@ -58,9 +60,11 @@ struct estack_layout {
 	.end	= offsetof(struct cea_exception_stacks, x## _stack_guard) \
 	}
 
-static const struct estack_layout layout[N_EXCEPTION_STACKS] = {
+static const struct estack_layout layout[] = {
 	[ ESTACK_DF	]	= ESTACK_ENTRY(DF),
 	[ ESTACK_NMI	]	= ESTACK_ENTRY(NMI),
+	[ ESTACK_DB2	]	= { .begin = 0, .end = 0},
+	[ ESTACK_DB1	]	= ESTACK_ENTRY(DB1),
 	[ ESTACK_DB	]	= ESTACK_ENTRY(DB),
 	[ ESTACK_MCE	]	= ESTACK_ENTRY(MCE),
 };
@@ -71,7 +75,7 @@ static bool in_exception_stack(unsigned long *stack, struct stack_info *info)
 	struct pt_regs *regs;
 	unsigned int k;
 
-	BUILD_BUG_ON(N_EXCEPTION_STACKS != 4);
+	BUILD_BUG_ON(N_EXCEPTION_STACKS != 6);
 
 	estacks = (unsigned long)__this_cpu_read(cea_exception_stacks);
 
diff --git a/arch/x86/kernel/nmi.c b/arch/x86/kernel/nmi.c
index 18bc9b51ac9b..3755d0310026 100644
--- a/arch/x86/kernel/nmi.c
+++ b/arch/x86/kernel/nmi.c
@@ -21,13 +21,14 @@
 #include <linux/ratelimit.h>
 #include <linux/slab.h>
 #include <linux/export.h>
+#include <linux/atomic.h>
 #include <linux/sched/clock.h>
 
 #if defined(CONFIG_EDAC)
 #include <linux/edac.h>
 #endif
 
-#include <linux/atomic.h>
+#include <asm/cpu_entry_area.h>
 #include <asm/traps.h>
 #include <asm/mach_traps.h>
 #include <asm/nmi.h>
@@ -487,6 +488,23 @@ static DEFINE_PER_CPU(unsigned long, nmi_cr2);
  * switch back to the original IDT.
  */
 static DEFINE_PER_CPU(int, update_debug_stack);
+
+static bool notrace is_debug_stack(unsigned long addr)
+{
+	struct cea_exception_stacks *cs = __this_cpu_read(cea_exception_stacks);
+	unsigned long top = CEA_ESTACK_TOP(cs, DB);
+	unsigned long bot = CEA_ESTACK_BOT(cs, DB1);
+
+	if (__this_cpu_read(debug_stack_usage))
+		return true;
+	/*
+	 * Note, this covers the guard page between DB and DB1 as well to
+	 * avoid two checks. But by all means @addr can never point into
+	 * the guard page.
+	 */
+	return addr >= bot && addr < top;
+}
+NOKPROBE_SYMBOL(is_debug_stack);
 #endif
 
 dotraplinkage notrace void
diff --git a/arch/x86/mm/cpu_entry_area.c b/arch/x86/mm/cpu_entry_area.c
index a00d0d059c8a..752ad11d6868 100644
--- a/arch/x86/mm/cpu_entry_area.c
+++ b/arch/x86/mm/cpu_entry_area.c
@@ -98,10 +98,12 @@ static void __init percpu_setup_exception_stacks(unsigned int cpu)
 
 	/*
 	 * The exceptions stack mappings in the per cpu area are protected
-	 * by guard pages so each stack must be mapped separately.
+	 * by guard pages so each stack must be mapped separately. DB2 is
+	 * not mapped; it just exists to catch triple nesting of #DB.
 	 */
 	cea_map_stack(DF);
 	cea_map_stack(NMI);
+	cea_map_stack(DB1);
 	cea_map_stack(DB);
 	cea_map_stack(MCE);
 }

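A sketch of the resulting nesting behaviour. DB_STACK_OFFSET is the
distance between the DB and DB1 stacks, i.e. one stack plus one guard page
per the asm-offsets change above:

	/* Hypothetical walk-through of the shift_ist arithmetic */
	unsigned long ist = CEA_ESTACK_TOP(estacks, DB);	/* level 0 */

	ist -= DB_STACK_OFFSET;	/* level 1: top of DB1_stack */
	ist -= DB_STACK_OFFSET;	/* level 2: top of the unmapped DB2 hole;
				 * the next #DB faults instead of silently
				 * reusing a valid stack
				 */

Because DB2 exists in the layout but is never mapped, the third nesting
level cannot land on a valid stack by accident.
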
^ permalink raw reply related	[flat|nested] 89+ messages in thread

* [tip:x86/irq] x86/dumpstack/64: Speedup in_exception_stack()
  2019-04-14 15:59 ` [patch V3 22/32] x86/dumpstack/64: Speedup in_exception_stack() Thomas Gleixner
@ 2019-04-17 14:16   ` tip-bot for Thomas Gleixner
  0 siblings, 0 replies; 89+ messages in thread
From: tip-bot for Thomas Gleixner @ 2019-04-17 14:16 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: x86, sean.j.christopherson, tglx, hpa, luto, linux-kernel, mingo,
	jpoimboe, mingo, bp

Commit-ID:  c450c8f532b63475b30e29bc600c25ab0a4ab282
Gitweb:     https://git.kernel.org/tip/c450c8f532b63475b30e29bc600c25ab0a4ab282
Author:     Thomas Gleixner <tglx@linutronix.de>
AuthorDate: Sun, 14 Apr 2019 17:59:58 +0200
Committer:  Borislav Petkov <bp@suse.de>
CommitDate: Wed, 17 Apr 2019 15:16:57 +0200

x86/dumpstack/64: Speedup in_exception_stack()

The current implementation of in_exception_stack() iterates over the
exception stacks array. Most of the time this is a useless exercise, but
even for the actual use cases (perf and ftrace) it takes at least two
iterations to get to the NMI stack.

As the exception stacks and the guard pages are page aligned, the loop can
be avoided completely.

Add an initial check whether the stack pointer is inside the full exception
stack area and bail out early if not.

Create a lookup table which describes the stack area. The table index is
the page offset from the beginning of the exception stacks. So for any
given stack pointer the page offset is computed and a lookup in the
description table is performed. If it is inside a guard page, return. If
not, use the descriptor to fill in the info structure.

The table is filled at compile time, and for the !KASAN case the
interesting page descriptors fit exactly into a single cache line. Only the
last guard page descriptor spills into the next cache line, but that should
not be accessed in the regular case.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Borislav Petkov <bp@suse.de>
Acked-by: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Sean Christopherson <sean.j.christopherson@intel.com>
Cc: x86-ml <x86@kernel.org>
Link: https://lkml.kernel.org/r/20190414160145.543320386@linutronix.de
---
 arch/x86/kernel/dumpstack_64.c | 82 ++++++++++++++++++++++++++----------------
 1 file changed, 51 insertions(+), 31 deletions(-)

diff --git a/arch/x86/kernel/dumpstack_64.c b/arch/x86/kernel/dumpstack_64.c
index fca97bd3d8ae..f356d3ea0c70 100644
--- a/arch/x86/kernel/dumpstack_64.c
+++ b/arch/x86/kernel/dumpstack_64.c
@@ -50,52 +50,72 @@ const char *stack_type_name(enum stack_type type)
 	return NULL;
 }
 
-struct estack_layout {
-	unsigned int	begin;
-	unsigned int	end;
+/**
+ * struct estack_pages - Page descriptor for exception stacks
+ * @offs:	Offset from the start of the exception stack area
+ * @size:	Size of the exception stack
+ * @type:	Type to store in the stack_info struct
+ */
+struct estack_pages {
+	u32	offs;
+	u16	size;
+	u16	type;
 };
 
-#define	ESTACK_ENTRY(x)	{						  \
-	.begin	= offsetof(struct cea_exception_stacks, x## _stack),	  \
-	.end	= offsetof(struct cea_exception_stacks, x## _stack_guard) \
-	}
+#define EPAGERANGE(st)							\
+	[PFN_DOWN(CEA_ESTACK_OFFS(st)) ...				\
+	 PFN_DOWN(CEA_ESTACK_OFFS(st) + CEA_ESTACK_SIZE(st) - 1)] = {	\
+		.offs	= CEA_ESTACK_OFFS(st),				\
+		.size	= CEA_ESTACK_SIZE(st),				\
+		.type	= STACK_TYPE_EXCEPTION + ESTACK_ ##st, }
 
-static const struct estack_layout layout[] = {
-	[ ESTACK_DF	]	= ESTACK_ENTRY(DF),
-	[ ESTACK_NMI	]	= ESTACK_ENTRY(NMI),
-	[ ESTACK_DB2	]	= { .begin = 0, .end = 0},
-	[ ESTACK_DB1	]	= ESTACK_ENTRY(DB1),
-	[ ESTACK_DB	]	= ESTACK_ENTRY(DB),
-	[ ESTACK_MCE	]	= ESTACK_ENTRY(MCE),
+/*
+ * Array of exception stack page descriptors. If the stack is larger than
+ * PAGE_SIZE, all pages covering a particular stack will have the same
+ * info. The guard pages including the not mapped DB2 stack are zeroed
+ * out.
+ */
+static const
+struct estack_pages estack_pages[CEA_ESTACK_PAGES] ____cacheline_aligned = {
+	EPAGERANGE(DF),
+	EPAGERANGE(NMI),
+	EPAGERANGE(DB1),
+	EPAGERANGE(DB),
+	EPAGERANGE(MCE),
 };
 
 static bool in_exception_stack(unsigned long *stack, struct stack_info *info)
 {
-	unsigned long estacks, begin, end, stk = (unsigned long)stack;
+	unsigned long begin, end, stk = (unsigned long)stack;
+	const struct estack_pages *ep;
 	struct pt_regs *regs;
 	unsigned int k;
 
 	BUILD_BUG_ON(N_EXCEPTION_STACKS != 6);
 
-	estacks = (unsigned long)__this_cpu_read(cea_exception_stacks);
-
-	for (k = 0; k < N_EXCEPTION_STACKS; k++) {
-		begin = estacks + layout[k].begin;
-		end   = estacks + layout[k].end;
-		regs  = (struct pt_regs *)end - 1;
+	begin = (unsigned long)__this_cpu_read(cea_exception_stacks);
+	end = begin + sizeof(struct cea_exception_stacks);
+	/* Bail if @stack is outside the exception stack area. */
+	if (stk < begin || stk >= end)
+		return false;
 
-		if (stk < begin || stk >= end)
-			continue;
+	/* Calc page offset from start of exception stacks */
+	k = (stk - begin) >> PAGE_SHIFT;
+	/* Lookup the page descriptor */
+	ep = &estack_pages[k];
+	/* Guard page? */
+	if (!ep->size)
+		return false;
 
-		info->type	= STACK_TYPE_EXCEPTION + k;
-		info->begin	= (unsigned long *)begin;
-		info->end	= (unsigned long *)end;
-		info->next_sp	= (unsigned long *)regs->sp;
+	begin += (unsigned long)ep->offs;
+	end = begin + (unsigned long)ep->size;
+	regs = (struct pt_regs *)end - 1;
 
-		return true;
-	}
-
-	return false;
+	info->type	= ep->type;
+	info->begin	= (unsigned long *)begin;
+	info->end	= (unsigned long *)end;
+	info->next_sp	= (unsigned long *)regs->sp;
+	return true;
 }
 
 static bool in_irq_stack(unsigned long *stack, struct stack_info *info)

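A worked example of the lookup, with hypothetical page numbers for the
!KASAN case where each stack and each guard page is a single 4K page (DF
guard = page 0, DF = page 1, NMI guard = page 2, NMI = page 3, ...):

	/* Sketch: classify a stack pointer with one table load, no loop */
	unsigned long begin = (unsigned long)__this_cpu_read(cea_exception_stacks);
	unsigned int k = (stk - begin) >> PAGE_SHIFT;	/* NMI hit -> k == 3 */
	const struct estack_pages *ep = &estack_pages[k];

	if (!ep->size)		/* zeroed descriptor: guard page or DB2 hole */
		return false;	/* not on an exception stack */

	/* otherwise ep->offs, ep->size and ep->type describe the hit stack */

For an NMI stack hit the old code needed two loop iterations plus pointer
dereferences per iteration; the new code is one bounds check plus one table
load.
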
^ permalink raw reply related	[flat|nested] 89+ messages in thread

* [tip:x86/irq] x86/irq/32: Define IRQ_STACK_SIZE
  2019-04-14 15:59 ` [patch V3 23/32] x86/irq/32: Define IRQ_STACK_SIZE Thomas Gleixner
@ 2019-04-17 14:17   ` tip-bot for Thomas Gleixner
  0 siblings, 0 replies; 89+ messages in thread
From: tip-bot for Thomas Gleixner @ 2019-04-17 14:17 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: mingo, akpm, bp, tglx, mingo, mhocko, jpoimboe, vbabka, luto,
	sean.j.christopherson, dave.hansen, x86, hpa, jkosina,
	linux-kernel, suravee.suthikulpanit, ndesaulniers, jgross

Commit-ID:  aa641c287b2f7676f6f0064a8351daf08eca6b0a
Gitweb:     https://git.kernel.org/tip/aa641c287b2f7676f6f0064a8351daf08eca6b0a
Author:     Thomas Gleixner <tglx@linutronix.de>
AuthorDate: Sun, 14 Apr 2019 17:59:59 +0200
Committer:  Borislav Petkov <bp@suse.de>
CommitDate: Wed, 17 Apr 2019 15:18:36 +0200

x86/irq/32: Define IRQ_STACK_SIZE

On 32-bit IRQ_STACK_SIZE is the same as THREAD_SIZE.

To allow sharing struct irq_stack with 32-bit, define IRQ_STACK_SIZE for
32-bit and use it for struct irq_stack.

No functional change.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jiri Kosina <jkosina@suse.cz>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Juergen Gross <jgross@suse.com>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Nick Desaulniers <ndesaulniers@google.com>
Cc: Sean Christopherson <sean.j.christopherson@intel.com>
Cc: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: x86-ml <x86@kernel.org>
Link: https://lkml.kernel.org/r/20190414160145.632513987@linutronix.de
---
 arch/x86/include/asm/page_32_types.h | 2 ++
 arch/x86/include/asm/processor.h     | 4 ++--
 2 files changed, 4 insertions(+), 2 deletions(-)

diff --git a/arch/x86/include/asm/page_32_types.h b/arch/x86/include/asm/page_32_types.h
index 57b5dc15ca7e..565ad755c785 100644
--- a/arch/x86/include/asm/page_32_types.h
+++ b/arch/x86/include/asm/page_32_types.h
@@ -22,6 +22,8 @@
 #define THREAD_SIZE_ORDER	1
 #define THREAD_SIZE		(PAGE_SIZE << THREAD_SIZE_ORDER)
 
+#define IRQ_STACK_SIZE		THREAD_SIZE
+
 #define N_EXCEPTION_STACKS	1
 
 #ifdef CONFIG_X86_PAE
diff --git a/arch/x86/include/asm/processor.h b/arch/x86/include/asm/processor.h
index 8fcfcd1a8375..8b4bf732e1b5 100644
--- a/arch/x86/include/asm/processor.h
+++ b/arch/x86/include/asm/processor.h
@@ -422,8 +422,8 @@ DECLARE_PER_CPU_ALIGNED(struct stack_canary, stack_canary);
  * per-CPU IRQ handling stacks
  */
 struct irq_stack {
-	u32                     stack[THREAD_SIZE/sizeof(u32)];
-} __aligned(THREAD_SIZE);
+	u32			stack[IRQ_STACK_SIZE / sizeof(u32)];
+} __aligned(IRQ_STACK_SIZE);
 
 DECLARE_PER_CPU(struct irq_stack *, hardirq_stack);
 DECLARE_PER_CPU(struct irq_stack *, softirq_stack);

^ permalink raw reply related	[flat|nested] 89+ messages in thread

* [tip:x86/irq] x86/irq/32: Make irq stack a character array
  2019-04-14 16:00 ` [patch V3 24/32] x86/irq/32: Make irq stack a character array Thomas Gleixner
@ 2019-04-17 14:18   ` tip-bot for Thomas Gleixner
  0 siblings, 0 replies; 89+ messages in thread
From: tip-bot for Thomas Gleixner @ 2019-04-17 14:18 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: tglx, akpm, luto, bp, hpa, puwen, mingo, x86, mingo, jkosina,
	ndesaulniers, linux-kernel, vbabka, kernelfans, jpoimboe,
	sean.j.christopherson, adobriyan

Commit-ID:  231c4846b106d526fa212b02b37447d3f2fcc99d
Gitweb:     https://git.kernel.org/tip/231c4846b106d526fa212b02b37447d3f2fcc99d
Author:     Thomas Gleixner <tglx@linutronix.de>
AuthorDate: Sun, 14 Apr 2019 18:00:00 +0200
Committer:  Borislav Petkov <bp@suse.de>
CommitDate: Wed, 17 Apr 2019 15:21:21 +0200

x86/irq/32: Make irq stack a character array

There is no reason to have a u32 array in struct irq_stack. The only
purpose of the array is to size the struct properly.

Preparatory change for sharing struct irq_stack with 64-bit.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: Alexey Dobriyan <adobriyan@gmail.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jiri Kosina <jkosina@suse.cz>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Nick Desaulniers <ndesaulniers@google.com>
Cc: Pingfan Liu <kernelfans@gmail.com>
Cc: Pu Wen <puwen@hygon.cn>
Cc: Sean Christopherson <sean.j.christopherson@intel.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: x86-ml <x86@kernel.org>
Link: https://lkml.kernel.org/r/20190414160145.736241969@linutronix.de
---
 arch/x86/include/asm/processor.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/x86/include/asm/processor.h b/arch/x86/include/asm/processor.h
index 8b4bf732e1b5..ff3f469ab2d4 100644
--- a/arch/x86/include/asm/processor.h
+++ b/arch/x86/include/asm/processor.h
@@ -422,7 +422,7 @@ DECLARE_PER_CPU_ALIGNED(struct stack_canary, stack_canary);
  * per-CPU IRQ handling stacks
  */
 struct irq_stack {
-	u32			stack[IRQ_STACK_SIZE / sizeof(u32)];
+	char			stack[IRQ_STACK_SIZE];
 } __aligned(IRQ_STACK_SIZE);
 
 DECLARE_PER_CPU(struct irq_stack *, hardirq_stack);

^ permalink raw reply related	[flat|nested] 89+ messages in thread

* [tip:x86/irq] x86/irq/32: Rename hard/softirq_stack to hard/softirq_stack_ptr
  2019-04-14 16:00 ` [patch V3 25/32] x86/irq/32: Rename hard/softirq_stack to hard/softirq_stack_ptr Thomas Gleixner
@ 2019-04-17 14:18   ` tip-bot for Thomas Gleixner
  0 siblings, 0 replies; 89+ messages in thread
From: tip-bot for Thomas Gleixner @ 2019-04-17 14:18 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: hpa, mingo, bp, dave.hansen, vbabka, linux-kernel, jkosina,
	mingo, ndesaulniers, nstange, tglx, jgross, jpoimboe, x86,
	sean.j.christopherson, akpm, luto

Commit-ID:  a754fe2b76d1d6bce7069657bba975034f3ad961
Gitweb:     https://git.kernel.org/tip/a754fe2b76d1d6bce7069657bba975034f3ad961
Author:     Thomas Gleixner <tglx@linutronix.de>
AuthorDate: Sun, 14 Apr 2019 18:00:01 +0200
Committer:  Borislav Petkov <bp@suse.de>
CommitDate: Wed, 17 Apr 2019 15:24:18 +0200

x86/irq/32: Rename hard/softirq_stack to hard/softirq_stack_ptr

The percpu storage holds a pointer to the stack, not the stack
itself. Rename it before sharing struct irq_stack with 64-bit.

No functional changes.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jiri Kosina <jkosina@suse.cz>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Juergen Gross <jgross@suse.com>
Cc: Nick Desaulniers <ndesaulniers@google.com>
Cc: Nicolai Stange <nstange@suse.de>
Cc: Sean Christopherson <sean.j.christopherson@intel.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: x86-ml <x86@kernel.org>
Link: https://lkml.kernel.org/r/20190414160145.824805922@linutronix.de
---
 arch/x86/include/asm/processor.h |  4 ++--
 arch/x86/kernel/dumpstack_32.c   |  4 ++--
 arch/x86/kernel/irq_32.c         | 19 ++++++++++---------
 3 files changed, 14 insertions(+), 13 deletions(-)

diff --git a/arch/x86/include/asm/processor.h b/arch/x86/include/asm/processor.h
index ff3f469ab2d4..0b7d866a1ad5 100644
--- a/arch/x86/include/asm/processor.h
+++ b/arch/x86/include/asm/processor.h
@@ -425,8 +425,8 @@ struct irq_stack {
 	char			stack[IRQ_STACK_SIZE];
 } __aligned(IRQ_STACK_SIZE);
 
-DECLARE_PER_CPU(struct irq_stack *, hardirq_stack);
-DECLARE_PER_CPU(struct irq_stack *, softirq_stack);
+DECLARE_PER_CPU(struct irq_stack *, hardirq_stack_ptr);
+DECLARE_PER_CPU(struct irq_stack *, softirq_stack_ptr);
 #endif	/* X86_64 */
 
 extern unsigned int fpu_kernel_xstate_size;
diff --git a/arch/x86/kernel/dumpstack_32.c b/arch/x86/kernel/dumpstack_32.c
index d305440ebe9c..64a59d726639 100644
--- a/arch/x86/kernel/dumpstack_32.c
+++ b/arch/x86/kernel/dumpstack_32.c
@@ -34,7 +34,7 @@ const char *stack_type_name(enum stack_type type)
 
 static bool in_hardirq_stack(unsigned long *stack, struct stack_info *info)
 {
-	unsigned long *begin = (unsigned long *)this_cpu_read(hardirq_stack);
+	unsigned long *begin = (unsigned long *)this_cpu_read(hardirq_stack_ptr);
 	unsigned long *end   = begin + (THREAD_SIZE / sizeof(long));
 
 	/*
@@ -59,7 +59,7 @@ static bool in_hardirq_stack(unsigned long *stack, struct stack_info *info)
 
 static bool in_softirq_stack(unsigned long *stack, struct stack_info *info)
 {
-	unsigned long *begin = (unsigned long *)this_cpu_read(softirq_stack);
+	unsigned long *begin = (unsigned long *)this_cpu_read(softirq_stack_ptr);
 	unsigned long *end   = begin + (THREAD_SIZE / sizeof(long));
 
 	/*
diff --git a/arch/x86/kernel/irq_32.c b/arch/x86/kernel/irq_32.c
index 95600a99ae93..f37489c806fa 100644
--- a/arch/x86/kernel/irq_32.c
+++ b/arch/x86/kernel/irq_32.c
@@ -51,8 +51,8 @@ static inline int check_stack_overflow(void) { return 0; }
 static inline void print_stack_overflow(void) { }
 #endif
 
-DEFINE_PER_CPU(struct irq_stack *, hardirq_stack);
-DEFINE_PER_CPU(struct irq_stack *, softirq_stack);
+DEFINE_PER_CPU(struct irq_stack *, hardirq_stack_ptr);
+DEFINE_PER_CPU(struct irq_stack *, softirq_stack_ptr);
 
 static void call_on_stack(void *func, void *stack)
 {
@@ -76,7 +76,7 @@ static inline int execute_on_irq_stack(int overflow, struct irq_desc *desc)
 	u32 *isp, *prev_esp, arg1;
 
 	curstk = (struct irq_stack *) current_stack();
-	irqstk = __this_cpu_read(hardirq_stack);
+	irqstk = __this_cpu_read(hardirq_stack_ptr);
 
 	/*
 	 * this is where we switch to the IRQ stack. However, if we are
@@ -113,21 +113,22 @@ void irq_ctx_init(int cpu)
 {
 	struct irq_stack *irqstk;
 
-	if (per_cpu(hardirq_stack, cpu))
+	if (per_cpu(hardirq_stack_ptr, cpu))
 		return;
 
 	irqstk = page_address(alloc_pages_node(cpu_to_node(cpu),
 					       THREADINFO_GFP,
 					       THREAD_SIZE_ORDER));
-	per_cpu(hardirq_stack, cpu) = irqstk;
+	per_cpu(hardirq_stack_ptr, cpu) = irqstk;
 
 	irqstk = page_address(alloc_pages_node(cpu_to_node(cpu),
 					       THREADINFO_GFP,
 					       THREAD_SIZE_ORDER));
-	per_cpu(softirq_stack, cpu) = irqstk;
+	per_cpu(softirq_stack_ptr, cpu) = irqstk;
 
-	printk(KERN_DEBUG "CPU %u irqstacks, hard=%p soft=%p\n",
-	       cpu, per_cpu(hardirq_stack, cpu),  per_cpu(softirq_stack, cpu));
+	pr_debug("CPU %u irqstacks, hard=%p soft=%p\n",
+		 cpu, per_cpu(hardirq_stack_ptr, cpu),
+		 per_cpu(softirq_stack_ptr, cpu));
 }
 
 void do_softirq_own_stack(void)
@@ -135,7 +136,7 @@ void do_softirq_own_stack(void)
 	struct irq_stack *irqstk;
 	u32 *isp, *prev_esp;
 
-	irqstk = __this_cpu_read(softirq_stack);
+	irqstk = __this_cpu_read(softirq_stack_ptr);
 
 	/* build the stack frame on the softirq stack */
 	isp = (u32 *) ((char *)irqstk + sizeof(*irqstk));

^ permalink raw reply related	[flat|nested] 89+ messages in thread

* [tip:x86/irq] x86/irq/64: Rename irq_stack_ptr to hardirq_stack_ptr
  2019-04-14 16:00 ` [patch V3 26/32] x86/irq/64: Rename irq_stack_ptr to hardirq_stack_ptr Thomas Gleixner
@ 2019-04-17 14:19   ` tip-bot for Thomas Gleixner
  0 siblings, 0 replies; 89+ messages in thread
From: tip-bot for Thomas Gleixner @ 2019-04-17 14:19 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: chang.seok.bae, nstange, kernelfans, mhocko, linux, konrad.wilk,
	jpoimboe, ndesaulniers, mingo, adobriyan, bp, x86, rppt, sfr,
	peterz, linux-kernel, hpa, akpm, mingo, tglx, luto, jkosina,
	vbabka, sean.j.christopherson

Commit-ID:  758a2e312228410f2f5092ade558109e93dc3ee8
Gitweb:     https://git.kernel.org/tip/758a2e312228410f2f5092ade558109e93dc3ee8
Author:     Thomas Gleixner <tglx@linutronix.de>
AuthorDate: Sun, 14 Apr 2019 18:00:02 +0200
Committer:  Borislav Petkov <bp@suse.de>
CommitDate: Wed, 17 Apr 2019 15:27:10 +0200

x86/irq/64: Rename irq_stack_ptr to hardirq_stack_ptr

Preparatory patch to share code with 32-bit.

No functional changes.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: Alexey Dobriyan <adobriyan@gmail.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: "Chang S. Bae" <chang.seok.bae@intel.com>
Cc: Dominik Brodowski <linux@dominikbrodowski.net>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jiri Kosina <jkosina@suse.cz>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Mike Rapoport <rppt@linux.vnet.ibm.com>
Cc: Nick Desaulniers <ndesaulniers@google.com>
Cc: Nicolai Stange <nstange@suse.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Pingfan Liu <kernelfans@gmail.com>
Cc: Sean Christopherson <sean.j.christopherson@intel.com>
Cc: Stephen Rothwell <sfr@canb.auug.org.au>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: x86-ml <x86@kernel.org>
Link: https://lkml.kernel.org/r/20190414160145.912584074@linutronix.de
---
 arch/x86/entry/entry_64.S        | 2 +-
 arch/x86/include/asm/processor.h | 2 +-
 arch/x86/kernel/cpu/common.c     | 2 +-
 arch/x86/kernel/dumpstack_64.c   | 2 +-
 arch/x86/kernel/irq_64.c         | 2 +-
 arch/x86/kernel/setup_percpu.c   | 2 +-
 6 files changed, 6 insertions(+), 6 deletions(-)

diff --git a/arch/x86/entry/entry_64.S b/arch/x86/entry/entry_64.S
index ee649f1f279e..726abbe6c6d8 100644
--- a/arch/x86/entry/entry_64.S
+++ b/arch/x86/entry/entry_64.S
@@ -431,7 +431,7 @@ END(irq_entries_start)
 	 */
 
 	movq	\old_rsp, PER_CPU_VAR(irq_stack_union + IRQ_STACK_SIZE - 8)
-	movq	PER_CPU_VAR(irq_stack_ptr), %rsp
+	movq	PER_CPU_VAR(hardirq_stack_ptr), %rsp
 
 #ifdef CONFIG_DEBUG_ENTRY
 	/*
diff --git a/arch/x86/include/asm/processor.h b/arch/x86/include/asm/processor.h
index 0b7d866a1ad5..5e3dd4e2136d 100644
--- a/arch/x86/include/asm/processor.h
+++ b/arch/x86/include/asm/processor.h
@@ -396,7 +396,7 @@ static inline unsigned long cpu_kernelmode_gs_base(int cpu)
 	return (unsigned long)per_cpu(irq_stack_union.gs_base, cpu);
 }
 
-DECLARE_PER_CPU(char *, irq_stack_ptr);
+DECLARE_PER_CPU(char *, hardirq_stack_ptr);
 DECLARE_PER_CPU(unsigned int, irq_count);
 extern asmlinkage void ignore_sysret(void);
 
diff --git a/arch/x86/kernel/cpu/common.c b/arch/x86/kernel/cpu/common.c
index 88cab45707a9..13ec72bb8f36 100644
--- a/arch/x86/kernel/cpu/common.c
+++ b/arch/x86/kernel/cpu/common.c
@@ -1510,7 +1510,7 @@ DEFINE_PER_CPU(struct task_struct *, current_task) ____cacheline_aligned =
 	&init_task;
 EXPORT_PER_CPU_SYMBOL(current_task);
 
-DEFINE_PER_CPU(char *, irq_stack_ptr) =
+DEFINE_PER_CPU(char *, hardirq_stack_ptr) =
 	init_per_cpu_var(irq_stack_union.irq_stack) + IRQ_STACK_SIZE;
 
 DEFINE_PER_CPU(unsigned int, irq_count) __visible = -1;
diff --git a/arch/x86/kernel/dumpstack_64.c b/arch/x86/kernel/dumpstack_64.c
index f356d3ea0c70..753b8cfe8b8a 100644
--- a/arch/x86/kernel/dumpstack_64.c
+++ b/arch/x86/kernel/dumpstack_64.c
@@ -120,7 +120,7 @@ static bool in_exception_stack(unsigned long *stack, struct stack_info *info)
 
 static bool in_irq_stack(unsigned long *stack, struct stack_info *info)
 {
-	unsigned long *end   = (unsigned long *)this_cpu_read(irq_stack_ptr);
+	unsigned long *end   = (unsigned long *)this_cpu_read(hardirq_stack_ptr);
 	unsigned long *begin = end - (IRQ_STACK_SIZE / sizeof(long));
 
 	/*
diff --git a/arch/x86/kernel/irq_64.c b/arch/x86/kernel/irq_64.c
index 7eb6f8d11bfd..f0c7356c8969 100644
--- a/arch/x86/kernel/irq_64.c
+++ b/arch/x86/kernel/irq_64.c
@@ -56,7 +56,7 @@ static inline void stack_overflow_check(struct pt_regs *regs)
 	    regs->sp <= curbase + THREAD_SIZE)
 		return;
 
-	irq_stack_top = (u64)__this_cpu_read(irq_stack_ptr);
+	irq_stack_top = (u64)__this_cpu_read(hardirq_stack_ptr);
 	irq_stack_bottom = irq_stack_top - IRQ_STACK_SIZE + STACK_MARGIN;
 	if (regs->sp >= irq_stack_bottom && regs->sp <= irq_stack_top)
 		return;
diff --git a/arch/x86/kernel/setup_percpu.c b/arch/x86/kernel/setup_percpu.c
index 4bf46575568a..657343ecc2da 100644
--- a/arch/x86/kernel/setup_percpu.c
+++ b/arch/x86/kernel/setup_percpu.c
@@ -245,7 +245,7 @@ void __init setup_per_cpu_areas(void)
 			early_per_cpu_map(x86_cpu_to_logical_apicid, cpu);
 #endif
 #ifdef CONFIG_X86_64
-		per_cpu(irq_stack_ptr, cpu) =
+		per_cpu(hardirq_stack_ptr, cpu) =
 			per_cpu(irq_stack_union.irq_stack, cpu) +
 			IRQ_STACK_SIZE;
 #endif

^ permalink raw reply related	[flat|nested] 89+ messages in thread

* [tip:x86/irq] x86/irq/32: Invoke irq_ctx_init() from init_IRQ()
  2019-04-14 16:00 ` [patch V3 27/32] x86/irq/32: Invoke irq_ctx_init() from init_IRQ() Thomas Gleixner
@ 2019-04-17 14:20   ` tip-bot for Thomas Gleixner
  0 siblings, 0 replies; 89+ messages in thread
From: tip-bot for Thomas Gleixner @ 2019-04-17 14:20 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: rppt, linux-kernel, jpoimboe, sean.j.christopherson, nstange,
	jgross, j.abraham1776, mingo, hpa, boris.ostrovsky, tglx, bp,
	sfr, mingo, luto, sstabellini, x86

Commit-ID:  451f743a64e1cf979f5fe21a1b2a015feb559f72
Gitweb:     https://git.kernel.org/tip/451f743a64e1cf979f5fe21a1b2a015feb559f72
Author:     Thomas Gleixner <tglx@linutronix.de>
AuthorDate: Sun, 14 Apr 2019 18:00:03 +0200
Committer:  Borislav Petkov <bp@suse.de>
CommitDate: Wed, 17 Apr 2019 15:30:12 +0200

x86/irq/32: Invoke irq_ctx_init() from init_IRQ()

irq_ctx_init() is invoked from native_init_IRQ() or from xen_init_IRQ()
code. There is no reason to have this split. The interrupt stacks must be
allocated no matter what.

Invoke it from init_IRQ() before invoking the native or XEN init
implementation.
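
As a hedged illustration of the resulting ordering, here is a minimal
stand-alone C model. The function names mirror the kernel's and the
intr_init pointer models the x86_init.irqs.intr_init hook; the bodies are
invented for this sketch:

#include <stdio.h>

static void native_intr_init(void)
{
	puts("native interrupt controller init");
}

/* Backend hook; a Xen guest would install its own init here instead. */
static void (*intr_init)(void) = native_intr_init;

static void irq_ctx_init(int cpu)
{
	printf("allocating irq stacks for CPU %d\n", cpu);
}

static void init_IRQ(void)
{
	/* Stacks are allocated unconditionally, before any backend runs. */
	irq_ctx_init(0);
	intr_init();
}

int main(void)
{
	init_IRQ();
	return 0;
}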

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Borislav Petkov <bp@suse.de>
Reviewed-by: Juergen Gross <jgross@suse.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Josh Abraham <j.abraham1776@gmail.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Juergen Gross <jgross@suse.com>
Cc: Mike Rapoport <rppt@linux.vnet.ibm.com>
Cc: Nicolai Stange <nstange@suse.de>
Cc: Sean Christopherson <sean.j.christopherson@intel.com>
Cc: Stefano Stabellini <sstabellini@kernel.org>
Cc: Stephen Rothwell <sfr@canb.auug.org.au>
Cc: x86-ml <x86@kernel.org>
Cc: xen-devel@lists.xenproject.org
Link: https://lkml.kernel.org/r/20190414160146.001162606@linutronix.de
---
 arch/x86/kernel/irqinit.c        | 4 ++--
 drivers/xen/events/events_base.c | 1 -
 2 files changed, 2 insertions(+), 3 deletions(-)

diff --git a/arch/x86/kernel/irqinit.c b/arch/x86/kernel/irqinit.c
index a0693b71cfc1..26b5cb5386b9 100644
--- a/arch/x86/kernel/irqinit.c
+++ b/arch/x86/kernel/irqinit.c
@@ -91,6 +91,8 @@ void __init init_IRQ(void)
 	for (i = 0; i < nr_legacy_irqs(); i++)
 		per_cpu(vector_irq, 0)[ISA_IRQ_VECTOR(i)] = irq_to_desc(i);
 
+	irq_ctx_init(smp_processor_id());
+
 	x86_init.irqs.intr_init();
 }
 
@@ -104,6 +106,4 @@ void __init native_init_IRQ(void)
 
 	if (!acpi_ioapic && !of_ioapic && nr_legacy_irqs())
 		setup_irq(2, &irq2);
-
-	irq_ctx_init(smp_processor_id());
 }
diff --git a/drivers/xen/events/events_base.c b/drivers/xen/events/events_base.c
index 117e76b2f939..084e45882c73 100644
--- a/drivers/xen/events/events_base.c
+++ b/drivers/xen/events/events_base.c
@@ -1687,7 +1687,6 @@ void __init xen_init_IRQ(void)
 
 #ifdef CONFIG_X86
 	if (xen_pv_domain()) {
-		irq_ctx_init(smp_processor_id());
 		if (xen_initial_domain())
 			pci_xen_initial_domain();
 	}

^ permalink raw reply related	[flat|nested] 89+ messages in thread

* [tip:x86/irq] x86/irq/32: Handle irq stack allocation failure proper
  2019-04-14 16:00 ` [patch V3 28/32] x86/irq/32: Handle irq stack allocation failure proper Thomas Gleixner
@ 2019-04-17 14:20   ` tip-bot for Thomas Gleixner
  0 siblings, 0 replies; 89+ messages in thread
From: tip-bot for Thomas Gleixner @ 2019-04-17 14:20 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: jgross, nstange, puwen, akpm, mingo, suravee.suthikulpanit,
	yazen.ghannam, anshuman.khandual, mingo, zhangshaokun,
	zhenzhong.duan, linux-kernel, sstabellini, wang.yi59, bp,
	konrad.wilk, tglx, boris.ostrovsky, alison.schofield, jpoimboe,
	x86, luto, hpa, sean.j.christopherson

Commit-ID:  66c7ceb47f628c8bd4f84a6d01c2725ded6a342d
Gitweb:     https://git.kernel.org/tip/66c7ceb47f628c8bd4f84a6d01c2725ded6a342d
Author:     Thomas Gleixner <tglx@linutronix.de>
AuthorDate: Sun, 14 Apr 2019 18:00:04 +0200
Committer:  Borislav Petkov <bp@suse.de>
CommitDate: Wed, 17 Apr 2019 15:31:42 +0200

x86/irq/32: Handle irq stack allocation failure proper

irq_ctx_init() crashes hard on page allocation failures. While that's ok
during early boot, it's just wrong in the CPU hotplug bringup code.

Check for page allocation failure, return -ENOMEM, and handle it at the
call sites. During early boot the only way out is to BUG(), but on CPU
hotplug there is no reason to crash, so just abort the operation.

Rename the function to something more sensible while at it.
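
The cleanup-on-partial-failure pattern can be modeled in plain user-space
C. This is a sketch under obvious assumptions: malloc()/free() stand in
for alloc_pages_node()/__free_pages(), and irq_init_stacks() is a made-up
name:

#include <errno.h>
#include <stdlib.h>

static void *hardirq_stack, *softirq_stack;

/*
 * Allocate both stacks or neither: if the second allocation fails,
 * release the first one before reporting -ENOMEM.
 */
static int irq_init_stacks(size_t size)
{
	void *ph, *ps;

	if (hardirq_stack)		/* already initialized */
		return 0;

	ph = malloc(size);
	if (!ph)
		return -ENOMEM;
	ps = malloc(size);
	if (!ps) {
		free(ph);
		return -ENOMEM;
	}
	hardirq_stack = ph;
	softirq_stack = ps;
	return 0;
}

int main(void)
{
	return irq_init_stacks(16384) ? 1 : 0;
}

The same shape appears in the hunk below: ph/ps hold the two page
allocations, with __free_pages() on the partial-failure path.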

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: Alison Schofield <alison.schofield@intel.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Anshuman Khandual <anshuman.khandual@arm.com>
Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Juergen Gross <jgross@suse.com>
Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Cc: Nicolai Stange <nstange@suse.de>
Cc: Pu Wen <puwen@hygon.cn>
Cc: Sean Christopherson <sean.j.christopherson@intel.com>
Cc: Shaokun Zhang <zhangshaokun@hisilicon.com>
Cc: Stefano Stabellini <sstabellini@kernel.org>
Cc: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com>
Cc: x86-ml <x86@kernel.org>
Cc: xen-devel@lists.xenproject.org
Cc: Yazen Ghannam <yazen.ghannam@amd.com>
Cc: Yi Wang <wang.yi59@zte.com.cn>
Cc: Zhenzhong Duan <zhenzhong.duan@oracle.com>
Link: https://lkml.kernel.org/r/20190414160146.089060584@linutronix.de
---
 arch/x86/include/asm/irq.h |  4 ++--
 arch/x86/include/asm/smp.h |  2 +-
 arch/x86/kernel/irq_32.c   | 32 ++++++++++++++++----------------
 arch/x86/kernel/irqinit.c  |  2 +-
 arch/x86/kernel/smpboot.c  | 15 ++++++++++++---
 arch/x86/xen/smp_pv.c      |  4 +++-
 6 files changed, 35 insertions(+), 24 deletions(-)

diff --git a/arch/x86/include/asm/irq.h b/arch/x86/include/asm/irq.h
index fbb16e6b6c18..d751e8440a6b 100644
--- a/arch/x86/include/asm/irq.h
+++ b/arch/x86/include/asm/irq.h
@@ -17,9 +17,9 @@ static inline int irq_canonicalize(int irq)
 }
 
 #ifdef CONFIG_X86_32
-extern void irq_ctx_init(int cpu);
+extern int irq_init_percpu_irqstack(unsigned int cpu);
 #else
-# define irq_ctx_init(cpu) do { } while (0)
+static inline int irq_init_percpu_irqstack(unsigned int cpu) { return 0; }
 #endif
 
 #define __ARCH_HAS_DO_SOFTIRQ
diff --git a/arch/x86/include/asm/smp.h b/arch/x86/include/asm/smp.h
index 2e95b6c1bca3..da545df207b2 100644
--- a/arch/x86/include/asm/smp.h
+++ b/arch/x86/include/asm/smp.h
@@ -131,7 +131,7 @@ void native_smp_prepare_boot_cpu(void);
 void native_smp_prepare_cpus(unsigned int max_cpus);
 void calculate_max_logical_packages(void);
 void native_smp_cpus_done(unsigned int max_cpus);
-void common_cpu_up(unsigned int cpunum, struct task_struct *tidle);
+int common_cpu_up(unsigned int cpunum, struct task_struct *tidle);
 int native_cpu_up(unsigned int cpunum, struct task_struct *tidle);
 int native_cpu_disable(void);
 int common_cpu_die(unsigned int cpu);
diff --git a/arch/x86/kernel/irq_32.c b/arch/x86/kernel/irq_32.c
index f37489c806fa..fc34816c6f04 100644
--- a/arch/x86/kernel/irq_32.c
+++ b/arch/x86/kernel/irq_32.c
@@ -107,28 +107,28 @@ static inline int execute_on_irq_stack(int overflow, struct irq_desc *desc)
 }
 
 /*
- * allocate per-cpu stacks for hardirq and for softirq processing
+ * Allocate per-cpu stacks for hardirq and softirq processing
  */
-void irq_ctx_init(int cpu)
+int irq_init_percpu_irqstack(unsigned int cpu)
 {
-	struct irq_stack *irqstk;
+	int node = cpu_to_node(cpu);
+	struct page *ph, *ps;
 
 	if (per_cpu(hardirq_stack_ptr, cpu))
-		return;
-
-	irqstk = page_address(alloc_pages_node(cpu_to_node(cpu),
-					       THREADINFO_GFP,
-					       THREAD_SIZE_ORDER));
-	per_cpu(hardirq_stack_ptr, cpu) = irqstk;
+		return 0;
 
-	irqstk = page_address(alloc_pages_node(cpu_to_node(cpu),
-					       THREADINFO_GFP,
-					       THREAD_SIZE_ORDER));
-	per_cpu(softirq_stack_ptr, cpu) = irqstk;
+	ph = alloc_pages_node(node, THREADINFO_GFP, THREAD_SIZE_ORDER);
+	if (!ph)
+		return -ENOMEM;
+	ps = alloc_pages_node(node, THREADINFO_GFP, THREAD_SIZE_ORDER);
+	if (!ps) {
+		__free_pages(ph, THREAD_SIZE_ORDER);
+		return -ENOMEM;
+	}
 
-	pr_debug("CPU %u irqstacks, hard=%p soft=%p\n",
-		 cpu, per_cpu(hardirq_stack_ptr, cpu),
-		 per_cpu(softirq_stack_ptr, cpu));
+	per_cpu(hardirq_stack_ptr, cpu) = page_address(ph);
+	per_cpu(softirq_stack_ptr, cpu) = page_address(ps);
+	return 0;
 }
 
 void do_softirq_own_stack(void)
diff --git a/arch/x86/kernel/irqinit.c b/arch/x86/kernel/irqinit.c
index 26b5cb5386b9..16919a9671fa 100644
--- a/arch/x86/kernel/irqinit.c
+++ b/arch/x86/kernel/irqinit.c
@@ -91,7 +91,7 @@ void __init init_IRQ(void)
 	for (i = 0; i < nr_legacy_irqs(); i++)
 		per_cpu(vector_irq, 0)[ISA_IRQ_VECTOR(i)] = irq_to_desc(i);
 
-	irq_ctx_init(smp_processor_id());
+	BUG_ON(irq_init_percpu_irqstack(smp_processor_id()));
 
 	x86_init.irqs.intr_init();
 }
diff --git a/arch/x86/kernel/smpboot.c b/arch/x86/kernel/smpboot.c
index ce1a67b70168..c92b21f9e9dc 100644
--- a/arch/x86/kernel/smpboot.c
+++ b/arch/x86/kernel/smpboot.c
@@ -935,20 +935,27 @@ out:
 	return boot_error;
 }
 
-void common_cpu_up(unsigned int cpu, struct task_struct *idle)
+int common_cpu_up(unsigned int cpu, struct task_struct *idle)
 {
+	int ret;
+
 	/* Just in case we booted with a single CPU. */
 	alternatives_enable_smp();
 
 	per_cpu(current_task, cpu) = idle;
 
+	/* Initialize the interrupt stack(s) */
+	ret = irq_init_percpu_irqstack(cpu);
+	if (ret)
+		return ret;
+
 #ifdef CONFIG_X86_32
 	/* Stack for startup_32 can be just as for start_secondary onwards */
-	irq_ctx_init(cpu);
 	per_cpu(cpu_current_top_of_stack, cpu) = task_top_of_stack(idle);
 #else
 	initial_gs = per_cpu_offset(cpu);
 #endif
+	return 0;
 }
 
 /*
@@ -1106,7 +1113,9 @@ int native_cpu_up(unsigned int cpu, struct task_struct *tidle)
 	/* the FPU context is blank, nobody can own it */
 	per_cpu(fpu_fpregs_owner_ctx, cpu) = NULL;
 
-	common_cpu_up(cpu, tidle);
+	err = common_cpu_up(cpu, tidle);
+	if (err)
+		return err;
 
 	err = do_boot_cpu(apicid, cpu, tidle, &cpu0_nmi_registered);
 	if (err) {
diff --git a/arch/x86/xen/smp_pv.c b/arch/x86/xen/smp_pv.c
index 145506f9fdbe..590fcf863006 100644
--- a/arch/x86/xen/smp_pv.c
+++ b/arch/x86/xen/smp_pv.c
@@ -361,7 +361,9 @@ static int xen_pv_cpu_up(unsigned int cpu, struct task_struct *idle)
 {
 	int rc;
 
-	common_cpu_up(cpu, idle);
+	rc = common_cpu_up(cpu, idle);
+	if (rc)
+		return rc;
 
 	xen_setup_runstate_info(cpu);
 

^ permalink raw reply related	[flat|nested] 89+ messages in thread

* [tip:x86/irq] x86/irq/64: Init hardirq_stack_ptr during CPU hotplug
  2019-04-14 16:00 ` [patch V3 29/32] x86/irq/64: Init hardirq_stack_ptr during CPU hotplug Thomas Gleixner
@ 2019-04-17 14:21   ` tip-bot for Thomas Gleixner
  0 siblings, 0 replies; 89+ messages in thread
From: tip-bot for Thomas Gleixner @ 2019-04-17 14:21 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: hpa, pasha.tatashin, luto, tglx, mingo, chang.seok.bae,
	sean.j.christopherson, konrad.wilk, nstange, bp, jpoimboe, mingo,
	x86, peterz, wang.yi59, linux-kernel, linux

Commit-ID:  0ac26104208450d35c4e68754ce0c67b3a4d7802
Gitweb:     https://git.kernel.org/tip/0ac26104208450d35c4e68754ce0c67b3a4d7802
Author:     Thomas Gleixner <tglx@linutronix.de>
AuthorDate: Sun, 14 Apr 2019 18:00:05 +0200
Committer:  Borislav Petkov <bp@suse.de>
CommitDate: Wed, 17 Apr 2019 15:34:21 +0200

x86/irq/64: Init hardirq_stack_ptr during CPU hotplug

Preparatory change for disentangling the irq stack union as a
prerequisite for irq stacks with guard pages.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: "Chang S. Bae" <chang.seok.bae@intel.com>
Cc: Dominik Brodowski <linux@dominikbrodowski.net>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Cc: Nicolai Stange <nstange@suse.de>
Cc: Pavel Tatashin <pasha.tatashin@oracle.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Sean Christopherson <sean.j.christopherson@intel.com>
Cc: x86-ml <x86@kernel.org>
Cc: Yi Wang <wang.yi59@zte.com.cn>
Link: https://lkml.kernel.org/r/20190414160146.177558566@linutronix.de
---
 arch/x86/include/asm/irq.h   |  4 ----
 arch/x86/kernel/cpu/common.c |  4 +---
 arch/x86/kernel/irq_64.c     | 15 +++++++++++++++
 3 files changed, 16 insertions(+), 7 deletions(-)

diff --git a/arch/x86/include/asm/irq.h b/arch/x86/include/asm/irq.h
index d751e8440a6b..8f95686ec27e 100644
--- a/arch/x86/include/asm/irq.h
+++ b/arch/x86/include/asm/irq.h
@@ -16,11 +16,7 @@ static inline int irq_canonicalize(int irq)
 	return ((irq == 2) ? 9 : irq);
 }
 
-#ifdef CONFIG_X86_32
 extern int irq_init_percpu_irqstack(unsigned int cpu);
-#else
-static inline int irq_init_percpu_irqstack(unsigned int cpu) { return 0; }
-#endif
 
 #define __ARCH_HAS_DO_SOFTIRQ
 
diff --git a/arch/x86/kernel/cpu/common.c b/arch/x86/kernel/cpu/common.c
index 13ec72bb8f36..1222080838da 100644
--- a/arch/x86/kernel/cpu/common.c
+++ b/arch/x86/kernel/cpu/common.c
@@ -1510,9 +1510,7 @@ DEFINE_PER_CPU(struct task_struct *, current_task) ____cacheline_aligned =
 	&init_task;
 EXPORT_PER_CPU_SYMBOL(current_task);
 
-DEFINE_PER_CPU(char *, hardirq_stack_ptr) =
-	init_per_cpu_var(irq_stack_union.irq_stack) + IRQ_STACK_SIZE;
-
+DEFINE_PER_CPU(char *, hardirq_stack_ptr);
 DEFINE_PER_CPU(unsigned int, irq_count) __visible = -1;
 
 DEFINE_PER_CPU(int, __preempt_count) = INIT_PREEMPT_COUNT;
diff --git a/arch/x86/kernel/irq_64.c b/arch/x86/kernel/irq_64.c
index f0c7356c8969..c0bea0d7d76a 100644
--- a/arch/x86/kernel/irq_64.c
+++ b/arch/x86/kernel/irq_64.c
@@ -87,3 +87,18 @@ bool handle_irq(struct irq_desc *desc, struct pt_regs *regs)
 	generic_handle_irq_desc(desc);
 	return true;
 }
+
+static int map_irq_stack(unsigned int cpu)
+{
+	void *va = per_cpu_ptr(irq_stack_union.irq_stack, cpu);
+
+	per_cpu(hardirq_stack_ptr, cpu) = va + IRQ_STACK_SIZE;
+	return 0;
+}
+
+int irq_init_percpu_irqstack(unsigned int cpu)
+{
+	if (per_cpu(hardirq_stack_ptr, cpu))
+		return 0;
+	return map_irq_stack(cpu);
+}

^ permalink raw reply related	[flat|nested] 89+ messages in thread

* [tip:x86/irq] x86/irq/64: Split the IRQ stack into its own pages
  2019-04-14 16:00 ` [patch V3 30/32] x86/irq/64: Split the IRQ stack into its own pages Thomas Gleixner
@ 2019-04-17 14:22   ` tip-bot for Andy Lutomirski
  0 siblings, 0 replies; 89+ messages in thread
From: tip-bot for Andy Lutomirski @ 2019-04-17 14:22 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: mhocko, brijesh.singh, jroedel, maran.wilson, bp, tglx,
	sstabellini, luto, ard.biesheuvel, mingo, feng.tang, hpa, x86,
	JBeulich, sean.j.christopherson, jgross, nstange, rppt,
	linux-kernel, jpoimboe, konrad.wilk, chang.seok.bae,
	boris.ostrovsky, puwen, jkosina, mingo, mail, peterz, vbabka,
	rafael, ndesaulniers, yamada.masahiro, adobriyan, akpm, linux

Commit-ID:  e6401c13093173aad709a5c6de00cf8d692ee786
Gitweb:     https://git.kernel.org/tip/e6401c13093173aad709a5c6de00cf8d692ee786
Author:     Andy Lutomirski <luto@kernel.org>
AuthorDate: Sun, 14 Apr 2019 18:00:06 +0200
Committer:  Borislav Petkov <bp@suse.de>
CommitDate: Wed, 17 Apr 2019 15:37:02 +0200

x86/irq/64: Split the IRQ stack into its own pages

Currently, the IRQ stack is hardcoded as the first page of the percpu
area, and the stack canary lives on the IRQ stack. The former gets in
the way of adding an IRQ stack guard page, and the latter is a potential
weakness in the stack canary mechanism.

Split the IRQ stack into its own private percpu pages.

[ tglx: Make 64 and 32 bit share struct irq_stack ]
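
The layout constraint behind this split can be pinned down at build time.
A hedged stand-alone sketch follows; the struct and the 40-byte offset
come straight from the patch, while the assert wrapper is illustrative:

#include <assert.h>
#include <stddef.h>

struct fixed_percpu_data {
	char		gs_base[40];	/* %gs points at offset 0 */
	unsigned long	stack_canary;	/* GCC reads the canary at %gs:40 */
};

static_assert(offsetof(struct fixed_percpu_data, stack_canary) == 40,
	      "stack canary must sit at %gs:40");

int main(void)
{
	return 0;
}

If the canary ever moved, every stack protector load of %gs:40 would
silently read garbage, which is why the BUILD_BUG_ON in stackprotector.h
pins the offset the same way.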

Signed-off-by: Andy Lutomirski <luto@kernel.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: Alexey Dobriyan <adobriyan@gmail.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Cc: Brijesh Singh <brijesh.singh@amd.com>
Cc: "Chang S. Bae" <chang.seok.bae@intel.com>
Cc: Dominik Brodowski <linux@dominikbrodowski.net>
Cc: Feng Tang <feng.tang@intel.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jan Beulich <JBeulich@suse.com>
Cc: Jiri Kosina <jkosina@suse.cz>
Cc: Joerg Roedel <jroedel@suse.de>
Cc: Jordan Borgner <mail@jordan-borgner.de>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Juergen Gross <jgross@suse.com>
Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Cc: Maran Wilson <maran.wilson@oracle.com>
Cc: Masahiro Yamada <yamada.masahiro@socionext.com>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Mike Rapoport <rppt@linux.vnet.ibm.com>
Cc: Nick Desaulniers <ndesaulniers@google.com>
Cc: Nicolai Stange <nstange@suse.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Pu Wen <puwen@hygon.cn>
Cc: "Rafael Ávila de Espíndola" <rafael@espindo.la>
Cc: Sean Christopherson <sean.j.christopherson@intel.com>
Cc: Stefano Stabellini <sstabellini@kernel.org>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: x86-ml <x86@kernel.org>
Cc: xen-devel@lists.xenproject.org
Link: https://lkml.kernel.org/r/20190414160146.267376656@linutronix.de
---
 arch/x86/entry/entry_64.S             |  4 ++--
 arch/x86/include/asm/processor.h      | 32 ++++++++++++++------------------
 arch/x86/include/asm/stackprotector.h |  6 +++---
 arch/x86/kernel/asm-offsets_64.c      |  2 +-
 arch/x86/kernel/cpu/common.c          |  8 ++++----
 arch/x86/kernel/head_64.S             |  2 +-
 arch/x86/kernel/irq_64.c              |  5 ++++-
 arch/x86/kernel/setup_percpu.c        |  5 -----
 arch/x86/kernel/vmlinux.lds.S         |  7 ++++---
 arch/x86/tools/relocs.c               |  2 +-
 arch/x86/xen/xen-head.S               | 10 +++++-----
 11 files changed, 39 insertions(+), 44 deletions(-)

diff --git a/arch/x86/entry/entry_64.S b/arch/x86/entry/entry_64.S
index 726abbe6c6d8..cfe4d6ea258d 100644
--- a/arch/x86/entry/entry_64.S
+++ b/arch/x86/entry/entry_64.S
@@ -298,7 +298,7 @@ ENTRY(__switch_to_asm)
 
 #ifdef CONFIG_STACKPROTECTOR
 	movq	TASK_stack_canary(%rsi), %rbx
-	movq	%rbx, PER_CPU_VAR(irq_stack_union)+stack_canary_offset
+	movq	%rbx, PER_CPU_VAR(fixed_percpu_data) + stack_canary_offset
 #endif
 
 #ifdef CONFIG_RETPOLINE
@@ -430,7 +430,7 @@ END(irq_entries_start)
 	 * it before we actually move ourselves to the IRQ stack.
 	 */
 
-	movq	\old_rsp, PER_CPU_VAR(irq_stack_union + IRQ_STACK_SIZE - 8)
+	movq	\old_rsp, PER_CPU_VAR(irq_stack_backing_store + IRQ_STACK_SIZE - 8)
 	movq	PER_CPU_VAR(hardirq_stack_ptr), %rsp
 
 #ifdef CONFIG_DEBUG_ENTRY
diff --git a/arch/x86/include/asm/processor.h b/arch/x86/include/asm/processor.h
index 5e3dd4e2136d..7e99ef67bff0 100644
--- a/arch/x86/include/asm/processor.h
+++ b/arch/x86/include/asm/processor.h
@@ -367,6 +367,13 @@ DECLARE_PER_CPU_PAGE_ALIGNED(struct tss_struct, cpu_tss_rw);
 #define __KERNEL_TSS_LIMIT	\
 	(IO_BITMAP_OFFSET + IO_BITMAP_BYTES + sizeof(unsigned long) - 1)
 
+/* Per CPU interrupt stacks */
+struct irq_stack {
+	char		stack[IRQ_STACK_SIZE];
+} __aligned(IRQ_STACK_SIZE);
+
+DECLARE_PER_CPU(struct irq_stack *, hardirq_stack_ptr);
+
 #ifdef CONFIG_X86_32
 DECLARE_PER_CPU(unsigned long, cpu_current_top_of_stack);
 #else
@@ -375,28 +382,24 @@ DECLARE_PER_CPU(unsigned long, cpu_current_top_of_stack);
 #endif
 
 #ifdef CONFIG_X86_64
-union irq_stack_union {
-	char irq_stack[IRQ_STACK_SIZE];
+struct fixed_percpu_data {
 	/*
 	 * GCC hardcodes the stack canary as %gs:40.  Since the
 	 * irq_stack is the object at %gs:0, we reserve the bottom
 	 * 48 bytes of the irq stack for the canary.
 	 */
-	struct {
-		char gs_base[40];
-		unsigned long stack_canary;
-	};
+	char		gs_base[40];
+	unsigned long	stack_canary;
 };
 
-DECLARE_PER_CPU_FIRST(union irq_stack_union, irq_stack_union) __visible;
-DECLARE_INIT_PER_CPU(irq_stack_union);
+DECLARE_PER_CPU_FIRST(struct fixed_percpu_data, fixed_percpu_data) __visible;
+DECLARE_INIT_PER_CPU(fixed_percpu_data);
 
 static inline unsigned long cpu_kernelmode_gs_base(int cpu)
 {
-	return (unsigned long)per_cpu(irq_stack_union.gs_base, cpu);
+	return (unsigned long)per_cpu(fixed_percpu_data.gs_base, cpu);
 }
 
-DECLARE_PER_CPU(char *, hardirq_stack_ptr);
 DECLARE_PER_CPU(unsigned int, irq_count);
 extern asmlinkage void ignore_sysret(void);
 
@@ -418,14 +421,7 @@ struct stack_canary {
 };
 DECLARE_PER_CPU_ALIGNED(struct stack_canary, stack_canary);
 #endif
-/*
- * per-CPU IRQ handling stacks
- */
-struct irq_stack {
-	char			stack[IRQ_STACK_SIZE];
-} __aligned(IRQ_STACK_SIZE);
-
-DECLARE_PER_CPU(struct irq_stack *, hardirq_stack_ptr);
+/* Per CPU softirq stack pointer */
 DECLARE_PER_CPU(struct irq_stack *, softirq_stack_ptr);
 #endif	/* X86_64 */
 
diff --git a/arch/x86/include/asm/stackprotector.h b/arch/x86/include/asm/stackprotector.h
index 8ec97a62c245..91e29b6a86a5 100644
--- a/arch/x86/include/asm/stackprotector.h
+++ b/arch/x86/include/asm/stackprotector.h
@@ -13,7 +13,7 @@
  * On x86_64, %gs is shared by percpu area and stack canary.  All
  * percpu symbols are zero based and %gs points to the base of percpu
  * area.  The first occupant of the percpu area is always
- * irq_stack_union which contains stack_canary at offset 40.  Userland
+ * fixed_percpu_data which contains stack_canary at offset 40.  Userland
  * %gs is always saved and restored on kernel entry and exit using
  * swapgs, so stack protector doesn't add any complexity there.
  *
@@ -64,7 +64,7 @@ static __always_inline void boot_init_stack_canary(void)
 	u64 tsc;
 
 #ifdef CONFIG_X86_64
-	BUILD_BUG_ON(offsetof(union irq_stack_union, stack_canary) != 40);
+	BUILD_BUG_ON(offsetof(struct fixed_percpu_data, stack_canary) != 40);
 #endif
 	/*
 	 * We both use the random pool and the current TSC as a source
@@ -79,7 +79,7 @@ static __always_inline void boot_init_stack_canary(void)
 
 	current->stack_canary = canary;
 #ifdef CONFIG_X86_64
-	this_cpu_write(irq_stack_union.stack_canary, canary);
+	this_cpu_write(fixed_percpu_data.stack_canary, canary);
 #else
 	this_cpu_write(stack_canary.canary, canary);
 #endif
diff --git a/arch/x86/kernel/asm-offsets_64.c b/arch/x86/kernel/asm-offsets_64.c
index f5281567e28e..d3d075226c0a 100644
--- a/arch/x86/kernel/asm-offsets_64.c
+++ b/arch/x86/kernel/asm-offsets_64.c
@@ -73,7 +73,7 @@ int main(void)
 	BLANK();
 
 #ifdef CONFIG_STACKPROTECTOR
-	DEFINE(stack_canary_offset, offsetof(union irq_stack_union, stack_canary));
+	DEFINE(stack_canary_offset, offsetof(struct fixed_percpu_data, stack_canary));
 	BLANK();
 #endif
 
diff --git a/arch/x86/kernel/cpu/common.c b/arch/x86/kernel/cpu/common.c
index 1222080838da..801c6f040faa 100644
--- a/arch/x86/kernel/cpu/common.c
+++ b/arch/x86/kernel/cpu/common.c
@@ -1498,9 +1498,9 @@ static __init int setup_clearcpuid(char *arg)
 __setup("clearcpuid=", setup_clearcpuid);
 
 #ifdef CONFIG_X86_64
-DEFINE_PER_CPU_FIRST(union irq_stack_union,
-		     irq_stack_union) __aligned(PAGE_SIZE) __visible;
-EXPORT_PER_CPU_SYMBOL_GPL(irq_stack_union);
+DEFINE_PER_CPU_FIRST(struct fixed_percpu_data,
+		     fixed_percpu_data) __aligned(PAGE_SIZE) __visible;
+EXPORT_PER_CPU_SYMBOL_GPL(fixed_percpu_data);
 
 /*
  * The following percpu variables are hot.  Align current_task to
@@ -1510,7 +1510,7 @@ DEFINE_PER_CPU(struct task_struct *, current_task) ____cacheline_aligned =
 	&init_task;
 EXPORT_PER_CPU_SYMBOL(current_task);
 
-DEFINE_PER_CPU(char *, hardirq_stack_ptr);
+DEFINE_PER_CPU(struct irq_stack *, hardirq_stack_ptr);
 DEFINE_PER_CPU(unsigned int, irq_count) __visible = -1;
 
 DEFINE_PER_CPU(int, __preempt_count) = INIT_PREEMPT_COUNT;
diff --git a/arch/x86/kernel/head_64.S b/arch/x86/kernel/head_64.S
index d1dbe8e4eb82..bcd206c8ac90 100644
--- a/arch/x86/kernel/head_64.S
+++ b/arch/x86/kernel/head_64.S
@@ -265,7 +265,7 @@ ENDPROC(start_cpu0)
 	GLOBAL(initial_code)
 	.quad	x86_64_start_kernel
 	GLOBAL(initial_gs)
-	.quad	INIT_PER_CPU_VAR(irq_stack_union)
+	.quad	INIT_PER_CPU_VAR(fixed_percpu_data)
 	GLOBAL(initial_stack)
 	/*
 	 * The SIZEOF_PTREGS gap is a convention which helps the in-kernel
diff --git a/arch/x86/kernel/irq_64.c b/arch/x86/kernel/irq_64.c
index c0bea0d7d76a..c0f89d136b80 100644
--- a/arch/x86/kernel/irq_64.c
+++ b/arch/x86/kernel/irq_64.c
@@ -23,6 +23,9 @@
 #include <asm/io_apic.h>
 #include <asm/apic.h>
 
+DEFINE_PER_CPU_PAGE_ALIGNED(struct irq_stack, irq_stack_backing_store) __visible;
+DECLARE_INIT_PER_CPU(irq_stack_backing_store);
+
 int sysctl_panic_on_stackoverflow;
 
 /*
@@ -90,7 +93,7 @@ bool handle_irq(struct irq_desc *desc, struct pt_regs *regs)
 
 static int map_irq_stack(unsigned int cpu)
 {
-	void *va = per_cpu_ptr(irq_stack_union.irq_stack, cpu);
+	void *va = per_cpu_ptr(&irq_stack_backing_store, cpu);
 
 	per_cpu(hardirq_stack_ptr, cpu) = va + IRQ_STACK_SIZE;
 	return 0;
diff --git a/arch/x86/kernel/setup_percpu.c b/arch/x86/kernel/setup_percpu.c
index 657343ecc2da..86663874ef04 100644
--- a/arch/x86/kernel/setup_percpu.c
+++ b/arch/x86/kernel/setup_percpu.c
@@ -244,11 +244,6 @@ void __init setup_per_cpu_areas(void)
 		per_cpu(x86_cpu_to_logical_apicid, cpu) =
 			early_per_cpu_map(x86_cpu_to_logical_apicid, cpu);
 #endif
-#ifdef CONFIG_X86_64
-		per_cpu(hardirq_stack_ptr, cpu) =
-			per_cpu(irq_stack_union.irq_stack, cpu) +
-			IRQ_STACK_SIZE;
-#endif
 #ifdef CONFIG_NUMA
 		per_cpu(x86_cpu_to_node_map, cpu) =
 			early_per_cpu_map(x86_cpu_to_node_map, cpu);
diff --git a/arch/x86/kernel/vmlinux.lds.S b/arch/x86/kernel/vmlinux.lds.S
index bad8c51fee6e..a5af9a7c4be4 100644
--- a/arch/x86/kernel/vmlinux.lds.S
+++ b/arch/x86/kernel/vmlinux.lds.S
@@ -403,7 +403,8 @@ SECTIONS
  */
 #define INIT_PER_CPU(x) init_per_cpu__##x = ABSOLUTE(x) + __per_cpu_load
 INIT_PER_CPU(gdt_page);
-INIT_PER_CPU(irq_stack_union);
+INIT_PER_CPU(fixed_percpu_data);
+INIT_PER_CPU(irq_stack_backing_store);
 
 /*
  * Build-time check on the image size:
@@ -412,8 +413,8 @@ INIT_PER_CPU(irq_stack_union);
 	   "kernel image bigger than KERNEL_IMAGE_SIZE");
 
 #ifdef CONFIG_SMP
-. = ASSERT((irq_stack_union == 0),
-           "irq_stack_union is not at start of per-cpu area");
+. = ASSERT((fixed_percpu_data == 0),
+           "fixed_percpu_data is not at start of per-cpu area");
 #endif
 
 #endif /* CONFIG_X86_32 */
diff --git a/arch/x86/tools/relocs.c b/arch/x86/tools/relocs.c
index b629f6992d9f..efa483205e43 100644
--- a/arch/x86/tools/relocs.c
+++ b/arch/x86/tools/relocs.c
@@ -738,7 +738,7 @@ static void percpu_init(void)
  *	__per_cpu_load
  *
  * The "gold" linker incorrectly associates:
- *	init_per_cpu__irq_stack_union
+ *	init_per_cpu__fixed_percpu_data
  *	init_per_cpu__gdt_page
  */
 static int is_percpu_sym(ElfW(Sym) *sym, const char *symname)
diff --git a/arch/x86/xen/xen-head.S b/arch/x86/xen/xen-head.S
index 5077ead5e59c..c1d8b90aa4e2 100644
--- a/arch/x86/xen/xen-head.S
+++ b/arch/x86/xen/xen-head.S
@@ -40,13 +40,13 @@ ENTRY(startup_xen)
 #ifdef CONFIG_X86_64
 	/* Set up %gs.
 	 *
-	 * The base of %gs always points to the bottom of the irqstack
-	 * union.  If the stack protector canary is enabled, it is
-	 * located at %gs:40.  Note that, on SMP, the boot cpu uses
-	 * init data section till per cpu areas are set up.
+	 * The base of %gs always points to fixed_percpu_data.  If the
+	 * stack protector canary is enabled, it is located at %gs:40.
+	 * Note that, on SMP, the boot cpu uses init data section until
+	 * the per cpu areas are set up.
 	 */
 	movl	$MSR_GS_BASE,%ecx
-	movq	$INIT_PER_CPU_VAR(irq_stack_union),%rax
+	movq	$INIT_PER_CPU_VAR(fixed_percpu_data),%rax
 	cdq
 	wrmsr
 #endif

^ permalink raw reply related	[flat|nested] 89+ messages in thread

* [tip:x86/irq] x86/irq/64: Remap the IRQ stack with guard pages
  2019-04-14 16:00 ` [patch V3 31/32] x86/irq/64: Remap the IRQ stack with guard pages Thomas Gleixner
@ 2019-04-17 14:22   ` tip-bot for Andy Lutomirski
  0 siblings, 0 replies; 89+ messages in thread
From: tip-bot for Andy Lutomirski @ 2019-04-17 14:22 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: bp, hpa, mingo, luto, nstange, mingo, x86, sean.j.christopherson,
	tglx, linux-kernel, jpoimboe

Commit-ID:  18b7a6bef62de1d598fbff23b52114b7775ecf00
Gitweb:     https://git.kernel.org/tip/18b7a6bef62de1d598fbff23b52114b7775ecf00
Author:     Andy Lutomirski <luto@kernel.org>
AuthorDate: Sun, 14 Apr 2019 18:00:07 +0200
Committer:  Borislav Petkov <bp@suse.de>
CommitDate: Wed, 17 Apr 2019 15:40:57 +0200

x86/irq/64: Remap the IRQ stack with guard pages

The IRQ stack lives in percpu space, so an IRQ handler that overflows it
will overwrite other data structures.

Use vmap() to remap the IRQ stack so that it will have the usual guard
pages that vmap()/vmalloc() allocations have. With this, the kernel will
panic immediately on an IRQ stack overflow.

[ tglx: Move the map code to a proper place and invoke it only when a CPU
  	is about to be brought online. No point in installing the map at
  	early boot for all possible CPUs. Fail the CPU bringup if the vmap()
  	fails as done for all other preparatory stages in CPU hotplug. ]
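
A user-space analogue of the guard page idea, as a hedged sketch only:
mmap()/mprotect() stand in for vmap(), and nothing here is the kernel
implementation:

#include <stdio.h>
#include <sys/mman.h>
#include <unistd.h>

int main(void)
{
	long page = sysconf(_SC_PAGESIZE);
	size_t size = 4 * page;

	/* Map the stack area plus one extra page below it. */
	char *area = mmap(NULL, size + page, PROT_READ | PROT_WRITE,
			  MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
	if (area == MAP_FAILED) {
		perror("mmap");
		return 1;
	}

	/* Revoke all access to the lowest page: an overflow into it now
	 * faults immediately instead of corrupting a neighbour. */
	if (mprotect(area, page, PROT_NONE)) {
		perror("mprotect");
		return 1;
	}

	printf("usable stack: %p..%p, guard page at %p\n",
	       (void *)(area + page), (void *)(area + page + size),
	       (void *)area);
	return 0;
}

Touching area[0] after the mprotect() call raises SIGSEGV on the spot,
which is exactly the deterministic failure mode the vmap()'d IRQ stack
gains.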

Signed-off-by: Andy Lutomirski <luto@kernel.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Nicolai Stange <nstange@suse.de>
Cc: Sean Christopherson <sean.j.christopherson@intel.com>
Cc: x86-ml <x86@kernel.org>
Link: https://lkml.kernel.org/r/20190414160146.363733568@linutronix.de
---
 arch/x86/kernel/irq_64.c | 30 ++++++++++++++++++++++++++++++
 1 file changed, 30 insertions(+)

diff --git a/arch/x86/kernel/irq_64.c b/arch/x86/kernel/irq_64.c
index c0f89d136b80..f107eb2021f6 100644
--- a/arch/x86/kernel/irq_64.c
+++ b/arch/x86/kernel/irq_64.c
@@ -91,6 +91,35 @@ bool handle_irq(struct irq_desc *desc, struct pt_regs *regs)
 	return true;
 }
 
+#ifdef CONFIG_VMAP_STACK
+/*
+ * VMAP the backing store with guard pages
+ */
+static int map_irq_stack(unsigned int cpu)
+{
+	char *stack = (char *)per_cpu_ptr(&irq_stack_backing_store, cpu);
+	struct page *pages[IRQ_STACK_SIZE / PAGE_SIZE];
+	void *va;
+	int i;
+
+	for (i = 0; i < IRQ_STACK_SIZE / PAGE_SIZE; i++) {
+		phys_addr_t pa = per_cpu_ptr_to_phys(stack + (i << PAGE_SHIFT));
+
+		pages[i] = pfn_to_page(pa >> PAGE_SHIFT);
+	}
+
+	va = vmap(pages, IRQ_STACK_SIZE / PAGE_SIZE, GFP_KERNEL, PAGE_KERNEL);
+	if (!va)
+		return -ENOMEM;
+
+	per_cpu(hardirq_stack_ptr, cpu) = va + IRQ_STACK_SIZE;
+	return 0;
+}
+#else
+/*
+ * If VMAP stacks are disabled due to KASAN, just use the per cpu
+ * backing store without guard pages.
+ */
 static int map_irq_stack(unsigned int cpu)
 {
 	void *va = per_cpu_ptr(&irq_stack_backing_store, cpu);
@@ -98,6 +127,7 @@ static int map_irq_stack(unsigned int cpu)
 	per_cpu(hardirq_stack_ptr, cpu) = va + IRQ_STACK_SIZE;
 	return 0;
 }
+#endif
 
 int irq_init_percpu_irqstack(unsigned int cpu)
 {

^ permalink raw reply related	[flat|nested] 89+ messages in thread

* [tip:x86/irq] x86/irq/64: Remove stack overflow debug code
  2019-04-14 16:00 ` [patch V3 32/32] x86/irq/64: Remove stack overflow debug code Thomas Gleixner
@ 2019-04-17 14:23   ` tip-bot for Thomas Gleixner
  0 siblings, 0 replies; 89+ messages in thread
From: tip-bot for Thomas Gleixner @ 2019-04-17 14:23 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: linux-kernel, tglx, luto, bp, mingo, sean.j.christopherson,
	jpoimboe, mingo, x86, hpa, nstange

Commit-ID:  117ed45485413b1977bfc638c32bf5b01d53c62b
Gitweb:     https://git.kernel.org/tip/117ed45485413b1977bfc638c32bf5b01d53c62b
Author:     Thomas Gleixner <tglx@linutronix.de>
AuthorDate: Sun, 14 Apr 2019 18:00:08 +0200
Committer:  Borislav Petkov <bp@suse.de>
CommitDate: Wed, 17 Apr 2019 15:41:48 +0200

x86/irq/64: Remove stack overflow debug code

All stack types on x86 64-bit have guard pages now.

So there is no point in executing probabilistic overflow checks, as the
guard pages provide accurate and reliable overflow prevention.
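
For contrast, a rough model of why the removed check was probabilistic
(illustrative constants and names, not the kernel code): the stack
pointer is only sampled on interrupt entry against a fixed margin, so an
overflow that grows and unwinds between two samples is never seen, while
a guard page faults on the first stray access:

#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

#define STACK_MARGIN	128

/* Mirrors the shape of the removed check: "safe" means the sampled SP
 * lies above the margin at the bottom of the stack. */
static bool sp_looks_safe(uint64_t sp, uint64_t bottom, uint64_t top)
{
	return sp >= bottom + STACK_MARGIN && sp <= top;
}

int main(void)
{
	uint64_t bottom = 0x10000, top = 0x14000;

	/* SP caught inside the margin window: flagged (prints 0). */
	printf("%d\n", sp_looks_safe(bottom + 64, bottom, top));
	/* SP back in the safe range: an overflow that happened between
	 * two samples is invisible (prints 1). */
	printf("%d\n", sp_looks_safe(bottom + 4096, bottom, top));
	return 0;
}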

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Nicolai Stange <nstange@suse.de>
Cc: Sean Christopherson <sean.j.christopherson@intel.com>
Cc: x86-ml <x86@kernel.org>
Link: https://lkml.kernel.org/r/20190414160146.466354762@linutronix.de
---
 arch/x86/Kconfig         |  2 +-
 arch/x86/kernel/irq_64.c | 56 ------------------------------------------------
 2 files changed, 1 insertion(+), 57 deletions(-)

diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
index 5ad92419be19..fd06614b09a7 100644
--- a/arch/x86/Kconfig
+++ b/arch/x86/Kconfig
@@ -14,6 +14,7 @@ config X86_32
 	select ARCH_WANT_IPC_PARSE_VERSION
 	select CLKSRC_I8253
 	select CLONE_BACKWARDS
+	select HAVE_DEBUG_STACKOVERFLOW
 	select MODULES_USE_ELF_REL
 	select OLD_SIGACTION
 
@@ -138,7 +139,6 @@ config X86
 	select HAVE_COPY_THREAD_TLS
 	select HAVE_C_RECORDMCOUNT
 	select HAVE_DEBUG_KMEMLEAK
-	select HAVE_DEBUG_STACKOVERFLOW
 	select HAVE_DMA_CONTIGUOUS
 	select HAVE_DYNAMIC_FTRACE
 	select HAVE_DYNAMIC_FTRACE_WITH_REGS
diff --git a/arch/x86/kernel/irq_64.c b/arch/x86/kernel/irq_64.c
index f107eb2021f6..6bf6517a05bb 100644
--- a/arch/x86/kernel/irq_64.c
+++ b/arch/x86/kernel/irq_64.c
@@ -26,64 +26,8 @@
 DEFINE_PER_CPU_PAGE_ALIGNED(struct irq_stack, irq_stack_backing_store) __visible;
 DECLARE_INIT_PER_CPU(irq_stack_backing_store);
 
-int sysctl_panic_on_stackoverflow;
-
-/*
- * Probabilistic stack overflow check:
- *
- * Regular device interrupts can enter on the following stacks:
- *
- * - User stack
- *
- * - Kernel task stack
- *
- * - Interrupt stack if a device driver reenables interrupts
- *   which should only happen in really old drivers.
- *
- * - Debug IST stack
- *
- * All other contexts are invalid.
- */
-static inline void stack_overflow_check(struct pt_regs *regs)
-{
-#ifdef CONFIG_DEBUG_STACKOVERFLOW
-#define STACK_MARGIN	128
-	u64 irq_stack_top, irq_stack_bottom, estack_top, estack_bottom;
-	u64 curbase = (u64)task_stack_page(current);
-	struct cea_exception_stacks *estacks;
-
-	if (user_mode(regs))
-		return;
-
-	if (regs->sp >= curbase + sizeof(struct pt_regs) + STACK_MARGIN &&
-	    regs->sp <= curbase + THREAD_SIZE)
-		return;
-
-	irq_stack_top = (u64)__this_cpu_read(hardirq_stack_ptr);
-	irq_stack_bottom = irq_stack_top - IRQ_STACK_SIZE + STACK_MARGIN;
-	if (regs->sp >= irq_stack_bottom && regs->sp <= irq_stack_top)
-		return;
-
-	estacks = __this_cpu_read(cea_exception_stacks);
-	estack_top = CEA_ESTACK_TOP(estacks, DB);
-	estack_bottom = CEA_ESTACK_BOT(estacks, DB) + STACK_MARGIN;
-	if (regs->sp >= estack_bottom && regs->sp <= estack_top)
-		return;
-
-	WARN_ONCE(1, "do_IRQ(): %s has overflown the kernel stack (cur:%Lx,sp:%lx, irq stack:%Lx-%Lx, exception stack: %Lx-%Lx, ip:%pF)\n",
-		current->comm, curbase, regs->sp,
-		irq_stack_bottom, irq_stack_top,
-		estack_bottom, estack_top, (void *)regs->ip);
-
-	if (sysctl_panic_on_stackoverflow)
-		panic("low stack detected by irq handler - check messages\n");
-#endif
-}
-
 bool handle_irq(struct irq_desc *desc, struct pt_regs *regs)
 {
-	stack_overflow_check(regs);
-
 	if (IS_ERR_OR_NULL(desc))
 		return false;
 

^ permalink raw reply related	[flat|nested] 89+ messages in thread

end of thread, other threads:[~2019-04-17 14:23 UTC | newest]

Thread overview: 89+ messages
2019-04-14 15:59 [patch V3 00/32] x86: Add guard pages to exception and interrupt stacks Thomas Gleixner
2019-04-14 15:59 ` [patch V3 01/32] mm/slab: Fix broken stack trace storage Thomas Gleixner
2019-04-14 16:16   ` Andy Lutomirski
2019-04-14 16:16     ` Andy Lutomirski
2019-04-14 16:34     ` Thomas Gleixner
2019-04-14 16:34       ` Thomas Gleixner
2019-04-15  9:02       ` [patch V4 " Thomas Gleixner
2019-04-15  9:02         ` Thomas Gleixner
2019-04-15 13:23         ` Josh Poimboeuf
2019-04-15 16:07           ` Thomas Gleixner
2019-04-15 16:07             ` Thomas Gleixner
2019-04-15 16:16             ` Josh Poimboeuf
2019-04-15 17:05               ` Andy Lutomirski
2019-04-15 17:05                 ` Andy Lutomirski
2019-04-15 21:22                 ` Thomas Gleixner
2019-04-15 21:22                   ` Thomas Gleixner
2019-04-16 11:37                   ` Vlastimil Babka
2019-04-16 14:10                     ` [patch V5 01/32] mm/slab: Remove " Thomas Gleixner
2019-04-16 14:10                       ` Thomas Gleixner
2019-04-16 15:16                       ` Vlastimil Babka
2019-04-15 21:20               ` [patch V4 01/32] mm/slab: Fix " Thomas Gleixner
2019-04-15 21:20                 ` Thomas Gleixner
2019-04-15 16:21             ` Peter Zijlstra
2019-04-15 16:58       ` [patch V3 " Andy Lutomirski
2019-04-15 16:58         ` Andy Lutomirski
2019-04-14 15:59 ` [patch V3 02/32] x86/irq/64: Limit IST stack overflow check to #DB stack Thomas Gleixner
2019-04-17 14:02   ` [tip:x86/irq] " tip-bot for Thomas Gleixner
2019-04-14 15:59 ` [patch V3 03/32] x86/dumpstack: Fix off-by-one errors in stack identification Thomas Gleixner
2019-04-17 14:03   ` [tip:x86/irq] " tip-bot for Andy Lutomirski
2019-04-14 15:59 ` [patch V3 04/32] x86/irq/64: Remove a hardcoded irq_stack_union access Thomas Gleixner
2019-04-17 14:03   ` [tip:x86/irq] " tip-bot for Andy Lutomirski
2019-04-14 15:59 ` [patch V3 05/32] x86/irq/64: Sanitize the top/bottom confusion Thomas Gleixner
2019-04-17 14:04   ` [tip:x86/irq] " tip-bot for Thomas Gleixner
2019-04-14 15:59 ` [patch V3 06/32] x86/idt: Remove unused macro SISTG Thomas Gleixner
2019-04-17 14:05   ` [tip:x86/irq] " tip-bot for Thomas Gleixner
2019-04-14 15:59 ` [patch V3 07/32] x86/64: Remove stale CURRENT_MASK Thomas Gleixner
2019-04-17 14:06   ` [tip:x86/irq] " tip-bot for Thomas Gleixner
2019-04-14 15:59 ` [patch V3 08/32] x86/exceptions: Remove unused stack defines on 32bit Thomas Gleixner
2019-04-17 14:06   ` [tip:x86/irq] " tip-bot for Thomas Gleixner
2019-04-14 15:59 ` [patch V3 09/32] x86/exceptions: Make IST index zero based Thomas Gleixner
2019-04-17 14:07   ` [tip:x86/irq] " tip-bot for Thomas Gleixner
2019-04-14 15:59 ` [patch V3 10/32] x86/cpu_entry_area: Cleanup setup functions Thomas Gleixner
2019-04-17 14:08   ` [tip:x86/irq] " tip-bot for Thomas Gleixner
2019-04-14 15:59 ` [patch V3 11/32] x86/exceptions: Add structs for exception stacks Thomas Gleixner
2019-04-16 18:20   ` Sean Christopherson
2019-04-17 14:08   ` [tip:x86/irq] " tip-bot for Thomas Gleixner
2019-04-14 15:59 ` [patch V3 12/32] x86/cpu_entry_area: Prepare for IST guard pages Thomas Gleixner
2019-04-17 14:09   ` [tip:x86/irq] " tip-bot for Thomas Gleixner
2019-04-14 15:59 ` [patch V3 13/32] x86/cpu_entry_area: Provide exception stack accessor Thomas Gleixner
2019-04-17 14:10   ` [tip:x86/irq] " tip-bot for Thomas Gleixner
2019-04-14 15:59 ` [patch V3 14/32] x86/traps: Use cpu_entry_area instead of orig_ist Thomas Gleixner
2019-04-17 14:10   ` [tip:x86/irq] " tip-bot for Thomas Gleixner
2019-04-14 15:59 ` [patch V3 15/32] x86/irq/64: Use cpu entry area " Thomas Gleixner
2019-04-17 14:11   ` [tip:x86/irq] " tip-bot for Thomas Gleixner
2019-04-14 15:59 ` [patch V3 16/32] x86/dumpstack/64: Use cpu_entry_area " Thomas Gleixner
2019-04-17 14:12   ` [tip:x86/irq] " tip-bot for Thomas Gleixner
2019-04-14 15:59 ` [patch V3 17/32] x86/cpu: Prepare TSS.IST setup for guard pages Thomas Gleixner
2019-04-17 14:13   ` [tip:x86/irq] " tip-bot for Thomas Gleixner
2019-04-14 15:59 ` [patch V3 18/32] x86/cpu: Remove orig_ist array Thomas Gleixner
2019-04-17 14:13   ` [tip:x86/irq] " tip-bot for Thomas Gleixner
2019-04-14 15:59 ` [patch V3 19/32] x86/exceptions: Disconnect IST index and stack order Thomas Gleixner
2019-04-17 14:14   ` [tip:x86/irq] " tip-bot for Thomas Gleixner
2019-04-14 15:59 ` [patch V3 20/32] x86/exceptions: Enable IST guard pages Thomas Gleixner
2019-04-17 14:15   ` [tip:x86/irq] " tip-bot for Thomas Gleixner
2019-04-14 15:59 ` [patch V3 21/32] x86/exceptions: Split debug IST stack Thomas Gleixner
2019-04-16 22:07   ` Sean Christopherson
2019-04-17 14:15   ` [tip:x86/irq] " tip-bot for Thomas Gleixner
2019-04-14 15:59 ` [patch V3 22/32] x86/dumpstack/64: Speedup in_exception_stack() Thomas Gleixner
2019-04-17 14:16   ` [tip:x86/irq] " tip-bot for Thomas Gleixner
2019-04-14 15:59 ` [patch V3 23/32] x86/irq/32: Define IRQ_STACK_SIZE Thomas Gleixner
2019-04-17 14:17   ` [tip:x86/irq] " tip-bot for Thomas Gleixner
2019-04-14 16:00 ` [patch V3 24/32] x86/irq/32: Make irq stack a character array Thomas Gleixner
2019-04-17 14:18   ` [tip:x86/irq] " tip-bot for Thomas Gleixner
2019-04-14 16:00 ` [patch V3 25/32] x86/irq/32: Rename hard/softirq_stack to hard/softirq_stack_ptr Thomas Gleixner
2019-04-17 14:18   ` [tip:x86/irq] " tip-bot for Thomas Gleixner
2019-04-14 16:00 ` [patch V3 26/32] x86/irq/64: Rename irq_stack_ptr to hardirq_stack_ptr Thomas Gleixner
2019-04-17 14:19   ` [tip:x86/irq] " tip-bot for Thomas Gleixner
2019-04-14 16:00 ` [patch V3 27/32] x86/irq/32: Invoke irq_ctx_init() from init_IRQ() Thomas Gleixner
2019-04-17 14:20   ` [tip:x86/irq] " tip-bot for Thomas Gleixner
2019-04-14 16:00 ` [patch V3 28/32] x86/irq/32: Handle irq stack allocation failure proper Thomas Gleixner
2019-04-17 14:20   ` [tip:x86/irq] " tip-bot for Thomas Gleixner
2019-04-14 16:00 ` [patch V3 29/32] x86/irq/64: Init hardirq_stack_ptr during CPU hotplug Thomas Gleixner
2019-04-17 14:21   ` [tip:x86/irq] " tip-bot for Thomas Gleixner
2019-04-14 16:00 ` [patch V3 30/32] x86/irq/64: Split the IRQ stack into its own pages Thomas Gleixner
2019-04-17 14:22   ` [tip:x86/irq] " tip-bot for Andy Lutomirski
2019-04-14 16:00 ` [patch V3 31/32] x86/irq/64: Remap the IRQ stack with guard pages Thomas Gleixner
2019-04-17 14:22   ` [tip:x86/irq] " tip-bot for Andy Lutomirski
2019-04-14 16:00 ` [patch V3 32/32] x86/irq/64: Remove stack overflow debug code Thomas Gleixner
2019-04-17 14:23   ` [tip:x86/irq] " tip-bot for Thomas Gleixner
