All of lore.kernel.org
 help / color / mirror / Atom feed
* [patch 00/14] x86/exceptions: Add guard patches to IST stacks
@ 2019-03-31 21:40 Thomas Gleixner
  2019-03-31 21:40 ` [patch 01/14] x86/irq/64: Limit IST stack overflow check to #DB stack Thomas Gleixner
                   ` (14 more replies)
  0 siblings, 15 replies; 43+ messages in thread
From: Thomas Gleixner @ 2019-03-31 21:40 UTC (permalink / raw)
  To: LKML; +Cc: x86, Andy Lutomirski, Josh Poimboeuf

Hi!

While looking for something different I stumbled over the comment in struct
cpu_entry_area:

         * Exception stacks used for IST entries.
         *
         * In the future, this should have a separate slot for each stack
         * with guard pages between them.

As usual with such comments they are added in good faith and then
forgotten. Looking what it takes to fix that let me stumble over some other
leftovers like orig_ist[], now unused macros, useless defines and a bunch
of assumptions about the exception stacks being a big lump. Aside of that I
found a too broad check of the exception stack in the x86/64 stack overflow
detector.

The following series cleans that up and gradually prepares for guard pages
between the IST stacks.

Thanks,

	tglx

8<-------------------
 Documentation/x86/kernel-stacks       |    8 ++--
 arch/x86/entry/entry_64.S             |    4 +-
 arch/x86/include/asm/cpu_entry_area.h |   35 ++++++++++++++++---
 arch/x86/include/asm/page_32_types.h  |    6 ---
 arch/x86/include/asm/page_64_types.h  |   13 ++++---
 arch/x86/include/asm/processor.h      |    9 -----
 arch/x86/kernel/cpu/common.c          |   41 +++-------------------
 arch/x86/kernel/dumpstack_64.c        |   49 +++++++++++++++++----------
 arch/x86/kernel/idt.c                 |   19 +++++-----
 arch/x86/kernel/irq_64.c              |   31 ++++++++++++-----
 arch/x86/mm/cpu_entry_area.c          |   61 +++++++++++++++++++++++-----------
 arch/x86/mm/fault.c                   |    3 +
 12 files changed, 159 insertions(+), 120 deletions(-)






^ permalink raw reply	[flat|nested] 43+ messages in thread

* [patch 01/14] x86/irq/64: Limit IST stack overflow check to #DB stack
  2019-03-31 21:40 [patch 00/14] x86/exceptions: Add guard patches to IST stacks Thomas Gleixner
@ 2019-03-31 21:40 ` Thomas Gleixner
  2019-04-01 18:03   ` Borislav Petkov
  2019-04-02 16:34   ` Sean Christopherson
  2019-03-31 21:40 ` [patch 02/14] x86/idt: Remove unused macro SISTG Thomas Gleixner
                   ` (13 subsequent siblings)
  14 siblings, 2 replies; 43+ messages in thread
From: Thomas Gleixner @ 2019-03-31 21:40 UTC (permalink / raw)
  To: LKML; +Cc: x86, Andy Lutomirski, Josh Poimboeuf, Mitsuo Hayasaka

37fe6a42b343 ("x86: Check stack overflow in detail") added a broad check
for the full exception stack area, i.e. it considers the full exception
stack are as valid.

That's wrong in two aspects:

 1) It does not check the individual areas one by one

 2) #DF, NMI and #MCE are not enabling interrupts which means that a
    regular device interrupt cannot happen in their context. In fact if a
    device interrupt hits one of those IST stacks that's a bug because some
    code path enabled interrupts while handling the exception.

Limit the check to the #DB stack and consider all other IST stacks as
'overflow' or invalid.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Mitsuo Hayasaka <mitsuo.hayasaka.hu@hitachi.com>
---
 arch/x86/kernel/irq_64.c |   19 ++++++++++++++-----
 1 file changed, 14 insertions(+), 5 deletions(-)

--- a/arch/x86/kernel/irq_64.c
+++ b/arch/x86/kernel/irq_64.c
@@ -26,9 +26,18 @@ int sysctl_panic_on_stackoverflow;
 /*
  * Probabilistic stack overflow check:
  *
- * Only check the stack in process context, because everything else
- * runs on the big interrupt stacks. Checking reliably is too expensive,
- * so we just check from interrupts.
+ * Regular device interrupts can enter on the following stacks:
+ *
+ * - User stack
+ *
+ * - Kernel task stack
+ *
+ * - Interrupt stack if a device driver reenables interrupt
+ *   which should only happen in really old drivers.
+ *
+ * - Debug IST stack
+ *
+ * All other contexts are invalid.
  */
 static inline void stack_overflow_check(struct pt_regs *regs)
 {
@@ -53,8 +62,8 @@ static inline void stack_overflow_check(
 		return;
 
 	oist = this_cpu_ptr(&orig_ist);
-	estack_top = (u64)oist->ist[0] - EXCEPTION_STKSZ + STACK_TOP_MARGIN;
-	estack_bottom = (u64)oist->ist[N_EXCEPTION_STACKS - 1];
+	estack_bottom = (u64)oist->ist[DEBUG_STACK];
+	estack_top = estack_bottom - DEBUG_STKSZ + STACK_TOP_MARGIN;
 	if (regs->sp >= estack_top && regs->sp <= estack_bottom)
 		return;
 



^ permalink raw reply	[flat|nested] 43+ messages in thread

* [patch 02/14] x86/idt: Remove unused macro SISTG
  2019-03-31 21:40 [patch 00/14] x86/exceptions: Add guard patches to IST stacks Thomas Gleixner
  2019-03-31 21:40 ` [patch 01/14] x86/irq/64: Limit IST stack overflow check to #DB stack Thomas Gleixner
@ 2019-03-31 21:40 ` Thomas Gleixner
  2019-04-01  4:04   ` Andy Lutomirski
  2019-03-31 21:40 ` [patch 03/14] x86/exceptions: Remove unused stack defines on 32bit Thomas Gleixner
                   ` (12 subsequent siblings)
  14 siblings, 1 reply; 43+ messages in thread
From: Thomas Gleixner @ 2019-03-31 21:40 UTC (permalink / raw)
  To: LKML; +Cc: x86, Andy Lutomirski, Josh Poimboeuf

d8ba61ba58c8 ("x86/entry/64: Don't use IST entry for #BP stack") removed
the last user but left the macro around.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
---
 arch/x86/kernel/idt.c |    4 ----
 1 file changed, 4 deletions(-)

--- a/arch/x86/kernel/idt.c
+++ b/arch/x86/kernel/idt.c
@@ -45,10 +45,6 @@ struct idt_data {
 #define ISTG(_vector, _addr, _ist)			\
 	G(_vector, _addr, _ist, GATE_INTERRUPT, DPL0, __KERNEL_CS)
 
-/* System interrupt gate with interrupt stack */
-#define SISTG(_vector, _addr, _ist)			\
-	G(_vector, _addr, _ist, GATE_INTERRUPT, DPL3, __KERNEL_CS)
-
 /* Task gate */
 #define TSKG(_vector, _gdt)				\
 	G(_vector, NULL, DEFAULT_STACK, GATE_TASK, DPL0, _gdt << 3)



^ permalink raw reply	[flat|nested] 43+ messages in thread

* [patch 03/14] x86/exceptions: Remove unused stack defines on 32bit
  2019-03-31 21:40 [patch 00/14] x86/exceptions: Add guard patches to IST stacks Thomas Gleixner
  2019-03-31 21:40 ` [patch 01/14] x86/irq/64: Limit IST stack overflow check to #DB stack Thomas Gleixner
  2019-03-31 21:40 ` [patch 02/14] x86/idt: Remove unused macro SISTG Thomas Gleixner
@ 2019-03-31 21:40 ` Thomas Gleixner
  2019-03-31 21:40 ` [patch 04/14] x86/exceptions: Make IST index zero based Thomas Gleixner
                   ` (11 subsequent siblings)
  14 siblings, 0 replies; 43+ messages in thread
From: Thomas Gleixner @ 2019-03-31 21:40 UTC (permalink / raw)
  To: LKML; +Cc: x86, Andy Lutomirski, Josh Poimboeuf

Nothing requires those for 32bit builds.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
---
 arch/x86/include/asm/page_32_types.h |    6 +-----
 1 file changed, 1 insertion(+), 5 deletions(-)

--- a/arch/x86/include/asm/page_32_types.h
+++ b/arch/x86/include/asm/page_32_types.h
@@ -22,11 +22,7 @@
 #define THREAD_SIZE_ORDER	1
 #define THREAD_SIZE		(PAGE_SIZE << THREAD_SIZE_ORDER)
 
-#define DOUBLEFAULT_STACK 1
-#define NMI_STACK 0
-#define DEBUG_STACK 0
-#define MCE_STACK 0
-#define N_EXCEPTION_STACKS 1
+#define N_EXCEPTION_STACKS	1
 
 #ifdef CONFIG_X86_PAE
 /*



^ permalink raw reply	[flat|nested] 43+ messages in thread

* [patch 04/14] x86/exceptions: Make IST index zero based
  2019-03-31 21:40 [patch 00/14] x86/exceptions: Add guard patches to IST stacks Thomas Gleixner
                   ` (2 preceding siblings ...)
  2019-03-31 21:40 ` [patch 03/14] x86/exceptions: Remove unused stack defines on 32bit Thomas Gleixner
@ 2019-03-31 21:40 ` Thomas Gleixner
  2019-04-01  7:30   ` Peter Zijlstra
                     ` (2 more replies)
  2019-03-31 21:40 ` [patch 05/14] x86/cpu_entry_area: Cleanup setup functions Thomas Gleixner
                   ` (10 subsequent siblings)
  14 siblings, 3 replies; 43+ messages in thread
From: Thomas Gleixner @ 2019-03-31 21:40 UTC (permalink / raw)
  To: LKML; +Cc: x86, Andy Lutomirski, Josh Poimboeuf

The defines for the exception stack (IST) array in the TSS are using the
SDM convention IST1 - IST7. That causes all sorts of code to subtract 1 for
array indices related to IST. That's confusing at best and does not provide
any value.

Make the indices zero based and fixup the usage sites. The only code which
needs to adjust the 0 based index is the interrupt descriptor setup which
needs to add 1 now.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
---
 Documentation/x86/kernel-stacks      |    8 ++++----
 arch/x86/entry/entry_64.S            |    4 ++--
 arch/x86/include/asm/page_64_types.h |   13 ++++++++-----
 arch/x86/kernel/cpu/common.c         |    4 ++--
 arch/x86/kernel/dumpstack_64.c       |   14 +++++++-------
 arch/x86/kernel/idt.c                |   15 +++++++++------
 6 files changed, 32 insertions(+), 26 deletions(-)

--- a/Documentation/x86/kernel-stacks
+++ b/Documentation/x86/kernel-stacks
@@ -59,7 +59,7 @@ If that assumption is ever broken then t
 
 The currently assigned IST stacks are :-
 
-* DOUBLEFAULT_STACK.  EXCEPTION_STKSZ (PAGE_SIZE).
+* DOUBLEFAULT_IST.  EXCEPTION_STKSZ (PAGE_SIZE).
 
   Used for interrupt 8 - Double Fault Exception (#DF).
 
@@ -68,7 +68,7 @@ The currently assigned IST stacks are :-
   Using a separate stack allows the kernel to recover from it well enough
   in many cases to still output an oops.
 
-* NMI_STACK.  EXCEPTION_STKSZ (PAGE_SIZE).
+* NMI_IST.  EXCEPTION_STKSZ (PAGE_SIZE).
 
   Used for non-maskable interrupts (NMI).
 
@@ -76,7 +76,7 @@ The currently assigned IST stacks are :-
   middle of switching stacks.  Using IST for NMI events avoids making
   assumptions about the previous state of the kernel stack.
 
-* DEBUG_STACK.  DEBUG_STKSZ
+* DEBUG_IST.  DEBUG_STKSZ
 
   Used for hardware debug interrupts (interrupt 1) and for software
   debug interrupts (INT3).
@@ -86,7 +86,7 @@ The currently assigned IST stacks are :-
   avoids making assumptions about the previous state of the kernel
   stack.
 
-* MCE_STACK.  EXCEPTION_STKSZ (PAGE_SIZE).
+* MCE_IST.  EXCEPTION_STKSZ (PAGE_SIZE).
 
   Used for interrupt 18 - Machine Check Exception (#MC).
 
--- a/arch/x86/entry/entry_64.S
+++ b/arch/x86/entry/entry_64.S
@@ -841,7 +841,7 @@ apicinterrupt IRQ_WORK_VECTOR			irq_work
 /*
  * Exception entry points.
  */
-#define CPU_TSS_IST(x) PER_CPU_VAR(cpu_tss_rw) + (TSS_ist + ((x) - 1) * 8)
+#define CPU_TSS_IST(x) PER_CPU_VAR(cpu_tss_rw) + (TSS_ist + (x) * 8)
 
 /**
  * idtentry - Generate an IDT entry stub
@@ -1129,7 +1129,7 @@ apicinterrupt3 HYPERV_STIMER0_VECTOR \
 	hv_stimer0_callback_vector hv_stimer0_vector_handler
 #endif /* CONFIG_HYPERV */
 
-idtentry debug			do_debug		has_error_code=0	paranoid=1 shift_ist=DEBUG_STACK
+idtentry debug			do_debug		has_error_code=0	paranoid=1 shift_ist=DEBUG_IST
 idtentry int3			do_int3			has_error_code=0
 idtentry stack_segment		do_stack_segment	has_error_code=1
 
--- a/arch/x86/include/asm/page_64_types.h
+++ b/arch/x86/include/asm/page_64_types.h
@@ -25,11 +25,14 @@
 #define IRQ_STACK_ORDER (2 + KASAN_STACK_ORDER)
 #define IRQ_STACK_SIZE (PAGE_SIZE << IRQ_STACK_ORDER)
 
-#define DOUBLEFAULT_STACK 1
-#define NMI_STACK 2
-#define DEBUG_STACK 3
-#define MCE_STACK 4
-#define N_EXCEPTION_STACKS 4  /* hw limit: 7 */
+/*
+ * The index for the tss.ist[] array. The hardware limit is 7 entries.
+ */
+#define	DOUBLEFAULT_IST		0
+#define	NMI_IST			1
+#define	DEBUG_IST		2
+#define	MCE_IST			3
+#define	N_EXCEPTION_STACKS	4
 
 /*
  * Set __PAGE_OFFSET to the most negative possible address +
--- a/arch/x86/kernel/cpu/common.c
+++ b/arch/x86/kernel/cpu/common.c
@@ -516,7 +516,7 @@ DEFINE_PER_CPU(struct cpu_entry_area *,
  */
 static const unsigned int exception_stack_sizes[N_EXCEPTION_STACKS] = {
 	  [0 ... N_EXCEPTION_STACKS - 1]	= EXCEPTION_STKSZ,
-	  [DEBUG_STACK - 1]			= DEBUG_STKSZ
+	  [DEBUG_IST]				= DEBUG_STKSZ
 };
 #endif
 
@@ -1760,7 +1760,7 @@ void cpu_init(void)
 			estacks += exception_stack_sizes[v];
 			oist->ist[v] = t->x86_tss.ist[v] =
 					(unsigned long)estacks;
-			if (v == DEBUG_STACK-1)
+			if (v == DEBUG_IST)
 				per_cpu(debug_stack_addr, cpu) = (unsigned long)estacks;
 		}
 	}
--- a/arch/x86/kernel/dumpstack_64.c
+++ b/arch/x86/kernel/dumpstack_64.c
@@ -18,16 +18,16 @@
 
 #include <asm/stacktrace.h>
 
-static char *exception_stack_names[N_EXCEPTION_STACKS] = {
-		[ DOUBLEFAULT_STACK-1	]	= "#DF",
-		[ NMI_STACK-1		]	= "NMI",
-		[ DEBUG_STACK-1		]	= "#DB",
-		[ MCE_STACK-1		]	= "#MC",
+static const char *exception_stack_names[N_EXCEPTION_STACKS] = {
+		[ DOUBLEFAULT_IST	]	= "#DF",
+		[ NMI_IST		]	= "NMI",
+		[ DEBUG_IST		]	= "#DB",
+		[ MCE_IST		]	= "#MC",
 };
 
-static unsigned long exception_stack_sizes[N_EXCEPTION_STACKS] = {
+static const unsigned long exception_stack_sizes[N_EXCEPTION_STACKS] = {
 	[0 ... N_EXCEPTION_STACKS - 1]		= EXCEPTION_STKSZ,
-	[DEBUG_STACK - 1]			= DEBUG_STKSZ
+	[DEBUG_IST]				= DEBUG_STKSZ
 };
 
 const char *stack_type_name(enum stack_type type)
--- a/arch/x86/kernel/idt.c
+++ b/arch/x86/kernel/idt.c
@@ -41,9 +41,12 @@ struct idt_data {
 #define SYSG(_vector, _addr)				\
 	G(_vector, _addr, DEFAULT_STACK, GATE_INTERRUPT, DPL3, __KERNEL_CS)
 
-/* Interrupt gate with interrupt stack */
+/*
+ * Interrupt gate with interrupt stack. The _ist index is the index in
+ * the tss.ist[] array, but for the descriptor it needs to start at 1.
+ */
 #define ISTG(_vector, _addr, _ist)			\
-	G(_vector, _addr, _ist, GATE_INTERRUPT, DPL0, __KERNEL_CS)
+	G(_vector, _addr, _ist + 1, GATE_INTERRUPT, DPL0, __KERNEL_CS)
 
 /* Task gate */
 #define TSKG(_vector, _gdt)				\
@@ -180,11 +183,11 @@ gate_desc debug_idt_table[IDT_ENTRIES] _
  * cpu_init() when the TSS has been initialized.
  */
 static const __initconst struct idt_data ist_idts[] = {
-	ISTG(X86_TRAP_DB,	debug,		DEBUG_STACK),
-	ISTG(X86_TRAP_NMI,	nmi,		NMI_STACK),
-	ISTG(X86_TRAP_DF,	double_fault,	DOUBLEFAULT_STACK),
+	ISTG(X86_TRAP_DB,	debug,		DEBUG_IST),
+	ISTG(X86_TRAP_NMI,	nmi,		NMI_IST),
+	ISTG(X86_TRAP_DF,	double_fault,	DOUBLEFAULT_IST),
 #ifdef CONFIG_X86_MCE
-	ISTG(X86_TRAP_MC,	&machine_check,	MCE_STACK),
+	ISTG(X86_TRAP_MC,	&machine_check,	MCE_IST),
 #endif
 };
 



^ permalink raw reply	[flat|nested] 43+ messages in thread

* [patch 05/14] x86/cpu_entry_area: Cleanup setup functions
  2019-03-31 21:40 [patch 00/14] x86/exceptions: Add guard patches to IST stacks Thomas Gleixner
                   ` (3 preceding siblings ...)
  2019-03-31 21:40 ` [patch 04/14] x86/exceptions: Make IST index zero based Thomas Gleixner
@ 2019-03-31 21:40 ` Thomas Gleixner
  2019-03-31 21:40 ` [patch 06/14] x86/exceptions: Add structs for exception stacks Thomas Gleixner
                   ` (9 subsequent siblings)
  14 siblings, 0 replies; 43+ messages in thread
From: Thomas Gleixner @ 2019-03-31 21:40 UTC (permalink / raw)
  To: LKML; +Cc: x86, Andy Lutomirski, Josh Poimboeuf

No point in retrieving the entry area pointer over and over. Do it once and
use unsigned int for 'cpu' consistently.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
---
 arch/x86/mm/cpu_entry_area.c |   19 +++++++++----------
 1 file changed, 9 insertions(+), 10 deletions(-)

--- a/arch/x86/mm/cpu_entry_area.c
+++ b/arch/x86/mm/cpu_entry_area.c
@@ -52,10 +52,10 @@ cea_map_percpu_pages(void *cea_vaddr, vo
 		cea_set_pte(cea_vaddr, per_cpu_ptr_to_phys(ptr), prot);
 }
 
-static void __init percpu_setup_debug_store(int cpu)
+static void __init percpu_setup_debug_store(unsigned int cpu)
 {
 #ifdef CONFIG_CPU_SUP_INTEL
-	int npages;
+	unsigned int npages;
 	void *cea;
 
 	if (boot_cpu_data.x86_vendor != X86_VENDOR_INTEL)
@@ -79,8 +79,9 @@ static void __init percpu_setup_debug_st
 }
 
 /* Setup the fixmap mappings only once per-processor */
-static void __init setup_cpu_entry_area(int cpu)
+static void __init setup_cpu_entry_area(unsigned int cpu)
 {
+	struct cpu_entry_area *cea = get_cpu_entry_area(cpu);
 #ifdef CONFIG_X86_64
 	/* On 64-bit systems, we use a read-only fixmap GDT and TSS. */
 	pgprot_t gdt_prot = PAGE_KERNEL_RO;
@@ -101,10 +102,9 @@ static void __init setup_cpu_entry_area(
 	pgprot_t tss_prot = PAGE_KERNEL;
 #endif
 
-	cea_set_pte(&get_cpu_entry_area(cpu)->gdt, get_cpu_gdt_paddr(cpu),
-		    gdt_prot);
+	cea_set_pte(&cea->gdt, get_cpu_gdt_paddr(cpu), gdt_prot);
 
-	cea_map_percpu_pages(&get_cpu_entry_area(cpu)->entry_stack_page,
+	cea_map_percpu_pages(&cea->entry_stack_page,
 			     per_cpu_ptr(&entry_stack_storage, cpu), 1,
 			     PAGE_KERNEL);
 
@@ -128,19 +128,18 @@ static void __init setup_cpu_entry_area(
 	BUILD_BUG_ON((offsetof(struct tss_struct, x86_tss) ^
 		      offsetofend(struct tss_struct, x86_tss)) & PAGE_MASK);
 	BUILD_BUG_ON(sizeof(struct tss_struct) % PAGE_SIZE != 0);
-	cea_map_percpu_pages(&get_cpu_entry_area(cpu)->tss,
-			     &per_cpu(cpu_tss_rw, cpu),
+	cea_map_percpu_pages(&cea->tss, &per_cpu(cpu_tss_rw, cpu),
 			     sizeof(struct tss_struct) / PAGE_SIZE, tss_prot);
 
 #ifdef CONFIG_X86_32
-	per_cpu(cpu_entry_area, cpu) = get_cpu_entry_area(cpu);
+	per_cpu(cpu_entry_area, cpu) = cea;
 #endif
 
 #ifdef CONFIG_X86_64
 	BUILD_BUG_ON(sizeof(exception_stacks) % PAGE_SIZE != 0);
 	BUILD_BUG_ON(sizeof(exception_stacks) !=
 		     sizeof(((struct cpu_entry_area *)0)->exception_stacks));
-	cea_map_percpu_pages(&get_cpu_entry_area(cpu)->exception_stacks,
+	cea_map_percpu_pages(&cea->exception_stacks,
 			     &per_cpu(exception_stacks, cpu),
 			     sizeof(exception_stacks) / PAGE_SIZE, PAGE_KERNEL);
 #endif



^ permalink raw reply	[flat|nested] 43+ messages in thread

* [patch 06/14] x86/exceptions: Add structs for exception stacks
  2019-03-31 21:40 [patch 00/14] x86/exceptions: Add guard patches to IST stacks Thomas Gleixner
                   ` (4 preceding siblings ...)
  2019-03-31 21:40 ` [patch 05/14] x86/cpu_entry_area: Cleanup setup functions Thomas Gleixner
@ 2019-03-31 21:40 ` Thomas Gleixner
  2019-03-31 21:40 ` [patch 07/14] x86/cpu_entry_area: Prepare for IST guard pages Thomas Gleixner
                   ` (8 subsequent siblings)
  14 siblings, 0 replies; 43+ messages in thread
From: Thomas Gleixner @ 2019-03-31 21:40 UTC (permalink / raw)
  To: LKML; +Cc: x86, Andy Lutomirski, Josh Poimboeuf

At the moment everything assumes a full linear mapping of the various
exception stacks. Adding guard pages to the cpu entry area mapping of the
exception stacks will break that assumption.

As a preparatory step convert both the real storage and the effective
mapping in the cpu entry area from character arrays to structures.

To ensure that both arrays have the same ordering and the same size of the
individual stacks fill the members with a macro. The guard size is the only
difference between the two resulting structures. For now both have guard
size 0 until the preparation of all usage sites is done.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
---
 arch/x86/include/asm/cpu_entry_area.h |   35 +++++++++++++++++++++++++++++-----
 arch/x86/kernel/cpu/common.c          |    2 -
 arch/x86/mm/cpu_entry_area.c          |    8 ++-----
 3 files changed, 34 insertions(+), 11 deletions(-)

--- a/arch/x86/include/asm/cpu_entry_area.h
+++ b/arch/x86/include/asm/cpu_entry_area.h
@@ -7,6 +7,34 @@
 #include <asm/processor.h>
 #include <asm/intel_ds.h>
 
+#ifdef CONFIG_X86_64
+
+/* Macro to enforce the same ordering and stack sizes */
+#define ESTACKS_MEMBERS(guardsize)		\
+	char	DF_stack[EXCEPTION_STKSZ];	\
+	char	DF_stack_guard[guardsize];	\
+	char	NMI_stack[EXCEPTION_STKSZ];	\
+	char	NMI_stack_guard[guardsize];	\
+	char	DB_stack[DEBUG_STKSZ];		\
+	char	DB_stack_guard[guardsize];	\
+	char	MCE_stack[EXCEPTION_STKSZ];	\
+	char	MCE_stack_guard[guardsize];	\
+
+/* The exception stacks linear storage. No guard pages required */
+struct exception_stacks {
+	ESTACKS_MEMBERS(0)
+};
+
+/*
+ * The effective cpu entry area mapping with guard pages. Guard size is
+ * zero until the code which makes assumptions about linear mapping is
+ * cleaned up.
+ */
+struct cea_exception_stacks {
+	ESTACKS_MEMBERS(0)
+};
+#endif
+
 /*
  * cpu_entry_area is a percpu region that contains things needed by the CPU
  * and early entry/exit code.  Real types aren't used for all fields here
@@ -32,12 +60,9 @@ struct cpu_entry_area {
 
 #ifdef CONFIG_X86_64
 	/*
-	 * Exception stacks used for IST entries.
-	 *
-	 * In the future, this should have a separate slot for each stack
-	 * with guard pages between them.
+	 * Exception stacks used for IST entries with guard pages.
 	 */
-	char exception_stacks[(N_EXCEPTION_STACKS - 1) * EXCEPTION_STKSZ + DEBUG_STKSZ];
+	struct cea_exception_stacks estacks;
 #endif
 #ifdef CONFIG_CPU_SUP_INTEL
 	/*
--- a/arch/x86/kernel/cpu/common.c
+++ b/arch/x86/kernel/cpu/common.c
@@ -1754,7 +1754,7 @@ void cpu_init(void)
 	 * set up and load the per-CPU TSS
 	 */
 	if (!oist->ist[0]) {
-		char *estacks = get_cpu_entry_area(cpu)->exception_stacks;
+		char *estacks = (char *)&get_cpu_entry_area(cpu)->estacks;
 
 		for (v = 0; v < N_EXCEPTION_STACKS; v++) {
 			estacks += exception_stack_sizes[v];
--- a/arch/x86/mm/cpu_entry_area.c
+++ b/arch/x86/mm/cpu_entry_area.c
@@ -13,8 +13,7 @@
 static DEFINE_PER_CPU_PAGE_ALIGNED(struct entry_stack_page, entry_stack_storage);
 
 #ifdef CONFIG_X86_64
-static DEFINE_PER_CPU_PAGE_ALIGNED(char, exception_stacks
-	[(N_EXCEPTION_STACKS - 1) * EXCEPTION_STKSZ + DEBUG_STKSZ]);
+static DEFINE_PER_CPU_PAGE_ALIGNED(struct exception_stacks, exception_stacks);
 #endif
 
 struct cpu_entry_area *get_cpu_entry_area(int cpu)
@@ -138,9 +137,8 @@ static void __init setup_cpu_entry_area(
 #ifdef CONFIG_X86_64
 	BUILD_BUG_ON(sizeof(exception_stacks) % PAGE_SIZE != 0);
 	BUILD_BUG_ON(sizeof(exception_stacks) !=
-		     sizeof(((struct cpu_entry_area *)0)->exception_stacks));
-	cea_map_percpu_pages(&cea->exception_stacks,
-			     &per_cpu(exception_stacks, cpu),
+		     sizeof(((struct cpu_entry_area *)0)->estacks));
+	cea_map_percpu_pages(&cea->estacks, &per_cpu(exception_stacks, cpu),
 			     sizeof(exception_stacks) / PAGE_SIZE, PAGE_KERNEL);
 #endif
 	percpu_setup_debug_store(cpu);



^ permalink raw reply	[flat|nested] 43+ messages in thread

* [patch 07/14] x86/cpu_entry_area: Prepare for IST guard pages
  2019-03-31 21:40 [patch 00/14] x86/exceptions: Add guard patches to IST stacks Thomas Gleixner
                   ` (5 preceding siblings ...)
  2019-03-31 21:40 ` [patch 06/14] x86/exceptions: Add structs for exception stacks Thomas Gleixner
@ 2019-03-31 21:40 ` Thomas Gleixner
  2019-03-31 21:40 ` [patch 08/14] x86/cpu_entry_area: Provide exception stack accessor Thomas Gleixner
                   ` (7 subsequent siblings)
  14 siblings, 0 replies; 43+ messages in thread
From: Thomas Gleixner @ 2019-03-31 21:40 UTC (permalink / raw)
  To: LKML; +Cc: x86, Andy Lutomirski, Josh Poimboeuf

To allow guard pages between the IST stacks each stack needs to be mapped
individually.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
---
 arch/x86/mm/cpu_entry_area.c |   36 +++++++++++++++++++++++++++++-------
 1 file changed, 29 insertions(+), 7 deletions(-)

--- a/arch/x86/mm/cpu_entry_area.c
+++ b/arch/x86/mm/cpu_entry_area.c
@@ -77,6 +77,33 @@ static void __init percpu_setup_debug_st
 #endif
 }
 
+#ifdef CONFIG_X86_64
+
+#define cea_map_stack(name)						\
+	npages = sizeof(estacks->name## _stack) / PAGE_SIZE;		\
+	cea_map_percpu_pages(cea->estacks.name## _stack,		\
+			     estacks->name## _stack, npages, PAGE_KERNEL)
+
+static void __init percpu_setup_exception_stacks(unsigned int cpu)
+{
+	struct exception_stacks *estacks = per_cpu_ptr(&exception_stacks, cpu);
+	struct cpu_entry_area *cea = get_cpu_entry_area(cpu);
+	unsigned int npages;
+
+	BUILD_BUG_ON(sizeof(exception_stacks) % PAGE_SIZE != 0);
+	/*
+	 * The exceptions stack mappings in the per cpu area are protected
+	 * by guard pages so each stack must be mapped seperately.
+	 */
+	cea_map_stack(DF);
+	cea_map_stack(NMI);
+	cea_map_stack(DB);
+	cea_map_stack(MCE);
+}
+#else
+static inline void percpu_setup_exception_stacks(unsigned int cpu) {}
+#endif
+
 /* Setup the fixmap mappings only once per-processor */
 static void __init setup_cpu_entry_area(unsigned int cpu)
 {
@@ -134,13 +161,8 @@ static void __init setup_cpu_entry_area(
 	per_cpu(cpu_entry_area, cpu) = cea;
 #endif
 
-#ifdef CONFIG_X86_64
-	BUILD_BUG_ON(sizeof(exception_stacks) % PAGE_SIZE != 0);
-	BUILD_BUG_ON(sizeof(exception_stacks) !=
-		     sizeof(((struct cpu_entry_area *)0)->estacks));
-	cea_map_percpu_pages(&cea->estacks, &per_cpu(exception_stacks, cpu),
-			     sizeof(exception_stacks) / PAGE_SIZE, PAGE_KERNEL);
-#endif
+	percpu_setup_exception_stacks(cpu);
+
 	percpu_setup_debug_store(cpu);
 }
 



^ permalink raw reply	[flat|nested] 43+ messages in thread

* [patch 08/14] x86/cpu_entry_area: Provide exception stack accessor
  2019-03-31 21:40 [patch 00/14] x86/exceptions: Add guard patches to IST stacks Thomas Gleixner
                   ` (6 preceding siblings ...)
  2019-03-31 21:40 ` [patch 07/14] x86/cpu_entry_area: Prepare for IST guard pages Thomas Gleixner
@ 2019-03-31 21:40 ` Thomas Gleixner
  2019-03-31 21:40 ` [patch 09/14] x86/traps: Use cpu_entry_area instead of orig_ist Thomas Gleixner
                   ` (6 subsequent siblings)
  14 siblings, 0 replies; 43+ messages in thread
From: Thomas Gleixner @ 2019-03-31 21:40 UTC (permalink / raw)
  To: LKML; +Cc: x86, Andy Lutomirski, Josh Poimboeuf

Store a pointer to the per cpu entry area exception stack mappings to allow
fast retrieval.

Required for converting various places from using the shadow IST array to
directly doing address calculations on the actual mapping address.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
---
 arch/x86/include/asm/cpu_entry_area.h |    4 ++++
 arch/x86/mm/cpu_entry_area.c          |    4 ++++
 2 files changed, 8 insertions(+)

--- a/arch/x86/include/asm/cpu_entry_area.h
+++ b/arch/x86/include/asm/cpu_entry_area.h
@@ -82,6 +82,7 @@ struct cpu_entry_area {
 #define CPU_ENTRY_AREA_TOT_SIZE	(CPU_ENTRY_AREA_SIZE * NR_CPUS)
 
 DECLARE_PER_CPU(struct cpu_entry_area *, cpu_entry_area);
+DECLARE_PER_CPU(struct cea_exception_stacks *, cea_exception_stacks);
 
 extern void setup_cpu_entry_areas(void);
 extern void cea_set_pte(void *cea_vaddr, phys_addr_t pa, pgprot_t flags);
@@ -101,4 +102,7 @@ static inline struct entry_stack *cpu_en
 	return &get_cpu_entry_area(cpu)->entry_stack_page.stack;
 }
 
+#define __this_cpu_ist_top_va(name)	\
+  (unsigned long)__this_cpu_read(cea_exception_stacks)->name## _stack_guard
+
 #endif
--- a/arch/x86/mm/cpu_entry_area.c
+++ b/arch/x86/mm/cpu_entry_area.c
@@ -14,6 +14,7 @@ static DEFINE_PER_CPU_PAGE_ALIGNED(struc
 
 #ifdef CONFIG_X86_64
 static DEFINE_PER_CPU_PAGE_ALIGNED(struct exception_stacks, exception_stacks);
+DEFINE_PER_CPU(struct cea_exception_stacks*, cea_exception_stacks);
 #endif
 
 struct cpu_entry_area *get_cpu_entry_area(int cpu)
@@ -91,6 +92,9 @@ static void __init percpu_setup_exceptio
 	unsigned int npages;
 
 	BUILD_BUG_ON(sizeof(exception_stacks) % PAGE_SIZE != 0);
+
+	per_cpu(cea_exception_stacks, cpu) = &cea->estacks;
+
 	/*
 	 * The exceptions stack mappings in the per cpu area are protected
 	 * by guard pages so each stack must be mapped seperately.



^ permalink raw reply	[flat|nested] 43+ messages in thread

* [patch 09/14] x86/traps: Use cpu_entry_area instead of orig_ist
  2019-03-31 21:40 [patch 00/14] x86/exceptions: Add guard patches to IST stacks Thomas Gleixner
                   ` (7 preceding siblings ...)
  2019-03-31 21:40 ` [patch 08/14] x86/cpu_entry_area: Provide exception stack accessor Thomas Gleixner
@ 2019-03-31 21:40 ` Thomas Gleixner
  2019-03-31 21:40 ` [patch 10/14] x86/irq/64: Use cpu entry area " Thomas Gleixner
                   ` (5 subsequent siblings)
  14 siblings, 0 replies; 43+ messages in thread
From: Thomas Gleixner @ 2019-03-31 21:40 UTC (permalink / raw)
  To: LKML; +Cc: x86, Andy Lutomirski, Josh Poimboeuf

The orig_ist[] array is a shadow copy of the IST array in the TSS. The
reason why it exists is that older kernels used two TSS variants with
different pointers into the debug stack. orig_ist[] contains the real
starting points.

There is no point anymore to do so because the same information can be
retrieved using the base address of the cpu entry area mapping and the
offsets of the various exception stacks.

No functional change. Preparation for removing orig_ist.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
---
 arch/x86/mm/fault.c |    3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

--- a/arch/x86/mm/fault.c
+++ b/arch/x86/mm/fault.c
@@ -28,6 +28,7 @@
 #include <asm/mmu_context.h>		/* vma_pkey()			*/
 #include <asm/efi.h>			/* efi_recover_from_page_fault()*/
 #include <asm/desc.h>			/* store_idt(), ...		*/
+#include <asm/cpu_entry_area.h>		/* exception stack		*/
 
 #define CREATE_TRACE_POINTS
 #include <asm/trace/exceptions.h>
@@ -793,7 +794,7 @@ no_context(struct pt_regs *regs, unsigne
 	if (is_vmalloc_addr((void *)address) &&
 	    (((unsigned long)tsk->stack - 1 - address < PAGE_SIZE) ||
 	     address - ((unsigned long)tsk->stack + THREAD_SIZE) < PAGE_SIZE)) {
-		unsigned long stack = this_cpu_read(orig_ist.ist[DOUBLEFAULT_STACK]) - sizeof(void *);
+		unsigned long stack = __this_cpu_ist_top_va(DF) - sizeof(void *);
 		/*
 		 * We're likely to be running with very little stack space
 		 * left.  It's plausible that we'd hit this condition but



^ permalink raw reply	[flat|nested] 43+ messages in thread

* [patch 10/14] x86/irq/64: Use cpu entry area instead of orig_ist
  2019-03-31 21:40 [patch 00/14] x86/exceptions: Add guard patches to IST stacks Thomas Gleixner
                   ` (8 preceding siblings ...)
  2019-03-31 21:40 ` [patch 09/14] x86/traps: Use cpu_entry_area instead of orig_ist Thomas Gleixner
@ 2019-03-31 21:40 ` Thomas Gleixner
  2019-03-31 21:40 ` [patch 11/14] x86/dumpstack/64: Use cpu_entry_area " Thomas Gleixner
                   ` (4 subsequent siblings)
  14 siblings, 0 replies; 43+ messages in thread
From: Thomas Gleixner @ 2019-03-31 21:40 UTC (permalink / raw)
  To: LKML; +Cc: x86, Andy Lutomirski, Josh Poimboeuf

The orig_ist[] array is a shadow copy of the IST array in the TSS. The
reason why it exists is that older kernels used two TSS variants with
different pointers into the debug stack. orig_ist[] contains the real
starting points.

There is no point anymore to do so because the same information can be
retrieved using the base address of the cpu entry area mapping and the
offsets of the various exception stacks.

No functional change. Preparation for removing orig_ist.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
---
 arch/x86/kernel/irq_64.c |   16 ++++++++++------
 1 file changed, 10 insertions(+), 6 deletions(-)
--- a/arch/x86/kernel/irq_64.c
+++ b/arch/x86/kernel/irq_64.c
@@ -18,6 +18,8 @@
 #include <linux/uaccess.h>
 #include <linux/smp.h>
 #include <linux/sched/task_stack.h>
+
+#include <asm/cpu_entry_area.h>
 #include <asm/io_apic.h>
 #include <asm/apic.h>
 
@@ -42,11 +44,12 @@ int sysctl_panic_on_stackoverflow;
 static inline void stack_overflow_check(struct pt_regs *regs)
 {
 #ifdef CONFIG_DEBUG_STACKOVERFLOW
+
 #define STACK_TOP_MARGIN	128
-	struct orig_ist *oist;
-	u64 irq_stack_top, irq_stack_bottom;
-	u64 estack_top, estack_bottom;
+
+	u64 irq_stack_top, irq_stack_bottom, estack_top, estack_bottom;
 	u64 curbase = (u64)task_stack_page(current);
+	struct cea_exception_stacks *estacks;
 
 	if (user_mode(regs))
 		return;
@@ -61,9 +64,10 @@ static inline void stack_overflow_check(
 	if (regs->sp >= irq_stack_top && regs->sp <= irq_stack_bottom)
 		return;
 
-	oist = this_cpu_ptr(&orig_ist);
-	estack_bottom = (u64)oist->ist[DEBUG_STACK];
-	estack_top = estack_bottom - DEBUG_STKSZ + STACK_TOP_MARGIN;
+	estacks = __this_cpu_read(cea_exception_stacks);
+	estack_top = (u64)estacks->DB_stack;
+	estack_bottom = estack_top + sizeof(estacks->DB_stack);
+	estack_top += STACK_TOP_MARGIN;
 	if (regs->sp >= estack_top && regs->sp <= estack_bottom)
 		return;
 



^ permalink raw reply	[flat|nested] 43+ messages in thread

* [patch 11/14] x86/dumpstack/64: Use cpu_entry_area instead of orig_ist
  2019-03-31 21:40 [patch 00/14] x86/exceptions: Add guard patches to IST stacks Thomas Gleixner
                   ` (9 preceding siblings ...)
  2019-03-31 21:40 ` [patch 10/14] x86/irq/64: Use cpu entry area " Thomas Gleixner
@ 2019-03-31 21:40 ` Thomas Gleixner
  2019-03-31 21:40 ` [patch 12/14] x86/cpu: Prepare TSS.IST setup for guard pages Thomas Gleixner
                   ` (3 subsequent siblings)
  14 siblings, 0 replies; 43+ messages in thread
From: Thomas Gleixner @ 2019-03-31 21:40 UTC (permalink / raw)
  To: LKML; +Cc: x86, Andy Lutomirski, Josh Poimboeuf

The orig_ist[] array is a shadow copy of the IST array in the TSS. The
reason why it exists is that older kernels used two TSS variants with
different pointers into the debug stack. orig_ist[] contains the real
starting points.

There is no point anymore to do so because the same information can be
retrieved using the base address of the cpu entry area mapping and the
offsets of the various exception stacks.

No functional change. Preparation for removing orig_ist.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
---
 arch/x86/kernel/dumpstack_64.c |   39 +++++++++++++++++++++++++++------------
 1 file changed, 27 insertions(+), 12 deletions(-)

--- a/arch/x86/kernel/dumpstack_64.c
+++ b/arch/x86/kernel/dumpstack_64.c
@@ -16,6 +16,7 @@
 #include <linux/bug.h>
 #include <linux/nmi.h>
 
+#include <asm/cpu_entry_area.h>
 #include <asm/stacktrace.h>
 
 static const char *exception_stack_names[N_EXCEPTION_STACKS] = {
@@ -25,11 +26,6 @@ static const char *exception_stack_names
 		[ MCE_IST		]	= "#MC",
 };
 
-static const unsigned long exception_stack_sizes[N_EXCEPTION_STACKS] = {
-	[0 ... N_EXCEPTION_STACKS - 1]		= EXCEPTION_STKSZ,
-	[DEBUG_IST]				= DEBUG_STKSZ
-};
-
 const char *stack_type_name(enum stack_type type)
 {
 	BUILD_BUG_ON(N_EXCEPTION_STACKS != 4);
@@ -52,25 +48,44 @@ const char *stack_type_name(enum stack_t
 	return NULL;
 }
 
+struct estack_layout {
+	unsigned int	begin;
+	unsigned int	end;
+};
+
+#define	ESTACK_ENTRY(x)	{						  \
+	.begin	= offsetof(struct cea_exception_stacks, x## _stack),	  \
+	.end	= offsetof(struct cea_exception_stacks, x## _stack_guard) \
+	}
+
+static const struct estack_layout layout[N_EXCEPTION_STACKS] = {
+	[ DOUBLEFAULT_IST	]	= ESTACK_ENTRY(DF),
+	[ NMI_IST		]	= ESTACK_ENTRY(NMI),
+	[ DEBUG_IST		]	= ESTACK_ENTRY(DB),
+	[ MCE_IST		]	= ESTACK_ENTRY(MCE),
+};
+
 static bool in_exception_stack(unsigned long *stack, struct stack_info *info)
 {
-	unsigned long *begin, *end;
+	unsigned long estacks, begin, end, stk = (unsigned long)stack;
 	struct pt_regs *regs;
-	unsigned k;
+	unsigned int k;
 
 	BUILD_BUG_ON(N_EXCEPTION_STACKS != 4);
 
+	estacks = (unsigned long)__this_cpu_read(cea_exception_stacks);
+
 	for (k = 0; k < N_EXCEPTION_STACKS; k++) {
-		end   = (unsigned long *)raw_cpu_ptr(&orig_ist)->ist[k];
-		begin = end - (exception_stack_sizes[k] / sizeof(long));
+		begin = estacks + layout[k].begin;
+		end   = estacks + layout[k].end;
 		regs  = (struct pt_regs *)end - 1;
 
-		if (stack <= begin || stack >= end)
+		if (stk <= begin || stk >= end)
 			continue;
 
 		info->type	= STACK_TYPE_EXCEPTION + k;
-		info->begin	= begin;
-		info->end	= end;
+		info->begin	= (unsigned long *)begin;
+		info->end	= (unsigned long *)end;
 		info->next_sp	= (unsigned long *)regs->sp;
 
 		return true;



^ permalink raw reply	[flat|nested] 43+ messages in thread

* [patch 12/14] x86/cpu: Prepare TSS.IST setup for guard pages
  2019-03-31 21:40 [patch 00/14] x86/exceptions: Add guard patches to IST stacks Thomas Gleixner
                   ` (10 preceding siblings ...)
  2019-03-31 21:40 ` [patch 11/14] x86/dumpstack/64: Use cpu_entry_area " Thomas Gleixner
@ 2019-03-31 21:40 ` Thomas Gleixner
  2019-04-02 16:57   ` Sean Christopherson
  2019-03-31 21:40 ` [patch 13/14] x86/cpu: Remove orig_ist array Thomas Gleixner
                   ` (2 subsequent siblings)
  14 siblings, 1 reply; 43+ messages in thread
From: Thomas Gleixner @ 2019-03-31 21:40 UTC (permalink / raw)
  To: LKML; +Cc: x86, Andy Lutomirski, Josh Poimboeuf

Convert the TSS.IST setup code to use the cpu entry area information
directly instead of assuming a linear mapping of the IST stacks.

The store to orig_ist[] is not longer required as there are no users
anymore.

This is the last preparatory step for IST guard pages.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
---
 arch/x86/kernel/cpu/common.c |   35 +++++++----------------------------
 1 file changed, 7 insertions(+), 28 deletions(-)

--- a/arch/x86/kernel/cpu/common.c
+++ b/arch/x86/kernel/cpu/common.c
@@ -507,19 +507,6 @@ void load_percpu_segment(int cpu)
 DEFINE_PER_CPU(struct cpu_entry_area *, cpu_entry_area);
 #endif
 
-#ifdef CONFIG_X86_64
-/*
- * Special IST stacks which the CPU switches to when it calls
- * an IST-marked descriptor entry. Up to 7 stacks (hardware
- * limit), all of them are 4K, except the debug stack which
- * is 8K.
- */
-static const unsigned int exception_stack_sizes[N_EXCEPTION_STACKS] = {
-	  [0 ... N_EXCEPTION_STACKS - 1]	= EXCEPTION_STKSZ,
-	  [DEBUG_IST]				= DEBUG_STKSZ
-};
-#endif
-
 /* Load the original GDT from the per-cpu structure */
 void load_direct_gdt(int cpu)
 {
@@ -1690,17 +1677,14 @@ static void setup_getcpu(int cpu)
  * initialized (naturally) in the bootstrap process, such as the GDT
  * and IDT. We reload them nevertheless, this function acts as a
  * 'CPU state barrier', nothing should get across.
- * A lot of state is already set up in PDA init for 64 bit
  */
 #ifdef CONFIG_X86_64
 
 void cpu_init(void)
 {
-	struct orig_ist *oist;
+	int cpu = raw_smp_processor_id();
 	struct task_struct *me;
 	struct tss_struct *t;
-	unsigned long v;
-	int cpu = raw_smp_processor_id();
 	int i;
 
 	wait_for_master_cpu(cpu);
@@ -1715,7 +1699,6 @@ void cpu_init(void)
 		load_ucode_ap();
 
 	t = &per_cpu(cpu_tss_rw, cpu);
-	oist = &per_cpu(orig_ist, cpu);
 
 #ifdef CONFIG_NUMA
 	if (this_cpu_read(numa_node) == 0 &&
@@ -1753,16 +1736,12 @@ void cpu_init(void)
 	/*
 	 * set up and load the per-CPU TSS
 	 */
-	if (!oist->ist[0]) {
-		char *estacks = (char *)&get_cpu_entry_area(cpu)->estacks;
-
-		for (v = 0; v < N_EXCEPTION_STACKS; v++) {
-			estacks += exception_stack_sizes[v];
-			oist->ist[v] = t->x86_tss.ist[v] =
-					(unsigned long)estacks;
-			if (v == DEBUG_IST)
-				per_cpu(debug_stack_addr, cpu) = (unsigned long)estacks;
-		}
+	if (!t->x86_tss.ist[0]) {
+		t->x86_tss.ist[DOUBLEFAULT_IST] = __this_cpu_ist_top_va(DF);
+		t->x86_tss.ist[NMI_IST] = __this_cpu_ist_top_va(NMI);
+		t->x86_tss.ist[DEBUG_IST] = __this_cpu_ist_top_va(DB);
+		t->x86_tss.ist[MCE_IST] = __this_cpu_ist_top_va(MCE);
+		per_cpu(debug_stack_addr, cpu) = t->x86_tss.ist[DEBUG_IST];
 	}
 
 	t->x86_tss.io_bitmap_base = IO_BITMAP_OFFSET;



^ permalink raw reply	[flat|nested] 43+ messages in thread

* [patch 13/14] x86/cpu: Remove orig_ist array
  2019-03-31 21:40 [patch 00/14] x86/exceptions: Add guard patches to IST stacks Thomas Gleixner
                   ` (11 preceding siblings ...)
  2019-03-31 21:40 ` [patch 12/14] x86/cpu: Prepare TSS.IST setup for guard pages Thomas Gleixner
@ 2019-03-31 21:40 ` Thomas Gleixner
  2019-03-31 21:40 ` [patch 14/14] x86/exceptions: Enable IST guard pages Thomas Gleixner
  2019-04-01  4:03 ` [patch 00/14] x86/exceptions: Add guard patches to IST stacks Andy Lutomirski
  14 siblings, 0 replies; 43+ messages in thread
From: Thomas Gleixner @ 2019-03-31 21:40 UTC (permalink / raw)
  To: LKML; +Cc: x86, Andy Lutomirski, Josh Poimboeuf

All users gone.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
---
 arch/x86/include/asm/processor.h |    9 ---------
 arch/x86/kernel/cpu/common.c     |    6 ------
 2 files changed, 15 deletions(-)

--- a/arch/x86/include/asm/processor.h
+++ b/arch/x86/include/asm/processor.h
@@ -374,16 +374,7 @@ DECLARE_PER_CPU(unsigned long, cpu_curre
 #define cpu_current_top_of_stack cpu_tss_rw.x86_tss.sp1
 #endif
 
-/*
- * Save the original ist values for checking stack pointers during debugging
- */
-struct orig_ist {
-	unsigned long		ist[7];
-};
-
 #ifdef CONFIG_X86_64
-DECLARE_PER_CPU(struct orig_ist, orig_ist);
-
 union irq_stack_union {
 	char irq_stack[IRQ_STACK_SIZE];
 	/*
--- a/arch/x86/kernel/cpu/common.c
+++ b/arch/x86/kernel/cpu/common.c
@@ -1549,12 +1549,6 @@ void syscall_init(void)
 	       X86_EFLAGS_IOPL|X86_EFLAGS_AC|X86_EFLAGS_NT);
 }
 
-/*
- * Copies of the original ist values from the tss are only accessed during
- * debugging, no special alignment required.
- */
-DEFINE_PER_CPU(struct orig_ist, orig_ist);
-
 static DEFINE_PER_CPU(unsigned long, debug_stack_addr);
 DEFINE_PER_CPU(int, debug_stack_usage);
 



^ permalink raw reply	[flat|nested] 43+ messages in thread

* [patch 14/14] x86/exceptions: Enable IST guard pages
  2019-03-31 21:40 [patch 00/14] x86/exceptions: Add guard patches to IST stacks Thomas Gleixner
                   ` (12 preceding siblings ...)
  2019-03-31 21:40 ` [patch 13/14] x86/cpu: Remove orig_ist array Thomas Gleixner
@ 2019-03-31 21:40 ` Thomas Gleixner
  2019-04-02 10:19   ` [patch 15/14] x86/dumpstack/64: Speedup in_exception_stack() Thomas Gleixner
  2019-04-01  4:03 ` [patch 00/14] x86/exceptions: Add guard patches to IST stacks Andy Lutomirski
  14 siblings, 1 reply; 43+ messages in thread
From: Thomas Gleixner @ 2019-03-31 21:40 UTC (permalink / raw)
  To: LKML; +Cc: x86, Andy Lutomirski, Josh Poimboeuf

All usage sites which expected that the exception stacks in the CPU entry
area are mapped linearly are fixed up. Enable guard pages between the
IST stacks.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
---
 arch/x86/include/asm/cpu_entry_area.h |    8 ++------
 1 file changed, 2 insertions(+), 6 deletions(-)

--- a/arch/x86/include/asm/cpu_entry_area.h
+++ b/arch/x86/include/asm/cpu_entry_area.h
@@ -25,13 +25,9 @@ struct exception_stacks {
 	ESTACKS_MEMBERS(0)
 };
 
-/*
- * The effective cpu entry area mapping with guard pages. Guard size is
- * zero until the code which makes assumptions about linear mapping is
- * cleaned up.
- */
+/* The effective cpu_entry_area mapping with guard pages */
 struct cea_exception_stacks {
-	ESTACKS_MEMBERS(0)
+	ESTACKS_MEMBERS(PAGE_SIZE)
 };
 #endif
 



^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [patch 00/14] x86/exceptions: Add guard patches to IST stacks
  2019-03-31 21:40 [patch 00/14] x86/exceptions: Add guard patches to IST stacks Thomas Gleixner
                   ` (13 preceding siblings ...)
  2019-03-31 21:40 ` [patch 14/14] x86/exceptions: Enable IST guard pages Thomas Gleixner
@ 2019-04-01  4:03 ` Andy Lutomirski
  2019-04-03 16:30   ` Thomas Gleixner
  14 siblings, 1 reply; 43+ messages in thread
From: Andy Lutomirski @ 2019-04-01  4:03 UTC (permalink / raw)
  To: Thomas Gleixner; +Cc: LKML, X86 ML, Andy Lutomirski, Josh Poimboeuf

On Sun, Mar 31, 2019 at 3:10 PM Thomas Gleixner <tglx@linutronix.de> wrote:
>
> Hi!
>
> While looking for something different I stumbled over the comment in struct
> cpu_entry_area:
>
>          * Exception stacks used for IST entries.
>          *
>          * In the future, this should have a separate slot for each stack
>          * with guard pages between them.
>
> As usual with such comments they are added in good faith and then
> forgotten. Looking what it takes to fix that let me stumble over some other
> leftovers like orig_ist[], now unused macros, useless defines and a bunch
> of assumptions about the exception stacks being a big lump. Aside of that I
> found a too broad check of the exception stack in the x86/64 stack overflow
> detector.
>
> The following series cleans that up and gradually prepares for guard pages
> between the IST stacks.

Thanks!  I'll review this over the next couple days.

Meanwhile, if you're inspired, I have a WIP series to do the same
thing to the IRQ stacks here:

https://git.kernel.org/pub/scm/linux/kernel/git/luto/linux.git/log/?h=x86/guard_pages

Want to take a look or pick it up if you want to keep working on this?

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [patch 02/14] x86/idt: Remove unused macro SISTG
  2019-03-31 21:40 ` [patch 02/14] x86/idt: Remove unused macro SISTG Thomas Gleixner
@ 2019-04-01  4:04   ` Andy Lutomirski
  0 siblings, 0 replies; 43+ messages in thread
From: Andy Lutomirski @ 2019-04-01  4:04 UTC (permalink / raw)
  To: Thomas Gleixner; +Cc: LKML, X86 ML, Andy Lutomirski, Josh Poimboeuf

On Sun, Mar 31, 2019 at 3:10 PM Thomas Gleixner <tglx@linutronix.de> wrote:
>
> d8ba61ba58c8 ("x86/entry/64: Don't use IST entry for #BP stack") removed
> the last user but left the macro around.

Reviewed-by: Andy Lutomirski <luto@kernel.org>

>
> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
> ---
>  arch/x86/kernel/idt.c |    4 ----
>  1 file changed, 4 deletions(-)
>
> --- a/arch/x86/kernel/idt.c
> +++ b/arch/x86/kernel/idt.c
> @@ -45,10 +45,6 @@ struct idt_data {
>  #define ISTG(_vector, _addr, _ist)                     \
>         G(_vector, _addr, _ist, GATE_INTERRUPT, DPL0, __KERNEL_CS)
>
> -/* System interrupt gate with interrupt stack */
> -#define SISTG(_vector, _addr, _ist)                    \
> -       G(_vector, _addr, _ist, GATE_INTERRUPT, DPL3, __KERNEL_CS)
> -
>  /* Task gate */
>  #define TSKG(_vector, _gdt)                            \
>         G(_vector, NULL, DEFAULT_STACK, GATE_TASK, DPL0, _gdt << 3)
>
>

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [patch 04/14] x86/exceptions: Make IST index zero based
  2019-03-31 21:40 ` [patch 04/14] x86/exceptions: Make IST index zero based Thomas Gleixner
@ 2019-04-01  7:30   ` Peter Zijlstra
  2019-04-01  7:33     ` Thomas Gleixner
  2019-04-02 16:49   ` Sean Christopherson
  2019-04-03 16:35   ` Borislav Petkov
  2 siblings, 1 reply; 43+ messages in thread
From: Peter Zijlstra @ 2019-04-01  7:30 UTC (permalink / raw)
  To: Thomas Gleixner; +Cc: LKML, x86, Andy Lutomirski, Josh Poimboeuf

On Sun, Mar 31, 2019 at 11:40:24PM +0200, Thomas Gleixner wrote:
> --- a/arch/x86/include/asm/page_64_types.h
> +++ b/arch/x86/include/asm/page_64_types.h
> @@ -25,11 +25,14 @@
>  #define IRQ_STACK_ORDER (2 + KASAN_STACK_ORDER)
>  #define IRQ_STACK_SIZE (PAGE_SIZE << IRQ_STACK_ORDER)
>  
> -#define DOUBLEFAULT_STACK 1
> -#define NMI_STACK 2
> -#define DEBUG_STACK 3
> -#define MCE_STACK 4
> -#define N_EXCEPTION_STACKS 4  /* hw limit: 7 */
> +/*
> + * The index for the tss.ist[] array. The hardware limit is 7 entries.
> + */
> +#define	DOUBLEFAULT_IST		0
> +#define	NMI_IST			1
> +#define	DEBUG_IST		2
> +#define	MCE_IST			3
> +#define	N_EXCEPTION_STACKS	4

Would it make sense to use an enum here?

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [patch 04/14] x86/exceptions: Make IST index zero based
  2019-04-01  7:30   ` Peter Zijlstra
@ 2019-04-01  7:33     ` Thomas Gleixner
  0 siblings, 0 replies; 43+ messages in thread
From: Thomas Gleixner @ 2019-04-01  7:33 UTC (permalink / raw)
  To: Peter Zijlstra; +Cc: LKML, x86, Andy Lutomirski, Josh Poimboeuf

On Mon, 1 Apr 2019, Peter Zijlstra wrote:

> On Sun, Mar 31, 2019 at 11:40:24PM +0200, Thomas Gleixner wrote:
> > --- a/arch/x86/include/asm/page_64_types.h
> > +++ b/arch/x86/include/asm/page_64_types.h
> > @@ -25,11 +25,14 @@
> >  #define IRQ_STACK_ORDER (2 + KASAN_STACK_ORDER)
> >  #define IRQ_STACK_SIZE (PAGE_SIZE << IRQ_STACK_ORDER)
> >  
> > -#define DOUBLEFAULT_STACK 1
> > -#define NMI_STACK 2
> > -#define DEBUG_STACK 3
> > -#define MCE_STACK 4
> > -#define N_EXCEPTION_STACKS 4  /* hw limit: 7 */
> > +/*
> > + * The index for the tss.ist[] array. The hardware limit is 7 entries.
> > + */
> > +#define	DOUBLEFAULT_IST		0
> > +#define	NMI_IST			1
> > +#define	DEBUG_IST		2
> > +#define	MCE_IST			3
> > +#define	N_EXCEPTION_STACKS	4
> 
> Would it make sense to use an enum here?

Yes, but ASM code hates enums. We could solve that by moving it to a
different header and exposing the necessary define via asm-offsets. I'll
have a look.

Thanks,

	tglx


^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [patch 01/14] x86/irq/64: Limit IST stack overflow check to #DB stack
  2019-03-31 21:40 ` [patch 01/14] x86/irq/64: Limit IST stack overflow check to #DB stack Thomas Gleixner
@ 2019-04-01 18:03   ` Borislav Petkov
  2019-04-02 16:34   ` Sean Christopherson
  1 sibling, 0 replies; 43+ messages in thread
From: Borislav Petkov @ 2019-04-01 18:03 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: LKML, x86, Andy Lutomirski, Josh Poimboeuf, Mitsuo Hayasaka

On Sun, Mar 31, 2019 at 11:40:21PM +0200, Thomas Gleixner wrote:
> 37fe6a42b343 ("x86: Check stack overflow in detail") added a broad check
> for the full exception stack area, i.e. it considers the full exception
> stack are as valid.
> 
> That's wrong in two aspects:
> 
>  1) It does not check the individual areas one by one
> 
>  2) #DF, NMI and #MCE are not enabling interrupts which means that a
>     regular device interrupt cannot happen in their context. In fact if a
>     device interrupt hits one of those IST stacks that's a bug because some
>     code path enabled interrupts while handling the exception.
> 
> Limit the check to the #DB stack and consider all other IST stacks as
> 'overflow' or invalid.
> 
> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
> Cc: Mitsuo Hayasaka <mitsuo.hayasaka.hu@hitachi.com>
> ---
>  arch/x86/kernel/irq_64.c |   19 ++++++++++++++-----
>  1 file changed, 14 insertions(+), 5 deletions(-)
> 
> --- a/arch/x86/kernel/irq_64.c
> +++ b/arch/x86/kernel/irq_64.c
> @@ -26,9 +26,18 @@ int sysctl_panic_on_stackoverflow;
>  /*
>   * Probabilistic stack overflow check:
>   *
> - * Only check the stack in process context, because everything else
> - * runs on the big interrupt stacks. Checking reliably is too expensive,
> - * so we just check from interrupts.
> + * Regular device interrupts can enter on the following stacks:
> + *
> + * - User stack
> + *
> + * - Kernel task stack
> + *
> + * - Interrupt stack if a device driver reenables interrupt

						     "interrupts"

> + *   which should only happen in really old drivers.
> + *
> + * - Debug IST stack
> + *
> + * All other contexts are invalid.

<---

I could use the above blurb here too, explaining why we're checking the
#DB stack only.

Otherwise:

Reviewed-by: Borislav Petkov <bp@suse.de>

-- 
Regards/Gruss,
    Boris.

Good mailing practices for 400: avoid top-posting and trim the reply.

^ permalink raw reply	[flat|nested] 43+ messages in thread

* [patch 15/14] x86/dumpstack/64: Speedup in_exception_stack()
  2019-03-31 21:40 ` [patch 14/14] x86/exceptions: Enable IST guard pages Thomas Gleixner
@ 2019-04-02 10:19   ` Thomas Gleixner
  2019-04-02 15:43     ` Josh Poimboeuf
  0 siblings, 1 reply; 43+ messages in thread
From: Thomas Gleixner @ 2019-04-02 10:19 UTC (permalink / raw)
  To: LKML; +Cc: x86, Andy Lutomirski, Josh Poimboeuf

The current implementation of in_exception_stack() iterates over the
exception stacks array. Most of the time this is an useless exercise, but
even for the actual use cases (perf and ftrace) it takes at least 2
iterations to get to the NMI stack.

As the exception stacks and the guard pages are page aligned the loop can
be avoided completely.

Add a initial check whether the stack pointer is inside the full exception
stack area and leave early if not.

Create a lookup table which describes the stack area. The table index is
the page offset from the beginning of the exception stacks. So for any
given stack pointer the page offset is computed and a lookup in the
description table is performed. If it is inside a guard page, return. If
not, use the descriptor to fill in the info structure.

The table is filled at compile time with nasty macro magic and for the
!KASAN case the interesting page descriptors exactly fit into a single
cache line. Just the last guard page descriptor is in the next cacheline,
but that should not be accessed in the regular case.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
---
 arch/x86/kernel/dumpstack_64.c |   97 +++++++++++++++++++++++++++--------------
 1 file changed, 66 insertions(+), 31 deletions(-)

--- a/arch/x86/kernel/dumpstack_64.c
+++ b/arch/x86/kernel/dumpstack_64.c
@@ -48,50 +48,85 @@ const char *stack_type_name(enum stack_t
 	return NULL;
 }
 
-struct estack_layout {
-	unsigned int	begin;
-	unsigned int	end;
+#define ESTACK_S(st)						\
+	(offsetof(struct cea_exception_stacks, st## _stack))
+
+#define ESTACK_E(st)							\
+	(offsetof(struct cea_exception_stacks, st## _stack_guard))
+
+#define PAGENR(offs)	((offs) / PAGE_SIZE)
+#define PAGERANGE(st)	PAGENR(ESTACK_S(st)) ... PAGENR(ESTACK_E(st) - 1)
+
+#if EXCEPTION_STKSZ == PAGE_SIZE
+# define CONDRANGE(st)	PAGENR(ESTACK_S(st))
+#else
+# define CONDRANGE(st)	PAGERANGE(st)
+#endif
+
+/**
+ * struct estack_pages - Page descriptor for exception stacks
+ * @offs:	Offset from the start of the exception stack area
+ * @size:	Size of the exception stack
+ * @type:	Type to store in the stack_info struct
+ */
+struct estack_pages {
+	u32	offs;
+	u16	size;
+	u16	type;
 };
 
-#define	ESTACK_ENTRY(x)	{						  \
-	.begin	= offsetof(struct cea_exception_stacks, x## _stack),	  \
-	.end	= offsetof(struct cea_exception_stacks, x## _stack_guard) \
+#define ESTACK_PAGE(ist, est) {				\
+	.offs	= ESTACK_S(est),			\
+	.size	= ESTACK_E(est) - ESTACK_S(est),	\
+	.type	= STACK_TYPE_EXCEPTION + ist,		\
 	}
 
-static const struct estack_layout layout[N_EXCEPTION_STACKS] = {
-	[ DOUBLEFAULT_IST	]	= ESTACK_ENTRY(DF),
-	[ NMI_IST		]	= ESTACK_ENTRY(NMI),
-	[ DEBUG_IST		]	= ESTACK_ENTRY(DB),
-	[ MCE_IST		]	= ESTACK_ENTRY(MCE),
+#define ESTACK_PAGES	(sizeof(struct cea_exception_stacks) / PAGE_SIZE)
+
+/*
+ * Array of exception stack page descriptors. If the stack is larger than
+ * PAGE_SIZE, all pages covering a particular stack will have the same
+ * info.
+ */
+static const struct estack_pages estack_pages[ESTACK_PAGES] ____cacheline_aligned = {
+	[CONDRANGE(DF)]		= ESTACK_PAGE(DOUBLEFAULT_IST, DF),
+	[CONDRANGE(NMI)]	= ESTACK_PAGE(NMI_IST, NMI),
+	[PAGERANGE(DB)]		= ESTACK_PAGE(DEBUG_IST, DB),
+	[CONDRANGE(MCE)]	= ESTACK_PAGE(MCE_IST, MCE),
 };
 
 static bool in_exception_stack(unsigned long *stack, struct stack_info *info)
 {
-	unsigned long estacks, begin, end, stk = (unsigned long)stack;
+	unsigned long begin, end, stk = (unsigned long)stack;
+	const struct estack_pages *ep;
 	struct pt_regs *regs;
 	unsigned int k;
 
 	BUILD_BUG_ON(N_EXCEPTION_STACKS != 4);
 
-	estacks = (unsigned long)__this_cpu_read(cea_exception_stacks);
-
-	for (k = 0; k < N_EXCEPTION_STACKS; k++) {
-		begin = estacks + layout[k].begin;
-		end   = estacks + layout[k].end;
-		regs  = (struct pt_regs *)end - 1;
-
-		if (stk <= begin || stk >= end)
-			continue;
-
-		info->type	= STACK_TYPE_EXCEPTION + k;
-		info->begin	= (unsigned long *)begin;
-		info->end	= (unsigned long *)end;
-		info->next_sp	= (unsigned long *)regs->sp;
-
-		return true;
-	}
-
-	return false;
+	begin = (unsigned long)__this_cpu_read(cea_exception_stacks);
+	end = begin + sizeof(struct cea_exception_stacks);
+	/* Bail if @stack is outside the exception stack area. */
+	if (stk <= begin || stk >= end)
+		return false;
+
+	/* Calc page offset from start of exception stacks */
+	k = (stk - begin) >> PAGE_SHIFT;
+	/* Lookup the page descriptor */
+	ep = &estack_pages[k];
+	/* Guard page? */
+	if (unlikely(!ep->size))
+		return false;
+
+	begin += (unsigned long)ep->offs;
+	end = begin + (unsigned long)ep->size;
+	regs = (struct pt_regs *)end - 1;
+
+	info->type	= ep->type;
+	info->begin	= (unsigned long *)begin;
+	info->end	= (unsigned long *)end;
+	info->next_sp	= (unsigned long *)regs->sp;
+	return true;
 }
 
 static bool in_irq_stack(unsigned long *stack, struct stack_info *info)

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [patch 15/14] x86/dumpstack/64: Speedup in_exception_stack()
  2019-04-02 10:19   ` [patch 15/14] x86/dumpstack/64: Speedup in_exception_stack() Thomas Gleixner
@ 2019-04-02 15:43     ` Josh Poimboeuf
  2019-04-02 15:48       ` Thomas Gleixner
  2019-04-03  8:02       ` Peter Zijlstra
  0 siblings, 2 replies; 43+ messages in thread
From: Josh Poimboeuf @ 2019-04-02 15:43 UTC (permalink / raw)
  To: Thomas Gleixner; +Cc: LKML, x86, Andy Lutomirski

On Tue, Apr 02, 2019 at 12:19:46PM +0200, Thomas Gleixner wrote:
> The current implementation of in_exception_stack() iterates over the
> exception stacks array. Most of the time this is an useless exercise, but
> even for the actual use cases (perf and ftrace) it takes at least 2
> iterations to get to the NMI stack.
> 
> As the exception stacks and the guard pages are page aligned the loop can
> be avoided completely.
> 
> Add a initial check whether the stack pointer is inside the full exception
> stack area and leave early if not.
> 
> Create a lookup table which describes the stack area. The table index is
> the page offset from the beginning of the exception stacks. So for any
> given stack pointer the page offset is computed and a lookup in the
> description table is performed. If it is inside a guard page, return. If
> not, use the descriptor to fill in the info structure.
> 
> The table is filled at compile time with nasty macro magic and for the
> !KASAN case the interesting page descriptors exactly fit into a single
> cache line. Just the last guard page descriptor is in the next cacheline,
> but that should not be accessed in the regular case.
> 
> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
> ---
>  arch/x86/kernel/dumpstack_64.c |   97 +++++++++++++++++++++++++++--------------
>  1 file changed, 66 insertions(+), 31 deletions(-)
> 
> --- a/arch/x86/kernel/dumpstack_64.c
> +++ b/arch/x86/kernel/dumpstack_64.c
> @@ -48,50 +48,85 @@ const char *stack_type_name(enum stack_t
>  	return NULL;
>  }
>  
> -struct estack_layout {
> -	unsigned int	begin;
> -	unsigned int	end;
> +#define ESTACK_S(st)						\
> +	(offsetof(struct cea_exception_stacks, st## _stack))
> +
> +#define ESTACK_E(st)							\
> +	(offsetof(struct cea_exception_stacks, st## _stack_guard))
> +
> +#define PAGENR(offs)	((offs) / PAGE_SIZE)
> +#define PAGERANGE(st)	PAGENR(ESTACK_S(st)) ... PAGENR(ESTACK_E(st) - 1)
> +
> +#if EXCEPTION_STKSZ == PAGE_SIZE
> +# define CONDRANGE(st)	PAGENR(ESTACK_S(st))
> +#else
> +# define CONDRANGE(st)	PAGERANGE(st)
> +#endif
> +
> +/**
> + * struct estack_pages - Page descriptor for exception stacks
> + * @offs:	Offset from the start of the exception stack area
> + * @size:	Size of the exception stack
> + * @type:	Type to store in the stack_info struct
> + */
> +struct estack_pages {
> +	u32	offs;
> +	u16	size;
> +	u16	type;
>  };
>  
> -#define	ESTACK_ENTRY(x)	{						  \
> -	.begin	= offsetof(struct cea_exception_stacks, x## _stack),	  \
> -	.end	= offsetof(struct cea_exception_stacks, x## _stack_guard) \
> +#define ESTACK_PAGE(ist, est) {				\
> +	.offs	= ESTACK_S(est),			\
> +	.size	= ESTACK_E(est) - ESTACK_S(est),	\
> +	.type	= STACK_TYPE_EXCEPTION + ist,		\
>  	}
>  
> -static const struct estack_layout layout[N_EXCEPTION_STACKS] = {
> -	[ DOUBLEFAULT_IST	]	= ESTACK_ENTRY(DF),
> -	[ NMI_IST		]	= ESTACK_ENTRY(NMI),
> -	[ DEBUG_IST		]	= ESTACK_ENTRY(DB),
> -	[ MCE_IST		]	= ESTACK_ENTRY(MCE),
> +#define ESTACK_PAGES	(sizeof(struct cea_exception_stacks) / PAGE_SIZE)
> +
> +/*
> + * Array of exception stack page descriptors. If the stack is larger than
> + * PAGE_SIZE, all pages covering a particular stack will have the same
> + * info.
> + */
> +static const struct estack_pages estack_pages[ESTACK_PAGES] ____cacheline_aligned = {
> +	[CONDRANGE(DF)]		= ESTACK_PAGE(DOUBLEFAULT_IST, DF),
> +	[CONDRANGE(NMI)]	= ESTACK_PAGE(NMI_IST, NMI),
> +	[PAGERANGE(DB)]		= ESTACK_PAGE(DEBUG_IST, DB),
> +	[CONDRANGE(MCE)]	= ESTACK_PAGE(MCE_IST, MCE),

It would be nice if the *_IST macro naming aligned with the struct
cea_exception_stacks field naming.  Then you could just do, e.g.
ESTACKPAGE(DF).

Also it's a bit unfortunate that some of the stack size knowledge is
hard-coded here, i.e #DB always being > 1 page and non-#DB being
sometimes 1 page.

>  };
>  
>  static bool in_exception_stack(unsigned long *stack, struct stack_info *info)
>  {
> -	unsigned long estacks, begin, end, stk = (unsigned long)stack;
> +	unsigned long begin, end, stk = (unsigned long)stack;
> +	const struct estack_pages *ep;
>  	struct pt_regs *regs;
>  	unsigned int k;
>  
>  	BUILD_BUG_ON(N_EXCEPTION_STACKS != 4);
>  
> -	estacks = (unsigned long)__this_cpu_read(cea_exception_stacks);
> -
> -	for (k = 0; k < N_EXCEPTION_STACKS; k++) {
> -		begin = estacks + layout[k].begin;
> -		end   = estacks + layout[k].end;
> -		regs  = (struct pt_regs *)end - 1;
> -
> -		if (stk <= begin || stk >= end)
> -			continue;
> -
> -		info->type	= STACK_TYPE_EXCEPTION + k;
> -		info->begin	= (unsigned long *)begin;
> -		info->end	= (unsigned long *)end;
> -		info->next_sp	= (unsigned long *)regs->sp;
> -
> -		return true;
> -	}
> -
> -	return false;
> +	begin = (unsigned long)__this_cpu_read(cea_exception_stacks);
> +	end = begin + sizeof(struct cea_exception_stacks);
> +	/* Bail if @stack is outside the exception stack area. */
> +	if (stk <= begin || stk >= end)
> +		return false;

This check is the most important piece.  Exception stack dumps are quite
rare, so this ensures an early exit in most cases regardless of whether
there's a loop below.

> +
> +	/* Calc page offset from start of exception stacks */
> +	k = (stk - begin) >> PAGE_SHIFT;
> +	/* Lookup the page descriptor */
> +	ep = &estack_pages[k];
> +	/* Guard page? */
> +	if (unlikely(!ep->size))
> +		return false;
> +
> +	begin += (unsigned long)ep->offs;
> +	end = begin + (unsigned long)ep->size;
> +	regs = (struct pt_regs *)end - 1;
> +
> +	info->type	= ep->type;
> +	info->begin	= (unsigned long *)begin;
> +	info->end	= (unsigned long *)end;
> +	info->next_sp	= (unsigned long *)regs->sp;
> +	return true;

With the above "(stk <= begin || stk >= end)" check, removing the loop
becomes not all that important since exception stack dumps are quite
rare and not performance sensitive.  With all the macros this code
becomes a little more obtuse, so I'm not sure whether removal of the
loop is a net positive.

>  }
>  
>  static bool in_irq_stack(unsigned long *stack, struct stack_info *info)

-- 
Josh

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [patch 15/14] x86/dumpstack/64: Speedup in_exception_stack()
  2019-04-02 15:43     ` Josh Poimboeuf
@ 2019-04-02 15:48       ` Thomas Gleixner
  2019-04-02 15:51         ` Josh Poimboeuf
                           ` (2 more replies)
  2019-04-03  8:02       ` Peter Zijlstra
  1 sibling, 3 replies; 43+ messages in thread
From: Thomas Gleixner @ 2019-04-02 15:48 UTC (permalink / raw)
  To: Josh Poimboeuf; +Cc: LKML, x86, Andy Lutomirski

On Tue, 2 Apr 2019, Josh Poimboeuf wrote:
> On Tue, Apr 02, 2019 at 12:19:46PM +0200, Thomas Gleixner wrote:
> > +/*
> > + * Array of exception stack page descriptors. If the stack is larger than
> > + * PAGE_SIZE, all pages covering a particular stack will have the same
> > + * info.
> > + */
> > +static const struct estack_pages estack_pages[ESTACK_PAGES] ____cacheline_aligned = {
> > +	[CONDRANGE(DF)]		= ESTACK_PAGE(DOUBLEFAULT_IST, DF),
> > +	[CONDRANGE(NMI)]	= ESTACK_PAGE(NMI_IST, NMI),
> > +	[PAGERANGE(DB)]		= ESTACK_PAGE(DEBUG_IST, DB),
> > +	[CONDRANGE(MCE)]	= ESTACK_PAGE(MCE_IST, MCE),
> 
> It would be nice if the *_IST macro naming aligned with the struct
> cea_exception_stacks field naming.  Then you could just do, e.g.
> ESTACKPAGE(DF).

Yes, lemme fix that up.

> Also it's a bit unfortunate that some of the stack size knowledge is
> hard-coded here, i.e #DB always being > 1 page and non-#DB being
> sometimes 1 page.

The problem is that there is no way to make this macro maze conditional on
sizeof(). But my macro foo is rusty.

> > +	begin = (unsigned long)__this_cpu_read(cea_exception_stacks);
> > +	end = begin + sizeof(struct cea_exception_stacks);
> > +	/* Bail if @stack is outside the exception stack area. */
> > +	if (stk <= begin || stk >= end)
> > +		return false;
> 
> This check is the most important piece.  Exception stack dumps are quite
> rare, so this ensures an early exit in most cases regardless of whether
> there's a loop below.
> 
> > +
> > +	/* Calc page offset from start of exception stacks */
> > +	k = (stk - begin) >> PAGE_SHIFT;
> > +	/* Lookup the page descriptor */
> > +	ep = &estack_pages[k];
> > +	/* Guard page? */
> > +	if (unlikely(!ep->size))
> > +		return false;
> > +
> > +	begin += (unsigned long)ep->offs;
> > +	end = begin + (unsigned long)ep->size;
> > +	regs = (struct pt_regs *)end - 1;
> > +
> > +	info->type	= ep->type;
> > +	info->begin	= (unsigned long *)begin;
> > +	info->end	= (unsigned long *)end;
> > +	info->next_sp	= (unsigned long *)regs->sp;
> > +	return true;
> 
> With the above "(stk <= begin || stk >= end)" check, removing the loop
> becomes not all that important since exception stack dumps are quite
> rare and not performance sensitive.  With all the macros this code
> becomes a little more obtuse, so I'm not sure whether removal of the
> loop is a net positive.

What about perf? It's NMI context and probably starts from there. Peter?

Thanks,

	tglx

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [patch 15/14] x86/dumpstack/64: Speedup in_exception_stack()
  2019-04-02 15:48       ` Thomas Gleixner
@ 2019-04-02 15:51         ` Josh Poimboeuf
  2019-04-02 15:53           ` Josh Poimboeuf
  2019-04-03  8:08           ` Peter Zijlstra
  2019-04-02 16:11         ` Andy Lutomirski
  2019-04-02 19:02         ` Rasmus Villemoes
  2 siblings, 2 replies; 43+ messages in thread
From: Josh Poimboeuf @ 2019-04-02 15:51 UTC (permalink / raw)
  To: Thomas Gleixner; +Cc: LKML, x86, Andy Lutomirski

On Tue, Apr 02, 2019 at 05:48:56PM +0200, Thomas Gleixner wrote:
> > With the above "(stk <= begin || stk >= end)" check, removing the loop
> > becomes not all that important since exception stack dumps are quite
> > rare and not performance sensitive.  With all the macros this code
> > becomes a little more obtuse, so I'm not sure whether removal of the
> > loop is a net positive.
> 
> What about perf? It's NMI context and probably starts from there. Peter?

I believe perf unwinds starting from the regs from the context which was
interrupted by the NMI.

-- 
Josh

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [patch 15/14] x86/dumpstack/64: Speedup in_exception_stack()
  2019-04-02 15:51         ` Josh Poimboeuf
@ 2019-04-02 15:53           ` Josh Poimboeuf
  2019-04-03  8:08           ` Peter Zijlstra
  1 sibling, 0 replies; 43+ messages in thread
From: Josh Poimboeuf @ 2019-04-02 15:53 UTC (permalink / raw)
  To: Thomas Gleixner; +Cc: LKML, x86, Andy Lutomirski, Peter Zijlstra

On Tue, Apr 02, 2019 at 10:51:49AM -0500, Josh Poimboeuf wrote:
> On Tue, Apr 02, 2019 at 05:48:56PM +0200, Thomas Gleixner wrote:
> > > With the above "(stk <= begin || stk >= end)" check, removing the loop
> > > becomes not all that important since exception stack dumps are quite
> > > rare and not performance sensitive.  With all the macros this code
> > > becomes a little more obtuse, so I'm not sure whether removal of the
> > > loop is a net positive.
> > 
> > What about perf? It's NMI context and probably starts from there. Peter?
> 
> I believe perf unwinds starting from the regs from the context which was
> interrupted by the NMI.

Adding Peter to keep me honest.

-- 
Josh

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [patch 15/14] x86/dumpstack/64: Speedup in_exception_stack()
  2019-04-02 15:48       ` Thomas Gleixner
  2019-04-02 15:51         ` Josh Poimboeuf
@ 2019-04-02 16:11         ` Andy Lutomirski
  2019-04-02 18:27           ` Thomas Gleixner
  2019-04-02 19:02         ` Rasmus Villemoes
  2 siblings, 1 reply; 43+ messages in thread
From: Andy Lutomirski @ 2019-04-02 16:11 UTC (permalink / raw)
  To: Thomas Gleixner; +Cc: Josh Poimboeuf, LKML, x86, Andy Lutomirski



> On Apr 2, 2019, at 9:48 AM, Thomas Gleixner <tglx@linutronix.de> wrote:
> 
>> On Tue, 2 Apr 2019, Josh Poimboeuf wrote:
>>> On Tue, Apr 02, 2019 at 12:19:46PM +0200, Thomas Gleixner wrote:
>>> +/*
>>> + * Array of exception stack page descriptors. If the stack is larger than
>>> + * PAGE_SIZE, all pages covering a particular stack will have the same
>>> + * info.
>>> + */
>>> +static const struct estack_pages estack_pages[ESTACK_PAGES] ____cacheline_aligned = {
>>> +    [CONDRANGE(DF)]        = ESTACK_PAGE(DOUBLEFAULT_IST, DF),
>>> +    [CONDRANGE(NMI)]    = ESTACK_PAGE(NMI_IST, NMI),
>>> +    [PAGERANGE(DB)]        = ESTACK_PAGE(DEBUG_IST, DB),
>>> +    [CONDRANGE(MCE)]    = ESTACK_PAGE(MCE_IST, MCE),
>> 
>> It would be nice if the *_IST macro naming aligned with the struct
>> cea_exception_stacks field naming.  Then you could just do, e.g.
>> ESTACKPAGE(DF).
> 
> Yes, lemme fix that up.
> 
>> Also it's a bit unfortunate that some of the stack size knowledge is
>> hard-coded here, i.e #DB always being > 1 page and non-#DB being
>> sometimes 1 page.
> 
> The problem is that there is no way to make this macro maze conditional on
> sizeof(). But my macro foo is rusty.

How about a much better fix: make the DB stack be the same size as all the others and just have 4 of them (DB0, DB1, DB2, and DB3.  After all, overflowing from one debug stack into another is just as much of a bug as overflowing into a different IST stack.

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [patch 01/14] x86/irq/64: Limit IST stack overflow check to #DB stack
  2019-03-31 21:40 ` [patch 01/14] x86/irq/64: Limit IST stack overflow check to #DB stack Thomas Gleixner
  2019-04-01 18:03   ` Borislav Petkov
@ 2019-04-02 16:34   ` Sean Christopherson
  1 sibling, 0 replies; 43+ messages in thread
From: Sean Christopherson @ 2019-04-02 16:34 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: LKML, x86, Andy Lutomirski, Josh Poimboeuf, Mitsuo Hayasaka

On Sun, Mar 31, 2019 at 11:40:21PM +0200, Thomas Gleixner wrote:
> 37fe6a42b343 ("x86: Check stack overflow in detail") added a broad check
> for the full exception stack area, i.e. it considers the full exception
> stack are as valid.
        ^^^
        area

> 
> That's wrong in two aspects:
> 
>  1) It does not check the individual areas one by one
> 
>  2) #DF, NMI and #MCE are not enabling interrupts which means that a
>     regular device interrupt cannot happen in their context. In fact if a
>     device interrupt hits one of those IST stacks that's a bug because some
>     code path enabled interrupts while handling the exception.
> 
> Limit the check to the #DB stack and consider all other IST stacks as
> 'overflow' or invalid.
> 
> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
> Cc: Mitsuo Hayasaka <mitsuo.hayasaka.hu@hitachi.com>
> ---
>  arch/x86/kernel/irq_64.c |   19 ++++++++++++++-----
>  1 file changed, 14 insertions(+), 5 deletions(-)
> 
> --- a/arch/x86/kernel/irq_64.c
> +++ b/arch/x86/kernel/irq_64.c
> @@ -26,9 +26,18 @@ int sysctl_panic_on_stackoverflow;
>  /*
>   * Probabilistic stack overflow check:
>   *
> - * Only check the stack in process context, because everything else
> - * runs on the big interrupt stacks. Checking reliably is too expensive,
> - * so we just check from interrupts.
> + * Regular device interrupts can enter on the following stacks:
> + *
> + * - User stack
> + *
> + * - Kernel task stack
> + *
> + * - Interrupt stack if a device driver reenables interrupt
> + *   which should only happen in really old drivers.
> + *
> + * - Debug IST stack
> + *
> + * All other contexts are invalid.
>   */
>  static inline void stack_overflow_check(struct pt_regs *regs)
>  {
> @@ -53,8 +62,8 @@ static inline void stack_overflow_check(
>  		return;
>  
>  	oist = this_cpu_ptr(&orig_ist);
> -	estack_top = (u64)oist->ist[0] - EXCEPTION_STKSZ + STACK_TOP_MARGIN;
> -	estack_bottom = (u64)oist->ist[N_EXCEPTION_STACKS - 1];
> +	estack_bottom = (u64)oist->ist[DEBUG_STACK];
> +	estack_top = estack_bottom - DEBUG_STKSZ + STACK_TOP_MARGIN;
>  	if (regs->sp >= estack_top && regs->sp <= estack_bottom)
>  		return;
>  
> 
> 

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [patch 04/14] x86/exceptions: Make IST index zero based
  2019-03-31 21:40 ` [patch 04/14] x86/exceptions: Make IST index zero based Thomas Gleixner
  2019-04-01  7:30   ` Peter Zijlstra
@ 2019-04-02 16:49   ` Sean Christopherson
  2019-04-03 16:35   ` Borislav Petkov
  2 siblings, 0 replies; 43+ messages in thread
From: Sean Christopherson @ 2019-04-02 16:49 UTC (permalink / raw)
  To: Thomas Gleixner; +Cc: LKML, x86, Andy Lutomirski, Josh Poimboeuf

On Sun, Mar 31, 2019 at 11:40:24PM +0200, Thomas Gleixner wrote:
> The defines for the exception stack (IST) array in the TSS are using the
> SDM convention IST1 - IST7. That causes all sorts of code to subtract 1 for
> array indices related to IST. That's confusing at best and does not provide
> any value.
> 
> Make the indices zero based and fixup the usage sites. The only code which
> needs to adjust the 0 based index is the interrupt descriptor setup which
> needs to add 1 now.
> 
> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
> ---
>  Documentation/x86/kernel-stacks      |    8 ++++----
>  arch/x86/entry/entry_64.S            |    4 ++--
>  arch/x86/include/asm/page_64_types.h |   13 ++++++++-----
>  arch/x86/kernel/cpu/common.c         |    4 ++--
>  arch/x86/kernel/dumpstack_64.c       |   14 +++++++-------
>  arch/x86/kernel/idt.c                |   15 +++++++++------
>  6 files changed, 32 insertions(+), 26 deletions(-)
> 
> --- a/Documentation/x86/kernel-stacks
> +++ b/Documentation/x86/kernel-stacks
> @@ -59,7 +59,7 @@ If that assumption is ever broken then t
>  
>  The currently assigned IST stacks are :-
>  
> -* DOUBLEFAULT_STACK.  EXCEPTION_STKSZ (PAGE_SIZE).
> +* DOUBLEFAULT_IST.  EXCEPTION_STKSZ (PAGE_SIZE).
>  
>    Used for interrupt 8 - Double Fault Exception (#DF).
>  
> @@ -68,7 +68,7 @@ The currently assigned IST stacks are :-
>    Using a separate stack allows the kernel to recover from it well enough
>    in many cases to still output an oops.
>  
> -* NMI_STACK.  EXCEPTION_STKSZ (PAGE_SIZE).
> +* NMI_IST.  EXCEPTION_STKSZ (PAGE_SIZE).
>  
>    Used for non-maskable interrupts (NMI).
>  
> @@ -76,7 +76,7 @@ The currently assigned IST stacks are :-
>    middle of switching stacks.  Using IST for NMI events avoids making
>    assumptions about the previous state of the kernel stack.
>  
> -* DEBUG_STACK.  DEBUG_STKSZ
> +* DEBUG_IST.  DEBUG_STKSZ

What about <name>_STACK_IST, or is that too verbose?  It'd be nice to have
the extra mental cue that IST is referring the 64 exception stacks.

>  
>    Used for hardware debug interrupts (interrupt 1) and for software
>    debug interrupts (INT3).
> @@ -86,7 +86,7 @@ The currently assigned IST stacks are :-
>    avoids making assumptions about the previous state of the kernel
>    stack.
>  
> -* MCE_STACK.  EXCEPTION_STKSZ (PAGE_SIZE).
> +* MCE_IST.  EXCEPTION_STKSZ (PAGE_SIZE).
>  
>    Used for interrupt 18 - Machine Check Exception (#MC).
>  

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [patch 12/14] x86/cpu: Prepare TSS.IST setup for guard pages
  2019-03-31 21:40 ` [patch 12/14] x86/cpu: Prepare TSS.IST setup for guard pages Thomas Gleixner
@ 2019-04-02 16:57   ` Sean Christopherson
  0 siblings, 0 replies; 43+ messages in thread
From: Sean Christopherson @ 2019-04-02 16:57 UTC (permalink / raw)
  To: Thomas Gleixner; +Cc: LKML, x86, Andy Lutomirski, Josh Poimboeuf

On Sun, Mar 31, 2019 at 11:40:32PM +0200, Thomas Gleixner wrote:
> Convert the TSS.IST setup code to use the cpu entry area information
> directly instead of assuming a linear mapping of the IST stacks.
> 
> The store to orig_ist[] is not longer required as there are no users
                             ^^^
                             no
> anymore.
> 
> This is the last preparatory step for IST guard pages.
> 
> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
> ---

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [patch 15/14] x86/dumpstack/64: Speedup in_exception_stack()
  2019-04-02 16:11         ` Andy Lutomirski
@ 2019-04-02 18:27           ` Thomas Gleixner
  2019-04-02 19:29             ` Thomas Gleixner
  0 siblings, 1 reply; 43+ messages in thread
From: Thomas Gleixner @ 2019-04-02 18:27 UTC (permalink / raw)
  To: Andy Lutomirski; +Cc: Josh Poimboeuf, LKML, x86, Andy Lutomirski

On Tue, 2 Apr 2019, Andy Lutomirski wrote:
> > On Apr 2, 2019, at 9:48 AM, Thomas Gleixner <tglx@linutronix.de> wrote:
> > 
> >> On Tue, 2 Apr 2019, Josh Poimboeuf wrote:
> >>> On Tue, Apr 02, 2019 at 12:19:46PM +0200, Thomas Gleixner wrote:
> >>> +/*
> >>> + * Array of exception stack page descriptors. If the stack is larger than
> >>> + * PAGE_SIZE, all pages covering a particular stack will have the same
> >>> + * info.
> >>> + */
> >>> +static const struct estack_pages estack_pages[ESTACK_PAGES] ____cacheline_aligned = {
> >>> +    [CONDRANGE(DF)]        = ESTACK_PAGE(DOUBLEFAULT_IST, DF),
> >>> +    [CONDRANGE(NMI)]    = ESTACK_PAGE(NMI_IST, NMI),
> >>> +    [PAGERANGE(DB)]        = ESTACK_PAGE(DEBUG_IST, DB),
> >>> +    [CONDRANGE(MCE)]    = ESTACK_PAGE(MCE_IST, MCE),
> >> 
> >> It would be nice if the *_IST macro naming aligned with the struct
> >> cea_exception_stacks field naming.  Then you could just do, e.g.
> >> ESTACKPAGE(DF).
> > 
> > Yes, lemme fix that up.
> > 
> >> Also it's a bit unfortunate that some of the stack size knowledge is
> >> hard-coded here, i.e #DB always being > 1 page and non-#DB being
> >> sometimes 1 page.
> > 
> > The problem is that there is no way to make this macro maze conditional on
> > sizeof(). But my macro foo is rusty.
> 
> How about a much better fix: make the DB stack be the same size as all
> the others and just have 4 of them (DB0, DB1, DB2, and DB3.  After all,
> overflowing from one debug stack into another is just as much of a bug as
> overflowing into a different IST stack.

That makes sense.

Thanks,

	tglx

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [patch 15/14] x86/dumpstack/64: Speedup in_exception_stack()
  2019-04-02 15:48       ` Thomas Gleixner
  2019-04-02 15:51         ` Josh Poimboeuf
  2019-04-02 16:11         ` Andy Lutomirski
@ 2019-04-02 19:02         ` Rasmus Villemoes
  2019-04-02 19:21           ` Thomas Gleixner
  2 siblings, 1 reply; 43+ messages in thread
From: Rasmus Villemoes @ 2019-04-02 19:02 UTC (permalink / raw)
  To: Thomas Gleixner, Josh Poimboeuf; +Cc: LKML, x86, Andy Lutomirski

On 02/04/2019 17.48, Thomas Gleixner wrote:
> On Tue, 2 Apr 2019, Josh Poimboeuf wrote:
>> On Tue, Apr 02, 2019 at 12:19:46PM +0200, Thomas Gleixner wrote:
>>> +/*
>>> + * Array of exception stack page descriptors. If the stack is larger than
>>> + * PAGE_SIZE, all pages covering a particular stack will have the same
>>> + * info.
>>> + */
>>> +static const struct estack_pages estack_pages[ESTACK_PAGES] ____cacheline_aligned = {
>>> +	[CONDRANGE(DF)]		= ESTACK_PAGE(DOUBLEFAULT_IST, DF),
>>> +	[CONDRANGE(NMI)]	= ESTACK_PAGE(NMI_IST, NMI),
>>> +	[PAGERANGE(DB)]		= ESTACK_PAGE(DEBUG_IST, DB),
>>> +	[CONDRANGE(MCE)]	= ESTACK_PAGE(MCE_IST, MCE),
>>
>> It would be nice if the *_IST macro naming aligned with the struct
>> cea_exception_stacks field naming.  Then you could just do, e.g.
>> ESTACKPAGE(DF).
> 
> Yes, lemme fix that up.
> 
>> Also it's a bit unfortunate that some of the stack size knowledge is
>> hard-coded here, i.e #DB always being > 1 page and non-#DB being
>> sometimes 1 page.
> 
> The problem is that there is no way to make this macro maze conditional on
> sizeof(). But my macro foo is rusty.

Eh, but why do you need the CONDRANGE thing at all? [5 ... 5] is a
perfectly fine designator, equivalent to [5]. So you can just use
PAGERANGE in all cases, no?

Rasmus

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [patch 15/14] x86/dumpstack/64: Speedup in_exception_stack()
  2019-04-02 19:02         ` Rasmus Villemoes
@ 2019-04-02 19:21           ` Thomas Gleixner
  0 siblings, 0 replies; 43+ messages in thread
From: Thomas Gleixner @ 2019-04-02 19:21 UTC (permalink / raw)
  To: Rasmus Villemoes; +Cc: Josh Poimboeuf, LKML, x86, Andy Lutomirski

On Tue, 2 Apr 2019, Rasmus Villemoes wrote:

> On 02/04/2019 17.48, Thomas Gleixner wrote:
> > On Tue, 2 Apr 2019, Josh Poimboeuf wrote:
> >> On Tue, Apr 02, 2019 at 12:19:46PM +0200, Thomas Gleixner wrote:
> >>> +/*
> >>> + * Array of exception stack page descriptors. If the stack is larger than
> >>> + * PAGE_SIZE, all pages covering a particular stack will have the same
> >>> + * info.
> >>> + */
> >>> +static const struct estack_pages estack_pages[ESTACK_PAGES] ____cacheline_aligned = {
> >>> +	[CONDRANGE(DF)]		= ESTACK_PAGE(DOUBLEFAULT_IST, DF),
> >>> +	[CONDRANGE(NMI)]	= ESTACK_PAGE(NMI_IST, NMI),
> >>> +	[PAGERANGE(DB)]		= ESTACK_PAGE(DEBUG_IST, DB),
> >>> +	[CONDRANGE(MCE)]	= ESTACK_PAGE(MCE_IST, MCE),
> >>
> >> It would be nice if the *_IST macro naming aligned with the struct
> >> cea_exception_stacks field naming.  Then you could just do, e.g.
> >> ESTACKPAGE(DF).
> > 
> > Yes, lemme fix that up.
> > 
> >> Also it's a bit unfortunate that some of the stack size knowledge is
> >> hard-coded here, i.e #DB always being > 1 page and non-#DB being
> >> sometimes 1 page.
> > 
> > The problem is that there is no way to make this macro maze conditional on
> > sizeof(). But my macro foo is rusty.
> 
> Eh, but why do you need the CONDRANGE thing at all? [5 ... 5] is a
> perfectly fine designator, equivalent to [5]. So you can just use
> PAGERANGE in all cases, no?

Indeed. I tried that before and at some point GCC barfed, but probably due
to some other slipup. The macro expansion error messages are soo helpful...

Thanks,

	tglx


^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [patch 15/14] x86/dumpstack/64: Speedup in_exception_stack()
  2019-04-02 18:27           ` Thomas Gleixner
@ 2019-04-02 19:29             ` Thomas Gleixner
  2019-04-03  0:36               ` Andy Lutomirski
  0 siblings, 1 reply; 43+ messages in thread
From: Thomas Gleixner @ 2019-04-02 19:29 UTC (permalink / raw)
  To: Andy Lutomirski; +Cc: Josh Poimboeuf, LKML, x86, Andy Lutomirski

On Tue, 2 Apr 2019, Thomas Gleixner wrote:
> On Tue, 2 Apr 2019, Andy Lutomirski wrote:
> > > On Apr 2, 2019, at 9:48 AM, Thomas Gleixner <tglx@linutronix.de> wrote:
> > > 
> > >> On Tue, 2 Apr 2019, Josh Poimboeuf wrote:
> > >>> On Tue, Apr 02, 2019 at 12:19:46PM +0200, Thomas Gleixner wrote:
> > >>> +/*
> > >>> + * Array of exception stack page descriptors. If the stack is larger than
> > >>> + * PAGE_SIZE, all pages covering a particular stack will have the same
> > >>> + * info.
> > >>> + */
> > >>> +static const struct estack_pages estack_pages[ESTACK_PAGES] ____cacheline_aligned = {
> > >>> +    [CONDRANGE(DF)]        = ESTACK_PAGE(DOUBLEFAULT_IST, DF),
> > >>> +    [CONDRANGE(NMI)]    = ESTACK_PAGE(NMI_IST, NMI),
> > >>> +    [PAGERANGE(DB)]        = ESTACK_PAGE(DEBUG_IST, DB),
> > >>> +    [CONDRANGE(MCE)]    = ESTACK_PAGE(MCE_IST, MCE),
> > >> 
> > >> It would be nice if the *_IST macro naming aligned with the struct
> > >> cea_exception_stacks field naming.  Then you could just do, e.g.
> > >> ESTACKPAGE(DF).
> > > 
> > > Yes, lemme fix that up.
> > > 
> > >> Also it's a bit unfortunate that some of the stack size knowledge is
> > >> hard-coded here, i.e #DB always being > 1 page and non-#DB being
> > >> sometimes 1 page.
> > > 
> > > The problem is that there is no way to make this macro maze conditional on
> > > sizeof(). But my macro foo is rusty.
> > 
> > How about a much better fix: make the DB stack be the same size as all
> > the others and just have 4 of them (DB0, DB1, DB2, and DB3.  After all,
> > overflowing from one debug stack into another is just as much of a bug as
> > overflowing into a different IST stack.
> 
> That makes sense.

Except that we just have two not four.

It needs some tweaking of the ist_shift stuff in entry_64.S but that's not
rocket science. Famous last words....

Thanks,

	tglx

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [patch 15/14] x86/dumpstack/64: Speedup in_exception_stack()
  2019-04-02 19:29             ` Thomas Gleixner
@ 2019-04-03  0:36               ` Andy Lutomirski
  2019-04-03 16:26                 ` Thomas Gleixner
  0 siblings, 1 reply; 43+ messages in thread
From: Andy Lutomirski @ 2019-04-03  0:36 UTC (permalink / raw)
  To: Thomas Gleixner; +Cc: Josh Poimboeuf, LKML, x86, Andy Lutomirski



> On Apr 2, 2019, at 1:29 PM, Thomas Gleixner <tglx@linutronix.de> wrote:
> 
>> On Tue, 2 Apr 2019, Thomas Gleixner wrote:
>> On Tue, 2 Apr 2019, Andy Lutomirski wrote:
>>>> On Apr 2, 2019, at 9:48 AM, Thomas Gleixner <tglx@linutronix.de> wrote:
>>>> 
>>>>>> On Tue, 2 Apr 2019, Josh Poimboeuf wrote:
>>>>>> On Tue, Apr 02, 2019 at 12:19:46PM +0200, Thomas Gleixner wrote:
>>>>>> +/*
>>>>>> + * Array of exception stack page descriptors. If the stack is larger than
>>>>>> + * PAGE_SIZE, all pages covering a particular stack will have the same
>>>>>> + * info.
>>>>>> + */
>>>>>> +static const struct estack_pages estack_pages[ESTACK_PAGES] ____cacheline_aligned = {
>>>>>> +    [CONDRANGE(DF)]        = ESTACK_PAGE(DOUBLEFAULT_IST, DF),
>>>>>> +    [CONDRANGE(NMI)]    = ESTACK_PAGE(NMI_IST, NMI),
>>>>>> +    [PAGERANGE(DB)]        = ESTACK_PAGE(DEBUG_IST, DB),
>>>>>> +    [CONDRANGE(MCE)]    = ESTACK_PAGE(MCE_IST, MCE),
>>>>> 
>>>>> It would be nice if the *_IST macro naming aligned with the struct
>>>>> cea_exception_stacks field naming.  Then you could just do, e.g.
>>>>> ESTACKPAGE(DF).
>>>> 
>>>> Yes, lemme fix that up.
>>>> 
>>>>> Also it's a bit unfortunate that some of the stack size knowledge is
>>>>> hard-coded here, i.e #DB always being > 1 page and non-#DB being
>>>>> sometimes 1 page.
>>>> 
>>>> The problem is that there is no way to make this macro maze conditional on
>>>> sizeof(). But my macro foo is rusty.
>>> 
>>> How about a much better fix: make the DB stack be the same size as all
>>> the others and just have 4 of them (DB0, DB1, DB2, and DB3.  After all,
>>> overflowing from one debug stack into another is just as much of a bug as
>>> overflowing into a different IST stack.
>> 
>> That makes sense.
> 
> Except that we just have two not four.
> 
> It needs some tweaking of the ist_shift stuff in entry_64.S but that's not
> rocket science. Famous last words....
> 

The ist_shift mess should probably be in C, but that’s a big can of worms. That being said, why do we have it at all?  Once upon a time, we’d do ICEBP from user mode (or a legit breakpoint), then send a signal and hit a data breakpoint, and we’d recurse.  But we don’t run user debug handlers on the IST stack at all anymore.

Maybe we can convince ourselves it’s safe?

What we should do is check, on IST return, that we’re not about to return to our own stack.  Then we can at least properly panic.

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [patch 15/14] x86/dumpstack/64: Speedup in_exception_stack()
  2019-04-02 15:43     ` Josh Poimboeuf
  2019-04-02 15:48       ` Thomas Gleixner
@ 2019-04-03  8:02       ` Peter Zijlstra
  1 sibling, 0 replies; 43+ messages in thread
From: Peter Zijlstra @ 2019-04-03  8:02 UTC (permalink / raw)
  To: Josh Poimboeuf; +Cc: Thomas Gleixner, LKML, x86, Andy Lutomirski

On Tue, Apr 02, 2019 at 10:43:29AM -0500, Josh Poimboeuf wrote:
> On Tue, Apr 02, 2019 at 12:19:46PM +0200, Thomas Gleixner wrote:
> >  static bool in_exception_stack(unsigned long *stack, struct stack_info *info)
> >  {
> > -	unsigned long estacks, begin, end, stk = (unsigned long)stack;
> > +	unsigned long begin, end, stk = (unsigned long)stack;
> > +	const struct estack_pages *ep;
> >  	struct pt_regs *regs;
> >  	unsigned int k;
> >  
> >  	BUILD_BUG_ON(N_EXCEPTION_STACKS != 4);
> >  
> > -	estacks = (unsigned long)__this_cpu_read(cea_exception_stacks);
> > -
> > -	for (k = 0; k < N_EXCEPTION_STACKS; k++) {
> > -		begin = estacks + layout[k].begin;
> > -		end   = estacks + layout[k].end;
> > -		regs  = (struct pt_regs *)end - 1;
> > -
> > -		if (stk <= begin || stk >= end)
> > -			continue;
> > -
> > -		info->type	= STACK_TYPE_EXCEPTION + k;
> > -		info->begin	= (unsigned long *)begin;
> > -		info->end	= (unsigned long *)end;
> > -		info->next_sp	= (unsigned long *)regs->sp;
> > -
> > -		return true;
> > -	}
> > -
> > -	return false;
> > +	begin = (unsigned long)__this_cpu_read(cea_exception_stacks);
> > +	end = begin + sizeof(struct cea_exception_stacks);
> > +	/* Bail if @stack is outside the exception stack area. */
> > +	if (stk <= begin || stk >= end)
> > +		return false;
> 
> This check is the most important piece.  Exception stack dumps are quite
> rare, so this ensures an early exit in most cases regardless of whether
> there's a loop below.
> 
> > +
> > +	/* Calc page offset from start of exception stacks */
> > +	k = (stk - begin) >> PAGE_SHIFT;
> > +	/* Lookup the page descriptor */
> > +	ep = &estack_pages[k];
> > +	/* Guard page? */
> > +	if (unlikely(!ep->size))
> > +		return false;
> > +
> > +	begin += (unsigned long)ep->offs;
> > +	end = begin + (unsigned long)ep->size;
> > +	regs = (struct pt_regs *)end - 1;
> > +
> > +	info->type	= ep->type;
> > +	info->begin	= (unsigned long *)begin;
> > +	info->end	= (unsigned long *)end;
> > +	info->next_sp	= (unsigned long *)regs->sp;
> > +	return true;
> 
> With the above "(stk <= begin || stk >= end)" check, removing the loop
> becomes not all that important since exception stack dumps are quite
> rare and not performance sensitive.  With all the macros this code
> becomes a little more obtuse, so I'm not sure whether removal of the
> loop is a net positive.

Are you sure; perf does a lot of stack dumps from NMI context.

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [patch 15/14] x86/dumpstack/64: Speedup in_exception_stack()
  2019-04-02 15:51         ` Josh Poimboeuf
  2019-04-02 15:53           ` Josh Poimboeuf
@ 2019-04-03  8:08           ` Peter Zijlstra
  2019-04-03  8:10             ` Peter Zijlstra
  1 sibling, 1 reply; 43+ messages in thread
From: Peter Zijlstra @ 2019-04-03  8:08 UTC (permalink / raw)
  To: Josh Poimboeuf; +Cc: Thomas Gleixner, LKML, x86, Andy Lutomirski

On Tue, Apr 02, 2019 at 10:51:49AM -0500, Josh Poimboeuf wrote:
> On Tue, Apr 02, 2019 at 05:48:56PM +0200, Thomas Gleixner wrote:
> > > With the above "(stk <= begin || stk >= end)" check, removing the loop
> > > becomes not all that important since exception stack dumps are quite
> > > rare and not performance sensitive.  With all the macros this code
> > > becomes a little more obtuse, so I'm not sure whether removal of the
> > > loop is a net positive.
> > 
> > What about perf? It's NMI context and probably starts from there. Peter?
> 
> I believe perf unwinds starting from the regs from the context which was
> interrupted by the NMI.

Aah, indeed. So then we only see exception stacks when the NMI lands in
an exception, which is, as you say, quite rare.

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [patch 15/14] x86/dumpstack/64: Speedup in_exception_stack()
  2019-04-03  8:08           ` Peter Zijlstra
@ 2019-04-03  8:10             ` Peter Zijlstra
  2019-04-03 15:11               ` Josh Poimboeuf
  0 siblings, 1 reply; 43+ messages in thread
From: Peter Zijlstra @ 2019-04-03  8:10 UTC (permalink / raw)
  To: Josh Poimboeuf; +Cc: Thomas Gleixner, LKML, x86, Andy Lutomirski

On Wed, Apr 03, 2019 at 10:08:28AM +0200, Peter Zijlstra wrote:
> On Tue, Apr 02, 2019 at 10:51:49AM -0500, Josh Poimboeuf wrote:
> > On Tue, Apr 02, 2019 at 05:48:56PM +0200, Thomas Gleixner wrote:
> > > > With the above "(stk <= begin || stk >= end)" check, removing the loop
> > > > becomes not all that important since exception stack dumps are quite
> > > > rare and not performance sensitive.  With all the macros this code
> > > > becomes a little more obtuse, so I'm not sure whether removal of the
> > > > loop is a net positive.
> > > 
> > > What about perf? It's NMI context and probably starts from there. Peter?
> > 
> > I believe perf unwinds starting from the regs from the context which was
> > interrupted by the NMI.
> 
> Aah, indeed. So then we only see exception stacks when the NMI lands in
> an exception, which is, as you say, quite rare.

Aah, ftrace OTOH might still trigger this lots. When you do function
tracer with stacktrace enabled it'll do unwinds _everywhere_.


^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [patch 15/14] x86/dumpstack/64: Speedup in_exception_stack()
  2019-04-03  8:10             ` Peter Zijlstra
@ 2019-04-03 15:11               ` Josh Poimboeuf
  0 siblings, 0 replies; 43+ messages in thread
From: Josh Poimboeuf @ 2019-04-03 15:11 UTC (permalink / raw)
  To: Peter Zijlstra; +Cc: Thomas Gleixner, LKML, x86, Andy Lutomirski

On Wed, Apr 03, 2019 at 10:10:41AM +0200, Peter Zijlstra wrote:
> On Wed, Apr 03, 2019 at 10:08:28AM +0200, Peter Zijlstra wrote:
> > On Tue, Apr 02, 2019 at 10:51:49AM -0500, Josh Poimboeuf wrote:
> > > On Tue, Apr 02, 2019 at 05:48:56PM +0200, Thomas Gleixner wrote:
> > > > > With the above "(stk <= begin || stk >= end)" check, removing the loop
> > > > > becomes not all that important since exception stack dumps are quite
> > > > > rare and not performance sensitive.  With all the macros this code
> > > > > becomes a little more obtuse, so I'm not sure whether removal of the
> > > > > loop is a net positive.
> > > > 
> > > > What about perf? It's NMI context and probably starts from there. Peter?
> > > 
> > > I believe perf unwinds starting from the regs from the context which was
> > > interrupted by the NMI.
> > 
> > Aah, indeed. So then we only see exception stacks when the NMI lands in
> > an exception, which is, as you say, quite rare.
> 
> Aah, ftrace OTOH might still trigger this lots. When you do function
> tracer with stacktrace enabled it'll do unwinds _everywhere_.

Even then, ftrace stacktrace will be really slow regardless, and this
loop removal would be a tiny performance improvement for a tiny fraction
of those stack traces.  Unless the improvement is measurable I would
personally rather err on the side of code readability.

-- 
Josh

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [patch 15/14] x86/dumpstack/64: Speedup in_exception_stack()
  2019-04-03  0:36               ` Andy Lutomirski
@ 2019-04-03 16:26                 ` Thomas Gleixner
  2019-04-03 19:42                   ` Thomas Gleixner
  0 siblings, 1 reply; 43+ messages in thread
From: Thomas Gleixner @ 2019-04-03 16:26 UTC (permalink / raw)
  To: Andy Lutomirski; +Cc: Josh Poimboeuf, LKML, x86, Andy Lutomirski

[-- Attachment #1: Type: text/plain, Size: 1064 bytes --]

On Tue, 2 Apr 2019, Andy Lutomirski wrote:
> > On Apr 2, 2019, at 1:29 PM, Thomas Gleixner <tglx@linutronix.de> wrote:
> >>> How about a much better fix: make the DB stack be the same size as all
> >>> the others and just have 4 of them (DB0, DB1, DB2, and DB3.  After all,
> >>> overflowing from one debug stack into another is just as much of a bug as
> >>> overflowing into a different IST stack.
> >> 
> >> That makes sense.
> > 
> > Except that we just have two not four.
> > 
> > It needs some tweaking of the ist_shift stuff in entry_64.S but that's not
> > rocket science. Famous last words....
> > 
> 
> The ist_shift mess should probably be in C, but that’s a big can of
> worms. That being said, why do we have it at all?  Once upon a time, we’d
> do ICEBP from user mode (or a legit breakpoint), then send a signal and
> hit a data breakpoint, and we’d recurse.  But we don’t run user debug
> handlers on the IST stack at all anymore.
>
> Maybe we can convince ourselves it’s safe?

Maybe. Need to think about it for a while.

Thanks,

	tglx

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [patch 00/14] x86/exceptions: Add guard patches to IST stacks
  2019-04-01  4:03 ` [patch 00/14] x86/exceptions: Add guard patches to IST stacks Andy Lutomirski
@ 2019-04-03 16:30   ` Thomas Gleixner
  0 siblings, 0 replies; 43+ messages in thread
From: Thomas Gleixner @ 2019-04-03 16:30 UTC (permalink / raw)
  To: Andy Lutomirski; +Cc: LKML, X86 ML, Josh Poimboeuf

On Sun, 31 Mar 2019, Andy Lutomirski wrote:
> On Sun, Mar 31, 2019 at 3:10 PM Thomas Gleixner <tglx@linutronix.de> wrote:
> > While looking for something different I stumbled over the comment in struct
> > cpu_entry_area:
> >
> >          * Exception stacks used for IST entries.
> >          *
> >          * In the future, this should have a separate slot for each stack
> >          * with guard pages between them.
> >
> > As usual with such comments they are added in good faith and then
> > forgotten. Looking what it takes to fix that let me stumble over some other
> > leftovers like orig_ist[], now unused macros, useless defines and a bunch
> > of assumptions about the exception stacks being a big lump. Aside of that I
> > found a too broad check of the exception stack in the x86/64 stack overflow
> > detector.
> >
> > The following series cleans that up and gradually prepares for guard pages
> > between the IST stacks.
> 
> Thanks!  I'll review this over the next couple days.
> 
> Meanwhile, if you're inspired, I have a WIP series to do the same
> thing to the IRQ stacks here:
> 
> https://git.kernel.org/pub/scm/linux/kernel/git/luto/linux.git/log/?h=x86/guard_pages
> 
> Want to take a look or pick it up if you want to keep working on this?

I grabbed the lot and addressed the todo's there. Not completely done
though, but it builds and boots :)

With all stacks having guard pages now, the stack_overflow_check() in
irq_64.c is kind of pointless. When the kernel overflows any of the stacks
independent of what we do with DB (we at least split it into 2 different
valid stacks) then it hits a guard page and dies. Mission accomplished....

Thanks,

	tglx




^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [patch 04/14] x86/exceptions: Make IST index zero based
  2019-03-31 21:40 ` [patch 04/14] x86/exceptions: Make IST index zero based Thomas Gleixner
  2019-04-01  7:30   ` Peter Zijlstra
  2019-04-02 16:49   ` Sean Christopherson
@ 2019-04-03 16:35   ` Borislav Petkov
  2 siblings, 0 replies; 43+ messages in thread
From: Borislav Petkov @ 2019-04-03 16:35 UTC (permalink / raw)
  To: Thomas Gleixner; +Cc: LKML, x86, Andy Lutomirski, Josh Poimboeuf

On Sun, Mar 31, 2019 at 11:40:24PM +0200, Thomas Gleixner wrote:
> The defines for the exception stack (IST) array in the TSS are using the
> SDM convention IST1 - IST7. That causes all sorts of code to subtract 1 for
> array indices related to IST. That's confusing at best and does not provide
> any value.
> 
> Make the indices zero based and fixup the usage sites. The only code which
> needs to adjust the 0 based index is the interrupt descriptor setup which
> needs to add 1 now.
> 
> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
> ---
>  Documentation/x86/kernel-stacks      |    8 ++++----
>  arch/x86/entry/entry_64.S            |    4 ++--
>  arch/x86/include/asm/page_64_types.h |   13 ++++++++-----
>  arch/x86/kernel/cpu/common.c         |    4 ++--
>  arch/x86/kernel/dumpstack_64.c       |   14 +++++++-------
>  arch/x86/kernel/idt.c                |   15 +++++++++------
>  6 files changed, 32 insertions(+), 26 deletions(-)

With the below hunk added:

Reviewed-by: Borislav Petkov <bp@suse.de>

---
diff --git a/arch/x86/mm/fault.c b/arch/x86/mm/fault.c
index 667f1da36208..1e340adf65e8 100644
--- a/arch/x86/mm/fault.c
+++ b/arch/x86/mm/fault.c
@@ -793,7 +793,7 @@ no_context(struct pt_regs *regs, unsigned long error_code,
        if (is_vmalloc_addr((void *)address) &&
            (((unsigned long)tsk->stack - 1 - address < PAGE_SIZE) ||
             address - ((unsigned long)tsk->stack + THREAD_SIZE) < PAGE_SIZE)) {
-               unsigned long stack = this_cpu_read(orig_ist.ist[DOUBLEFAULT_STACK]) - sizeof(void *);
+               unsigned long stack = this_cpu_read(orig_ist.ist[DOUBLEFAULT_IST]) - sizeof(void *);
                /*
                 * We're likely to be running with very little stack space
                 * left.  It's plausible that we'd hit this condition but


-- 
Regards/Gruss,
    Boris.

Good mailing practices for 400: avoid top-posting and trim the reply.

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [patch 15/14] x86/dumpstack/64: Speedup in_exception_stack()
  2019-04-03 16:26                 ` Thomas Gleixner
@ 2019-04-03 19:42                   ` Thomas Gleixner
  2019-04-04  0:03                     ` Andy Lutomirski
  0 siblings, 1 reply; 43+ messages in thread
From: Thomas Gleixner @ 2019-04-03 19:42 UTC (permalink / raw)
  To: Andy Lutomirski; +Cc: Josh Poimboeuf, LKML, x86, Andy Lutomirski

[-- Attachment #1: Type: text/plain, Size: 1206 bytes --]

On Wed, 3 Apr 2019, Thomas Gleixner wrote:
> On Tue, 2 Apr 2019, Andy Lutomirski wrote:
> > > On Apr 2, 2019, at 1:29 PM, Thomas Gleixner <tglx@linutronix.de> wrote:
> > >>> How about a much better fix: make the DB stack be the same size as all
> > >>> the others and just have 4 of them (DB0, DB1, DB2, and DB3.  After all,
> > >>> overflowing from one debug stack into another is just as much of a bug as
> > >>> overflowing into a different IST stack.
> > >> 
> > >> That makes sense.
> > > 
> > > Except that we just have two not four.
> > > 
> > > It needs some tweaking of the ist_shift stuff in entry_64.S but that's not
> > > rocket science. Famous last words....
> > > 
> > 
> > The ist_shift mess should probably be in C, but that’s a big can of
> > worms. That being said, why do we have it at all?  Once upon a time, we’d
> > do ICEBP from user mode (or a legit breakpoint), then send a signal and
> > hit a data breakpoint, and we’d recurse.  But we don’t run user debug
> > handlers on the IST stack at all anymore.
> >
> > Maybe we can convince ourselves it’s safe?
> 
> Maybe. Need to think about it for a while.

What about kprobes. It has nasty reentrancy stuff as well...

Thanks,

	tglx

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [patch 15/14] x86/dumpstack/64: Speedup in_exception_stack()
  2019-04-03 19:42                   ` Thomas Gleixner
@ 2019-04-04  0:03                     ` Andy Lutomirski
  0 siblings, 0 replies; 43+ messages in thread
From: Andy Lutomirski @ 2019-04-04  0:03 UTC (permalink / raw)
  To: Thomas Gleixner; +Cc: Josh Poimboeuf, LKML, X86 ML, Andy Lutomirski

On Wed, Apr 3, 2019 at 12:42 PM Thomas Gleixner <tglx@linutronix.de> wrote:
>
> On Wed, 3 Apr 2019, Thomas Gleixner wrote:
> > On Tue, 2 Apr 2019, Andy Lutomirski wrote:
> > > > On Apr 2, 2019, at 1:29 PM, Thomas Gleixner <tglx@linutronix.de> wrote:
> > > >>> How about a much better fix: make the DB stack be the same size as all
> > > >>> the others and just have 4 of them (DB0, DB1, DB2, and DB3.  After all,
> > > >>> overflowing from one debug stack into another is just as much of a bug as
> > > >>> overflowing into a different IST stack.
> > > >>
> > > >> That makes sense.
> > > >
> > > > Except that we just have two not four.
> > > >
> > > > It needs some tweaking of the ist_shift stuff in entry_64.S but that's not
> > > > rocket science. Famous last words....
> > > >
> > >
> > > The ist_shift mess should probably be in C, but that’s a big can of
> > > worms. That being said, why do we have it at all?  Once upon a time, we’d
> > > do ICEBP from user mode (or a legit breakpoint), then send a signal and
> > > hit a data breakpoint, and we’d recurse.  But we don’t run user debug
> > > handlers on the IST stack at all anymore.
> > >
> > > Maybe we can convince ourselves it’s safe?
> >
> > Maybe. Need to think about it for a while.
>
> What about kprobes. It has nasty reentrancy stuff as well...
>

Hmm.  We used to have #BP on the same stack, and I bet there were
plenty of ways to get #DB and #BP inside each other.

I bet the best solution is to set dr7 to 0 before we do anything
complicated in do_debug() (at least if we got there from kernel mode).
But we should probably make this a whole separate project after your
series is done.

^ permalink raw reply	[flat|nested] 43+ messages in thread

end of thread, other threads:[~2019-04-04  0:03 UTC | newest]

Thread overview: 43+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2019-03-31 21:40 [patch 00/14] x86/exceptions: Add guard patches to IST stacks Thomas Gleixner
2019-03-31 21:40 ` [patch 01/14] x86/irq/64: Limit IST stack overflow check to #DB stack Thomas Gleixner
2019-04-01 18:03   ` Borislav Petkov
2019-04-02 16:34   ` Sean Christopherson
2019-03-31 21:40 ` [patch 02/14] x86/idt: Remove unused macro SISTG Thomas Gleixner
2019-04-01  4:04   ` Andy Lutomirski
2019-03-31 21:40 ` [patch 03/14] x86/exceptions: Remove unused stack defines on 32bit Thomas Gleixner
2019-03-31 21:40 ` [patch 04/14] x86/exceptions: Make IST index zero based Thomas Gleixner
2019-04-01  7:30   ` Peter Zijlstra
2019-04-01  7:33     ` Thomas Gleixner
2019-04-02 16:49   ` Sean Christopherson
2019-04-03 16:35   ` Borislav Petkov
2019-03-31 21:40 ` [patch 05/14] x86/cpu_entry_area: Cleanup setup functions Thomas Gleixner
2019-03-31 21:40 ` [patch 06/14] x86/exceptions: Add structs for exception stacks Thomas Gleixner
2019-03-31 21:40 ` [patch 07/14] x86/cpu_entry_area: Prepare for IST guard pages Thomas Gleixner
2019-03-31 21:40 ` [patch 08/14] x86/cpu_entry_area: Provide exception stack accessor Thomas Gleixner
2019-03-31 21:40 ` [patch 09/14] x86/traps: Use cpu_entry_area instead of orig_ist Thomas Gleixner
2019-03-31 21:40 ` [patch 10/14] x86/irq/64: Use cpu entry area " Thomas Gleixner
2019-03-31 21:40 ` [patch 11/14] x86/dumpstack/64: Use cpu_entry_area " Thomas Gleixner
2019-03-31 21:40 ` [patch 12/14] x86/cpu: Prepare TSS.IST setup for guard pages Thomas Gleixner
2019-04-02 16:57   ` Sean Christopherson
2019-03-31 21:40 ` [patch 13/14] x86/cpu: Remove orig_ist array Thomas Gleixner
2019-03-31 21:40 ` [patch 14/14] x86/exceptions: Enable IST guard pages Thomas Gleixner
2019-04-02 10:19   ` [patch 15/14] x86/dumpstack/64: Speedup in_exception_stack() Thomas Gleixner
2019-04-02 15:43     ` Josh Poimboeuf
2019-04-02 15:48       ` Thomas Gleixner
2019-04-02 15:51         ` Josh Poimboeuf
2019-04-02 15:53           ` Josh Poimboeuf
2019-04-03  8:08           ` Peter Zijlstra
2019-04-03  8:10             ` Peter Zijlstra
2019-04-03 15:11               ` Josh Poimboeuf
2019-04-02 16:11         ` Andy Lutomirski
2019-04-02 18:27           ` Thomas Gleixner
2019-04-02 19:29             ` Thomas Gleixner
2019-04-03  0:36               ` Andy Lutomirski
2019-04-03 16:26                 ` Thomas Gleixner
2019-04-03 19:42                   ` Thomas Gleixner
2019-04-04  0:03                     ` Andy Lutomirski
2019-04-02 19:02         ` Rasmus Villemoes
2019-04-02 19:21           ` Thomas Gleixner
2019-04-03  8:02       ` Peter Zijlstra
2019-04-01  4:03 ` [patch 00/14] x86/exceptions: Add guard patches to IST stacks Andy Lutomirski
2019-04-03 16:30   ` Thomas Gleixner

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.