* [RFC 0/7] Prep code for better stack switching
@ 2017-11-11  4:05 Andy Lutomirski
  2017-11-11  4:05 ` [RFC 1/7] x86/asm/64: Allocate and enable the SYSENTER stack Andy Lutomirski
                   ` (7 more replies)
  0 siblings, 8 replies; 32+ messages in thread
From: Andy Lutomirski @ 2017-11-11  4:05 UTC (permalink / raw)
  To: X86 ML
  Cc: Borislav Petkov, linux-kernel, Brian Gerst, Dave Hansen,
	Linus Torvalds, Andy Lutomirski

This isn't quite done (the TSS remap patch is busted on 32-bit, but
that's a straightforward fix), but it should be ready for at least a
conceptual review.

The idea here is to prepare us to have all kernel data needed for
user mode execution and early entry located in the fixmap.  To do
this, I hijack the GDT remap mechanism and make it more general.  I
add a struct cpu_entry_area.  This struct is never instantiated
directly.  Instead, it represents the layout of a per-cpu portion of
the fixmap.  That portion contains the GDT, the TSS (including IO
bitmap), and the entry stack (for now just a part of the TSS
region).  It should also end up containing the PEBS and BTS buffers.

If this works, then the idea would be to add a magic *executable* page
to cpu_entry_area.  That page would contain a stub like this:

ENTRY(entry_SYSCALL_64_trampoline)
	UNWIND_HINT_EMPTY
	movq	%rsp, 0x1000+entry_SYSCALL_64_trampoline-1f(%rip)
1:
	movq	0x1008+entry_SYSCALL_64_trampoline-1f(%rip), %rsp
1:
	pushq	%rdi
	pushq	%rsi
	movq	0x1000+entry_SYSCALL_64_trampoline-1f(%rip), %rsi
1:
	movq	$entry_SYSCALL_64, %rdi
	jmp	*%rdi
END(entry_SYSCALL_64_trampoline)

(Those offsets are made up.  In real life, they'd be computed using
asm-offsets so they refer to the top word of the entry stack and to
the word that contains the real kernel stack address, respectively.)
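
(For illustration only: assuming the magic page becomes an
entry_trampoline member of cpu_entry_area, the asm-offsets entries
might look something like this:

	OFFSET(CEA_SYSENTER_stack, cpu_entry_area, tss.SYSENTER_stack);
	OFFSET(CEA_trampoline, cpu_entry_area, entry_trampoline);

Together with SIZEOF_SYSENTER_stack from patch 1, the trampoline could
then use real displacements instead of 0x1000/0x1008.  None of these
names are final.)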

We'd now enter entry_SYSCALL_64 (probably renamed) on the real task
stack, with user RDI and RSI on that stack (and in need of popping)
and with user RSP in RSI.  This is weird, but it gives us some major
benefits:

 - This entire sequence works without any %gs prefixes and without
   touching the conventional percpu mappings.  This means that it
   will work without mapping any conventional percpu data.  That
   removes a considerable amount of complexity in Dave's series and
   also closes a giant kASLR hole: Dave's series, as is, leaks the
   location of all the percpu mappings.

 - We run the SYSCALL entry code in a context in which it has
   easy access to scratch space for its CR3 shenanigans.

 - I've carefully done this without needing access to the
   cpu_entry_area from the post-trampoline entry code.  Finding
   it would require awkward calculations, a percpu load from
   an otherwise unneeded cacheline, or a potentially unfortunate
   load of the value we just stored from a different VA alias.  I
   imagine that the last one is nasty from a microarchitectural
   perspective.

I'd really like to do this in a way that makes it optional so that,
if KAISER is disabled, we don't take the TLB miss overhead, which
probably outweighs the minor speedup of no longer stalling on
SWAPGS.  OTOH, it might end up benchmarking faster than the current
code, since, while it's harder on I$ and the TLB, it's easier on D$
(avoids two conventional percpu accesses, instead using a cacheline
that's needed anyway for the stack).

The same exact treatment is used for SYSCALL32.

If I didn't forget some detail, this would allow KAISER to function
with only the fixmap, the entry text, and the espfix64 junk mapped.
Down the road, we could further tweak it to get rid of the entry
text too by moving all the CR3-switching code into the fixmap.

The ORC unwinder would need to learn about this special case to be
able to unwind an NMI that hits in the trampoline.  Or maybe we
don't care.  kallsyms might also need some hackery to recognize
the trampoline for perf's benefit.

Open questions:

 - Should the entry stack be anywhere near as big as I made it here?
   If I keep it very small, then inappropriate uses of it would be
   immediately detected as (properly backtraced) double faults.

 - Something should IMO complain very loudly, at least with debugging on,
   if we accidentally schedule from the entry stack (see the sketch
   below).  As is, it causes huge corruption but doesn't immediately die.

 - This is incompatible with the PIE effort.  We'd have to use movabs
   instead of movq, but I don't know whether the tooling can handle
   the resulting relocation.
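
(A minimal, untested sketch of the entry-stack scheduling check above;
the helper name is made up, but cpu_tss and SYSENTER_stack are the real
fields from this series:

	static void debug_check_entry_stack(void)
	{
		unsigned long sp = (unsigned long)&sp;	/* approximate SP */
		unsigned long begin =
			(unsigned long)this_cpu_ptr(&cpu_tss.SYSENTER_stack);
		unsigned long end = begin + sizeof(cpu_tss.SYSENTER_stack);

		WARN_ONCE(sp >= begin && sp < end,
			  "scheduling while on the entry stack\n");
	}

called from schedule_debug() or similar.)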

Andy Lutomirski (7):
  x86/asm/64: Allocate and enable the SYSENTER stack
  x86/gdt: Put per-cpu GDT remaps in ascending order
  x86/fixmap: Generalize the GDT fixmap mechanism
  x86/asm: Fix assumptions that the HW TSS is at the beginning of
    cpu_tss
  x86/asm: Rearrange struct cpu_tss to enlarge SYSENTER_stack and fix
    alignment
  x86/asm: Remap the TSS into the cpu entry area
  x86/unwind/64: Add support for the SYSENTER stack

 arch/x86/entry/entry_64_compat.S  |  2 +-
 arch/x86/include/asm/desc.h       | 11 ++--------
 arch/x86/include/asm/fixmap.h     | 43 +++++++++++++++++++++++++++++++++++--
 arch/x86/include/asm/processor.h  | 25 +++++++++++-----------
 arch/x86/include/asm/stacktrace.h |  1 +
 arch/x86/kernel/asm-offsets.c     |  5 +++++
 arch/x86/kernel/asm-offsets_32.c  |  5 -----
 arch/x86/kernel/cpu/common.c      | 45 +++++++++++++++++++++++++++++----------
 arch/x86/kernel/doublefault.c     | 36 +++++++++++++++----------------
 arch/x86/kernel/dumpstack_32.c    |  3 +++
 arch/x86/kernel/dumpstack_64.c    | 23 ++++++++++++++++++++
 arch/x86/kernel/process.c         |  2 --
 arch/x86/kernel/traps.c           |  3 +--
 arch/x86/power/cpu.c              | 16 ++++++++------
 arch/x86/xen/mmu_pv.c             |  2 +-
 15 files changed, 151 insertions(+), 71 deletions(-)

-- 
2.13.6


* [RFC 1/7] x86/asm/64: Allocate and enable the SYSENTER stack
  2017-11-11  4:05 [RFC 0/7] Prep code for better stack switching Andy Lutomirski
@ 2017-11-11  4:05 ` Andy Lutomirski
  2017-11-13 19:07   ` Dave Hansen
  2017-11-11  4:05 ` [RFC 2/7] x86/gdt: Put per-cpu GDT remaps in ascending order Andy Lutomirski
                   ` (6 subsequent siblings)
  7 siblings, 1 reply; 32+ messages in thread
From: Andy Lutomirski @ 2017-11-11  4:05 UTC (permalink / raw)
  To: X86 ML
  Cc: Borislav Petkov, linux-kernel, Brian Gerst, Dave Hansen,
	Linus Torvalds, Andy Lutomirski

This will simplify some future code changes that will want some
temporary stack space in more places.  It also lets us get rid of a
SWAPGS_UNSAFE_STACK user.

This does not depend on CONFIG_IA32_EMULATION because we'll want the
stack space even without IA32 emulation.

Signed-off-by: Andy Lutomirski <luto@kernel.org>
---
 arch/x86/entry/entry_64_compat.S | 2 +-
 arch/x86/include/asm/processor.h | 3 ---
 arch/x86/kernel/asm-offsets.c    | 5 +++++
 arch/x86/kernel/asm-offsets_32.c | 5 -----
 arch/x86/kernel/cpu/common.c     | 4 +++-
 arch/x86/kernel/process.c        | 2 --
 arch/x86/kernel/traps.c          | 3 +--
 7 files changed, 10 insertions(+), 14 deletions(-)

diff --git a/arch/x86/entry/entry_64_compat.S b/arch/x86/entry/entry_64_compat.S
index 932b96ce1b06..36c508c21480 100644
--- a/arch/x86/entry/entry_64_compat.S
+++ b/arch/x86/entry/entry_64_compat.S
@@ -47,7 +47,7 @@
  */
 ENTRY(entry_SYSENTER_compat)
 	/* Interrupts are off on entry. */
-	SWAPGS_UNSAFE_STACK
+	SWAPGS
 	movq	PER_CPU_VAR(cpu_current_top_of_stack), %rsp
 
 	/*
diff --git a/arch/x86/include/asm/processor.h b/arch/x86/include/asm/processor.h
index f10dae14f951..0644f888b12c 100644
--- a/arch/x86/include/asm/processor.h
+++ b/arch/x86/include/asm/processor.h
@@ -338,14 +338,11 @@ struct tss_struct {
 	 */
 	unsigned long		io_bitmap[IO_BITMAP_LONGS + 1];
 
-#ifdef CONFIG_X86_32
 	/*
 	 * Space for the temporary SYSENTER stack.
 	 */
 	unsigned long		SYSENTER_stack_canary;
 	unsigned long		SYSENTER_stack[64];
-#endif
-
 } ____cacheline_aligned;
 
 DECLARE_PER_CPU_SHARED_ALIGNED(struct tss_struct, cpu_tss);
diff --git a/arch/x86/kernel/asm-offsets.c b/arch/x86/kernel/asm-offsets.c
index de827d6ac8c2..031bd35bd911 100644
--- a/arch/x86/kernel/asm-offsets.c
+++ b/arch/x86/kernel/asm-offsets.c
@@ -92,4 +92,9 @@ void common(void) {
 
 	BLANK();
 	DEFINE(PTREGS_SIZE, sizeof(struct pt_regs));
+
+	/* Offset from cpu_tss to SYSENTER_stack */
+	OFFSET(CPU_TSS_SYSENTER_stack, tss_struct, SYSENTER_stack);
+	/* Size of SYSENTER_stack */
+	DEFINE(SIZEOF_SYSENTER_stack, sizeof(((struct tss_struct *)0)->SYSENTER_stack));
 }
diff --git a/arch/x86/kernel/asm-offsets_32.c b/arch/x86/kernel/asm-offsets_32.c
index 710edab9e644..6c683edf7015 100644
--- a/arch/x86/kernel/asm-offsets_32.c
+++ b/arch/x86/kernel/asm-offsets_32.c
@@ -49,11 +49,6 @@ void foo(void)
 	DEFINE(TSS_sysenter_sp0, offsetof(struct tss_struct, x86_tss.sp0) -
 	       offsetofend(struct tss_struct, SYSENTER_stack));
 
-	/* Offset from cpu_tss to SYSENTER_stack */
-	OFFSET(CPU_TSS_SYSENTER_stack, tss_struct, SYSENTER_stack);
-	/* Size of SYSENTER_stack */
-	DEFINE(SIZEOF_SYSENTER_stack, sizeof(((struct tss_struct *)0)->SYSENTER_stack));
-
 #ifdef CONFIG_CC_STACKPROTECTOR
 	BLANK();
 	OFFSET(stack_canary_offset, stack_canary, canary);
diff --git a/arch/x86/kernel/cpu/common.c b/arch/x86/kernel/cpu/common.c
index cdf79ab628c2..22f542170198 100644
--- a/arch/x86/kernel/cpu/common.c
+++ b/arch/x86/kernel/cpu/common.c
@@ -1361,7 +1361,9 @@ void syscall_init(void)
 	 * AMD doesn't allow SYSENTER in long mode (either 32- or 64-bit).
 	 */
 	wrmsrl_safe(MSR_IA32_SYSENTER_CS, (u64)__KERNEL_CS);
-	wrmsrl_safe(MSR_IA32_SYSENTER_ESP, 0ULL);
+	wrmsrl_safe(MSR_IA32_SYSENTER_ESP,
+		    (unsigned long)this_cpu_ptr(&cpu_tss) +
+		    offsetofend(struct tss_struct, SYSENTER_stack));
 	wrmsrl_safe(MSR_IA32_SYSENTER_EIP, (u64)entry_SYSENTER_compat);
 #else
 	wrmsrl(MSR_CSTAR, (unsigned long)ignore_sysret);
diff --git a/arch/x86/kernel/process.c b/arch/x86/kernel/process.c
index ff8a9acbcf8b..b49c78b73699 100644
--- a/arch/x86/kernel/process.c
+++ b/arch/x86/kernel/process.c
@@ -70,9 +70,7 @@ __visible DEFINE_PER_CPU_SHARED_ALIGNED(struct tss_struct, cpu_tss) = {
 	  */
 	.io_bitmap		= { [0 ... IO_BITMAP_LONGS] = ~0 },
 #endif
-#ifdef CONFIG_X86_32
 	.SYSENTER_stack_canary	= STACK_END_MAGIC,
-#endif
 };
 EXPORT_PER_CPU_SYMBOL(cpu_tss);
 
diff --git a/arch/x86/kernel/traps.c b/arch/x86/kernel/traps.c
index 42a9c4458f5d..01a9e25ee9d8 100644
--- a/arch/x86/kernel/traps.c
+++ b/arch/x86/kernel/traps.c
@@ -790,14 +790,13 @@ dotraplinkage void do_debug(struct pt_regs *regs, long error_code)
 	debug_stack_usage_dec();
 
 exit:
-#if defined(CONFIG_X86_32)
 	/*
 	 * This is the most likely code path that involves non-trivial use
 	 * of the SYSENTER stack.  Check that we haven't overrun it.
 	 */
 	WARN(this_cpu_read(cpu_tss.SYSENTER_stack_canary) != STACK_END_MAGIC,
 	     "Overran or corrupted SYSENTER stack\n");
-#endif
+
 	ist_exit(regs);
 }
 NOKPROBE_SYMBOL(do_debug);
-- 
2.13.6


* [RFC 2/7] x86/gdt: Put per-cpu GDT remaps in ascending order
  2017-11-11  4:05 [RFC 0/7] Prep code for better stack switching Andy Lutomirski
  2017-11-11  4:05 ` [RFC 1/7] x86/asm/64: Allocate and enable the SYSENTER stack Andy Lutomirski
@ 2017-11-11  4:05 ` Andy Lutomirski
  2017-11-11  4:05 ` [RFC 3/7] x86/fixmap: Generalize the GDT fixmap mechanism Andy Lutomirski
                   ` (5 subsequent siblings)
  7 siblings, 0 replies; 32+ messages in thread
From: Andy Lutomirski @ 2017-11-11  4:05 UTC (permalink / raw)
  To: X86 ML
  Cc: Borislav Petkov, linux-kernel, Brian Gerst, Dave Hansen,
	Linus Torvalds, Andy Lutomirski

We currently have CPU 0's GDT at the top of the GDT range and
higher-numbered CPUs at lower addresses.  This happens because the
fixmap is upside down (index 0 is at the top of the fixmap).

Flip it so that GDTs are in ascending order by virtual address.
This will simplify a future patch that will generalize the GDT
remap to contain multiple pages.
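
For reference, fixmap indices map to descending virtual addresses;
the existing macro in asm/fixmap.h is:

	#define __fix_to_virt(x)	(FIXADDR_TOP - ((x) << PAGE_SHIFT))

so handing out indices downward from FIX_GDT_REMAP_END makes the GDT
virtual addresses ascend with CPU number.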

Signed-off-by: Andy Lutomirski <luto@kernel.org>
---
 arch/x86/include/asm/desc.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/x86/include/asm/desc.h b/arch/x86/include/asm/desc.h
index 9d0e13738ed3..02a414cda30f 100644
--- a/arch/x86/include/asm/desc.h
+++ b/arch/x86/include/asm/desc.h
@@ -62,7 +62,7 @@ static inline struct desc_struct *get_current_gdt_rw(void)
 /* Get the fixmap index for a specific processor */
 static inline unsigned int get_cpu_gdt_ro_index(int cpu)
 {
-	return FIX_GDT_REMAP_BEGIN + cpu;
+	return FIX_GDT_REMAP_END - cpu;
 }
 
 /* Provide the fixmap address of the remapped GDT */
-- 
2.13.6


* [RFC 3/7] x86/fixmap: Generalize the GDT fixmap mechanism
  2017-11-11  4:05 [RFC 0/7] Prep code for better stack switching Andy Lutomirski
  2017-11-11  4:05 ` [RFC 1/7] x86/asm/64: Allocate and enable the SYSENTER stack Andy Lutomirski
  2017-11-11  4:05 ` [RFC 2/7] x86/gdt: Put per-cpu GDT remaps in ascending order Andy Lutomirski
@ 2017-11-11  4:05 ` Andy Lutomirski
  2017-11-11  4:05 ` [RFC 4/7] x86/asm: Fix assumptions that the HW TSS is at the beginning of cpu_tss Andy Lutomirski
                   ` (4 subsequent siblings)
  7 siblings, 0 replies; 32+ messages in thread
From: Andy Lutomirski @ 2017-11-11  4:05 UTC (permalink / raw)
  To: X86 ML
  Cc: Borislav Petkov, linux-kernel, Brian Gerst, Dave Hansen,
	Linus Torvalds, Andy Lutomirski

Currently, the GDT is an ad-hoc array of pages, one per CPU, in the
fixmap.  Generalize it to be an array of a new struct cpu_entry_area
so that we can cleanly add new things to it.

Signed-off-by: Andy Lutomirski <luto@kernel.org>
---
 arch/x86/include/asm/desc.h   |  9 +--------
 arch/x86/include/asm/fixmap.h | 36 ++++++++++++++++++++++++++++++++++--
 arch/x86/kernel/cpu/common.c  | 14 +++++++-------
 arch/x86/xen/mmu_pv.c         |  2 +-
 4 files changed, 43 insertions(+), 18 deletions(-)

diff --git a/arch/x86/include/asm/desc.h b/arch/x86/include/asm/desc.h
index 02a414cda30f..92ad5c354d11 100644
--- a/arch/x86/include/asm/desc.h
+++ b/arch/x86/include/asm/desc.h
@@ -59,17 +59,10 @@ static inline struct desc_struct *get_current_gdt_rw(void)
 	return this_cpu_ptr(&gdt_page)->gdt;
 }
 
-/* Get the fixmap index for a specific processor */
-static inline unsigned int get_cpu_gdt_ro_index(int cpu)
-{
-	return FIX_GDT_REMAP_END - cpu;
-}
-
 /* Provide the fixmap address of the remapped GDT */
 static inline struct desc_struct *get_cpu_gdt_ro(int cpu)
 {
-	unsigned int idx = get_cpu_gdt_ro_index(cpu);
-	return (struct desc_struct *)__fix_to_virt(idx);
+	return (struct desc_struct *)&get_cpu_entry_area(cpu)->gdt;
 }
 
 /* Provide the current read-only GDT */
diff --git a/arch/x86/include/asm/fixmap.h b/arch/x86/include/asm/fixmap.h
index dcd9fb55e679..fbc9b7f4e35e 100644
--- a/arch/x86/include/asm/fixmap.h
+++ b/arch/x86/include/asm/fixmap.h
@@ -44,6 +44,17 @@ extern unsigned long __FIXADDR_TOP;
 			 PAGE_SIZE)
 #endif
 
+/*
+ * cpu_entry_area is a percpu region in the fixmap that contains things
+ * needed by the CPU and early entry/exit code.  Real types aren't used here
+ * to avoid circular header dependencies.
+ */
+struct cpu_entry_area
+{
+	char gdt[PAGE_SIZE];
+};
+
+#define CPU_ENTRY_AREA_PAGES (sizeof(struct cpu_entry_area) / PAGE_SIZE)
 
 /*
  * Here we define all the compile-time 'special' virtual
@@ -101,8 +112,8 @@ enum fixed_addresses {
 	FIX_LNW_VRTC,
 #endif
 	/* Fixmap entries to remap the GDTs, one per processor. */
-	FIX_GDT_REMAP_BEGIN,
-	FIX_GDT_REMAP_END = FIX_GDT_REMAP_BEGIN + NR_CPUS - 1,
+	FIX_CPU_ENTRY_AREA_TOP,
+	FIX_CPU_ENTRY_AREA_BOTTOM = FIX_CPU_ENTRY_AREA_TOP + (CPU_ENTRY_AREA_PAGES * NR_CPUS) - 1,
 
 	__end_of_permanent_fixed_addresses,
 
@@ -185,5 +196,26 @@ void __init *early_memremap_decrypted_wp(resource_size_t phys_addr,
 void __early_set_fixmap(enum fixed_addresses idx,
 			phys_addr_t phys, pgprot_t flags);
 
+static inline unsigned int __get_cpu_entry_area_page_index(int cpu, int page)
+{
+	BUILD_BUG_ON(sizeof(struct cpu_entry_area) % PAGE_SIZE != 0);
+
+	return FIX_CPU_ENTRY_AREA_BOTTOM - cpu*CPU_ENTRY_AREA_PAGES - page;
+}
+
+#define __get_cpu_entry_area_offset_index(cpu, offset) ({		\
+	BUILD_BUG_ON(offset % PAGE_SIZE != 0);				\
+	__get_cpu_entry_area_page_index(cpu, offset / PAGE_SIZE);	\
+	})
+
+#define get_cpu_entry_area_index(cpu, field)				\
+	__get_cpu_entry_area_offset_index((cpu), offsetof(struct cpu_entry_area, field))
+
+static inline struct cpu_entry_area *get_cpu_entry_area(int cpu)
+{
+	return (struct cpu_entry_area *)
+		__fix_to_virt(__get_cpu_entry_area_page_index(cpu, 0));
+}
+
 #endif /* !__ASSEMBLY__ */
 #endif /* _ASM_X86_FIXMAP_H */
diff --git a/arch/x86/kernel/cpu/common.c b/arch/x86/kernel/cpu/common.c
index 22f542170198..2cb394dc4153 100644
--- a/arch/x86/kernel/cpu/common.c
+++ b/arch/x86/kernel/cpu/common.c
@@ -466,12 +466,12 @@ void load_percpu_segment(int cpu)
 	load_stack_canary_segment();
 }
 
-/* Setup the fixmap mapping only once per-processor */
-static inline void setup_fixmap_gdt(int cpu)
+/* Setup the fixmap mappings only once per-processor */
+static inline void setup_cpu_entry_area(int cpu)
 {
 #ifdef CONFIG_X86_64
 	/* On 64-bit systems, we use a read-only fixmap GDT. */
-	pgprot_t prot = PAGE_KERNEL_RO;
+	pgprot_t gdt_prot = PAGE_KERNEL_RO;
 #else
 	/*
 	 * On native 32-bit systems, the GDT cannot be read-only because
@@ -482,11 +482,11 @@ static inline void setup_fixmap_gdt(int cpu)
 	 * On Xen PV, the GDT must be read-only because the hypervisor requires
 	 * it.
 	 */
-	pgprot_t prot = boot_cpu_has(X86_FEATURE_XENPV) ?
+	pgprot_t gdt_prot = boot_cpu_has(X86_FEATURE_XENPV) ?
 		PAGE_KERNEL_RO : PAGE_KERNEL;
 #endif
 
-	__set_fixmap(get_cpu_gdt_ro_index(cpu), get_cpu_gdt_paddr(cpu), prot);
+	__set_fixmap(get_cpu_entry_area_index(cpu, gdt), get_cpu_gdt_paddr(cpu), gdt_prot);
 }
 
 /* Load the original GDT from the per-cpu structure */
@@ -1589,7 +1589,7 @@ void cpu_init(void)
 	if (is_uv_system())
 		uv_cpu_init();
 
-	setup_fixmap_gdt(cpu);
+	setup_cpu_entry_area(cpu);
 	load_fixmap_gdt(cpu);
 }
 
@@ -1651,7 +1651,7 @@ void cpu_init(void)
 
 	fpu__init_cpu();
 
-	setup_fixmap_gdt(cpu);
+	setup_cpu_entry_area(cpu);
 	load_fixmap_gdt(cpu);
 }
 #endif
diff --git a/arch/x86/xen/mmu_pv.c b/arch/x86/xen/mmu_pv.c
index 71495f1a86d7..a04157699d3b 100644
--- a/arch/x86/xen/mmu_pv.c
+++ b/arch/x86/xen/mmu_pv.c
@@ -2311,7 +2311,7 @@ static void xen_set_fixmap(unsigned idx, phys_addr_t phys, pgprot_t prot)
 #endif
 	case FIX_TEXT_POKE0:
 	case FIX_TEXT_POKE1:
-	case FIX_GDT_REMAP_BEGIN ... FIX_GDT_REMAP_END:
+	case FIX_CPU_ENTRY_AREA_TOP ... FIX_CPU_ENTRY_AREA_BOTTOM:
 		/* All local page mappings */
 		pte = pfn_pte(phys, prot);
 		break;
-- 
2.13.6


* [RFC 4/7] x86/asm: Fix assumptions that the HW TSS is at the beginning of cpu_tss
  2017-11-11  4:05 [RFC 0/7] Prep code for better stack switching Andy Lutomirski
                   ` (2 preceding siblings ...)
  2017-11-11  4:05 ` [RFC 3/7] x86/fixmap: Generalize the GDT fixmap mechanism Andy Lutomirski
@ 2017-11-11  4:05 ` Andy Lutomirski
  2017-11-13 17:01   ` Dave Hansen
  2017-11-11  4:05 ` [RFC 5/7] x86/asm: Rearrange struct cpu_tss to enlarge SYSENTER_stack and fix alignment Andy Lutomirski
                   ` (3 subsequent siblings)
  7 siblings, 1 reply; 32+ messages in thread
From: Andy Lutomirski @ 2017-11-11  4:05 UTC (permalink / raw)
  To: X86 ML
  Cc: Borislav Petkov, linux-kernel, Brian Gerst, Dave Hansen,
	Linus Torvalds, Andy Lutomirski

I'm going to move SYSENTER_stack to the beginning of cpu_tss to help
detect overflow.  Before this can happen, I need to fix several code
paths that hardcode assumptions about the old layout.

Signed-off-by: Andy Lutomirski <luto@kernel.org>
---
 arch/x86/include/asm/desc.h      |  2 +-
 arch/x86/include/asm/processor.h |  4 ++--
 arch/x86/kernel/cpu/common.c     |  4 ++--
 arch/x86/kernel/doublefault.c    | 36 +++++++++++++++++-------------------
 arch/x86/power/cpu.c             | 13 +++++++------
 5 files changed, 29 insertions(+), 30 deletions(-)

diff --git a/arch/x86/include/asm/desc.h b/arch/x86/include/asm/desc.h
index 92ad5c354d11..ad35544d9e00 100644
--- a/arch/x86/include/asm/desc.h
+++ b/arch/x86/include/asm/desc.h
@@ -177,7 +177,7 @@ static inline void set_tssldt_descriptor(void *d, unsigned long addr,
 #endif
 }
 
-static inline void __set_tss_desc(unsigned cpu, unsigned int entry, void *addr)
+static inline void __set_tss_desc(unsigned cpu, unsigned int entry, struct x86_hw_tss *addr)
 {
 	struct desc_struct *d = get_cpu_gdt_rw(cpu);
 	tss_desc tss;
diff --git a/arch/x86/include/asm/processor.h b/arch/x86/include/asm/processor.h
index 0644f888b12c..301d41ca1fa1 100644
--- a/arch/x86/include/asm/processor.h
+++ b/arch/x86/include/asm/processor.h
@@ -161,7 +161,7 @@ enum cpuid_regs_idx {
 extern struct cpuinfo_x86	boot_cpu_data;
 extern struct cpuinfo_x86	new_cpu_data;
 
-extern struct tss_struct	doublefault_tss;
+extern struct x86_hw_tss	doublefault_tss;
 extern __u32			cpu_caps_cleared[NCAPINTS];
 extern __u32			cpu_caps_set[NCAPINTS];
 
@@ -321,7 +321,7 @@ struct x86_hw_tss {
 #define IO_BITMAP_BITS			65536
 #define IO_BITMAP_BYTES			(IO_BITMAP_BITS/8)
 #define IO_BITMAP_LONGS			(IO_BITMAP_BYTES/sizeof(long))
-#define IO_BITMAP_OFFSET		offsetof(struct tss_struct, io_bitmap)
+#define IO_BITMAP_OFFSET		(offsetof(struct tss_struct, io_bitmap) - offsetof(struct tss_struct, x86_tss))
 #define INVALID_IO_BITMAP_OFFSET	0x8000
 
 struct tss_struct {
diff --git a/arch/x86/kernel/cpu/common.c b/arch/x86/kernel/cpu/common.c
index 2cb394dc4153..ce3b3c79fc0c 100644
--- a/arch/x86/kernel/cpu/common.c
+++ b/arch/x86/kernel/cpu/common.c
@@ -1576,7 +1576,7 @@ void cpu_init(void)
 	 * Initialize the TSS.  Don't bother initializing sp0, as the initial
 	 * task never enters user mode.
 	 */
-	set_tss_desc(cpu, t);
+	set_tss_desc(cpu, &t->x86_tss);
 	load_TR_desc();
 
 	load_mm_ldt(&init_mm);
@@ -1634,7 +1634,7 @@ void cpu_init(void)
 	 * Initialize the TSS.  Don't bother initializing sp0, as the initial
 	 * task never enters user mode.
 	 */
-	set_tss_desc(cpu, t);
+	set_tss_desc(cpu, &t->x86_tss);
 	load_TR_desc();
 
 	load_mm_ldt(&init_mm);
diff --git a/arch/x86/kernel/doublefault.c b/arch/x86/kernel/doublefault.c
index f9c324e08d85..a9fe79d49d39 100644
--- a/arch/x86/kernel/doublefault.c
+++ b/arch/x86/kernel/doublefault.c
@@ -49,25 +49,23 @@ static void doublefault_fn(void)
 		cpu_relax();
 }
 
-struct tss_struct doublefault_tss __cacheline_aligned = {
-	.x86_tss = {
-		.sp0		= STACK_START,
-		.ss0		= __KERNEL_DS,
-		.ldt		= 0,
-		.io_bitmap_base	= INVALID_IO_BITMAP_OFFSET,
-
-		.ip		= (unsigned long) doublefault_fn,
-		/* 0x2 bit is always set */
-		.flags		= X86_EFLAGS_SF | 0x2,
-		.sp		= STACK_START,
-		.es		= __USER_DS,
-		.cs		= __KERNEL_CS,
-		.ss		= __KERNEL_DS,
-		.ds		= __USER_DS,
-		.fs		= __KERNEL_PERCPU,
-
-		.__cr3		= __pa_nodebug(swapper_pg_dir),
-	}
+struct x86_hw_tss doublefault_tss __cacheline_aligned = {
+	.sp0		= STACK_START,
+	.ss0		= __KERNEL_DS,
+	.ldt		= 0,
+	.io_bitmap_base	= INVALID_IO_BITMAP_OFFSET,
+
+	.ip		= (unsigned long) doublefault_fn,
+	/* 0x2 bit is always set */
+	.flags		= X86_EFLAGS_SF | 0x2,
+	.sp		= STACK_START,
+	.es		= __USER_DS,
+	.cs		= __KERNEL_CS,
+	.ss		= __KERNEL_DS,
+	.ds		= __USER_DS,
+	.fs		= __KERNEL_PERCPU,
+
+	.__cr3		= __pa_nodebug(swapper_pg_dir),
 };
 
 /* dummy for do_double_fault() call */
diff --git a/arch/x86/power/cpu.c b/arch/x86/power/cpu.c
index 84fcfde53f8f..50593e138281 100644
--- a/arch/x86/power/cpu.c
+++ b/arch/x86/power/cpu.c
@@ -165,12 +165,13 @@ static void fix_processor_context(void)
 	struct desc_struct *desc = get_cpu_gdt_rw(cpu);
 	tss_desc tss;
 #endif
-	set_tss_desc(cpu, t);	/*
-				 * This just modifies memory; should not be
-				 * necessary. But... This is necessary, because
-				 * 386 hardware has concept of busy TSS or some
-				 * similar stupidity.
-				 */
+
+	/*
+	 * This just modifies memory; should not be necessary. But... This is
+	 * necessary, because 386 hardware has concept of busy TSS or some
+	 * similar stupidity.
+	 */
+	set_tss_desc(cpu, &t->x86_tss);
 
 #ifdef CONFIG_X86_64
 	memcpy(&tss, &desc[GDT_ENTRY_TSS], sizeof(tss_desc));
-- 
2.13.6


* [RFC 5/7] x86/asm: Rearrange struct cpu_tss to enlarge SYSENTER_stack and fix alignment
  2017-11-11  4:05 [RFC 0/7] Prep code for better stack switching Andy Lutomirski
                   ` (3 preceding siblings ...)
  2017-11-11  4:05 ` [RFC 4/7] x86/asm: Fix assumptions that the HW TSS is at the beginning of cpu_tss Andy Lutomirski
@ 2017-11-11  4:05 ` Andy Lutomirski
  2017-11-11  4:11   ` Andy Lutomirski
  2017-11-13 19:19   ` Dave Hansen
  2017-11-11  4:05 ` [RFC 6/7] x86/asm: Remap the TSS into the cpu entry area Andy Lutomirski
                   ` (2 subsequent siblings)
  7 siblings, 2 replies; 32+ messages in thread
From: Andy Lutomirski @ 2017-11-11  4:05 UTC (permalink / raw)
  To: X86 ML
  Cc: Borislav Petkov, linux-kernel, Brian Gerst, Dave Hansen,
	Linus Torvalds, Andy Lutomirski

The Intel SDM says (Volume 3, 7.2.1):

   Avoid placing a page boundary in the part of the TSS that the
   processor reads during a task switch (the first 104 bytes). The
   processor may not correctly perform address translations if a
   boundary occurs in this area. During a task switch, the processor
   reads and writes into the first 104 bytes of each TSS (using
   contiguous physical addresses beginning with the physical address
   of the first byte of the TSS). So, after TSS access begins, if
   part of the 104 bytes is not physically contiguous, the processor
   will access incorrect information without generating a page-fault
   exception.

Merely cacheline-aligning the TSS doesn't actually guarantee that
the hardware TSS doesn't span a page.  Instead, page-align the
structure that contains it.
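
As an illustrative build-time assertion of the resulting invariant
(not part of this patch), somewhere in a function one could add:

	/* The hardware TSS must not straddle a page boundary: */
	BUILD_BUG_ON(offsetof(struct tss_struct, x86_tss) / PAGE_SIZE !=
		     (offsetofend(struct tss_struct, x86_tss) - 1) / PAGE_SIZE);

given that cpu_tss itself is now page-aligned.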

Signed-off-by: Andy Lutomirski <luto@kernel.org>
---
 arch/x86/include/asm/processor.h | 18 +++++++++++-------
 1 file changed, 11 insertions(+), 7 deletions(-)

diff --git a/arch/x86/include/asm/processor.h b/arch/x86/include/asm/processor.h
index 301d41ca1fa1..97ded6e3edd3 100644
--- a/arch/x86/include/asm/processor.h
+++ b/arch/x86/include/asm/processor.h
@@ -326,6 +326,16 @@ struct x86_hw_tss {
 
 struct tss_struct {
 	/*
+	 * Space for the temporary SYSENTER stack.  Used for the entry
+	 * trampoline as well.  Size it such that tss_struct ends up
+	 * as a multiple of PAGE_SIZE.  This calculation assumes that
+	 * io_bitmap is a multiple of PAGE_SIZE (8192 bytes) plus one
+	 * long.
+	 */
+	unsigned long		SYSENTER_stack_canary;
+	unsigned long		SYSENTER_stack[(PAGE_SIZE - sizeof(struct x86_hw_tss)) / sizeof(unsigned long) - 2];
+
+	/*
 	 * The hardware state:
 	 */
 	struct x86_hw_tss	x86_tss;
@@ -337,15 +347,9 @@ struct tss_struct {
 	 * be within the limit.
 	 */
 	unsigned long		io_bitmap[IO_BITMAP_LONGS + 1];
-
-	/*
-	 * Space for the temporary SYSENTER stack.
-	 */
-	unsigned long		SYSENTER_stack_canary;
-	unsigned long		SYSENTER_stack[64];
 } ____cacheline_aligned;
 
-DECLARE_PER_CPU_SHARED_ALIGNED(struct tss_struct, cpu_tss);
+DECLARE_PER_CPU_PAGE_ALIGNED(struct tss_struct, cpu_tss);
 
 /*
  * sizeof(unsigned long) coming from an extra "long" at the end
-- 
2.13.6


* [RFC 6/7] x86/asm: Remap the TSS into the cpu entry area
  2017-11-11  4:05 [RFC 0/7] Prep code for better stack switching Andy Lutomirski
                   ` (4 preceding siblings ...)
  2017-11-11  4:05 ` [RFC 5/7] x86/asm: Rearrange struct cpu_tss to enlarge SYSENTER_stack and fix alignment Andy Lutomirski
@ 2017-11-11  4:05 ` Andy Lutomirski
  2017-11-13 19:22   ` Dave Hansen
  2017-11-11  4:05 ` [RFC 7/7] x86/unwind/64: Add support for the SYSENTER stack Andy Lutomirski
  2017-11-11 10:58 ` [RFC 0/7] Prep code for better stack switching Borislav Petkov
  7 siblings, 1 reply; 32+ messages in thread
From: Andy Lutomirski @ 2017-11-11  4:05 UTC (permalink / raw)
  To: X86 ML
  Cc: Borislav Petkov, linux-kernel, Brian Gerst, Dave Hansen,
	Linus Torvalds, Andy Lutomirski

This has a secondary purpose: it puts the entry stack into a region
with a well-controlled layout.  A subsequent patch will take
advantage of this to streamline the SYSCALL entry code so that it
can find the entry stack more easily.

XXX: This either needs to not happen on 32-bit or we need to fix the 32-bit
entry code.

Signed-off-by: Andy Lutomirski <luto@kernel.org>
---
 arch/x86/include/asm/fixmap.h |  7 +++++++
 arch/x86/kernel/cpu/common.c  | 33 +++++++++++++++++++++++++++------
 arch/x86/power/cpu.c          | 11 ++++++-----
 3 files changed, 40 insertions(+), 11 deletions(-)

diff --git a/arch/x86/include/asm/fixmap.h b/arch/x86/include/asm/fixmap.h
index fbc9b7f4e35e..8a9ba5553cab 100644
--- a/arch/x86/include/asm/fixmap.h
+++ b/arch/x86/include/asm/fixmap.h
@@ -52,6 +52,13 @@ extern unsigned long __FIXADDR_TOP;
 struct cpu_entry_area
 {
 	char gdt[PAGE_SIZE];
+
+	/*
+	 * The gdt is just below cpu_tss and thus serves (on x86_64) as
+	 * a read-only guard page for the SYSENTER stack at the bottom
+	 * of the TSS region.
+	 */
+	struct tss_struct tss;
 };
 
 #define CPU_ENTRY_AREA_PAGES (sizeof(struct cpu_entry_area) / PAGE_SIZE)
diff --git a/arch/x86/kernel/cpu/common.c b/arch/x86/kernel/cpu/common.c
index ce3b3c79fc0c..fdf8108791ce 100644
--- a/arch/x86/kernel/cpu/common.c
+++ b/arch/x86/kernel/cpu/common.c
@@ -466,6 +466,16 @@ void load_percpu_segment(int cpu)
 	load_stack_canary_segment();
 }
 
+static void set_percpu_fixmap_pages(int fixmap_index, void *ptr, int pages,
+				    pgprot_t prot)
+{
+	int i;
+
+	for (i = 0; i < pages; i++)
+		__set_fixmap(fixmap_index - i,
+			     per_cpu_ptr_to_phys(ptr + i*PAGE_SIZE), prot);
+}
+
 /* Setup the fixmap mappings only once per-processor */
 static inline void setup_cpu_entry_area(int cpu)
 {
@@ -487,6 +497,12 @@ static inline void setup_cpu_entry_area(int cpu)
 #endif
 
 	__set_fixmap(get_cpu_entry_area_index(cpu, gdt), get_cpu_gdt_paddr(cpu), gdt_prot);
+
+	BUILD_BUG_ON(sizeof(struct tss_struct) % PAGE_SIZE != 0);
+	set_percpu_fixmap_pages(get_cpu_entry_area_index(cpu, tss),
+				&per_cpu(cpu_tss, cpu),
+				sizeof(struct tss_struct) / PAGE_SIZE,
+				PAGE_KERNEL);
 }
 
 /* Load the original GDT from the per-cpu structure */
@@ -1236,7 +1252,8 @@ void enable_sep_cpu(void)
 	wrmsr(MSR_IA32_SYSENTER_CS, tss->x86_tss.ss1, 0);
 
 	wrmsr(MSR_IA32_SYSENTER_ESP,
-	      (unsigned long)tss + offsetofend(struct tss_struct, SYSENTER_stack),
+	      (unsigned long)&get_cpu_entry_area(cpu)->tss +
+	      offsetofend(struct tss_struct, SYSENTER_stack),
 	      0);
 
 	wrmsr(MSR_IA32_SYSENTER_EIP, (unsigned long)entry_SYSENTER_32, 0);
@@ -1349,6 +1366,8 @@ static DEFINE_PER_CPU_PAGE_ALIGNED(char, exception_stacks
 /* May not be marked __init: used by software suspend */
 void syscall_init(void)
 {
+	int cpu = smp_processor_id();
+
 	wrmsr(MSR_STAR, 0, (__USER32_CS << 16) | __KERNEL_CS);
 	wrmsrl(MSR_LSTAR, (unsigned long)entry_SYSCALL_64);
 
@@ -1362,7 +1381,7 @@ void syscall_init(void)
 	 */
 	wrmsrl_safe(MSR_IA32_SYSENTER_CS, (u64)__KERNEL_CS);
 	wrmsrl_safe(MSR_IA32_SYSENTER_ESP,
-		    (unsigned long)this_cpu_ptr(&cpu_tss) +
+		    (unsigned long)&get_cpu_entry_area(cpu)->tss +
 		    offsetofend(struct tss_struct, SYSENTER_stack));
 	wrmsrl_safe(MSR_IA32_SYSENTER_EIP, (u64)entry_SYSENTER_compat);
 #else
@@ -1572,11 +1591,13 @@ void cpu_init(void)
 	initialize_tlbstate_and_flush();
 	enter_lazy_tlb(&init_mm, me);
 
+	setup_cpu_entry_area(cpu);
+
 	/*
 	 * Initialize the TSS.  Don't bother initializing sp0, as the initial
 	 * task never enters user mode.
 	 */
-	set_tss_desc(cpu, &t->x86_tss);
+	set_tss_desc(cpu, &get_cpu_entry_area(cpu)->tss.x86_tss);
 	load_TR_desc();
 
 	load_mm_ldt(&init_mm);
@@ -1589,7 +1610,6 @@ void cpu_init(void)
 	if (is_uv_system())
 		uv_cpu_init();
 
-	setup_cpu_entry_area(cpu);
 	load_fixmap_gdt(cpu);
 }
 
@@ -1630,11 +1650,13 @@ void cpu_init(void)
 	initialize_tlbstate_and_flush();
 	enter_lazy_tlb(&init_mm, curr);
 
+	setup_cpu_entry_area(cpu);
+
 	/*
 	 * Initialize the TSS.  Don't bother initializing sp0, as the initial
 	 * task never enters user mode.
 	 */
-	set_tss_desc(cpu, &t->x86_tss);
+	set_tss_desc(cpu, &get_cpu_entry_area(cpu)->tss.x86_tss);
 	load_TR_desc();
 
 	load_mm_ldt(&init_mm);
@@ -1651,7 +1673,6 @@ void cpu_init(void)
 
 	fpu__init_cpu();
 
-	setup_cpu_entry_area(cpu);
 	load_fixmap_gdt(cpu);
 }
 #endif
diff --git a/arch/x86/power/cpu.c b/arch/x86/power/cpu.c
index 50593e138281..04d5157fe7f8 100644
--- a/arch/x86/power/cpu.c
+++ b/arch/x86/power/cpu.c
@@ -160,18 +160,19 @@ static void do_fpu_end(void)
 static void fix_processor_context(void)
 {
 	int cpu = smp_processor_id();
-	struct tss_struct *t = &per_cpu(cpu_tss, cpu);
 #ifdef CONFIG_X86_64
 	struct desc_struct *desc = get_cpu_gdt_rw(cpu);
 	tss_desc tss;
 #endif
 
 	/*
-	 * This just modifies memory; should not be necessary. But... This is
-	 * necessary, because 386 hardware has concept of busy TSS or some
-	 * similar stupidity.
+	 * We need to reload TR, which requires that we change the
+	 * GDT entry to indicate "available" first.
+	 *
+	 * XXX: This could probably all be replaced by a call to
+	 * force_reload_TR().
 	 */
-	set_tss_desc(cpu, &t->x86_tss);
+	set_tss_desc(cpu, &get_cpu_entry_area(cpu)->tss.x86_tss);
 
 #ifdef CONFIG_X86_64
 	memcpy(&tss, &desc[GDT_ENTRY_TSS], sizeof(tss_desc));
-- 
2.13.6


* [RFC 7/7] x86/unwind/64: Add support for the SYSENTER stack
  2017-11-11  4:05 [RFC 0/7] Prep code for better stack switching Andy Lutomirski
                   ` (5 preceding siblings ...)
  2017-11-11  4:05 ` [RFC 6/7] x86/asm: Remap the TSS into the cpu entry area Andy Lutomirski
@ 2017-11-11  4:05 ` Andy Lutomirski
  2017-11-13 22:46   ` Josh Poimboeuf
  2017-11-11 10:58 ` [RFC 0/7] Prep code for better stack switching Borislav Petkov
  7 siblings, 1 reply; 32+ messages in thread
From: Andy Lutomirski @ 2017-11-11  4:05 UTC (permalink / raw)
  To: X86 ML
  Cc: Borislav Petkov, linux-kernel, Brian Gerst, Dave Hansen,
	Linus Torvalds, Andy Lutomirski

Signed-off-by: Andy Lutomirski <luto@kernel.org>
---
 arch/x86/include/asm/stacktrace.h |  1 +
 arch/x86/kernel/dumpstack_32.c    |  3 +++
 arch/x86/kernel/dumpstack_64.c    | 23 +++++++++++++++++++++++
 3 files changed, 27 insertions(+)

diff --git a/arch/x86/include/asm/stacktrace.h b/arch/x86/include/asm/stacktrace.h
index 2e41c50ddf47..854f5cd141ed 100644
--- a/arch/x86/include/asm/stacktrace.h
+++ b/arch/x86/include/asm/stacktrace.h
@@ -15,6 +15,7 @@ enum stack_type {
 	STACK_TYPE_TASK,
 	STACK_TYPE_IRQ,
 	STACK_TYPE_SOFTIRQ,
+	STACK_TYPE_SYSENTER,
 	STACK_TYPE_EXCEPTION,
 	STACK_TYPE_EXCEPTION_LAST = STACK_TYPE_EXCEPTION + N_EXCEPTION_STACKS-1,
 };
diff --git a/arch/x86/kernel/dumpstack_32.c b/arch/x86/kernel/dumpstack_32.c
index 4f0481474903..0a04c7a9ecfc 100644
--- a/arch/x86/kernel/dumpstack_32.c
+++ b/arch/x86/kernel/dumpstack_32.c
@@ -25,6 +25,9 @@ const char *stack_type_name(enum stack_type type)
 	if (type == STACK_TYPE_SOFTIRQ)
 		return "SOFTIRQ";
 
+	if (type == STACK_TYPE_SYSENTER)
+		return "SYSENTER";
+
 	return NULL;
 }
 
diff --git a/arch/x86/kernel/dumpstack_64.c b/arch/x86/kernel/dumpstack_64.c
index 225af4184f06..b9195ff7f1cf 100644
--- a/arch/x86/kernel/dumpstack_64.c
+++ b/arch/x86/kernel/dumpstack_64.c
@@ -36,6 +36,9 @@ const char *stack_type_name(enum stack_type type)
 	if (type == STACK_TYPE_IRQ)
 		return "IRQ";
 
+	if (type == STACK_TYPE_SYSENTER)
+		return "SYSENTER";
+
 	if (type >= STACK_TYPE_EXCEPTION && type <= STACK_TYPE_EXCEPTION_LAST)
 		return exception_stack_names[type - STACK_TYPE_EXCEPTION];
 
@@ -94,6 +97,23 @@ static bool in_irq_stack(unsigned long *stack, struct stack_info *info)
 	return true;
 }
 
+static bool in_SYSENTER_stack(unsigned long *stack, struct stack_info *info)
+{
+	int cpu = smp_processor_id();
+	void *begin = &get_cpu_entry_area(cpu)->tss.SYSENTER_stack;
+	void *end   = begin + sizeof(cpu_tss.SYSENTER_stack);
+
+	if ((void *)stack < begin || (void *)stack >= end)
+		return false;
+
+	info->type	= STACK_TYPE_SYSENTER;
+	info->begin	= begin;
+	info->end	= end;
+	info->next_sp	= NULL;
+
+	return true;
+}
+
 int get_stack_info(unsigned long *stack, struct task_struct *task,
 		   struct stack_info *info, unsigned long *visit_mask)
 {
@@ -114,6 +134,9 @@ int get_stack_info(unsigned long *stack, struct task_struct *task,
 	if (in_irq_stack(stack, info))
 		goto recursion_check;
 
+	if (in_SYSENTER_stack(stack, info))
+		goto recursion_check;
+
 	goto unknown;
 
 recursion_check:
-- 
2.13.6


* Re: [RFC 5/7] x86/asm: Rearrange struct cpu_tss to enlarge SYSENTER_stack and fix alignment
  2017-11-11  4:05 ` [RFC 5/7] x86/asm: Rearrange struct cpu_tss to enlarge SYSENTER_stack and fix alignment Andy Lutomirski
@ 2017-11-11  4:11   ` Andy Lutomirski
  2017-11-13 19:19   ` Dave Hansen
  1 sibling, 0 replies; 32+ messages in thread
From: Andy Lutomirski @ 2017-11-11  4:11 UTC (permalink / raw)
  To: Andy Lutomirski
  Cc: X86 ML, Borislav Petkov, linux-kernel, Brian Gerst, Dave Hansen,
	Linus Torvalds

On Fri, Nov 10, 2017 at 8:05 PM, Andy Lutomirski <luto@kernel.org> wrote:
> The Intel SDM says (Volume 3, 7.2.1):
>
>    Avoid placing a page boundary in the part of the TSS that the
>    processor reads during a task switch (the first 104 bytes). The
>    processor may not correctly perform address translations if a
>    boundary occurs in this area. During a task switch, the processor
>    reads and writes into the first 104 bytes of each TSS (using
>    contiguous physical addresses beginning with the physical address
>    of the first byte of the TSS). So, after TSS access begins, if
>    part of the 104 bytes is not physically contiguous, the processor
>    will access incorrect information without generating a page-fault
>    exception.

Hmm.  I should add that I suspect we rarely if ever hit this problem
in practice because (a) we only ever task switch on 32-bit
doublefaults, (b) if the old register state gets corrupted by this
issue during a doublefault, we might not notice, and (c) there is
probably rarely a page boundary in the wrong place.  I suspect that
regular kernel entries have the same issue but that esp0 and ss0 were
always in the same page due to cacheline alignment.

FWIW, we really do virtually map the percpu section AFAICT.  The code
does not appear to guarantee that percpu variables are physically
contiguous.
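
(If we wanted to double-check that for the hardware TSS, a hypothetical
and untested init-time test might look like:

	static __init int check_hw_tss_phys_contig(void)
	{
		int cpu;

		for_each_possible_cpu(cpu) {
			void *va = &per_cpu(cpu_tss, cpu).x86_tss;
			phys_addr_t first = per_cpu_ptr_to_phys(va);
			phys_addr_t last = per_cpu_ptr_to_phys(va + 103);

			WARN(last != first + 103,
			     "CPU %d: HW TSS not physically contiguous\n",
			     cpu);
		}
		return 0;
	}
	late_initcall(check_hw_tss_phys_contig);

i.e. verify that the first and last of the 104 bytes really are 103
bytes apart physically.)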

I'd love to make this mapping RO, but the SDM advises against that.  I
don't know whether there's a real concern (on 64-bit) or whether it's
just being overly cautious.


* Re: [RFC 0/7] Prep code for better stack switching
  2017-11-11  4:05 [RFC 0/7] Prep code for better stack switching Andy Lutomirski
                   ` (6 preceding siblings ...)
  2017-11-11  4:05 ` [RFC 7/7] x86/unwind/64: Add support for the SYSENTER stack Andy Lutomirski
@ 2017-11-11 10:58 ` Borislav Petkov
  2017-11-12  2:59   ` Andy Lutomirski
  7 siblings, 1 reply; 32+ messages in thread
From: Borislav Petkov @ 2017-11-11 10:58 UTC (permalink / raw)
  To: Andy Lutomirski
  Cc: X86 ML, linux-kernel, Brian Gerst, Dave Hansen, Linus Torvalds

On Fri, Nov 10, 2017 at 08:05:19PM -0800, Andy Lutomirski wrote:
> This isn't quite done (the TSS remap patch is busted on 32-bit, but
> that's a straightforward fix), but it should be ready for at least a
> conceptual review.
> 
> The idea here is to prepare us to have all kernel data needed for
> user mode execution and early entry located in the fixmap.  To do
> this, I hijack the GDT remap mechanism and make it more general.  I
> add a struct cpu_entry_area.  This struct is never instantiated
> directly.  Instead, it represents the layout of a per-cpu portion of
> the fixmap.  That portion contains the GDT, the TSS (including IO
> bitmap), and the entry stack (for now just a part of the TSS
> region).  It should also end up containing the PEBS and BTS buffers.
> 
> If this works, then the idea would be to add a magic *executable* page
> to cpu_entry_area.  That page would contain a stub like this:
> 
> ENTRY(entry_SYSCALL_64_trampoline)
> 	UNWIND_HINT_EMPTY
> 	movq	%rsp, 0x1000+entry_SYSCALL_64_trampoline-1f(%rip)
> 1:
> 	movq	0x1008+entry_SYSCALL_64_trampoline-1f(%rip), %rsp
> 1:
> 	pushq	%rdi
> 	pushq	%rsi

> 	movq	0x1000+entry_SYSCALL_64_trampoline-1f(%rip), %rsi
> 1:
> 	movq	$entry_SYSCALL_64, %rdi
> 	jmp	*%rdi

So I'm wondering: r12-r15 are callee-preserved so why can't you
scratch into those on entry and leave rsi and rdi pristine so that
entry_SYSCALL_64 can get to work directly?

-- 
Regards/Gruss,
    Boris.

SUSE Linux GmbH, GF: Felix Imendörffer, Jane Smithard, Graham Norton, HRB 21284 (AG Nürnberg)
-- 


* Re: [RFC 0/7] Prep code for better stack switching
  2017-11-11 10:58 ` [RFC 0/7] Prep code for better stack switching Borislav Petkov
@ 2017-11-12  2:59   ` Andy Lutomirski
  2017-11-12  4:25     ` Andy Lutomirski
  0 siblings, 1 reply; 32+ messages in thread
From: Andy Lutomirski @ 2017-11-12  2:59 UTC (permalink / raw)
  To: Borislav Petkov
  Cc: Andy Lutomirski, X86 ML, linux-kernel, Brian Gerst, Dave Hansen,
	Linus Torvalds

On Sat, Nov 11, 2017 at 2:58 AM, Borislav Petkov <bp@suse.de> wrote:
> On Fri, Nov 10, 2017 at 08:05:19PM -0800, Andy Lutomirski wrote:
>> This isn't quite done (the TSS remap patch is busted on 32-bit, but
>> that's a straightforward fix), but it should be ready for at least a
>> conceptual review.
>>
>> The idea here is to prepare us to have all kernel data needed for
>> user mode execution and early entry located in the fixmap.  To do
>> this, I hijack the GDT remap mechanism and make it more general.  I
>> add a struct cpu_entry_area.  This struct is never instantiated
>> directly.  Instead, it represents the layout of a per-cpu portion of
>> the fixmap.  That portion contains the GDT, the TSS (including IO
>> bitmap), and the entry stack (for now just a part of the TSS
>> region).  It should also end up containing the PEBS and BTS buffers.
>>
>> If this works, then the idea would be to add a magic *executable* page
>> to cpu_entry_area.  That page would contain a stub like this:
>>
>> ENTRY(entry_SYSCALL_64_trampoline)
>>       UNWIND_HINT_EMPTY
>>       movq    %rsp, 0x1000+entry_SYSCALL_64_trampoline-1f(%rip)
>> 1:
>>       movq    0x1008+entry_SYSCALL_64_trampoline-1f(%rip), %rsp
>> 1:
>>       pushq   %rdi
>>       pushq   %rsi
>
>>       movq    0x1000+entry_SYSCALL_64_trampoline-1f(%rip), %rsi
>> 1:
>>       movq    $entry_SYSCALL_64, %rdi
>>       jmp     *%rdi
>
> So I'm wondering: r12-r15 are callee-preserved so why can't you
> scratch into those on entry and leave rsi and rdi pristine so that
> entry_SYSCALL_64 can get to work directly?

I'm not sure I understand your suggestion.  SYSCALL has always
preserved all regs except rcx, r11, flags, rax, and, depending on what
signals are involved, the argument registers.  r12-r15 are definitely
preserved, and existing userspace relies on that.

Anyway, I'm halfway through actually implementing this, and it looks a
wee bit different, but not much different.


* Re: [RFC 0/7] Prep code for better stack switching
  2017-11-12  2:59   ` Andy Lutomirski
@ 2017-11-12  4:25     ` Andy Lutomirski
  2017-11-13  4:37       ` Andy Lutomirski
  0 siblings, 1 reply; 32+ messages in thread
From: Andy Lutomirski @ 2017-11-12  4:25 UTC (permalink / raw)
  To: Andy Lutomirski
  Cc: Borislav Petkov, X86 ML, linux-kernel, Brian Gerst, Dave Hansen,
	Linus Torvalds

On Sat, Nov 11, 2017 at 6:59 PM, Andy Lutomirski <luto@kernel.org> wrote:
> On Sat, Nov 11, 2017 at 2:58 AM, Borislav Petkov <bp@suse.de> wrote:
>> On Fri, Nov 10, 2017 at 08:05:19PM -0800, Andy Lutomirski wrote:
>>> This isn't quite done (the TSS remap patch is busted on 32-bit, but
>>> that's a straightforward fix), but it should be ready for at least a
>>> conceptual review.
>>>
>>> The idea here is to prepare us to have all kernel data needed for
>>> user mode execution and early entry located in the fixmap.  To do
>>> this, I hijack the GDT remap mechanism and make it more general.  I
>>> add a struct cpu_entry_area.  This struct is never instantiated
>>> directly.  Instead, it represents the layout of a per-cpu portion of
>>> the fixmap.  That portion contains the GDT, the TSS (including IO
>>> bitmap), and the entry stack (for now just a part of the TSS
>>> region).  It should also end up containing the PEBS and BTS buffers.
>>>
>>> If this works, then the idea would be to add a magic *executable* page
>>> to cpu_entry_area.  That page would contain a stub like this:
>>>
>>> ENTRY(entry_SYSCALL_64_trampoline)
>>>       UNWIND_HINT_EMPTY
>>>       movq    %rsp, 0x1000+entry_SYSCALL_64_trampoline-1f(%rip)
>>> 1:
>>>       movq    0x1008+entry_SYSCALL_64_trampoline-1f(%rip), %rsp
>>> 1:
>>>       pushq   %rdi
>>>       pushq   %rsi
>>
>>>       movq    0x1000+entry_SYSCALL_64_trampoline-1f(%rip), %rsi
>>> 1:
>>>       movq    $entry_SYSCALL_64, %rdi
>>>       jmp     *%rdi
>>
>> So I'm wondering: r12-r15 are callee-preserved so why can't you
>> scratch into those on entry and leave rsi and rdi pristine so that
>> entry_SYSCALL_64 can get to work directly?
>
> I'm not sure I understand your suggestion.  SYSCALL has always
> preserved all regs except rcx, r11, flags, rax, and, depending on what
> signals are involved, the argument registers.  r12-r15 are definitely
> preserved, and existing userspace relies on that.
>
> Anyway, I'm halfway through actually implementing this, and it looks a
> wee bit different, but not much different.


Here it is:

https://git.kernel.org/pub/scm/linux/kernel/git/luto/linux.git/commit/?h=x86/entry_stack.wip&id=96a6ab74088a86f6b9b6df8284c6466e4fa50d08

Seems to work for me.

Dave, want to see if you can get this working cleanly without mapping
any percpu variables at all?  You'll probably have to move PEBS, etc
into cpu_entry_area.  For now, it should be safe to just ignore the
LDT.  I'm somewhat tempted to just adjust your code so that the fixmap
ends up being mapped separately for LDT-using tasks rather than
mucking with putting the LDT in the user address range.  The latter
involves a little more mm magic than I really want to deal with if I
can avoid it.


* Re: [RFC 0/7] Prep code for better stack switching
  2017-11-12  4:25     ` Andy Lutomirski
@ 2017-11-13  4:37       ` Andy Lutomirski
  0 siblings, 0 replies; 32+ messages in thread
From: Andy Lutomirski @ 2017-11-13  4:37 UTC (permalink / raw)
  To: Andy Lutomirski
  Cc: Borislav Petkov, X86 ML, linux-kernel, Brian Gerst, Dave Hansen,
	Linus Torvalds

On Sat, Nov 11, 2017 at 8:25 PM, Andy Lutomirski <luto@kernel.org> wrote:
> On Sat, Nov 11, 2017 at 6:59 PM, Andy Lutomirski <luto@kernel.org> wrote:
>> On Sat, Nov 11, 2017 at 2:58 AM, Borislav Petkov <bp@suse.de> wrote:
>>> On Fri, Nov 10, 2017 at 08:05:19PM -0800, Andy Lutomirski wrote:
>>>> This isn't quite done (the TSS remap patch is busted on 32-bit, but
>>>> that's a straightforward fix), but it should be ready for at least a
>>>> conceptual review.
>>>>
>>>> The idea here is to prepare us to have all kernel data needed for
>>>> user mode execution and early entry located in the fixmap.  To do
>>>> this, I hijack the GDT remap mechanism and make it more general.  I
>>>> add a struct cpu_entry_area.  This struct is never instantiated
>>>> directly.  Instead, it represents the layout of a per-cpu portion of
>>>> the fixmap.  That portion contains the GDT, the TSS (including IO
>>>> bitmap), and the entry stack (for now just a part of the TSS
>>>> region).  It should also end up containing the PEBS and BTS buffers.
>>>>
>>>> If this works, then the idea would be to add a magic *executable* page
>>>> to cpu_entry_area.  That page would contain a stub like this:
>>>>
>>>> ENTRY(entry_SYSCALL_64_trampoline)
>>>>       UNWIND_HINT_EMPTY
>>>>       movq    %rsp, 0x1000+entry_SYSCALL_64_trampoline-1f(%rip)
>>>> 1:
>>>>       movq    0x1008+entry_SYSCALL_64_trampoline-1f(%rip), %rsp
>>>> 1:
>>>>       pushq   %rdi
>>>>       pushq   %rsi
>>>
>>>>       movq    0x1000+entry_SYSCALL_64_trampoline-1f(%rip), %rsi
>>>> 1:
>>>>       movq    $entry_SYSCALL_64, %rdi
>>>>       jmp     *%rdi
>>>
>>> So I'm wondering: r12-r15 are callee-preserved so why can't you
>>> scratch into those on entry and leave rsi and rdi pristine so that
>>> entry_SYSCALL_64 can get to work directly?
>>
>> I'm not sure I understand your suggestion.  SYSCALL has always
>> preserved all regs except rcx, r11, flags, rax, and, depending on what
>> signals are involved, the argument registers.  r12-r15 are definitely
>> preserved, and existing userspace relies on that.
>>
>> Anyway, I'm halfway through actually implementing this, and it looks a
>> wee bit different, but not much different.
>
>
> Here it is:
>
> https://git.kernel.org/pub/scm/linux/kernel/git/luto/linux.git/commit/?h=x86/entry_stack.wip&id=96a6ab74088a86f6b9b6df8284c6466e4fa50d08
>
> Seems to work for me.
>
> Dave, want to see if you can get this working cleanly without mapping
> any percpu variables at all?  You'll probably have to move PEBS, etc
> into cpu_entry_area.  For now, it should be safe to just ignore the
> LDT.  I'm somewhat tempted to just adjust your code so that the fixmap
> ends up being mapped separately for LDT-using tasks rather than
> mucking with putting the LDT in the user address range.  The latter
> involves a little more mm magic than I really want to deal with if I
> can avoid it.

If any of you are playing with the full series (the stuff in my tree,
not the stuff in this email), don't try to use it with excessive
amounts of tracing on or with CONFIG_CONTEXT_TRACKING_FORCE -- it'll
explode horribly.  I see the root cause, and I'll fix it soon.


* Re: [RFC 4/7] x86/asm: Fix assumptions that the HW TSS is at the beginning of cpu_tss
  2017-11-11  4:05 ` [RFC 4/7] x86/asm: Fix assumptions that the HW TSS is at the beginning of cpu_tss Andy Lutomirski
@ 2017-11-13 17:01   ` Dave Hansen
  2017-11-26 13:48     ` [PATCH v2] x86/entry: " Ingo Molnar
  0 siblings, 1 reply; 32+ messages in thread
From: Dave Hansen @ 2017-11-13 17:01 UTC (permalink / raw)
  To: Andy Lutomirski, X86 ML
  Cc: Borislav Petkov, linux-kernel, Brian Gerst, Linus Torvalds

On 11/10/2017 08:05 PM, Andy Lutomirski wrote:
> -struct tss_struct doublefault_tss __cacheline_aligned = {
> -	.x86_tss = {
> -		.sp0		= STACK_START,
> -		.ss0		= __KERNEL_DS,
> -		.ldt		= 0,
...
> +struct x86_hw_tss doublefault_tss __cacheline_aligned = {
> +	.sp0		= STACK_START,
> +	.ss0		= __KERNEL_DS,
> +	.ldt		= 0,
> +	.io_bitmap_base	= INVALID_IO_BITMAP_OFFSET,

FWIW, I really like the trend of renaming the hardware structures in
such a way that it's clear that they *are* hardware structures.

It might also be nice to reference the relevant SDM sections on the
topic, or even to include a comment along the lines of how it gets used.
This chunk from the SDM is particularly relevant:

"The TSS holds information important to 64-bit mode and that is not
directly related to the task-switch mechanism."


* Re: [RFC 1/7] x86/asm/64: Allocate and enable the SYSENTER stack
  2017-11-11  4:05 ` [RFC 1/7] x86/asm/64: Allocate and enable the SYSENTER stack Andy Lutomirski
@ 2017-11-13 19:07   ` Dave Hansen
  2017-11-14  2:17     ` Andy Lutomirski
  0 siblings, 1 reply; 32+ messages in thread
From: Dave Hansen @ 2017-11-13 19:07 UTC (permalink / raw)
  To: Andy Lutomirski, X86 ML
  Cc: Borislav Petkov, linux-kernel, Brian Gerst, Linus Torvalds

On 11/10/2017 08:05 PM, Andy Lutomirski wrote:
> This will simplify some future code changes that will want some
> temporary stack space in more places.  It also lets us get rid of a
> SWAPGS_UNSAFE_STACK user.
> 
> This does not depend on CONFIG_IA32_EMULATION because we'll want the
> stack space even without IA32 emulation.

It was never clear to me why we don't use this on 64-bit today.  Does
anybody know why?


* Re: [RFC 5/7] x86/asm: Rearrange struct cpu_tss to enlarge SYSENTER_stack and fix alignment
  2017-11-11  4:05 ` [RFC 5/7] x86/asm: Rearrange struct cpu_tss to enlarge SYSENTER_stack and fix alignment Andy Lutomirski
  2017-11-11  4:11   ` Andy Lutomirski
@ 2017-11-13 19:19   ` Dave Hansen
  1 sibling, 0 replies; 32+ messages in thread
From: Dave Hansen @ 2017-11-13 19:19 UTC (permalink / raw)
  To: Andy Lutomirski, X86 ML
  Cc: Borislav Petkov, linux-kernel, Brian Gerst, Linus Torvalds

On 11/10/2017 08:05 PM, Andy Lutomirski wrote:
>  struct tss_struct {
>  	/*
> +	 * Space for the temporary SYSENTER stack.  Used for the entry
> +	 * trampoline as well.  Size it such that tss_struct ends up
> +	 * as a multiple of PAGE_SIZE.  This calculation assumes that
> +	 * io_bitmap is a multiple of PAGE_SIZE (8192 bytes) plus one
> +	 * long.
> +	 */
> +	unsigned long		SYSENTER_stack_canary;
> +	unsigned long		SYSENTER_stack[(PAGE_SIZE - sizeof(struct x86_hw_tss)) / sizeof(unsigned long) - 2];
> +
> +	/*
>  	 * The hardware state:
>  	 */
>  	struct x86_hw_tss	x86_tss;
> @@ -337,15 +347,9 @@ struct tss_struct {
>  	 * be within the limit.
>  	 */
>  	unsigned long		io_bitmap[IO_BITMAP_LONGS + 1];
> -
> -	/*
> -	 * Space for the temporary SYSENTER stack.
> -	 */
> -	unsigned long		SYSENTER_stack_canary;
> -	unsigned long		SYSENTER_stack[64];
>  } ____cacheline_aligned;


If io_bitmap[] is already page-size-aligned, how does it help us to move
SYSENTER_stack[]?

It seems like it would be easier to just leave SYSENTER_stack[] where it
is, make it SYSENTER_stack[0], and just find somewhere else to choose
how much to bloat the tss_struct *allocation* instead of trying to make
sure that sizeof(tss_struct) matches the allocation.

That SYSENTER_stack[] size calculation is pretty hideous. :)
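
(Purely as a hypothetical sketch of that direction, not actual code:

	/* Pad the allocation, not the struct: */
	union cpu_tss_alloc {
		struct tss_struct tss;
		char pad[3 * PAGE_SIZE];	/* whatever footprint we want */
	};
	DECLARE_PER_CPU_PAGE_ALIGNED(union cpu_tss_alloc, cpu_tss_alloc);

with SYSENTER_stack[] staying at the tail of tss_struct and the stack
growing down from the top of the padded region.)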


* Re: [RFC 6/7] x86/asm: Remap the TSS into the cpu entry area
  2017-11-11  4:05 ` [RFC 6/7] x86/asm: Remap the TSS into the cpu entry area Andy Lutomirski
@ 2017-11-13 19:22   ` Dave Hansen
  2017-11-13 19:36     ` Linus Torvalds
  2017-11-14  2:27     ` Andy Lutomirski
  0 siblings, 2 replies; 32+ messages in thread
From: Dave Hansen @ 2017-11-13 19:22 UTC (permalink / raw)
  To: Andy Lutomirski, X86 ML
  Cc: Borislav Petkov, linux-kernel, Brian Gerst, Linus Torvalds

On 11/10/2017 08:05 PM, Andy Lutomirski wrote:
> diff --git a/arch/x86/include/asm/fixmap.h b/arch/x86/include/asm/fixmap.h
> index fbc9b7f4e35e..8a9ba5553cab 100644
> --- a/arch/x86/include/asm/fixmap.h
> +++ b/arch/x86/include/asm/fixmap.h
> @@ -52,6 +52,13 @@ extern unsigned long __FIXADDR_TOP;
>  struct cpu_entry_area
>  {
>  	char gdt[PAGE_SIZE];
> +
> +	/*
> +	 * The gdt is just below cpu_tss and thus serves (on x86_64) as a
> +	 * a read-only guard page for the SYSENTER stack at the bottom
> +	 * of the TSS region.
> +	 */
> +	struct tss_struct tss;
>  };
>  

Aha, and here's the place that you need sizeof(tss_struct) to be nice
and page-aligned.

But why don't we just do:

	char tss_space[PAGE_SIZE*something];

?


* Re: [RFC 6/7] x86/asm: Remap the TSS into the cpu entry area
  2017-11-13 19:22   ` Dave Hansen
@ 2017-11-13 19:36     ` Linus Torvalds
  2017-11-14  2:25       ` Andy Lutomirski
  2017-11-14  2:27     ` Andy Lutomirski
  1 sibling, 1 reply; 32+ messages in thread
From: Linus Torvalds @ 2017-11-13 19:36 UTC (permalink / raw)
  To: Dave Hansen
  Cc: Andy Lutomirski, X86 ML, Borislav Petkov, linux-kernel, Brian Gerst

On Mon, Nov 13, 2017 at 11:22 AM, Dave Hansen <dave.hansen@intel.com> wrote:
>
> Aha, and here's the place that you need sizeof(tss_struct) to be nice
> and page-aligned.

No, it should _not_ be page-aligned. It should fit _within_ a page,
but if 'struct tss_struct' now has something else in front of it, then
page-aligning that is actually pointless.

I forget what the actual size is, but aligning the hardware TSS struct
to 128 bytes might be sufficient. It's not that big.

Of course, we've had issues with "big" alignments before, in that they
haven't been reliable because the base isn't reliably aligned (the
stack being the worst case, but even standard data sections have had
issues). It's partly why we have special page-aligned sections.

                Linus


* Re: [RFC 7/7] x86/unwind/64: Add support for the SYSENTER stack
  2017-11-11  4:05 ` [RFC 7/7] x86/unwind/64: Add support for the SYSENTER stack Andy Lutomirski
@ 2017-11-13 22:46   ` Josh Poimboeuf
  2017-11-14  2:13     ` Andy Lutomirski
  0 siblings, 1 reply; 32+ messages in thread
From: Josh Poimboeuf @ 2017-11-13 22:46 UTC (permalink / raw)
  To: Andy Lutomirski
  Cc: X86 ML, Borislav Petkov, linux-kernel, Brian Gerst, Dave Hansen,
	Linus Torvalds

On Fri, Nov 10, 2017 at 08:05:26PM -0800, Andy Lutomirski wrote:
> Signed-off-by: Andy Lutomirski <luto@kernel.org>

I would make the subject more specific and less unwinder-centric, like:

  "x86/dumpstack: Add get_stack_info() support for the SYSENTER stack"

since there aren't yet any changes to the unwinders themselves (which
I'm assuming will be needed once you start using the stack).
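
(For context, I'd expect the get_stack_info() side to end up looking
roughly like the sketch below -- the STACK_TYPE_SYSENTER name and the
cpu_tss field layout are my assumptions, not the actual patch:)

	static bool in_sysenter_stack(unsigned long *stack, struct stack_info *info)
	{
		struct tss_struct *tss = this_cpu_ptr(&cpu_tss);

		/* Treat the canary as part of the stack for unwinding purposes. */
		void *begin = &tss->SYSENTER_stack_canary;
		void *end = (void *)&tss->SYSENTER_stack + sizeof(tss->SYSENTER_stack);

		if ((void *)stack < begin || (void *)stack >= end)
			return false;

		info->type	= STACK_TYPE_SYSENTER;
		info->begin	= begin;
		info->end	= end;
		info->next_sp	= NULL;

		return true;
	}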

Also, at least a one-sentence description would be good.

Otherwise the patch looks good to me.

-- 
Josh


* Re: [RFC 7/7] x86/unwind/64: Add support for the SYSENTER stack
  2017-11-13 22:46   ` Josh Poimboeuf
@ 2017-11-14  2:13     ` Andy Lutomirski
  0 siblings, 0 replies; 32+ messages in thread
From: Andy Lutomirski @ 2017-11-14  2:13 UTC (permalink / raw)
  To: Josh Poimboeuf
  Cc: Andy Lutomirski, X86 ML, Borislav Petkov, linux-kernel,
	Brian Gerst, Dave Hansen, Linus Torvalds

On Mon, Nov 13, 2017 at 2:46 PM, Josh Poimboeuf <jpoimboe@redhat.com> wrote:
> On Fri, Nov 10, 2017 at 08:05:26PM -0800, Andy Lutomirski wrote:
>> Signed-off-by: Andy Lutomirski <luto@kernel.org>
>
> I would make the subject more specific and less unwinder-centric, like:
>
>   "x86/dumpstack: Add get_stack_info() support for the SYSENTER stack"
>
> since there aren't yet any changes to the unwinders themselves (which
> I'm assuming will be needed once you start using the stack).
>
> Also, at least a one-sentence description would be good.
>
> Otherwise the patch looks good to me.

Done.  I also added 32-bit support.

>
> --
> Josh


* Re: [RFC 1/7] x86/asm/64: Allocate and enable the SYSENTER stack
  2017-11-13 19:07   ` Dave Hansen
@ 2017-11-14  2:17     ` Andy Lutomirski
  2017-11-14  7:15       ` Ingo Molnar
  0 siblings, 1 reply; 32+ messages in thread
From: Andy Lutomirski @ 2017-11-14  2:17 UTC (permalink / raw)
  To: Dave Hansen
  Cc: Andy Lutomirski, X86 ML, Borislav Petkov, linux-kernel,
	Brian Gerst, Linus Torvalds

On Mon, Nov 13, 2017 at 11:07 AM, Dave Hansen <dave.hansen@intel.com> wrote:
> On 11/10/2017 08:05 PM, Andy Lutomirski wrote:
>> This will simplify some future code changes that will want some
>> temporary stack space in more places.  It also lets us get rid of a
>> SWAPGS_UNSAFE_STACK user.
>>
>> This does not depend on CONFIG_IA32_EMULATION because we'll want the
>> stack space even without IA32 emulation.
>
> It was never clear to me why we don't use this on 64-bit today.  Does
> anybody know why?

Nothing used it?

As far as I can tell, the original x86_64 Linux port was a little bit
more excited about IST than I think made sense.  As a result, we use
IST for #DB and #BP, which is IMO rather nasty and causes a lot more
problems than it solves.  But, since #DB uses IST, we don't actually
need a real stack for SYSENTER (because SYSENTER with TF set will
invoke #DB on the IST stack rather than the SYSENTER stack).
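
(Concretely, this is the wiring I mean -- in the old trap_init()
spelling, from memory:)

	/* #DB and #BP each run on the DEBUG_STACK IST stack: */
	set_intr_gate_ist(X86_TRAP_DB, &debug, DEBUG_STACK);
	set_system_intr_gate_ist(X86_TRAP_BP, &int3, DEBUG_STACK);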

I have old patches to stop using IST for #DB and #BP, but I never finished them.


* Re: [RFC 6/7] x86/asm: Remap the TSS into the cpu entry area
  2017-11-13 19:36     ` Linus Torvalds
@ 2017-11-14  2:25       ` Andy Lutomirski
  2017-11-14  2:28         ` Linus Torvalds
  0 siblings, 1 reply; 32+ messages in thread
From: Andy Lutomirski @ 2017-11-14  2:25 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Dave Hansen, Andy Lutomirski, X86 ML, Borislav Petkov,
	linux-kernel, Brian Gerst

On Mon, Nov 13, 2017 at 11:36 AM, Linus Torvalds
<torvalds@linux-foundation.org> wrote:
> On Mon, Nov 13, 2017 at 11:22 AM, Dave Hansen <dave.hansen@intel.com> wrote:
>>
>> Aha, and here's the place that you need sizeof(tss_struct) to be nice
>> and page-aligned.
>
> No, it should _not_ be page-aligned. It should fit _within_ a page,
> but if 'struct tss_struct' now has something else in front of it, then
> page-aligning that is actually pointless.
>
> I forget what the actual size is, but aligning the hardware TSS struct
> to 128 bytes might be sufficient. It's not that big.

104 bytes, so it's probably already fine.  For anything except an
actual task switch, only the first 12 or so bytes matter.
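
(For reference, the hot part is just the very top of the 64-bit
hardware TSS -- layout sketch, field names from memory:)

	struct x86_hw_tss {
		u32	reserved1;
		u64	sp0;	/* loaded into RSP on ring 3 -> ring 0 transitions */
		/* sp1, sp2, the seven IST pointers, io_bitmap_base, ... */
	} __attribute__((packed));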


* Re: [RFC 6/7] x86/asm: Remap the TSS into the cpu entry area
  2017-11-13 19:22   ` Dave Hansen
  2017-11-13 19:36     ` Linus Torvalds
@ 2017-11-14  2:27     ` Andy Lutomirski
  1 sibling, 0 replies; 32+ messages in thread
From: Andy Lutomirski @ 2017-11-14  2:27 UTC (permalink / raw)
  To: Dave Hansen
  Cc: Andy Lutomirski, X86 ML, Borislav Petkov, linux-kernel,
	Brian Gerst, Linus Torvalds

On Mon, Nov 13, 2017 at 11:22 AM, Dave Hansen <dave.hansen@intel.com> wrote:
> On 11/10/2017 08:05 PM, Andy Lutomirski wrote:
>> diff --git a/arch/x86/include/asm/fixmap.h b/arch/x86/include/asm/fixmap.h
>> index fbc9b7f4e35e..8a9ba5553cab 100644
>> --- a/arch/x86/include/asm/fixmap.h
>> +++ b/arch/x86/include/asm/fixmap.h
>> @@ -52,6 +52,13 @@ extern unsigned long __FIXADDR_TOP;
>>  struct cpu_entry_area
>>  {
>>       char gdt[PAGE_SIZE];
>> +
>> +     /*
>> +      * The gdt is just below cpu_tss and thus serves (on x86_64) as
>> +      * a read-only guard page for the SYSENTER stack at the bottom
>> +      * of the TSS region.
>> +      */
>> +     struct tss_struct tss;
>>  };
>>
>
> Aha, and here's the place that you need sizeof(tss_struct) to be nice
> and page-aligned.
>
> But why don't we just do:
>
>         char tss_space[PAGE_SIZE*something];

The idea is to save some space.  The TSS plus IO bitmap is slightly
over a page, so, if we're giving it a dedicated block of pages, we
have almost a page of unused space.  I want to use some of that space
for the SYSENTER stack.  To reliably detect overflow, that space
should be at the beginning.
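
(The canary word at the very bottom also buys us a cheap runtime sanity
check -- hypothetical sketch, assuming we initialize the canary to
STACK_END_MAGIC somewhere during boot:)

	/* called from a slow path that's known to be on a good stack: */
	if (this_cpu_read(cpu_tss.SYSENTER_stack_canary) != STACK_END_MAGIC)
		panic("SYSENTER stack overflowed");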

It turns out that using almost a page is way too *big*: it masks bugs.
I want anything nontrivial that accidentally runs on the SYSENTER
stack to overflow and crash very quickly rather than having a decent
chance of working or of causing nasty corruption with a crash down the
road.  So I'm going to make it much smaller and instead just add a
build-time assertion that we don't cross a page boundary.
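
(One possible shape for that assertion -- a sketch; it compares
struct-relative offsets, which works because the remapped tss is
page-aligned in the cpu entry area:)

	/* the SYSENTER stack must not span a page boundary: */
	BUILD_BUG_ON((offsetof(struct tss_struct, SYSENTER_stack) & PAGE_MASK) !=
		     ((offsetofend(struct tss_struct, SYSENTER_stack) - 1) & PAGE_MASK));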


* Re: [RFC 6/7] x86/asm: Remap the TSS into the cpu entry area
  2017-11-14  2:25       ` Andy Lutomirski
@ 2017-11-14  2:28         ` Linus Torvalds
  2017-11-14  2:30           ` Andy Lutomirski
  0 siblings, 1 reply; 32+ messages in thread
From: Linus Torvalds @ 2017-11-14  2:28 UTC (permalink / raw)
  To: Andy Lutomirski
  Cc: Dave Hansen, X86 ML, Borislav Petkov, linux-kernel, Brian Gerst

On Mon, Nov 13, 2017 at 6:25 PM, Andy Lutomirski <luto@kernel.org> wrote:
> On Mon, Nov 13, 2017 at 11:36 AM, Linus Torvalds
> <torvalds@linux-foundation.org> wrote:
>>
>> I forget what the actual size is, but aligning the hardware TSS struct
>> to 128 bytes might be sufficient. It's not that big.
>
> 104 bytes, so it's probably already fine.  For anything except an
> actual task switch, only the first 12 or so bytes matter.

Note that historically, about half of the Intel errata (that don't get
fixed) are about TSS in oddball situations, mainly page crossers.

I may be exaggerating just a tiny bit, but it's definitely a "don't do it".

               Linus


* Re: [RFC 6/7] x86/asm: Remap the TSS into the cpu entry area
  2017-11-14  2:28         ` Linus Torvalds
@ 2017-11-14  2:30           ` Andy Lutomirski
  0 siblings, 0 replies; 32+ messages in thread
From: Andy Lutomirski @ 2017-11-14  2:30 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Andy Lutomirski, Dave Hansen, X86 ML, Borislav Petkov,
	linux-kernel, Brian Gerst

On Mon, Nov 13, 2017 at 6:28 PM, Linus Torvalds
<torvalds@linux-foundation.org> wrote:
> On Mon, Nov 13, 2017 at 6:25 PM, Andy Lutomirski <luto@kernel.org> wrote:
>> On Mon, Nov 13, 2017 at 11:36 AM, Linus Torvalds
>> <torvalds@linux-foundation.org> wrote:
>>>
>>> I forget what the actual size is, but aligning the hardware TSS struct
>>> to 128 bytes might be sufficient. It's not that big.
>>
>> 104 bytes, so it's probably already fine.  For anything except an
>> actual task switch, only the first 12 or so bytes matter.
>
> Note that historically, about half of the Intel errata (that don't get
> fixed) are about TSS in oddball situations, mainly page crossers.
>
> I may be exaggerating just a tiny bit, but it's definitely a "don't do it".

:)

I suspect the major case where this matters is when we do a task
switch, which only ever happens on 32-bit double faults, at which
point we're already seriously screwed.  But yes, I agree.


* Re: [RFC 1/7] x86/asm/64: Allocate and enable the SYSENTER stack
  2017-11-14  2:17     ` Andy Lutomirski
@ 2017-11-14  7:15       ` Ingo Molnar
  0 siblings, 0 replies; 32+ messages in thread
From: Ingo Molnar @ 2017-11-14  7:15 UTC (permalink / raw)
  To: Andy Lutomirski
  Cc: Dave Hansen, X86 ML, Borislav Petkov, linux-kernel, Brian Gerst,
	Linus Torvalds


* Andy Lutomirski <luto@kernel.org> wrote:

> I have old patches to stop using IST for #DB and #BP, but I never finished them.

I'm all in favor of reviving that effort!

Thanks,

	Ingo


* [PATCH v2] x86/entry: Fix assumptions that the HW TSS is at the beginning of cpu_tss
  2017-11-13 17:01   ` Dave Hansen
@ 2017-11-26 13:48     ` Ingo Molnar
  2017-11-26 15:41       ` Andy Lutomirski
  0 siblings, 1 reply; 32+ messages in thread
From: Ingo Molnar @ 2017-11-26 13:48 UTC (permalink / raw)
  To: Dave Hansen
  Cc: Andy Lutomirski, X86 ML, Borislav Petkov, linux-kernel,
	Brian Gerst, Linus Torvalds


* Dave Hansen <dave.hansen@intel.com> wrote:

> On 11/10/2017 08:05 PM, Andy Lutomirski wrote:
> > -struct tss_struct doublefault_tss __cacheline_aligned = {
> > -	.x86_tss = {
> > -		.sp0		= STACK_START,
> > -		.ss0		= __KERNEL_DS,
> > -		.ldt		= 0,
> ...
> > +struct x86_hw_tss doublefault_tss __cacheline_aligned = {
> > +	.sp0		= STACK_START,
> > +	.ss0		= __KERNEL_DS,
> > +	.ldt		= 0,
> > +	.io_bitmap_base	= INVALID_IO_BITMAP_OFFSET,
> 
> FWIW, I really like the trend of renaming the hardware structures in
> such a way that it's clear that they *are* hardware structures.
> 
> It might also be nice to reference the relevant SDM sections on the
> topic, or even to include a comment along the lines of how it gets used.
> This chunk from the SDM is particularly relevant:
> 
> "The TSS holds information important to 64-bit mode and that is not
> directly related to the task-switch mechanism."

That makes sense - I've updated this patch with the following description added to 
struct x86_hw_tss:

+/*
+ * Note that while the legacy 'TSS' name comes from 'Task State Segment',
+ * on modern x86 CPUs the TSS holds information important to 64-bit mode
+ * unrelated to the task-switch mechanism:
+ */

I have also added your Reviewed-by tag. Updated patch below.

Thanks,

	Ingo

=====================>
Subject: x86/entry: Fix assumptions that the HW TSS is at the beginning of cpu_tss
From: Andy Lutomirski <luto@kernel.org>
Date: Thu, 23 Nov 2017 20:32:52 -0800

A future patch will move SYSENTER_stack to the beginning of cpu_tss
to help detect overflow.  Before this can happen, fix several code
paths that hardcode assumptions about the old layout.

Signed-off-by: Andy Lutomirski <luto@kernel.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
[ Updated the 'struct tss_struct' comments, as suggested by Dave Hansen. ]
Reviewed-by: Borislav Petkov <bp@suse.de>
Reviewed-by: Dave Hansen <dave.hansen@intel.com>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Borislav Petkov <bpetkov@suse.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: https://lkml.kernel.org/r/d40a2c5ae4539d64090849a374f3169ec492f4e2.1511497875.git.luto@kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
---
 arch/x86/include/asm/desc.h      |    2 +-
 arch/x86/include/asm/processor.h |    9 +++++++--
 arch/x86/kernel/cpu/common.c     |    8 ++++----
 arch/x86/kernel/doublefault.c    |   32 +++++++++++++++-----------------
 arch/x86/power/cpu.c             |   13 +++++++------
 5 files changed, 34 insertions(+), 30 deletions(-)

Index: tip/arch/x86/include/asm/desc.h
===================================================================
--- tip.orig/arch/x86/include/asm/desc.h
+++ tip/arch/x86/include/asm/desc.h
@@ -178,7 +178,7 @@ static inline void set_tssldt_descriptor
 #endif
 }
 
-static inline void __set_tss_desc(unsigned cpu, unsigned int entry, void *addr)
+static inline void __set_tss_desc(unsigned cpu, unsigned int entry, struct x86_hw_tss *addr)
 {
 	struct desc_struct *d = get_cpu_gdt_rw(cpu);
 	tss_desc tss;
Index: tip/arch/x86/include/asm/processor.h
===================================================================
--- tip.orig/arch/x86/include/asm/processor.h
+++ tip/arch/x86/include/asm/processor.h
@@ -163,7 +163,7 @@ enum cpuid_regs_idx {
 extern struct cpuinfo_x86	boot_cpu_data;
 extern struct cpuinfo_x86	new_cpu_data;
 
-extern struct tss_struct	doublefault_tss;
+extern struct x86_hw_tss	doublefault_tss;
 extern __u32			cpu_caps_cleared[NCAPINTS];
 extern __u32			cpu_caps_set[NCAPINTS];
 
@@ -253,6 +253,11 @@ static inline void load_cr3(pgd_t *pgdir
 	write_cr3(__sme_pa(pgdir));
 }
 
+/*
+ * Note that while the legacy 'TSS' name comes from 'Task State Segment',
+ * on modern x86 CPUs the TSS also holds information important to 64-bit mode,
+ * unrelated to the task-switch mechanism:
+ */
 #ifdef CONFIG_X86_32
 /* This is the TSS defined by the hardware. */
 struct x86_hw_tss {
@@ -323,7 +328,7 @@ struct x86_hw_tss {
 #define IO_BITMAP_BITS			65536
 #define IO_BITMAP_BYTES			(IO_BITMAP_BITS/8)
 #define IO_BITMAP_LONGS			(IO_BITMAP_BYTES/sizeof(long))
-#define IO_BITMAP_OFFSET		offsetof(struct tss_struct, io_bitmap)
+#define IO_BITMAP_OFFSET		(offsetof(struct tss_struct, io_bitmap) - offsetof(struct tss_struct, x86_tss))
 #define INVALID_IO_BITMAP_OFFSET	0x8000
 
 struct tss_struct {
Index: tip/arch/x86/kernel/cpu/common.c
===================================================================
--- tip.orig/arch/x86/kernel/cpu/common.c
+++ tip/arch/x86/kernel/cpu/common.c
@@ -1582,7 +1582,7 @@ void cpu_init(void)
 		}
 	}
 
-	t->x86_tss.io_bitmap_base = offsetof(struct tss_struct, io_bitmap);
+	t->x86_tss.io_bitmap_base = IO_BITMAP_OFFSET;
 
 	/*
 	 * <= is required because the CPU will access up to
@@ -1601,7 +1601,7 @@ void cpu_init(void)
 	 * Initialize the TSS.  Don't bother initializing sp0, as the initial
 	 * task never enters user mode.
 	 */
-	set_tss_desc(cpu, t);
+	set_tss_desc(cpu, &t->x86_tss);
 	load_TR_desc();
 
 	load_mm_ldt(&init_mm);
@@ -1659,12 +1659,12 @@ void cpu_init(void)
 	 * Initialize the TSS.  Don't bother initializing sp0, as the initial
 	 * task never enters user mode.
 	 */
-	set_tss_desc(cpu, t);
+	set_tss_desc(cpu, &t->x86_tss);
 	load_TR_desc();
 
 	load_mm_ldt(&init_mm);
 
-	t->x86_tss.io_bitmap_base = offsetof(struct tss_struct, io_bitmap);
+	t->x86_tss.io_bitmap_base = IO_BITMAP_OFFSET;
 
 #ifdef CONFIG_DOUBLEFAULT
 	/* Set up doublefault TSS pointer in the GDT */
Index: tip/arch/x86/kernel/doublefault.c
===================================================================
--- tip.orig/arch/x86/kernel/doublefault.c
+++ tip/arch/x86/kernel/doublefault.c
@@ -50,25 +50,23 @@ static void doublefault_fn(void)
 		cpu_relax();
 }
 
-struct tss_struct doublefault_tss __cacheline_aligned = {
-	.x86_tss = {
-		.sp0		= STACK_START,
-		.ss0		= __KERNEL_DS,
-		.ldt		= 0,
-		.io_bitmap_base	= INVALID_IO_BITMAP_OFFSET,
+struct x86_hw_tss doublefault_tss __cacheline_aligned = {
+	.sp0		= STACK_START,
+	.ss0		= __KERNEL_DS,
+	.ldt		= 0,
+	.io_bitmap_base	= INVALID_IO_BITMAP_OFFSET,
 
-		.ip		= (unsigned long) doublefault_fn,
-		/* 0x2 bit is always set */
-		.flags		= X86_EFLAGS_SF | 0x2,
-		.sp		= STACK_START,
-		.es		= __USER_DS,
-		.cs		= __KERNEL_CS,
-		.ss		= __KERNEL_DS,
-		.ds		= __USER_DS,
-		.fs		= __KERNEL_PERCPU,
+	.ip		= (unsigned long) doublefault_fn,
+	/* 0x2 bit is always set */
+	.flags		= X86_EFLAGS_SF | 0x2,
+	.sp		= STACK_START,
+	.es		= __USER_DS,
+	.cs		= __KERNEL_CS,
+	.ss		= __KERNEL_DS,
+	.ds		= __USER_DS,
+	.fs		= __KERNEL_PERCPU,
 
-		.__cr3		= __pa_nodebug(swapper_pg_dir),
-	}
+	.__cr3		= __pa_nodebug(swapper_pg_dir),
 };
 
 /* dummy for do_double_fault() call */
Index: tip/arch/x86/power/cpu.c
===================================================================
--- tip.orig/arch/x86/power/cpu.c
+++ tip/arch/x86/power/cpu.c
@@ -165,12 +165,13 @@ static void fix_processor_context(void)
 	struct desc_struct *desc = get_cpu_gdt_rw(cpu);
 	tss_desc tss;
 #endif
-	set_tss_desc(cpu, t);	/*
-				 * This just modifies memory; should not be
-				 * necessary. But... This is necessary, because
-				 * 386 hardware has concept of busy TSS or some
-				 * similar stupidity.
-				 */
+
+	/*
+	 * This just modifies memory; should not be necessary. But... This is
+	 * necessary, because 386 hardware has concept of busy TSS or some
+	 * similar stupidity.
+	 */
+	set_tss_desc(cpu, &t->x86_tss);
 
 #ifdef CONFIG_X86_64
 	memcpy(&tss, &desc[GDT_ENTRY_TSS], sizeof(tss_desc));


* Re: [PATCH v2] x86/entry: Fix assumptions that the HW TSS is at the beginning of cpu_tss
  2017-11-26 13:48     ` [PATCH v2] x86/entry: " Ingo Molnar
@ 2017-11-26 15:41       ` Andy Lutomirski
  2017-11-26 15:58         ` Ingo Molnar
  0 siblings, 1 reply; 32+ messages in thread
From: Andy Lutomirski @ 2017-11-26 15:41 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Dave Hansen, Andy Lutomirski, X86 ML, Borislav Petkov,
	linux-kernel, Brian Gerst, Linus Torvalds

On Sun, Nov 26, 2017 at 5:48 AM, Ingo Molnar <mingo@kernel.org> wrote:
>
> * Dave Hansen <dave.hansen@intel.com> wrote:
>
>> On 11/10/2017 08:05 PM, Andy Lutomirski wrote:
>> > -struct tss_struct doublefault_tss __cacheline_aligned = {
>> > -   .x86_tss = {
>> > -           .sp0            = STACK_START,
>> > -           .ss0            = __KERNEL_DS,
>> > -           .ldt            = 0,
>> ...
>> > +struct x86_hw_tss doublefault_tss __cacheline_aligned = {
>> > +   .sp0            = STACK_START,
>> > +   .ss0            = __KERNEL_DS,
>> > +   .ldt            = 0,
>> > +   .io_bitmap_base = INVALID_IO_BITMAP_OFFSET,
>>
>> FWIW, I really like the trend of renaming the hardware structures in
>> such a way that it's clear that they *are* hardware structures.
>>
>> It might also be nice to reference the relevant SDM sections on the
>> topic, or even to include a comment along the lines of how it gets used.
>> This chunk from the SDM is particularly relevant:
>>
>> "The TSS holds information important to 64-bit mode and that is not
>> directly related to the task-switch mechanism."
>
> That makes sense - I've updated this patch with the following description added to
> struct x86_hw_tss:

I've folded this in along with all the reviews so far, and a few misc
fixes from Boris' review.  I was planning to resend the whole series
today after I track down the kbuild error.  Does that sound good?


* Re: [PATCH v2] x86/entry: Fix assumptions that the HW TSS is at the beginning of cpu_tss
  2017-11-26 15:41       ` Andy Lutomirski
@ 2017-11-26 15:58         ` Ingo Molnar
  2017-11-26 16:00           ` Ingo Molnar
  0 siblings, 1 reply; 32+ messages in thread
From: Ingo Molnar @ 2017-11-26 15:58 UTC (permalink / raw)
  To: Andy Lutomirski
  Cc: Dave Hansen, X86 ML, Borislav Petkov, linux-kernel, Brian Gerst,
	Linus Torvalds


* Andy Lutomirski <luto@kernel.org> wrote:

> On Sun, Nov 26, 2017 at 5:48 AM, Ingo Molnar <mingo@kernel.org> wrote:
> >
> > * Dave Hansen <dave.hansen@intel.com> wrote:
> >
> >> On 11/10/2017 08:05 PM, Andy Lutomirski wrote:
> >> > -struct tss_struct doublefault_tss __cacheline_aligned = {
> >> > -   .x86_tss = {
> >> > -           .sp0            = STACK_START,
> >> > -           .ss0            = __KERNEL_DS,
> >> > -           .ldt            = 0,
> >> ...
> >> > +struct x86_hw_tss doublefault_tss __cacheline_aligned = {
> >> > +   .sp0            = STACK_START,
> >> > +   .ss0            = __KERNEL_DS,
> >> > +   .ldt            = 0,
> >> > +   .io_bitmap_base = INVALID_IO_BITMAP_OFFSET,
> >>
> >> FWIW, I really like the trend of renaming the hardware structures in
> >> such a way that it's clear that they *are* hardware structures.
> >>
> >> It might also be nice to reference the relevant SDM sections on the
> >> topic, or even to include a comment along the lines of how it gets used.
> >> This chunk from the SDM is particularly relevant:
> >>
> >> "The TSS holds information important to 64-bit mode and that is not
> >> directly related to the task-switch mechanism."
> >
> > That makes sense - I've updated this patch with the following description added to
> > struct x86_hw_tss:
> 
> I've folded this in along with all the reviews so far, and a few misc
> fixes from Boris' review.  I was planning to resend the whole series
> today after I track down the kbuild error.  Does that sound good?

Could you please do a delta to the very latest WIP.x86/mm instead?

In the latest I have included the review tags already, and all the easy-to-do 
review feedback as well, so the delta should be reasonably small.

These entry bits are destined for x86/urgent real soon, so Thomas and I are
trying to pin the tree down and do delta changes only.

If it's a simple full interdiff between your latest and WIP.x86/mm, that's fine as
well; I can backmerge everything accordingly.

Thanks,

	Ingo


* Re: [PATCH v2] x86/entry: Fix assumptions that the HW TSS is at the beginning of cpu_tss
  2017-11-26 15:58         ` Ingo Molnar
@ 2017-11-26 16:00           ` Ingo Molnar
  2017-11-26 16:05             ` Andy Lutomirski
  0 siblings, 1 reply; 32+ messages in thread
From: Ingo Molnar @ 2017-11-26 16:00 UTC (permalink / raw)
  To: Andy Lutomirski
  Cc: Dave Hansen, X86 ML, Borislav Petkov, linux-kernel, Brian Gerst,
	Linus Torvalds


* Ingo Molnar <mingo@kernel.org> wrote:

> Could you please do a delta to the very latest WIP.x86/mm instead?
> 
> In the latest I have included the review tags already, and all the easy-to-do 
> review feedback as well, so the delta should be reasonably small.
> 
> These entry bits are destined for x86/urgent real soon, so Thomas and I are
> trying to pin the tree down and do delta changes only.
> 
> If it's a simple full interdiff between your latest and WIP.x86/mm, that's fine as
> well; I can backmerge everything accordingly.

Also feel free to include the two unwinder enhancements from you and Josh in the 
delta series (as two separate patches) - WIP.x86/mm won't change more today.

Thanks,

	Ingo


* Re: [PATCH v2] x86/entry: Fix assumptions that the HW TSS is at the beginning of cpu_tss
  2017-11-26 16:00           ` Ingo Molnar
@ 2017-11-26 16:05             ` Andy Lutomirski
  2017-11-26 16:43               ` Ingo Molnar
  0 siblings, 1 reply; 32+ messages in thread
From: Andy Lutomirski @ 2017-11-26 16:05 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Andy Lutomirski, Dave Hansen, X86 ML, Borislav Petkov,
	linux-kernel, Brian Gerst, Linus Torvalds

On Sun, Nov 26, 2017 at 8:00 AM, Ingo Molnar <mingo@kernel.org> wrote:
>
> * Ingo Molnar <mingo@kernel.org> wrote:
>
>> Could you please do a delta to the very latest WIP.x86/mm instead?
>>
>> In the latest I have included the review tags already, and all the easy-to-do
>> review feedback as well, so the delta should be reasonably small.
>>
>> These entry bits are destined for x86/urgent real soon, so Thomas and I are
>> trying to pin the tree down and do delta changes only.
>>
>> If it's a simple full interdiff between your latest and WIP.x86/mm, that's fine as
>> well; I can backmerge everything accordingly.
>
> Also feel free to include the two unwinder enhancements from you and Josh in the
> delta series (as two separate patches) - WIP.x86/mm won't change more today.

I can do that.  Or if you want to get testing going, you can just pick
them up -- I'm not planning on touching either one of them.


* Re: [PATCH v2] x86/entry: Fix assumptions that the HW TSS is at the beginning of cpu_tss
  2017-11-26 16:05             ` Andy Lutomirski
@ 2017-11-26 16:43               ` Ingo Molnar
  0 siblings, 0 replies; 32+ messages in thread
From: Ingo Molnar @ 2017-11-26 16:43 UTC (permalink / raw)
  To: Andy Lutomirski
  Cc: Dave Hansen, X86 ML, Borislav Petkov, linux-kernel, Brian Gerst,
	Linus Torvalds


* Andy Lutomirski <luto@kernel.org> wrote:

> On Sun, Nov 26, 2017 at 8:00 AM, Ingo Molnar <mingo@kernel.org> wrote:
> >
> > * Ingo Molnar <mingo@kernel.org> wrote:
> >
> >> Could you please do a delta to the very latest WIP.x86/mm instead?
> >>
> >> In the latest I have included the review tags already, and all the easy-to-do
> >> review feedback as well, so the delta should be reasonably small.
> >>
> >> These entry bits are destined for x86/urgent real soon, so Thomas and I are
> >> trying to pin the tree down and do delta changes only.
> >>
> >> If it's a simple full interdiff between your latest and WIP.x86/mm, that's fine as
> >> well; I can backmerge everything accordingly.
> >
> > Also feel free to include the two unwinder enhancements from you and Josh in the
> > delta series (as two separate patches) - WIP.x86/mm won't change more today.
> 
> I can do that.  Or if you want to get testing going, you can just pick
> them up -- I'm not planning on touching either one of them.

I was hoping for proper changelogs and SOBs for both of them ;-)

Thanks,

	Ingo


end of thread [~2017-11-26 16:43 UTC]

Thread overview: 32+ messages
2017-11-11  4:05 [RFC 0/7] Prep code for better stack switching Andy Lutomirski
2017-11-11  4:05 ` [RFC 1/7] x86/asm/64: Allocate and enable the SYSENTER stack Andy Lutomirski
2017-11-13 19:07   ` Dave Hansen
2017-11-14  2:17     ` Andy Lutomirski
2017-11-14  7:15       ` Ingo Molnar
2017-11-11  4:05 ` [RFC 2/7] x86/gdt: Put per-cpu GDT remaps in ascending order Andy Lutomirski
2017-11-11  4:05 ` [RFC 3/7] x86/fixmap: Generalize the GDT fixmap mechanism Andy Lutomirski
2017-11-11  4:05 ` [RFC 4/7] x86/asm: Fix assumptions that the HW TSS is at the beginning of cpu_tss Andy Lutomirski
2017-11-13 17:01   ` Dave Hansen
2017-11-26 13:48     ` [PATCH v2] x86/entry: " Ingo Molnar
2017-11-26 15:41       ` Andy Lutomirski
2017-11-26 15:58         ` Ingo Molnar
2017-11-26 16:00           ` Ingo Molnar
2017-11-26 16:05             ` Andy Lutomirski
2017-11-26 16:43               ` Ingo Molnar
2017-11-11  4:05 ` [RFC 5/7] x86/asm: Rearrange struct cpu_tss to enlarge SYSENTER_stack and fix alignment Andy Lutomirski
2017-11-11  4:11   ` Andy Lutomirski
2017-11-13 19:19   ` Dave Hansen
2017-11-11  4:05 ` [RFC 6/7] x86/asm: Remap the TSS into the cpu entry area Andy Lutomirski
2017-11-13 19:22   ` Dave Hansen
2017-11-13 19:36     ` Linus Torvalds
2017-11-14  2:25       ` Andy Lutomirski
2017-11-14  2:28         ` Linus Torvalds
2017-11-14  2:30           ` Andy Lutomirski
2017-11-14  2:27     ` Andy Lutomirski
2017-11-11  4:05 ` [RFC 7/7] x86/unwind/64: Add support for the SYSENTER stack Andy Lutomirski
2017-11-13 22:46   ` Josh Poimboeuf
2017-11-14  2:13     ` Andy Lutomirski
2017-11-11 10:58 ` [RFC 0/7] Prep code for better stack switching Borislav Petkov
2017-11-12  2:59   ` Andy Lutomirski
2017-11-12  4:25     ` Andy Lutomirski
2017-11-13  4:37       ` Andy Lutomirski
