linux-kernel.vger.kernel.org archive mirror
* [PATCH v2 0/2] Clean up x86_32 stackprotector
@ 2021-02-13 19:19 Andy Lutomirski
  2021-02-13 19:19 ` [PATCH v2 1/2] x86/stackprotector/32: Make the canary into a regular percpu variable Andy Lutomirski
                   ` (4 more replies)
  0 siblings, 5 replies; 24+ messages in thread
From: Andy Lutomirski @ 2021-02-13 19:19 UTC (permalink / raw)
  To: x86
  Cc: LKML, Sedat Dilek, Nick Desaulniers, Sean Christopherson,
	Brian Gerst, Joerg Roedel, Andy Lutomirski

x86_32 stackprotector is a maintenance nightmare.  Clean it up.  This
disables stackprotector on x86_32 with GCC versions older than 8.1 and
with all clang versions.  Some clang people are cc'd.

Changes from v1:
 - Changelog fixes.
 - Comment fixes (mostly from Sean).
 - Fix the !SMP case.

Andy Lutomirski (2):
  x86/stackprotector/32: Make the canary into a regular percpu variable
  x86/entry/32: Remove leftover macros after stackprotector cleanups

 arch/x86/Kconfig                          |  7 +-
 arch/x86/Makefile                         |  8 ++
 arch/x86/entry/entry_32.S                 | 95 +----------------------
 arch/x86/include/asm/processor.h          | 15 +---
 arch/x86/include/asm/ptrace.h             |  5 +-
 arch/x86/include/asm/segment.h            | 30 ++-----
 arch/x86/include/asm/stackprotector.h     | 79 ++++---------------
 arch/x86/include/asm/suspend_32.h         |  6 +-
 arch/x86/kernel/asm-offsets_32.c          |  5 --
 arch/x86/kernel/cpu/common.c              |  5 +-
 arch/x86/kernel/doublefault_32.c          |  4 +-
 arch/x86/kernel/head_32.S                 | 18 +----
 arch/x86/kernel/setup_percpu.c            |  1 -
 arch/x86/kernel/tls.c                     |  8 +-
 arch/x86/kvm/svm/svm.c                    | 10 +--
 arch/x86/lib/insn-eval.c                  |  4 -
 arch/x86/platform/pvh/head.S              | 14 ----
 arch/x86/power/cpu.c                      |  6 +-
 arch/x86/xen/enlighten_pv.c               |  1 -
 scripts/gcc-x86_32-has-stack-protector.sh |  6 +-
 20 files changed, 62 insertions(+), 265 deletions(-)

-- 
2.29.2


^ permalink raw reply	[flat|nested] 24+ messages in thread

* [PATCH v2 1/2] x86/stackprotector/32: Make the canary into a regular percpu variable
  2021-02-13 19:19 [PATCH v2 0/2] Clean up x86_32 stackprotector Andy Lutomirski
@ 2021-02-13 19:19 ` Andy Lutomirski
  2021-02-13 19:49   ` Sedat Dilek
                     ` (4 more replies)
  2021-02-13 19:19 ` [PATCH v2 2/2] x86/entry/32: Remove leftover macros after stackprotector cleanups Andy Lutomirski
                   ` (3 subsequent siblings)
  4 siblings, 5 replies; 24+ messages in thread
From: Andy Lutomirski @ 2021-02-13 19:19 UTC (permalink / raw)
  To: x86
  Cc: LKML, Sedat Dilek, Nick Desaulniers, Sean Christopherson,
	Brian Gerst, Joerg Roedel, Andy Lutomirski

On 32-bit kernels, the stackprotector canary is quite nasty -- it is
stored at %gs:(20), which is nasty because 32-bit kernels use %fs for
percpu storage.  It's even nastier because it means that whether %gs
contains userspace state or kernel state while running kernel code
depends on whether stackprotector is enabled (this is
CONFIG_X86_32_LAZY_GS), and this setting radically changes the way
that segment selectors work.  Supporting both variants is a
maintenance and testing mess.

Merely rearranging so that percpu and the stack canary
share the same segment would be messy as the 32-bit percpu address
layout isn't currently compatible with putting a variable at a fixed
offset.

Fortunately, GCC 8.1 added options that allow the stack canary to be
accessed as %fs:__stack_chk_guard, effectively turning it into an ordinary
percpu variable.  This lets us get rid of all of the code to manage the
stack canary GDT descriptor and the CONFIG_X86_32_LAZY_GS mess.

(That name is special.  We could use any symbol we want for the
 %fs-relative mode, but for CONFIG_SMP=n, gcc refuses to let us use any
 name other than __stack_chk_guard.)

This patch forcibly disables stackprotector on older compilers that
don't support the new options and makes the stack canary into a
percpu variable.  The "lazy GS" approach is now used for all 32-bit
configurations.

This patch also makes load_gs_index() work on 32-bit kernels.  On
64-bit kernels, it loads the GS selector and updates the user
GSBASE accordingly.  (This is unchanged.)  On 32-bit kernels,
it loads the GS selector and updates GSBASE, which is now
always the user base.  This means that the overall effect is
the same on 32-bit and 64-bit, which avoids some ifdeffery.
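
For reference, the difference is visible in what the compiler emits for a
protected function.  The following is an illustrative sketch of typical GCC
output (exact registers and frame offsets vary; this is not taken from the
patch itself):

```asm
# Old scheme: the canary must sit at a fixed offset from the segment
# base, so 32-bit kernels needed a dedicated GDT segment in %gs:
	movl	%gs:20, %eax		# prologue: load the canary
	movl	%eax, -8(%ebp)		# stash a copy in the stack frame
	# ... function body ...
	movl	-8(%ebp), %eax		# epilogue: reload the copy
	xorl	%gs:20, %eax		# compare with the reference value
	jne	.Lstack_chk_fail	# mismatch -> call __stack_chk_fail

# New scheme (-mstack-protector-guard-reg=fs
#             -mstack-protector-guard-symbol=__stack_chk_guard, GCC 8.1+):
# the canary is an ordinary %fs-relative percpu symbol:
	movl	%fs:__stack_chk_guard, %eax
	movl	%eax, -8(%ebp)
	# ... function body ...
	movl	-8(%ebp), %eax
	xorl	%fs:__stack_chk_guard, %eax
	jne	.Lstack_chk_fail
```

Since %fs is already the kernel's 32-bit percpu segment, no stack-canary
GDT descriptor or %gs reload is needed at all.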

Cc: Sedat Dilek <sedat.dilek@gmail.com>
Cc: Nick Desaulniers <ndesaulniers@google.com>
Signed-off-by: Andy Lutomirski <luto@kernel.org>
---
 arch/x86/Kconfig                          |  7 +-
 arch/x86/Makefile                         |  8 +++
 arch/x86/entry/entry_32.S                 | 56 ++--------------
 arch/x86/include/asm/processor.h          | 15 ++---
 arch/x86/include/asm/ptrace.h             |  5 +-
 arch/x86/include/asm/segment.h            | 30 +++------
 arch/x86/include/asm/stackprotector.h     | 79 +++++------------------
 arch/x86/include/asm/suspend_32.h         |  6 +-
 arch/x86/kernel/asm-offsets_32.c          |  5 --
 arch/x86/kernel/cpu/common.c              |  5 +-
 arch/x86/kernel/doublefault_32.c          |  4 +-
 arch/x86/kernel/head_32.S                 | 18 +-----
 arch/x86/kernel/setup_percpu.c            |  1 -
 arch/x86/kernel/tls.c                     |  8 +--
 arch/x86/kvm/svm/svm.c                    | 10 +--
 arch/x86/lib/insn-eval.c                  |  4 --
 arch/x86/platform/pvh/head.S              | 14 ----
 arch/x86/power/cpu.c                      |  6 +-
 arch/x86/xen/enlighten_pv.c               |  1 -
 scripts/gcc-x86_32-has-stack-protector.sh |  6 +-
 20 files changed, 62 insertions(+), 226 deletions(-)

diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
index 21f851179ff0..12d8bf011d08 100644
--- a/arch/x86/Kconfig
+++ b/arch/x86/Kconfig
@@ -353,10 +353,6 @@ config X86_64_SMP
 	def_bool y
 	depends on X86_64 && SMP
 
-config X86_32_LAZY_GS
-	def_bool y
-	depends on X86_32 && !STACKPROTECTOR
-
 config ARCH_SUPPORTS_UPROBES
 	def_bool y
 
@@ -379,7 +375,8 @@ config CC_HAS_SANE_STACKPROTECTOR
 	default $(success,$(srctree)/scripts/gcc-x86_32-has-stack-protector.sh $(CC))
 	help
 	   We have to make sure stack protector is unconditionally disabled if
-	   the compiler produces broken code.
+	   the compiler produces broken code or if it does not let us control
+	   the segment on 32-bit kernels.
 
 menu "Processor type and features"
 
diff --git a/arch/x86/Makefile b/arch/x86/Makefile
index 7116da3980be..0b5cd8c49ccb 100644
--- a/arch/x86/Makefile
+++ b/arch/x86/Makefile
@@ -76,6 +76,14 @@ ifeq ($(CONFIG_X86_32),y)
 
         # temporary until string.h is fixed
         KBUILD_CFLAGS += -ffreestanding
+
+	ifeq ($(CONFIG_STACKPROTECTOR),y)
+		ifeq ($(CONFIG_SMP),y)
+			KBUILD_CFLAGS += -mstack-protector-guard-reg=fs -mstack-protector-guard-symbol=__stack_chk_guard
+		else
+			KBUILD_CFLAGS += -mstack-protector-guard=global
+		endif
+	endif
 else
         BITS := 64
         UTS_MACHINE := x86_64
diff --git a/arch/x86/entry/entry_32.S b/arch/x86/entry/entry_32.S
index df8c017e6161..eb0cb662bca5 100644
--- a/arch/x86/entry/entry_32.S
+++ b/arch/x86/entry/entry_32.S
@@ -20,7 +20,7 @@
  *	1C(%esp) - %ds
  *	20(%esp) - %es
  *	24(%esp) - %fs
- *	28(%esp) - %gs		saved iff !CONFIG_X86_32_LAZY_GS
+ *	28(%esp) - unused -- was %gs on old stackprotector kernels
  *	2C(%esp) - orig_eax
  *	30(%esp) - %eip
  *	34(%esp) - %cs
@@ -56,14 +56,9 @@
 /*
  * User gs save/restore
  *
- * %gs is used for userland TLS and kernel only uses it for stack
- * canary which is required to be at %gs:20 by gcc.  Read the comment
- * at the top of stackprotector.h for more info.
- *
- * Local labels 98 and 99 are used.
+ * This is leftover junk from CONFIG_X86_32_LAZY_GS.  A subsequent patch
+ * will remove it entirely.
  */
-#ifdef CONFIG_X86_32_LAZY_GS
-
  /* unfortunately push/pop can't be no-op */
 .macro PUSH_GS
 	pushl	$0
@@ -86,49 +81,6 @@
 .macro SET_KERNEL_GS reg
 .endm
 
-#else	/* CONFIG_X86_32_LAZY_GS */
-
-.macro PUSH_GS
-	pushl	%gs
-.endm
-
-.macro POP_GS pop=0
-98:	popl	%gs
-  .if \pop <> 0
-	add	$\pop, %esp
-  .endif
-.endm
-.macro POP_GS_EX
-.pushsection .fixup, "ax"
-99:	movl	$0, (%esp)
-	jmp	98b
-.popsection
-	_ASM_EXTABLE(98b, 99b)
-.endm
-
-.macro PTGS_TO_GS
-98:	mov	PT_GS(%esp), %gs
-.endm
-.macro PTGS_TO_GS_EX
-.pushsection .fixup, "ax"
-99:	movl	$0, PT_GS(%esp)
-	jmp	98b
-.popsection
-	_ASM_EXTABLE(98b, 99b)
-.endm
-
-.macro GS_TO_REG reg
-	movl	%gs, \reg
-.endm
-.macro REG_TO_PTGS reg
-	movl	\reg, PT_GS(%esp)
-.endm
-.macro SET_KERNEL_GS reg
-	movl	$(__KERNEL_STACK_CANARY), \reg
-	movl	\reg, %gs
-.endm
-
-#endif /* CONFIG_X86_32_LAZY_GS */
 
 /* Unconditionally switch to user cr3 */
 .macro SWITCH_TO_USER_CR3 scratch_reg:req
@@ -779,7 +731,7 @@ SYM_CODE_START(__switch_to_asm)
 
 #ifdef CONFIG_STACKPROTECTOR
 	movl	TASK_stack_canary(%edx), %ebx
-	movl	%ebx, PER_CPU_VAR(stack_canary)+stack_canary_offset
+	movl	%ebx, PER_CPU_VAR(__stack_chk_guard)
 #endif
 
 #ifdef CONFIG_RETPOLINE
diff --git a/arch/x86/include/asm/processor.h b/arch/x86/include/asm/processor.h
index c20a52b5534b..c59dff4bbc38 100644
--- a/arch/x86/include/asm/processor.h
+++ b/arch/x86/include/asm/processor.h
@@ -441,6 +441,9 @@ struct fixed_percpu_data {
 	 * GCC hardcodes the stack canary as %gs:40.  Since the
 	 * irq_stack is the object at %gs:0, we reserve the bottom
 	 * 48 bytes of the irq stack for the canary.
+	 *
+	 * Once we are willing to require -mstack-protector-guard-symbol=
+	 * support for x86_64 stackprotector, we can get rid of this.
 	 */
 	char		gs_base[40];
 	unsigned long	stack_canary;
@@ -461,17 +464,7 @@ extern asmlinkage void ignore_sysret(void);
 void current_save_fsgs(void);
 #else	/* X86_64 */
 #ifdef CONFIG_STACKPROTECTOR
-/*
- * Make sure stack canary segment base is cached-aligned:
- *   "For Intel Atom processors, avoid non zero segment base address
- *    that is not aligned to cache line boundary at all cost."
- * (Optim Ref Manual Assembly/Compiler Coding Rule 15.)
- */
-struct stack_canary {
-	char __pad[20];		/* canary at %gs:20 */
-	unsigned long canary;
-};
-DECLARE_PER_CPU_ALIGNED(struct stack_canary, stack_canary);
+DECLARE_PER_CPU(unsigned long, __stack_chk_guard);
 #endif
 /* Per CPU softirq stack pointer */
 DECLARE_PER_CPU(struct irq_stack *, softirq_stack_ptr);
diff --git a/arch/x86/include/asm/ptrace.h b/arch/x86/include/asm/ptrace.h
index d8324a236696..b2c4c12d237c 100644
--- a/arch/x86/include/asm/ptrace.h
+++ b/arch/x86/include/asm/ptrace.h
@@ -37,7 +37,10 @@ struct pt_regs {
 	unsigned short __esh;
 	unsigned short fs;
 	unsigned short __fsh;
-	/* On interrupt, gs and __gsh store the vector number. */
+	/*
+	 * On interrupt, gs and __gsh store the vector number.  They never
+	 * store gs any more.
+	 */
 	unsigned short gs;
 	unsigned short __gsh;
 	/* On interrupt, this is the error code. */
diff --git a/arch/x86/include/asm/segment.h b/arch/x86/include/asm/segment.h
index 7fdd4facfce7..72044026eb3c 100644
--- a/arch/x86/include/asm/segment.h
+++ b/arch/x86/include/asm/segment.h
@@ -95,7 +95,7 @@
  *
  *  26 - ESPFIX small SS
  *  27 - per-cpu			[ offset to per-cpu data area ]
- *  28 - stack_canary-20		[ for stack protector ]		<=== cacheline #8
+ *  28 - unused
  *  29 - unused
  *  30 - unused
  *  31 - TSS for double fault handler
@@ -118,7 +118,6 @@
 
 #define GDT_ENTRY_ESPFIX_SS		26
 #define GDT_ENTRY_PERCPU		27
-#define GDT_ENTRY_STACK_CANARY		28
 
 #define GDT_ENTRY_DOUBLEFAULT_TSS	31
 
@@ -158,12 +157,6 @@
 # define __KERNEL_PERCPU		0
 #endif
 
-#ifdef CONFIG_STACKPROTECTOR
-# define __KERNEL_STACK_CANARY		(GDT_ENTRY_STACK_CANARY*8)
-#else
-# define __KERNEL_STACK_CANARY		0
-#endif
-
 #else /* 64-bit: */
 
 #include <asm/cache.h>
@@ -364,22 +357,15 @@ static inline void __loadsegment_fs(unsigned short value)
 	asm("mov %%" #seg ",%0":"=r" (value) : : "memory")
 
 /*
- * x86-32 user GS accessors:
+ * x86-32 user GS accessors.  This is ugly and could do with some cleaning up.
  */
 #ifdef CONFIG_X86_32
-# ifdef CONFIG_X86_32_LAZY_GS
-#  define get_user_gs(regs)		(u16)({ unsigned long v; savesegment(gs, v); v; })
-#  define set_user_gs(regs, v)		loadsegment(gs, (unsigned long)(v))
-#  define task_user_gs(tsk)		((tsk)->thread.gs)
-#  define lazy_save_gs(v)		savesegment(gs, (v))
-#  define lazy_load_gs(v)		loadsegment(gs, (v))
-# else	/* X86_32_LAZY_GS */
-#  define get_user_gs(regs)		(u16)((regs)->gs)
-#  define set_user_gs(regs, v)		do { (regs)->gs = (v); } while (0)
-#  define task_user_gs(tsk)		(task_pt_regs(tsk)->gs)
-#  define lazy_save_gs(v)		do { } while (0)
-#  define lazy_load_gs(v)		do { } while (0)
-# endif	/* X86_32_LAZY_GS */
+# define get_user_gs(regs)		(u16)({ unsigned long v; savesegment(gs, v); v; })
+# define set_user_gs(regs, v)		loadsegment(gs, (unsigned long)(v))
+# define task_user_gs(tsk)		((tsk)->thread.gs)
+# define lazy_save_gs(v)		savesegment(gs, (v))
+# define lazy_load_gs(v)		loadsegment(gs, (v))
+# define load_gs_index(v)		loadsegment(gs, (v))
 #endif	/* X86_32 */
 
 #endif /* !__ASSEMBLY__ */
diff --git a/arch/x86/include/asm/stackprotector.h b/arch/x86/include/asm/stackprotector.h
index 7fb482f0f25b..b6ffe58c70fa 100644
--- a/arch/x86/include/asm/stackprotector.h
+++ b/arch/x86/include/asm/stackprotector.h
@@ -5,30 +5,23 @@
  * Stack protector works by putting predefined pattern at the start of
  * the stack frame and verifying that it hasn't been overwritten when
  * returning from the function.  The pattern is called stack canary
- * and unfortunately gcc requires it to be at a fixed offset from %gs.
- * On x86_64, the offset is 40 bytes and on x86_32 20 bytes.  x86_64
- * and x86_32 use segment registers differently and thus handles this
- * requirement differently.
+ * and unfortunately gcc historically required it to be at a fixed offset
+ * from the percpu segment base.  On x86_64, the offset is 40 bytes.
  *
- * On x86_64, %gs is shared by percpu area and stack canary.  All
- * percpu symbols are zero based and %gs points to the base of percpu
- * area.  The first occupant of the percpu area is always
- * fixed_percpu_data which contains stack_canary at offset 40.  Userland
- * %gs is always saved and restored on kernel entry and exit using
- * swapgs, so stack protector doesn't add any complexity there.
+ * The same segment is shared by percpu area and stack canary.  On
+ * x86_64, percpu symbols are zero based and %gs (64-bit) points to the
+ * base of percpu area.  The first occupant of the percpu area is always
+ * fixed_percpu_data which contains stack_canary at the appropriate
+ * offset.  On x86_32, the stack canary is just a regular percpu
+ * variable.
  *
- * On x86_32, it's slightly more complicated.  As in x86_64, %gs is
- * used for userland TLS.  Unfortunately, some processors are much
- * slower at loading segment registers with different value when
- * entering and leaving the kernel, so the kernel uses %fs for percpu
- * area and manages %gs lazily so that %gs is switched only when
- * necessary, usually during task switch.
+ * Putting percpu data in %fs on 32-bit is a minor optimization compared to
+ * using %gs.  Since 32-bit userspace normally has %fs == 0, we are likely
+ * to load 0 into %fs on exit to usermode, whereas with percpu data in
+ * %gs, we are likely to load a non-null %gs on return to user mode.
  *
- * As gcc requires the stack canary at %gs:20, %gs can't be managed
- * lazily if stack protector is enabled, so the kernel saves and
- * restores userland %gs on kernel entry and exit.  This behavior is
- * controlled by CONFIG_X86_32_LAZY_GS and accessors are defined in
- * system.h to hide the details.
+ * Once we are willing to require GCC 8.1 or better for 64-bit stackprotector
+ * support, we can remove some of this complexity.
  */
 
 #ifndef _ASM_STACKPROTECTOR_H
@@ -44,14 +37,6 @@
 #include <linux/random.h>
 #include <linux/sched.h>
 
-/*
- * 24 byte read-only segment initializer for stack canary.  Linker
- * can't handle the address bit shifting.  Address will be set in
- * head_32 for boot CPU and setup_per_cpu_areas() for others.
- */
-#define GDT_STACK_CANARY_INIT						\
-	[GDT_ENTRY_STACK_CANARY] = GDT_ENTRY_INIT(0x4090, 0, 0x18),
-
 /*
  * Initialize the stackprotector canary value.
  *
@@ -86,7 +71,7 @@ static __always_inline void boot_init_stack_canary(void)
 #ifdef CONFIG_X86_64
 	this_cpu_write(fixed_percpu_data.stack_canary, canary);
 #else
-	this_cpu_write(stack_canary.canary, canary);
+	this_cpu_write(__stack_chk_guard, canary);
 #endif
 }
 
@@ -95,48 +80,16 @@ static inline void cpu_init_stack_canary(int cpu, struct task_struct *idle)
 #ifdef CONFIG_X86_64
 	per_cpu(fixed_percpu_data.stack_canary, cpu) = idle->stack_canary;
 #else
-	per_cpu(stack_canary.canary, cpu) = idle->stack_canary;
-#endif
-}
-
-static inline void setup_stack_canary_segment(int cpu)
-{
-#ifdef CONFIG_X86_32
-	unsigned long canary = (unsigned long)&per_cpu(stack_canary, cpu);
-	struct desc_struct *gdt_table = get_cpu_gdt_rw(cpu);
-	struct desc_struct desc;
-
-	desc = gdt_table[GDT_ENTRY_STACK_CANARY];
-	set_desc_base(&desc, canary);
-	write_gdt_entry(gdt_table, GDT_ENTRY_STACK_CANARY, &desc, DESCTYPE_S);
-#endif
-}
-
-static inline void load_stack_canary_segment(void)
-{
-#ifdef CONFIG_X86_32
-	asm("mov %0, %%gs" : : "r" (__KERNEL_STACK_CANARY) : "memory");
+	per_cpu(__stack_chk_guard, cpu) = idle->stack_canary;
 #endif
 }
 
 #else	/* STACKPROTECTOR */
 
-#define GDT_STACK_CANARY_INIT
-
 /* dummy boot_init_stack_canary() is defined in linux/stackprotector.h */
 
-static inline void setup_stack_canary_segment(int cpu)
-{ }
-
 static inline void cpu_init_stack_canary(int cpu, struct task_struct *idle)
 { }
 
-static inline void load_stack_canary_segment(void)
-{
-#ifdef CONFIG_X86_32
-	asm volatile ("mov %0, %%gs" : : "r" (0));
-#endif
-}
-
 #endif	/* STACKPROTECTOR */
 #endif	/* _ASM_STACKPROTECTOR_H */
diff --git a/arch/x86/include/asm/suspend_32.h b/arch/x86/include/asm/suspend_32.h
index fdbd9d7b7bca..7b132d0312eb 100644
--- a/arch/x86/include/asm/suspend_32.h
+++ b/arch/x86/include/asm/suspend_32.h
@@ -13,12 +13,10 @@
 /* image of the saved processor state */
 struct saved_context {
 	/*
-	 * On x86_32, all segment registers, with the possible exception of
-	 * gs, are saved at kernel entry in pt_regs.
+	 * On x86_32, all segment registers except gs are saved at kernel
+	 * entry in pt_regs.
 	 */
-#ifdef CONFIG_X86_32_LAZY_GS
 	u16 gs;
-#endif
 	unsigned long cr0, cr2, cr3, cr4;
 	u64 misc_enable;
 	bool misc_enable_saved;
diff --git a/arch/x86/kernel/asm-offsets_32.c b/arch/x86/kernel/asm-offsets_32.c
index 6e043f295a60..2b411cd00a4e 100644
--- a/arch/x86/kernel/asm-offsets_32.c
+++ b/arch/x86/kernel/asm-offsets_32.c
@@ -53,11 +53,6 @@ void foo(void)
 	       offsetof(struct cpu_entry_area, tss.x86_tss.sp1) -
 	       offsetofend(struct cpu_entry_area, entry_stack_page.stack));
 
-#ifdef CONFIG_STACKPROTECTOR
-	BLANK();
-	OFFSET(stack_canary_offset, stack_canary, canary);
-#endif
-
 	BLANK();
 	DEFINE(EFI_svam, offsetof(efi_runtime_services_t, set_virtual_address_map));
 }
diff --git a/arch/x86/kernel/cpu/common.c b/arch/x86/kernel/cpu/common.c
index 35ad8480c464..f208569d2d3b 100644
--- a/arch/x86/kernel/cpu/common.c
+++ b/arch/x86/kernel/cpu/common.c
@@ -161,7 +161,6 @@ DEFINE_PER_CPU_PAGE_ALIGNED(struct gdt_page, gdt_page) = { .gdt = {
 
 	[GDT_ENTRY_ESPFIX_SS]		= GDT_ENTRY_INIT(0xc092, 0, 0xfffff),
 	[GDT_ENTRY_PERCPU]		= GDT_ENTRY_INIT(0xc092, 0, 0xfffff),
-	GDT_STACK_CANARY_INIT
 #endif
 } };
 EXPORT_PER_CPU_SYMBOL_GPL(gdt_page);
@@ -599,7 +598,6 @@ void load_percpu_segment(int cpu)
 	__loadsegment_simple(gs, 0);
 	wrmsrl(MSR_GS_BASE, cpu_kernelmode_gs_base(cpu));
 #endif
-	load_stack_canary_segment();
 }
 
 #ifdef CONFIG_X86_32
@@ -1793,7 +1791,8 @@ DEFINE_PER_CPU(unsigned long, cpu_current_top_of_stack) =
 EXPORT_PER_CPU_SYMBOL(cpu_current_top_of_stack);
 
 #ifdef CONFIG_STACKPROTECTOR
-DEFINE_PER_CPU_ALIGNED(struct stack_canary, stack_canary);
+DEFINE_PER_CPU(unsigned long, __stack_chk_guard);
+EXPORT_PER_CPU_SYMBOL(__stack_chk_guard);
 #endif
 
 #endif	/* CONFIG_X86_64 */
diff --git a/arch/x86/kernel/doublefault_32.c b/arch/x86/kernel/doublefault_32.c
index 759d392cbe9f..d1d49e3d536b 100644
--- a/arch/x86/kernel/doublefault_32.c
+++ b/arch/x86/kernel/doublefault_32.c
@@ -100,9 +100,7 @@ DEFINE_PER_CPU_PAGE_ALIGNED(struct doublefault_stack, doublefault_stack) = {
 		.ss		= __KERNEL_DS,
 		.ds		= __USER_DS,
 		.fs		= __KERNEL_PERCPU,
-#ifndef CONFIG_X86_32_LAZY_GS
-		.gs		= __KERNEL_STACK_CANARY,
-#endif
+		.gs		= 0,
 
 		.__cr3		= __pa_nodebug(swapper_pg_dir),
 	},
diff --git a/arch/x86/kernel/head_32.S b/arch/x86/kernel/head_32.S
index 7ed84c282233..67f590425d90 100644
--- a/arch/x86/kernel/head_32.S
+++ b/arch/x86/kernel/head_32.S
@@ -318,8 +318,8 @@ SYM_FUNC_START(startup_32_smp)
 	movl $(__KERNEL_PERCPU), %eax
 	movl %eax,%fs			# set this cpu's percpu
 
-	movl $(__KERNEL_STACK_CANARY),%eax
-	movl %eax,%gs
+	xorl %eax,%eax
+	movl %eax,%gs			# clear possible garbage in %gs
 
 	xorl %eax,%eax			# Clear LDT
 	lldt %ax
@@ -339,20 +339,6 @@ SYM_FUNC_END(startup_32_smp)
  */
 __INIT
 setup_once:
-#ifdef CONFIG_STACKPROTECTOR
-	/*
-	 * Configure the stack canary. The linker can't handle this by
-	 * relocation.  Manually set base address in stack canary
-	 * segment descriptor.
-	 */
-	movl $gdt_page,%eax
-	movl $stack_canary,%ecx
-	movw %cx, 8 * GDT_ENTRY_STACK_CANARY + 2(%eax)
-	shrl $16, %ecx
-	movb %cl, 8 * GDT_ENTRY_STACK_CANARY + 4(%eax)
-	movb %ch, 8 * GDT_ENTRY_STACK_CANARY + 7(%eax)
-#endif
-
 	andl $0,setup_once_ref	/* Once is enough, thanks */
 	ret
 
diff --git a/arch/x86/kernel/setup_percpu.c b/arch/x86/kernel/setup_percpu.c
index fd945ce78554..0941d2f44f2a 100644
--- a/arch/x86/kernel/setup_percpu.c
+++ b/arch/x86/kernel/setup_percpu.c
@@ -224,7 +224,6 @@ void __init setup_per_cpu_areas(void)
 		per_cpu(this_cpu_off, cpu) = per_cpu_offset(cpu);
 		per_cpu(cpu_number, cpu) = cpu;
 		setup_percpu_segment(cpu);
-		setup_stack_canary_segment(cpu);
 		/*
 		 * Copy data used in early init routines from the
 		 * initial arrays to the per cpu data areas.  These
diff --git a/arch/x86/kernel/tls.c b/arch/x86/kernel/tls.c
index 64a496a0687f..3c883e064242 100644
--- a/arch/x86/kernel/tls.c
+++ b/arch/x86/kernel/tls.c
@@ -164,17 +164,11 @@ int do_set_thread_area(struct task_struct *p, int idx,
 		savesegment(fs, sel);
 		if (sel == modified_sel)
 			loadsegment(fs, sel);
-
-		savesegment(gs, sel);
-		if (sel == modified_sel)
-			load_gs_index(sel);
 #endif
 
-#ifdef CONFIG_X86_32_LAZY_GS
 		savesegment(gs, sel);
 		if (sel == modified_sel)
-			loadsegment(gs, sel);
-#endif
+			load_gs_index(sel);
 	} else {
 #ifdef CONFIG_X86_64
 		if (p->thread.fsindex == modified_sel)
diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
index f923e14e87df..ec39073b4897 100644
--- a/arch/x86/kvm/svm/svm.c
+++ b/arch/x86/kvm/svm/svm.c
@@ -1467,12 +1467,8 @@ static void svm_vcpu_put(struct kvm_vcpu *vcpu)
 #ifdef CONFIG_X86_64
 		loadsegment(fs, svm->host.fs);
 		wrmsrl(MSR_KERNEL_GS_BASE, current->thread.gsbase);
-		load_gs_index(svm->host.gs);
-#else
-#ifdef CONFIG_X86_32_LAZY_GS
-		loadsegment(gs, svm->host.gs);
-#endif
 #endif
+		load_gs_index(svm->host.gs);
 
 		for (i = 0; i < NR_HOST_SAVE_USER_MSRS; i++)
 			wrmsrl(host_save_user_msrs[i].index,
@@ -3705,13 +3701,11 @@ static noinstr void svm_vcpu_enter_exit(struct kvm_vcpu *vcpu,
 	} else {
 		__svm_vcpu_run(svm->vmcb_pa, (unsigned long *)&svm->vcpu.arch.regs);
 
+		/* Restore the percpu segment immediately. */
 #ifdef CONFIG_X86_64
 		native_wrmsrl(MSR_GS_BASE, svm->host.gs_base);
 #else
 		loadsegment(fs, svm->host.fs);
-#ifndef CONFIG_X86_32_LAZY_GS
-		loadsegment(gs, svm->host.gs);
-#endif
 #endif
 	}
 
diff --git a/arch/x86/lib/insn-eval.c b/arch/x86/lib/insn-eval.c
index 4229950a5d78..7f89a091f1fb 100644
--- a/arch/x86/lib/insn-eval.c
+++ b/arch/x86/lib/insn-eval.c
@@ -404,10 +404,6 @@ static short get_segment_selector(struct pt_regs *regs, int seg_reg_idx)
 	case INAT_SEG_REG_FS:
 		return (unsigned short)(regs->fs & 0xffff);
 	case INAT_SEG_REG_GS:
-		/*
-		 * GS may or may not be in regs as per CONFIG_X86_32_LAZY_GS.
-		 * The macro below takes care of both cases.
-		 */
 		return get_user_gs(regs);
 	case INAT_SEG_REG_IGNORE:
 	default:
diff --git a/arch/x86/platform/pvh/head.S b/arch/x86/platform/pvh/head.S
index 43b4d864817e..afbf0bb252da 100644
--- a/arch/x86/platform/pvh/head.S
+++ b/arch/x86/platform/pvh/head.S
@@ -45,10 +45,8 @@
 
 #define PVH_GDT_ENTRY_CS	1
 #define PVH_GDT_ENTRY_DS	2
-#define PVH_GDT_ENTRY_CANARY	3
 #define PVH_CS_SEL		(PVH_GDT_ENTRY_CS * 8)
 #define PVH_DS_SEL		(PVH_GDT_ENTRY_DS * 8)
-#define PVH_CANARY_SEL		(PVH_GDT_ENTRY_CANARY * 8)
 
 SYM_CODE_START_LOCAL(pvh_start_xen)
 	cld
@@ -109,17 +107,6 @@ SYM_CODE_START_LOCAL(pvh_start_xen)
 
 #else /* CONFIG_X86_64 */
 
-	/* Set base address in stack canary descriptor. */
-	movl $_pa(gdt_start),%eax
-	movl $_pa(canary),%ecx
-	movw %cx, (PVH_GDT_ENTRY_CANARY * 8) + 2(%eax)
-	shrl $16, %ecx
-	movb %cl, (PVH_GDT_ENTRY_CANARY * 8) + 4(%eax)
-	movb %ch, (PVH_GDT_ENTRY_CANARY * 8) + 7(%eax)
-
-	mov $PVH_CANARY_SEL,%eax
-	mov %eax,%gs
-
 	call mk_early_pgtbl_32
 
 	mov $_pa(initial_page_table), %eax
@@ -163,7 +150,6 @@ SYM_DATA_START_LOCAL(gdt_start)
 	.quad GDT_ENTRY(0xc09a, 0, 0xfffff) /* PVH_CS_SEL */
 #endif
 	.quad GDT_ENTRY(0xc092, 0, 0xfffff) /* PVH_DS_SEL */
-	.quad GDT_ENTRY(0x4090, 0, 0x18)    /* PVH_CANARY_SEL */
 SYM_DATA_END_LABEL(gdt_start, SYM_L_LOCAL, gdt_end)
 
 	.balign 16
diff --git a/arch/x86/power/cpu.c b/arch/x86/power/cpu.c
index db1378c6ff26..ef4329d67a5f 100644
--- a/arch/x86/power/cpu.c
+++ b/arch/x86/power/cpu.c
@@ -99,11 +99,8 @@ static void __save_processor_state(struct saved_context *ctxt)
 	/*
 	 * segment registers
 	 */
-#ifdef CONFIG_X86_32_LAZY_GS
 	savesegment(gs, ctxt->gs);
-#endif
 #ifdef CONFIG_X86_64
-	savesegment(gs, ctxt->gs);
 	savesegment(fs, ctxt->fs);
 	savesegment(ds, ctxt->ds);
 	savesegment(es, ctxt->es);
@@ -232,7 +229,6 @@ static void notrace __restore_processor_state(struct saved_context *ctxt)
 	wrmsrl(MSR_GS_BASE, ctxt->kernelmode_gs_base);
 #else
 	loadsegment(fs, __KERNEL_PERCPU);
-	loadsegment(gs, __KERNEL_STACK_CANARY);
 #endif
 
 	/* Restore the TSS, RO GDT, LDT, and usermode-relevant MSRs. */
@@ -255,7 +251,7 @@ static void notrace __restore_processor_state(struct saved_context *ctxt)
 	 */
 	wrmsrl(MSR_FS_BASE, ctxt->fs_base);
 	wrmsrl(MSR_KERNEL_GS_BASE, ctxt->usermode_gs_base);
-#elif defined(CONFIG_X86_32_LAZY_GS)
+#else
 	loadsegment(gs, ctxt->gs);
 #endif
 
diff --git a/arch/x86/xen/enlighten_pv.c b/arch/x86/xen/enlighten_pv.c
index 9a5a50cdaab5..e18235a6390d 100644
--- a/arch/x86/xen/enlighten_pv.c
+++ b/arch/x86/xen/enlighten_pv.c
@@ -1190,7 +1190,6 @@ static void __init xen_setup_gdt(int cpu)
 	pv_ops.cpu.write_gdt_entry = xen_write_gdt_entry_boot;
 	pv_ops.cpu.load_gdt = xen_load_gdt_boot;
 
-	setup_stack_canary_segment(cpu);
 	switch_to_new_gdt(cpu);
 
 	pv_ops.cpu.write_gdt_entry = xen_write_gdt_entry;
diff --git a/scripts/gcc-x86_32-has-stack-protector.sh b/scripts/gcc-x86_32-has-stack-protector.sh
index f5c119495254..825c75c5b715 100755
--- a/scripts/gcc-x86_32-has-stack-protector.sh
+++ b/scripts/gcc-x86_32-has-stack-protector.sh
@@ -1,4 +1,8 @@
 #!/bin/sh
 # SPDX-License-Identifier: GPL-2.0
 
-echo "int foo(void) { char X[200]; return 3; }" | $* -S -x c -c -m32 -O0 -fstack-protector - -o - 2> /dev/null | grep -q "%gs"
+# This requires GCC 8.1 or better.  Specifically, we require
+# -mstack-protector-guard-reg, added by
+# https://gcc.gnu.org/bugzilla/show_bug.cgi?id=81708
+
+echo "int foo(void) { char X[200]; return 3; }" | $* -S -x c -c -m32 -O0 -fstack-protector -mstack-protector-guard-reg=fs -mstack-protector-guard-symbol=__stack_chk_guard - -o - 2> /dev/null | grep -q "%fs"
-- 
2.29.2



* [PATCH v2 2/2] x86/entry/32: Remove leftover macros after stackprotector cleanups
  2021-02-13 19:19 [PATCH v2 0/2] Clean up x86_32 stackprotector Andy Lutomirski
  2021-02-13 19:19 ` [PATCH v2 1/2] x86/stackprotector/32: Make the canary into a regular percpu variable Andy Lutomirski
@ 2021-02-13 19:19 ` Andy Lutomirski
  2021-03-08 13:14   ` [tip: x86/core] " tip-bot2 for Andy Lutomirski
  2021-02-13 19:33 ` [PATCH v2 0/2] Clean up x86_32 stackprotector Sedat Dilek
                   ` (2 subsequent siblings)
  4 siblings, 1 reply; 24+ messages in thread
From: Andy Lutomirski @ 2021-02-13 19:19 UTC (permalink / raw)
  To: x86
  Cc: LKML, Sedat Dilek, Nick Desaulniers, Sean Christopherson,
	Brian Gerst, Joerg Roedel, Andy Lutomirski

Now that nonlazy-GS mode is gone, remove the macros from entry_32.S
that obfuscated^Wabstracted GS handling.  The assembled output is
identical before and after this patch.
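
Concretely, the remaining macros were either empty or expanded to a single
instruction, which is why writing out the expansions cannot change the
assembled bytes.  A sketch of the equivalence (illustrative, not taken from
the patch itself):

```asm
# PUSH_GS had become:
.macro PUSH_GS
	pushl	$0		# reserve the unused "gs" slot in pt_regs
.endm
# so every PUSH_GS site can simply say:
	pushl	$0

# POP_GS \pop had become:
.macro POP_GS pop=0
	addl	$(4 + \pop), %esp	# discard the slot (plus extra bytes)
.endm

# PTGS_TO_GS, POP_GS_EX, PTGS_TO_GS_EX, GS_TO_REG, REG_TO_PTGS and
# SET_KERNEL_GS were all empty, so their call sites assemble to nothing.
```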

Cc: Sedat Dilek <sedat.dilek@gmail.com>
Cc: Nick Desaulniers <ndesaulniers@google.com>
Signed-off-by: Andy Lutomirski <luto@kernel.org>
---
 arch/x86/entry/entry_32.S | 43 ++-------------------------------------
 1 file changed, 2 insertions(+), 41 deletions(-)

diff --git a/arch/x86/entry/entry_32.S b/arch/x86/entry/entry_32.S
index eb0cb662bca5..bee9101e211e 100644
--- a/arch/x86/entry/entry_32.S
+++ b/arch/x86/entry/entry_32.S
@@ -53,35 +53,6 @@
 
 #define PTI_SWITCH_MASK         (1 << PAGE_SHIFT)
 
-/*
- * User gs save/restore
- *
- * This is leftover junk from CONFIG_X86_32_LAZY_GS.  A subsequent patch
- * will remove it entirely.
- */
- /* unfortunately push/pop can't be no-op */
-.macro PUSH_GS
-	pushl	$0
-.endm
-.macro POP_GS pop=0
-	addl	$(4 + \pop), %esp
-.endm
-.macro POP_GS_EX
-.endm
-
- /* all the rest are no-op */
-.macro PTGS_TO_GS
-.endm
-.macro PTGS_TO_GS_EX
-.endm
-.macro GS_TO_REG reg
-.endm
-.macro REG_TO_PTGS reg
-.endm
-.macro SET_KERNEL_GS reg
-.endm
-
-
 /* Unconditionally switch to user cr3 */
 .macro SWITCH_TO_USER_CR3 scratch_reg:req
 	ALTERNATIVE "jmp .Lend_\@", "", X86_FEATURE_PTI
@@ -234,7 +205,7 @@
 .macro SAVE_ALL pt_regs_ax=%eax switch_stacks=0 skip_gs=0 unwind_espfix=0
 	cld
 .if \skip_gs == 0
-	PUSH_GS
+	pushl	$0
 .endif
 	pushl	%fs
 
@@ -259,9 +230,6 @@
 	movl	$(__USER_DS), %edx
 	movl	%edx, %ds
 	movl	%edx, %es
-.if \skip_gs == 0
-	SET_KERNEL_GS %edx
-.endif
 	/* Switch to kernel stack if necessary */
 .if \switch_stacks > 0
 	SWITCH_TO_KERNEL_STACK
@@ -300,7 +268,7 @@
 1:	popl	%ds
 2:	popl	%es
 3:	popl	%fs
-	POP_GS \pop
+	addl	$(4 + \pop), %esp	/* pop the unused "gs" slot */
 	IRET_FRAME
 .pushsection .fixup, "ax"
 4:	movl	$0, (%esp)
@@ -313,7 +281,6 @@
 	_ASM_EXTABLE(1b, 4b)
 	_ASM_EXTABLE(2b, 5b)
 	_ASM_EXTABLE(3b, 6b)
-	POP_GS_EX
 .endm
 
 .macro RESTORE_ALL_NMI cr3_reg:req pop=0
@@ -928,7 +895,6 @@ SYM_FUNC_START(entry_SYSENTER_32)
 	movl	PT_EIP(%esp), %edx	/* pt_regs->ip */
 	movl	PT_OLDESP(%esp), %ecx	/* pt_regs->sp */
 1:	mov	PT_FS(%esp), %fs
-	PTGS_TO_GS
 
 	popl	%ebx			/* pt_regs->bx */
 	addl	$2*4, %esp		/* skip pt_regs->cx and pt_regs->dx */
@@ -964,7 +930,6 @@ SYM_FUNC_START(entry_SYSENTER_32)
 	jmp	1b
 .popsection
 	_ASM_EXTABLE(1b, 2b)
-	PTGS_TO_GS_EX
 
 .Lsysenter_fix_flags:
 	pushl	$X86_EFLAGS_FIXED
@@ -1106,11 +1071,7 @@ SYM_CODE_START_LOCAL_NOALIGN(handle_exception)
 	SAVE_ALL switch_stacks=1 skip_gs=1 unwind_espfix=1
 	ENCODE_FRAME_POINTER
 
-	/* fixup %gs */
-	GS_TO_REG %ecx
 	movl	PT_GS(%esp), %edi		# get the function address
-	REG_TO_PTGS %ecx
-	SET_KERNEL_GS %ecx
 
 	/* fixup orig %eax */
 	movl	PT_ORIG_EAX(%esp), %edx		# get the error code
-- 
2.29.2



* Re: [PATCH v2 0/2] Clean up x86_32 stackprotector
  2021-02-13 19:19 [PATCH v2 0/2] Clean up x86_32 stackprotector Andy Lutomirski
  2021-02-13 19:19 ` [PATCH v2 1/2] x86/stackprotector/32: Make the canary into a regular percpu variable Andy Lutomirski
  2021-02-13 19:19 ` [PATCH v2 2/2] x86/entry/32: Remove leftover macros after stackprotector cleanups Andy Lutomirski
@ 2021-02-13 19:33 ` Sedat Dilek
  2021-02-14 14:42 ` Sedat Dilek
  2021-02-16 18:13 ` Nick Desaulniers
  4 siblings, 0 replies; 24+ messages in thread
From: Sedat Dilek @ 2021-02-13 19:33 UTC (permalink / raw)
  To: Andy Lutomirski
  Cc: x86, LKML, Nick Desaulniers, Sean Christopherson, Brian Gerst,
	Joerg Roedel

On Sat, Feb 13, 2021 at 8:19 PM Andy Lutomirski <luto@kernel.org> wrote:
>
> x86_32 stackprotector is a maintenance nightmare.  Clean it up.  This
> disables stackprotector on x86_32 on GCC versions before 8.1 and on
> all clang versions.  Some clang people are cc'd.
>
> Changes from v1:
>  - Changelog fixes.
>  - Comment fixes (mostly from Sean).
>  - Fix the !SMP case.
>
> Andy Lutomirski (2):
>   x86/stackprotector/32: Make the canary into a regular percpu variable
>   x86/entry/32: Remove leftover macros after stackprotector cleanups
>

Thanks Andy for bringing this up as v2.

AFAICS, LLVM bug #47479 was involved here (see [1]).
Maybe Nick can say some words.

After patching my LLVM toolchain, I decided to drop this and instead
integrated the (kernel-side) diff from Nick (see [1]) into my custom
patch series.

Is this your x86/unify_stack_canary Git branch (see [2])?

- Sedat -

[1] https://bugs.llvm.org/show_bug.cgi?id=47479
[2] https://git.kernel.org/pub/scm/linux/kernel/git/luto/linux.git/log/?h=x86/unify_stack_canary

>  arch/x86/Kconfig                          |  7 +-
>  arch/x86/Makefile                         |  8 ++
>  arch/x86/entry/entry_32.S                 | 95 +----------------------
>  arch/x86/include/asm/processor.h          | 15 +---
>  arch/x86/include/asm/ptrace.h             |  5 +-
>  arch/x86/include/asm/segment.h            | 30 ++-----
>  arch/x86/include/asm/stackprotector.h     | 79 ++++---------------
>  arch/x86/include/asm/suspend_32.h         |  6 +-
>  arch/x86/kernel/asm-offsets_32.c          |  5 --
>  arch/x86/kernel/cpu/common.c              |  5 +-
>  arch/x86/kernel/doublefault_32.c          |  4 +-
>  arch/x86/kernel/head_32.S                 | 18 +----
>  arch/x86/kernel/setup_percpu.c            |  1 -
>  arch/x86/kernel/tls.c                     |  8 +-
>  arch/x86/kvm/svm/svm.c                    | 10 +--
>  arch/x86/lib/insn-eval.c                  |  4 -
>  arch/x86/platform/pvh/head.S              | 14 ----
>  arch/x86/power/cpu.c                      |  6 +-
>  arch/x86/xen/enlighten_pv.c               |  1 -
>  scripts/gcc-x86_32-has-stack-protector.sh |  6 +-
>  20 files changed, 62 insertions(+), 265 deletions(-)
>
> --
> 2.29.2
>


* Re: [PATCH v2 1/2] x86/stackprotector/32: Make the canary into a regular percpu variable
  2021-02-13 19:19 ` [PATCH v2 1/2] x86/stackprotector/32: Make the canary into a regular percpu variable Andy Lutomirski
@ 2021-02-13 19:49   ` Sedat Dilek
  2021-02-16 16:21   ` Sean Christopherson
                     ` (3 subsequent siblings)
  4 siblings, 0 replies; 24+ messages in thread
From: Sedat Dilek @ 2021-02-13 19:49 UTC (permalink / raw)
  To: Andy Lutomirski
  Cc: x86, LKML, Nick Desaulniers, Sean Christopherson, Brian Gerst,
	Joerg Roedel

On Sat, Feb 13, 2021 at 8:19 PM Andy Lutomirski <luto@kernel.org> wrote:
>
> On 32-bit kernels, the stackprotector canary is quite nasty -- it is
> stored at %gs:(20), which is nasty because 32-bit kernels use %fs for
> percpu storage.  It's even nastier because it means that whether %gs
> contains userspace state or kernel state while running kernel code
> depends on whether stackprotector is enabled (this is
> CONFIG_X86_32_LAZY_GS), and this setting radically changes the way
> that segment selectors work.  Supporting both variants is a
> maintenance and testing mess.
>
> Merely rearranging so that percpu and the stack canary
> share the same segment would be messy as the 32-bit percpu address
> layout isn't currently compatible with putting a variable at a fixed
> offset.
>
> Fortunately, GCC 8.1 added options that allow the stack canary to be
> accessed as %fs:__stack_chk_guard, effectively turning it into an ordinary
> percpu variable.  This lets us get rid of all of the code to manage the
> stack canary GDT descriptor and the CONFIG_X86_32_LAZY_GS mess.
>
> (That name is special.  We could use any symbol we want for the
>  %fs-relative mode, but for CONFIG_SMP=n, gcc refuses to let us use any
>  name other than __stack_chk_guard.)
>
> This patch forcibly disables stackprotector on older compilers that
> don't support the new options and makes the stack canary into a
> percpu variable.  The "lazy GS" approach is now used for all 32-bit
> configurations.
>
> This patch also makes load_gs_index() work on 32-bit kernels.  On
> 64-bit kernels, it loads the GS selector and updates the user
> GSBASE accordingly.  (This is unchanged.)  On 32-bit kernels,
> it loads the GS selector and updates GSBASE, which is now
> always the user base.  This means that the overall effect is
> the same on 32-bit and 64-bit, which avoids some ifdeffery.
>
> Cc: Sedat Dilek <sedat.dilek@gmail.com>
> Cc: Nick Desaulniers <ndesaulniers@google.com>
> Signed-off-by: Andy Lutomirski <luto@kernel.org>
> ---
>  arch/x86/Kconfig                          |  7 +-
>  arch/x86/Makefile                         |  8 +++
>  arch/x86/entry/entry_32.S                 | 56 ++--------------
>  arch/x86/include/asm/processor.h          | 15 ++---
>  arch/x86/include/asm/ptrace.h             |  5 +-
>  arch/x86/include/asm/segment.h            | 30 +++------
>  arch/x86/include/asm/stackprotector.h     | 79 +++++------------------
>  arch/x86/include/asm/suspend_32.h         |  6 +-
>  arch/x86/kernel/asm-offsets_32.c          |  5 --
>  arch/x86/kernel/cpu/common.c              |  5 +-
>  arch/x86/kernel/doublefault_32.c          |  4 +-
>  arch/x86/kernel/head_32.S                 | 18 +-----
>  arch/x86/kernel/setup_percpu.c            |  1 -
>  arch/x86/kernel/tls.c                     |  8 +--
>  arch/x86/kvm/svm/svm.c                    | 10 +--
>  arch/x86/lib/insn-eval.c                  |  4 --
>  arch/x86/platform/pvh/head.S              | 14 ----
>  arch/x86/power/cpu.c                      |  6 +-
>  arch/x86/xen/enlighten_pv.c               |  1 -
>  scripts/gcc-x86_32-has-stack-protector.sh |  6 +-
>  20 files changed, 62 insertions(+), 226 deletions(-)
>
> diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
> index 21f851179ff0..12d8bf011d08 100644
> --- a/arch/x86/Kconfig
> +++ b/arch/x86/Kconfig
> @@ -353,10 +353,6 @@ config X86_64_SMP
>         def_bool y
>         depends on X86_64 && SMP
>
> -config X86_32_LAZY_GS
> -       def_bool y
> -       depends on X86_32 && !STACKPROTECTOR
> -
>  config ARCH_SUPPORTS_UPROBES
>         def_bool y
>
> @@ -379,7 +375,8 @@ config CC_HAS_SANE_STACKPROTECTOR
>         default $(success,$(srctree)/scripts/gcc-x86_32-has-stack-protector.sh $(CC))
>         help
>            We have to make sure stack protector is unconditionally disabled if
> -          the compiler produces broken code.
> +          the compiler produces broken code or if it does not let us control
> +          the segment on 32-bit kernels.
>
>  menu "Processor type and features"
>
> diff --git a/arch/x86/Makefile b/arch/x86/Makefile
> index 7116da3980be..0b5cd8c49ccb 100644
> --- a/arch/x86/Makefile
> +++ b/arch/x86/Makefile
> @@ -76,6 +76,14 @@ ifeq ($(CONFIG_X86_32),y)
>
>          # temporary until string.h is fixed
>          KBUILD_CFLAGS += -ffreestanding
> +
> +       ifeq ($(CONFIG_STACKPROTECTOR),y)
> +               ifeq ($(CONFIG_SMP),y)
> +                       KBUILD_CFLAGS += -mstack-protector-guard-reg=fs -mstack-protector-guard-symbol=__stack_chk_guard
> +               else
> +                       KBUILD_CFLAGS += -mstack-protector-guard=global
> +               endif
> +       endif

Just FYI:

According to [1] the following clang compiler flags are available:

-mstack-protector-guard-offset=<arg>
Use the given offset for addressing the stack-protector guard

-mstack-protector-guard-reg=<arg>
Use the given reg for addressing the stack-protector guard

-mstack-protector-guard=<arg>
Use the given guard (global, tls) for addressing the stack-protector guard

Looks like -mstack-protector-guard-symbol=<arg> is not (yet) available.

- Sedat -

[1] https://clang.llvm.org/docs/ClangCommandLineReference.html
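
For anyone who wants to check their local toolchain, the probe boils
down to the same one-liner as scripts/gcc-x86_32-has-stack-protector.sh
later in this patch.  A sketch, assuming a multilib-capable compiler in
$CC (gcc by default); the "supported"/"unsupported" strings are just for
this demo:

```shell
# Probe whether $CC accepts the new stack-protector guard options.
# Mirrors scripts/gcc-x86_32-has-stack-protector.sh from this series;
# the choice of $CC and the -m32 multilib support are assumptions
# about the local setup.
CC="${CC:-gcc}"
if echo 'int foo(void) { char X[200]; return 3; }' | \
   "$CC" -S -x c -m32 -O0 -fstack-protector \
         -mstack-protector-guard-reg=fs \
         -mstack-protector-guard-symbol=__stack_chk_guard \
         - -o - 2>/dev/null | grep -q '%fs'; then
	result=supported
else
	result=unsupported
fi
echo "$result"
```

A compiler without -mstack-protector-guard-symbol= (clang at the time of
writing, per [1]) rejects the flags and the probe prints "unsupported".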

>  else
>          BITS := 64
>          UTS_MACHINE := x86_64
> diff --git a/arch/x86/entry/entry_32.S b/arch/x86/entry/entry_32.S
> index df8c017e6161..eb0cb662bca5 100644
> --- a/arch/x86/entry/entry_32.S
> +++ b/arch/x86/entry/entry_32.S
> @@ -20,7 +20,7 @@
>   *     1C(%esp) - %ds
>   *     20(%esp) - %es
>   *     24(%esp) - %fs
> - *     28(%esp) - %gs          saved iff !CONFIG_X86_32_LAZY_GS
> + *     28(%esp) - unused -- was %gs on old stackprotector kernels
>   *     2C(%esp) - orig_eax
>   *     30(%esp) - %eip
>   *     34(%esp) - %cs
> @@ -56,14 +56,9 @@
>  /*
>   * User gs save/restore
>   *
> - * %gs is used for userland TLS and kernel only uses it for stack
> - * canary which is required to be at %gs:20 by gcc.  Read the comment
> - * at the top of stackprotector.h for more info.
> - *
> - * Local labels 98 and 99 are used.
> + * This is leftover junk from CONFIG_X86_32_LAZY_GS.  A subsequent patch
> + * will remove it entirely.
>   */
> -#ifdef CONFIG_X86_32_LAZY_GS
> -
>   /* unfortunately push/pop can't be no-op */
>  .macro PUSH_GS
>         pushl   $0
> @@ -86,49 +81,6 @@
>  .macro SET_KERNEL_GS reg
>  .endm
>
> -#else  /* CONFIG_X86_32_LAZY_GS */
> -
> -.macro PUSH_GS
> -       pushl   %gs
> -.endm
> -
> -.macro POP_GS pop=0
> -98:    popl    %gs
> -  .if \pop <> 0
> -       add     $\pop, %esp
> -  .endif
> -.endm
> -.macro POP_GS_EX
> -.pushsection .fixup, "ax"
> -99:    movl    $0, (%esp)
> -       jmp     98b
> -.popsection
> -       _ASM_EXTABLE(98b, 99b)
> -.endm
> -
> -.macro PTGS_TO_GS
> -98:    mov     PT_GS(%esp), %gs
> -.endm
> -.macro PTGS_TO_GS_EX
> -.pushsection .fixup, "ax"
> -99:    movl    $0, PT_GS(%esp)
> -       jmp     98b
> -.popsection
> -       _ASM_EXTABLE(98b, 99b)
> -.endm
> -
> -.macro GS_TO_REG reg
> -       movl    %gs, \reg
> -.endm
> -.macro REG_TO_PTGS reg
> -       movl    \reg, PT_GS(%esp)
> -.endm
> -.macro SET_KERNEL_GS reg
> -       movl    $(__KERNEL_STACK_CANARY), \reg
> -       movl    \reg, %gs
> -.endm
> -
> -#endif /* CONFIG_X86_32_LAZY_GS */
>
>  /* Unconditionally switch to user cr3 */
>  .macro SWITCH_TO_USER_CR3 scratch_reg:req
> @@ -779,7 +731,7 @@ SYM_CODE_START(__switch_to_asm)
>
>  #ifdef CONFIG_STACKPROTECTOR
>         movl    TASK_stack_canary(%edx), %ebx
> -       movl    %ebx, PER_CPU_VAR(stack_canary)+stack_canary_offset
> +       movl    %ebx, PER_CPU_VAR(__stack_chk_guard)
>  #endif
>
>  #ifdef CONFIG_RETPOLINE
> diff --git a/arch/x86/include/asm/processor.h b/arch/x86/include/asm/processor.h
> index c20a52b5534b..c59dff4bbc38 100644
> --- a/arch/x86/include/asm/processor.h
> +++ b/arch/x86/include/asm/processor.h
> @@ -441,6 +441,9 @@ struct fixed_percpu_data {
>          * GCC hardcodes the stack canary as %gs:40.  Since the
>          * irq_stack is the object at %gs:0, we reserve the bottom
>          * 48 bytes of the irq stack for the canary.
> +        *
> +        * Once we are willing to require -mstack-protector-guard-symbol=
> +        * support for x86_64 stackprotector, we can get rid of this.
>          */
>         char            gs_base[40];
>         unsigned long   stack_canary;
> @@ -461,17 +464,7 @@ extern asmlinkage void ignore_sysret(void);
>  void current_save_fsgs(void);
>  #else  /* X86_64 */
>  #ifdef CONFIG_STACKPROTECTOR
> -/*
> - * Make sure stack canary segment base is cached-aligned:
> - *   "For Intel Atom processors, avoid non zero segment base address
> - *    that is not aligned to cache line boundary at all cost."
> - * (Optim Ref Manual Assembly/Compiler Coding Rule 15.)
> - */
> -struct stack_canary {
> -       char __pad[20];         /* canary at %gs:20 */
> -       unsigned long canary;
> -};
> -DECLARE_PER_CPU_ALIGNED(struct stack_canary, stack_canary);
> +DECLARE_PER_CPU(unsigned long, __stack_chk_guard);
>  #endif
>  /* Per CPU softirq stack pointer */
>  DECLARE_PER_CPU(struct irq_stack *, softirq_stack_ptr);
> diff --git a/arch/x86/include/asm/ptrace.h b/arch/x86/include/asm/ptrace.h
> index d8324a236696..b2c4c12d237c 100644
> --- a/arch/x86/include/asm/ptrace.h
> +++ b/arch/x86/include/asm/ptrace.h
> @@ -37,7 +37,10 @@ struct pt_regs {
>         unsigned short __esh;
>         unsigned short fs;
>         unsigned short __fsh;
> -       /* On interrupt, gs and __gsh store the vector number. */
> +       /*
> +        * On interrupt, gs and __gsh store the vector number.  They never
> +        * store gs any more.
> +        */
>         unsigned short gs;
>         unsigned short __gsh;
>         /* On interrupt, this is the error code. */
> diff --git a/arch/x86/include/asm/segment.h b/arch/x86/include/asm/segment.h
> index 7fdd4facfce7..72044026eb3c 100644
> --- a/arch/x86/include/asm/segment.h
> +++ b/arch/x86/include/asm/segment.h
> @@ -95,7 +95,7 @@
>   *
>   *  26 - ESPFIX small SS
>   *  27 - per-cpu                       [ offset to per-cpu data area ]
> - *  28 - stack_canary-20               [ for stack protector ]         <=== cacheline #8
> + *  28 - unused
>   *  29 - unused
>   *  30 - unused
>   *  31 - TSS for double fault handler
> @@ -118,7 +118,6 @@
>
>  #define GDT_ENTRY_ESPFIX_SS            26
>  #define GDT_ENTRY_PERCPU               27
> -#define GDT_ENTRY_STACK_CANARY         28
>
>  #define GDT_ENTRY_DOUBLEFAULT_TSS      31
>
> @@ -158,12 +157,6 @@
>  # define __KERNEL_PERCPU               0
>  #endif
>
> -#ifdef CONFIG_STACKPROTECTOR
> -# define __KERNEL_STACK_CANARY         (GDT_ENTRY_STACK_CANARY*8)
> -#else
> -# define __KERNEL_STACK_CANARY         0
> -#endif
> -
>  #else /* 64-bit: */
>
>  #include <asm/cache.h>
> @@ -364,22 +357,15 @@ static inline void __loadsegment_fs(unsigned short value)
>         asm("mov %%" #seg ",%0":"=r" (value) : : "memory")
>
>  /*
> - * x86-32 user GS accessors:
> + * x86-32 user GS accessors.  This is ugly and could do with some cleaning up.
>   */
>  #ifdef CONFIG_X86_32
> -# ifdef CONFIG_X86_32_LAZY_GS
> -#  define get_user_gs(regs)            (u16)({ unsigned long v; savesegment(gs, v); v; })
> -#  define set_user_gs(regs, v)         loadsegment(gs, (unsigned long)(v))
> -#  define task_user_gs(tsk)            ((tsk)->thread.gs)
> -#  define lazy_save_gs(v)              savesegment(gs, (v))
> -#  define lazy_load_gs(v)              loadsegment(gs, (v))
> -# else /* X86_32_LAZY_GS */
> -#  define get_user_gs(regs)            (u16)((regs)->gs)
> -#  define set_user_gs(regs, v)         do { (regs)->gs = (v); } while (0)
> -#  define task_user_gs(tsk)            (task_pt_regs(tsk)->gs)
> -#  define lazy_save_gs(v)              do { } while (0)
> -#  define lazy_load_gs(v)              do { } while (0)
> -# endif        /* X86_32_LAZY_GS */
> +# define get_user_gs(regs)             (u16)({ unsigned long v; savesegment(gs, v); v; })
> +# define set_user_gs(regs, v)          loadsegment(gs, (unsigned long)(v))
> +# define task_user_gs(tsk)             ((tsk)->thread.gs)
> +# define lazy_save_gs(v)               savesegment(gs, (v))
> +# define lazy_load_gs(v)               loadsegment(gs, (v))
> +# define load_gs_index(v)              loadsegment(gs, (v))
>  #endif /* X86_32 */
>
>  #endif /* !__ASSEMBLY__ */
> diff --git a/arch/x86/include/asm/stackprotector.h b/arch/x86/include/asm/stackprotector.h
> index 7fb482f0f25b..b6ffe58c70fa 100644
> --- a/arch/x86/include/asm/stackprotector.h
> +++ b/arch/x86/include/asm/stackprotector.h
> @@ -5,30 +5,23 @@
>   * Stack protector works by putting predefined pattern at the start of
>   * the stack frame and verifying that it hasn't been overwritten when
>   * returning from the function.  The pattern is called stack canary
> - * and unfortunately gcc requires it to be at a fixed offset from %gs.
> - * On x86_64, the offset is 40 bytes and on x86_32 20 bytes.  x86_64
> - * and x86_32 use segment registers differently and thus handles this
> - * requirement differently.
> + * and unfortunately gcc historically required it to be at a fixed offset
> + * from the percpu segment base.  On x86_64, the offset is 40 bytes.
>   *
> - * On x86_64, %gs is shared by percpu area and stack canary.  All
> - * percpu symbols are zero based and %gs points to the base of percpu
> - * area.  The first occupant of the percpu area is always
> - * fixed_percpu_data which contains stack_canary at offset 40.  Userland
> - * %gs is always saved and restored on kernel entry and exit using
> - * swapgs, so stack protector doesn't add any complexity there.
> + * The same segment is shared by percpu area and stack canary.  On
> + * x86_64, percpu symbols are zero based and %gs (64-bit) points to the
> + * base of percpu area.  The first occupant of the percpu area is always
> + * fixed_percpu_data which contains stack_canary at the appropriate
> + * offset.  On x86_32, the stack canary is just a regular percpu
> + * variable.
>   *
> - * On x86_32, it's slightly more complicated.  As in x86_64, %gs is
> - * used for userland TLS.  Unfortunately, some processors are much
> - * slower at loading segment registers with different value when
> - * entering and leaving the kernel, so the kernel uses %fs for percpu
> - * area and manages %gs lazily so that %gs is switched only when
> - * necessary, usually during task switch.
> + * Putting percpu data in %fs on 32-bit is a minor optimization compared to
> + * using %gs.  Since 32-bit userspace normally has %fs == 0, we are likely
> + * to load 0 into %fs on exit to usermode, whereas with percpu data in
> + * %gs, we are likely to load a non-null %gs on return to user mode.
>   *
> - * As gcc requires the stack canary at %gs:20, %gs can't be managed
> - * lazily if stack protector is enabled, so the kernel saves and
> - * restores userland %gs on kernel entry and exit.  This behavior is
> - * controlled by CONFIG_X86_32_LAZY_GS and accessors are defined in
> - * system.h to hide the details.
> + * Once we are willing to require GCC 8.1 or better for 64-bit stackprotector
> + * support, we can remove some of this complexity.
>   */
>
>  #ifndef _ASM_STACKPROTECTOR_H
> @@ -44,14 +37,6 @@
>  #include <linux/random.h>
>  #include <linux/sched.h>
>
> -/*
> - * 24 byte read-only segment initializer for stack canary.  Linker
> - * can't handle the address bit shifting.  Address will be set in
> - * head_32 for boot CPU and setup_per_cpu_areas() for others.
> - */
> -#define GDT_STACK_CANARY_INIT                                          \
> -       [GDT_ENTRY_STACK_CANARY] = GDT_ENTRY_INIT(0x4090, 0, 0x18),
> -
>  /*
>   * Initialize the stackprotector canary value.
>   *
> @@ -86,7 +71,7 @@ static __always_inline void boot_init_stack_canary(void)
>  #ifdef CONFIG_X86_64
>         this_cpu_write(fixed_percpu_data.stack_canary, canary);
>  #else
> -       this_cpu_write(stack_canary.canary, canary);
> +       this_cpu_write(__stack_chk_guard, canary);
>  #endif
>  }
>
> @@ -95,48 +80,16 @@ static inline void cpu_init_stack_canary(int cpu, struct task_struct *idle)
>  #ifdef CONFIG_X86_64
>         per_cpu(fixed_percpu_data.stack_canary, cpu) = idle->stack_canary;
>  #else
> -       per_cpu(stack_canary.canary, cpu) = idle->stack_canary;
> -#endif
> -}
> -
> -static inline void setup_stack_canary_segment(int cpu)
> -{
> -#ifdef CONFIG_X86_32
> -       unsigned long canary = (unsigned long)&per_cpu(stack_canary, cpu);
> -       struct desc_struct *gdt_table = get_cpu_gdt_rw(cpu);
> -       struct desc_struct desc;
> -
> -       desc = gdt_table[GDT_ENTRY_STACK_CANARY];
> -       set_desc_base(&desc, canary);
> -       write_gdt_entry(gdt_table, GDT_ENTRY_STACK_CANARY, &desc, DESCTYPE_S);
> -#endif
> -}
> -
> -static inline void load_stack_canary_segment(void)
> -{
> -#ifdef CONFIG_X86_32
> -       asm("mov %0, %%gs" : : "r" (__KERNEL_STACK_CANARY) : "memory");
> +       per_cpu(__stack_chk_guard, cpu) = idle->stack_canary;
>  #endif
>  }
>
>  #else  /* STACKPROTECTOR */
>
> -#define GDT_STACK_CANARY_INIT
> -
>  /* dummy boot_init_stack_canary() is defined in linux/stackprotector.h */
>
> -static inline void setup_stack_canary_segment(int cpu)
> -{ }
> -
>  static inline void cpu_init_stack_canary(int cpu, struct task_struct *idle)
>  { }
>
> -static inline void load_stack_canary_segment(void)
> -{
> -#ifdef CONFIG_X86_32
> -       asm volatile ("mov %0, %%gs" : : "r" (0));
> -#endif
> -}
> -
>  #endif /* STACKPROTECTOR */
>  #endif /* _ASM_STACKPROTECTOR_H */
> diff --git a/arch/x86/include/asm/suspend_32.h b/arch/x86/include/asm/suspend_32.h
> index fdbd9d7b7bca..7b132d0312eb 100644
> --- a/arch/x86/include/asm/suspend_32.h
> +++ b/arch/x86/include/asm/suspend_32.h
> @@ -13,12 +13,10 @@
>  /* image of the saved processor state */
>  struct saved_context {
>         /*
> -        * On x86_32, all segment registers, with the possible exception of
> -        * gs, are saved at kernel entry in pt_regs.
> +        * On x86_32, all segment registers except gs are saved at kernel
> +        * entry in pt_regs.
>          */
> -#ifdef CONFIG_X86_32_LAZY_GS
>         u16 gs;
> -#endif
>         unsigned long cr0, cr2, cr3, cr4;
>         u64 misc_enable;
>         bool misc_enable_saved;
> diff --git a/arch/x86/kernel/asm-offsets_32.c b/arch/x86/kernel/asm-offsets_32.c
> index 6e043f295a60..2b411cd00a4e 100644
> --- a/arch/x86/kernel/asm-offsets_32.c
> +++ b/arch/x86/kernel/asm-offsets_32.c
> @@ -53,11 +53,6 @@ void foo(void)
>                offsetof(struct cpu_entry_area, tss.x86_tss.sp1) -
>                offsetofend(struct cpu_entry_area, entry_stack_page.stack));
>
> -#ifdef CONFIG_STACKPROTECTOR
> -       BLANK();
> -       OFFSET(stack_canary_offset, stack_canary, canary);
> -#endif
> -
>         BLANK();
>         DEFINE(EFI_svam, offsetof(efi_runtime_services_t, set_virtual_address_map));
>  }
> diff --git a/arch/x86/kernel/cpu/common.c b/arch/x86/kernel/cpu/common.c
> index 35ad8480c464..f208569d2d3b 100644
> --- a/arch/x86/kernel/cpu/common.c
> +++ b/arch/x86/kernel/cpu/common.c
> @@ -161,7 +161,6 @@ DEFINE_PER_CPU_PAGE_ALIGNED(struct gdt_page, gdt_page) = { .gdt = {
>
>         [GDT_ENTRY_ESPFIX_SS]           = GDT_ENTRY_INIT(0xc092, 0, 0xfffff),
>         [GDT_ENTRY_PERCPU]              = GDT_ENTRY_INIT(0xc092, 0, 0xfffff),
> -       GDT_STACK_CANARY_INIT
>  #endif
>  } };
>  EXPORT_PER_CPU_SYMBOL_GPL(gdt_page);
> @@ -599,7 +598,6 @@ void load_percpu_segment(int cpu)
>         __loadsegment_simple(gs, 0);
>         wrmsrl(MSR_GS_BASE, cpu_kernelmode_gs_base(cpu));
>  #endif
> -       load_stack_canary_segment();
>  }
>
>  #ifdef CONFIG_X86_32
> @@ -1793,7 +1791,8 @@ DEFINE_PER_CPU(unsigned long, cpu_current_top_of_stack) =
>  EXPORT_PER_CPU_SYMBOL(cpu_current_top_of_stack);
>
>  #ifdef CONFIG_STACKPROTECTOR
> -DEFINE_PER_CPU_ALIGNED(struct stack_canary, stack_canary);
> +DEFINE_PER_CPU(unsigned long, __stack_chk_guard);
> +EXPORT_PER_CPU_SYMBOL(__stack_chk_guard);
>  #endif
>
>  #endif /* CONFIG_X86_64 */
> diff --git a/arch/x86/kernel/doublefault_32.c b/arch/x86/kernel/doublefault_32.c
> index 759d392cbe9f..d1d49e3d536b 100644
> --- a/arch/x86/kernel/doublefault_32.c
> +++ b/arch/x86/kernel/doublefault_32.c
> @@ -100,9 +100,7 @@ DEFINE_PER_CPU_PAGE_ALIGNED(struct doublefault_stack, doublefault_stack) = {
>                 .ss             = __KERNEL_DS,
>                 .ds             = __USER_DS,
>                 .fs             = __KERNEL_PERCPU,
> -#ifndef CONFIG_X86_32_LAZY_GS
> -               .gs             = __KERNEL_STACK_CANARY,
> -#endif
> +               .gs             = 0,
>
>                 .__cr3          = __pa_nodebug(swapper_pg_dir),
>         },
> diff --git a/arch/x86/kernel/head_32.S b/arch/x86/kernel/head_32.S
> index 7ed84c282233..67f590425d90 100644
> --- a/arch/x86/kernel/head_32.S
> +++ b/arch/x86/kernel/head_32.S
> @@ -318,8 +318,8 @@ SYM_FUNC_START(startup_32_smp)
>         movl $(__KERNEL_PERCPU), %eax
>         movl %eax,%fs                   # set this cpu's percpu
>
> -       movl $(__KERNEL_STACK_CANARY),%eax
> -       movl %eax,%gs
> +       xorl %eax,%eax
> +       movl %eax,%gs                   # clear possible garbage in %gs
>
>         xorl %eax,%eax                  # Clear LDT
>         lldt %ax
> @@ -339,20 +339,6 @@ SYM_FUNC_END(startup_32_smp)
>   */
>  __INIT
>  setup_once:
> -#ifdef CONFIG_STACKPROTECTOR
> -       /*
> -        * Configure the stack canary. The linker can't handle this by
> -        * relocation.  Manually set base address in stack canary
> -        * segment descriptor.
> -        */
> -       movl $gdt_page,%eax
> -       movl $stack_canary,%ecx
> -       movw %cx, 8 * GDT_ENTRY_STACK_CANARY + 2(%eax)
> -       shrl $16, %ecx
> -       movb %cl, 8 * GDT_ENTRY_STACK_CANARY + 4(%eax)
> -       movb %ch, 8 * GDT_ENTRY_STACK_CANARY + 7(%eax)
> -#endif
> -
>         andl $0,setup_once_ref  /* Once is enough, thanks */
>         ret
>
> diff --git a/arch/x86/kernel/setup_percpu.c b/arch/x86/kernel/setup_percpu.c
> index fd945ce78554..0941d2f44f2a 100644
> --- a/arch/x86/kernel/setup_percpu.c
> +++ b/arch/x86/kernel/setup_percpu.c
> @@ -224,7 +224,6 @@ void __init setup_per_cpu_areas(void)
>                 per_cpu(this_cpu_off, cpu) = per_cpu_offset(cpu);
>                 per_cpu(cpu_number, cpu) = cpu;
>                 setup_percpu_segment(cpu);
> -               setup_stack_canary_segment(cpu);
>                 /*
>                  * Copy data used in early init routines from the
>                  * initial arrays to the per cpu data areas.  These
> diff --git a/arch/x86/kernel/tls.c b/arch/x86/kernel/tls.c
> index 64a496a0687f..3c883e064242 100644
> --- a/arch/x86/kernel/tls.c
> +++ b/arch/x86/kernel/tls.c
> @@ -164,17 +164,11 @@ int do_set_thread_area(struct task_struct *p, int idx,
>                 savesegment(fs, sel);
>                 if (sel == modified_sel)
>                         loadsegment(fs, sel);
> -
> -               savesegment(gs, sel);
> -               if (sel == modified_sel)
> -                       load_gs_index(sel);
>  #endif
>
> -#ifdef CONFIG_X86_32_LAZY_GS
>                 savesegment(gs, sel);
>                 if (sel == modified_sel)
> -                       loadsegment(gs, sel);
> -#endif
> +                       load_gs_index(sel);
>         } else {
>  #ifdef CONFIG_X86_64
>                 if (p->thread.fsindex == modified_sel)
> diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
> index f923e14e87df..ec39073b4897 100644
> --- a/arch/x86/kvm/svm/svm.c
> +++ b/arch/x86/kvm/svm/svm.c
> @@ -1467,12 +1467,8 @@ static void svm_vcpu_put(struct kvm_vcpu *vcpu)
>  #ifdef CONFIG_X86_64
>                 loadsegment(fs, svm->host.fs);
>                 wrmsrl(MSR_KERNEL_GS_BASE, current->thread.gsbase);
> -               load_gs_index(svm->host.gs);
> -#else
> -#ifdef CONFIG_X86_32_LAZY_GS
> -               loadsegment(gs, svm->host.gs);
> -#endif
>  #endif
> +               load_gs_index(svm->host.gs);
>
>                 for (i = 0; i < NR_HOST_SAVE_USER_MSRS; i++)
>                         wrmsrl(host_save_user_msrs[i].index,
> @@ -3705,13 +3701,11 @@ static noinstr void svm_vcpu_enter_exit(struct kvm_vcpu *vcpu,
>         } else {
>                 __svm_vcpu_run(svm->vmcb_pa, (unsigned long *)&svm->vcpu.arch.regs);
>
> +               /* Restore the percpu segment immediately. */
>  #ifdef CONFIG_X86_64
>                 native_wrmsrl(MSR_GS_BASE, svm->host.gs_base);
>  #else
>                 loadsegment(fs, svm->host.fs);
> -#ifndef CONFIG_X86_32_LAZY_GS
> -               loadsegment(gs, svm->host.gs);
> -#endif
>  #endif
>         }
>
> diff --git a/arch/x86/lib/insn-eval.c b/arch/x86/lib/insn-eval.c
> index 4229950a5d78..7f89a091f1fb 100644
> --- a/arch/x86/lib/insn-eval.c
> +++ b/arch/x86/lib/insn-eval.c
> @@ -404,10 +404,6 @@ static short get_segment_selector(struct pt_regs *regs, int seg_reg_idx)
>         case INAT_SEG_REG_FS:
>                 return (unsigned short)(regs->fs & 0xffff);
>         case INAT_SEG_REG_GS:
> -               /*
> -                * GS may or may not be in regs as per CONFIG_X86_32_LAZY_GS.
> -                * The macro below takes care of both cases.
> -                */
>                 return get_user_gs(regs);
>         case INAT_SEG_REG_IGNORE:
>         default:
> diff --git a/arch/x86/platform/pvh/head.S b/arch/x86/platform/pvh/head.S
> index 43b4d864817e..afbf0bb252da 100644
> --- a/arch/x86/platform/pvh/head.S
> +++ b/arch/x86/platform/pvh/head.S
> @@ -45,10 +45,8 @@
>
>  #define PVH_GDT_ENTRY_CS       1
>  #define PVH_GDT_ENTRY_DS       2
> -#define PVH_GDT_ENTRY_CANARY   3
>  #define PVH_CS_SEL             (PVH_GDT_ENTRY_CS * 8)
>  #define PVH_DS_SEL             (PVH_GDT_ENTRY_DS * 8)
> -#define PVH_CANARY_SEL         (PVH_GDT_ENTRY_CANARY * 8)
>
>  SYM_CODE_START_LOCAL(pvh_start_xen)
>         cld
> @@ -109,17 +107,6 @@ SYM_CODE_START_LOCAL(pvh_start_xen)
>
>  #else /* CONFIG_X86_64 */
>
> -       /* Set base address in stack canary descriptor. */
> -       movl $_pa(gdt_start),%eax
> -       movl $_pa(canary),%ecx
> -       movw %cx, (PVH_GDT_ENTRY_CANARY * 8) + 2(%eax)
> -       shrl $16, %ecx
> -       movb %cl, (PVH_GDT_ENTRY_CANARY * 8) + 4(%eax)
> -       movb %ch, (PVH_GDT_ENTRY_CANARY * 8) + 7(%eax)
> -
> -       mov $PVH_CANARY_SEL,%eax
> -       mov %eax,%gs
> -
>         call mk_early_pgtbl_32
>
>         mov $_pa(initial_page_table), %eax
> @@ -163,7 +150,6 @@ SYM_DATA_START_LOCAL(gdt_start)
>         .quad GDT_ENTRY(0xc09a, 0, 0xfffff) /* PVH_CS_SEL */
>  #endif
>         .quad GDT_ENTRY(0xc092, 0, 0xfffff) /* PVH_DS_SEL */
> -       .quad GDT_ENTRY(0x4090, 0, 0x18)    /* PVH_CANARY_SEL */
>  SYM_DATA_END_LABEL(gdt_start, SYM_L_LOCAL, gdt_end)
>
>         .balign 16
> diff --git a/arch/x86/power/cpu.c b/arch/x86/power/cpu.c
> index db1378c6ff26..ef4329d67a5f 100644
> --- a/arch/x86/power/cpu.c
> +++ b/arch/x86/power/cpu.c
> @@ -99,11 +99,8 @@ static void __save_processor_state(struct saved_context *ctxt)
>         /*
>          * segment registers
>          */
> -#ifdef CONFIG_X86_32_LAZY_GS
>         savesegment(gs, ctxt->gs);
> -#endif
>  #ifdef CONFIG_X86_64
> -       savesegment(gs, ctxt->gs);
>         savesegment(fs, ctxt->fs);
>         savesegment(ds, ctxt->ds);
>         savesegment(es, ctxt->es);
> @@ -232,7 +229,6 @@ static void notrace __restore_processor_state(struct saved_context *ctxt)
>         wrmsrl(MSR_GS_BASE, ctxt->kernelmode_gs_base);
>  #else
>         loadsegment(fs, __KERNEL_PERCPU);
> -       loadsegment(gs, __KERNEL_STACK_CANARY);
>  #endif
>
>         /* Restore the TSS, RO GDT, LDT, and usermode-relevant MSRs. */
> @@ -255,7 +251,7 @@ static void notrace __restore_processor_state(struct saved_context *ctxt)
>          */
>         wrmsrl(MSR_FS_BASE, ctxt->fs_base);
>         wrmsrl(MSR_KERNEL_GS_BASE, ctxt->usermode_gs_base);
> -#elif defined(CONFIG_X86_32_LAZY_GS)
> +#else
>         loadsegment(gs, ctxt->gs);
>  #endif
>
> diff --git a/arch/x86/xen/enlighten_pv.c b/arch/x86/xen/enlighten_pv.c
> index 9a5a50cdaab5..e18235a6390d 100644
> --- a/arch/x86/xen/enlighten_pv.c
> +++ b/arch/x86/xen/enlighten_pv.c
> @@ -1190,7 +1190,6 @@ static void __init xen_setup_gdt(int cpu)
>         pv_ops.cpu.write_gdt_entry = xen_write_gdt_entry_boot;
>         pv_ops.cpu.load_gdt = xen_load_gdt_boot;
>
> -       setup_stack_canary_segment(cpu);
>         switch_to_new_gdt(cpu);
>
>         pv_ops.cpu.write_gdt_entry = xen_write_gdt_entry;
> diff --git a/scripts/gcc-x86_32-has-stack-protector.sh b/scripts/gcc-x86_32-has-stack-protector.sh
> index f5c119495254..825c75c5b715 100755
> --- a/scripts/gcc-x86_32-has-stack-protector.sh
> +++ b/scripts/gcc-x86_32-has-stack-protector.sh
> @@ -1,4 +1,8 @@
>  #!/bin/sh
>  # SPDX-License-Identifier: GPL-2.0
>
> -echo "int foo(void) { char X[200]; return 3; }" | $* -S -x c -c -m32 -O0 -fstack-protector - -o - 2> /dev/null | grep -q "%gs"
> +# This requires GCC 8.1 or better.  Specifically, we require
> +# -mstack-protector-guard-reg, added by
> +# https://gcc.gnu.org/bugzilla/show_bug.cgi?id=81708
> +
> +echo "int foo(void) { char X[200]; return 3; }" | $* -S -x c -c -m32 -O0 -fstack-protector -mstack-protector-guard-reg=fs -mstack-protector-guard-symbol=__stack_chk_guard - -o - 2> /dev/null | grep -q "%fs"
> --
> 2.29.2
>

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH v2 0/2] Clean up x86_32 stackprotector
  2021-02-13 19:19 [PATCH v2 0/2] Clean up x86_32 stackprotector Andy Lutomirski
                   ` (2 preceding siblings ...)
  2021-02-13 19:33 ` [PATCH v2 0/2] Clean up x86_32 stackprotector Sedat Dilek
@ 2021-02-14 14:42 ` Sedat Dilek
  2021-02-16 18:13 ` Nick Desaulniers
  4 siblings, 0 replies; 24+ messages in thread
From: Sedat Dilek @ 2021-02-14 14:42 UTC (permalink / raw)
  To: Andy Lutomirski
  Cc: x86, LKML, Nick Desaulniers, Sean Christopherson, Brian Gerst,
	Joerg Roedel

On Sat, Feb 13, 2021 at 8:19 PM Andy Lutomirski <luto@kernel.org> wrote:
>
> x86_32 stackprotector is a maintenance nightmare.  Clean it up.  This
> disables stackprotector on x86_32 with GCC before 8.1 and with all clang
> versions.  Some clang people are cc'd.
>
> Changes from v1:
>  - Changelog fixes.
>  - Comment fixes (mostly from Sean).
>  - Fix the !SMP case.
>
> Andy Lutomirski (2):
>   x86/stackprotector/32: Make the canary into a regular percpu variable
>   x86/entry/32: Remove leftover macros after stackprotector cleanups
>

Happy Valentine's Day,

I have no rose or pralines for you, Andy :-).

First, I tried to apply your patchset on top of <tip.git#autolatest>,
which showed some conflicts.
AFAICS it conflicts with <tip.git#x86/entry> (or more precisely with
the recent "Merge branch 'x86/paravirt' into x86/entry").
Just wanna let you know.

I have tested this on Debian/testing AMD64 (x86 64bit) with LLVM/Clang
v12.0.0-rc1 and my custom patchset.
That means compile-tested, booted on bare metal, and the build log and
dmesg checked for warnings.
So far nothing scary.

Feel free to give credits like a Tested-by if you like.
( Roses and pralines are welcome, too, hahaha :-). )

Icecold and sunshiny Greetings from North-West Germany,
- Sedat -

>  arch/x86/Kconfig                          |  7 +-
>  arch/x86/Makefile                         |  8 ++
>  arch/x86/entry/entry_32.S                 | 95 +----------------------
>  arch/x86/include/asm/processor.h          | 15 +---
>  arch/x86/include/asm/ptrace.h             |  5 +-
>  arch/x86/include/asm/segment.h            | 30 ++-----
>  arch/x86/include/asm/stackprotector.h     | 79 ++++---------------
>  arch/x86/include/asm/suspend_32.h         |  6 +-
>  arch/x86/kernel/asm-offsets_32.c          |  5 --
>  arch/x86/kernel/cpu/common.c              |  5 +-
>  arch/x86/kernel/doublefault_32.c          |  4 +-
>  arch/x86/kernel/head_32.S                 | 18 +----
>  arch/x86/kernel/setup_percpu.c            |  1 -
>  arch/x86/kernel/tls.c                     |  8 +-
>  arch/x86/kvm/svm/svm.c                    | 10 +--
>  arch/x86/lib/insn-eval.c                  |  4 -
>  arch/x86/platform/pvh/head.S              | 14 ----
>  arch/x86/power/cpu.c                      |  6 +-
>  arch/x86/xen/enlighten_pv.c               |  1 -
>  scripts/gcc-x86_32-has-stack-protector.sh |  6 +-
>  20 files changed, 62 insertions(+), 265 deletions(-)
>
> --
> 2.29.2
>

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH v2 1/2] x86/stackprotector/32: Make the canary into a regular percpu variable
  2021-02-13 19:19 ` [PATCH v2 1/2] x86/stackprotector/32: Make the canary into a regular percpu variable Andy Lutomirski
  2021-02-13 19:49   ` Sedat Dilek
@ 2021-02-16 16:21   ` Sean Christopherson
  2021-02-16 20:23     ` Sedat Dilek
  2021-02-16 18:45   ` Nick Desaulniers
                     ` (2 subsequent siblings)
  4 siblings, 1 reply; 24+ messages in thread
From: Sean Christopherson @ 2021-02-16 16:21 UTC (permalink / raw)
  To: Andy Lutomirski
  Cc: x86, LKML, Sedat Dilek, Nick Desaulniers, Brian Gerst, Joerg Roedel

On Sat, Feb 13, 2021, Andy Lutomirski wrote:
> diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
> index f923e14e87df..ec39073b4897 100644
> --- a/arch/x86/kvm/svm/svm.c
> +++ b/arch/x86/kvm/svm/svm.c
> @@ -1467,12 +1467,8 @@ static void svm_vcpu_put(struct kvm_vcpu *vcpu)
>  #ifdef CONFIG_X86_64
>  		loadsegment(fs, svm->host.fs);
>  		wrmsrl(MSR_KERNEL_GS_BASE, current->thread.gsbase);
> -		load_gs_index(svm->host.gs);
> -#else
> -#ifdef CONFIG_X86_32_LAZY_GS
> -		loadsegment(gs, svm->host.gs);
> -#endif

This manual GS crud is gone as of commit e79b91bb3c91 ("KVM: SVM: use
vmsave/vmload for saving/restoring additional host state"), which is queued for
5.12.

>  #endif
> +		load_gs_index(svm->host.gs);
>  
>  		for (i = 0; i < NR_HOST_SAVE_USER_MSRS; i++)
>  			wrmsrl(host_save_user_msrs[i].index,
> @@ -3705,13 +3701,11 @@ static noinstr void svm_vcpu_enter_exit(struct kvm_vcpu *vcpu,
>  	} else {
>  		__svm_vcpu_run(svm->vmcb_pa, (unsigned long *)&svm->vcpu.arch.regs);
>  
> +		/* Restore the percpu segment immediately. */
>  #ifdef CONFIG_X86_64
>  		native_wrmsrl(MSR_GS_BASE, svm->host.gs_base);
>  #else
>  		loadsegment(fs, svm->host.fs);
> -#ifndef CONFIG_X86_32_LAZY_GS
> -		loadsegment(gs, svm->host.gs);
> -#endif
>  #endif
>  	}

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH v2 0/2] Clean up x86_32 stackprotector
  2021-02-13 19:19 [PATCH v2 0/2] Clean up x86_32 stackprotector Andy Lutomirski
                   ` (3 preceding siblings ...)
  2021-02-14 14:42 ` Sedat Dilek
@ 2021-02-16 18:13 ` Nick Desaulniers
  4 siblings, 0 replies; 24+ messages in thread
From: Nick Desaulniers @ 2021-02-16 18:13 UTC (permalink / raw)
  To: Andy Lutomirski
  Cc: maintainer:X86 ARCHITECTURE (32-BIT AND 64-BIT),
	LKML, Sedat Dilek, Sean Christopherson, Brian Gerst,
	Joerg Roedel, Nathan Chancellor, clang-built-linux

On Sat, Feb 13, 2021 at 11:19 AM Andy Lutomirski <luto@kernel.org> wrote:
>
> x86_32 stackprotector is a maintenance nightmare.  Clean it up.  This
> disables stackprotector on x86_32 with GCC before 8.1 and with all clang
> versions.  Some clang people are cc'd.

When in doubt, check the MAINTAINERS entry for LLVM for our mailing
list and listed maintainers.

>
> Changes from v1:
>  - Changelog fixes.
>  - Comment fixes (mostly from Sean).
>  - Fix the !SMP case.
>
> Andy Lutomirski (2):
>   x86/stackprotector/32: Make the canary into a regular percpu variable
>   x86/entry/32: Remove leftover macros after stackprotector cleanups
>
>  arch/x86/Kconfig                          |  7 +-
>  arch/x86/Makefile                         |  8 ++
>  arch/x86/entry/entry_32.S                 | 95 +----------------------
>  arch/x86/include/asm/processor.h          | 15 +---
>  arch/x86/include/asm/ptrace.h             |  5 +-
>  arch/x86/include/asm/segment.h            | 30 ++-----
>  arch/x86/include/asm/stackprotector.h     | 79 ++++---------------
>  arch/x86/include/asm/suspend_32.h         |  6 +-
>  arch/x86/kernel/asm-offsets_32.c          |  5 --
>  arch/x86/kernel/cpu/common.c              |  5 +-
>  arch/x86/kernel/doublefault_32.c          |  4 +-
>  arch/x86/kernel/head_32.S                 | 18 +----
>  arch/x86/kernel/setup_percpu.c            |  1 -
>  arch/x86/kernel/tls.c                     |  8 +-
>  arch/x86/kvm/svm/svm.c                    | 10 +--
>  arch/x86/lib/insn-eval.c                  |  4 -
>  arch/x86/platform/pvh/head.S              | 14 ----
>  arch/x86/power/cpu.c                      |  6 +-
>  arch/x86/xen/enlighten_pv.c               |  1 -
>  scripts/gcc-x86_32-has-stack-protector.sh |  6 +-
>  20 files changed, 62 insertions(+), 265 deletions(-)
>
> --
> 2.29.2
>


-- 
Thanks,
~Nick Desaulniers

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH v2 1/2] x86/stackprotector/32: Make the canary into a regular percpu variable
  2021-02-13 19:19 ` [PATCH v2 1/2] x86/stackprotector/32: Make the canary into a regular percpu variable Andy Lutomirski
  2021-02-13 19:49   ` Sedat Dilek
  2021-02-16 16:21   ` Sean Christopherson
@ 2021-02-16 18:45   ` Nick Desaulniers
  2021-02-16 20:29     ` Sedat Dilek
  2021-03-08 13:14   ` [tip: x86/core] " tip-bot2 for Andy Lutomirski
  2022-09-29 13:56   ` [PATCH v2 1/2] " Andy Shevchenko
  4 siblings, 1 reply; 24+ messages in thread
From: Nick Desaulniers @ 2021-02-16 18:45 UTC (permalink / raw)
  To: Andy Lutomirski
  Cc: maintainer:X86 ARCHITECTURE (32-BIT AND 64-BIT),
	LKML, Sedat Dilek, Sean Christopherson, Brian Gerst,
	Joerg Roedel, clang-built-linux, Nathan Chancellor

On Sat, Feb 13, 2021 at 11:19 AM Andy Lutomirski <luto@kernel.org> wrote:
>
> On 32-bit kernels, the stackprotector canary is quite nasty -- it is
> stored at %gs:(20), which is nasty because 32-bit kernels use %fs for
> percpu storage.  It's even nastier because it means that whether %gs
> contains userspace state or kernel state while running kernel code
> depends on whether stackprotector is enabled (this is
> CONFIG_X86_32_LAZY_GS), and this setting radically changes the way
> that segment selectors work.  Supporting both variants is a
> maintenance and testing mess.
>
> Merely rearranging so that percpu and the stack canary
> share the same segment would be messy as the 32-bit percpu address
> layout isn't currently compatible with putting a variable at a fixed
> offset.
>
> Fortunately, GCC 8.1 added options that allow the stack canary to be
> accessed as %fs:__stack_chk_guard, effectively turning it into an ordinary
> percpu variable.  This lets us get rid of all of the code to manage the
> stack canary GDT descriptor and the CONFIG_X86_32_LAZY_GS mess.
>
> (That name is special.  We could use any symbol we want for the
>  %fs-relative mode, but for CONFIG_SMP=n, gcc refuses to let us use any
>  name other than __stack_chk_guard.)
>
> This patch forcibly disables stackprotector on older compilers that
> don't support the new options and makes the stack canary into a
> percpu variable.  The "lazy GS" approach is now used for all 32-bit
> configurations.
>
> This patch also makes load_gs_index() work on 32-bit kernels.  On
> 64-bit kernels, it loads the GS selector and updates the user
> GSBASE accordingly.  (This is unchanged.)  On 32-bit kernels,
> it loads the GS selector and updates GSBASE, which is now
> always the user base.  This means that the overall effect is
> the same on 32-bit and 64-bit, which avoids some ifdeffery.
>
> Cc: Sedat Dilek <sedat.dilek@gmail.com>
> Cc: Nick Desaulniers <ndesaulniers@google.com>
> Signed-off-by: Andy Lutomirski <luto@kernel.org>
> ---
>  arch/x86/Kconfig                          |  7 +-
>  arch/x86/Makefile                         |  8 +++
>  arch/x86/entry/entry_32.S                 | 56 ++--------------
>  arch/x86/include/asm/processor.h          | 15 ++---
>  arch/x86/include/asm/ptrace.h             |  5 +-
>  arch/x86/include/asm/segment.h            | 30 +++------
>  arch/x86/include/asm/stackprotector.h     | 79 +++++------------------
>  arch/x86/include/asm/suspend_32.h         |  6 +-
>  arch/x86/kernel/asm-offsets_32.c          |  5 --
>  arch/x86/kernel/cpu/common.c              |  5 +-
>  arch/x86/kernel/doublefault_32.c          |  4 +-
>  arch/x86/kernel/head_32.S                 | 18 +-----
>  arch/x86/kernel/setup_percpu.c            |  1 -
>  arch/x86/kernel/tls.c                     |  8 +--
>  arch/x86/kvm/svm/svm.c                    | 10 +--
>  arch/x86/lib/insn-eval.c                  |  4 --
>  arch/x86/platform/pvh/head.S              | 14 ----
>  arch/x86/power/cpu.c                      |  6 +-
>  arch/x86/xen/enlighten_pv.c               |  1 -
>  scripts/gcc-x86_32-has-stack-protector.sh |  6 +-
>  20 files changed, 62 insertions(+), 226 deletions(-)
>
> diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
> index 21f851179ff0..12d8bf011d08 100644
> --- a/arch/x86/Kconfig
> +++ b/arch/x86/Kconfig
> @@ -353,10 +353,6 @@ config X86_64_SMP
>         def_bool y
>         depends on X86_64 && SMP
>
> -config X86_32_LAZY_GS
> -       def_bool y
> -       depends on X86_32 && !STACKPROTECTOR
> -
>  config ARCH_SUPPORTS_UPROBES
>         def_bool y
>
> @@ -379,7 +375,8 @@ config CC_HAS_SANE_STACKPROTECTOR
>         default $(success,$(srctree)/scripts/gcc-x86_32-has-stack-protector.sh $(CC))
>         help
>            We have to make sure stack protector is unconditionally disabled if
> -          the compiler produces broken code.
> +          the compiler produces broken code or if it does not let us control
> +          the segment on 32-bit kernels.
>
>  menu "Processor type and features"
>
> diff --git a/arch/x86/Makefile b/arch/x86/Makefile
> index 7116da3980be..0b5cd8c49ccb 100644
> --- a/arch/x86/Makefile
> +++ b/arch/x86/Makefile
> @@ -76,6 +76,14 @@ ifeq ($(CONFIG_X86_32),y)
>
>          # temporary until string.h is fixed
>          KBUILD_CFLAGS += -ffreestanding
> +
> +       ifeq ($(CONFIG_STACKPROTECTOR),y)
> +               ifeq ($(CONFIG_SMP),y)
> +                       KBUILD_CFLAGS += -mstack-protector-guard-reg=fs -mstack-protector-guard-symbol=__stack_chk_guard

I'm guessing the CC is because this removes stack protector support
with clang, which does not yet support
-mstack-protector-guard-symbol= (as Sedat notes)?  I would like to see
this called out explicitly in the commit message.

(If folks are looking to use various compiler features/flags and find
support missing, please let us know ASAP as it helps give us more time
to triage and plan work around existing schedules.  Our ML from
MAINTAINERS <clang-built-linux@googlegroups.com> or
linux-toolchains@vger.kernel.org works.)

For now I've filed:
https://bugs.llvm.org/show_bug.cgi?id=49209

> +               else
> +                       KBUILD_CFLAGS += -mstack-protector-guard=global
> +               endif
> +       endif
>  else
>          BITS := 64
>          UTS_MACHINE := x86_64
> diff --git a/arch/x86/entry/entry_32.S b/arch/x86/entry/entry_32.S
> index df8c017e6161..eb0cb662bca5 100644
> --- a/arch/x86/entry/entry_32.S
> +++ b/arch/x86/entry/entry_32.S
> @@ -20,7 +20,7 @@
>   *     1C(%esp) - %ds
>   *     20(%esp) - %es
>   *     24(%esp) - %fs
> - *     28(%esp) - %gs          saved iff !CONFIG_X86_32_LAZY_GS
> + *     28(%esp) - unused -- was %gs on old stackprotector kernels
>   *     2C(%esp) - orig_eax
>   *     30(%esp) - %eip
>   *     34(%esp) - %cs
> @@ -56,14 +56,9 @@
>  /*
>   * User gs save/restore
>   *
> - * %gs is used for userland TLS and kernel only uses it for stack
> - * canary which is required to be at %gs:20 by gcc.  Read the comment
> - * at the top of stackprotector.h for more info.
> - *
> - * Local labels 98 and 99 are used.
> + * This is leftover junk from CONFIG_X86_32_LAZY_GS.  A subsequent patch
> + * will remove it entirely.
>   */
> -#ifdef CONFIG_X86_32_LAZY_GS
> -
>   /* unfortunately push/pop can't be no-op */
>  .macro PUSH_GS
>         pushl   $0
> @@ -86,49 +81,6 @@
>  .macro SET_KERNEL_GS reg
>  .endm
>
> -#else  /* CONFIG_X86_32_LAZY_GS */
> -
> -.macro PUSH_GS
> -       pushl   %gs
> -.endm
> -
> -.macro POP_GS pop=0
> -98:    popl    %gs
> -  .if \pop <> 0
> -       add     $\pop, %esp
> -  .endif
> -.endm
> -.macro POP_GS_EX
> -.pushsection .fixup, "ax"
> -99:    movl    $0, (%esp)
> -       jmp     98b
> -.popsection
> -       _ASM_EXTABLE(98b, 99b)
> -.endm
> -
> -.macro PTGS_TO_GS
> -98:    mov     PT_GS(%esp), %gs
> -.endm
> -.macro PTGS_TO_GS_EX
> -.pushsection .fixup, "ax"
> -99:    movl    $0, PT_GS(%esp)
> -       jmp     98b
> -.popsection
> -       _ASM_EXTABLE(98b, 99b)
> -.endm
> -
> -.macro GS_TO_REG reg
> -       movl    %gs, \reg
> -.endm
> -.macro REG_TO_PTGS reg
> -       movl    \reg, PT_GS(%esp)
> -.endm
> -.macro SET_KERNEL_GS reg
> -       movl    $(__KERNEL_STACK_CANARY), \reg
> -       movl    \reg, %gs
> -.endm
> -
> -#endif /* CONFIG_X86_32_LAZY_GS */
>
>  /* Unconditionally switch to user cr3 */
>  .macro SWITCH_TO_USER_CR3 scratch_reg:req
> @@ -779,7 +731,7 @@ SYM_CODE_START(__switch_to_asm)
>
>  #ifdef CONFIG_STACKPROTECTOR
>         movl    TASK_stack_canary(%edx), %ebx
> -       movl    %ebx, PER_CPU_VAR(stack_canary)+stack_canary_offset
> +       movl    %ebx, PER_CPU_VAR(__stack_chk_guard)
>  #endif
>
>  #ifdef CONFIG_RETPOLINE
> diff --git a/arch/x86/include/asm/processor.h b/arch/x86/include/asm/processor.h
> index c20a52b5534b..c59dff4bbc38 100644
> --- a/arch/x86/include/asm/processor.h
> +++ b/arch/x86/include/asm/processor.h
> @@ -441,6 +441,9 @@ struct fixed_percpu_data {
>          * GCC hardcodes the stack canary as %gs:40.  Since the
>          * irq_stack is the object at %gs:0, we reserve the bottom
>          * 48 bytes of the irq stack for the canary.
> +        *
> +        * Once we are willing to require -mstack-protector-guard-symbol=
> +        * support for x86_64 stackprotector, we can get rid of this.
>          */
>         char            gs_base[40];
>         unsigned long   stack_canary;
> @@ -461,17 +464,7 @@ extern asmlinkage void ignore_sysret(void);
>  void current_save_fsgs(void);
>  #else  /* X86_64 */
>  #ifdef CONFIG_STACKPROTECTOR
> -/*
> - * Make sure stack canary segment base is cached-aligned:
> - *   "For Intel Atom processors, avoid non zero segment base address
> - *    that is not aligned to cache line boundary at all cost."
> - * (Optim Ref Manual Assembly/Compiler Coding Rule 15.)
> - */
> -struct stack_canary {
> -       char __pad[20];         /* canary at %gs:20 */
> -       unsigned long canary;
> -};
> -DECLARE_PER_CPU_ALIGNED(struct stack_canary, stack_canary);
> +DECLARE_PER_CPU(unsigned long, __stack_chk_guard);
>  #endif
>  /* Per CPU softirq stack pointer */
>  DECLARE_PER_CPU(struct irq_stack *, softirq_stack_ptr);
> diff --git a/arch/x86/include/asm/ptrace.h b/arch/x86/include/asm/ptrace.h
> index d8324a236696..b2c4c12d237c 100644
> --- a/arch/x86/include/asm/ptrace.h
> +++ b/arch/x86/include/asm/ptrace.h
> @@ -37,7 +37,10 @@ struct pt_regs {
>         unsigned short __esh;
>         unsigned short fs;
>         unsigned short __fsh;
> -       /* On interrupt, gs and __gsh store the vector number. */
> +       /*
> +        * On interrupt, gs and __gsh store the vector number.  They never
> +        * store gs any more.
> +        */
>         unsigned short gs;
>         unsigned short __gsh;
>         /* On interrupt, this is the error code. */
> diff --git a/arch/x86/include/asm/segment.h b/arch/x86/include/asm/segment.h
> index 7fdd4facfce7..72044026eb3c 100644
> --- a/arch/x86/include/asm/segment.h
> +++ b/arch/x86/include/asm/segment.h
> @@ -95,7 +95,7 @@
>   *
>   *  26 - ESPFIX small SS
>   *  27 - per-cpu                       [ offset to per-cpu data area ]
> - *  28 - stack_canary-20               [ for stack protector ]         <=== cacheline #8
> + *  28 - unused
>   *  29 - unused
>   *  30 - unused
>   *  31 - TSS for double fault handler
> @@ -118,7 +118,6 @@
>
>  #define GDT_ENTRY_ESPFIX_SS            26
>  #define GDT_ENTRY_PERCPU               27
> -#define GDT_ENTRY_STACK_CANARY         28
>
>  #define GDT_ENTRY_DOUBLEFAULT_TSS      31
>
> @@ -158,12 +157,6 @@
>  # define __KERNEL_PERCPU               0
>  #endif
>
> -#ifdef CONFIG_STACKPROTECTOR
> -# define __KERNEL_STACK_CANARY         (GDT_ENTRY_STACK_CANARY*8)
> -#else
> -# define __KERNEL_STACK_CANARY         0
> -#endif
> -
>  #else /* 64-bit: */
>
>  #include <asm/cache.h>
> @@ -364,22 +357,15 @@ static inline void __loadsegment_fs(unsigned short value)
>         asm("mov %%" #seg ",%0":"=r" (value) : : "memory")
>
>  /*
> - * x86-32 user GS accessors:
> + * x86-32 user GS accessors.  This is ugly and could do with some cleaning up.
>   */
>  #ifdef CONFIG_X86_32
> -# ifdef CONFIG_X86_32_LAZY_GS
> -#  define get_user_gs(regs)            (u16)({ unsigned long v; savesegment(gs, v); v; })
> -#  define set_user_gs(regs, v)         loadsegment(gs, (unsigned long)(v))
> -#  define task_user_gs(tsk)            ((tsk)->thread.gs)
> -#  define lazy_save_gs(v)              savesegment(gs, (v))
> -#  define lazy_load_gs(v)              loadsegment(gs, (v))
> -# else /* X86_32_LAZY_GS */
> -#  define get_user_gs(regs)            (u16)((regs)->gs)
> -#  define set_user_gs(regs, v)         do { (regs)->gs = (v); } while (0)
> -#  define task_user_gs(tsk)            (task_pt_regs(tsk)->gs)
> -#  define lazy_save_gs(v)              do { } while (0)
> -#  define lazy_load_gs(v)              do { } while (0)
> -# endif        /* X86_32_LAZY_GS */
> +# define get_user_gs(regs)             (u16)({ unsigned long v; savesegment(gs, v); v; })
> +# define set_user_gs(regs, v)          loadsegment(gs, (unsigned long)(v))
> +# define task_user_gs(tsk)             ((tsk)->thread.gs)
> +# define lazy_save_gs(v)               savesegment(gs, (v))
> +# define lazy_load_gs(v)               loadsegment(gs, (v))
> +# define load_gs_index(v)              loadsegment(gs, (v))
>  #endif /* X86_32 */
>
>  #endif /* !__ASSEMBLY__ */
> diff --git a/arch/x86/include/asm/stackprotector.h b/arch/x86/include/asm/stackprotector.h
> index 7fb482f0f25b..b6ffe58c70fa 100644
> --- a/arch/x86/include/asm/stackprotector.h
> +++ b/arch/x86/include/asm/stackprotector.h
> @@ -5,30 +5,23 @@
>   * Stack protector works by putting predefined pattern at the start of
>   * the stack frame and verifying that it hasn't been overwritten when
>   * returning from the function.  The pattern is called stack canary
> - * and unfortunately gcc requires it to be at a fixed offset from %gs.
> - * On x86_64, the offset is 40 bytes and on x86_32 20 bytes.  x86_64
> - * and x86_32 use segment registers differently and thus handles this
> - * requirement differently.
> + * and unfortunately gcc historically required it to be at a fixed offset
> + * from the percpu segment base.  On x86_64, the offset is 40 bytes.
>   *
> - * On x86_64, %gs is shared by percpu area and stack canary.  All
> - * percpu symbols are zero based and %gs points to the base of percpu
> - * area.  The first occupant of the percpu area is always
> - * fixed_percpu_data which contains stack_canary at offset 40.  Userland
> - * %gs is always saved and restored on kernel entry and exit using
> - * swapgs, so stack protector doesn't add any complexity there.
> + * The same segment is shared by percpu area and stack canary.  On
> + * x86_64, percpu symbols are zero based and %gs (64-bit) points to the
> + * base of percpu area.  The first occupant of the percpu area is always
> + * fixed_percpu_data which contains stack_canary at the appropriate
> + * offset.  On x86_32, the stack canary is just a regular percpu
> + * variable.
>   *
> - * On x86_32, it's slightly more complicated.  As in x86_64, %gs is
> - * used for userland TLS.  Unfortunately, some processors are much
> - * slower at loading segment registers with different value when
> - * entering and leaving the kernel, so the kernel uses %fs for percpu
> - * area and manages %gs lazily so that %gs is switched only when
> - * necessary, usually during task switch.
> + * Putting percpu data in %fs on 32-bit is a minor optimization compared to
> + * using %gs.  Since 32-bit userspace normally has %fs == 0, we are likely
> + * to load 0 into %fs on exit to usermode, whereas with percpu data in
> + * %gs, we are likely to load a non-null %gs on return to user mode.
>   *
> - * As gcc requires the stack canary at %gs:20, %gs can't be managed
> - * lazily if stack protector is enabled, so the kernel saves and
> - * restores userland %gs on kernel entry and exit.  This behavior is
> - * controlled by CONFIG_X86_32_LAZY_GS and accessors are defined in
> - * system.h to hide the details.
> + * Once we are willing to require GCC 8.1 or better for 64-bit stackprotector
> + * support, we can remove some of this complexity.
>   */
>
>  #ifndef _ASM_STACKPROTECTOR_H
> @@ -44,14 +37,6 @@
>  #include <linux/random.h>
>  #include <linux/sched.h>
>
> -/*
> - * 24 byte read-only segment initializer for stack canary.  Linker
> - * can't handle the address bit shifting.  Address will be set in
> - * head_32 for boot CPU and setup_per_cpu_areas() for others.
> - */
> -#define GDT_STACK_CANARY_INIT                                          \
> -       [GDT_ENTRY_STACK_CANARY] = GDT_ENTRY_INIT(0x4090, 0, 0x18),
> -
>  /*
>   * Initialize the stackprotector canary value.
>   *
> @@ -86,7 +71,7 @@ static __always_inline void boot_init_stack_canary(void)
>  #ifdef CONFIG_X86_64
>         this_cpu_write(fixed_percpu_data.stack_canary, canary);
>  #else
> -       this_cpu_write(stack_canary.canary, canary);
> +       this_cpu_write(__stack_chk_guard, canary);
>  #endif
>  }
>
> @@ -95,48 +80,16 @@ static inline void cpu_init_stack_canary(int cpu, struct task_struct *idle)
>  #ifdef CONFIG_X86_64
>         per_cpu(fixed_percpu_data.stack_canary, cpu) = idle->stack_canary;
>  #else
> -       per_cpu(stack_canary.canary, cpu) = idle->stack_canary;
> -#endif
> -}
> -
> -static inline void setup_stack_canary_segment(int cpu)
> -{
> -#ifdef CONFIG_X86_32
> -       unsigned long canary = (unsigned long)&per_cpu(stack_canary, cpu);
> -       struct desc_struct *gdt_table = get_cpu_gdt_rw(cpu);
> -       struct desc_struct desc;
> -
> -       desc = gdt_table[GDT_ENTRY_STACK_CANARY];
> -       set_desc_base(&desc, canary);
> -       write_gdt_entry(gdt_table, GDT_ENTRY_STACK_CANARY, &desc, DESCTYPE_S);
> -#endif
> -}
> -
> -static inline void load_stack_canary_segment(void)
> -{
> -#ifdef CONFIG_X86_32
> -       asm("mov %0, %%gs" : : "r" (__KERNEL_STACK_CANARY) : "memory");
> +       per_cpu(__stack_chk_guard, cpu) = idle->stack_canary;
>  #endif
>  }
>
>  #else  /* STACKPROTECTOR */
>
> -#define GDT_STACK_CANARY_INIT
> -
>  /* dummy boot_init_stack_canary() is defined in linux/stackprotector.h */
>
> -static inline void setup_stack_canary_segment(int cpu)
> -{ }
> -
>  static inline void cpu_init_stack_canary(int cpu, struct task_struct *idle)
>  { }
>
> -static inline void load_stack_canary_segment(void)
> -{
> -#ifdef CONFIG_X86_32
> -       asm volatile ("mov %0, %%gs" : : "r" (0));
> -#endif
> -}
> -
>  #endif /* STACKPROTECTOR */
>  #endif /* _ASM_STACKPROTECTOR_H */
> diff --git a/arch/x86/include/asm/suspend_32.h b/arch/x86/include/asm/suspend_32.h
> index fdbd9d7b7bca..7b132d0312eb 100644
> --- a/arch/x86/include/asm/suspend_32.h
> +++ b/arch/x86/include/asm/suspend_32.h
> @@ -13,12 +13,10 @@
>  /* image of the saved processor state */
>  struct saved_context {
>         /*
> -        * On x86_32, all segment registers, with the possible exception of
> -        * gs, are saved at kernel entry in pt_regs.
> +        * On x86_32, all segment registers except gs are saved at kernel
> +        * entry in pt_regs.
>          */
> -#ifdef CONFIG_X86_32_LAZY_GS
>         u16 gs;
> -#endif
>         unsigned long cr0, cr2, cr3, cr4;
>         u64 misc_enable;
>         bool misc_enable_saved;
> diff --git a/arch/x86/kernel/asm-offsets_32.c b/arch/x86/kernel/asm-offsets_32.c
> index 6e043f295a60..2b411cd00a4e 100644
> --- a/arch/x86/kernel/asm-offsets_32.c
> +++ b/arch/x86/kernel/asm-offsets_32.c
> @@ -53,11 +53,6 @@ void foo(void)
>                offsetof(struct cpu_entry_area, tss.x86_tss.sp1) -
>                offsetofend(struct cpu_entry_area, entry_stack_page.stack));
>
> -#ifdef CONFIG_STACKPROTECTOR
> -       BLANK();
> -       OFFSET(stack_canary_offset, stack_canary, canary);
> -#endif
> -
>         BLANK();
>         DEFINE(EFI_svam, offsetof(efi_runtime_services_t, set_virtual_address_map));
>  }
> diff --git a/arch/x86/kernel/cpu/common.c b/arch/x86/kernel/cpu/common.c
> index 35ad8480c464..f208569d2d3b 100644
> --- a/arch/x86/kernel/cpu/common.c
> +++ b/arch/x86/kernel/cpu/common.c
> @@ -161,7 +161,6 @@ DEFINE_PER_CPU_PAGE_ALIGNED(struct gdt_page, gdt_page) = { .gdt = {
>
>         [GDT_ENTRY_ESPFIX_SS]           = GDT_ENTRY_INIT(0xc092, 0, 0xfffff),
>         [GDT_ENTRY_PERCPU]              = GDT_ENTRY_INIT(0xc092, 0, 0xfffff),
> -       GDT_STACK_CANARY_INIT
>  #endif
>  } };
>  EXPORT_PER_CPU_SYMBOL_GPL(gdt_page);
> @@ -599,7 +598,6 @@ void load_percpu_segment(int cpu)
>         __loadsegment_simple(gs, 0);
>         wrmsrl(MSR_GS_BASE, cpu_kernelmode_gs_base(cpu));
>  #endif
> -       load_stack_canary_segment();
>  }
>
>  #ifdef CONFIG_X86_32
> @@ -1793,7 +1791,8 @@ DEFINE_PER_CPU(unsigned long, cpu_current_top_of_stack) =
>  EXPORT_PER_CPU_SYMBOL(cpu_current_top_of_stack);
>
>  #ifdef CONFIG_STACKPROTECTOR
> -DEFINE_PER_CPU_ALIGNED(struct stack_canary, stack_canary);
> +DEFINE_PER_CPU(unsigned long, __stack_chk_guard);
> +EXPORT_PER_CPU_SYMBOL(__stack_chk_guard);
>  #endif
>
>  #endif /* CONFIG_X86_64 */
> diff --git a/arch/x86/kernel/doublefault_32.c b/arch/x86/kernel/doublefault_32.c
> index 759d392cbe9f..d1d49e3d536b 100644
> --- a/arch/x86/kernel/doublefault_32.c
> +++ b/arch/x86/kernel/doublefault_32.c
> @@ -100,9 +100,7 @@ DEFINE_PER_CPU_PAGE_ALIGNED(struct doublefault_stack, doublefault_stack) = {
>                 .ss             = __KERNEL_DS,
>                 .ds             = __USER_DS,
>                 .fs             = __KERNEL_PERCPU,
> -#ifndef CONFIG_X86_32_LAZY_GS
> -               .gs             = __KERNEL_STACK_CANARY,
> -#endif
> +               .gs             = 0,
>
>                 .__cr3          = __pa_nodebug(swapper_pg_dir),
>         },
> diff --git a/arch/x86/kernel/head_32.S b/arch/x86/kernel/head_32.S
> index 7ed84c282233..67f590425d90 100644
> --- a/arch/x86/kernel/head_32.S
> +++ b/arch/x86/kernel/head_32.S
> @@ -318,8 +318,8 @@ SYM_FUNC_START(startup_32_smp)
>         movl $(__KERNEL_PERCPU), %eax
>         movl %eax,%fs                   # set this cpu's percpu
>
> -       movl $(__KERNEL_STACK_CANARY),%eax
> -       movl %eax,%gs
> +       xorl %eax,%eax
> +       movl %eax,%gs                   # clear possible garbage in %gs
>
>         xorl %eax,%eax                  # Clear LDT
>         lldt %ax
> @@ -339,20 +339,6 @@ SYM_FUNC_END(startup_32_smp)
>   */
>  __INIT
>  setup_once:
> -#ifdef CONFIG_STACKPROTECTOR
> -       /*
> -        * Configure the stack canary. The linker can't handle this by
> -        * relocation.  Manually set base address in stack canary
> -        * segment descriptor.
> -        */
> -       movl $gdt_page,%eax
> -       movl $stack_canary,%ecx
> -       movw %cx, 8 * GDT_ENTRY_STACK_CANARY + 2(%eax)
> -       shrl $16, %ecx
> -       movb %cl, 8 * GDT_ENTRY_STACK_CANARY + 4(%eax)
> -       movb %ch, 8 * GDT_ENTRY_STACK_CANARY + 7(%eax)
> -#endif
> -
>         andl $0,setup_once_ref  /* Once is enough, thanks */
>         ret
>
> diff --git a/arch/x86/kernel/setup_percpu.c b/arch/x86/kernel/setup_percpu.c
> index fd945ce78554..0941d2f44f2a 100644
> --- a/arch/x86/kernel/setup_percpu.c
> +++ b/arch/x86/kernel/setup_percpu.c
> @@ -224,7 +224,6 @@ void __init setup_per_cpu_areas(void)
>                 per_cpu(this_cpu_off, cpu) = per_cpu_offset(cpu);
>                 per_cpu(cpu_number, cpu) = cpu;
>                 setup_percpu_segment(cpu);
> -               setup_stack_canary_segment(cpu);
>                 /*
>                  * Copy data used in early init routines from the
>                  * initial arrays to the per cpu data areas.  These
> diff --git a/arch/x86/kernel/tls.c b/arch/x86/kernel/tls.c
> index 64a496a0687f..3c883e064242 100644
> --- a/arch/x86/kernel/tls.c
> +++ b/arch/x86/kernel/tls.c
> @@ -164,17 +164,11 @@ int do_set_thread_area(struct task_struct *p, int idx,
>                 savesegment(fs, sel);
>                 if (sel == modified_sel)
>                         loadsegment(fs, sel);
> -
> -               savesegment(gs, sel);
> -               if (sel == modified_sel)
> -                       load_gs_index(sel);
>  #endif
>
> -#ifdef CONFIG_X86_32_LAZY_GS
>                 savesegment(gs, sel);
>                 if (sel == modified_sel)
> -                       loadsegment(gs, sel);
> -#endif
> +                       load_gs_index(sel);
>         } else {
>  #ifdef CONFIG_X86_64
>                 if (p->thread.fsindex == modified_sel)
> diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
> index f923e14e87df..ec39073b4897 100644
> --- a/arch/x86/kvm/svm/svm.c
> +++ b/arch/x86/kvm/svm/svm.c
> @@ -1467,12 +1467,8 @@ static void svm_vcpu_put(struct kvm_vcpu *vcpu)
>  #ifdef CONFIG_X86_64
>                 loadsegment(fs, svm->host.fs);
>                 wrmsrl(MSR_KERNEL_GS_BASE, current->thread.gsbase);
> -               load_gs_index(svm->host.gs);
> -#else
> -#ifdef CONFIG_X86_32_LAZY_GS
> -               loadsegment(gs, svm->host.gs);
> -#endif
>  #endif
> +               load_gs_index(svm->host.gs);
>
>                 for (i = 0; i < NR_HOST_SAVE_USER_MSRS; i++)
>                         wrmsrl(host_save_user_msrs[i].index,
> @@ -3705,13 +3701,11 @@ static noinstr void svm_vcpu_enter_exit(struct kvm_vcpu *vcpu,
>         } else {
>                 __svm_vcpu_run(svm->vmcb_pa, (unsigned long *)&svm->vcpu.arch.regs);
>
> +               /* Restore the percpu segment immediately. */
>  #ifdef CONFIG_X86_64
>                 native_wrmsrl(MSR_GS_BASE, svm->host.gs_base);
>  #else
>                 loadsegment(fs, svm->host.fs);
> -#ifndef CONFIG_X86_32_LAZY_GS
> -               loadsegment(gs, svm->host.gs);
> -#endif
>  #endif
>         }
>
> diff --git a/arch/x86/lib/insn-eval.c b/arch/x86/lib/insn-eval.c
> index 4229950a5d78..7f89a091f1fb 100644
> --- a/arch/x86/lib/insn-eval.c
> +++ b/arch/x86/lib/insn-eval.c
> @@ -404,10 +404,6 @@ static short get_segment_selector(struct pt_regs *regs, int seg_reg_idx)
>         case INAT_SEG_REG_FS:
>                 return (unsigned short)(regs->fs & 0xffff);
>         case INAT_SEG_REG_GS:
> -               /*
> -                * GS may or may not be in regs as per CONFIG_X86_32_LAZY_GS.
> -                * The macro below takes care of both cases.
> -                */
>                 return get_user_gs(regs);
>         case INAT_SEG_REG_IGNORE:
>         default:
> diff --git a/arch/x86/platform/pvh/head.S b/arch/x86/platform/pvh/head.S
> index 43b4d864817e..afbf0bb252da 100644
> --- a/arch/x86/platform/pvh/head.S
> +++ b/arch/x86/platform/pvh/head.S
> @@ -45,10 +45,8 @@
>
>  #define PVH_GDT_ENTRY_CS       1
>  #define PVH_GDT_ENTRY_DS       2
> -#define PVH_GDT_ENTRY_CANARY   3
>  #define PVH_CS_SEL             (PVH_GDT_ENTRY_CS * 8)
>  #define PVH_DS_SEL             (PVH_GDT_ENTRY_DS * 8)
> -#define PVH_CANARY_SEL         (PVH_GDT_ENTRY_CANARY * 8)
>
>  SYM_CODE_START_LOCAL(pvh_start_xen)
>         cld
> @@ -109,17 +107,6 @@ SYM_CODE_START_LOCAL(pvh_start_xen)
>
>  #else /* CONFIG_X86_64 */
>
> -       /* Set base address in stack canary descriptor. */
> -       movl $_pa(gdt_start),%eax
> -       movl $_pa(canary),%ecx
> -       movw %cx, (PVH_GDT_ENTRY_CANARY * 8) + 2(%eax)
> -       shrl $16, %ecx
> -       movb %cl, (PVH_GDT_ENTRY_CANARY * 8) + 4(%eax)
> -       movb %ch, (PVH_GDT_ENTRY_CANARY * 8) + 7(%eax)
> -
> -       mov $PVH_CANARY_SEL,%eax
> -       mov %eax,%gs
> -
>         call mk_early_pgtbl_32
>
>         mov $_pa(initial_page_table), %eax
> @@ -163,7 +150,6 @@ SYM_DATA_START_LOCAL(gdt_start)
>         .quad GDT_ENTRY(0xc09a, 0, 0xfffff) /* PVH_CS_SEL */
>  #endif
>         .quad GDT_ENTRY(0xc092, 0, 0xfffff) /* PVH_DS_SEL */
> -       .quad GDT_ENTRY(0x4090, 0, 0x18)    /* PVH_CANARY_SEL */
>  SYM_DATA_END_LABEL(gdt_start, SYM_L_LOCAL, gdt_end)
>
>         .balign 16
> diff --git a/arch/x86/power/cpu.c b/arch/x86/power/cpu.c
> index db1378c6ff26..ef4329d67a5f 100644
> --- a/arch/x86/power/cpu.c
> +++ b/arch/x86/power/cpu.c
> @@ -99,11 +99,8 @@ static void __save_processor_state(struct saved_context *ctxt)
>         /*
>          * segment registers
>          */
> -#ifdef CONFIG_X86_32_LAZY_GS
>         savesegment(gs, ctxt->gs);
> -#endif
>  #ifdef CONFIG_X86_64
> -       savesegment(gs, ctxt->gs);
>         savesegment(fs, ctxt->fs);
>         savesegment(ds, ctxt->ds);
>         savesegment(es, ctxt->es);
> @@ -232,7 +229,6 @@ static void notrace __restore_processor_state(struct saved_context *ctxt)
>         wrmsrl(MSR_GS_BASE, ctxt->kernelmode_gs_base);
>  #else
>         loadsegment(fs, __KERNEL_PERCPU);
> -       loadsegment(gs, __KERNEL_STACK_CANARY);
>  #endif
>
>         /* Restore the TSS, RO GDT, LDT, and usermode-relevant MSRs. */
> @@ -255,7 +251,7 @@ static void notrace __restore_processor_state(struct saved_context *ctxt)
>          */
>         wrmsrl(MSR_FS_BASE, ctxt->fs_base);
>         wrmsrl(MSR_KERNEL_GS_BASE, ctxt->usermode_gs_base);
> -#elif defined(CONFIG_X86_32_LAZY_GS)
> +#else
>         loadsegment(gs, ctxt->gs);
>  #endif
>
> diff --git a/arch/x86/xen/enlighten_pv.c b/arch/x86/xen/enlighten_pv.c
> index 9a5a50cdaab5..e18235a6390d 100644
> --- a/arch/x86/xen/enlighten_pv.c
> +++ b/arch/x86/xen/enlighten_pv.c
> @@ -1190,7 +1190,6 @@ static void __init xen_setup_gdt(int cpu)
>         pv_ops.cpu.write_gdt_entry = xen_write_gdt_entry_boot;
>         pv_ops.cpu.load_gdt = xen_load_gdt_boot;
>
> -       setup_stack_canary_segment(cpu);
>         switch_to_new_gdt(cpu);
>
>         pv_ops.cpu.write_gdt_entry = xen_write_gdt_entry;
> diff --git a/scripts/gcc-x86_32-has-stack-protector.sh b/scripts/gcc-x86_32-has-stack-protector.sh
> index f5c119495254..825c75c5b715 100755
> --- a/scripts/gcc-x86_32-has-stack-protector.sh
> +++ b/scripts/gcc-x86_32-has-stack-protector.sh
> @@ -1,4 +1,8 @@
>  #!/bin/sh
>  # SPDX-License-Identifier: GPL-2.0
>
> -echo "int foo(void) { char X[200]; return 3; }" | $* -S -x c -c -m32 -O0 -fstack-protector - -o - 2> /dev/null | grep -q "%gs"
> +# This requires GCC 8.1 or better.  Specifically, we require
> +# -mstack-protector-guard-reg, added by
> +# https://gcc.gnu.org/bugzilla/show_bug.cgi?id=81708

And -mstack-protector-guard-symbol=, it would seem, since that's the flag
that will cause this check to fail with Clang?

What about -mstack-protector-guard=global, which can be used for the
non-SMP case?  I'm guessing that came out in an earlier GCC release than
-mstack-protector-guard-reg, so it doesn't need to be checked separately?
FWIW, some of these options were added to Clang in https://reviews.llvm.org/D88631.
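
For anyone unfamiliar with what these flags buy us: conceptually,
-fstack-protector with -mstack-protector-guard-symbol=__stack_chk_guard
makes each instrumented function load the guard symbol in its prologue
and re-check it in its epilogue. A hand-written user-space approximation
(demo_* names are stand-ins; the real symbol is __stack_chk_guard,
renamed here to avoid colliding with the C library's own guard):

```c
#include <stdlib.h>

/* Stand-in for the kernel's __stack_chk_guard percpu variable. */
static unsigned long demo_stack_chk_guard = 0x91d2a3b4UL; /* arbitrary */

static void demo_stack_chk_fail(void)
{
	abort();	/* the kernel panics instead */
}

/*
 * Hand-written approximation of the instrumentation the compiler emits
 * around a function with a large stack buffer: load the guard in the
 * prologue, re-check it in the epilogue, fail hard on mismatch.
 */
static int demo_foo(void)
{
	unsigned long canary = demo_stack_chk_guard;	/* prologue */
	char X[200];

	X[0] = 0;					/* function body */
	(void)X;

	if (canary != demo_stack_chk_guard)		/* epilogue */
		demo_stack_chk_fail();
	return 3;
}
```
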

> +
> +echo "int foo(void) { char X[200]; return 3; }" | $* -S -x c -c -m32 -O0 -fstack-protector -mstack-protector-guard-reg=fs -mstack-protector-guard-symbol=__stack_chk_guard - -o - 2> /dev/null | grep -q "%fs"
> --
> 2.29.2
>


-- 
Thanks,
~Nick Desaulniers

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH v2 1/2] x86/stackprotector/32: Make the canary into a regular percpu variable
  2021-02-16 16:21   ` Sean Christopherson
@ 2021-02-16 20:23     ` Sedat Dilek
  0 siblings, 0 replies; 24+ messages in thread
From: Sedat Dilek @ 2021-02-16 20:23 UTC (permalink / raw)
  To: Sean Christopherson
  Cc: Andy Lutomirski, x86, LKML, Nick Desaulniers, Brian Gerst, Joerg Roedel

On Tue, Feb 16, 2021 at 5:21 PM Sean Christopherson <seanjc@google.com> wrote:
>
> On Sat, Feb 13, 2021, Andy Lutomirski wrote:
> > diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
> > index f923e14e87df..ec39073b4897 100644
> > --- a/arch/x86/kvm/svm/svm.c
> > +++ b/arch/x86/kvm/svm/svm.c
> > @@ -1467,12 +1467,8 @@ static void svm_vcpu_put(struct kvm_vcpu *vcpu)
> >  #ifdef CONFIG_X86_64
> >               loadsegment(fs, svm->host.fs);
> >               wrmsrl(MSR_KERNEL_GS_BASE, current->thread.gsbase);
> > -             load_gs_index(svm->host.gs);
> > -#else
> > -#ifdef CONFIG_X86_32_LAZY_GS
> > -             loadsegment(gs, svm->host.gs);
> > -#endif
>
> This manual GS crud is gone as of commit e79b91bb3c91 ("KVM: SVM: use
> vmsave/vmload for saving/restoring additional host state"), which is
> queued for 5.12.
>

Link to the above KVM patch see [1].

As mentioned, the base for this patchset should be changed; for example,
it conflicts with [2].

Maybe wait for Linux v5.12-rc1?

- Sedat -

[1] https://git.kernel.org/pub/scm/virt/kvm/kvm.git/commit/?h=tags/kvm-5.12-1&id=e79b91bb3c916a52ce823ab60489c717c925c49f
[2] https://git.kernel.org/pub/scm/linux/kernel/git/tip/tip.git/tag/?h=x86-entry-2021-02-15

> >  #endif
> > +             load_gs_index(svm->host.gs);
> >
> >               for (i = 0; i < NR_HOST_SAVE_USER_MSRS; i++)
> >                       wrmsrl(host_save_user_msrs[i].index,
> > @@ -3705,13 +3701,11 @@ static noinstr void svm_vcpu_enter_exit(struct kvm_vcpu *vcpu,
> >       } else {
> >               __svm_vcpu_run(svm->vmcb_pa, (unsigned long *)&svm->vcpu.arch.regs);
> >
> > +             /* Restore the percpu segment immediately. */
> >  #ifdef CONFIG_X86_64
> >               native_wrmsrl(MSR_GS_BASE, svm->host.gs_base);
> >  #else
> >               loadsegment(fs, svm->host.fs);
> > -#ifndef CONFIG_X86_32_LAZY_GS
> > -             loadsegment(gs, svm->host.gs);
> > -#endif
> >  #endif
> >       }


* Re: [PATCH v2 1/2] x86/stackprotector/32: Make the canary into a regular percpu variable
  2021-02-16 18:45   ` Nick Desaulniers
@ 2021-02-16 20:29     ` Sedat Dilek
  0 siblings, 0 replies; 24+ messages in thread
From: Sedat Dilek @ 2021-02-16 20:29 UTC (permalink / raw)
  To: Nick Desaulniers
  Cc: Andy Lutomirski, maintainer:X86 ARCHITECTURE (32-BIT AND 64-BIT),
	LKML, Sean Christopherson, Brian Gerst, Joerg Roedel,
	clang-built-linux, Nathan Chancellor

On Tue, Feb 16, 2021 at 7:45 PM Nick Desaulniers
<ndesaulniers@google.com> wrote:
>
> On Sat, Feb 13, 2021 at 11:19 AM Andy Lutomirski <luto@kernel.org> wrote:
> >
> > On 32-bit kernels, the stackprotector canary is quite nasty -- it is
> > stored at %gs:(20), which is nasty because 32-bit kernels use %fs for
> > percpu storage.  It's even nastier because it means that whether %gs
> > contains userspace state or kernel state while running kernel code
> > depends on whether stackprotector is enabled (this is
> > CONFIG_X86_32_LAZY_GS), and this setting radically changes the way
> > that segment selectors work.  Supporting both variants is a
> > maintenance and testing mess.
> >
> > Merely rearranging so that percpu and the stack canary
> > share the same segment would be messy as the 32-bit percpu address
> > layout isn't currently compatible with putting a variable at a fixed
> > offset.
> >
> > Fortunately, GCC 8.1 added options that allow the stack canary to be
> > accessed as %fs:__stack_chk_guard, effectively turning it into an ordinary
> > percpu variable.  This lets us get rid of all of the code to manage the
> > stack canary GDT descriptor and the CONFIG_X86_32_LAZY_GS mess.
> >
> > (That name is special.  We could use any symbol we want for the
> >  %fs-relative mode, but for CONFIG_SMP=n, gcc refuses to let us use any
> >  name other than __stack_chk_guard.)
> >
> > This patch forcibly disables stackprotector on older compilers that
> > don't support the new options and makes the stack canary into a
> > percpu variable.  The "lazy GS" approach is now used for all 32-bit
> > configurations.
> >
> > This patch also makes load_gs_index() work on 32-bit kernels.  On
> > 64-bit kernels, it loads the GS selector and updates the user
> > GSBASE accordingly.  (This is unchanged.)  On 32-bit kernels,
> > it loads the GS selector and updates GSBASE, which is now
> > always the user base.  This means that the overall effect is
> > the same on 32-bit and 64-bit, which avoids some ifdeffery.
> >
> > Cc: Sedat Dilek <sedat.dilek@gmail.com>
> > Cc: Nick Desaulniers <ndesaulniers@google.com>
> > Signed-off-by: Andy Lutomirski <luto@kernel.org>
> > ---
> >  arch/x86/Kconfig                          |  7 +-
> >  arch/x86/Makefile                         |  8 +++
> >  arch/x86/entry/entry_32.S                 | 56 ++--------------
> >  arch/x86/include/asm/processor.h          | 15 ++---
> >  arch/x86/include/asm/ptrace.h             |  5 +-
> >  arch/x86/include/asm/segment.h            | 30 +++------
> >  arch/x86/include/asm/stackprotector.h     | 79 +++++------------------
> >  arch/x86/include/asm/suspend_32.h         |  6 +-
> >  arch/x86/kernel/asm-offsets_32.c          |  5 --
> >  arch/x86/kernel/cpu/common.c              |  5 +-
> >  arch/x86/kernel/doublefault_32.c          |  4 +-
> >  arch/x86/kernel/head_32.S                 | 18 +-----
> >  arch/x86/kernel/setup_percpu.c            |  1 -
> >  arch/x86/kernel/tls.c                     |  8 +--
> >  arch/x86/kvm/svm/svm.c                    | 10 +--
> >  arch/x86/lib/insn-eval.c                  |  4 --
> >  arch/x86/platform/pvh/head.S              | 14 ----
> >  arch/x86/power/cpu.c                      |  6 +-
> >  arch/x86/xen/enlighten_pv.c               |  1 -
> >  scripts/gcc-x86_32-has-stack-protector.sh |  6 +-
> >  20 files changed, 62 insertions(+), 226 deletions(-)
> >
> > diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
> > index 21f851179ff0..12d8bf011d08 100644
> > --- a/arch/x86/Kconfig
> > +++ b/arch/x86/Kconfig
> > @@ -353,10 +353,6 @@ config X86_64_SMP
> >         def_bool y
> >         depends on X86_64 && SMP
> >
> > -config X86_32_LAZY_GS
> > -       def_bool y
> > -       depends on X86_32 && !STACKPROTECTOR
> > -
> >  config ARCH_SUPPORTS_UPROBES
> >         def_bool y
> >
> > @@ -379,7 +375,8 @@ config CC_HAS_SANE_STACKPROTECTOR
> >         default $(success,$(srctree)/scripts/gcc-x86_32-has-stack-protector.sh $(CC))
> >         help
> >            We have to make sure stack protector is unconditionally disabled if
> > -          the compiler produces broken code.
> > +          the compiler produces broken code or if it does not let us control
> > +          the segment on 32-bit kernels.
> >
> >  menu "Processor type and features"
> >
> > diff --git a/arch/x86/Makefile b/arch/x86/Makefile
> > index 7116da3980be..0b5cd8c49ccb 100644
> > --- a/arch/x86/Makefile
> > +++ b/arch/x86/Makefile
> > @@ -76,6 +76,14 @@ ifeq ($(CONFIG_X86_32),y)
> >
> >          # temporary until string.h is fixed
> >          KBUILD_CFLAGS += -ffreestanding
> > +
> > +       ifeq ($(CONFIG_STACKPROTECTOR),y)
> > +               ifeq ($(CONFIG_SMP),y)
> > +                       KBUILD_CFLAGS += -mstack-protector-guard-reg=fs -mstack-protector-guard-symbol=__stack_chk_guard
>
> I'm guessing the CC is because this removes stack protector support
> with clang, because it does not yet support
> -mstack-protector-guard-symbol= (as Sedat notes)?  I would like to see
> this called out explicitly in the commit message.
>
> (If folks are looking to use various compiler features/flags and find
> support missing, please let us know ASAP as it helps give us more time
> to triage and plan work around existing schedules.  Our ML from
> MAINTAINERS <clang-built-linux@googlegroups.com> or
> linux-toolchains@vger.kernel.org works.)
>
> For now I've filed:
> https://bugs.llvm.org/show_bug.cgi?id=49209
>

Thanks for the bug report "Bug 49209 - [X86] support for
-mstack-protector-guard-symbol="

> > +               else
> > +                       KBUILD_CFLAGS += -mstack-protector-guard=global
> > +               endif
> > +       endif
> >  else
> >          BITS := 64
> >          UTS_MACHINE := x86_64
> > diff --git a/arch/x86/entry/entry_32.S b/arch/x86/entry/entry_32.S
> > index df8c017e6161..eb0cb662bca5 100644
> > --- a/arch/x86/entry/entry_32.S
> > +++ b/arch/x86/entry/entry_32.S
> > @@ -20,7 +20,7 @@
> >   *     1C(%esp) - %ds
> >   *     20(%esp) - %es
> >   *     24(%esp) - %fs
> > - *     28(%esp) - %gs          saved iff !CONFIG_X86_32_LAZY_GS
> > + *     28(%esp) - unused -- was %gs on old stackprotector kernels
> >   *     2C(%esp) - orig_eax
> >   *     30(%esp) - %eip
> >   *     34(%esp) - %cs
> > @@ -56,14 +56,9 @@
> >  /*
> >   * User gs save/restore
> >   *
> > - * %gs is used for userland TLS and kernel only uses it for stack
> > - * canary which is required to be at %gs:20 by gcc.  Read the comment
> > - * at the top of stackprotector.h for more info.
> > - *
> > - * Local labels 98 and 99 are used.
> > + * This is leftover junk from CONFIG_X86_32_LAZY_GS.  A subsequent patch
> > + * will remove it entirely.
> >   */
> > -#ifdef CONFIG_X86_32_LAZY_GS
> > -
> >   /* unfortunately push/pop can't be no-op */
> >  .macro PUSH_GS
> >         pushl   $0
> > @@ -86,49 +81,6 @@
> >  .macro SET_KERNEL_GS reg
> >  .endm
> >
> > -#else  /* CONFIG_X86_32_LAZY_GS */
> > -
> > -.macro PUSH_GS
> > -       pushl   %gs
> > -.endm
> > -
> > -.macro POP_GS pop=0
> > -98:    popl    %gs
> > -  .if \pop <> 0
> > -       add     $\pop, %esp
> > -  .endif
> > -.endm
> > -.macro POP_GS_EX
> > -.pushsection .fixup, "ax"
> > -99:    movl    $0, (%esp)
> > -       jmp     98b
> > -.popsection
> > -       _ASM_EXTABLE(98b, 99b)
> > -.endm
> > -
> > -.macro PTGS_TO_GS
> > -98:    mov     PT_GS(%esp), %gs
> > -.endm
> > -.macro PTGS_TO_GS_EX
> > -.pushsection .fixup, "ax"
> > -99:    movl    $0, PT_GS(%esp)
> > -       jmp     98b
> > -.popsection
> > -       _ASM_EXTABLE(98b, 99b)
> > -.endm
> > -
> > -.macro GS_TO_REG reg
> > -       movl    %gs, \reg
> > -.endm
> > -.macro REG_TO_PTGS reg
> > -       movl    \reg, PT_GS(%esp)
> > -.endm
> > -.macro SET_KERNEL_GS reg
> > -       movl    $(__KERNEL_STACK_CANARY), \reg
> > -       movl    \reg, %gs
> > -.endm
> > -
> > -#endif /* CONFIG_X86_32_LAZY_GS */
> >
> >  /* Unconditionally switch to user cr3 */
> >  .macro SWITCH_TO_USER_CR3 scratch_reg:req
> > @@ -779,7 +731,7 @@ SYM_CODE_START(__switch_to_asm)
> >
> >  #ifdef CONFIG_STACKPROTECTOR
> >         movl    TASK_stack_canary(%edx), %ebx
> > -       movl    %ebx, PER_CPU_VAR(stack_canary)+stack_canary_offset
> > +       movl    %ebx, PER_CPU_VAR(__stack_chk_guard)
> >  #endif
> >
> >  #ifdef CONFIG_RETPOLINE
> > diff --git a/arch/x86/include/asm/processor.h b/arch/x86/include/asm/processor.h
> > index c20a52b5534b..c59dff4bbc38 100644
> > --- a/arch/x86/include/asm/processor.h
> > +++ b/arch/x86/include/asm/processor.h
> > @@ -441,6 +441,9 @@ struct fixed_percpu_data {
> >          * GCC hardcodes the stack canary as %gs:40.  Since the
> >          * irq_stack is the object at %gs:0, we reserve the bottom
> >          * 48 bytes of the irq stack for the canary.
> > +        *
> > +        * Once we are willing to require -mstack-protector-guard-symbol=
> > +        * support for x86_64 stackprotector, we can get rid of this.
> >          */
> >         char            gs_base[40];
> >         unsigned long   stack_canary;
> > @@ -461,17 +464,7 @@ extern asmlinkage void ignore_sysret(void);
> >  void current_save_fsgs(void);
> >  #else  /* X86_64 */
> >  #ifdef CONFIG_STACKPROTECTOR
> > -/*
> > - * Make sure stack canary segment base is cached-aligned:
> > - *   "For Intel Atom processors, avoid non zero segment base address
> > - *    that is not aligned to cache line boundary at all cost."
> > - * (Optim Ref Manual Assembly/Compiler Coding Rule 15.)
> > - */
> > -struct stack_canary {
> > -       char __pad[20];         /* canary at %gs:20 */
> > -       unsigned long canary;
> > -};
> > -DECLARE_PER_CPU_ALIGNED(struct stack_canary, stack_canary);
> > +DECLARE_PER_CPU(unsigned long, __stack_chk_guard);
> >  #endif
> >  /* Per CPU softirq stack pointer */
> >  DECLARE_PER_CPU(struct irq_stack *, softirq_stack_ptr);
> > diff --git a/arch/x86/include/asm/ptrace.h b/arch/x86/include/asm/ptrace.h
> > index d8324a236696..b2c4c12d237c 100644
> > --- a/arch/x86/include/asm/ptrace.h
> > +++ b/arch/x86/include/asm/ptrace.h
> > @@ -37,7 +37,10 @@ struct pt_regs {
> >         unsigned short __esh;
> >         unsigned short fs;
> >         unsigned short __fsh;
> > -       /* On interrupt, gs and __gsh store the vector number. */
> > +       /*
> > +        * On interrupt, gs and __gsh store the vector number.  They never
> > +        * store gs any more.
> > +        */
> >         unsigned short gs;
> >         unsigned short __gsh;
> >         /* On interrupt, this is the error code. */
> > diff --git a/arch/x86/include/asm/segment.h b/arch/x86/include/asm/segment.h
> > index 7fdd4facfce7..72044026eb3c 100644
> > --- a/arch/x86/include/asm/segment.h
> > +++ b/arch/x86/include/asm/segment.h
> > @@ -95,7 +95,7 @@
> >   *
> >   *  26 - ESPFIX small SS
> >   *  27 - per-cpu                       [ offset to per-cpu data area ]
> > - *  28 - stack_canary-20               [ for stack protector ]         <=== cacheline #8
> > + *  28 - unused
> >   *  29 - unused
> >   *  30 - unused
> >   *  31 - TSS for double fault handler
> > @@ -118,7 +118,6 @@
> >
> >  #define GDT_ENTRY_ESPFIX_SS            26
> >  #define GDT_ENTRY_PERCPU               27
> > -#define GDT_ENTRY_STACK_CANARY         28
> >
> >  #define GDT_ENTRY_DOUBLEFAULT_TSS      31
> >
> > @@ -158,12 +157,6 @@
> >  # define __KERNEL_PERCPU               0
> >  #endif
> >
> > -#ifdef CONFIG_STACKPROTECTOR
> > -# define __KERNEL_STACK_CANARY         (GDT_ENTRY_STACK_CANARY*8)
> > -#else
> > -# define __KERNEL_STACK_CANARY         0
> > -#endif
> > -
> >  #else /* 64-bit: */
> >
> >  #include <asm/cache.h>
> > @@ -364,22 +357,15 @@ static inline void __loadsegment_fs(unsigned short value)
> >         asm("mov %%" #seg ",%0":"=r" (value) : : "memory")
> >
> >  /*
> > - * x86-32 user GS accessors:
> > + * x86-32 user GS accessors.  This is ugly and could do with some cleaning up.
> >   */
> >  #ifdef CONFIG_X86_32
> > -# ifdef CONFIG_X86_32_LAZY_GS
> > -#  define get_user_gs(regs)            (u16)({ unsigned long v; savesegment(gs, v); v; })
> > -#  define set_user_gs(regs, v)         loadsegment(gs, (unsigned long)(v))
> > -#  define task_user_gs(tsk)            ((tsk)->thread.gs)
> > -#  define lazy_save_gs(v)              savesegment(gs, (v))
> > -#  define lazy_load_gs(v)              loadsegment(gs, (v))
> > -# else /* X86_32_LAZY_GS */
> > -#  define get_user_gs(regs)            (u16)((regs)->gs)
> > -#  define set_user_gs(regs, v)         do { (regs)->gs = (v); } while (0)
> > -#  define task_user_gs(tsk)            (task_pt_regs(tsk)->gs)
> > -#  define lazy_save_gs(v)              do { } while (0)
> > -#  define lazy_load_gs(v)              do { } while (0)
> > -# endif        /* X86_32_LAZY_GS */
> > +# define get_user_gs(regs)             (u16)({ unsigned long v; savesegment(gs, v); v; })
> > +# define set_user_gs(regs, v)          loadsegment(gs, (unsigned long)(v))
> > +# define task_user_gs(tsk)             ((tsk)->thread.gs)
> > +# define lazy_save_gs(v)               savesegment(gs, (v))
> > +# define lazy_load_gs(v)               loadsegment(gs, (v))
> > +# define load_gs_index(v)              loadsegment(gs, (v))
> >  #endif /* X86_32 */
> >
> >  #endif /* !__ASSEMBLY__ */
> > diff --git a/arch/x86/include/asm/stackprotector.h b/arch/x86/include/asm/stackprotector.h
> > index 7fb482f0f25b..b6ffe58c70fa 100644
> > --- a/arch/x86/include/asm/stackprotector.h
> > +++ b/arch/x86/include/asm/stackprotector.h
> > @@ -5,30 +5,23 @@
> >   * Stack protector works by putting predefined pattern at the start of
> >   * the stack frame and verifying that it hasn't been overwritten when
> >   * returning from the function.  The pattern is called stack canary
> > - * and unfortunately gcc requires it to be at a fixed offset from %gs.
> > - * On x86_64, the offset is 40 bytes and on x86_32 20 bytes.  x86_64
> > - * and x86_32 use segment registers differently and thus handles this
> > - * requirement differently.
> > + * and unfortunately gcc historically required it to be at a fixed offset
> > + * from the percpu segment base.  On x86_64, the offset is 40 bytes.
> >   *
> > - * On x86_64, %gs is shared by percpu area and stack canary.  All
> > - * percpu symbols are zero based and %gs points to the base of percpu
> > - * area.  The first occupant of the percpu area is always
> > - * fixed_percpu_data which contains stack_canary at offset 40.  Userland
> > - * %gs is always saved and restored on kernel entry and exit using
> > - * swapgs, so stack protector doesn't add any complexity there.
> > + * The same segment is shared by percpu area and stack canary.  On
> > + * x86_64, percpu symbols are zero based and %gs (64-bit) points to the
> > + * base of percpu area.  The first occupant of the percpu area is always
> > + * fixed_percpu_data which contains stack_canary at the appropriate
> > + * offset.  On x86_32, the stack canary is just a regular percpu
> > + * variable.
> >   *
> > - * On x86_32, it's slightly more complicated.  As in x86_64, %gs is
> > - * used for userland TLS.  Unfortunately, some processors are much
> > - * slower at loading segment registers with different value when
> > - * entering and leaving the kernel, so the kernel uses %fs for percpu
> > - * area and manages %gs lazily so that %gs is switched only when
> > - * necessary, usually during task switch.
> > + * Putting percpu data in %fs on 32-bit is a minor optimization compared to
> > + * using %gs.  Since 32-bit userspace normally has %fs == 0, we are likely
> > + * to load 0 into %fs on exit to usermode, whereas with percpu data in
> > + * %gs, we are likely to load a non-null %gs on return to user mode.
> >   *
> > - * As gcc requires the stack canary at %gs:20, %gs can't be managed
> > - * lazily if stack protector is enabled, so the kernel saves and
> > - * restores userland %gs on kernel entry and exit.  This behavior is
> > - * controlled by CONFIG_X86_32_LAZY_GS and accessors are defined in
> > - * system.h to hide the details.
> > + * Once we are willing to require GCC 8.1 or better for 64-bit stackprotector
> > + * support, we can remove some of this complexity.
> >   */
> >
> >  #ifndef _ASM_STACKPROTECTOR_H
> > @@ -44,14 +37,6 @@
> >  #include <linux/random.h>
> >  #include <linux/sched.h>
> >
> > -/*
> > - * 24 byte read-only segment initializer for stack canary.  Linker
> > - * can't handle the address bit shifting.  Address will be set in
> > - * head_32 for boot CPU and setup_per_cpu_areas() for others.
> > - */
> > -#define GDT_STACK_CANARY_INIT                                          \
> > -       [GDT_ENTRY_STACK_CANARY] = GDT_ENTRY_INIT(0x4090, 0, 0x18),
> > -
> >  /*
> >   * Initialize the stackprotector canary value.
> >   *
> > @@ -86,7 +71,7 @@ static __always_inline void boot_init_stack_canary(void)
> >  #ifdef CONFIG_X86_64
> >         this_cpu_write(fixed_percpu_data.stack_canary, canary);
> >  #else
> > -       this_cpu_write(stack_canary.canary, canary);
> > +       this_cpu_write(__stack_chk_guard, canary);
> >  #endif
> >  }
> >
> > @@ -95,48 +80,16 @@ static inline void cpu_init_stack_canary(int cpu, struct task_struct *idle)
> >  #ifdef CONFIG_X86_64
> >         per_cpu(fixed_percpu_data.stack_canary, cpu) = idle->stack_canary;
> >  #else
> > -       per_cpu(stack_canary.canary, cpu) = idle->stack_canary;
> > -#endif
> > -}
> > -
> > -static inline void setup_stack_canary_segment(int cpu)
> > -{
> > -#ifdef CONFIG_X86_32
> > -       unsigned long canary = (unsigned long)&per_cpu(stack_canary, cpu);
> > -       struct desc_struct *gdt_table = get_cpu_gdt_rw(cpu);
> > -       struct desc_struct desc;
> > -
> > -       desc = gdt_table[GDT_ENTRY_STACK_CANARY];
> > -       set_desc_base(&desc, canary);
> > -       write_gdt_entry(gdt_table, GDT_ENTRY_STACK_CANARY, &desc, DESCTYPE_S);
> > -#endif
> > -}
> > -
> > -static inline void load_stack_canary_segment(void)
> > -{
> > -#ifdef CONFIG_X86_32
> > -       asm("mov %0, %%gs" : : "r" (__KERNEL_STACK_CANARY) : "memory");
> > +       per_cpu(__stack_chk_guard, cpu) = idle->stack_canary;
> >  #endif
> >  }
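
With the descriptor machinery gone, the 32-bit half of
boot_init_stack_canary()/cpu_init_stack_canary() reduces to an ordinary
per-CPU store. A rough user-space model (an array stands in for per-CPU
storage; demo_* names are hypothetical):

```c
#define DEMO_NR_CPUS 4

/*
 * Stand-ins for the two possible homes of the canary: x86_64 keeps it
 * in fixed_percpu_data.stack_canary, while x86_32 now uses the plain
 * percpu variable __stack_chk_guard.
 */
static unsigned long demo_fixed_slot[DEMO_NR_CPUS];
static unsigned long demo_stack_chk_guard_pcpu[DEMO_NR_CPUS];

/*
 * Rough analogue of cpu_init_stack_canary(): copy the idle task's
 * canary into whichever per-CPU location the compiler was told to check.
 */
static void demo_cpu_init_stack_canary(int cpu, unsigned long idle_canary,
				       int is_64bit)
{
	if (is_64bit)
		demo_fixed_slot[cpu] = idle_canary;
	else
		demo_stack_chk_guard_pcpu[cpu] = idle_canary;
}
```
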
> >
> >  #else  /* STACKPROTECTOR */
> >
> > -#define GDT_STACK_CANARY_INIT
> > -
> >  /* dummy boot_init_stack_canary() is defined in linux/stackprotector.h */
> >
> > -static inline void setup_stack_canary_segment(int cpu)
> > -{ }
> > -
> >  static inline void cpu_init_stack_canary(int cpu, struct task_struct *idle)
> >  { }
> >
> > -static inline void load_stack_canary_segment(void)
> > -{
> > -#ifdef CONFIG_X86_32
> > -       asm volatile ("mov %0, %%gs" : : "r" (0));
> > -#endif
> > -}
> > -
> >  #endif /* STACKPROTECTOR */
> >  #endif /* _ASM_STACKPROTECTOR_H */
> > diff --git a/arch/x86/include/asm/suspend_32.h b/arch/x86/include/asm/suspend_32.h
> > index fdbd9d7b7bca..7b132d0312eb 100644
> > --- a/arch/x86/include/asm/suspend_32.h
> > +++ b/arch/x86/include/asm/suspend_32.h
> > @@ -13,12 +13,10 @@
> >  /* image of the saved processor state */
> >  struct saved_context {
> >         /*
> > -        * On x86_32, all segment registers, with the possible exception of
> > -        * gs, are saved at kernel entry in pt_regs.
> > +        * On x86_32, all segment registers except gs are saved at kernel
> > +        * entry in pt_regs.
> >          */
> > -#ifdef CONFIG_X86_32_LAZY_GS
> >         u16 gs;
> > -#endif
> >         unsigned long cr0, cr2, cr3, cr4;
> >         u64 misc_enable;
> >         bool misc_enable_saved;
> > diff --git a/arch/x86/kernel/asm-offsets_32.c b/arch/x86/kernel/asm-offsets_32.c
> > index 6e043f295a60..2b411cd00a4e 100644
> > --- a/arch/x86/kernel/asm-offsets_32.c
> > +++ b/arch/x86/kernel/asm-offsets_32.c
> > @@ -53,11 +53,6 @@ void foo(void)
> >                offsetof(struct cpu_entry_area, tss.x86_tss.sp1) -
> >                offsetofend(struct cpu_entry_area, entry_stack_page.stack));
> >
> > -#ifdef CONFIG_STACKPROTECTOR
> > -       BLANK();
> > -       OFFSET(stack_canary_offset, stack_canary, canary);
> > -#endif
> > -
> >         BLANK();
> >         DEFINE(EFI_svam, offsetof(efi_runtime_services_t, set_virtual_address_map));
> >  }
> > diff --git a/arch/x86/kernel/cpu/common.c b/arch/x86/kernel/cpu/common.c
> > index 35ad8480c464..f208569d2d3b 100644
> > --- a/arch/x86/kernel/cpu/common.c
> > +++ b/arch/x86/kernel/cpu/common.c
> > @@ -161,7 +161,6 @@ DEFINE_PER_CPU_PAGE_ALIGNED(struct gdt_page, gdt_page) = { .gdt = {
> >
> >         [GDT_ENTRY_ESPFIX_SS]           = GDT_ENTRY_INIT(0xc092, 0, 0xfffff),
> >         [GDT_ENTRY_PERCPU]              = GDT_ENTRY_INIT(0xc092, 0, 0xfffff),
> > -       GDT_STACK_CANARY_INIT
> >  #endif
> >  } };
> >  EXPORT_PER_CPU_SYMBOL_GPL(gdt_page);
> > @@ -599,7 +598,6 @@ void load_percpu_segment(int cpu)
> >         __loadsegment_simple(gs, 0);
> >         wrmsrl(MSR_GS_BASE, cpu_kernelmode_gs_base(cpu));
> >  #endif
> > -       load_stack_canary_segment();
> >  }
> >
> >  #ifdef CONFIG_X86_32
> > @@ -1793,7 +1791,8 @@ DEFINE_PER_CPU(unsigned long, cpu_current_top_of_stack) =
> >  EXPORT_PER_CPU_SYMBOL(cpu_current_top_of_stack);
> >
> >  #ifdef CONFIG_STACKPROTECTOR
> > -DEFINE_PER_CPU_ALIGNED(struct stack_canary, stack_canary);
> > +DEFINE_PER_CPU(unsigned long, __stack_chk_guard);
> > +EXPORT_PER_CPU_SYMBOL(__stack_chk_guard);
> >  #endif
> >
> >  #endif /* CONFIG_X86_64 */
> > diff --git a/arch/x86/kernel/doublefault_32.c b/arch/x86/kernel/doublefault_32.c
> > index 759d392cbe9f..d1d49e3d536b 100644
> > --- a/arch/x86/kernel/doublefault_32.c
> > +++ b/arch/x86/kernel/doublefault_32.c
> > @@ -100,9 +100,7 @@ DEFINE_PER_CPU_PAGE_ALIGNED(struct doublefault_stack, doublefault_stack) = {
> >                 .ss             = __KERNEL_DS,
> >                 .ds             = __USER_DS,
> >                 .fs             = __KERNEL_PERCPU,
> > -#ifndef CONFIG_X86_32_LAZY_GS
> > -               .gs             = __KERNEL_STACK_CANARY,
> > -#endif
> > +               .gs             = 0,
> >
> >                 .__cr3          = __pa_nodebug(swapper_pg_dir),
> >         },
> > diff --git a/arch/x86/kernel/head_32.S b/arch/x86/kernel/head_32.S
> > index 7ed84c282233..67f590425d90 100644
> > --- a/arch/x86/kernel/head_32.S
> > +++ b/arch/x86/kernel/head_32.S
> > @@ -318,8 +318,8 @@ SYM_FUNC_START(startup_32_smp)
> >         movl $(__KERNEL_PERCPU), %eax
> >         movl %eax,%fs                   # set this cpu's percpu
> >
> > -       movl $(__KERNEL_STACK_CANARY),%eax
> > -       movl %eax,%gs
> > +       xorl %eax,%eax
> > +       movl %eax,%gs                   # clear possible garbage in %gs
> >
> >         xorl %eax,%eax                  # Clear LDT
> >         lldt %ax
> > @@ -339,20 +339,6 @@ SYM_FUNC_END(startup_32_smp)
> >   */
> >  __INIT
> >  setup_once:
> > -#ifdef CONFIG_STACKPROTECTOR
> > -       /*
> > -        * Configure the stack canary. The linker can't handle this by
> > -        * relocation.  Manually set base address in stack canary
> > -        * segment descriptor.
> > -        */
> > -       movl $gdt_page,%eax
> > -       movl $stack_canary,%ecx
> > -       movw %cx, 8 * GDT_ENTRY_STACK_CANARY + 2(%eax)
> > -       shrl $16, %ecx
> > -       movb %cl, 8 * GDT_ENTRY_STACK_CANARY + 4(%eax)
> > -       movb %ch, 8 * GDT_ENTRY_STACK_CANARY + 7(%eax)
> > -#endif
> > -
> >         andl $0,setup_once_ref  /* Once is enough, thanks */
> >         ret
> >
> > diff --git a/arch/x86/kernel/setup_percpu.c b/arch/x86/kernel/setup_percpu.c
> > index fd945ce78554..0941d2f44f2a 100644
> > --- a/arch/x86/kernel/setup_percpu.c
> > +++ b/arch/x86/kernel/setup_percpu.c
> > @@ -224,7 +224,6 @@ void __init setup_per_cpu_areas(void)
> >                 per_cpu(this_cpu_off, cpu) = per_cpu_offset(cpu);
> >                 per_cpu(cpu_number, cpu) = cpu;
> >                 setup_percpu_segment(cpu);
> > -               setup_stack_canary_segment(cpu);
> >                 /*
> >                  * Copy data used in early init routines from the
> >                  * initial arrays to the per cpu data areas.  These
> > diff --git a/arch/x86/kernel/tls.c b/arch/x86/kernel/tls.c
> > index 64a496a0687f..3c883e064242 100644
> > --- a/arch/x86/kernel/tls.c
> > +++ b/arch/x86/kernel/tls.c
> > @@ -164,17 +164,11 @@ int do_set_thread_area(struct task_struct *p, int idx,
> >                 savesegment(fs, sel);
> >                 if (sel == modified_sel)
> >                         loadsegment(fs, sel);
> > -
> > -               savesegment(gs, sel);
> > -               if (sel == modified_sel)
> > -                       load_gs_index(sel);
> >  #endif
> >
> > -#ifdef CONFIG_X86_32_LAZY_GS
> >                 savesegment(gs, sel);
> >                 if (sel == modified_sel)
> > -                       loadsegment(gs, sel);
> > -#endif
> > +                       load_gs_index(sel);
> >         } else {
> >  #ifdef CONFIG_X86_64
> >                 if (p->thread.fsindex == modified_sel)
> > diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
> > index f923e14e87df..ec39073b4897 100644
> > --- a/arch/x86/kvm/svm/svm.c
> > +++ b/arch/x86/kvm/svm/svm.c
> > @@ -1467,12 +1467,8 @@ static void svm_vcpu_put(struct kvm_vcpu *vcpu)
> >  #ifdef CONFIG_X86_64
> >                 loadsegment(fs, svm->host.fs);
> >                 wrmsrl(MSR_KERNEL_GS_BASE, current->thread.gsbase);
> > -               load_gs_index(svm->host.gs);
> > -#else
> > -#ifdef CONFIG_X86_32_LAZY_GS
> > -               loadsegment(gs, svm->host.gs);
> > -#endif
> >  #endif
> > +               load_gs_index(svm->host.gs);
> >
> >                 for (i = 0; i < NR_HOST_SAVE_USER_MSRS; i++)
> >                         wrmsrl(host_save_user_msrs[i].index,
> > @@ -3705,13 +3701,11 @@ static noinstr void svm_vcpu_enter_exit(struct kvm_vcpu *vcpu,
> >         } else {
> >                 __svm_vcpu_run(svm->vmcb_pa, (unsigned long *)&svm->vcpu.arch.regs);
> >
> > +               /* Restore the percpu segment immediately. */
> >  #ifdef CONFIG_X86_64
> >                 native_wrmsrl(MSR_GS_BASE, svm->host.gs_base);
> >  #else
> >                 loadsegment(fs, svm->host.fs);
> > -#ifndef CONFIG_X86_32_LAZY_GS
> > -               loadsegment(gs, svm->host.gs);
> > -#endif
> >  #endif
> >         }
> >
> > diff --git a/arch/x86/lib/insn-eval.c b/arch/x86/lib/insn-eval.c
> > index 4229950a5d78..7f89a091f1fb 100644
> > --- a/arch/x86/lib/insn-eval.c
> > +++ b/arch/x86/lib/insn-eval.c
> > @@ -404,10 +404,6 @@ static short get_segment_selector(struct pt_regs *regs, int seg_reg_idx)
> >         case INAT_SEG_REG_FS:
> >                 return (unsigned short)(regs->fs & 0xffff);
> >         case INAT_SEG_REG_GS:
> > -               /*
> > -                * GS may or may not be in regs as per CONFIG_X86_32_LAZY_GS.
> > -                * The macro below takes care of both cases.
> > -                */
> >                 return get_user_gs(regs);
> >         case INAT_SEG_REG_IGNORE:
> >         default:
> > diff --git a/arch/x86/platform/pvh/head.S b/arch/x86/platform/pvh/head.S
> > index 43b4d864817e..afbf0bb252da 100644
> > --- a/arch/x86/platform/pvh/head.S
> > +++ b/arch/x86/platform/pvh/head.S
> > @@ -45,10 +45,8 @@
> >
> >  #define PVH_GDT_ENTRY_CS       1
> >  #define PVH_GDT_ENTRY_DS       2
> > -#define PVH_GDT_ENTRY_CANARY   3
> >  #define PVH_CS_SEL             (PVH_GDT_ENTRY_CS * 8)
> >  #define PVH_DS_SEL             (PVH_GDT_ENTRY_DS * 8)
> > -#define PVH_CANARY_SEL         (PVH_GDT_ENTRY_CANARY * 8)
> >
> >  SYM_CODE_START_LOCAL(pvh_start_xen)
> >         cld
> > @@ -109,17 +107,6 @@ SYM_CODE_START_LOCAL(pvh_start_xen)
> >
> >  #else /* CONFIG_X86_64 */
> >
> > -       /* Set base address in stack canary descriptor. */
> > -       movl $_pa(gdt_start),%eax
> > -       movl $_pa(canary),%ecx
> > -       movw %cx, (PVH_GDT_ENTRY_CANARY * 8) + 2(%eax)
> > -       shrl $16, %ecx
> > -       movb %cl, (PVH_GDT_ENTRY_CANARY * 8) + 4(%eax)
> > -       movb %ch, (PVH_GDT_ENTRY_CANARY * 8) + 7(%eax)
> > -
> > -       mov $PVH_CANARY_SEL,%eax
> > -       mov %eax,%gs
> > -
> >         call mk_early_pgtbl_32
> >
> >         mov $_pa(initial_page_table), %eax
> > @@ -163,7 +150,6 @@ SYM_DATA_START_LOCAL(gdt_start)
> >         .quad GDT_ENTRY(0xc09a, 0, 0xfffff) /* PVH_CS_SEL */
> >  #endif
> >         .quad GDT_ENTRY(0xc092, 0, 0xfffff) /* PVH_DS_SEL */
> > -       .quad GDT_ENTRY(0x4090, 0, 0x18)    /* PVH_CANARY_SEL */
> >  SYM_DATA_END_LABEL(gdt_start, SYM_L_LOCAL, gdt_end)
> >
> >         .balign 16
> > diff --git a/arch/x86/power/cpu.c b/arch/x86/power/cpu.c
> > index db1378c6ff26..ef4329d67a5f 100644
> > --- a/arch/x86/power/cpu.c
> > +++ b/arch/x86/power/cpu.c
> > @@ -99,11 +99,8 @@ static void __save_processor_state(struct saved_context *ctxt)
> >         /*
> >          * segment registers
> >          */
> > -#ifdef CONFIG_X86_32_LAZY_GS
> >         savesegment(gs, ctxt->gs);
> > -#endif
> >  #ifdef CONFIG_X86_64
> > -       savesegment(gs, ctxt->gs);
> >         savesegment(fs, ctxt->fs);
> >         savesegment(ds, ctxt->ds);
> >         savesegment(es, ctxt->es);
> > @@ -232,7 +229,6 @@ static void notrace __restore_processor_state(struct saved_context *ctxt)
> >         wrmsrl(MSR_GS_BASE, ctxt->kernelmode_gs_base);
> >  #else
> >         loadsegment(fs, __KERNEL_PERCPU);
> > -       loadsegment(gs, __KERNEL_STACK_CANARY);
> >  #endif
> >
> >         /* Restore the TSS, RO GDT, LDT, and usermode-relevant MSRs. */
> > @@ -255,7 +251,7 @@ static void notrace __restore_processor_state(struct saved_context *ctxt)
> >          */
> >         wrmsrl(MSR_FS_BASE, ctxt->fs_base);
> >         wrmsrl(MSR_KERNEL_GS_BASE, ctxt->usermode_gs_base);
> > -#elif defined(CONFIG_X86_32_LAZY_GS)
> > +#else
> >         loadsegment(gs, ctxt->gs);
> >  #endif
> >
> > diff --git a/arch/x86/xen/enlighten_pv.c b/arch/x86/xen/enlighten_pv.c
> > index 9a5a50cdaab5..e18235a6390d 100644
> > --- a/arch/x86/xen/enlighten_pv.c
> > +++ b/arch/x86/xen/enlighten_pv.c
> > @@ -1190,7 +1190,6 @@ static void __init xen_setup_gdt(int cpu)
> >         pv_ops.cpu.write_gdt_entry = xen_write_gdt_entry_boot;
> >         pv_ops.cpu.load_gdt = xen_load_gdt_boot;
> >
> > -       setup_stack_canary_segment(cpu);
> >         switch_to_new_gdt(cpu);
> >
> >         pv_ops.cpu.write_gdt_entry = xen_write_gdt_entry;
> > diff --git a/scripts/gcc-x86_32-has-stack-protector.sh b/scripts/gcc-x86_32-has-stack-protector.sh
> > index f5c119495254..825c75c5b715 100755
> > --- a/scripts/gcc-x86_32-has-stack-protector.sh
> > +++ b/scripts/gcc-x86_32-has-stack-protector.sh
> > @@ -1,4 +1,8 @@
> >  #!/bin/sh
> >  # SPDX-License-Identifier: GPL-2.0
> >
> > -echo "int foo(void) { char X[200]; return 3; }" | $* -S -x c -c -m32 -O0 -fstack-protector - -o - 2> /dev/null | grep -q "%gs"
> > +# This requires GCC 8.1 or better.  Specifically, we require
> > +# -mstack-protector-guard-reg, added by
> > +# https://gcc.gnu.org/bugzilla/show_bug.cgi?id=81708
>
> And -mstack-protector-guard-symbol= it would seem, since that's what
> will cause this to fail with Clang?
>
> What about -mstack-protector-guard=global which can be used for non
> SMP?  I'm guessing that came out in an earlier version of GCC than
> -mstack-protector-guard-reg, so it doesn't need to be checked?  FWIW,
> some of these were added to Clang in https://reviews.llvm.org/D88631.
>

It might be better to check whether the compiler supports the
-mstack-protector-guard-reg=<arg> option.

- Sedat -

> > +
> > +echo "int foo(void) { char X[200]; return 3; }" | $* -S -x c -c -m32 -O0 -fstack-protector -mstack-protector-guard-reg=fs -mstack-protector-guard-symbol=__stack_chk_guard - -o - 2> /dev/null | grep -q "%fs"
> > --
> > 2.29.2
> >
>
>
> --
> Thanks,
> ~Nick Desaulniers
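
P.S. For reference, the probe input the script feeds to $(CC) is just a
trivial function with a stack buffer large enough to trigger
-fstack-protector; with -mstack-protector-guard-reg=fs and
-mstack-protector-guard-symbol=__stack_chk_guard, a capable compiler
(GCC >= 8.1, or Clang with D88631) emits a %fs:__stack_chk_guard access
in the generated assembly for it. Spelled out as a standalone file
(illustrative only; the in-tree check pipes the same one-liner):

```c
/*
 * The probe program used by the capability check: any function with a
 * character array big enough for -fstack-protector to instrument.  The
 * check greps the emitted assembly for "%fs"; the function body itself
 * is irrelevant beyond forcing a canary.
 */
int foo(void)
{
	char X[200];

	(void)X;	/* only the stack allocation matters */
	return 3;
}
```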

^ permalink raw reply	[flat|nested] 24+ messages in thread

* [tip: x86/core] x86/entry/32: Remove leftover macros after stackprotector cleanups
  2021-02-13 19:19 ` [PATCH v2 2/2] x86/entry/32: Remove leftover macros after stackprotector cleanups Andy Lutomirski
@ 2021-03-08 13:14   ` tip-bot2 for Andy Lutomirski
  0 siblings, 0 replies; 24+ messages in thread
From: tip-bot2 for Andy Lutomirski @ 2021-03-08 13:14 UTC (permalink / raw)
  To: linux-tip-commits; +Cc: Andy Lutomirski, Borislav Petkov, x86, linux-kernel

The following commit has been merged into the x86/core branch of tip:

Commit-ID:     d0962f2b24c99889a386f0658c71535f56358f77
Gitweb:        https://git.kernel.org/tip/d0962f2b24c99889a386f0658c71535f56358f77
Author:        Andy Lutomirski <luto@kernel.org>
AuthorDate:    Sat, 13 Feb 2021 11:19:45 -08:00
Committer:     Borislav Petkov <bp@suse.de>
CommitterDate: Mon, 08 Mar 2021 13:27:31 +01:00

x86/entry/32: Remove leftover macros after stackprotector cleanups

Now that nonlazy-GS mode is gone, remove the macros from entry_32.S
that obfuscated^Wabstracted GS handling.  The assembled output is
identical before and after this patch.

Signed-off-by: Andy Lutomirski <luto@kernel.org>
Signed-off-by: Borislav Petkov <bp@suse.de>
Link: https://lkml.kernel.org/r/b1543116f0f0e68f1763d90d5f7fcec27885dff5.1613243844.git.luto@kernel.org
---
 arch/x86/entry/entry_32.S | 43 +-------------------------------------
 1 file changed, 2 insertions(+), 41 deletions(-)

diff --git a/arch/x86/entry/entry_32.S b/arch/x86/entry/entry_32.S
index eb0cb66..bee9101 100644
--- a/arch/x86/entry/entry_32.S
+++ b/arch/x86/entry/entry_32.S
@@ -53,35 +53,6 @@
 
 #define PTI_SWITCH_MASK         (1 << PAGE_SHIFT)
 
-/*
- * User gs save/restore
- *
- * This is leftover junk from CONFIG_X86_32_LAZY_GS.  A subsequent patch
- * will remove it entirely.
- */
- /* unfortunately push/pop can't be no-op */
-.macro PUSH_GS
-	pushl	$0
-.endm
-.macro POP_GS pop=0
-	addl	$(4 + \pop), %esp
-.endm
-.macro POP_GS_EX
-.endm
-
- /* all the rest are no-op */
-.macro PTGS_TO_GS
-.endm
-.macro PTGS_TO_GS_EX
-.endm
-.macro GS_TO_REG reg
-.endm
-.macro REG_TO_PTGS reg
-.endm
-.macro SET_KERNEL_GS reg
-.endm
-
-
 /* Unconditionally switch to user cr3 */
 .macro SWITCH_TO_USER_CR3 scratch_reg:req
 	ALTERNATIVE "jmp .Lend_\@", "", X86_FEATURE_PTI
@@ -234,7 +205,7 @@
 .macro SAVE_ALL pt_regs_ax=%eax switch_stacks=0 skip_gs=0 unwind_espfix=0
 	cld
 .if \skip_gs == 0
-	PUSH_GS
+	pushl	$0
 .endif
 	pushl	%fs
 
@@ -259,9 +230,6 @@
 	movl	$(__USER_DS), %edx
 	movl	%edx, %ds
 	movl	%edx, %es
-.if \skip_gs == 0
-	SET_KERNEL_GS %edx
-.endif
 	/* Switch to kernel stack if necessary */
 .if \switch_stacks > 0
 	SWITCH_TO_KERNEL_STACK
@@ -300,7 +268,7 @@
 1:	popl	%ds
 2:	popl	%es
 3:	popl	%fs
-	POP_GS \pop
+	addl	$(4 + \pop), %esp	/* pop the unused "gs" slot */
 	IRET_FRAME
 .pushsection .fixup, "ax"
 4:	movl	$0, (%esp)
@@ -313,7 +281,6 @@
 	_ASM_EXTABLE(1b, 4b)
 	_ASM_EXTABLE(2b, 5b)
 	_ASM_EXTABLE(3b, 6b)
-	POP_GS_EX
 .endm
 
 .macro RESTORE_ALL_NMI cr3_reg:req pop=0
@@ -928,7 +895,6 @@ SYM_FUNC_START(entry_SYSENTER_32)
 	movl	PT_EIP(%esp), %edx	/* pt_regs->ip */
 	movl	PT_OLDESP(%esp), %ecx	/* pt_regs->sp */
 1:	mov	PT_FS(%esp), %fs
-	PTGS_TO_GS
 
 	popl	%ebx			/* pt_regs->bx */
 	addl	$2*4, %esp		/* skip pt_regs->cx and pt_regs->dx */
@@ -964,7 +930,6 @@ SYM_FUNC_START(entry_SYSENTER_32)
 	jmp	1b
 .popsection
 	_ASM_EXTABLE(1b, 2b)
-	PTGS_TO_GS_EX
 
 .Lsysenter_fix_flags:
 	pushl	$X86_EFLAGS_FIXED
@@ -1106,11 +1071,7 @@ SYM_CODE_START_LOCAL_NOALIGN(handle_exception)
 	SAVE_ALL switch_stacks=1 skip_gs=1 unwind_espfix=1
 	ENCODE_FRAME_POINTER
 
-	/* fixup %gs */
-	GS_TO_REG %ecx
 	movl	PT_GS(%esp), %edi		# get the function address
-	REG_TO_PTGS %ecx
-	SET_KERNEL_GS %ecx
 
 	/* fixup orig %eax */
 	movl	PT_ORIG_EAX(%esp), %edx		# get the error code

^ permalink raw reply related	[flat|nested] 24+ messages in thread

* [tip: x86/core] x86/stackprotector/32: Make the canary into a regular percpu variable
  2021-02-13 19:19 ` [PATCH v2 1/2] x86/stackprotector/32: Make the canary into a regular percpu variable Andy Lutomirski
                     ` (2 preceding siblings ...)
  2021-02-16 18:45   ` Nick Desaulniers
@ 2021-03-08 13:14   ` tip-bot2 for Andy Lutomirski
  2022-09-29 13:56   ` [PATCH v2 1/2] " Andy Shevchenko
  4 siblings, 0 replies; 24+ messages in thread
From: tip-bot2 for Andy Lutomirski @ 2021-03-08 13:14 UTC (permalink / raw)
  To: linux-tip-commits; +Cc: Andy Lutomirski, Borislav Petkov, x86, linux-kernel

The following commit has been merged into the x86/core branch of tip:

Commit-ID:     3fb0fdb3bbe7aed495109b3296b06c2409734023
Gitweb:        https://git.kernel.org/tip/3fb0fdb3bbe7aed495109b3296b06c2409734023
Author:        Andy Lutomirski <luto@kernel.org>
AuthorDate:    Sat, 13 Feb 2021 11:19:44 -08:00
Committer:     Borislav Petkov <bp@suse.de>
CommitterDate: Mon, 08 Mar 2021 13:19:05 +01:00

x86/stackprotector/32: Make the canary into a regular percpu variable

On 32-bit kernels, the stackprotector canary is quite nasty -- it is
stored at %gs:(20), which is nasty because 32-bit kernels use %fs for
percpu storage.  It's even nastier because it means that whether %gs
contains userspace state or kernel state while running kernel code
depends on whether stackprotector is enabled (this is
CONFIG_X86_32_LAZY_GS), and this setting radically changes the way
that segment selectors work.  Supporting both variants is a
maintenance and testing mess.

Merely rearranging so that percpu and the stack canary
share the same segment would be messy as the 32-bit percpu address
layout isn't currently compatible with putting a variable at a fixed
offset.

Fortunately, GCC 8.1 added options that allow the stack canary to be
accessed as %fs:__stack_chk_guard, effectively turning it into an ordinary
percpu variable.  This lets us get rid of all of the code to manage the
stack canary GDT descriptor and the CONFIG_X86_32_LAZY_GS mess.

(That name is special.  We could use any symbol we want for the
 %fs-relative mode, but for CONFIG_SMP=n, gcc refuses to let us use any
 name other than __stack_chk_guard.)

Forcibly disable stackprotector on older compilers that don't support
the new options, and turn the stack canary into a percpu variable. The
"lazy GS" approach is now used for all 32-bit configurations.

This also makes load_gs_index() work on 32-bit kernels. On 64-bit kernels,
it loads the GS selector and updates the user GSBASE accordingly. (This
is unchanged.) On 32-bit kernels, it loads the GS selector and updates
GSBASE, which is now always the user base. This means that the overall
effect is the same on 32-bit and 64-bit, which avoids some ifdeffery.

 [ bp: Massage commit message. ]

Signed-off-by: Andy Lutomirski <luto@kernel.org>
Signed-off-by: Borislav Petkov <bp@suse.de>
Link: https://lkml.kernel.org/r/c0ff7dba14041c7e5d1cae5d4df052f03759bef3.1613243844.git.luto@kernel.org
---
 arch/x86/Kconfig                          |  7 +--
 arch/x86/Makefile                         |  8 ++-
 arch/x86/entry/entry_32.S                 | 56 +---------------
 arch/x86/include/asm/processor.h          | 15 +---
 arch/x86/include/asm/ptrace.h             |  5 +-
 arch/x86/include/asm/segment.h            | 30 ++------
 arch/x86/include/asm/stackprotector.h     | 79 ++++------------------
 arch/x86/include/asm/suspend_32.h         |  6 +--
 arch/x86/kernel/asm-offsets_32.c          |  5 +-
 arch/x86/kernel/cpu/common.c              |  5 +-
 arch/x86/kernel/doublefault_32.c          |  4 +-
 arch/x86/kernel/head_32.S                 | 18 +-----
 arch/x86/kernel/setup_percpu.c            |  1 +-
 arch/x86/kernel/tls.c                     |  8 +--
 arch/x86/lib/insn-eval.c                  |  4 +-
 arch/x86/platform/pvh/head.S              | 14 +----
 arch/x86/power/cpu.c                      |  6 +--
 arch/x86/xen/enlighten_pv.c               |  1 +-
 scripts/gcc-x86_32-has-stack-protector.sh |  6 +-
 19 files changed, 60 insertions(+), 218 deletions(-)

diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
index 2792879..10cc619 100644
--- a/arch/x86/Kconfig
+++ b/arch/x86/Kconfig
@@ -360,10 +360,6 @@ config X86_64_SMP
 	def_bool y
 	depends on X86_64 && SMP
 
-config X86_32_LAZY_GS
-	def_bool y
-	depends on X86_32 && !STACKPROTECTOR
-
 config ARCH_SUPPORTS_UPROBES
 	def_bool y
 
@@ -386,7 +382,8 @@ config CC_HAS_SANE_STACKPROTECTOR
 	default $(success,$(srctree)/scripts/gcc-x86_32-has-stack-protector.sh $(CC))
 	help
 	   We have to make sure stack protector is unconditionally disabled if
-	   the compiler produces broken code.
+	   the compiler produces broken code or if it does not let us control
+	   the segment on 32-bit kernels.
 
 menu "Processor type and features"
 
diff --git a/arch/x86/Makefile b/arch/x86/Makefile
index 2d6d5a2..952f534 100644
--- a/arch/x86/Makefile
+++ b/arch/x86/Makefile
@@ -79,6 +79,14 @@ ifeq ($(CONFIG_X86_32),y)
 
         # temporary until string.h is fixed
         KBUILD_CFLAGS += -ffreestanding
+
+	ifeq ($(CONFIG_STACKPROTECTOR),y)
+		ifeq ($(CONFIG_SMP),y)
+			KBUILD_CFLAGS += -mstack-protector-guard-reg=fs -mstack-protector-guard-symbol=__stack_chk_guard
+		else
+			KBUILD_CFLAGS += -mstack-protector-guard=global
+		endif
+	endif
 else
         BITS := 64
         UTS_MACHINE := x86_64
diff --git a/arch/x86/entry/entry_32.S b/arch/x86/entry/entry_32.S
index df8c017..eb0cb66 100644
--- a/arch/x86/entry/entry_32.S
+++ b/arch/x86/entry/entry_32.S
@@ -20,7 +20,7 @@
  *	1C(%esp) - %ds
  *	20(%esp) - %es
  *	24(%esp) - %fs
- *	28(%esp) - %gs		saved iff !CONFIG_X86_32_LAZY_GS
+ *	28(%esp) - unused -- was %gs on old stackprotector kernels
  *	2C(%esp) - orig_eax
  *	30(%esp) - %eip
  *	34(%esp) - %cs
@@ -56,14 +56,9 @@
 /*
  * User gs save/restore
  *
- * %gs is used for userland TLS and kernel only uses it for stack
- * canary which is required to be at %gs:20 by gcc.  Read the comment
- * at the top of stackprotector.h for more info.
- *
- * Local labels 98 and 99 are used.
+ * This is leftover junk from CONFIG_X86_32_LAZY_GS.  A subsequent patch
+ * will remove it entirely.
  */
-#ifdef CONFIG_X86_32_LAZY_GS
-
  /* unfortunately push/pop can't be no-op */
 .macro PUSH_GS
 	pushl	$0
@@ -86,49 +81,6 @@
 .macro SET_KERNEL_GS reg
 .endm
 
-#else	/* CONFIG_X86_32_LAZY_GS */
-
-.macro PUSH_GS
-	pushl	%gs
-.endm
-
-.macro POP_GS pop=0
-98:	popl	%gs
-  .if \pop <> 0
-	add	$\pop, %esp
-  .endif
-.endm
-.macro POP_GS_EX
-.pushsection .fixup, "ax"
-99:	movl	$0, (%esp)
-	jmp	98b
-.popsection
-	_ASM_EXTABLE(98b, 99b)
-.endm
-
-.macro PTGS_TO_GS
-98:	mov	PT_GS(%esp), %gs
-.endm
-.macro PTGS_TO_GS_EX
-.pushsection .fixup, "ax"
-99:	movl	$0, PT_GS(%esp)
-	jmp	98b
-.popsection
-	_ASM_EXTABLE(98b, 99b)
-.endm
-
-.macro GS_TO_REG reg
-	movl	%gs, \reg
-.endm
-.macro REG_TO_PTGS reg
-	movl	\reg, PT_GS(%esp)
-.endm
-.macro SET_KERNEL_GS reg
-	movl	$(__KERNEL_STACK_CANARY), \reg
-	movl	\reg, %gs
-.endm
-
-#endif /* CONFIG_X86_32_LAZY_GS */
 
 /* Unconditionally switch to user cr3 */
 .macro SWITCH_TO_USER_CR3 scratch_reg:req
@@ -779,7 +731,7 @@ SYM_CODE_START(__switch_to_asm)
 
 #ifdef CONFIG_STACKPROTECTOR
 	movl	TASK_stack_canary(%edx), %ebx
-	movl	%ebx, PER_CPU_VAR(stack_canary)+stack_canary_offset
+	movl	%ebx, PER_CPU_VAR(__stack_chk_guard)
 #endif
 
 #ifdef CONFIG_RETPOLINE
diff --git a/arch/x86/include/asm/processor.h b/arch/x86/include/asm/processor.h
index dc6d149..bac2a42 100644
--- a/arch/x86/include/asm/processor.h
+++ b/arch/x86/include/asm/processor.h
@@ -439,6 +439,9 @@ struct fixed_percpu_data {
 	 * GCC hardcodes the stack canary as %gs:40.  Since the
 	 * irq_stack is the object at %gs:0, we reserve the bottom
 	 * 48 bytes of the irq stack for the canary.
+	 *
+	 * Once we are willing to require -mstack-protector-guard-symbol=
+	 * support for x86_64 stackprotector, we can get rid of this.
 	 */
 	char		gs_base[40];
 	unsigned long	stack_canary;
@@ -460,17 +463,7 @@ extern asmlinkage void ignore_sysret(void);
 void current_save_fsgs(void);
 #else	/* X86_64 */
 #ifdef CONFIG_STACKPROTECTOR
-/*
- * Make sure stack canary segment base is cached-aligned:
- *   "For Intel Atom processors, avoid non zero segment base address
- *    that is not aligned to cache line boundary at all cost."
- * (Optim Ref Manual Assembly/Compiler Coding Rule 15.)
- */
-struct stack_canary {
-	char __pad[20];		/* canary at %gs:20 */
-	unsigned long canary;
-};
-DECLARE_PER_CPU_ALIGNED(struct stack_canary, stack_canary);
+DECLARE_PER_CPU(unsigned long, __stack_chk_guard);
 #endif
 DECLARE_PER_CPU(struct irq_stack *, hardirq_stack_ptr);
 DECLARE_PER_CPU(struct irq_stack *, softirq_stack_ptr);
diff --git a/arch/x86/include/asm/ptrace.h b/arch/x86/include/asm/ptrace.h
index d8324a2..b2c4c12 100644
--- a/arch/x86/include/asm/ptrace.h
+++ b/arch/x86/include/asm/ptrace.h
@@ -37,7 +37,10 @@ struct pt_regs {
 	unsigned short __esh;
 	unsigned short fs;
 	unsigned short __fsh;
-	/* On interrupt, gs and __gsh store the vector number. */
+	/*
+	 * On interrupt, gs and __gsh store the vector number.  They never
+	 * store gs any more.
+	 */
 	unsigned short gs;
 	unsigned short __gsh;
 	/* On interrupt, this is the error code. */
diff --git a/arch/x86/include/asm/segment.h b/arch/x86/include/asm/segment.h
index 7fdd4fa..7204402 100644
--- a/arch/x86/include/asm/segment.h
+++ b/arch/x86/include/asm/segment.h
@@ -95,7 +95,7 @@
  *
  *  26 - ESPFIX small SS
  *  27 - per-cpu			[ offset to per-cpu data area ]
- *  28 - stack_canary-20		[ for stack protector ]		<=== cacheline #8
+ *  28 - unused
  *  29 - unused
  *  30 - unused
  *  31 - TSS for double fault handler
@@ -118,7 +118,6 @@
 
 #define GDT_ENTRY_ESPFIX_SS		26
 #define GDT_ENTRY_PERCPU		27
-#define GDT_ENTRY_STACK_CANARY		28
 
 #define GDT_ENTRY_DOUBLEFAULT_TSS	31
 
@@ -158,12 +157,6 @@
 # define __KERNEL_PERCPU		0
 #endif
 
-#ifdef CONFIG_STACKPROTECTOR
-# define __KERNEL_STACK_CANARY		(GDT_ENTRY_STACK_CANARY*8)
-#else
-# define __KERNEL_STACK_CANARY		0
-#endif
-
 #else /* 64-bit: */
 
 #include <asm/cache.h>
@@ -364,22 +357,15 @@ static inline void __loadsegment_fs(unsigned short value)
 	asm("mov %%" #seg ",%0":"=r" (value) : : "memory")
 
 /*
- * x86-32 user GS accessors:
+ * x86-32 user GS accessors.  This is ugly and could do with some cleaning up.
  */
 #ifdef CONFIG_X86_32
-# ifdef CONFIG_X86_32_LAZY_GS
-#  define get_user_gs(regs)		(u16)({ unsigned long v; savesegment(gs, v); v; })
-#  define set_user_gs(regs, v)		loadsegment(gs, (unsigned long)(v))
-#  define task_user_gs(tsk)		((tsk)->thread.gs)
-#  define lazy_save_gs(v)		savesegment(gs, (v))
-#  define lazy_load_gs(v)		loadsegment(gs, (v))
-# else	/* X86_32_LAZY_GS */
-#  define get_user_gs(regs)		(u16)((regs)->gs)
-#  define set_user_gs(regs, v)		do { (regs)->gs = (v); } while (0)
-#  define task_user_gs(tsk)		(task_pt_regs(tsk)->gs)
-#  define lazy_save_gs(v)		do { } while (0)
-#  define lazy_load_gs(v)		do { } while (0)
-# endif	/* X86_32_LAZY_GS */
+# define get_user_gs(regs)		(u16)({ unsigned long v; savesegment(gs, v); v; })
+# define set_user_gs(regs, v)		loadsegment(gs, (unsigned long)(v))
+# define task_user_gs(tsk)		((tsk)->thread.gs)
+# define lazy_save_gs(v)		savesegment(gs, (v))
+# define lazy_load_gs(v)		loadsegment(gs, (v))
+# define load_gs_index(v)		loadsegment(gs, (v))
 #endif	/* X86_32 */
 
 #endif /* !__ASSEMBLY__ */
diff --git a/arch/x86/include/asm/stackprotector.h b/arch/x86/include/asm/stackprotector.h
index 7fb482f..b6ffe58 100644
--- a/arch/x86/include/asm/stackprotector.h
+++ b/arch/x86/include/asm/stackprotector.h
@@ -5,30 +5,23 @@
  * Stack protector works by putting predefined pattern at the start of
  * the stack frame and verifying that it hasn't been overwritten when
  * returning from the function.  The pattern is called stack canary
- * and unfortunately gcc requires it to be at a fixed offset from %gs.
- * On x86_64, the offset is 40 bytes and on x86_32 20 bytes.  x86_64
- * and x86_32 use segment registers differently and thus handles this
- * requirement differently.
+ * and unfortunately gcc historically required it to be at a fixed offset
+ * from the percpu segment base.  On x86_64, the offset is 40 bytes.
  *
- * On x86_64, %gs is shared by percpu area and stack canary.  All
- * percpu symbols are zero based and %gs points to the base of percpu
- * area.  The first occupant of the percpu area is always
- * fixed_percpu_data which contains stack_canary at offset 40.  Userland
- * %gs is always saved and restored on kernel entry and exit using
- * swapgs, so stack protector doesn't add any complexity there.
+ * The same segment is shared by percpu area and stack canary.  On
+ * x86_64, percpu symbols are zero based and %gs (64-bit) points to the
+ * base of percpu area.  The first occupant of the percpu area is always
+ * fixed_percpu_data which contains stack_canary at the appropriate
+ * offset.  On x86_32, the stack canary is just a regular percpu
+ * variable.
  *
- * On x86_32, it's slightly more complicated.  As in x86_64, %gs is
- * used for userland TLS.  Unfortunately, some processors are much
- * slower at loading segment registers with different value when
- * entering and leaving the kernel, so the kernel uses %fs for percpu
- * area and manages %gs lazily so that %gs is switched only when
- * necessary, usually during task switch.
+ * Putting percpu data in %fs on 32-bit is a minor optimization compared to
+ * using %gs.  Since 32-bit userspace normally has %fs == 0, we are likely
+ * to load 0 into %fs on exit to usermode, whereas with percpu data in
+ * %gs, we are likely to load a non-null %gs on return to user mode.
  *
- * As gcc requires the stack canary at %gs:20, %gs can't be managed
- * lazily if stack protector is enabled, so the kernel saves and
- * restores userland %gs on kernel entry and exit.  This behavior is
- * controlled by CONFIG_X86_32_LAZY_GS and accessors are defined in
- * system.h to hide the details.
+ * Once we are willing to require GCC 8.1 or better for 64-bit stackprotector
+ * support, we can remove some of this complexity.
  */
 
 #ifndef _ASM_STACKPROTECTOR_H
@@ -45,14 +38,6 @@
 #include <linux/sched.h>
 
 /*
- * 24 byte read-only segment initializer for stack canary.  Linker
- * can't handle the address bit shifting.  Address will be set in
- * head_32 for boot CPU and setup_per_cpu_areas() for others.
- */
-#define GDT_STACK_CANARY_INIT						\
-	[GDT_ENTRY_STACK_CANARY] = GDT_ENTRY_INIT(0x4090, 0, 0x18),
-
-/*
  * Initialize the stackprotector canary value.
  *
  * NOTE: this must only be called from functions that never return
@@ -86,7 +71,7 @@ static __always_inline void boot_init_stack_canary(void)
 #ifdef CONFIG_X86_64
 	this_cpu_write(fixed_percpu_data.stack_canary, canary);
 #else
-	this_cpu_write(stack_canary.canary, canary);
+	this_cpu_write(__stack_chk_guard, canary);
 #endif
 }
 
@@ -95,48 +80,16 @@ static inline void cpu_init_stack_canary(int cpu, struct task_struct *idle)
 #ifdef CONFIG_X86_64
 	per_cpu(fixed_percpu_data.stack_canary, cpu) = idle->stack_canary;
 #else
-	per_cpu(stack_canary.canary, cpu) = idle->stack_canary;
-#endif
-}
-
-static inline void setup_stack_canary_segment(int cpu)
-{
-#ifdef CONFIG_X86_32
-	unsigned long canary = (unsigned long)&per_cpu(stack_canary, cpu);
-	struct desc_struct *gdt_table = get_cpu_gdt_rw(cpu);
-	struct desc_struct desc;
-
-	desc = gdt_table[GDT_ENTRY_STACK_CANARY];
-	set_desc_base(&desc, canary);
-	write_gdt_entry(gdt_table, GDT_ENTRY_STACK_CANARY, &desc, DESCTYPE_S);
-#endif
-}
-
-static inline void load_stack_canary_segment(void)
-{
-#ifdef CONFIG_X86_32
-	asm("mov %0, %%gs" : : "r" (__KERNEL_STACK_CANARY) : "memory");
+	per_cpu(__stack_chk_guard, cpu) = idle->stack_canary;
 #endif
 }
 
 #else	/* STACKPROTECTOR */
 
-#define GDT_STACK_CANARY_INIT
-
 /* dummy boot_init_stack_canary() is defined in linux/stackprotector.h */
 
-static inline void setup_stack_canary_segment(int cpu)
-{ }
-
 static inline void cpu_init_stack_canary(int cpu, struct task_struct *idle)
 { }
 
-static inline void load_stack_canary_segment(void)
-{
-#ifdef CONFIG_X86_32
-	asm volatile ("mov %0, %%gs" : : "r" (0));
-#endif
-}
-
 #endif	/* STACKPROTECTOR */
 #endif	/* _ASM_STACKPROTECTOR_H */
diff --git a/arch/x86/include/asm/suspend_32.h b/arch/x86/include/asm/suspend_32.h
index fdbd9d7..7b132d0 100644
--- a/arch/x86/include/asm/suspend_32.h
+++ b/arch/x86/include/asm/suspend_32.h
@@ -13,12 +13,10 @@
 /* image of the saved processor state */
 struct saved_context {
 	/*
-	 * On x86_32, all segment registers, with the possible exception of
-	 * gs, are saved at kernel entry in pt_regs.
+	 * On x86_32, all segment registers except gs are saved at kernel
+	 * entry in pt_regs.
 	 */
-#ifdef CONFIG_X86_32_LAZY_GS
 	u16 gs;
-#endif
 	unsigned long cr0, cr2, cr3, cr4;
 	u64 misc_enable;
 	bool misc_enable_saved;
diff --git a/arch/x86/kernel/asm-offsets_32.c b/arch/x86/kernel/asm-offsets_32.c
index 6e043f2..2b411cd 100644
--- a/arch/x86/kernel/asm-offsets_32.c
+++ b/arch/x86/kernel/asm-offsets_32.c
@@ -53,11 +53,6 @@ void foo(void)
 	       offsetof(struct cpu_entry_area, tss.x86_tss.sp1) -
 	       offsetofend(struct cpu_entry_area, entry_stack_page.stack));
 
-#ifdef CONFIG_STACKPROTECTOR
-	BLANK();
-	OFFSET(stack_canary_offset, stack_canary, canary);
-#endif
-
 	BLANK();
 	DEFINE(EFI_svam, offsetof(efi_runtime_services_t, set_virtual_address_map));
 }
diff --git a/arch/x86/kernel/cpu/common.c b/arch/x86/kernel/cpu/common.c
index ab640ab..23cb9d6 100644
--- a/arch/x86/kernel/cpu/common.c
+++ b/arch/x86/kernel/cpu/common.c
@@ -161,7 +161,6 @@ DEFINE_PER_CPU_PAGE_ALIGNED(struct gdt_page, gdt_page) = { .gdt = {
 
 	[GDT_ENTRY_ESPFIX_SS]		= GDT_ENTRY_INIT(0xc092, 0, 0xfffff),
 	[GDT_ENTRY_PERCPU]		= GDT_ENTRY_INIT(0xc092, 0, 0xfffff),
-	GDT_STACK_CANARY_INIT
 #endif
 } };
 EXPORT_PER_CPU_SYMBOL_GPL(gdt_page);
@@ -599,7 +598,6 @@ void load_percpu_segment(int cpu)
 	__loadsegment_simple(gs, 0);
 	wrmsrl(MSR_GS_BASE, cpu_kernelmode_gs_base(cpu));
 #endif
-	load_stack_canary_segment();
 }
 
 #ifdef CONFIG_X86_32
@@ -1796,7 +1794,8 @@ DEFINE_PER_CPU(unsigned long, cpu_current_top_of_stack) =
 EXPORT_PER_CPU_SYMBOL(cpu_current_top_of_stack);
 
 #ifdef CONFIG_STACKPROTECTOR
-DEFINE_PER_CPU_ALIGNED(struct stack_canary, stack_canary);
+DEFINE_PER_CPU(unsigned long, __stack_chk_guard);
+EXPORT_PER_CPU_SYMBOL(__stack_chk_guard);
 #endif
 
 #endif	/* CONFIG_X86_64 */
diff --git a/arch/x86/kernel/doublefault_32.c b/arch/x86/kernel/doublefault_32.c
index 759d392..d1d49e3 100644
--- a/arch/x86/kernel/doublefault_32.c
+++ b/arch/x86/kernel/doublefault_32.c
@@ -100,9 +100,7 @@ DEFINE_PER_CPU_PAGE_ALIGNED(struct doublefault_stack, doublefault_stack) = {
 		.ss		= __KERNEL_DS,
 		.ds		= __USER_DS,
 		.fs		= __KERNEL_PERCPU,
-#ifndef CONFIG_X86_32_LAZY_GS
-		.gs		= __KERNEL_STACK_CANARY,
-#endif
+		.gs		= 0,
 
 		.__cr3		= __pa_nodebug(swapper_pg_dir),
 	},
diff --git a/arch/x86/kernel/head_32.S b/arch/x86/kernel/head_32.S
index 7ed84c2..67f5904 100644
--- a/arch/x86/kernel/head_32.S
+++ b/arch/x86/kernel/head_32.S
@@ -318,8 +318,8 @@ SYM_FUNC_START(startup_32_smp)
 	movl $(__KERNEL_PERCPU), %eax
 	movl %eax,%fs			# set this cpu's percpu
 
-	movl $(__KERNEL_STACK_CANARY),%eax
-	movl %eax,%gs
+	xorl %eax,%eax
+	movl %eax,%gs			# clear possible garbage in %gs
 
 	xorl %eax,%eax			# Clear LDT
 	lldt %ax
@@ -339,20 +339,6 @@ SYM_FUNC_END(startup_32_smp)
  */
 __INIT
 setup_once:
-#ifdef CONFIG_STACKPROTECTOR
-	/*
-	 * Configure the stack canary. The linker can't handle this by
-	 * relocation.  Manually set base address in stack canary
-	 * segment descriptor.
-	 */
-	movl $gdt_page,%eax
-	movl $stack_canary,%ecx
-	movw %cx, 8 * GDT_ENTRY_STACK_CANARY + 2(%eax)
-	shrl $16, %ecx
-	movb %cl, 8 * GDT_ENTRY_STACK_CANARY + 4(%eax)
-	movb %ch, 8 * GDT_ENTRY_STACK_CANARY + 7(%eax)
-#endif
-
 	andl $0,setup_once_ref	/* Once is enough, thanks */
 	ret
 
diff --git a/arch/x86/kernel/setup_percpu.c b/arch/x86/kernel/setup_percpu.c
index fd945ce..0941d2f 100644
--- a/arch/x86/kernel/setup_percpu.c
+++ b/arch/x86/kernel/setup_percpu.c
@@ -224,7 +224,6 @@ void __init setup_per_cpu_areas(void)
 		per_cpu(this_cpu_off, cpu) = per_cpu_offset(cpu);
 		per_cpu(cpu_number, cpu) = cpu;
 		setup_percpu_segment(cpu);
-		setup_stack_canary_segment(cpu);
 		/*
 		 * Copy data used in early init routines from the
 		 * initial arrays to the per cpu data areas.  These
diff --git a/arch/x86/kernel/tls.c b/arch/x86/kernel/tls.c
index 64a496a..3c883e0 100644
--- a/arch/x86/kernel/tls.c
+++ b/arch/x86/kernel/tls.c
@@ -164,17 +164,11 @@ int do_set_thread_area(struct task_struct *p, int idx,
 		savesegment(fs, sel);
 		if (sel == modified_sel)
 			loadsegment(fs, sel);
-
-		savesegment(gs, sel);
-		if (sel == modified_sel)
-			load_gs_index(sel);
 #endif
 
-#ifdef CONFIG_X86_32_LAZY_GS
 		savesegment(gs, sel);
 		if (sel == modified_sel)
-			loadsegment(gs, sel);
-#endif
+			load_gs_index(sel);
 	} else {
 #ifdef CONFIG_X86_64
 		if (p->thread.fsindex == modified_sel)
diff --git a/arch/x86/lib/insn-eval.c b/arch/x86/lib/insn-eval.c
index 4229950..7f89a09 100644
--- a/arch/x86/lib/insn-eval.c
+++ b/arch/x86/lib/insn-eval.c
@@ -404,10 +404,6 @@ static short get_segment_selector(struct pt_regs *regs, int seg_reg_idx)
 	case INAT_SEG_REG_FS:
 		return (unsigned short)(regs->fs & 0xffff);
 	case INAT_SEG_REG_GS:
-		/*
-		 * GS may or may not be in regs as per CONFIG_X86_32_LAZY_GS.
-		 * The macro below takes care of both cases.
-		 */
 		return get_user_gs(regs);
 	case INAT_SEG_REG_IGNORE:
 	default:
diff --git a/arch/x86/platform/pvh/head.S b/arch/x86/platform/pvh/head.S
index d2ccadc..b049070 100644
--- a/arch/x86/platform/pvh/head.S
+++ b/arch/x86/platform/pvh/head.S
@@ -46,10 +46,8 @@
 
 #define PVH_GDT_ENTRY_CS	1
 #define PVH_GDT_ENTRY_DS	2
-#define PVH_GDT_ENTRY_CANARY	3
 #define PVH_CS_SEL		(PVH_GDT_ENTRY_CS * 8)
 #define PVH_DS_SEL		(PVH_GDT_ENTRY_DS * 8)
-#define PVH_CANARY_SEL		(PVH_GDT_ENTRY_CANARY * 8)
 
 SYM_CODE_START_LOCAL(pvh_start_xen)
 	cld
@@ -111,17 +109,6 @@ SYM_CODE_START_LOCAL(pvh_start_xen)
 
 #else /* CONFIG_X86_64 */
 
-	/* Set base address in stack canary descriptor. */
-	movl $_pa(gdt_start),%eax
-	movl $_pa(canary),%ecx
-	movw %cx, (PVH_GDT_ENTRY_CANARY * 8) + 2(%eax)
-	shrl $16, %ecx
-	movb %cl, (PVH_GDT_ENTRY_CANARY * 8) + 4(%eax)
-	movb %ch, (PVH_GDT_ENTRY_CANARY * 8) + 7(%eax)
-
-	mov $PVH_CANARY_SEL,%eax
-	mov %eax,%gs
-
 	call mk_early_pgtbl_32
 
 	mov $_pa(initial_page_table), %eax
@@ -165,7 +152,6 @@ SYM_DATA_START_LOCAL(gdt_start)
 	.quad GDT_ENTRY(0xc09a, 0, 0xfffff) /* PVH_CS_SEL */
 #endif
 	.quad GDT_ENTRY(0xc092, 0, 0xfffff) /* PVH_DS_SEL */
-	.quad GDT_ENTRY(0x4090, 0, 0x18)    /* PVH_CANARY_SEL */
 SYM_DATA_END_LABEL(gdt_start, SYM_L_LOCAL, gdt_end)
 
 	.balign 16
diff --git a/arch/x86/power/cpu.c b/arch/x86/power/cpu.c
index db1378c..ef4329d 100644
--- a/arch/x86/power/cpu.c
+++ b/arch/x86/power/cpu.c
@@ -99,11 +99,8 @@ static void __save_processor_state(struct saved_context *ctxt)
 	/*
 	 * segment registers
 	 */
-#ifdef CONFIG_X86_32_LAZY_GS
 	savesegment(gs, ctxt->gs);
-#endif
 #ifdef CONFIG_X86_64
-	savesegment(gs, ctxt->gs);
 	savesegment(fs, ctxt->fs);
 	savesegment(ds, ctxt->ds);
 	savesegment(es, ctxt->es);
@@ -232,7 +229,6 @@ static void notrace __restore_processor_state(struct saved_context *ctxt)
 	wrmsrl(MSR_GS_BASE, ctxt->kernelmode_gs_base);
 #else
 	loadsegment(fs, __KERNEL_PERCPU);
-	loadsegment(gs, __KERNEL_STACK_CANARY);
 #endif
 
 	/* Restore the TSS, RO GDT, LDT, and usermode-relevant MSRs. */
@@ -255,7 +251,7 @@ static void notrace __restore_processor_state(struct saved_context *ctxt)
 	 */
 	wrmsrl(MSR_FS_BASE, ctxt->fs_base);
 	wrmsrl(MSR_KERNEL_GS_BASE, ctxt->usermode_gs_base);
-#elif defined(CONFIG_X86_32_LAZY_GS)
+#else
 	loadsegment(gs, ctxt->gs);
 #endif
 
diff --git a/arch/x86/xen/enlighten_pv.c b/arch/x86/xen/enlighten_pv.c
index dc0a337..33e797b 100644
--- a/arch/x86/xen/enlighten_pv.c
+++ b/arch/x86/xen/enlighten_pv.c
@@ -1204,7 +1204,6 @@ static void __init xen_setup_gdt(int cpu)
 	pv_ops.cpu.write_gdt_entry = xen_write_gdt_entry_boot;
 	pv_ops.cpu.load_gdt = xen_load_gdt_boot;
 
-	setup_stack_canary_segment(cpu);
 	switch_to_new_gdt(cpu);
 
 	pv_ops.cpu.write_gdt_entry = xen_write_gdt_entry;
diff --git a/scripts/gcc-x86_32-has-stack-protector.sh b/scripts/gcc-x86_32-has-stack-protector.sh
index f5c1194..825c75c 100755
--- a/scripts/gcc-x86_32-has-stack-protector.sh
+++ b/scripts/gcc-x86_32-has-stack-protector.sh
@@ -1,4 +1,8 @@
 #!/bin/sh
 # SPDX-License-Identifier: GPL-2.0
 
-echo "int foo(void) { char X[200]; return 3; }" | $* -S -x c -c -m32 -O0 -fstack-protector - -o - 2> /dev/null | grep -q "%gs"
+# This requires GCC 8.1 or better.  Specifically, we require
+# -mstack-protector-guard-reg, added by
+# https://gcc.gnu.org/bugzilla/show_bug.cgi?id=81708
+
+echo "int foo(void) { char X[200]; return 3; }" | $* -S -x c -c -m32 -O0 -fstack-protector -mstack-protector-guard-reg=fs -mstack-protector-guard-symbol=__stack_chk_guard - -o - 2> /dev/null | grep -q "%fs"
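For reference, the updated probe can be exercised by hand against any compiler. The `${CC:-gcc}` invocation below is illustrative; whether it reports the feature as usable depends on the toolchain (GCC 8.1 or later) and on 32-bit multilib support being installed:

```shell
# Hand-run version of the updated detection: the canary load must go
# through %fs with a symbol name we control, not the hardcoded %gs:20.
echo 'int foo(void) { char X[200]; return 3; }' | \
	${CC:-gcc} -S -x c -m32 -O0 -fstack-protector \
		-mstack-protector-guard-reg=fs \
		-mstack-protector-guard-symbol=__stack_chk_guard \
		- -o - 2>/dev/null | grep -q '%fs' \
	&& echo "32-bit stackprotector: usable" \
	|| echo "32-bit stackprotector: not usable"
```

Either way the pipeline itself succeeds; only the message differs.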


* Re: [PATCH v2 1/2] x86/stackprotector/32: Make the canary into a regular percpu variable
  2021-02-13 19:19 ` [PATCH v2 1/2] x86/stackprotector/32: Make the canary into a regular percpu variable Andy Lutomirski
                     ` (3 preceding siblings ...)
  2021-03-08 13:14   ` [tip: x86/core] " tip-bot2 for Andy Lutomirski
@ 2022-09-29 13:56   ` Andy Shevchenko
  2022-09-29 14:20     ` Andy Shevchenko
  4 siblings, 1 reply; 24+ messages in thread
From: Andy Shevchenko @ 2022-09-29 13:56 UTC (permalink / raw)
  To: Andy Lutomirski, Ferry Toth
  Cc: x86, LKML, Sedat Dilek, Nick Desaulniers, Sean Christopherson,
	Brian Gerst, Joerg Roedel

+Cc: Ferry

On Sat, Feb 13, 2021 at 11:19:44AM -0800, Andy Lutomirski wrote:
> On 32-bit kernels, the stackprotector canary is quite nasty -- it is
> stored at %gs:20, which is nasty because 32-bit kernels use %fs for
> percpu storage.  It's even nastier because it means that whether %gs
> contains userspace state or kernel state while running kernel code
> depends on whether stackprotector is enabled (this is
> CONFIG_X86_32_LAZY_GS), and this setting radically changes the way
> that segment selectors work.  Supporting both variants is a
> maintenance and testing mess.
> 
> Merely rearranging so that percpu and the stack canary
> share the same segment would be messy as the 32-bit percpu address
> layout isn't currently compatible with putting a variable at a fixed
> offset.
> 
> Fortunately, GCC 8.1 added options that allow the stack canary to be
> accessed as %fs:__stack_chk_guard, effectively turning it into an ordinary
> percpu variable.  This lets us get rid of all of the code to manage the
> stack canary GDT descriptor and the CONFIG_X86_32_LAZY_GS mess.
> 
> (That name is special.  We could use any symbol we want for the
>  %fs-relative mode, but for CONFIG_SMP=n, gcc refuses to let us use any
>  name other than __stack_chk_guard.)
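For readers unfamiliar with how the canary works, here is a hand-written C sketch of the check that -fstack-protector conceptually inserts around a function. It is not real compiler output, and the `__stack_chk_guard_demo` global and `copy_checked()` helper are made-up names standing in for the `%fs:__stack_chk_guard` percpu variable and an arbitrary protected function:

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Hand-written sketch, NOT real compiler output: what -fstack-protector
 * conceptually wraps around a function.  After this patch the guard
 * really lives in the percpu variable %fs:__stack_chk_guard; the plain
 * global __stack_chk_guard_demo below merely stands in for it.
 */
static unsigned long __stack_chk_guard_demo = 0xdeadbeefUL;

static void stack_chk_fail_demo(void)
{
	fprintf(stderr, "stack smashing detected\n");
	assert(0);
}

static size_t copy_checked(const char *src)
{
	/* prologue: copy the guard into a slot above the buffers */
	unsigned long canary = __stack_chk_guard_demo;
	char buf[16];

	strncpy(buf, src, sizeof(buf) - 1);
	buf[sizeof(buf) - 1] = '\0';

	/* epilogue: a buffer overrun would have clobbered the canary */
	if (canary != __stack_chk_guard_demo)
		stack_chk_fail_demo();
	return strlen(buf);
}
```

The compiler emits the equivalent of the prologue load and epilogue compare automatically; the only thing this patch changes on x86_32 is where that guard value is fetched from.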
> 
> This patch forcibly disables stackprotector on older compilers that
> don't support the new options and makes the stack canary into a
> percpu variable.  The "lazy GS" approach is now used for all 32-bit
> configurations.
> 
> This patch also makes load_gs_index() work on 32-bit kernels.  On
> 64-bit kernels, it loads the GS selector and updates the user
> GSBASE accordingly.  (This is unchanged.)  On 32-bit kernels,
> it loads the GS selector and updates GSBASE, which is now
> always the user base.  This means that the overall effect is
> the same on 32-bit and 64-bit, which avoids some ifdeffery.

This patch broke 32-bit boot on Intel Merrifield.

git bisect start
# good: [9f4ad9e425a1d3b6a34617b8ea226d56a119a717] Linux 5.12
git bisect good 9f4ad9e425a1d3b6a34617b8ea226d56a119a717
# bad: [62fb9874f5da54fdb243003b386128037319b219] Linux 5.13
git bisect bad 62fb9874f5da54fdb243003b386128037319b219
# bad: [85f3f17b5db2dd9f8a094a0ddc665555135afd22] Merge branch 'md-fixes' of https://git.kernel.org/pub/scm/linux/kernel/git/song/md into block-5.13
git bisect bad 85f3f17b5db2dd9f8a094a0ddc665555135afd22
# good: [ca62e9090d229926f43f20291bb44d67897baab7] Merge tag 'regulator-v5.13' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/regulator
git bisect good ca62e9090d229926f43f20291bb44d67897baab7
# bad: [68a32ba14177d4a21c4a9a941cf1d7aea86d436f] Merge tag 'drm-next-2021-04-28' of git://anongit.freedesktop.org/drm/drm
git bisect bad 68a32ba14177d4a21c4a9a941cf1d7aea86d436f
# good: [49c70ece54b0d1c51bc31b2b0c1070777c992c26] drm/amd/display: Change input parameter for set_drr
git bisect good 49c70ece54b0d1c51bc31b2b0c1070777c992c26
# good: [0b276e470a4d43e1365d3eb53c608a3d208cabd4] media: coda: fix macroblocks count control usage
git bisect good 0b276e470a4d43e1365d3eb53c608a3d208cabd4
# bad: [c6536676c7fe3f572ba55842e59c3c71c01e7fb3] Merge tag 'x86_core_for_v5.13' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
git bisect bad c6536676c7fe3f572ba55842e59c3c71c01e7fb3
# good: [d1466bc583a81830cef2399a4b8a514398351b40] Merge branch 'work.inode-type-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs
git bisect good d1466bc583a81830cef2399a4b8a514398351b40
# good: [fafe1e39ed213221c0bce6b0b31669334368dc97] Merge tag 'afs-netfs-lib-20210426' of git://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-fs
git bisect good fafe1e39ed213221c0bce6b0b31669334368dc97
# bad: [b1f480bc0686e65d5413c035bd13af2ea4888784] Merge branch 'x86/cpu' into WIP.x86/core, to merge the NOP changes & resolve a semantic conflict
git bisect bad b1f480bc0686e65d5413c035bd13af2ea4888784
# bad: [0c925c61dae18ee3cb93a61cc9dd9562a066034d] x86/tools/insn_decoder_test: Convert to insn_decode()
git bisect bad 0c925c61dae18ee3cb93a61cc9dd9562a066034d
# bad: [514ef77607b9ff184c11b88e8f100bc27f07460d] x86/boot/compressed/sev-es: Convert to insn_decode()
git bisect bad 514ef77607b9ff184c11b88e8f100bc27f07460d
# bad: [9e761296c52dcdb1aaa151b65bd39accb05740d9] x86/insn: Rename insn_decode() to insn_decode_from_regs()
git bisect bad 9e761296c52dcdb1aaa151b65bd39accb05740d9
# bad: [d0962f2b24c99889a386f0658c71535f56358f77] x86/entry/32: Remove leftover macros after stackprotector cleanups
git bisect bad d0962f2b24c99889a386f0658c71535f56358f77
# bad: [3fb0fdb3bbe7aed495109b3296b06c2409734023] x86/stackprotector/32: Make the canary into a regular percpu variable
git bisect bad 3fb0fdb3bbe7aed495109b3296b06c2409734023
# first bad commit: [3fb0fdb3bbe7aed495109b3296b06c2409734023] x86/stackprotector/32: Make the canary into a regular percpu variable

Any suggestions on how to fix this are welcome!

The configuration is based on the in-tree i386_defconfig with some extra
drivers enabled on top (no core options were altered, but if you wish to
check, it's here:
https://github.com/andy-shev/linux/blob/eds-acpi/arch/x86/configs/i386_defconfig).

> Cc: Sedat Dilek <sedat.dilek@gmail.com>
> Cc: Nick Desaulniers <ndesaulniers@google.com>
> Signed-off-by: Andy Lutomirski <luto@kernel.org>
> ---
>  arch/x86/Kconfig                          |  7 +-
>  arch/x86/Makefile                         |  8 +++
>  arch/x86/entry/entry_32.S                 | 56 ++--------------
>  arch/x86/include/asm/processor.h          | 15 ++---
>  arch/x86/include/asm/ptrace.h             |  5 +-
>  arch/x86/include/asm/segment.h            | 30 +++------
>  arch/x86/include/asm/stackprotector.h     | 79 +++++------------------
>  arch/x86/include/asm/suspend_32.h         |  6 +-
>  arch/x86/kernel/asm-offsets_32.c          |  5 --
>  arch/x86/kernel/cpu/common.c              |  5 +-
>  arch/x86/kernel/doublefault_32.c          |  4 +-
>  arch/x86/kernel/head_32.S                 | 18 +-----
>  arch/x86/kernel/setup_percpu.c            |  1 -
>  arch/x86/kernel/tls.c                     |  8 +--
>  arch/x86/kvm/svm/svm.c                    | 10 +--
>  arch/x86/lib/insn-eval.c                  |  4 --
>  arch/x86/platform/pvh/head.S              | 14 ----
>  arch/x86/power/cpu.c                      |  6 +-
>  arch/x86/xen/enlighten_pv.c               |  1 -
>  scripts/gcc-x86_32-has-stack-protector.sh |  6 +-
>  20 files changed, 62 insertions(+), 226 deletions(-)
> 
> diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
> index 21f851179ff0..12d8bf011d08 100644
> --- a/arch/x86/Kconfig
> +++ b/arch/x86/Kconfig
> @@ -353,10 +353,6 @@ config X86_64_SMP
>  	def_bool y
>  	depends on X86_64 && SMP
>  
> -config X86_32_LAZY_GS
> -	def_bool y
> -	depends on X86_32 && !STACKPROTECTOR
> -
>  config ARCH_SUPPORTS_UPROBES
>  	def_bool y
>  
> @@ -379,7 +375,8 @@ config CC_HAS_SANE_STACKPROTECTOR
>  	default $(success,$(srctree)/scripts/gcc-x86_32-has-stack-protector.sh $(CC))
>  	help
>  	   We have to make sure stack protector is unconditionally disabled if
> -	   the compiler produces broken code.
> +	   the compiler produces broken code or if it does not let us control
> +	   the segment on 32-bit kernels.
>  
>  menu "Processor type and features"
>  
> diff --git a/arch/x86/Makefile b/arch/x86/Makefile
> index 7116da3980be..0b5cd8c49ccb 100644
> --- a/arch/x86/Makefile
> +++ b/arch/x86/Makefile
> @@ -76,6 +76,14 @@ ifeq ($(CONFIG_X86_32),y)
>  
>          # temporary until string.h is fixed
>          KBUILD_CFLAGS += -ffreestanding
> +
> +	ifeq ($(CONFIG_STACKPROTECTOR),y)
> +		ifeq ($(CONFIG_SMP),y)
> +			KBUILD_CFLAGS += -mstack-protector-guard-reg=fs -mstack-protector-guard-symbol=__stack_chk_guard
> +		else
> +			KBUILD_CFLAGS += -mstack-protector-guard=global
> +		endif
> +	endif
>  else
>          BITS := 64
>          UTS_MACHINE := x86_64
> diff --git a/arch/x86/entry/entry_32.S b/arch/x86/entry/entry_32.S
> index df8c017e6161..eb0cb662bca5 100644
> --- a/arch/x86/entry/entry_32.S
> +++ b/arch/x86/entry/entry_32.S
> @@ -20,7 +20,7 @@
>   *	1C(%esp) - %ds
>   *	20(%esp) - %es
>   *	24(%esp) - %fs
> - *	28(%esp) - %gs		saved iff !CONFIG_X86_32_LAZY_GS
> + *	28(%esp) - unused -- was %gs on old stackprotector kernels
>   *	2C(%esp) - orig_eax
>   *	30(%esp) - %eip
>   *	34(%esp) - %cs
> @@ -56,14 +56,9 @@
>  /*
>   * User gs save/restore
>   *
> - * %gs is used for userland TLS and kernel only uses it for stack
> - * canary which is required to be at %gs:20 by gcc.  Read the comment
> - * at the top of stackprotector.h for more info.
> - *
> - * Local labels 98 and 99 are used.
> + * This is leftover junk from CONFIG_X86_32_LAZY_GS.  A subsequent patch
> + * will remove it entirely.
>   */
> -#ifdef CONFIG_X86_32_LAZY_GS
> -
>   /* unfortunately push/pop can't be no-op */
>  .macro PUSH_GS
>  	pushl	$0
> @@ -86,49 +81,6 @@
>  .macro SET_KERNEL_GS reg
>  .endm
>  
> -#else	/* CONFIG_X86_32_LAZY_GS */
> -
> -.macro PUSH_GS
> -	pushl	%gs
> -.endm
> -
> -.macro POP_GS pop=0
> -98:	popl	%gs
> -  .if \pop <> 0
> -	add	$\pop, %esp
> -  .endif
> -.endm
> -.macro POP_GS_EX
> -.pushsection .fixup, "ax"
> -99:	movl	$0, (%esp)
> -	jmp	98b
> -.popsection
> -	_ASM_EXTABLE(98b, 99b)
> -.endm
> -
> -.macro PTGS_TO_GS
> -98:	mov	PT_GS(%esp), %gs
> -.endm
> -.macro PTGS_TO_GS_EX
> -.pushsection .fixup, "ax"
> -99:	movl	$0, PT_GS(%esp)
> -	jmp	98b
> -.popsection
> -	_ASM_EXTABLE(98b, 99b)
> -.endm
> -
> -.macro GS_TO_REG reg
> -	movl	%gs, \reg
> -.endm
> -.macro REG_TO_PTGS reg
> -	movl	\reg, PT_GS(%esp)
> -.endm
> -.macro SET_KERNEL_GS reg
> -	movl	$(__KERNEL_STACK_CANARY), \reg
> -	movl	\reg, %gs
> -.endm
> -
> -#endif /* CONFIG_X86_32_LAZY_GS */
>  
>  /* Unconditionally switch to user cr3 */
>  .macro SWITCH_TO_USER_CR3 scratch_reg:req
> @@ -779,7 +731,7 @@ SYM_CODE_START(__switch_to_asm)
>  
>  #ifdef CONFIG_STACKPROTECTOR
>  	movl	TASK_stack_canary(%edx), %ebx
> -	movl	%ebx, PER_CPU_VAR(stack_canary)+stack_canary_offset
> +	movl	%ebx, PER_CPU_VAR(__stack_chk_guard)
>  #endif
>  
>  #ifdef CONFIG_RETPOLINE
> diff --git a/arch/x86/include/asm/processor.h b/arch/x86/include/asm/processor.h
> index c20a52b5534b..c59dff4bbc38 100644
> --- a/arch/x86/include/asm/processor.h
> +++ b/arch/x86/include/asm/processor.h
> @@ -441,6 +441,9 @@ struct fixed_percpu_data {
>  	 * GCC hardcodes the stack canary as %gs:40.  Since the
>  	 * irq_stack is the object at %gs:0, we reserve the bottom
>  	 * 48 bytes of the irq stack for the canary.
> +	 *
> +	 * Once we are willing to require -mstack-protector-guard-symbol=
> +	 * support for x86_64 stackprotector, we can get rid of this.
>  	 */
>  	char		gs_base[40];
>  	unsigned long	stack_canary;
> @@ -461,17 +464,7 @@ extern asmlinkage void ignore_sysret(void);
>  void current_save_fsgs(void);
>  #else	/* X86_64 */
>  #ifdef CONFIG_STACKPROTECTOR
> -/*
> - * Make sure stack canary segment base is cached-aligned:
> - *   "For Intel Atom processors, avoid non zero segment base address
> - *    that is not aligned to cache line boundary at all cost."
> - * (Optim Ref Manual Assembly/Compiler Coding Rule 15.)
> - */
> -struct stack_canary {
> -	char __pad[20];		/* canary at %gs:20 */
> -	unsigned long canary;
> -};
> -DECLARE_PER_CPU_ALIGNED(struct stack_canary, stack_canary);
> +DECLARE_PER_CPU(unsigned long, __stack_chk_guard);
>  #endif
>  /* Per CPU softirq stack pointer */
>  DECLARE_PER_CPU(struct irq_stack *, softirq_stack_ptr);
> diff --git a/arch/x86/include/asm/ptrace.h b/arch/x86/include/asm/ptrace.h
> index d8324a236696..b2c4c12d237c 100644
> --- a/arch/x86/include/asm/ptrace.h
> +++ b/arch/x86/include/asm/ptrace.h
> @@ -37,7 +37,10 @@ struct pt_regs {
>  	unsigned short __esh;
>  	unsigned short fs;
>  	unsigned short __fsh;
> -	/* On interrupt, gs and __gsh store the vector number. */
> +	/*
> +	 * On interrupt, gs and __gsh store the vector number.  They never
> +	 * store gs any more.
> +	 */
>  	unsigned short gs;
>  	unsigned short __gsh;
>  	/* On interrupt, this is the error code. */
> diff --git a/arch/x86/include/asm/segment.h b/arch/x86/include/asm/segment.h
> index 7fdd4facfce7..72044026eb3c 100644
> --- a/arch/x86/include/asm/segment.h
> +++ b/arch/x86/include/asm/segment.h
> @@ -95,7 +95,7 @@
>   *
>   *  26 - ESPFIX small SS
>   *  27 - per-cpu			[ offset to per-cpu data area ]
> - *  28 - stack_canary-20		[ for stack protector ]		<=== cacheline #8
> + *  28 - unused
>   *  29 - unused
>   *  30 - unused
>   *  31 - TSS for double fault handler
> @@ -118,7 +118,6 @@
>  
>  #define GDT_ENTRY_ESPFIX_SS		26
>  #define GDT_ENTRY_PERCPU		27
> -#define GDT_ENTRY_STACK_CANARY		28
>  
>  #define GDT_ENTRY_DOUBLEFAULT_TSS	31
>  
> @@ -158,12 +157,6 @@
>  # define __KERNEL_PERCPU		0
>  #endif
>  
> -#ifdef CONFIG_STACKPROTECTOR
> -# define __KERNEL_STACK_CANARY		(GDT_ENTRY_STACK_CANARY*8)
> -#else
> -# define __KERNEL_STACK_CANARY		0
> -#endif
> -
>  #else /* 64-bit: */
>  
>  #include <asm/cache.h>
> @@ -364,22 +357,15 @@ static inline void __loadsegment_fs(unsigned short value)
>  	asm("mov %%" #seg ",%0":"=r" (value) : : "memory")
>  
>  /*
> - * x86-32 user GS accessors:
> + * x86-32 user GS accessors.  This is ugly and could do with some cleaning up.
>   */
>  #ifdef CONFIG_X86_32
> -# ifdef CONFIG_X86_32_LAZY_GS
> -#  define get_user_gs(regs)		(u16)({ unsigned long v; savesegment(gs, v); v; })
> -#  define set_user_gs(regs, v)		loadsegment(gs, (unsigned long)(v))
> -#  define task_user_gs(tsk)		((tsk)->thread.gs)
> -#  define lazy_save_gs(v)		savesegment(gs, (v))
> -#  define lazy_load_gs(v)		loadsegment(gs, (v))
> -# else	/* X86_32_LAZY_GS */
> -#  define get_user_gs(regs)		(u16)((regs)->gs)
> -#  define set_user_gs(regs, v)		do { (regs)->gs = (v); } while (0)
> -#  define task_user_gs(tsk)		(task_pt_regs(tsk)->gs)
> -#  define lazy_save_gs(v)		do { } while (0)
> -#  define lazy_load_gs(v)		do { } while (0)
> -# endif	/* X86_32_LAZY_GS */
> +# define get_user_gs(regs)		(u16)({ unsigned long v; savesegment(gs, v); v; })
> +# define set_user_gs(regs, v)		loadsegment(gs, (unsigned long)(v))
> +# define task_user_gs(tsk)		((tsk)->thread.gs)
> +# define lazy_save_gs(v)		savesegment(gs, (v))
> +# define lazy_load_gs(v)		loadsegment(gs, (v))
> +# define load_gs_index(v)		loadsegment(gs, (v))
>  #endif	/* X86_32 */
>  
>  #endif /* !__ASSEMBLY__ */
> diff --git a/arch/x86/include/asm/stackprotector.h b/arch/x86/include/asm/stackprotector.h
> index 7fb482f0f25b..b6ffe58c70fa 100644
> --- a/arch/x86/include/asm/stackprotector.h
> +++ b/arch/x86/include/asm/stackprotector.h
> @@ -5,30 +5,23 @@
>   * Stack protector works by putting predefined pattern at the start of
>   * the stack frame and verifying that it hasn't been overwritten when
>   * returning from the function.  The pattern is called stack canary
> - * and unfortunately gcc requires it to be at a fixed offset from %gs.
> - * On x86_64, the offset is 40 bytes and on x86_32 20 bytes.  x86_64
> - * and x86_32 use segment registers differently and thus handles this
> - * requirement differently.
> + * and unfortunately gcc historically required it to be at a fixed offset
> + * from the percpu segment base.  On x86_64, the offset is 40 bytes.
>   *
> - * On x86_64, %gs is shared by percpu area and stack canary.  All
> - * percpu symbols are zero based and %gs points to the base of percpu
> - * area.  The first occupant of the percpu area is always
> - * fixed_percpu_data which contains stack_canary at offset 40.  Userland
> - * %gs is always saved and restored on kernel entry and exit using
> - * swapgs, so stack protector doesn't add any complexity there.
> + * The same segment is shared by percpu area and stack canary.  On
> + * x86_64, percpu symbols are zero based and %gs (64-bit) points to the
> + * base of percpu area.  The first occupant of the percpu area is always
> + * fixed_percpu_data which contains stack_canary at the appropriate
> + * offset.  On x86_32, the stack canary is just a regular percpu
> + * variable.
>   *
> - * On x86_32, it's slightly more complicated.  As in x86_64, %gs is
> - * used for userland TLS.  Unfortunately, some processors are much
> - * slower at loading segment registers with different value when
> - * entering and leaving the kernel, so the kernel uses %fs for percpu
> - * area and manages %gs lazily so that %gs is switched only when
> - * necessary, usually during task switch.
> + * Putting percpu data in %fs on 32-bit is a minor optimization compared to
> + * using %gs.  Since 32-bit userspace normally has %fs == 0, we are likely
> + * to load 0 into %fs on exit to usermode, whereas with percpu data in
> + * %gs, we are likely to load a non-null %gs on return to user mode.
>   *
> - * As gcc requires the stack canary at %gs:20, %gs can't be managed
> - * lazily if stack protector is enabled, so the kernel saves and
> - * restores userland %gs on kernel entry and exit.  This behavior is
> - * controlled by CONFIG_X86_32_LAZY_GS and accessors are defined in
> - * system.h to hide the details.
> + * Once we are willing to require GCC 8.1 or better for 64-bit stackprotector
> + * support, we can remove some of this complexity.
>   */
>  
>  #ifndef _ASM_STACKPROTECTOR_H
> @@ -44,14 +37,6 @@
>  #include <linux/random.h>
>  #include <linux/sched.h>
>  
> -/*
> - * 24 byte read-only segment initializer for stack canary.  Linker
> - * can't handle the address bit shifting.  Address will be set in
> - * head_32 for boot CPU and setup_per_cpu_areas() for others.
> - */
> -#define GDT_STACK_CANARY_INIT						\
> -	[GDT_ENTRY_STACK_CANARY] = GDT_ENTRY_INIT(0x4090, 0, 0x18),
> -
>  /*
>   * Initialize the stackprotector canary value.
>   *
> @@ -86,7 +71,7 @@ static __always_inline void boot_init_stack_canary(void)
>  #ifdef CONFIG_X86_64
>  	this_cpu_write(fixed_percpu_data.stack_canary, canary);
>  #else
> -	this_cpu_write(stack_canary.canary, canary);
> +	this_cpu_write(__stack_chk_guard, canary);
>  #endif
>  }
>  
> @@ -95,48 +80,16 @@ static inline void cpu_init_stack_canary(int cpu, struct task_struct *idle)
>  #ifdef CONFIG_X86_64
>  	per_cpu(fixed_percpu_data.stack_canary, cpu) = idle->stack_canary;
>  #else
> -	per_cpu(stack_canary.canary, cpu) = idle->stack_canary;
> -#endif
> -}
> -
> -static inline void setup_stack_canary_segment(int cpu)
> -{
> -#ifdef CONFIG_X86_32
> -	unsigned long canary = (unsigned long)&per_cpu(stack_canary, cpu);
> -	struct desc_struct *gdt_table = get_cpu_gdt_rw(cpu);
> -	struct desc_struct desc;
> -
> -	desc = gdt_table[GDT_ENTRY_STACK_CANARY];
> -	set_desc_base(&desc, canary);
> -	write_gdt_entry(gdt_table, GDT_ENTRY_STACK_CANARY, &desc, DESCTYPE_S);
> -#endif
> -}
> -
> -static inline void load_stack_canary_segment(void)
> -{
> -#ifdef CONFIG_X86_32
> -	asm("mov %0, %%gs" : : "r" (__KERNEL_STACK_CANARY) : "memory");
> +	per_cpu(__stack_chk_guard, cpu) = idle->stack_canary;
>  #endif
>  }
>  
>  #else	/* STACKPROTECTOR */
>  
> -#define GDT_STACK_CANARY_INIT
> -
>  /* dummy boot_init_stack_canary() is defined in linux/stackprotector.h */
>  
> -static inline void setup_stack_canary_segment(int cpu)
> -{ }
> -
>  static inline void cpu_init_stack_canary(int cpu, struct task_struct *idle)
>  { }
>  
> -static inline void load_stack_canary_segment(void)
> -{
> -#ifdef CONFIG_X86_32
> -	asm volatile ("mov %0, %%gs" : : "r" (0));
> -#endif
> -}
> -
>  #endif	/* STACKPROTECTOR */
>  #endif	/* _ASM_STACKPROTECTOR_H */
> diff --git a/arch/x86/include/asm/suspend_32.h b/arch/x86/include/asm/suspend_32.h
> index fdbd9d7b7bca..7b132d0312eb 100644
> --- a/arch/x86/include/asm/suspend_32.h
> +++ b/arch/x86/include/asm/suspend_32.h
> @@ -13,12 +13,10 @@
>  /* image of the saved processor state */
>  struct saved_context {
>  	/*
> -	 * On x86_32, all segment registers, with the possible exception of
> -	 * gs, are saved at kernel entry in pt_regs.
> +	 * On x86_32, all segment registers except gs are saved at kernel
> +	 * entry in pt_regs.
>  	 */
> -#ifdef CONFIG_X86_32_LAZY_GS
>  	u16 gs;
> -#endif
>  	unsigned long cr0, cr2, cr3, cr4;
>  	u64 misc_enable;
>  	bool misc_enable_saved;
> diff --git a/arch/x86/kernel/asm-offsets_32.c b/arch/x86/kernel/asm-offsets_32.c
> index 6e043f295a60..2b411cd00a4e 100644
> --- a/arch/x86/kernel/asm-offsets_32.c
> +++ b/arch/x86/kernel/asm-offsets_32.c
> @@ -53,11 +53,6 @@ void foo(void)
>  	       offsetof(struct cpu_entry_area, tss.x86_tss.sp1) -
>  	       offsetofend(struct cpu_entry_area, entry_stack_page.stack));
>  
> -#ifdef CONFIG_STACKPROTECTOR
> -	BLANK();
> -	OFFSET(stack_canary_offset, stack_canary, canary);
> -#endif
> -
>  	BLANK();
>  	DEFINE(EFI_svam, offsetof(efi_runtime_services_t, set_virtual_address_map));
>  }
> diff --git a/arch/x86/kernel/cpu/common.c b/arch/x86/kernel/cpu/common.c
> index 35ad8480c464..f208569d2d3b 100644
> --- a/arch/x86/kernel/cpu/common.c
> +++ b/arch/x86/kernel/cpu/common.c
> @@ -161,7 +161,6 @@ DEFINE_PER_CPU_PAGE_ALIGNED(struct gdt_page, gdt_page) = { .gdt = {
>  
>  	[GDT_ENTRY_ESPFIX_SS]		= GDT_ENTRY_INIT(0xc092, 0, 0xfffff),
>  	[GDT_ENTRY_PERCPU]		= GDT_ENTRY_INIT(0xc092, 0, 0xfffff),
> -	GDT_STACK_CANARY_INIT
>  #endif
>  } };
>  EXPORT_PER_CPU_SYMBOL_GPL(gdt_page);
> @@ -599,7 +598,6 @@ void load_percpu_segment(int cpu)
>  	__loadsegment_simple(gs, 0);
>  	wrmsrl(MSR_GS_BASE, cpu_kernelmode_gs_base(cpu));
>  #endif
> -	load_stack_canary_segment();
>  }
>  
>  #ifdef CONFIG_X86_32
> @@ -1793,7 +1791,8 @@ DEFINE_PER_CPU(unsigned long, cpu_current_top_of_stack) =
>  EXPORT_PER_CPU_SYMBOL(cpu_current_top_of_stack);
>  
>  #ifdef CONFIG_STACKPROTECTOR
> -DEFINE_PER_CPU_ALIGNED(struct stack_canary, stack_canary);
> +DEFINE_PER_CPU(unsigned long, __stack_chk_guard);
> +EXPORT_PER_CPU_SYMBOL(__stack_chk_guard);
>  #endif
>  
>  #endif	/* CONFIG_X86_64 */
> diff --git a/arch/x86/kernel/doublefault_32.c b/arch/x86/kernel/doublefault_32.c
> index 759d392cbe9f..d1d49e3d536b 100644
> --- a/arch/x86/kernel/doublefault_32.c
> +++ b/arch/x86/kernel/doublefault_32.c
> @@ -100,9 +100,7 @@ DEFINE_PER_CPU_PAGE_ALIGNED(struct doublefault_stack, doublefault_stack) = {
>  		.ss		= __KERNEL_DS,
>  		.ds		= __USER_DS,
>  		.fs		= __KERNEL_PERCPU,
> -#ifndef CONFIG_X86_32_LAZY_GS
> -		.gs		= __KERNEL_STACK_CANARY,
> -#endif
> +		.gs		= 0,
>  
>  		.__cr3		= __pa_nodebug(swapper_pg_dir),
>  	},
> diff --git a/arch/x86/kernel/head_32.S b/arch/x86/kernel/head_32.S
> index 7ed84c282233..67f590425d90 100644
> --- a/arch/x86/kernel/head_32.S
> +++ b/arch/x86/kernel/head_32.S
> @@ -318,8 +318,8 @@ SYM_FUNC_START(startup_32_smp)
>  	movl $(__KERNEL_PERCPU), %eax
>  	movl %eax,%fs			# set this cpu's percpu
>  
> -	movl $(__KERNEL_STACK_CANARY),%eax
> -	movl %eax,%gs
> +	xorl %eax,%eax
> +	movl %eax,%gs			# clear possible garbage in %gs
>  
>  	xorl %eax,%eax			# Clear LDT
>  	lldt %ax
> @@ -339,20 +339,6 @@ SYM_FUNC_END(startup_32_smp)
>   */
>  __INIT
>  setup_once:
> -#ifdef CONFIG_STACKPROTECTOR
> -	/*
> -	 * Configure the stack canary. The linker can't handle this by
> -	 * relocation.  Manually set base address in stack canary
> -	 * segment descriptor.
> -	 */
> -	movl $gdt_page,%eax
> -	movl $stack_canary,%ecx
> -	movw %cx, 8 * GDT_ENTRY_STACK_CANARY + 2(%eax)
> -	shrl $16, %ecx
> -	movb %cl, 8 * GDT_ENTRY_STACK_CANARY + 4(%eax)
> -	movb %ch, 8 * GDT_ENTRY_STACK_CANARY + 7(%eax)
> -#endif
> -
>  	andl $0,setup_once_ref	/* Once is enough, thanks */
>  	ret
>  
> diff --git a/arch/x86/kernel/setup_percpu.c b/arch/x86/kernel/setup_percpu.c
> index fd945ce78554..0941d2f44f2a 100644
> --- a/arch/x86/kernel/setup_percpu.c
> +++ b/arch/x86/kernel/setup_percpu.c
> @@ -224,7 +224,6 @@ void __init setup_per_cpu_areas(void)
>  		per_cpu(this_cpu_off, cpu) = per_cpu_offset(cpu);
>  		per_cpu(cpu_number, cpu) = cpu;
>  		setup_percpu_segment(cpu);
> -		setup_stack_canary_segment(cpu);
>  		/*
>  		 * Copy data used in early init routines from the
>  		 * initial arrays to the per cpu data areas.  These
> diff --git a/arch/x86/kernel/tls.c b/arch/x86/kernel/tls.c
> index 64a496a0687f..3c883e064242 100644
> --- a/arch/x86/kernel/tls.c
> +++ b/arch/x86/kernel/tls.c
> @@ -164,17 +164,11 @@ int do_set_thread_area(struct task_struct *p, int idx,
>  		savesegment(fs, sel);
>  		if (sel == modified_sel)
>  			loadsegment(fs, sel);
> -
> -		savesegment(gs, sel);
> -		if (sel == modified_sel)
> -			load_gs_index(sel);
>  #endif
>  
> -#ifdef CONFIG_X86_32_LAZY_GS
>  		savesegment(gs, sel);
>  		if (sel == modified_sel)
> -			loadsegment(gs, sel);
> -#endif
> +			load_gs_index(sel);
>  	} else {
>  #ifdef CONFIG_X86_64
>  		if (p->thread.fsindex == modified_sel)
> diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
> index f923e14e87df..ec39073b4897 100644
> --- a/arch/x86/kvm/svm/svm.c
> +++ b/arch/x86/kvm/svm/svm.c
> @@ -1467,12 +1467,8 @@ static void svm_vcpu_put(struct kvm_vcpu *vcpu)
>  #ifdef CONFIG_X86_64
>  		loadsegment(fs, svm->host.fs);
>  		wrmsrl(MSR_KERNEL_GS_BASE, current->thread.gsbase);
> -		load_gs_index(svm->host.gs);
> -#else
> -#ifdef CONFIG_X86_32_LAZY_GS
> -		loadsegment(gs, svm->host.gs);
> -#endif
>  #endif
> +		load_gs_index(svm->host.gs);
>  
>  		for (i = 0; i < NR_HOST_SAVE_USER_MSRS; i++)
>  			wrmsrl(host_save_user_msrs[i].index,
> @@ -3705,13 +3701,11 @@ static noinstr void svm_vcpu_enter_exit(struct kvm_vcpu *vcpu,
>  	} else {
>  		__svm_vcpu_run(svm->vmcb_pa, (unsigned long *)&svm->vcpu.arch.regs);
>  
> +		/* Restore the percpu segment immediately. */
>  #ifdef CONFIG_X86_64
>  		native_wrmsrl(MSR_GS_BASE, svm->host.gs_base);
>  #else
>  		loadsegment(fs, svm->host.fs);
> -#ifndef CONFIG_X86_32_LAZY_GS
> -		loadsegment(gs, svm->host.gs);
> -#endif
>  #endif
>  	}
>  
> diff --git a/arch/x86/lib/insn-eval.c b/arch/x86/lib/insn-eval.c
> index 4229950a5d78..7f89a091f1fb 100644
> --- a/arch/x86/lib/insn-eval.c
> +++ b/arch/x86/lib/insn-eval.c
> @@ -404,10 +404,6 @@ static short get_segment_selector(struct pt_regs *regs, int seg_reg_idx)
>  	case INAT_SEG_REG_FS:
>  		return (unsigned short)(regs->fs & 0xffff);
>  	case INAT_SEG_REG_GS:
> -		/*
> -		 * GS may or may not be in regs as per CONFIG_X86_32_LAZY_GS.
> -		 * The macro below takes care of both cases.
> -		 */
>  		return get_user_gs(regs);
>  	case INAT_SEG_REG_IGNORE:
>  	default:
> diff --git a/arch/x86/platform/pvh/head.S b/arch/x86/platform/pvh/head.S
> index 43b4d864817e..afbf0bb252da 100644
> --- a/arch/x86/platform/pvh/head.S
> +++ b/arch/x86/platform/pvh/head.S
> @@ -45,10 +45,8 @@
>  
>  #define PVH_GDT_ENTRY_CS	1
>  #define PVH_GDT_ENTRY_DS	2
> -#define PVH_GDT_ENTRY_CANARY	3
>  #define PVH_CS_SEL		(PVH_GDT_ENTRY_CS * 8)
>  #define PVH_DS_SEL		(PVH_GDT_ENTRY_DS * 8)
> -#define PVH_CANARY_SEL		(PVH_GDT_ENTRY_CANARY * 8)
>  
>  SYM_CODE_START_LOCAL(pvh_start_xen)
>  	cld
> @@ -109,17 +107,6 @@ SYM_CODE_START_LOCAL(pvh_start_xen)
>  
>  #else /* CONFIG_X86_64 */
>  
> -	/* Set base address in stack canary descriptor. */
> -	movl $_pa(gdt_start),%eax
> -	movl $_pa(canary),%ecx
> -	movw %cx, (PVH_GDT_ENTRY_CANARY * 8) + 2(%eax)
> -	shrl $16, %ecx
> -	movb %cl, (PVH_GDT_ENTRY_CANARY * 8) + 4(%eax)
> -	movb %ch, (PVH_GDT_ENTRY_CANARY * 8) + 7(%eax)
> -
> -	mov $PVH_CANARY_SEL,%eax
> -	mov %eax,%gs
> -
>  	call mk_early_pgtbl_32
>  
>  	mov $_pa(initial_page_table), %eax
> @@ -163,7 +150,6 @@ SYM_DATA_START_LOCAL(gdt_start)
>  	.quad GDT_ENTRY(0xc09a, 0, 0xfffff) /* PVH_CS_SEL */
>  #endif
>  	.quad GDT_ENTRY(0xc092, 0, 0xfffff) /* PVH_DS_SEL */
> -	.quad GDT_ENTRY(0x4090, 0, 0x18)    /* PVH_CANARY_SEL */
>  SYM_DATA_END_LABEL(gdt_start, SYM_L_LOCAL, gdt_end)
>  
>  	.balign 16
> diff --git a/arch/x86/power/cpu.c b/arch/x86/power/cpu.c
> index db1378c6ff26..ef4329d67a5f 100644
> --- a/arch/x86/power/cpu.c
> +++ b/arch/x86/power/cpu.c
> @@ -99,11 +99,8 @@ static void __save_processor_state(struct saved_context *ctxt)
>  	/*
>  	 * segment registers
>  	 */
> -#ifdef CONFIG_X86_32_LAZY_GS
>  	savesegment(gs, ctxt->gs);
> -#endif
>  #ifdef CONFIG_X86_64
> -	savesegment(gs, ctxt->gs);
>  	savesegment(fs, ctxt->fs);
>  	savesegment(ds, ctxt->ds);
>  	savesegment(es, ctxt->es);
> @@ -232,7 +229,6 @@ static void notrace __restore_processor_state(struct saved_context *ctxt)
>  	wrmsrl(MSR_GS_BASE, ctxt->kernelmode_gs_base);
>  #else
>  	loadsegment(fs, __KERNEL_PERCPU);
> -	loadsegment(gs, __KERNEL_STACK_CANARY);
>  #endif
>  
>  	/* Restore the TSS, RO GDT, LDT, and usermode-relevant MSRs. */
> @@ -255,7 +251,7 @@ static void notrace __restore_processor_state(struct saved_context *ctxt)
>  	 */
>  	wrmsrl(MSR_FS_BASE, ctxt->fs_base);
>  	wrmsrl(MSR_KERNEL_GS_BASE, ctxt->usermode_gs_base);
> -#elif defined(CONFIG_X86_32_LAZY_GS)
> +#else
>  	loadsegment(gs, ctxt->gs);
>  #endif
>  
> diff --git a/arch/x86/xen/enlighten_pv.c b/arch/x86/xen/enlighten_pv.c
> index 9a5a50cdaab5..e18235a6390d 100644
> --- a/arch/x86/xen/enlighten_pv.c
> +++ b/arch/x86/xen/enlighten_pv.c
> @@ -1190,7 +1190,6 @@ static void __init xen_setup_gdt(int cpu)
>  	pv_ops.cpu.write_gdt_entry = xen_write_gdt_entry_boot;
>  	pv_ops.cpu.load_gdt = xen_load_gdt_boot;
>  
> -	setup_stack_canary_segment(cpu);
>  	switch_to_new_gdt(cpu);
>  
>  	pv_ops.cpu.write_gdt_entry = xen_write_gdt_entry;
> diff --git a/scripts/gcc-x86_32-has-stack-protector.sh b/scripts/gcc-x86_32-has-stack-protector.sh
> index f5c119495254..825c75c5b715 100755
> --- a/scripts/gcc-x86_32-has-stack-protector.sh
> +++ b/scripts/gcc-x86_32-has-stack-protector.sh
> @@ -1,4 +1,8 @@
>  #!/bin/sh
>  # SPDX-License-Identifier: GPL-2.0
>  
> -echo "int foo(void) { char X[200]; return 3; }" | $* -S -x c -c -m32 -O0 -fstack-protector - -o - 2> /dev/null | grep -q "%gs"
> +# This requires GCC 8.1 or better.  Specifically, we require
> +# -mstack-protector-guard-reg, added by
> +# https://gcc.gnu.org/bugzilla/show_bug.cgi?id=81708
> +
> +echo "int foo(void) { char X[200]; return 3; }" | $* -S -x c -c -m32 -O0 -fstack-protector -mstack-protector-guard-reg=fs -mstack-protector-guard-symbol=__stack_chk_guard - -o - 2> /dev/null | grep -q "%fs"
> -- 
> 2.29.2
> 
> 
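The probe logic in the updated gcc-x86_32-has-stack-protector.sh boils down to "compile a throwaway function with the new guard options and grep the assembly for an %fs-relative access". The sketch below reproduces just the grep step against a canned line of assembly (a hypothetical example of what a GCC >= 8.1 build emits with -mstack-protector-guard-reg=fs; no toolchain is invoked), so it runs anywhere a POSIX shell does:

```shell
#!/bin/sh
# Canned stand-in for the assembly a new-enough GCC emits for the probe
# function when given -mstack-protector-guard-reg=fs
# -mstack-protector-guard-symbol=__stack_chk_guard (hypothetical output;
# the real script pipes the compiler's actual -S output here).
asm='movl %fs:__stack_chk_guard, %eax'

# The detection script succeeds only if the canary is reached via %fs.
if printf '%s\n' "$asm" | grep -q '%fs'; then
    echo "32-bit stackprotector: usable"
else
    echo "32-bit stackprotector: not usable"
fi
```

With a real toolchain the script's exit status can be checked directly, e.g. `scripts/gcc-x86_32-has-stack-protector.sh "$CC" && echo ok` (the script takes the compiler command as its arguments).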

-- 
With Best Regards,
Andy Shevchenko



^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH v2 1/2] x86/stackprotector/32: Make the canary into a regular percpu variable
  2022-09-29 13:56   ` [PATCH v2 1/2] " Andy Shevchenko
@ 2022-09-29 14:20     ` Andy Shevchenko
  2022-09-30 20:30       ` Ferry Toth
  0 siblings, 1 reply; 24+ messages in thread
From: Andy Shevchenko @ 2022-09-29 14:20 UTC (permalink / raw)
  To: Andy Lutomirski, Ferry Toth
  Cc: x86, LKML, Sedat Dilek, Nick Desaulniers, Sean Christopherson,
	Brian Gerst, Joerg Roedel

On Thu, Sep 29, 2022 at 04:56:07PM +0300, Andy Shevchenko wrote:
> +Cc: Ferry
> 
> On Sat, Feb 13, 2021 at 11:19:44AM -0800, Andy Lutomirski wrote:
> > On 32-bit kernels, the stackprotector canary is quite nasty -- it is
> > stored at %gs:(20), which is nasty because 32-bit kernels use %fs for
> > percpu storage.  It's even nastier because it means that whether %gs
> > contains userspace state or kernel state while running kernel code
> > depends on whether stackprotector is enabled (this is
> > CONFIG_X86_32_LAZY_GS), and this setting radically changes the way
> > that segment selectors work.  Supporting both variants is a
> > maintenance and testing mess.
> > 
> > Merely rearranging so that percpu and the stack canary
> > share the same segment would be messy as the 32-bit percpu address
> > layout isn't currently compatible with putting a variable at a fixed
> > offset.
> > 
> > Fortunately, GCC 8.1 added options that allow the stack canary to be
> > accessed as %fs:__stack_chk_guard, effectively turning it into an ordinary
> > percpu variable.  This lets us get rid of all of the code to manage the
> > stack canary GDT descriptor and the CONFIG_X86_32_LAZY_GS mess.
> > 
> > (That name is special.  We could use any symbol we want for the
> >  %fs-relative mode, but for CONFIG_SMP=n, gcc refuses to let us use any
> >  name other than __stack_chk_guard.)
> > 
> > This patch forcibly disables stackprotector on older compilers that
> > don't support the new options and makes the stack canary into a
> > percpu variable.  The "lazy GS" approach is now used for all 32-bit
> > configurations.
> > 
> > This patch also makes load_gs_index() work on 32-bit kernels.  On
> > 64-bit kernels, it loads the GS selector and updates the user
> > GSBASE accordingly.  (This is unchanged.)  On 32-bit kernels,
> > it loads the GS selector and updates GSBASE, which is now
> > always the user base.  This means that the overall effect is
> > the same on 32-bit and 64-bit, which avoids some ifdeffery.
> 
> This patch broke 32-bit boot on Intel Merrifield
> 
> git bisect start
> # good: [9f4ad9e425a1d3b6a34617b8ea226d56a119a717] Linux 5.12
> git bisect good 9f4ad9e425a1d3b6a34617b8ea226d56a119a717
> # bad: [62fb9874f5da54fdb243003b386128037319b219] Linux 5.13
> git bisect bad 62fb9874f5da54fdb243003b386128037319b219
> # bad: [85f3f17b5db2dd9f8a094a0ddc665555135afd22] Merge branch 'md-fixes' of https://git.kernel.org/pub/scm/linux/kernel/git/song/md into block-5.13
> git bisect bad 85f3f17b5db2dd9f8a094a0ddc665555135afd22
> # good: [ca62e9090d229926f43f20291bb44d67897baab7] Merge tag 'regulator-v5.13' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/regulator
> git bisect good ca62e9090d229926f43f20291bb44d67897baab7
> # bad: [68a32ba14177d4a21c4a9a941cf1d7aea86d436f] Merge tag 'drm-next-2021-04-28' of git://anongit.freedesktop.org/drm/drm
> git bisect bad 68a32ba14177d4a21c4a9a941cf1d7aea86d436f
> # good: [49c70ece54b0d1c51bc31b2b0c1070777c992c26] drm/amd/display: Change input parameter for set_drr
> git bisect good 49c70ece54b0d1c51bc31b2b0c1070777c992c26
> # good: [0b276e470a4d43e1365d3eb53c608a3d208cabd4] media: coda: fix macroblocks count control usage
> git bisect good 0b276e470a4d43e1365d3eb53c608a3d208cabd4
> # bad: [c6536676c7fe3f572ba55842e59c3c71c01e7fb3] Merge tag 'x86_core_for_v5.13' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
> git bisect bad c6536676c7fe3f572ba55842e59c3c71c01e7fb3
> # good: [d1466bc583a81830cef2399a4b8a514398351b40] Merge branch 'work.inode-type-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs
> git bisect good d1466bc583a81830cef2399a4b8a514398351b40
> # good: [fafe1e39ed213221c0bce6b0b31669334368dc97] Merge tag 'afs-netfs-lib-20210426' of git://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-fs
> git bisect good fafe1e39ed213221c0bce6b0b31669334368dc97
> # bad: [b1f480bc0686e65d5413c035bd13af2ea4888784] Merge branch 'x86/cpu' into WIP.x86/core, to merge the NOP changes & resolve a semantic conflict
> git bisect bad b1f480bc0686e65d5413c035bd13af2ea4888784
> # bad: [0c925c61dae18ee3cb93a61cc9dd9562a066034d] x86/tools/insn_decoder_test: Convert to insn_decode()
> git bisect bad 0c925c61dae18ee3cb93a61cc9dd9562a066034d
> # bad: [514ef77607b9ff184c11b88e8f100bc27f07460d] x86/boot/compressed/sev-es: Convert to insn_decode()
> git bisect bad 514ef77607b9ff184c11b88e8f100bc27f07460d
> # bad: [9e761296c52dcdb1aaa151b65bd39accb05740d9] x86/insn: Rename insn_decode() to insn_decode_from_regs()
> git bisect bad 9e761296c52dcdb1aaa151b65bd39accb05740d9
> # bad: [d0962f2b24c99889a386f0658c71535f56358f77] x86/entry/32: Remove leftover macros after stackprotector cleanups
> git bisect bad d0962f2b24c99889a386f0658c71535f56358f77
> # bad: [3fb0fdb3bbe7aed495109b3296b06c2409734023] x86/stackprotector/32: Make the canary into a regular percpu variable
> git bisect bad 3fb0fdb3bbe7aed495109b3296b06c2409734023
> # first bad commit: [3fb0fdb3bbe7aed495109b3296b06c2409734023] x86/stackprotector/32: Make the canary into a regular percpu variable
> 
> Any suggestions how to fix are welcome!
> 
> Configuration is based on in-tree i386_defconfig with some drivers enabled
> on top (no core stuff was altered, but if you wish to check, it's here:
> https://github.com/andy-shev/linux/blob/eds-acpi/arch/x86/configs/i386_defconfig).

For the record (and to preempt some questions), v6.0-rc7 still has this issue.

I can't test reverts, because a huge pile of changes has happened in that
area over the last year or so.

-- 
With Best Regards,
Andy Shevchenko




* Re: [PATCH v2 1/2] x86/stackprotector/32: Make the canary into a regular percpu variable
  2022-09-29 14:20     ` Andy Shevchenko
@ 2022-09-30 20:30       ` Ferry Toth
  2022-09-30 21:18         ` Ferry Toth
  0 siblings, 1 reply; 24+ messages in thread
From: Ferry Toth @ 2022-09-30 20:30 UTC (permalink / raw)
  To: Andy Shevchenko, Andy Lutomirski
  Cc: x86, LKML, Sedat Dilek, Nick Desaulniers, Sean Christopherson,
	Brian Gerst, Joerg Roedel

Hi,

Op 29-09-2022 om 16:20 schreef Andy Shevchenko:
> On Thu, Sep 29, 2022 at 04:56:07PM +0300, Andy Shevchenko wrote:
>> +Cc: Ferry
>>
>> On Sat, Feb 13, 2021 at 11:19:44AM -0800, Andy Lutomirski wrote:
>>> [commit message quoted in full above, snipped]
>> This patch broke 32-bit boot on Intel Merrifield
>>
>> [bisect log quoted in full above, snipped]

With the bad commit the last words in dmesg are:

mem auto-init: stack:off, heap alloc:off, heap free:off
Initializing HighMem for node 0 (00036ffe:0003f500)
Initializing Movable for node 0 (00000000:00000000)
Checking if this processor honours the WP bit even in supervisor mode...Ok.
Memory: 948444K/1004124K available (12430K kernel code, 2167K rwdata, 4948K rodata, 716K init, 716K bss, 55680K reserved, 0K cma-reserved, 136200K highmem)
SLUB: HWalign=64, Order=0-3, MinObjects=0, CPUs=2, Nodes=1
trace event string verifier disabled
Dynamic Preempt: voluntary
rcu: Preemptible hierarchical RCU implementation.
rcu:     RCU event tracing is enabled.
rcu:     RCU restricting CPUs from NR_CPUS=8 to nr_cpu_ids=2.
  Trampoline variant of Tasks RCU enabled.
  Tracing variant of Tasks RCU enabled.
rcu: RCU calculated value of scheduler-enlistment delay is 100 jiffies.
rcu: Adjusting geometry for rcu_fanout_leaf=16, nr_cpu_ids=2
NR_IRQS: 2304, nr_irqs: 512, preallocated irqs: 0

without the bad commit dmesg continues:

random: get_random_bytes called from start_kernel+0x492/0x65a with crng_init=0
Console: colour dummy device 80x25
printk: console [tty0] enabled
printk: bootconsole [uart0] disabled

....

>> Any suggestions how to fix are welcome!
>>
>> Configuration is based on in-tree i386_defconfig with some drivers enabled
>> on top (no core stuff was altered, but if you wish to check, it's here:
>> https://github.com/andy-shev/linux/blob/eds-acpi/arch/x86/configs/i386_defconfig).
> For the record (and to preempt some questions), v6.0-rc7 still has this issue.
>
> I can't test reverts, because a huge pile of changes has happened in that
> area over the last year or so.
>
I just tested this by reverting 3fb0fdb3 "x86/stackprotector/32: Make 
the canary into a regular percpu variable" and its prerequisite 
d0962f2b "x86/entry/32: Remove leftover macros after stackprotector 
cleanups" on top of v5.13, and indeed this resolves the boot issue.

I can also confirm that the two reverts will not apply on top of v6.0-rc7.



* Re: [PATCH v2 1/2] x86/stackprotector/32: Make the canary into a regular percpu variable
  2022-09-30 20:30       ` Ferry Toth
@ 2022-09-30 21:18         ` Ferry Toth
  2022-11-09 14:50           ` Andy Shevchenko
  0 siblings, 1 reply; 24+ messages in thread
From: Ferry Toth @ 2022-09-30 21:18 UTC (permalink / raw)
  To: Andy Shevchenko, Andy Lutomirski
  Cc: x86, LKML, Sedat Dilek, Nick Desaulniers, Sean Christopherson,
	Brian Gerst, Joerg Roedel

Hi,

Op 30-09-2022 om 22:30 schreef Ferry Toth:
> Hi,
>
> Op 29-09-2022 om 16:20 schreef Andy Shevchenko:
>> On Thu, Sep 29, 2022 at 04:56:07PM +0300, Andy Shevchenko wrote:
>>> +Cc: Ferry
>>>
>>> On Sat, Feb 13, 2021 at 11:19:44AM -0800, Andy Lutomirski wrote:
>>>> [commit message quoted in full above, snipped]
>>> This patch broke 32-bit boot on Intel Merrifield
>>>
>>> [bisect log quoted in full above, snipped]
>
> [dmesg comparison quoted in the previous mail, snipped]
>
>>> Any suggestions how to fix are welcome!
>>>
Interesting. I added the following fragment to the kernel config:

# CONFIG_STACKPROTECTOR is not set

And this resolves the boot issue (tested with v5.17 i686 on Intel 
Merrifield)
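
A reproducible way to apply that fragment is sketched below. It operates on a stand-in .config created in a temp file so the sketch is self-contained; in an actual kernel tree the same effect typically comes from the commented scripts/config invocation:

```shell
#!/bin/sh
# Apply the "# CONFIG_STACKPROTECTOR is not set" workaround to a config.
# In a kernel source tree the equivalent would normally be:
#   scripts/config --file .config --disable STACKPROTECTOR
cfg=$(mktemp)
printf 'CONFIG_STACKPROTECTOR=y\nCONFIG_STACKPROTECTOR_STRONG=y\n' > "$cfg"

# Kconfig expresses a disabled bool as a "is not set" comment line.
sed -i 's/^CONFIG_STACKPROTECTOR=y$/# CONFIG_STACKPROTECTOR is not set/' "$cfg"

grep -q '^# CONFIG_STACKPROTECTOR is not set$' "$cfg" && echo "workaround applied"
rm -f "$cfg"
```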

>>> Configuration is based on in-tree i386_defconfig with some drivers 
>>> enabled
>>> on top (no core stuff was altered, but if you wish to check, it's here:
>>> https://github.com/andy-shev/linux/blob/eds-acpi/arch/x86/configs/i386_defconfig). 
>>>
>> For the record (and to preempt some questions), v6.0-rc7 still has
>> this issue.
>>
>> I can't test reverts, because a huge pile of changes has happened in
>> that area over the last year or so.
>>
> I just tested this by reverting 3fb0fdb3 "x86/stackprotector/32: Make 
> the canary into a regular percpu variable" and its prerequisite 
> d0962f2b "x86/entry/32: Remove leftover macros after stackprotector 
> cleanups" on top of v5.13, and indeed this resolves the boot issue.
>
> I can also confirm the 2 reverts will not apply on top of v6.0-rc7.
>


* Re: [PATCH v2 1/2] x86/stackprotector/32: Make the canary into a regular percpu variable
  2022-09-30 21:18         ` Ferry Toth
@ 2022-11-09 14:50           ` Andy Shevchenko
  2022-11-09 22:33             ` Brian Gerst
  0 siblings, 1 reply; 24+ messages in thread
From: Andy Shevchenko @ 2022-11-09 14:50 UTC (permalink / raw)
  To: Ferry Toth, Dave Hansen
  Cc: Andy Lutomirski, x86, LKML, Sedat Dilek, Nick Desaulniers,
	Sean Christopherson, Brian Gerst, Joerg Roedel

On Fri, Sep 30, 2022 at 11:18:51PM +0200, Ferry Toth wrote:
> Op 30-09-2022 om 22:30 schreef Ferry Toth:
> > Op 29-09-2022 om 16:20 schreef Andy Shevchenko:
> > > On Thu, Sep 29, 2022 at 04:56:07PM +0300, Andy Shevchenko wrote:
> > > > On Sat, Feb 13, 2021 at 11:19:44AM -0800, Andy Lutomirski wrote:
> > > > > [commit message quoted in full above, snipped]
> > > > This patch broke 32-bit boot on Intel Merrifield
> > > > 
> > > > [bisect log quoted in full above, snipped]
> > 
> > With the bad commit the last words in dmesg are:
> > 
> > mem auto-init: stack:off, heap alloc:off, heap free:off
> > Initializing HighMem for node 0 (00036ffe:0003f500)
> > Initializing Movable for node 0 (00000000:00000000)
> > Checking if this processor honours the WP bit even in supervisor
> > mode...Ok.
> > Memory: 948444K/1004124K available (12430K kernel code, 2167K rwdata,
> > 4948K rodata, 716K init, 716K bss, 55680K reserved, 0K cma-reserved,
> > 136200K highmem)
> > SLUB: HWalign=64, Order=0-3, MinObjects=0, CPUs=2, Nodes=1
> > trace event string verifier disabled
> > Dynamic Preempt: voluntary
> > rcu: Preemptible hierarchical RCU implementation.
> > rcu:     RCU event tracing is enabled.
> > rcu:     RCU restricting CPUs from NR_CPUS=8 to nr_cpu_ids=2.
> >  Trampoline variant of Tasks RCU enabled.
> >  Tracing variant of Tasks RCU enabled.
> > rcu: RCU calculated value of scheduler-enlistment delay is 100 jiffies.
> > rcu: Adjusting geometry for rcu_fanout_leaf=16, nr_cpu_ids=2
> > NR_IRQS: 2304, nr_irqs: 512, preallocated irqs: 0
> > 
> > without the bad commit dmesg continues:
> > 
> > random: get_random_bytes called from start_kernel+0x492/0x65a with
> > crng_init=0
> > Console: colour dummy device 80x25
> > printk: console [tty0] enabled
> > printk: bootconsole [uart0] disabled
> > 
> > ....
> > 
> > > > Any suggestions how to fix are welcome!
> > > > 
> Interesting. I added the following fragment to the kernel config:
> 
> # CONFIG_STACKPROTECTOR is not set
> 
> And this resolves the boot issue (tested with v5.17 i686 on Intel
> Merrifield)

I'm not sure that's the correct approach.
Any answer from Andy Lutomirski?

And in general to the x86 maintainers: do we support all features on x86 32-bit? If
not, can that be stated explicitly, please?

> > > > Configuration is based on in-tree i386_defconfig with some
> > > > drivers enabled
> > > > on top (no core stuff was altered, but if you wish to check, it's here:
> > > > https://github.com/andy-shev/linux/blob/eds-acpi/arch/x86/configs/i386_defconfig).
> > > > 
> > > For the record (and to preempt some questions): v6.0-rc7 still
> > > has this issue.
> > > 
> > > I can't test reverts, because a huge pile of changes has happened
> > > in that area over the last year or so.
> > > 
> > I just tested this by reverting 3fb0fdb3 "x86/stackprotector/32: Make
> > the canary into a regular percpu variable" and its prerequisite
> > d0962f2b "x86/entry/32: Remove leftover macros after stackprotector
> > cleanups" on top of v5.13, and indeed this resolves the boot issue.
> > 
> > I can also confirm the 2 reverts will not apply on top of v6.0-rc7.

-- 
With Best Regards,
Andy Shevchenko



^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH v2 1/2] x86/stackprotector/32: Make the canary into a regular percpu variable
  2022-11-09 14:50           ` Andy Shevchenko
@ 2022-11-09 22:33             ` Brian Gerst
  2022-11-10 14:02               ` Andy Shevchenko
  2022-11-10 19:36               ` Ferry Toth
  0 siblings, 2 replies; 24+ messages in thread
From: Brian Gerst @ 2022-11-09 22:33 UTC (permalink / raw)
  To: Andy Shevchenko
  Cc: Ferry Toth, Dave Hansen, Andy Lutomirski, x86, LKML, Sedat Dilek,
	Nick Desaulniers, Sean Christopherson, Joerg Roedel

On Wed, Nov 9, 2022 at 9:50 AM Andy Shevchenko
<andriy.shevchenko@intel.com> wrote:
>
> On Fri, Sep 30, 2022 at 11:18:51PM +0200, Ferry Toth wrote:
> > > On 30-09-2022 at 22:30, Ferry Toth wrote:
> > > > On 29-09-2022 at 16:20, Andy Shevchenko wrote:
> > > > On Thu, Sep 29, 2022 at 04:56:07PM +0300, Andy Shevchenko wrote:
> > > > > On Sat, Feb 13, 2021 at 11:19:44AM -0800, Andy Lutomirski wrote:
> > > > > > On 32-bit kernels, the stackprotector canary is quite nasty -- it is
> > > > > > stored at %gs:(20), which is nasty because 32-bit kernels use %fs for
> > > > > > percpu storage.  It's even nastier because it means that whether %gs
> > > > > > contains userspace state or kernel state while running kernel code
> > > > > > depends on whether stackprotector is enabled (this is
> > > > > > CONFIG_X86_32_LAZY_GS), and this setting radically changes the way
> > > > > > that segment selectors work.  Supporting both variants is a
> > > > > > maintenance and testing mess.
> > > > > >
> > > > > > Merely rearranging so that percpu and the stack canary
> > > > > > share the same segment would be messy as the 32-bit percpu address
> > > > > > layout isn't currently compatible with putting a variable at a fixed
> > > > > > offset.
> > > > > >
> > > > > > Fortunately, GCC 8.1 added options that allow the stack canary to be
> > > > > > accessed as %fs:__stack_chk_guard, effectively turning it
> > > > > > into an ordinary
> > > > > > percpu variable.  This lets us get rid of all of the code to
> > > > > > manage the
> > > > > > stack canary GDT descriptor and the CONFIG_X86_32_LAZY_GS mess.
> > > > > >
> > > > > > (That name is special.  We could use any symbol we want for the
> > > > > >   %fs-relative mode, but for CONFIG_SMP=n, gcc refuses to
> > > > > > let us use any
> > > > > >   name other than __stack_chk_guard.)
> > > > > >
> > > > > > This patch forcibly disables stackprotector on older compilers that
> > > > > > don't support the new options and makes the stack canary into a
> > > > > > percpu variable.  The "lazy GS" approach is now used for all 32-bit
> > > > > > configurations.
> > > > > >
> > > > > > This patch also makes load_gs_index() work on 32-bit kernels.  On
> > > > > > 64-bit kernels, it loads the GS selector and updates the user
> > > > > > GSBASE accordingly.  (This is unchanged.)  On 32-bit kernels,
> > > > > > it loads the GS selector and updates GSBASE, which is now
> > > > > > always the user base.  This means that the overall effect is
> > > > > > the same on 32-bit and 64-bit, which avoids some ifdeffery.
> > > > > This patch broke 32-bit boot on Intel Merrifield
> > > > >
> > > > > [bisect log snipped; quoted in full earlier in the thread]
> > > > > # first bad commit: [3fb0fdb3bbe7aed495109b3296b06c2409734023]
> > > > > x86/stackprotector/32: Make the canary into a regular percpu
> > > > > variable
> > >
> > > With the bad commit the last words in dmesg are:
> > >
> > > [boot log snipped; quoted in full earlier in the thread]
> > >
> > > without the bad commit dmesg continues:
> > >
> > > [boot log snipped]
> > >
> > > > > Any suggestions how to fix are welcome!
> > > > >
> > Interesting. I added the following fragment to the kernel config:
> >
> > # CONFIG_STACKPROTECTOR is not set
> >
> > And this resolves the boot issue (tested with v5.17 i686 on Intel
> > Merrifield)
>
> I'm not sure that's the correct approach.
> Any answer from Andy Lutomirski?
>
> And in general to the x86 maintainers: do we support all features on x86 32-bit? If
> not, can that be stated explicitly, please?

What compiler version are you using?
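
One way to answer this is to probe the toolchain directly for the guard options the series requires (added in GCC 8.1), loosely modeled on the option check in `arch/x86/Makefile`. The temp-file names below are invented for the sketch, and the probe degrades gracefully when `gcc` or 32-bit support is missing:

```shell
# Probe whether the compiler accepts the per-symbol stack-protector
# guard options that the 32-bit stackprotector rework depends on.
work=$(mktemp -d)
echo 'void probe(void) { }' > "$work/probe.c"
if gcc -m32 -fstack-protector \
       -mstack-protector-guard-reg=fs \
       -mstack-protector-guard-symbol=__stack_chk_guard \
       -c "$work/probe.c" -o "$work/probe.o" 2>/dev/null; then
    result=supported
else
    result=unsupported
fi
echo "stack-protector guard options: $result"
# report the compiler version when gcc is present at all
gcc --version 2>/dev/null | head -n1 || true
rm -rf "$work"
```

An "unsupported" result means either an old compiler or a toolchain that cannot target 32-bit x86; with this series applied, the kernel build silently disables stackprotector in that case.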

--
Brian Gerst

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH v2 1/2] x86/stackprotector/32: Make the canary into a regular percpu variable
  2022-11-09 22:33             ` Brian Gerst
@ 2022-11-10 14:02               ` Andy Shevchenko
  2022-11-10 19:36               ` Ferry Toth
  1 sibling, 0 replies; 24+ messages in thread
From: Andy Shevchenko @ 2022-11-10 14:02 UTC (permalink / raw)
  To: Brian Gerst
  Cc: Ferry Toth, Dave Hansen, Andy Lutomirski, x86, LKML, Sedat Dilek,
	Nick Desaulniers, Sean Christopherson, Joerg Roedel

On Wed, Nov 09, 2022 at 05:33:24PM -0500, Brian Gerst wrote:
> On Wed, Nov 9, 2022 at 9:50 AM Andy Shevchenko
> <andriy.shevchenko@intel.com> wrote:
> > On Fri, Sep 30, 2022 at 11:18:51PM +0200, Ferry Toth wrote:
> > > On 30-09-2022 at 22:30, Ferry Toth wrote:
> > > > On 29-09-2022 at 16:20, Andy Shevchenko wrote:
> > > > > On Thu, Sep 29, 2022 at 04:56:07PM +0300, Andy Shevchenko wrote:
> > > > > > On Sat, Feb 13, 2021 at 11:19:44AM -0800, Andy Lutomirski wrote:
> > > > > > > [commit message snipped; quoted in full earlier in the thread]
> > > > > > This patch broke 32-bit boot on Intel Merrifield
> > > > > >
> > > > > > [bisect log snipped; quoted in full earlier in the thread]
> > > > > > # first bad commit: [3fb0fdb3bbe7aed495109b3296b06c2409734023]
> > > > > > x86/stackprotector/32: Make the canary into a regular percpu
> > > > > > variable
> > > >
> > > > With the bad commit the last words in dmesg are:
> > > >
> > > > [boot log snipped; quoted in full earlier in the thread]
> > > >
> > > > without the bad commit dmesg continues:
> > > >
> > > > [boot log snipped]
> > > >
> > > > > > Any suggestions how to fix are welcome!
> > > > > >
> > > Interesting. I added the following fragment to the kernel config:
> > >
> > > # CONFIG_STACKPROTECTOR is not set
> > >
> > > And this resolves the boot issue (tested with v5.17 i686 on Intel
> > > Merrifield)
> >
> > I'm not sure that's the correct approach.
> > Any answer from Andy Lutomirski?
> >
> > And in general to the x86 maintainers: do we support all features on x86 32-bit? If
> > not, can that be stated explicitly, please?
> 
> What compiler version are you using?

I guess it was gcc 11 and is now gcc 12 (as shipped in the Debian unstable distro).

Currently
gcc (Debian 12.2.0-3) 12.2.0

-- 
With Best Regards,
Andy Shevchenko



^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH v2 1/2] x86/stackprotector/32: Make the canary into a regular percpu variable
  2022-11-09 22:33             ` Brian Gerst
  2022-11-10 14:02               ` Andy Shevchenko
@ 2022-11-10 19:36               ` Ferry Toth
  2022-11-14 21:43                 ` Brian Gerst
  1 sibling, 1 reply; 24+ messages in thread
From: Ferry Toth @ 2022-11-10 19:36 UTC (permalink / raw)
  To: Brian Gerst, Andy Shevchenko
  Cc: Dave Hansen, Andy Lutomirski, x86, LKML, Sedat Dilek,
	Nick Desaulniers, Sean Christopherson, Joerg Roedel

Hi,

On 09-11-2022 at 23:33, Brian Gerst wrote:
> On Wed, Nov 9, 2022 at 9:50 AM Andy Shevchenko
> <andriy.shevchenko@intel.com> wrote:
>> On Fri, Sep 30, 2022 at 11:18:51PM +0200, Ferry Toth wrote:
>>> On 30-09-2022 at 22:30, Ferry Toth wrote:
>>>> On 29-09-2022 at 16:20, Andy Shevchenko wrote:
>>>>> On Thu, Sep 29, 2022 at 04:56:07PM +0300, Andy Shevchenko wrote:
>>>>>> On Sat, Feb 13, 2021 at 11:19:44AM -0800, Andy Lutomirski wrote:
>>>>>>> [commit message snipped; quoted in full earlier in the thread]
>>>>>> This patch broke 32-bit boot on Intel Merrifield
>>>>>>
>>>>>> [bisect log snipped; quoted in full earlier in the thread]
>>>>>> # first bad commit: [3fb0fdb3bbe7aed495109b3296b06c2409734023]
>>>>>> x86/stackprotector/32: Make the canary into a regular percpu
>>>>>> variable
>>>> With the bad commit the last words in dmesg are:
>>>>
>>>> [boot log snipped; quoted in full earlier in the thread]
>>>>
>>>> without the bad commit dmesg continues:
>>>>
>>>> [boot log snipped]
>>>>
>>>>>> Any suggestions how to fix are welcome!
>>>>>>
>>> Interesting. I added the following fragment to the kernel config:
>>>
>>> # CONFIG_STACKPROTECTOR is not set
>>>
>>> And this resolves the boot issue (tested with v5.17 i686 on Intel
>>> Merrifield)
>> I'm not sure that's the correct approach.
I didn't intend it as a resolution, merely as a workaround. And since a
revert was not possible, as proof that the issue is localized in the
stack protector code.
>> Any answer from Andy Lutomirski?
>>
>> And in general to the x86 maintainers: do we support all features on x86 32-bit? If
>> not, can that be stated explicitly, please?
> What compiler version are you using?

I built with Yocto Honister, which builds its own cross-compiler, gcc
11.2. For completeness:

root@yuna:~# uname -a
Linux yuna 5.17.0-edison-acpi-standard #1 SMP PREEMPT Sun Mar 20 
20:14:17 UTC 2022 i686 i686 i386 GNU/Linux

root@yuna:~# cat /proc/version
Linux version 5.17.0-edison-acpi-standard (oe-user@oe-host) 
(i686-poky-linux-gcc (GCC) 11.2.0, GNU ld (GNU Binutils) 2.37.20210721) 
#1 SMP PREEMPT Sun Mar 20 20:14:17 UTC 2022

root@yuna:~# cat /etc/os-release
ID=poky-edison
NAME="Poky (Yocto Project Reference Distro)"
VERSION="3.4.4 (honister)"
VERSION_ID=3.4.4
PRETTY_NAME="Poky (Yocto Project Reference Distro) 3.4.4 (honister)"

> --
> Brian Gerst

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH v2 1/2] x86/stackprotector/32: Make the canary into a regular percpu variable
  2022-11-10 19:36               ` Ferry Toth
@ 2022-11-14 21:43                 ` Brian Gerst
  2022-11-14 22:16                   ` Ferry Toth
  0 siblings, 1 reply; 24+ messages in thread
From: Brian Gerst @ 2022-11-14 21:43 UTC (permalink / raw)
  To: Ferry Toth
  Cc: Andy Shevchenko, Dave Hansen, Andy Lutomirski, x86, LKML,
	Sedat Dilek, Nick Desaulniers, Sean Christopherson, Joerg Roedel

On Thu, Nov 10, 2022 at 2:36 PM Ferry Toth <fntoth@gmail.com> wrote:
>
> Hi,
>
> On 09-11-2022 at 23:33, Brian Gerst wrote:
> > On Wed, Nov 9, 2022 at 9:50 AM Andy Shevchenko
> > <andriy.shevchenko@intel.com> wrote:
> >> On Fri, Sep 30, 2022 at 11:18:51PM +0200, Ferry Toth wrote:
> >>> Op 30-09-2022 om 22:30 schreef Ferry Toth:
> >>>> Op 29-09-2022 om 16:20 schreef Andy Shevchenko:
> >>>>> On Thu, Sep 29, 2022 at 04:56:07PM +0300, Andy Shevchenko wrote:
> >>>>>> On Sat, Feb 13, 2021 at 11:19:44AM -0800, Andy Lutomirski wrote:
> >>>>>>> On 32-bit kernels, the stackprotector canary is quite nasty -- it is
> >>>>>>> stored at %gs:(20), which is nasty because 32-bit kernels use %fs for
> >>>>>>> percpu storage.  It's even nastier because it means that whether %gs
> >>>>>>> contains userspace state or kernel state while running kernel code
> >>>>>>> depends on whether stackprotector is enabled (this is
> >>>>>>> CONFIG_X86_32_LAZY_GS), and this setting radically changes the way
> >>>>>>> that segment selectors work.  Supporting both variants is a
> >>>>>>> maintenance and testing mess.
> >>>>>>>
> >>>>>>> Merely rearranging so that percpu and the stack canary
> >>>>>>> share the same segment would be messy as the 32-bit percpu address
> >>>>>>> layout isn't currently compatible with putting a variable at a fixed
> >>>>>>> offset.
> >>>>>>>
> >>>>>>> Fortunately, GCC 8.1 added options that allow the stack canary to be
> >>>>>>> accessed as %fs:__stack_chk_guard, effectively turning it
> >>>>>>> into an ordinary
> >>>>>>> percpu variable.  This lets us get rid of all of the code to
> >>>>>>> manage the
> >>>>>>> stack canary GDT descriptor and the CONFIG_X86_32_LAZY_GS mess.
> >>>>>>>
> >>>>>>> (That name is special.  We could use any symbol we want for the
> >>>>>>>    %fs-relative mode, but for CONFIG_SMP=n, gcc refuses to
> >>>>>>> let us use any
> >>>>>>>    name other than __stack_chk_guard.)
> >>>>>>>
> >>>>>>> This patch forcibly disables stackprotector on older compilers that
> >>>>>>> don't support the new options and makes the stack canary into a
> >>>>>>> percpu variable.  The "lazy GS" approach is now used for all 32-bit
> >>>>>>> configurations.
> >>>>>>>
> >>>>>>> This patch also makes load_gs_index() work on 32-bit kernels.  On
> >>>>>>> 64-bit kernels, it loads the GS selector and updates the user
> >>>>>>> GSBASE accordingly.  (This is unchanged.)  On 32-bit kernels,
> >>>>>>> it loads the GS selector and updates GSBASE, which is now
> >>>>>>> always the user base.  This means that the overall effect is
> >>>>>>> the same on 32-bit and 64-bit, which avoids some ifdeffery.
> >>>>>> This patch broke 32-bit boot on Intel Merrifield
> >>>>>>
> >>>>>> git bisect start
> >>>>>> # good: [9f4ad9e425a1d3b6a34617b8ea226d56a119a717] Linux 5.12
> >>>>>> git bisect good 9f4ad9e425a1d3b6a34617b8ea226d56a119a717
> >>>>>> # bad: [62fb9874f5da54fdb243003b386128037319b219] Linux 5.13
> >>>>>> git bisect bad 62fb9874f5da54fdb243003b386128037319b219
> >>>>>> # bad: [85f3f17b5db2dd9f8a094a0ddc665555135afd22] Merge branch
> >>>>>> 'md-fixes' of
> >>>>>> https://git.kernel.org/pub/scm/linux/kernel/git/song/md into
> >>>>>> block-5.13
> >>>>>> git bisect bad 85f3f17b5db2dd9f8a094a0ddc665555135afd22
> >>>>>> # good: [ca62e9090d229926f43f20291bb44d67897baab7] Merge tag
> >>>>>> 'regulator-v5.13' of
> >>>>>> git://git.kernel.org/pub/scm/linux/kernel/git/broonie/regulator
> >>>>>> git bisect good ca62e9090d229926f43f20291bb44d67897baab7
> >>>>>> # bad: [68a32ba14177d4a21c4a9a941cf1d7aea86d436f] Merge tag
> >>>>>> 'drm-next-2021-04-28' of git://anongit.freedesktop.org/drm/drm
> >>>>>> git bisect bad 68a32ba14177d4a21c4a9a941cf1d7aea86d436f
> >>>>>> # good: [49c70ece54b0d1c51bc31b2b0c1070777c992c26]
> >>>>>> drm/amd/display: Change input parameter for set_drr
> >>>>>> git bisect good 49c70ece54b0d1c51bc31b2b0c1070777c992c26
> >>>>>> # good: [0b276e470a4d43e1365d3eb53c608a3d208cabd4] media: coda:
> >>>>>> fix macroblocks count control usage
> >>>>>> git bisect good 0b276e470a4d43e1365d3eb53c608a3d208cabd4
> >>>>>> # bad: [c6536676c7fe3f572ba55842e59c3c71c01e7fb3] Merge tag
> >>>>>> 'x86_core_for_v5.13' of
> >>>>>> git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
> >>>>>> git bisect bad c6536676c7fe3f572ba55842e59c3c71c01e7fb3
> >>>>>> # good: [d1466bc583a81830cef2399a4b8a514398351b40] Merge branch
> >>>>>> 'work.inode-type-fixes' of
> >>>>>> git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs
> >>>>>> git bisect good d1466bc583a81830cef2399a4b8a514398351b40
> >>>>>> # good: [fafe1e39ed213221c0bce6b0b31669334368dc97] Merge tag
> >>>>>> 'afs-netfs-lib-20210426' of
> >>>>>> git://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-fs
> >>>>>> git bisect good fafe1e39ed213221c0bce6b0b31669334368dc97
> >>>>>> # bad: [b1f480bc0686e65d5413c035bd13af2ea4888784] Merge branch
> >>>>>> 'x86/cpu' into WIP.x86/core, to merge the NOP changes & resolve
> >>>>>> a semantic conflict
> >>>>>> git bisect bad b1f480bc0686e65d5413c035bd13af2ea4888784
> >>>>>> # bad: [0c925c61dae18ee3cb93a61cc9dd9562a066034d]
> >>>>>> x86/tools/insn_decoder_test: Convert to insn_decode()
> >>>>>> git bisect bad 0c925c61dae18ee3cb93a61cc9dd9562a066034d
> >>>>>> # bad: [514ef77607b9ff184c11b88e8f100bc27f07460d]
> >>>>>> x86/boot/compressed/sev-es: Convert to insn_decode()
> >>>>>> git bisect bad 514ef77607b9ff184c11b88e8f100bc27f07460d
> >>>>>> # bad: [9e761296c52dcdb1aaa151b65bd39accb05740d9] x86/insn:
> >>>>>> Rename insn_decode() to insn_decode_from_regs()
> >>>>>> git bisect bad 9e761296c52dcdb1aaa151b65bd39accb05740d9
> >>>>>> # bad: [d0962f2b24c99889a386f0658c71535f56358f77] x86/entry/32:
> >>>>>> Remove leftover macros after stackprotector cleanups
> >>>>>> git bisect bad d0962f2b24c99889a386f0658c71535f56358f77
> >>>>>> # bad: [3fb0fdb3bbe7aed495109b3296b06c2409734023]
> >>>>>> x86/stackprotector/32: Make the canary into a regular percpu
> >>>>>> variable
> >>>>>> git bisect bad 3fb0fdb3bbe7aed495109b3296b06c2409734023
> >>>>>> # first bad commit: [3fb0fdb3bbe7aed495109b3296b06c2409734023]
> >>>>>> x86/stackprotector/32: Make the canary into a regular percpu
> >>>>>> variable
> >>>> With the bad commit the last words in dmesg are:
> >>>>
> >>>> mem auto-init: stack:off, heap alloc:off, heap free:off
> >>>> Initializing HighMem for node 0 (00036ffe:0003f500)
> >>>> Initializing Movable for node 0 (00000000:00000000)
> >>>> Checking if this processor honours the WP bit even in supervisor
> >>>> mode...Ok.
> >>>> Memory: 948444K/1004124K available (12430K kernel code, 2167K rwdata,
> >>>> 4948K rodata, 716K init, 716K bss, 55680K reserved, 0K cma-reserved,
> >>>> 136200K highmem)
> >>>> SLUB: HWalign=64, Order=0-3, MinObjects=0, CPUs=2, Nodes=1
> >>>> trace event string verifier disabled
> >>>> Dynamic Preempt: voluntary
> >>>> rcu: Preemptible hierarchical RCU implementation.
> >>>> rcu:     RCU event tracing is enabled.
> >>>> rcu:     RCU restricting CPUs from NR_CPUS=8 to nr_cpu_ids=2.
> >>>>   Trampoline variant of Tasks RCU enabled.
> >>>>   Tracing variant of Tasks RCU enabled.
> >>>> rcu: RCU calculated value of scheduler-enlistment delay is 100 jiffies.
> >>>> rcu: Adjusting geometry for rcu_fanout_leaf=16, nr_cpu_ids=2
> >>>> NR_IRQS: 2304, nr_irqs: 512, preallocated irqs: 0
> >>>>
> >>>> without the bad commit dmesg continues:
> >>>>
> >>>> random: get_random_bytes called from start_kernel+0x492/0x65a with
> >>>> crng_init=0
> >>>> Console: colour dummy device 80x25
> >>>> printk: console [tty0] enabled
> >>>> printk: bootconsole [uart0] disabled
> >>>>
> >>>> ....
> >>>>
> >>>>>> Any suggestions how to fix are welcome!
> >>>>>>
> >>> Interesting. I added the following fragment to the kernel config:
> >>>
> >>> # CONFIG_STACKPROTECTOR is not set
> >>>
> >>> And this resolves the boot issue (tested with v5.17 i686 on Intel
> >>> Merrifield)
> >> I'm not sure that's the correct approach.
> I didn't intend as a resolution, merely as a workaround. And since
> revert was not possible, as proof issue is localized in stack protector.
> >> Any answer from the Andy Lutomirski?
> >>
> >> And in general to x86 maintainers, do we support all features on x86 32-bit? If
> >> no, can it be said explicitly, please?
> > What compiler version are you using?
>
> I built with Yocto Honister which builds it's own cross-compiler gcc
> 11.2. For completeness:
>
> root@yuna:~# uname -a
> Linux yuna 5.17.0-edison-acpi-standard #1 SMP PREEMPT Sun Mar 20
> 20:14:17 UTC 2022 i686 i686 i386 GNU/Linux
>
> root@yuna:~# cat /proc/version
> Linux version 5.17.0-edison-acpi-standard (oe-user@oe-host)
> (i686-poky-linux-gcc (GCC) 11.2.0, GNU ld (GNU Binutils) 2.37.20210721)
> #1 SMP PREEMPT Sun Mar 20 20:14:17 UTC 2022
>
> root@yuna:~# cat /etc/os-release
> ID=poky-edison
> NAME="Poky (Yocto Project Reference Distro)"
> VERSION="3.4.4 (honister)"
> VERSION_ID=3.4.4
> PRETTY_NAME="Poky (Yocto Project Reference Distro) 3.4.4 (honister)"
>
> > --
> > Brian Gerst

What exactly happens when it fails (hang/reboot/oops)?

Does removing the call to boot_init_stack_canary() in init/main.c fix
the problem?

--
Brian Gerst

^ permalink raw reply	[flat|nested] 24+ messages in thread
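The compiler-capability question raised in the thread can be checked directly. The probe below is a sketch in the spirit of the kernel's scripts/gcc-x86_32-has-stack-protector.sh (the exact in-tree script differs); it assumes plain `gcc` is the compiler under test and that the GCC 8.1+ option names are as documented:

```shell
# Probe whether this gcc accepts the %fs-relative canary options the
# patch relies on.  If the compiler lacks them (or lacks -m32 support),
# the pipeline produces no %fs reference and we report "unsupported".
result=$(echo 'void bar(char *); void foo(void) { char x[128]; bar(x); }' \
  | gcc -x c - -m32 -O0 -S -o - \
        -fstack-protector \
        -mstack-protector-guard-reg=fs \
        -mstack-protector-guard-symbol=__stack_chk_guard 2>/dev/null \
  | grep -q '%fs' && echo supported || echo unsupported)
echo "$result"
```

If this prints `unsupported`, the build is expected to fall back to disabling stackprotector on 32-bit, which is what the patch's Kconfig change arranges.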

* Re: [PATCH v2 1/2] x86/stackprotector/32: Make the canary into a regular percpu variable
  2022-11-14 21:43                 ` Brian Gerst
@ 2022-11-14 22:16                   ` Ferry Toth
  2022-11-17 19:28                     ` Ferry Toth
  0 siblings, 1 reply; 24+ messages in thread
From: Ferry Toth @ 2022-11-14 22:16 UTC (permalink / raw)
  To: Brian Gerst
  Cc: Andy Shevchenko, Dave Hansen, Andy Lutomirski, x86, LKML,
	Sedat Dilek, Nick Desaulniers, Sean Christopherson, Joerg Roedel

Hi

Op 14-11-2022 om 22:43 schreef Brian Gerst:
> On Thu, Nov 10, 2022 at 2:36 PM Ferry Toth <fntoth@gmail.com> wrote:
>> Hi,
>>
>> Op 09-11-2022 om 23:33 schreef Brian Gerst:
>>> On Wed, Nov 9, 2022 at 9:50 AM Andy Shevchenko
>>> <andriy.shevchenko@intel.com> wrote:
>>>> On Fri, Sep 30, 2022 at 11:18:51PM +0200, Ferry Toth wrote:
>>>>> Op 30-09-2022 om 22:30 schreef Ferry Toth:
>>>>>> Op 29-09-2022 om 16:20 schreef Andy Shevchenko:
>>>>>>> On Thu, Sep 29, 2022 at 04:56:07PM +0300, Andy Shevchenko wrote:
>>>>>>>> On Sat, Feb 13, 2021 at 11:19:44AM -0800, Andy Lutomirski wrote:
>>>>>>>>> On 32-bit kernels, the stackprotector canary is quite nasty -- it is
>>>>>>>>> stored at %gs:(20), which is nasty because 32-bit kernels use %fs for
>>>>>>>>> percpu storage.  It's even nastier because it means that whether %gs
>>>>>>>>> contains userspace state or kernel state while running kernel code
>>>>>>>>> depends on whether stackprotector is enabled (this is
>>>>>>>>> CONFIG_X86_32_LAZY_GS), and this setting radically changes the way
>>>>>>>>> that segment selectors work.  Supporting both variants is a
>>>>>>>>> maintenance and testing mess.
>>>>>>>>>
>>>>>>>>> Merely rearranging so that percpu and the stack canary
>>>>>>>>> share the same segment would be messy as the 32-bit percpu address
>>>>>>>>> layout isn't currently compatible with putting a variable at a fixed
>>>>>>>>> offset.
>>>>>>>>>
>>>>>>>>> Fortunately, GCC 8.1 added options that allow the stack canary to be
>>>>>>>>> accessed as %fs:__stack_chk_guard, effectively turning it
>>>>>>>>> into an ordinary
>>>>>>>>> percpu variable.  This lets us get rid of all of the code to
>>>>>>>>> manage the
>>>>>>>>> stack canary GDT descriptor and the CONFIG_X86_32_LAZY_GS mess.
>>>>>>>>>
>>>>>>>>> (That name is special.  We could use any symbol we want for the
>>>>>>>>>     %fs-relative mode, but for CONFIG_SMP=n, gcc refuses to
>>>>>>>>> let us use any
>>>>>>>>>     name other than __stack_chk_guard.)
>>>>>>>>>
>>>>>>>>> This patch forcibly disables stackprotector on older compilers that
>>>>>>>>> don't support the new options and makes the stack canary into a
>>>>>>>>> percpu variable.  The "lazy GS" approach is now used for all 32-bit
>>>>>>>>> configurations.
>>>>>>>>>
>>>>>>>>> This patch also makes load_gs_index() work on 32-bit kernels.  On
>>>>>>>>> 64-bit kernels, it loads the GS selector and updates the user
>>>>>>>>> GSBASE accordingly.  (This is unchanged.)  On 32-bit kernels,
>>>>>>>>> it loads the GS selector and updates GSBASE, which is now
>>>>>>>>> always the user base.  This means that the overall effect is
>>>>>>>>> the same on 32-bit and 64-bit, which avoids some ifdeffery.
>>>>>>>> This patch broke 32-bit boot on Intel Merrifield
>>>>>>>>
>>>>>>>> git bisect start
>>>>>>>> # good: [9f4ad9e425a1d3b6a34617b8ea226d56a119a717] Linux 5.12
>>>>>>>> git bisect good 9f4ad9e425a1d3b6a34617b8ea226d56a119a717
>>>>>>>> # bad: [62fb9874f5da54fdb243003b386128037319b219] Linux 5.13
>>>>>>>> git bisect bad 62fb9874f5da54fdb243003b386128037319b219
>>>>>>>> # bad: [85f3f17b5db2dd9f8a094a0ddc665555135afd22] Merge branch
>>>>>>>> 'md-fixes' of
>>>>>>>> https://git.kernel.org/pub/scm/linux/kernel/git/song/md into
>>>>>>>> block-5.13
>>>>>>>> git bisect bad 85f3f17b5db2dd9f8a094a0ddc665555135afd22
>>>>>>>> # good: [ca62e9090d229926f43f20291bb44d67897baab7] Merge tag
>>>>>>>> 'regulator-v5.13' of
>>>>>>>> git://git.kernel.org/pub/scm/linux/kernel/git/broonie/regulator
>>>>>>>> git bisect good ca62e9090d229926f43f20291bb44d67897baab7
>>>>>>>> # bad: [68a32ba14177d4a21c4a9a941cf1d7aea86d436f] Merge tag
>>>>>>>> 'drm-next-2021-04-28' of git://anongit.freedesktop.org/drm/drm
>>>>>>>> git bisect bad 68a32ba14177d4a21c4a9a941cf1d7aea86d436f
>>>>>>>> # good: [49c70ece54b0d1c51bc31b2b0c1070777c992c26]
>>>>>>>> drm/amd/display: Change input parameter for set_drr
>>>>>>>> git bisect good 49c70ece54b0d1c51bc31b2b0c1070777c992c26
>>>>>>>> # good: [0b276e470a4d43e1365d3eb53c608a3d208cabd4] media: coda:
>>>>>>>> fix macroblocks count control usage
>>>>>>>> git bisect good 0b276e470a4d43e1365d3eb53c608a3d208cabd4
>>>>>>>> # bad: [c6536676c7fe3f572ba55842e59c3c71c01e7fb3] Merge tag
>>>>>>>> 'x86_core_for_v5.13' of
>>>>>>>> git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
>>>>>>>> git bisect bad c6536676c7fe3f572ba55842e59c3c71c01e7fb3
>>>>>>>> # good: [d1466bc583a81830cef2399a4b8a514398351b40] Merge branch
>>>>>>>> 'work.inode-type-fixes' of
>>>>>>>> git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs
>>>>>>>> git bisect good d1466bc583a81830cef2399a4b8a514398351b40
>>>>>>>> # good: [fafe1e39ed213221c0bce6b0b31669334368dc97] Merge tag
>>>>>>>> 'afs-netfs-lib-20210426' of
>>>>>>>> git://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-fs
>>>>>>>> git bisect good fafe1e39ed213221c0bce6b0b31669334368dc97
>>>>>>>> # bad: [b1f480bc0686e65d5413c035bd13af2ea4888784] Merge branch
>>>>>>>> 'x86/cpu' into WIP.x86/core, to merge the NOP changes & resolve
>>>>>>>> a semantic conflict
>>>>>>>> git bisect bad b1f480bc0686e65d5413c035bd13af2ea4888784
>>>>>>>> # bad: [0c925c61dae18ee3cb93a61cc9dd9562a066034d]
>>>>>>>> x86/tools/insn_decoder_test: Convert to insn_decode()
>>>>>>>> git bisect bad 0c925c61dae18ee3cb93a61cc9dd9562a066034d
>>>>>>>> # bad: [514ef77607b9ff184c11b88e8f100bc27f07460d]
>>>>>>>> x86/boot/compressed/sev-es: Convert to insn_decode()
>>>>>>>> git bisect bad 514ef77607b9ff184c11b88e8f100bc27f07460d
>>>>>>>> # bad: [9e761296c52dcdb1aaa151b65bd39accb05740d9] x86/insn:
>>>>>>>> Rename insn_decode() to insn_decode_from_regs()
>>>>>>>> git bisect bad 9e761296c52dcdb1aaa151b65bd39accb05740d9
>>>>>>>> # bad: [d0962f2b24c99889a386f0658c71535f56358f77] x86/entry/32:
>>>>>>>> Remove leftover macros after stackprotector cleanups
>>>>>>>> git bisect bad d0962f2b24c99889a386f0658c71535f56358f77
>>>>>>>> # bad: [3fb0fdb3bbe7aed495109b3296b06c2409734023]
>>>>>>>> x86/stackprotector/32: Make the canary into a regular percpu
>>>>>>>> variable
>>>>>>>> git bisect bad 3fb0fdb3bbe7aed495109b3296b06c2409734023
>>>>>>>> # first bad commit: [3fb0fdb3bbe7aed495109b3296b06c2409734023]
>>>>>>>> x86/stackprotector/32: Make the canary into a regular percpu
>>>>>>>> variable
>>>>>> With the bad commit the last words in dmesg are:
>>>>>>
>>>>>> mem auto-init: stack:off, heap alloc:off, heap free:off
>>>>>> Initializing HighMem for node 0 (00036ffe:0003f500)
>>>>>> Initializing Movable for node 0 (00000000:00000000)
>>>>>> Checking if this processor honours the WP bit even in supervisor
>>>>>> mode...Ok.
>>>>>> Memory: 948444K/1004124K available (12430K kernel code, 2167K rwdata,
>>>>>> 4948K rodata, 716K init, 716K bss, 55680K reserved, 0K cma-reserved,
>>>>>> 136200K highmem)
>>>>>> SLUB: HWalign=64, Order=0-3, MinObjects=0, CPUs=2, Nodes=1
>>>>>> trace event string verifier disabled
>>>>>> Dynamic Preempt: voluntary
>>>>>> rcu: Preemptible hierarchical RCU implementation.
>>>>>> rcu:     RCU event tracing is enabled.
>>>>>> rcu:     RCU restricting CPUs from NR_CPUS=8 to nr_cpu_ids=2.
>>>>>>    Trampoline variant of Tasks RCU enabled.
>>>>>>    Tracing variant of Tasks RCU enabled.
>>>>>> rcu: RCU calculated value of scheduler-enlistment delay is 100 jiffies.
>>>>>> rcu: Adjusting geometry for rcu_fanout_leaf=16, nr_cpu_ids=2
>>>>>> NR_IRQS: 2304, nr_irqs: 512, preallocated irqs: 0
>>>>>>
>>>>>> without the bad commit dmesg continues:
>>>>>>
>>>>>> random: get_random_bytes called from start_kernel+0x492/0x65a with
>>>>>> crng_init=0
>>>>>> Console: colour dummy device 80x25
>>>>>> printk: console [tty0] enabled
>>>>>> printk: bootconsole [uart0] disabled
>>>>>>
>>>>>> ....
>>>>>>
>>>>>>>> Any suggestions how to fix are welcome!
>>>>>>>>
>>>>> Interesting. I added the following fragment to the kernel config:
>>>>>
>>>>> # CONFIG_STACKPROTECTOR is not set
>>>>>
>>>>> And this resolves the boot issue (tested with v5.17 i686 on Intel
>>>>> Merrifield)
>>>> I'm not sure that's the correct approach.
>> I didn't intend as a resolution, merely as a workaround. And since
>> revert was not possible, as proof issue is localized in stack protector.
>>>> Any answer from the Andy Lutomirski?
>>>>
>>>> And in general to x86 maintainers, do we support all features on x86 32-bit? If
>>>> no, can it be said explicitly, please?
>>> What compiler version are you using?
>> I built with Yocto Honister which builds it's own cross-compiler gcc
>> 11.2. For completeness:
>>
>> root@yuna:~# uname -a
>> Linux yuna 5.17.0-edison-acpi-standard #1 SMP PREEMPT Sun Mar 20
>> 20:14:17 UTC 2022 i686 i686 i386 GNU/Linux
>>
>> root@yuna:~# cat /proc/version
>> Linux version 5.17.0-edison-acpi-standard (oe-user@oe-host)
>> (i686-poky-linux-gcc (GCC) 11.2.0, GNU ld (GNU Binutils) 2.37.20210721)
>> #1 SMP PREEMPT Sun Mar 20 20:14:17 UTC 2022
>>
>> root@yuna:~# cat /etc/os-release
>> ID=poky-edison
>> NAME="Poky (Yocto Project Reference Distro)"
>> VERSION="3.4.4 (honister)"
>> VERSION_ID=3.4.4
>> PRETTY_NAME="Poky (Yocto Project Reference Distro) 3.4.4 (honister)"
>>
>>> --
>>> Brian Gerst
> What exactly happens when it fails (hang/reboot/oops)?

It just hangs. Rebooting into another kernel with CONFIG_STACKPROTECTOR
not set and displaying the journal of the failed boot shows the last
words mentioned above.

> Does removing the call to boot_init_stack_canary() in init/main.c fix
> the problem?
I will try tomorrow.
> --
> Brian Gerst

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH v2 1/2] x86/stackprotector/32: Make the canary into a regular percpu variable
  2022-11-14 22:16                   ` Ferry Toth
@ 2022-11-17 19:28                     ` Ferry Toth
  0 siblings, 0 replies; 24+ messages in thread
From: Ferry Toth @ 2022-11-17 19:28 UTC (permalink / raw)
  To: Brian Gerst
  Cc: Andy Shevchenko, Dave Hansen, Andy Lutomirski, x86, LKML,
	Sedat Dilek, Nick Desaulniers, Sean Christopherson, Joerg Roedel

Hi

Op 14-11-2022 om 23:16 schreef Ferry Toth:
> Hi
>
> Op 14-11-2022 om 22:43 schreef Brian Gerst:
>> On Thu, Nov 10, 2022 at 2:36 PM Ferry Toth <fntoth@gmail.com> wrote:
>>> Hi,
>>>
>>> Op 09-11-2022 om 23:33 schreef Brian Gerst:
>>>> On Wed, Nov 9, 2022 at 9:50 AM Andy Shevchenko
>>>> <andriy.shevchenko@intel.com> wrote:
>>>>> On Fri, Sep 30, 2022 at 11:18:51PM +0200, Ferry Toth wrote:
>>>>>> Op 30-09-2022 om 22:30 schreef Ferry Toth:
>>>>>>> Op 29-09-2022 om 16:20 schreef Andy Shevchenko:
>>>>>>>> On Thu, Sep 29, 2022 at 04:56:07PM +0300, Andy Shevchenko wrote:
>>>>>>>>> On Sat, Feb 13, 2021 at 11:19:44AM -0800, Andy Lutomirski wrote:
>>>>>>>>>> On 32-bit kernels, the stackprotector canary is quite nasty 
>>>>>>>>>> -- it is
>>>>>>>>>> stored at %gs:(20), which is nasty because 32-bit kernels use 
>>>>>>>>>> %fs for
>>>>>>>>>> percpu storage.  It's even nastier because it means that 
>>>>>>>>>> whether %gs
>>>>>>>>>> contains userspace state or kernel state while running kernel 
>>>>>>>>>> code
>>>>>>>>>> depends on whether stackprotector is enabled (this is
>>>>>>>>>> CONFIG_X86_32_LAZY_GS), and this setting radically changes 
>>>>>>>>>> the way
>>>>>>>>>> that segment selectors work.  Supporting both variants is a
>>>>>>>>>> maintenance and testing mess.
>>>>>>>>>>
>>>>>>>>>> Merely rearranging so that percpu and the stack canary
>>>>>>>>>> share the same segment would be messy as the 32-bit percpu 
>>>>>>>>>> address
>>>>>>>>>> layout isn't currently compatible with putting a variable at 
>>>>>>>>>> a fixed
>>>>>>>>>> offset.
>>>>>>>>>>
>>>>>>>>>> Fortunately, GCC 8.1 added options that allow the stack 
>>>>>>>>>> canary to be
>>>>>>>>>> accessed as %fs:__stack_chk_guard, effectively turning it
>>>>>>>>>> into an ordinary
>>>>>>>>>> percpu variable.  This lets us get rid of all of the code to
>>>>>>>>>> manage the
>>>>>>>>>> stack canary GDT descriptor and the CONFIG_X86_32_LAZY_GS mess.
>>>>>>>>>>
>>>>>>>>>> (That name is special.  We could use any symbol we want for the
>>>>>>>>>>     %fs-relative mode, but for CONFIG_SMP=n, gcc refuses to
>>>>>>>>>> let us use any
>>>>>>>>>>     name other than __stack_chk_guard.)
>>>>>>>>>>
>>>>>>>>>> This patch forcibly disables stackprotector on older 
>>>>>>>>>> compilers that
>>>>>>>>>> don't support the new options and makes the stack canary into a
>>>>>>>>>> percpu variable.  The "lazy GS" approach is now used for all 
>>>>>>>>>> 32-bit
>>>>>>>>>> configurations.
>>>>>>>>>>
>>>>>>>>>> This patch also makes load_gs_index() work on 32-bit 
>>>>>>>>>> kernels.  On
>>>>>>>>>> 64-bit kernels, it loads the GS selector and updates the user
>>>>>>>>>> GSBASE accordingly.  (This is unchanged.)  On 32-bit kernels,
>>>>>>>>>> it loads the GS selector and updates GSBASE, which is now
>>>>>>>>>> always the user base.  This means that the overall effect is
>>>>>>>>>> the same on 32-bit and 64-bit, which avoids some ifdeffery.
>>>>>>>>> This patch broke 32-bit boot on Intel Merrifield
>>>>>>>>>
>>>>>>>>> git bisect start
>>>>>>>>> # good: [9f4ad9e425a1d3b6a34617b8ea226d56a119a717] Linux 5.12
>>>>>>>>> git bisect good 9f4ad9e425a1d3b6a34617b8ea226d56a119a717
>>>>>>>>> # bad: [62fb9874f5da54fdb243003b386128037319b219] Linux 5.13
>>>>>>>>> git bisect bad 62fb9874f5da54fdb243003b386128037319b219
>>>>>>>>> # bad: [85f3f17b5db2dd9f8a094a0ddc665555135afd22] Merge branch
>>>>>>>>> 'md-fixes' of
>>>>>>>>> https://git.kernel.org/pub/scm/linux/kernel/git/song/md into
>>>>>>>>> block-5.13
>>>>>>>>> git bisect bad 85f3f17b5db2dd9f8a094a0ddc665555135afd22
>>>>>>>>> # good: [ca62e9090d229926f43f20291bb44d67897baab7] Merge tag
>>>>>>>>> 'regulator-v5.13' of
>>>>>>>>> git://git.kernel.org/pub/scm/linux/kernel/git/broonie/regulator
>>>>>>>>> git bisect good ca62e9090d229926f43f20291bb44d67897baab7
>>>>>>>>> # bad: [68a32ba14177d4a21c4a9a941cf1d7aea86d436f] Merge tag
>>>>>>>>> 'drm-next-2021-04-28' of git://anongit.freedesktop.org/drm/drm
>>>>>>>>> git bisect bad 68a32ba14177d4a21c4a9a941cf1d7aea86d436f
>>>>>>>>> # good: [49c70ece54b0d1c51bc31b2b0c1070777c992c26]
>>>>>>>>> drm/amd/display: Change input parameter for set_drr
>>>>>>>>> git bisect good 49c70ece54b0d1c51bc31b2b0c1070777c992c26
>>>>>>>>> # good: [0b276e470a4d43e1365d3eb53c608a3d208cabd4] media: coda:
>>>>>>>>> fix macroblocks count control usage
>>>>>>>>> git bisect good 0b276e470a4d43e1365d3eb53c608a3d208cabd4
>>>>>>>>> # bad: [c6536676c7fe3f572ba55842e59c3c71c01e7fb3] Merge tag
>>>>>>>>> 'x86_core_for_v5.13' of
>>>>>>>>> git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
>>>>>>>>> git bisect bad c6536676c7fe3f572ba55842e59c3c71c01e7fb3
>>>>>>>>> # good: [d1466bc583a81830cef2399a4b8a514398351b40] Merge branch
>>>>>>>>> 'work.inode-type-fixes' of
>>>>>>>>> git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs
>>>>>>>>> git bisect good d1466bc583a81830cef2399a4b8a514398351b40
>>>>>>>>> # good: [fafe1e39ed213221c0bce6b0b31669334368dc97] Merge tag
>>>>>>>>> 'afs-netfs-lib-20210426' of
>>>>>>>>> git://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-fs
>>>>>>>>> git bisect good fafe1e39ed213221c0bce6b0b31669334368dc97
>>>>>>>>> # bad: [b1f480bc0686e65d5413c035bd13af2ea4888784] Merge branch
>>>>>>>>> 'x86/cpu' into WIP.x86/core, to merge the NOP changes & resolve
>>>>>>>>> a semantic conflict
>>>>>>>>> git bisect bad b1f480bc0686e65d5413c035bd13af2ea4888784
>>>>>>>>> # bad: [0c925c61dae18ee3cb93a61cc9dd9562a066034d]
>>>>>>>>> x86/tools/insn_decoder_test: Convert to insn_decode()
>>>>>>>>> git bisect bad 0c925c61dae18ee3cb93a61cc9dd9562a066034d
>>>>>>>>> # bad: [514ef77607b9ff184c11b88e8f100bc27f07460d]
>>>>>>>>> x86/boot/compressed/sev-es: Convert to insn_decode()
>>>>>>>>> git bisect bad 514ef77607b9ff184c11b88e8f100bc27f07460d
>>>>>>>>> # bad: [9e761296c52dcdb1aaa151b65bd39accb05740d9] x86/insn:
>>>>>>>>> Rename insn_decode() to insn_decode_from_regs()
>>>>>>>>> git bisect bad 9e761296c52dcdb1aaa151b65bd39accb05740d9
>>>>>>>>> # bad: [d0962f2b24c99889a386f0658c71535f56358f77] x86/entry/32:
>>>>>>>>> Remove leftover macros after stackprotector cleanups
>>>>>>>>> git bisect bad d0962f2b24c99889a386f0658c71535f56358f77
>>>>>>>>> # bad: [3fb0fdb3bbe7aed495109b3296b06c2409734023]
>>>>>>>>> x86/stackprotector/32: Make the canary into a regular percpu
>>>>>>>>> variable
>>>>>>>>> git bisect bad 3fb0fdb3bbe7aed495109b3296b06c2409734023
>>>>>>>>> # first bad commit: [3fb0fdb3bbe7aed495109b3296b06c2409734023]
>>>>>>>>> x86/stackprotector/32: Make the canary into a regular percpu
>>>>>>>>> variable
>>>>>>> With the bad commit the last words in dmesg are:
>>>>>>>
>>>>>>> mem auto-init: stack:off, heap alloc:off, heap free:off
>>>>>>> Initializing HighMem for node 0 (00036ffe:0003f500)
>>>>>>> Initializing Movable for node 0 (00000000:00000000)
>>>>>>> Checking if this processor honours the WP bit even in supervisor
>>>>>>> mode...Ok.
>>>>>>> Memory: 948444K/1004124K available (12430K kernel code, 2167K 
>>>>>>> rwdata,
>>>>>>> 4948K rodata, 716K init, 716K bss, 55680K reserved, 0K 
>>>>>>> cma-reserved,
>>>>>>> 136200K highmem)
>>>>>>> SLUB: HWalign=64, Order=0-3, MinObjects=0, CPUs=2, Nodes=1
>>>>>>> trace event string verifier disabled
>>>>>>> Dynamic Preempt: voluntary
>>>>>>> rcu: Preemptible hierarchical RCU implementation.
>>>>>>> rcu:     RCU event tracing is enabled.
>>>>>>> rcu:     RCU restricting CPUs from NR_CPUS=8 to nr_cpu_ids=2.
>>>>>>>    Trampoline variant of Tasks RCU enabled.
>>>>>>>    Tracing variant of Tasks RCU enabled.
>>>>>>> rcu: RCU calculated value of scheduler-enlistment delay is 100 
>>>>>>> jiffies.
>>>>>>> rcu: Adjusting geometry for rcu_fanout_leaf=16, nr_cpu_ids=2
>>>>>>> NR_IRQS: 2304, nr_irqs: 512, preallocated irqs: 0
>>>>>>>
>>>>>>> without the bad commit dmesg continues:
>>>>>>>
>>>>>>> random: get_random_bytes called from start_kernel+0x492/0x65a with
>>>>>>> crng_init=0
>>>>>>> Console: colour dummy device 80x25
>>>>>>> printk: console [tty0] enabled
>>>>>>> printk: bootconsole [uart0] disabled
>>>>>>>
>>>>>>> ....
>>>>>>>
>>>>>>>>> Any suggestions how to fix are welcome!
>>>>>>>>>
>>>>>> Interesting. I added the following fragment to the kernel config:
>>>>>>
>>>>>> # CONFIG_STACKPROTECTOR is not set
>>>>>>
>>>>>> And this resolves the boot issue (tested with v5.17 i686 on Intel
>>>>>> Merrifield)
>>>>> I'm not sure that's the correct approach.
>>> I didn't intend as a resolution, merely as a workaround. And since
>>> revert was not possible, as proof issue is localized in stack 
>>> protector.
>>>>> Any answer from the Andy Lutomirski?
>>>>>
>>>>> And in general to x86 maintainers, do we support all features on 
>>>>> x86 32-bit? If
>>>>> no, can it be said explicitly, please?
>>>> What compiler version are you using?
>>> I built with Yocto Honister which builds it's own cross-compiler gcc
>>> 11.2. For completeness:
>>>
>>> root@yuna:~# uname -a
>>> Linux yuna 5.17.0-edison-acpi-standard #1 SMP PREEMPT Sun Mar 20
>>> 20:14:17 UTC 2022 i686 i686 i386 GNU/Linux
>>>
>>> root@yuna:~# cat /proc/version
>>> Linux version 5.17.0-edison-acpi-standard (oe-user@oe-host)
>>> (i686-poky-linux-gcc (GCC) 11.2.0, GNU ld (GNU Binutils) 2.37.20210721)
>>> #1 SMP PREEMPT Sun Mar 20 20:14:17 UTC 2022
>>>
>>> root@yuna:~# cat /etc/os-release
>>> ID=poky-edison
>>> NAME="Poky (Yocto Project Reference Distro)"
>>> VERSION="3.4.4 (honister)"
>>> VERSION_ID=3.4.4
>>> PRETTY_NAME="Poky (Yocto Project Reference Distro) 3.4.4 (honister)"
>>>
>>>> -- 
>>>> Brian Gerst
>> What exactly happens when it fails (hang/reboot/oops)?
>
> It just hangs. Rebooting into another kernel with 
> CONFIG_STACKPROTECTOR not set and displaying journal of the failed 
> boot show the last words mentioned above.
>
>> Does removing the call to boot_init_stack_canary() in init/main.c fix
>> the problem?
> I will try tomorrow.

I just tried this on top of v6.0:

diff --git a/init/main.c b/init/main.c
index 1fe7942f5d4a..f30ec221f473 100644
--- a/init/main.c
+++ b/init/main.c
@@ -1046,7 +1046,6 @@ asmlinkage __visible void __init 
__no_sanitize_address start_kernel(void)
       * - random_init() to initialize the RNG from from early entropy 
sources
       */
      random_init(command_line);
-    boot_init_stack_canary();

      perf_event_init();
      profile_init();

and it just hangs, same as with 5.17. I again tried (without this patch)
with CONFIG_STACKPROTECTOR not set, and that boots fine.

>> -- 
>> Brian Gerst

^ permalink raw reply related	[flat|nested] 24+ messages in thread

end of thread, other threads:[~2022-11-17 19:28 UTC | newest]

Thread overview: 24+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-02-13 19:19 [PATCH v2 0/2] Clean up x86_32 stackprotector Andy Lutomirski
2021-02-13 19:19 ` [PATCH v2 1/2] x86/stackprotector/32: Make the canary into a regular percpu variable Andy Lutomirski
2021-02-13 19:49   ` Sedat Dilek
2021-02-16 16:21   ` Sean Christopherson
2021-02-16 20:23     ` Sedat Dilek
2021-02-16 18:45   ` Nick Desaulniers
2021-02-16 20:29     ` Sedat Dilek
2021-03-08 13:14   ` [tip: x86/core] " tip-bot2 for Andy Lutomirski
2022-09-29 13:56   ` [PATCH v2 1/2] " Andy Shevchenko
2022-09-29 14:20     ` Andy Shevchenko
2022-09-30 20:30       ` Ferry Toth
2022-09-30 21:18         ` Ferry Toth
2022-11-09 14:50           ` Andy Shevchenko
2022-11-09 22:33             ` Brian Gerst
2022-11-10 14:02               ` Andy Shevchenko
2022-11-10 19:36               ` Ferry Toth
2022-11-14 21:43                 ` Brian Gerst
2022-11-14 22:16                   ` Ferry Toth
2022-11-17 19:28                     ` Ferry Toth
2021-02-13 19:19 ` [PATCH v2 2/2] x86/entry/32: Remove leftover macros after stackprotector cleanups Andy Lutomirski
2021-03-08 13:14   ` [tip: x86/core] " tip-bot2 for Andy Lutomirski
2021-02-13 19:33 ` [PATCH v2 0/2] Clean up x86_32 stackprotector Sedat Dilek
2021-02-14 14:42 ` Sedat Dilek
2021-02-16 18:13 ` Nick Desaulniers

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).