Linux-arch Archive on lore.kernel.org
 help / color / Atom feed
* [PATCH v13 0/8] Control-flow Enforcement: Indirect Branch Tracking
@ 2020-09-25 14:57 Yu-cheng Yu
  2020-09-25 14:57 ` [PATCH v13 1/8] x86/cet/ibt: Add Kconfig option for user-mode " Yu-cheng Yu
                   ` (7 more replies)
  0 siblings, 8 replies; 29+ messages in thread
From: Yu-cheng Yu @ 2020-09-25 14:57 UTC (permalink / raw)
  To: x86, H. Peter Anvin, Thomas Gleixner, Ingo Molnar, linux-kernel,
	linux-doc, linux-mm, linux-arch, linux-api, Arnd Bergmann,
	Andy Lutomirski, Balbir Singh, Borislav Petkov, Cyrill Gorcunov,
	Dave Hansen, Eugene Syromiatnikov, Florian Weimer, H.J. Lu,
	Jann Horn, Jonathan Corbet, Kees Cook, Mike Kravetz, Nadav Amit,
	Oleg Nesterov, Pavel Machek, Peter Zijlstra, Randy Dunlap,
	Ravi V. Shankar, Vedvyas Shanbhogue, Dave Martin, Weijiang Yang,
	Pengfei Xu
  Cc: Yu-cheng Yu

Control-flow Enforcement (CET) is a new Intel processor feature that blocks
return/jump-oriented programming attacks.  Details are in "Intel 64 and
IA-32 Architectures Software Developer's Manual" [1].

This is the second part of CET and enables Indirect Branch Tracking (IBT).
It is built on top of the shadow stack series.

Changes in v13:
- Drop the patch that disables vsyscall emulation when CET is enabled, and
  re-introduce an earlier patch that fixes shadow stack and IBT for the
  emulation code.
- Remove "INTEL" string from CET Kconfig options, update help text, and
  change IBT default configuration to N.

[1] Intel 64 and IA-32 Architectures Software Developer's Manual:

    https://software.intel.com/en-us/download/intel-64-and-ia-32-
    architectures-sdm-combined-volumes-1-2a-2b-2c-2d-3a-3b-3c-3d-and-4

[2] Indirect Branch Tracking patches v12.

    https://lkml.kernel.org/r/20200918192312.25978-1-yu-cheng.yu@intel.com/

H.J. Lu (3):
  x86/cet/ibt: Update arch_prctl functions for Indirect Branch Tracking
  x86/vdso/32: Add ENDBR32 to __kernel_vsyscall entry point
  x86/vdso: Insert endbr32/endbr64 to vDSO

Yu-cheng Yu (5):
  x86/cet/ibt: Add Kconfig option for user-mode Indirect Branch Tracking
  x86/cet/ibt: User-mode Indirect Branch Tracking support
  x86/cet/ibt: Handle signals for Indirect Branch Tracking
  x86/cet/ibt: ELF header parsing for Indirect Branch Tracking
  x86/vsyscall/64: Fixup Shadow Stack and Indirect Branch Tracking for
    vsyscall emulation

 arch/x86/Kconfig                              | 21 +++++++
 arch/x86/entry/vdso/Makefile                  |  4 ++
 arch/x86/entry/vdso/vdso32/system_call.S      |  3 +
 arch/x86/entry/vsyscall/vsyscall_64.c         | 34 +++++++++++
 arch/x86/entry/vsyscall/vsyscall_emu_64.S     |  9 +++
 arch/x86/entry/vsyscall/vsyscall_trace.h      |  1 +
 arch/x86/include/asm/cet.h                    |  3 +
 arch/x86/include/asm/disabled-features.h      |  8 ++-
 arch/x86/kernel/cet.c                         | 60 ++++++++++++++++++-
 arch/x86/kernel/cet_prctl.c                   |  8 ++-
 arch/x86/kernel/cpu/common.c                  | 17 ++++++
 arch/x86/kernel/fpu/signal.c                  |  8 ++-
 arch/x86/kernel/process_64.c                  |  8 +++
 .../arch/x86/include/asm/disabled-features.h  |  8 ++-
 14 files changed, 184 insertions(+), 8 deletions(-)

-- 
2.21.0


^ permalink raw reply	[flat|nested] 29+ messages in thread

* [PATCH v13 1/8] x86/cet/ibt: Add Kconfig option for user-mode Indirect Branch Tracking
  2020-09-25 14:57 [PATCH v13 0/8] Control-flow Enforcement: Indirect Branch Tracking Yu-cheng Yu
@ 2020-09-25 14:57 ` Yu-cheng Yu
  2020-09-25 14:57 ` [PATCH v13 2/8] x86/cet/ibt: User-mode Indirect Branch Tracking support Yu-cheng Yu
                   ` (6 subsequent siblings)
  7 siblings, 0 replies; 29+ messages in thread
From: Yu-cheng Yu @ 2020-09-25 14:57 UTC (permalink / raw)
  To: x86, H. Peter Anvin, Thomas Gleixner, Ingo Molnar, linux-kernel,
	linux-doc, linux-mm, linux-arch, linux-api, Arnd Bergmann,
	Andy Lutomirski, Balbir Singh, Borislav Petkov, Cyrill Gorcunov,
	Dave Hansen, Eugene Syromiatnikov, Florian Weimer, H.J. Lu,
	Jann Horn, Jonathan Corbet, Kees Cook, Mike Kravetz, Nadav Amit,
	Oleg Nesterov, Pavel Machek, Peter Zijlstra, Randy Dunlap,
	Ravi V. Shankar, Vedvyas Shanbhogue, Dave Martin, Weijiang Yang,
	Pengfei Xu
  Cc: Yu-cheng Yu

Introduce Kconfig option X86_BRANCH_TRACKING_USER.

Indirect Branch Tracking (IBT) provides protection against CALL-/JMP-
oriented programming attacks.  It is active when the kernel has this
feature enabled, and the processor and the application support it.
When this feature is enabled, legacy non-IBT applications continue to
work, but without IBT protection.

Signed-off-by: Yu-cheng Yu <yu-cheng.yu@intel.com>
---
v13:
- Update help text, and change default to N.
- Change X86_INTEL_* to X86_*.

v10:
- Change build-time CET check to config depends on.

 arch/x86/Kconfig | 19 +++++++++++++++++++
 1 file changed, 19 insertions(+)

diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
index 4b28a0ce4594..15c7f2606c9d 100644
--- a/arch/x86/Kconfig
+++ b/arch/x86/Kconfig
@@ -1966,6 +1966,25 @@ config X86_SHADOW_STACK_USER
 
 	  If unsure, say N.
 
+config X86_BRANCH_TRACKING_USER
+	prompt "Intel Indirect Branch Tracking for user-mode"
+	def_bool n
+	depends on CPU_SUP_INTEL && X86_64
+	depends on $(cc-option,-fcf-protection)
+	select X86_CET
+	help
+	  Indirect Branch Tracking (IBT) provides protection against
+	  CALL-/JMP-oriented programming attacks.  It is active when
+	  the kernel has this feature enabled, and the processor and
+	  the application support it.  When this feature is enabled,
+	  legacy non-IBT applications continue to work, but without
+	  IBT protection.
+	  Support for this feature is only known to be present on
+	  processors released in 2020 or later.  CET features are also
+	  known to increase kernel text size by 3.7 KB.
+
+	  If unsure, say N.
+
 config EFI
 	bool "EFI runtime service support"
 	depends on ACPI
-- 
2.21.0


^ permalink raw reply	[flat|nested] 29+ messages in thread

* [PATCH v13 2/8] x86/cet/ibt: User-mode Indirect Branch Tracking support
  2020-09-25 14:57 [PATCH v13 0/8] Control-flow Enforcement: Indirect Branch Tracking Yu-cheng Yu
  2020-09-25 14:57 ` [PATCH v13 1/8] x86/cet/ibt: Add Kconfig option for user-mode " Yu-cheng Yu
@ 2020-09-25 14:57 ` Yu-cheng Yu
  2020-09-25 14:57 ` [PATCH v13 3/8] x86/cet/ibt: Handle signals for Indirect Branch Tracking Yu-cheng Yu
                   ` (5 subsequent siblings)
  7 siblings, 0 replies; 29+ messages in thread
From: Yu-cheng Yu @ 2020-09-25 14:57 UTC (permalink / raw)
  To: x86, H. Peter Anvin, Thomas Gleixner, Ingo Molnar, linux-kernel,
	linux-doc, linux-mm, linux-arch, linux-api, Arnd Bergmann,
	Andy Lutomirski, Balbir Singh, Borislav Petkov, Cyrill Gorcunov,
	Dave Hansen, Eugene Syromiatnikov, Florian Weimer, H.J. Lu,
	Jann Horn, Jonathan Corbet, Kees Cook, Mike Kravetz, Nadav Amit,
	Oleg Nesterov, Pavel Machek, Peter Zijlstra, Randy Dunlap,
	Ravi V. Shankar, Vedvyas Shanbhogue, Dave Martin, Weijiang Yang,
	Pengfei Xu
  Cc: Yu-cheng Yu

Introduce user-mode Indirect Branch Tracking (IBT) support.  Update setup
routines to include IBT.

Signed-off-by: Yu-cheng Yu <yu-cheng.yu@intel.com>
---
v10:
- Change no_cet_ibt to no_user_ibt.

v9:
- Change cpu_feature_enabled() to static_cpu_has().

v2:
- Change noibt to no_cet_ibt.

 arch/x86/include/asm/cet.h                    |  3 ++
 arch/x86/include/asm/disabled-features.h      |  8 ++++-
 arch/x86/kernel/cet.c                         | 33 +++++++++++++++++++
 arch/x86/kernel/cpu/common.c                  | 17 ++++++++++
 .../arch/x86/include/asm/disabled-features.h  |  8 ++++-
 5 files changed, 67 insertions(+), 2 deletions(-)

diff --git a/arch/x86/include/asm/cet.h b/arch/x86/include/asm/cet.h
index 16870e5bc8eb..3a1cba579cb2 100644
--- a/arch/x86/include/asm/cet.h
+++ b/arch/x86/include/asm/cet.h
@@ -15,6 +15,7 @@ struct cet_status {
 	unsigned long	shstk_base;
 	unsigned long	shstk_size;
 	unsigned int	locked:1;
+	unsigned int	ibt_enabled:1;
 };
 
 #ifdef CONFIG_X86_CET
@@ -26,6 +27,8 @@ void cet_free_shstk(struct task_struct *p);
 int cet_verify_rstor_token(bool ia32, unsigned long ssp, unsigned long *new_ssp);
 void cet_restore_signal(struct sc_ext *sc);
 int cet_setup_signal(bool ia32, unsigned long rstor, struct sc_ext *sc);
+int cet_setup_ibt(void);
+void cet_disable_ibt(void);
 #else
 static inline int prctl_cet(int option, u64 arg2) { return -EINVAL; }
 static inline int cet_setup_thread_shstk(struct task_struct *p,
diff --git a/arch/x86/include/asm/disabled-features.h b/arch/x86/include/asm/disabled-features.h
index edac76ed75e7..e7096a1e2698 100644
--- a/arch/x86/include/asm/disabled-features.h
+++ b/arch/x86/include/asm/disabled-features.h
@@ -62,6 +62,12 @@
 #define DISABLE_SHSTK	(1<<(X86_FEATURE_SHSTK & 31))
 #endif
 
+#ifdef CONFIG_X86_BRANCH_TRACKING_USER
+#define DISABLE_IBT	0
+#else
+#define DISABLE_IBT	(1<<(X86_FEATURE_IBT & 31))
+#endif
+
 /*
  * Make sure to add features to the correct mask
  */
@@ -83,7 +89,7 @@
 #define DISABLED_MASK15	0
 #define DISABLED_MASK16	(DISABLE_PKU|DISABLE_OSPKE|DISABLE_LA57|DISABLE_UMIP|DISABLE_SHSTK)
 #define DISABLED_MASK17	0
-#define DISABLED_MASK18	0
+#define DISABLED_MASK18	(DISABLE_IBT)
 #define DISABLED_MASK_CHECK BUILD_BUG_ON_ZERO(NCAPINTS != 19)
 
 #endif /* _ASM_X86_DISABLED_FEATURES_H */
diff --git a/arch/x86/kernel/cet.c b/arch/x86/kernel/cet.c
index b285c726bb88..e95fadb264f7 100644
--- a/arch/x86/kernel/cet.c
+++ b/arch/x86/kernel/cet.c
@@ -13,6 +13,8 @@
 #include <linux/uaccess.h>
 #include <linux/sched/signal.h>
 #include <linux/compat.h>
+#include <linux/vmalloc.h>
+#include <linux/bitops.h>
 #include <asm/msr.h>
 #include <asm/user.h>
 #include <asm/fpu/internal.h>
@@ -341,3 +343,34 @@ int cet_setup_signal(bool ia32, unsigned long rstor_addr, struct sc_ext *sc_ext)
 
 	return 0;
 }
+
+int cet_setup_ibt(void)
+{
+	u64 msr_val;
+
+	if (!static_cpu_has(X86_FEATURE_IBT))
+		return -EOPNOTSUPP;
+
+	start_update_msrs();
+	rdmsrl(MSR_IA32_U_CET, msr_val);
+	msr_val |= (CET_ENDBR_EN | CET_NO_TRACK_EN);
+	wrmsrl(MSR_IA32_U_CET, msr_val);
+	end_update_msrs();
+	current->thread.cet.ibt_enabled = 1;
+	return 0;
+}
+
+void cet_disable_ibt(void)
+{
+	u64 msr_val;
+
+	if (!static_cpu_has(X86_FEATURE_IBT))
+		return;
+
+	start_update_msrs();
+	rdmsrl(MSR_IA32_U_CET, msr_val);
+	msr_val &= CET_SHSTK_EN;
+	wrmsrl(MSR_IA32_U_CET, msr_val);
+	end_update_msrs();
+	current->thread.cet.ibt_enabled = 0;
+}
diff --git a/arch/x86/kernel/cpu/common.c b/arch/x86/kernel/cpu/common.c
index 084480f975aa..909b4160a2d2 100644
--- a/arch/x86/kernel/cpu/common.c
+++ b/arch/x86/kernel/cpu/common.c
@@ -536,6 +536,23 @@ static __init int setup_disable_shstk(char *s)
 __setup("no_user_shstk", setup_disable_shstk);
 #endif
 
+#ifdef CONFIG_X86_BRANCH_TRACKING_USER
+static __init int setup_disable_ibt(char *s)
+{
+	/* require an exact match without trailing characters */
+	if (s[0] != '\0')
+		return 0;
+
+	if (!boot_cpu_has(X86_FEATURE_IBT))
+		return 1;
+
+	setup_clear_cpu_cap(X86_FEATURE_IBT);
+	pr_info("x86: 'no_user_ibt' specified, disabling user Branch Tracking\n");
+	return 1;
+}
+__setup("no_user_ibt", setup_disable_ibt);
+#endif
+
 /*
  * Some CPU features depend on higher CPUID levels, which may not always
  * be available due to CPUID level capping or broken virtualization
diff --git a/tools/arch/x86/include/asm/disabled-features.h b/tools/arch/x86/include/asm/disabled-features.h
index edac76ed75e7..e7096a1e2698 100644
--- a/tools/arch/x86/include/asm/disabled-features.h
+++ b/tools/arch/x86/include/asm/disabled-features.h
@@ -62,6 +62,12 @@
 #define DISABLE_SHSTK	(1<<(X86_FEATURE_SHSTK & 31))
 #endif
 
+#ifdef CONFIG_X86_BRANCH_TRACKING_USER
+#define DISABLE_IBT	0
+#else
+#define DISABLE_IBT	(1<<(X86_FEATURE_IBT & 31))
+#endif
+
 /*
  * Make sure to add features to the correct mask
  */
@@ -83,7 +89,7 @@
 #define DISABLED_MASK15	0
 #define DISABLED_MASK16	(DISABLE_PKU|DISABLE_OSPKE|DISABLE_LA57|DISABLE_UMIP|DISABLE_SHSTK)
 #define DISABLED_MASK17	0
-#define DISABLED_MASK18	0
+#define DISABLED_MASK18	(DISABLE_IBT)
 #define DISABLED_MASK_CHECK BUILD_BUG_ON_ZERO(NCAPINTS != 19)
 
 #endif /* _ASM_X86_DISABLED_FEATURES_H */
-- 
2.21.0


^ permalink raw reply	[flat|nested] 29+ messages in thread

* [PATCH v13 3/8] x86/cet/ibt: Handle signals for Indirect Branch Tracking
  2020-09-25 14:57 [PATCH v13 0/8] Control-flow Enforcement: Indirect Branch Tracking Yu-cheng Yu
  2020-09-25 14:57 ` [PATCH v13 1/8] x86/cet/ibt: Add Kconfig option for user-mode " Yu-cheng Yu
  2020-09-25 14:57 ` [PATCH v13 2/8] x86/cet/ibt: User-mode Indirect Branch Tracking support Yu-cheng Yu
@ 2020-09-25 14:57 ` Yu-cheng Yu
  2020-09-25 14:58 ` [PATCH v13 4/8] x86/cet/ibt: ELF header parsing " Yu-cheng Yu
                   ` (4 subsequent siblings)
  7 siblings, 0 replies; 29+ messages in thread
From: Yu-cheng Yu @ 2020-09-25 14:57 UTC (permalink / raw)
  To: x86, H. Peter Anvin, Thomas Gleixner, Ingo Molnar, linux-kernel,
	linux-doc, linux-mm, linux-arch, linux-api, Arnd Bergmann,
	Andy Lutomirski, Balbir Singh, Borislav Petkov, Cyrill Gorcunov,
	Dave Hansen, Eugene Syromiatnikov, Florian Weimer, H.J. Lu,
	Jann Horn, Jonathan Corbet, Kees Cook, Mike Kravetz, Nadav Amit,
	Oleg Nesterov, Pavel Machek, Peter Zijlstra, Randy Dunlap,
	Ravi V. Shankar, Vedvyas Shanbhogue, Dave Martin, Weijiang Yang,
	Pengfei Xu
  Cc: Yu-cheng Yu

An indirect CALL/JMP moves the indirect branch tracking (IBT) state machine
to WAIT_ENDBR status until the instruction reaches an ENDBR opcode.  If the
CALL/JMP does not reach an ENDBR opcode, the processor raises a control-
protection fault.  WAIT_ENDBR status can be read from MSR_IA32_U_CET.

WAIT_ENDBR is cleared for signal handling, and restored for sigreturn.

IBT state machine is described in Intel SDM Vol. 1, Sec. 18.3.

Signed-off-by: Yu-cheng Yu <yu-cheng.yu@intel.com>
---
v9:
- Fix missing WAIT_ENDBR in signal handling.

 arch/x86/kernel/cet.c        | 27 +++++++++++++++++++++++++--
 arch/x86/kernel/fpu/signal.c |  8 +++++---
 2 files changed, 30 insertions(+), 5 deletions(-)

diff --git a/arch/x86/kernel/cet.c b/arch/x86/kernel/cet.c
index e95fadb264f7..1f8b72269166 100644
--- a/arch/x86/kernel/cet.c
+++ b/arch/x86/kernel/cet.c
@@ -295,6 +295,13 @@ void cet_restore_signal(struct sc_ext *sc_ext)
 		msr_val |= CET_SHSTK_EN;
 	}
 
+	if (cet->ibt_enabled) {
+		msr_val |= (CET_ENDBR_EN | CET_NO_TRACK_EN);
+
+		if (sc_ext->wait_endbr)
+			msr_val |= CET_WAIT_ENDBR;
+	}
+
 	if (test_thread_flag(TIF_NEED_FPU_LOAD))
 		cet_user_state->user_cet = msr_val;
 	else
@@ -335,9 +342,25 @@ int cet_setup_signal(bool ia32, unsigned long rstor_addr, struct sc_ext *sc_ext)
 		sc_ext->ssp = new_ssp;
 	}
 
-	if (ssp) {
+	if (ssp || cet->ibt_enabled) {
+
 		start_update_msrs();
-		wrmsrl(MSR_IA32_PL3_SSP, ssp);
+
+		if (ssp)
+			wrmsrl(MSR_IA32_PL3_SSP, ssp);
+
+		if (cet->ibt_enabled) {
+			u64 r;
+
+			rdmsrl(MSR_IA32_U_CET, r);
+
+			if (r & CET_WAIT_ENDBR) {
+				sc_ext->wait_endbr = 1;
+				r &= ~CET_WAIT_ENDBR;
+				wrmsrl(MSR_IA32_U_CET, r);
+			}
+		}
+
 		end_update_msrs();
 	}
 
diff --git a/arch/x86/kernel/fpu/signal.c b/arch/x86/kernel/fpu/signal.c
index c0c2141cb4b3..077853ef6f48 100644
--- a/arch/x86/kernel/fpu/signal.c
+++ b/arch/x86/kernel/fpu/signal.c
@@ -57,7 +57,8 @@ int save_cet_to_sigframe(int ia32, void __user *fp, unsigned long restorer)
 {
 	int err = 0;
 
-	if (!current->thread.cet.shstk_size)
+	if (!current->thread.cet.shstk_size &&
+	    !current->thread.cet.ibt_enabled)
 		return 0;
 
 	if (fp) {
@@ -89,7 +90,8 @@ static int get_cet_from_sigframe(int ia32, void __user *fp, struct sc_ext *ext)
 
 	memset(ext, 0, sizeof(*ext));
 
-	if (!current->thread.cet.shstk_size)
+	if (!current->thread.cet.shstk_size &&
+	    !current->thread.cet.ibt_enabled)
 		return 0;
 
 	if (fp) {
@@ -577,7 +579,7 @@ static unsigned long fpu__alloc_sigcontext_ext(unsigned long sp)
 	 * sigcontext_ext is at: fpu + fpu_user_xstate_size +
 	 * FP_XSTATE_MAGIC2_SIZE, then aligned to 8.
 	 */
-	if (cet->shstk_size)
+	if (cet->shstk_size || cet->ibt_enabled)
 		sp -= (sizeof(struct sc_ext) + 8);
 
 	return sp;
-- 
2.21.0


^ permalink raw reply	[flat|nested] 29+ messages in thread

* [PATCH v13 4/8] x86/cet/ibt: ELF header parsing for Indirect Branch Tracking
  2020-09-25 14:57 [PATCH v13 0/8] Control-flow Enforcement: Indirect Branch Tracking Yu-cheng Yu
                   ` (2 preceding siblings ...)
  2020-09-25 14:57 ` [PATCH v13 3/8] x86/cet/ibt: Handle signals for Indirect Branch Tracking Yu-cheng Yu
@ 2020-09-25 14:58 ` Yu-cheng Yu
  2020-09-25 14:58 ` [PATCH v13 5/8] x86/cet/ibt: Update arch_prctl functions " Yu-cheng Yu
                   ` (3 subsequent siblings)
  7 siblings, 0 replies; 29+ messages in thread
From: Yu-cheng Yu @ 2020-09-25 14:58 UTC (permalink / raw)
  To: x86, H. Peter Anvin, Thomas Gleixner, Ingo Molnar, linux-kernel,
	linux-doc, linux-mm, linux-arch, linux-api, Arnd Bergmann,
	Andy Lutomirski, Balbir Singh, Borislav Petkov, Cyrill Gorcunov,
	Dave Hansen, Eugene Syromiatnikov, Florian Weimer, H.J. Lu,
	Jann Horn, Jonathan Corbet, Kees Cook, Mike Kravetz, Nadav Amit,
	Oleg Nesterov, Pavel Machek, Peter Zijlstra, Randy Dunlap,
	Ravi V. Shankar, Vedvyas Shanbhogue, Dave Martin, Weijiang Yang,
	Pengfei Xu
  Cc: Yu-cheng Yu

Update arch_setup_elf_property() for Indirect Branch Tracking.

Signed-off-by: Yu-cheng Yu <yu-cheng.yu@intel.com>
---
v9:
- Change cpu_feature_enabled() to static_cpu_has().

 arch/x86/Kconfig             | 2 ++
 arch/x86/kernel/process_64.c | 8 ++++++++
 2 files changed, 10 insertions(+)

diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
index 15c7f2606c9d..cc9876f85e91 100644
--- a/arch/x86/Kconfig
+++ b/arch/x86/Kconfig
@@ -1972,6 +1972,8 @@ config X86_BRANCH_TRACKING_USER
 	depends on CPU_SUP_INTEL && X86_64
 	depends on $(cc-option,-fcf-protection)
 	select X86_CET
+	select ARCH_USE_GNU_PROPERTY
+	select ARCH_BINFMT_ELF_STATE
 	help
 	  Indirect Branch Tracking (IBT) provides protection against
 	  CALL-/JMP-oriented programming attacks.  It is active when
diff --git a/arch/x86/kernel/process_64.c b/arch/x86/kernel/process_64.c
index 8725e67bcd44..1147a1052a07 100644
--- a/arch/x86/kernel/process_64.c
+++ b/arch/x86/kernel/process_64.c
@@ -866,6 +866,14 @@ int arch_setup_elf_property(struct arch_elf_state *state)
 			r = cet_setup_shstk();
 	}
 
+	if (r < 0)
+		return r;
+
+	if (static_cpu_has(X86_FEATURE_IBT)) {
+		if (state->gnu_property & GNU_PROPERTY_X86_FEATURE_1_IBT)
+			r = cet_setup_ibt();
+	}
+
 	return r;
 }
 #endif
-- 
2.21.0


^ permalink raw reply	[flat|nested] 29+ messages in thread

* [PATCH v13 5/8] x86/cet/ibt: Update arch_prctl functions for Indirect Branch Tracking
  2020-09-25 14:57 [PATCH v13 0/8] Control-flow Enforcement: Indirect Branch Tracking Yu-cheng Yu
                   ` (3 preceding siblings ...)
  2020-09-25 14:58 ` [PATCH v13 4/8] x86/cet/ibt: ELF header parsing " Yu-cheng Yu
@ 2020-09-25 14:58 ` Yu-cheng Yu
  2020-09-25 14:58 ` [PATCH v13 6/8] x86/vdso/32: Add ENDBR32 to __kernel_vsyscall entry point Yu-cheng Yu
                   ` (2 subsequent siblings)
  7 siblings, 0 replies; 29+ messages in thread
From: Yu-cheng Yu @ 2020-09-25 14:58 UTC (permalink / raw)
  To: x86, H. Peter Anvin, Thomas Gleixner, Ingo Molnar, linux-kernel,
	linux-doc, linux-mm, linux-arch, linux-api, Arnd Bergmann,
	Andy Lutomirski, Balbir Singh, Borislav Petkov, Cyrill Gorcunov,
	Dave Hansen, Eugene Syromiatnikov, Florian Weimer, H.J. Lu,
	Jann Horn, Jonathan Corbet, Kees Cook, Mike Kravetz, Nadav Amit,
	Oleg Nesterov, Pavel Machek, Peter Zijlstra, Randy Dunlap,
	Ravi V. Shankar, Vedvyas Shanbhogue, Dave Martin, Weijiang Yang,
	Pengfei Xu
  Cc: Yu-cheng Yu

From: "H.J. Lu" <hjl.tools@gmail.com>

Update ARCH_X86_CET_STATUS and ARCH_X86_CET_DISABLE for Indirect Branch
Tracking.

Signed-off-by: H.J. Lu <hjl.tools@gmail.com>
Signed-off-by: Yu-cheng Yu <yu-cheng.yu@intel.com>
---
 arch/x86/kernel/cet_prctl.c | 8 +++++++-
 1 file changed, 7 insertions(+), 1 deletion(-)

diff --git a/arch/x86/kernel/cet_prctl.c b/arch/x86/kernel/cet_prctl.c
index bd5ad11763e4..0af1ec5d028f 100644
--- a/arch/x86/kernel/cet_prctl.c
+++ b/arch/x86/kernel/cet_prctl.c
@@ -22,6 +22,9 @@ static int copy_status_to_user(struct cet_status *cet, u64 arg2)
 		buf[2] = (u64)cet->shstk_size;
 	}
 
+	if (cet->ibt_enabled)
+		buf[0] |= GNU_PROPERTY_X86_FEATURE_1_IBT;
+
 	return copy_to_user((u64 __user *)arg2, buf, sizeof(buf));
 }
 
@@ -42,7 +45,8 @@ int prctl_cet(int option, u64 arg2)
 	if (option == ARCH_X86_CET_STATUS)
 		return copy_status_to_user(cet, arg2);
 
-	if (!static_cpu_has(X86_FEATURE_SHSTK))
+	if (!static_cpu_has(X86_FEATURE_SHSTK) &&
+	    !static_cpu_has(X86_FEATURE_IBT))
 		return -EOPNOTSUPP;
 
 	switch (option) {
@@ -56,6 +60,8 @@ int prctl_cet(int option, u64 arg2)
 			return -EINVAL;
 		if (features & GNU_PROPERTY_X86_FEATURE_1_SHSTK)
 			cet_disable_shstk();
+		if (features & GNU_PROPERTY_X86_FEATURE_1_IBT)
+			cet_disable_ibt();
 		return 0;
 
 	case ARCH_X86_CET_LOCK:
-- 
2.21.0


^ permalink raw reply	[flat|nested] 29+ messages in thread

* [PATCH v13 6/8] x86/vdso/32: Add ENDBR32 to __kernel_vsyscall entry point
  2020-09-25 14:57 [PATCH v13 0/8] Control-flow Enforcement: Indirect Branch Tracking Yu-cheng Yu
                   ` (4 preceding siblings ...)
  2020-09-25 14:58 ` [PATCH v13 5/8] x86/cet/ibt: Update arch_prctl functions " Yu-cheng Yu
@ 2020-09-25 14:58 ` Yu-cheng Yu
  2020-09-25 14:58 ` [PATCH v13 7/8] x86/vdso: Insert endbr32/endbr64 to vDSO Yu-cheng Yu
  2020-09-25 14:58 ` [PATCH v13 8/8] x86/vsyscall/64: Fixup Shadow Stack and Indirect Branch Tracking for vsyscall emulation Yu-cheng Yu
  7 siblings, 0 replies; 29+ messages in thread
From: Yu-cheng Yu @ 2020-09-25 14:58 UTC (permalink / raw)
  To: x86, H. Peter Anvin, Thomas Gleixner, Ingo Molnar, linux-kernel,
	linux-doc, linux-mm, linux-arch, linux-api, Arnd Bergmann,
	Andy Lutomirski, Balbir Singh, Borislav Petkov, Cyrill Gorcunov,
	Dave Hansen, Eugene Syromiatnikov, Florian Weimer, H.J. Lu,
	Jann Horn, Jonathan Corbet, Kees Cook, Mike Kravetz, Nadav Amit,
	Oleg Nesterov, Pavel Machek, Peter Zijlstra, Randy Dunlap,
	Ravi V. Shankar, Vedvyas Shanbhogue, Dave Martin, Weijiang Yang,
	Pengfei Xu
  Cc: Yu-cheng Yu

From: "H.J. Lu" <hjl.tools@gmail.com>

Add ENDBR32 to __kernel_vsyscall entry point.

Signed-off-by: H.J. Lu <hjl.tools@gmail.com>
Signed-off-by: Yu-cheng Yu <yu-cheng.yu@intel.com>
Acked-by: Andy Lutomirski <luto@kernel.org>
---
 arch/x86/entry/vdso/vdso32/system_call.S | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/arch/x86/entry/vdso/vdso32/system_call.S b/arch/x86/entry/vdso/vdso32/system_call.S
index de1fff7188aa..e331fcdebd95 100644
--- a/arch/x86/entry/vdso/vdso32/system_call.S
+++ b/arch/x86/entry/vdso/vdso32/system_call.S
@@ -14,6 +14,9 @@
 	ALIGN
 __kernel_vsyscall:
 	CFI_STARTPROC
+#ifdef CONFIG_X86_BRANCH_TRACKING_USER
+	endbr32
+#endif
 	/*
 	 * Reshuffle regs so that all of any of the entry instructions
 	 * will preserve enough state.
-- 
2.21.0


^ permalink raw reply	[flat|nested] 29+ messages in thread

* [PATCH v13 7/8] x86/vdso: Insert endbr32/endbr64 to vDSO
  2020-09-25 14:57 [PATCH v13 0/8] Control-flow Enforcement: Indirect Branch Tracking Yu-cheng Yu
                   ` (5 preceding siblings ...)
  2020-09-25 14:58 ` [PATCH v13 6/8] x86/vdso/32: Add ENDBR32 to __kernel_vsyscall entry point Yu-cheng Yu
@ 2020-09-25 14:58 ` Yu-cheng Yu
  2020-09-25 16:18   ` Andy Lutomirski
  2020-09-25 14:58 ` [PATCH v13 8/8] x86/vsyscall/64: Fixup Shadow Stack and Indirect Branch Tracking for vsyscall emulation Yu-cheng Yu
  7 siblings, 1 reply; 29+ messages in thread
From: Yu-cheng Yu @ 2020-09-25 14:58 UTC (permalink / raw)
  To: x86, H. Peter Anvin, Thomas Gleixner, Ingo Molnar, linux-kernel,
	linux-doc, linux-mm, linux-arch, linux-api, Arnd Bergmann,
	Andy Lutomirski, Balbir Singh, Borislav Petkov, Cyrill Gorcunov,
	Dave Hansen, Eugene Syromiatnikov, Florian Weimer, H.J. Lu,
	Jann Horn, Jonathan Corbet, Kees Cook, Mike Kravetz, Nadav Amit,
	Oleg Nesterov, Pavel Machek, Peter Zijlstra, Randy Dunlap,
	Ravi V. Shankar, Vedvyas Shanbhogue, Dave Martin, Weijiang Yang,
	Pengfei Xu
  Cc: Yu-cheng Yu

From: "H.J. Lu" <hjl.tools@gmail.com>

When Indirect Branch Tracking (IBT) is enabled, vDSO functions may be
called indirectly, and must have ENDBR32 or ENDBR64 as the first
instruction.  The compiler must support -fcf-protection=branch so that it
can be used to compile vDSO.

Signed-off-by: H.J. Lu <hjl.tools@gmail.com>
Signed-off-by: Yu-cheng Yu <yu-cheng.yu@intel.com>
---
v12:
- Replace object file list with $(vobjs) $(vobjs32).

 arch/x86/entry/vdso/Makefile | 4 ++++
 1 file changed, 4 insertions(+)

diff --git a/arch/x86/entry/vdso/Makefile b/arch/x86/entry/vdso/Makefile
index 215376d975a2..1f1b6893068a 100644
--- a/arch/x86/entry/vdso/Makefile
+++ b/arch/x86/entry/vdso/Makefile
@@ -94,6 +94,10 @@ endif
 
 $(vobjs): KBUILD_CFLAGS := $(filter-out $(GCC_PLUGINS_CFLAGS) $(RETPOLINE_CFLAGS),$(KBUILD_CFLAGS)) $(CFL)
 
+ifdef CONFIG_X86_BRANCH_TRACKING_USER
+$(vobjs) $(vobjs32): KBUILD_CFLAGS += -fcf-protection=branch
+endif
+
 #
 # vDSO code runs in userspace and -pg doesn't help with profiling anyway.
 #
-- 
2.21.0


^ permalink raw reply	[flat|nested] 29+ messages in thread

* [PATCH v13 8/8] x86/vsyscall/64: Fixup Shadow Stack and Indirect Branch Tracking for vsyscall emulation
  2020-09-25 14:57 [PATCH v13 0/8] Control-flow Enforcement: Indirect Branch Tracking Yu-cheng Yu
                   ` (6 preceding siblings ...)
  2020-09-25 14:58 ` [PATCH v13 7/8] x86/vdso: Insert endbr32/endbr64 to vDSO Yu-cheng Yu
@ 2020-09-25 14:58 ` Yu-cheng Yu
  2020-09-25 16:31   ` Andy Lutomirski
  7 siblings, 1 reply; 29+ messages in thread
From: Yu-cheng Yu @ 2020-09-25 14:58 UTC (permalink / raw)
  To: x86, H. Peter Anvin, Thomas Gleixner, Ingo Molnar, linux-kernel,
	linux-doc, linux-mm, linux-arch, linux-api, Arnd Bergmann,
	Andy Lutomirski, Balbir Singh, Borislav Petkov, Cyrill Gorcunov,
	Dave Hansen, Eugene Syromiatnikov, Florian Weimer, H.J. Lu,
	Jann Horn, Jonathan Corbet, Kees Cook, Mike Kravetz, Nadav Amit,
	Oleg Nesterov, Pavel Machek, Peter Zijlstra, Randy Dunlap,
	Ravi V. Shankar, Vedvyas Shanbhogue, Dave Martin, Weijiang Yang,
	Pengfei Xu
  Cc: Yu-cheng Yu

Vsyscall entry points are effectively branch targets.  Mark them with
ENDBR64 opcodes.  When emulating the RET instruction, unwind shadow stack
and reset IBT state machine.

Signed-off-by: Yu-cheng Yu <yu-cheng.yu@intel.com>
---
v13:
- Check shadow stack address is canonical.
- Change from writing to MSRs to writing to CET xstate.

 arch/x86/entry/vsyscall/vsyscall_64.c     | 34 +++++++++++++++++++++++
 arch/x86/entry/vsyscall/vsyscall_emu_64.S |  9 ++++++
 arch/x86/entry/vsyscall/vsyscall_trace.h  |  1 +
 3 files changed, 44 insertions(+)

diff --git a/arch/x86/entry/vsyscall/vsyscall_64.c b/arch/x86/entry/vsyscall/vsyscall_64.c
index 44c33103a955..315ee3572664 100644
--- a/arch/x86/entry/vsyscall/vsyscall_64.c
+++ b/arch/x86/entry/vsyscall/vsyscall_64.c
@@ -38,6 +38,9 @@
 #include <asm/fixmap.h>
 #include <asm/traps.h>
 #include <asm/paravirt.h>
+#include <asm/fpu/xstate.h>
+#include <asm/fpu/types.h>
+#include <asm/fpu/internal.h>
 
 #define CREATE_TRACE_POINTS
 #include "vsyscall_trace.h"
@@ -286,6 +289,37 @@ bool emulate_vsyscall(unsigned long error_code,
 	/* Emulate a ret instruction. */
 	regs->ip = caller;
 	regs->sp += 8;
+
+#ifdef CONFIG_X86_CET
+	if (tsk->thread.cet.shstk_size || tsk->thread.cet.ibt_enabled) {
+		struct cet_user_state *cet;
+		struct fpu *fpu;
+
+		fpu = &tsk->thread.fpu;
+		fpregs_lock();
+
+		if (!test_thread_flag(TIF_NEED_FPU_LOAD)) {
+			copy_fpregs_to_fpstate(fpu);
+			set_thread_flag(TIF_NEED_FPU_LOAD);
+		}
+
+		cet = get_xsave_addr(&fpu->state.xsave, XFEATURE_CET_USER);
+		if (!cet) {
+			fpregs_unlock();
+			goto sigsegv;
+		}
+
+		if (cet->user_ssp && ((cet->user_ssp + 8) < TASK_SIZE_MAX))
+			cet->user_ssp += 8;
+
+		if (cet->user_cet & CET_ENDBR_EN)
+			cet->user_cet &= ~CET_WAIT_ENDBR;
+
+		__fpu_invalidate_fpregs_state(fpu);
+		fpregs_unlock();
+	}
+#endif
+
 	return true;
 
 sigsegv:
diff --git a/arch/x86/entry/vsyscall/vsyscall_emu_64.S b/arch/x86/entry/vsyscall/vsyscall_emu_64.S
index 2e203f3a25a7..b2fa92104cdb 100644
--- a/arch/x86/entry/vsyscall/vsyscall_emu_64.S
+++ b/arch/x86/entry/vsyscall/vsyscall_emu_64.S
@@ -17,16 +17,25 @@ __PAGE_ALIGNED_DATA
 	.type __vsyscall_page, @object
 __vsyscall_page:
 
+#ifdef CONFIG_X86_BRANCH_TRACKING_USER
+	endbr64
+#endif
 	mov $__NR_gettimeofday, %rax
 	syscall
 	ret
 
 	.balign 1024, 0xcc
+#ifdef CONFIG_X86_BRANCH_TRACKING_USER
+	endbr64
+#endif
 	mov $__NR_time, %rax
 	syscall
 	ret
 
 	.balign 1024, 0xcc
+#ifdef CONFIG_X86_BRANCH_TRACKING_USER
+	endbr64
+#endif
 	mov $__NR_getcpu, %rax
 	syscall
 	ret
diff --git a/arch/x86/entry/vsyscall/vsyscall_trace.h b/arch/x86/entry/vsyscall/vsyscall_trace.h
index 3c3f9765a85c..7aa2101ada44 100644
--- a/arch/x86/entry/vsyscall/vsyscall_trace.h
+++ b/arch/x86/entry/vsyscall/vsyscall_trace.h
@@ -25,6 +25,7 @@ TRACE_EVENT(emulate_vsyscall,
 #endif
 
 #undef TRACE_INCLUDE_PATH
+#undef TRACE_INCLUDE_FILE
 #define TRACE_INCLUDE_PATH ../../arch/x86/entry/vsyscall/
 #define TRACE_INCLUDE_FILE vsyscall_trace
 #include <trace/define_trace.h>
-- 
2.21.0


^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH v13 7/8] x86/vdso: Insert endbr32/endbr64 to vDSO
  2020-09-25 14:58 ` [PATCH v13 7/8] x86/vdso: Insert endbr32/endbr64 to vDSO Yu-cheng Yu
@ 2020-09-25 16:18   ` Andy Lutomirski
  2020-09-25 16:24     ` Yu, Yu-cheng
  0 siblings, 1 reply; 29+ messages in thread
From: Andy Lutomirski @ 2020-09-25 16:18 UTC (permalink / raw)
  To: Yu-cheng Yu
  Cc: X86 ML, H. Peter Anvin, Thomas Gleixner, Ingo Molnar, LKML,
	open list:DOCUMENTATION, Linux-MM, linux-arch, Linux API,
	Arnd Bergmann, Andy Lutomirski, Balbir Singh, Borislav Petkov,
	Cyrill Gorcunov, Dave Hansen, Eugene Syromiatnikov,
	Florian Weimer, H.J. Lu, Jann Horn, Jonathan Corbet, Kees Cook,
	Mike Kravetz, Nadav Amit, Oleg Nesterov, Pavel Machek,
	Peter Zijlstra, Randy Dunlap, Ravi V. Shankar,
	Vedvyas Shanbhogue, Dave Martin, Weijiang Yang, Pengfei Xu

On Fri, Sep 25, 2020 at 7:58 AM Yu-cheng Yu <yu-cheng.yu@intel.com> wrote:
>
> From: "H.J. Lu" <hjl.tools@gmail.com>
>
> When Indirect Branch Tracking (IBT) is enabled, vDSO functions may be
> called indirectly, and must have ENDBR32 or ENDBR64 as the first
> instruction.  The compiler must support -fcf-protection=branch so that it
> can be used to compile vDSO.

Acked-by: Andy Lutomirski <luto@kernel.org>

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH v13 7/8] x86/vdso: Insert endbr32/endbr64 to vDSO
  2020-09-25 16:18   ` Andy Lutomirski
@ 2020-09-25 16:24     ` Yu, Yu-cheng
  0 siblings, 0 replies; 29+ messages in thread
From: Yu, Yu-cheng @ 2020-09-25 16:24 UTC (permalink / raw)
  To: Andy Lutomirski
  Cc: X86 ML, H. Peter Anvin, Thomas Gleixner, Ingo Molnar, LKML,
	open list:DOCUMENTATION, Linux-MM, linux-arch, Linux API,
	Arnd Bergmann, Balbir Singh, Borislav Petkov, Cyrill Gorcunov,
	Dave Hansen, Eugene Syromiatnikov, Florian Weimer, H.J. Lu,
	Jann Horn, Jonathan Corbet, Kees Cook, Mike Kravetz, Nadav Amit,
	Oleg Nesterov, Pavel Machek, Peter Zijlstra, Randy Dunlap,
	Ravi V. Shankar, Vedvyas Shanbhogue, Dave Martin, Weijiang Yang,
	Pengfei Xu

On 9/25/2020 9:18 AM, Andy Lutomirski wrote:
> On Fri, Sep 25, 2020 at 7:58 AM Yu-cheng Yu <yu-cheng.yu@intel.com> wrote:
>>
>> From: "H.J. Lu" <hjl.tools@gmail.com>
>>
>> When Indirect Branch Tracking (IBT) is enabled, vDSO functions may be
>> called indirectly, and must have ENDBR32 or ENDBR64 as the first
>> instruction.  The compiler must support -fcf-protection=branch so that it
>> can be used to compile vDSO.
> 
> Acked-by: Andy Lutomirski <luto@kernel.org>
> 

Thanks!

Yu-cheng

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH v13 8/8] x86/vsyscall/64: Fixup Shadow Stack and Indirect Branch Tracking for vsyscall emulation
  2020-09-25 14:58 ` [PATCH v13 8/8] x86/vsyscall/64: Fixup Shadow Stack and Indirect Branch Tracking for vsyscall emulation Yu-cheng Yu
@ 2020-09-25 16:31   ` Andy Lutomirski
  2020-09-25 16:47     ` Yu, Yu-cheng
  0 siblings, 1 reply; 29+ messages in thread
From: Andy Lutomirski @ 2020-09-25 16:31 UTC (permalink / raw)
  To: Yu-cheng Yu
  Cc: X86 ML, H. Peter Anvin, Thomas Gleixner, Ingo Molnar, LKML,
	open list:DOCUMENTATION, Linux-MM, linux-arch, Linux API,
	Arnd Bergmann, Andy Lutomirski, Balbir Singh, Borislav Petkov,
	Cyrill Gorcunov, Dave Hansen, Eugene Syromiatnikov,
	Florian Weimer, H.J. Lu, Jann Horn, Jonathan Corbet, Kees Cook,
	Mike Kravetz, Nadav Amit, Oleg Nesterov, Pavel Machek,
	Peter Zijlstra, Randy Dunlap, Ravi V. Shankar,
	Vedvyas Shanbhogue, Dave Martin, Weijiang Yang, Pengfei Xu

On Fri, Sep 25, 2020 at 7:58 AM Yu-cheng Yu <yu-cheng.yu@intel.com> wrote:
>
> Vsyscall entry points are effectively branch targets.  Mark them with
> ENDBR64 opcodes.  When emulating the RET instruction, unwind shadow stack
> and reset IBT state machine.
>
> Signed-off-by: Yu-cheng Yu <yu-cheng.yu@intel.com>
> ---
> v13:
> - Check shadow stack address is canonical.
> - Change from writing to MSRs to writing to CET xstate.
>
>  arch/x86/entry/vsyscall/vsyscall_64.c     | 34 +++++++++++++++++++++++
>  arch/x86/entry/vsyscall/vsyscall_emu_64.S |  9 ++++++
>  arch/x86/entry/vsyscall/vsyscall_trace.h  |  1 +
>  3 files changed, 44 insertions(+)
>
> diff --git a/arch/x86/entry/vsyscall/vsyscall_64.c b/arch/x86/entry/vsyscall/vsyscall_64.c
> index 44c33103a955..315ee3572664 100644
> --- a/arch/x86/entry/vsyscall/vsyscall_64.c
> +++ b/arch/x86/entry/vsyscall/vsyscall_64.c
> @@ -38,6 +38,9 @@
>  #include <asm/fixmap.h>
>  #include <asm/traps.h>
>  #include <asm/paravirt.h>
> +#include <asm/fpu/xstate.h>
> +#include <asm/fpu/types.h>
> +#include <asm/fpu/internal.h>
>
>  #define CREATE_TRACE_POINTS
>  #include "vsyscall_trace.h"
> @@ -286,6 +289,37 @@ bool emulate_vsyscall(unsigned long error_code,
>         /* Emulate a ret instruction. */
>         regs->ip = caller;
>         regs->sp += 8;
> +
> +#ifdef CONFIG_X86_CET
> +       if (tsk->thread.cet.shstk_size || tsk->thread.cet.ibt_enabled) {
> +               struct cet_user_state *cet;
> +               struct fpu *fpu;
> +
> +               fpu = &tsk->thread.fpu;
> +               fpregs_lock();
> +
> +               if (!test_thread_flag(TIF_NEED_FPU_LOAD)) {
> +                       copy_fpregs_to_fpstate(fpu);
> +                       set_thread_flag(TIF_NEED_FPU_LOAD);
> +               }
> +
> +               cet = get_xsave_addr(&fpu->state.xsave, XFEATURE_CET_USER);
> +               if (!cet) {
> +                       fpregs_unlock();
> +                       goto sigsegv;

I *think* your patchset tries to keep cet.shstk_size and
cet.ibt_enabled in sync with the MSR, in which case it should be
impossible to get here, but a comment and a warning would be much
better than a random sigsegv.

Shouldn't we have a get_xsave_addr_or_allocate() that will never
return NULL but instead will mark the state as in use and set up the
init state if the feature was previously not in use?

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH v13 8/8] x86/vsyscall/64: Fixup Shadow Stack and Indirect Branch Tracking for vsyscall emulation
  2020-09-25 16:31   ` Andy Lutomirski
@ 2020-09-25 16:47     ` Yu, Yu-cheng
  2020-09-25 16:51       ` Andy Lutomirski
  0 siblings, 1 reply; 29+ messages in thread
From: Yu, Yu-cheng @ 2020-09-25 16:47 UTC (permalink / raw)
  To: Andy Lutomirski
  Cc: X86 ML, H. Peter Anvin, Thomas Gleixner, Ingo Molnar, LKML,
	open list:DOCUMENTATION, Linux-MM, linux-arch, Linux API,
	Arnd Bergmann, Balbir Singh, Borislav Petkov, Cyrill Gorcunov,
	Dave Hansen, Eugene Syromiatnikov, Florian Weimer, H.J. Lu,
	Jann Horn, Jonathan Corbet, Kees Cook, Mike Kravetz, Nadav Amit,
	Oleg Nesterov, Pavel Machek, Peter Zijlstra, Randy Dunlap,
	Ravi V. Shankar, Vedvyas Shanbhogue, Dave Martin, Weijiang Yang,
	Pengfei Xu

On 9/25/2020 9:31 AM, Andy Lutomirski wrote:
> On Fri, Sep 25, 2020 at 7:58 AM Yu-cheng Yu <yu-cheng.yu@intel.com> wrote:
>>

[...]

>> @@ -286,6 +289,37 @@ bool emulate_vsyscall(unsigned long error_code,
>>          /* Emulate a ret instruction. */
>>          regs->ip = caller;
>>          regs->sp += 8;
>> +
>> +#ifdef CONFIG_X86_CET
>> +       if (tsk->thread.cet.shstk_size || tsk->thread.cet.ibt_enabled) {
>> +               struct cet_user_state *cet;
>> +               struct fpu *fpu;
>> +
>> +               fpu = &tsk->thread.fpu;
>> +               fpregs_lock();
>> +
>> +               if (!test_thread_flag(TIF_NEED_FPU_LOAD)) {
>> +                       copy_fpregs_to_fpstate(fpu);
>> +                       set_thread_flag(TIF_NEED_FPU_LOAD);
>> +               }
>> +
>> +               cet = get_xsave_addr(&fpu->state.xsave, XFEATURE_CET_USER);
>> +               if (!cet) {
>> +                       fpregs_unlock();
>> +                       goto sigsegv;
> 
> I *think* your patchset tries to keep cet.shstk_size and
> cet.ibt_enabled in sync with the MSR, in which case it should be
> impossible to get here, but a comment and a warning would be much
> better than a random sigsegv.

Yes, it should be impossible to get here.  I will add a comment and a 
warning, but still do sigsegv.  Should this happen, and the function 
return, the app gets a control-protection fault.  Why not let it fail early?

>
> Shouldn't we have a get_xsave_addr_or_allocate() that will never
> return NULL but instead will mark the state as in use and set up the
> init state if the feature was previously not in use?
> 

We already have a static __raw_xsave_addr(), which returns a pointer to 
the requested xstate.  Maybe we can export __raw_xsave_addr(), if that 
is needed.

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH v13 8/8] x86/vsyscall/64: Fixup Shadow Stack and Indirect Branch Tracking for vsyscall emulation
  2020-09-25 16:47     ` Yu, Yu-cheng
@ 2020-09-25 16:51       ` Andy Lutomirski
  2020-09-28 16:59         ` Yu-cheng Yu
  0 siblings, 1 reply; 29+ messages in thread
From: Andy Lutomirski @ 2020-09-25 16:51 UTC (permalink / raw)
  To: Yu, Yu-cheng
  Cc: Andy Lutomirski, X86 ML, H. Peter Anvin, Thomas Gleixner,
	Ingo Molnar, LKML, open list:DOCUMENTATION, Linux-MM, linux-arch,
	Linux API, Arnd Bergmann, Balbir Singh, Borislav Petkov,
	Cyrill Gorcunov, Dave Hansen, Eugene Syromiatnikov,
	Florian Weimer, H.J. Lu, Jann Horn, Jonathan Corbet, Kees Cook,
	Mike Kravetz, Nadav Amit, Oleg Nesterov, Pavel Machek,
	Peter Zijlstra, Randy Dunlap, Ravi V. Shankar,
	Vedvyas Shanbhogue, Dave Martin, Weijiang Yang, Pengfei Xu



> On Sep 25, 2020, at 9:48 AM, Yu, Yu-cheng <yu-cheng.yu@intel.com> wrote:
> 
> On 9/25/2020 9:31 AM, Andy Lutomirski wrote:
>>> On Fri, Sep 25, 2020 at 7:58 AM Yu-cheng Yu <yu-cheng.yu@intel.com> wrote:
>>> 
> 
> [...]
> 
>>> @@ -286,6 +289,37 @@ bool emulate_vsyscall(unsigned long error_code,
>>>         /* Emulate a ret instruction. */
>>>         regs->ip = caller;
>>>         regs->sp += 8;
>>> +
>>> +#ifdef CONFIG_X86_CET
>>> +       if (tsk->thread.cet.shstk_size || tsk->thread.cet.ibt_enabled) {
>>> +               struct cet_user_state *cet;
>>> +               struct fpu *fpu;
>>> +
>>> +               fpu = &tsk->thread.fpu;
>>> +               fpregs_lock();
>>> +
>>> +               if (!test_thread_flag(TIF_NEED_FPU_LOAD)) {
>>> +                       copy_fpregs_to_fpstate(fpu);
>>> +                       set_thread_flag(TIF_NEED_FPU_LOAD);
>>> +               }
>>> +
>>> +               cet = get_xsave_addr(&fpu->state.xsave, XFEATURE_CET_USER);
>>> +               if (!cet) {
>>> +                       fpregs_unlock();
>>> +                       goto sigsegv;
>> I *think* your patchset tries to keep cet.shstk_size and
>> cet.ibt_enabled in sync with the MSR, in which case it should be
>> impossible to get here, but a comment and a warning would be much
>> better than a random sigsegv.
> 
> Yes, it should be impossible to get here.  I will add a comment and a warning, but still do sigsegv.  Should this happen, and the function return, the app gets a control-protection fault.  Why not let it fail early?

I’m okay with either approach as long as we get a comment and warning.

> 
>> 
>> Shouldn't we have a get_xsave_addr_or_allocate() that will never
>> return NULL but instead will mark the state as in use and set up the
>> init state if the feature was previously not in use?
> 
> We already have a static __raw_xsave_addr(), which returns a pointer to the requested xstate.  Maybe we can export __raw_xsave_addr(), if that is needed.

I don’t think that’s what we want in general — we want the whole construct of initializing the state if needed.

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH v13 8/8] x86/vsyscall/64: Fixup Shadow Stack and Indirect Branch Tracking for vsyscall emulation
  2020-09-25 16:51       ` Andy Lutomirski
@ 2020-09-28 16:59         ` Yu-cheng Yu
  2020-09-28 17:37           ` Andy Lutomirski
  0 siblings, 1 reply; 29+ messages in thread
From: Yu-cheng Yu @ 2020-09-28 16:59 UTC (permalink / raw)
  To: Andy Lutomirski
  Cc: Andy Lutomirski, X86 ML, H. Peter Anvin, Thomas Gleixner,
	Ingo Molnar, LKML, open list:DOCUMENTATION, Linux-MM, linux-arch,
	Linux API, Arnd Bergmann, Balbir Singh, Borislav Petkov,
	Cyrill Gorcunov, Dave Hansen, Eugene Syromiatnikov,
	Florian Weimer, H.J. Lu, Jann Horn, Jonathan Corbet, Kees Cook,
	Mike Kravetz, Nadav Amit, Oleg Nesterov, Pavel Machek,
	Peter Zijlstra, Randy Dunlap, Ravi V. Shankar,
	Vedvyas Shanbhogue, Dave Martin, Weijiang Yang, Pengfei Xu

On Fri, 2020-09-25 at 09:51 -0700, Andy Lutomirski wrote:
> > On Sep 25, 2020, at 9:48 AM, Yu, Yu-cheng <yu-cheng.yu@intel.com> wrote:
> > 
> > On 9/25/2020 9:31 AM, Andy Lutomirski wrote:
> > > > On Fri, Sep 25, 2020 at 7:58 AM Yu-cheng Yu <yu-cheng.yu@intel.com> wrote:
> > > > 
> > 
> > [...]
> > 
> > > > @@ -286,6 +289,37 @@ bool emulate_vsyscall(unsigned long error_code,
> > > >         /* Emulate a ret instruction. */
> > > >         regs->ip = caller;
> > > >         regs->sp += 8;
> > > > +
> > > > +#ifdef CONFIG_X86_CET
> > > > +       if (tsk->thread.cet.shstk_size || tsk->thread.cet.ibt_enabled) {
> > > > +               struct cet_user_state *cet;
> > > > +               struct fpu *fpu;
> > > > +
> > > > +               fpu = &tsk->thread.fpu;
> > > > +               fpregs_lock();
> > > > +
> > > > +               if (!test_thread_flag(TIF_NEED_FPU_LOAD)) {
> > > > +                       copy_fpregs_to_fpstate(fpu);
> > > > +                       set_thread_flag(TIF_NEED_FPU_LOAD);
> > > > +               }
> > > > +
> > > > +               cet = get_xsave_addr(&fpu->state.xsave, XFEATURE_CET_USER);
> > > > +               if (!cet) {
> > > > +                       fpregs_unlock();
> > > > +                       goto sigsegv;
> > > I *think* your patchset tries to keep cet.shstk_size and
> > > cet.ibt_enabled in sync with the MSR, in which case it should be
> > > impossible to get here, but a comment and a warning would be much
> > > better than a random sigsegv.
> > 
> > Yes, it should be impossible to get here.  I will add a comment and a warning, but still do sigsegv.  Should this happen, and the function return, the app gets a control-protection fault.  Why not let it fail early?
> 
> I’m okay with either approach as long as we get a comment and warning.
> 

Here is the updated patch.  I can also re-send the whole series as v14.  Thanks!

======

From 09803e66dca38d7784e32687d0693550948199ed Mon Sep 17 00:00:00 2001
From: Yu-cheng Yu <yu-cheng.yu@intel.com>
Date: Thu, 29 Nov 2018 14:15:38 -0800
Subject: [PATCH v13 8/8] x86/vsyscall/64: Fixup Shadow Stack and Indirect Branch
 Tracking for vsyscall emulation

Vsyscall entry points are effectively branch targets.  Mark them with
ENDBR64 opcodes.  When emulating the RET instruction, unwind shadow stack
and reset IBT state machine.

Signed-off-by: Yu-cheng Yu <yu-cheng.yu@intel.com>
---
v13:
- Check shadow stack address is canonical.
- Change from writing to MSRs to
writing to CET xstate.

 arch/x86/entry/vsyscall/vsyscall_64.c     | 34 +++++++++++++++++++++++
 arch/x86/entry/vsyscall/vsyscall_emu_64.S |  9 ++++++
 arch/x86/entry/vsyscall/vsyscall_trace.h  |  1 +
 3 files changed, 44 insertions(+)

diff --git a/arch/x86/entry/vsyscall/vsyscall_64.c
b/arch/x86/entry/vsyscall/vsyscall_64.c
index 44c33103a955..30b166091d46 100644
--- a/arch/x86/entry/vsyscall/vsyscall_64.c
+++ b/arch/x86/entry/vsyscall/vsyscall_64.c
@@ -38,6 +38,9 @@
 #include <asm/fixmap.h>
 #include <asm/traps.h>
 #include <asm/paravirt.h>
+#include <asm/fpu/xstate.h>
+#include <asm/fpu/types.h>
+#include <asm/fpu/internal.h>
 
 #define CREATE_TRACE_POINTS
 #include "vsyscall_trace.h"
@@ -286,6 +289,42 @@ bool emulate_vsyscall(unsigned long error_code,
 	/* Emulate a ret instruction. */
 	regs->ip = caller;
 	regs->sp += 8;
+
+#ifdef CONFIG_X86_CET
+	if (tsk->thread.cet.shstk_size || tsk->thread.cet.ibt_enabled) {
+		struct cet_user_state *cet;
+		struct fpu *fpu;
+
+		fpu = &tsk->thread.fpu;
+		fpregs_lock();
+
+		if (!test_thread_flag(TIF_NEED_FPU_LOAD)) {
+			copy_fpregs_to_fpstate(fpu);
+			set_thread_flag(TIF_NEED_FPU_LOAD);
+		}
+
+		cet = get_xsave_addr(&fpu->state.xsave, XFEATURE_CET_USER);
+		if (!cet) {
+			/*
+			 * This is an unlikely case where the task is
+			 * CET-enabled, but CET xstate is in INIT.
+			 */
+			WARN_ONCE(1, "CET is enabled, but no xstates");
+			fpregs_unlock();
+			goto sigsegv;
+		}
+
+		if (cet->user_ssp && ((cet->user_ssp + 8) < TASK_SIZE_MAX))
+			cet->user_ssp += 8;
+
+		if (cet->user_cet & CET_ENDBR_EN)
+			cet->user_cet &= ~CET_WAIT_ENDBR;
+
+		__fpu_invalidate_fpregs_state(fpu);
+		fpregs_unlock();
+	}
+#endif
+
 	return true;
 
 sigsegv:
diff --git a/arch/x86/entry/vsyscall/vsyscall_emu_64.S
b/arch/x86/entry/vsyscall/vsyscall_emu_64.S
index 2e203f3a25a7..b2fa92104cdb 100644
--- a/arch/x86/entry/vsyscall/vsyscall_emu_64.S
+++ b/arch/x86/entry/vsyscall/vsyscall_emu_64.S
@@ -17,16 +17,25 @@ __PAGE_ALIGNED_DATA
 	.type __vsyscall_page, @object
 __vsyscall_page:
 
+#ifdef CONFIG_X86_BRANCH_TRACKING_USER
+	endbr64
+#endif
 	mov $__NR_gettimeofday, %rax
 	syscall
 	ret
 
 	.balign 1024, 0xcc
+#ifdef CONFIG_X86_BRANCH_TRACKING_USER
+	endbr64
+#endif
 	mov $__NR_time, %rax
 	syscall
 	ret
 
 	.balign 1024, 0xcc
+#ifdef CONFIG_X86_BRANCH_TRACKING_USER
+	endbr64
+#endif
 	mov $__NR_getcpu, %rax
 	syscall
 	ret
diff --git a/arch/x86/entry/vsyscall/vsyscall_trace.h
b/arch/x86/entry/vsyscall/vsyscall_trace.h
index 3c3f9765a85c..7aa2101ada44 100644
--- a/arch/x86/entry/vsyscall/vsyscall_trace.h
+++ b/arch/x86/entry/vsyscall/vsyscall_trace.h
@@ -25,6 +25,7 @@ TRACE_EVENT(emulate_vsyscall,
 #endif
 
 #undef TRACE_INCLUDE_PATH
+#undef TRACE_INCLUDE_FILE
 #define TRACE_INCLUDE_PATH ../../arch/x86/entry/vsyscall/
 #define TRACE_INCLUDE_FILE vsyscall_trace
 #include <trace/define_trace.h>
-- 
2.21.0




^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH v13 8/8] x86/vsyscall/64: Fixup Shadow Stack and Indirect Branch Tracking for vsyscall emulation
  2020-09-28 16:59         ` Yu-cheng Yu
@ 2020-09-28 17:37           ` Andy Lutomirski
  2020-09-28 19:04             ` Yu, Yu-cheng
  2020-09-29 18:37             ` Yu, Yu-cheng
  0 siblings, 2 replies; 29+ messages in thread
From: Andy Lutomirski @ 2020-09-28 17:37 UTC (permalink / raw)
  To: Yu-cheng Yu
  Cc: Andy Lutomirski, X86 ML, H. Peter Anvin, Thomas Gleixner,
	Ingo Molnar, LKML, open list:DOCUMENTATION, Linux-MM, linux-arch,
	Linux API, Arnd Bergmann, Balbir Singh, Borislav Petkov,
	Cyrill Gorcunov, Dave Hansen, Eugene Syromiatnikov,
	Florian Weimer, H.J. Lu, Jann Horn, Jonathan Corbet, Kees Cook,
	Mike Kravetz, Nadav Amit, Oleg Nesterov, Pavel Machek,
	Peter Zijlstra, Randy Dunlap, Ravi V. Shankar,
	Vedvyas Shanbhogue, Dave Martin, Weijiang Yang, Pengfei Xu

On Mon, Sep 28, 2020 at 9:59 AM Yu-cheng Yu <yu-cheng.yu@intel.com> wrote:
>
> On Fri, 2020-09-25 at 09:51 -0700, Andy Lutomirski wrote:
> > > On Sep 25, 2020, at 9:48 AM, Yu, Yu-cheng <yu-cheng.yu@intel.com> wrote:
> +
> +               cet = get_xsave_addr(&fpu->state.xsave, XFEATURE_CET_USER);
> +               if (!cet) {
> +                       /*
> +                        * This is an unlikely case where the task is
> +                        * CET-enabled, but CET xstate is in INIT.
> +                        */
> +                       WARN_ONCE(1, "CET is enabled, but no xstates");

"unlikely" doesn't really cover this.

> +                       fpregs_unlock();
> +                       goto sigsegv;
> +               }
> +
> +               if (cet->user_ssp && ((cet->user_ssp + 8) < TASK_SIZE_MAX))
> +                       cet->user_ssp += 8;

This looks buggy.  The condition should be "if SHSTK is on, then add 8
to user_ssp".  If the result is noncanonical, then some appropriate
exception should be generated, probably by the FPU restore code -- see
below.  You should be checking the SHSTK_EN bit, not SSP.

Also, can you point me to where any of these canonicality rules are
documented in the SDM?  I looked and I can't find them.


This reminds me: this code in extable.c needs to change.

__visible bool ex_handler_fprestore(const struct exception_table_entry *fixup,
                                    struct pt_regs *regs, int trapnr,
                                    unsigned long error_code,
                                    unsigned long fault_addr)
{
        regs->ip = ex_fixup_addr(fixup);

        WARN_ONCE(1, "Bad FPU state detected at %pB, reinitializing
FPU registers.",
                  (void *)instruction_pointer(regs));

        __copy_kernel_to_fpregs(&init_fpstate, -1);

Now that we have supervisor states like CET, this is buggy.  This
should do something intelligent like initializing all the *user* state
and trying again.  If that succeeds, a signal should be sent rather
than just corrupting the task.  And if it fails, then perhaps some
actual intelligence is needed.  We certainly should not just disable
CET because something is wrong with the CET MSRs.

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH v13 8/8] x86/vsyscall/64: Fixup Shadow Stack and Indirect Branch Tracking for vsyscall emulation
  2020-09-28 17:37           ` Andy Lutomirski
@ 2020-09-28 19:04             ` Yu, Yu-cheng
  2020-09-29 18:37             ` Yu, Yu-cheng
  1 sibling, 0 replies; 29+ messages in thread
From: Yu, Yu-cheng @ 2020-09-28 19:04 UTC (permalink / raw)
  To: Andy Lutomirski
  Cc: X86 ML, H. Peter Anvin, Thomas Gleixner, Ingo Molnar, LKML,
	open list:DOCUMENTATION, Linux-MM, linux-arch, Linux API,
	Arnd Bergmann, Balbir Singh, Borislav Petkov, Cyrill Gorcunov,
	Dave Hansen, Eugene Syromiatnikov, Florian Weimer, H.J. Lu,
	Jann Horn, Jonathan Corbet, Kees Cook, Mike Kravetz, Nadav Amit,
	Oleg Nesterov, Pavel Machek, Peter Zijlstra, Randy Dunlap,
	Ravi V. Shankar, Vedvyas Shanbhogue, Dave Martin, Weijiang Yang,
	Pengfei Xu

On 9/28/2020 10:37 AM, Andy Lutomirski wrote:
> On Mon, Sep 28, 2020 at 9:59 AM Yu-cheng Yu <yu-cheng.yu@intel.com> wrote:
>>
>> On Fri, 2020-09-25 at 09:51 -0700, Andy Lutomirski wrote:
>>>> On Sep 25, 2020, at 9:48 AM, Yu, Yu-cheng <yu-cheng.yu@intel.com> wrote:
>> +
>> +               cet = get_xsave_addr(&fpu->state.xsave, XFEATURE_CET_USER);
>> +               if (!cet) {
>> +                       /*
>> +                        * This is an unlikely case where the task is
>> +                        * CET-enabled, but CET xstate is in INIT.
>> +                        */
>> +                       WARN_ONCE(1, "CET is enabled, but no xstates");
> 
> "unlikely" doesn't really cover this.
> 
>> +                       fpregs_unlock();
>> +                       goto sigsegv;
>> +               }
>> +
>> +               if (cet->user_ssp && ((cet->user_ssp + 8) < TASK_SIZE_MAX))
>> +                       cet->user_ssp += 8;
> 
> This looks buggy.  The condition should be "if SHSTK is on, then add 8
> to user_ssp".  If the result is noncanonical, then some appropriate
> exception should be generated, probably by the FPU restore code -- see
> below.  You should be checking the SHSTK_EN bit, not SSP.

The code now checks if shadow stack is on (yes, it should check SHSTK_EN 
bit, I will fix it.), then adds 8 to user_ssp.  If the result is 
canonical, then it sets the corresponding xstate.

If the resulting address is not canonical, the kernel does not know what 
the address should be either.  I think the best action to take is doing 
nothing about the shadow stack pointer, and let the application return 
and get a control protection fault.  The application should have not got 
into such situation in the first place; if it does, it should fault.

> 
> Also, can you point me to where any of these canonicality rules are
> documented in the SDM?  I looked and I can't find them.

The SDM is not very explicit.  It should have been.

> 
> This reminds me: this code in extable.c needs to change.
> 
> __visible bool ex_handler_fprestore(const struct exception_table_entry *fixup,
>                                      struct pt_regs *regs, int trapnr,
>                                      unsigned long error_code,
>                                      unsigned long fault_addr)
> {
>          regs->ip = ex_fixup_addr(fixup);
> 
>          WARN_ONCE(1, "Bad FPU state detected at %pB, reinitializing
> FPU registers.",
>                    (void *)instruction_pointer(regs));
> 
>          __copy_kernel_to_fpregs(&init_fpstate, -1);
> 
> Now that we have supervisor states like CET, this is buggy.  This
> should do something intelligent like initializing all the *user* state
> and trying again.  If that succeeds, a signal should be sent rather
> than just corrupting the task.  And if it fails, then perhaps some
> actual intelligence is needed.  We certainly should not just disable
> CET because something is wrong with the CET MSRs.
> 

Yes, but it needs more thought.  Maybe a separate patch and more discussion?

Yu-cheng

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH v13 8/8] x86/vsyscall/64: Fixup Shadow Stack and Indirect Branch Tracking for vsyscall emulation
  2020-09-28 17:37           ` Andy Lutomirski
  2020-09-28 19:04             ` Yu, Yu-cheng
@ 2020-09-29 18:37             ` Yu, Yu-cheng
  2020-09-29 19:57               ` Andy Lutomirski
  1 sibling, 1 reply; 29+ messages in thread
From: Yu, Yu-cheng @ 2020-09-29 18:37 UTC (permalink / raw)
  To: Andy Lutomirski
  Cc: X86 ML, H. Peter Anvin, Thomas Gleixner, Ingo Molnar, LKML,
	open list:DOCUMENTATION, Linux-MM, linux-arch, Linux API,
	Arnd Bergmann, Balbir Singh, Borislav Petkov, Cyrill Gorcunov,
	Dave Hansen, Eugene Syromiatnikov, Florian Weimer, H.J. Lu,
	Jann Horn, Jonathan Corbet, Kees Cook, Mike Kravetz, Nadav Amit,
	Oleg Nesterov, Pavel Machek, Peter Zijlstra, Randy Dunlap,
	Ravi V. Shankar, Vedvyas Shanbhogue, Dave Martin, Weijiang Yang,
	Pengfei Xu

On 9/28/2020 10:37 AM, Andy Lutomirski wrote:
> On Mon, Sep 28, 2020 at 9:59 AM Yu-cheng Yu <yu-cheng.yu@intel.com> wrote:
>>
>> On Fri, 2020-09-25 at 09:51 -0700, Andy Lutomirski wrote:
>>>> On Sep 25, 2020, at 9:48 AM, Yu, Yu-cheng <yu-cheng.yu@intel.com> wrote:
>> +
>> +               cet = get_xsave_addr(&fpu->state.xsave, XFEATURE_CET_USER);
>> +               if (!cet) {
>> +                       /*
>> +                        * This is an unlikely case where the task is
>> +                        * CET-enabled, but CET xstate is in INIT.
>> +                        */
>> +                       WARN_ONCE(1, "CET is enabled, but no xstates");
> 
> "unlikely" doesn't really cover this.
> 
>> +                       fpregs_unlock();
>> +                       goto sigsegv;
>> +               }
>> +
>> +               if (cet->user_ssp && ((cet->user_ssp + 8) < TASK_SIZE_MAX))
>> +                       cet->user_ssp += 8;
> 
> This looks buggy.  The condition should be "if SHSTK is on, then add 8
> to user_ssp".  If the result is noncanonical, then some appropriate
> exception should be generated, probably by the FPU restore code -- see
> below.  You should be checking the SHSTK_EN bit, not SSP.

Updated.  Is this OK?  I will resend the whole series later.

Thanks,
Yu-cheng

======

 From 09803e66dca38d7784e32687d0693550948199ed Mon Sep 17 00:00:00 2001
From: Yu-cheng Yu <yu-cheng.yu@intel.com>
Date: Thu, 29 Nov 2018 14:15:38 -0800
Subject: [PATCH v13 8/8] x86/vsyscall/64: Fixup Shadow Stack and 
Indirect Branch
  Tracking for vsyscall emulation

Vsyscall entry points are effectively branch targets.  Mark them with
ENDBR64 opcodes.  When emulating the RET instruction, unwind shadow stack
and reset IBT state machine.

Signed-off-by: Yu-cheng Yu <yu-cheng.yu@intel.com>
---
v13:
- Check shadow stack address is canonical.
- Change from writing to MSRs to writing to CET xstate.

  arch/x86/entry/vsyscall/vsyscall_64.c     | 34 +++++++++++++++++++++++
  arch/x86/entry/vsyscall/vsyscall_emu_64.S |  9 ++++++
  arch/x86/entry/vsyscall/vsyscall_trace.h  |  1 +
  3 files changed, 44 insertions(+)

diff --git a/arch/x86/entry/vsyscall/vsyscall_64.c 
b/arch/x86/entry/vsyscall/vsyscall_64.c
index 44c33103a955..30b166091d46 100644
--- a/arch/x86/entry/vsyscall/vsyscall_64.c
+++ b/arch/x86/entry/vsyscall/vsyscall_64.c
@@ -38,6 +38,9 @@
  #include <asm/fixmap.h>
  #include <asm/traps.h>
  #include <asm/paravirt.h>
+#include <asm/fpu/xstate.h>
+#include <asm/fpu/types.h>
+#include <asm/fpu/internal.h>

  #define CREATE_TRACE_POINTS
  #include "vsyscall_trace.h"
@@ -286,6 +289,44 @@ bool emulate_vsyscall(unsigned long error_code,
  	/* Emulate a ret instruction. */
  	regs->ip = caller;
  	regs->sp += 8;
+
+#ifdef CONFIG_X86_CET
+	if (tsk->thread.cet.shstk_size || tsk->thread.cet.ibt_enabled) {
+		struct cet_user_state *cet;
+		struct fpu *fpu;
+
+		fpu = &tsk->thread.fpu;
+		fpregs_lock();
+
+		if (!test_thread_flag(TIF_NEED_FPU_LOAD)) {
+			copy_fpregs_to_fpstate(fpu);
+			set_thread_flag(TIF_NEED_FPU_LOAD);
+		}
+
+		cet = get_xsave_addr(&fpu->state.xsave, XFEATURE_CET_USER);
+		if (!cet) {
+			/*
+			 * This should not happen.  The task is
+			 * CET-enabled, but CET xstate is in INIT.
+			 */
+			WARN_ONCE(1, "CET is enabled, but no xstates");
+			fpregs_unlock();
+			goto sigsegv;
+		}
+
+		if (cet->user_cet & CET_SHSTK_EN) {
+			if (cet->user_ssp && (cet->user_ssp + 8 < TASK_SIZE_MAX))
+				cet->user_ssp += 8;
+		}
+
+		if (cet->user_cet & CET_ENDBR_EN)
+			cet->user_cet &= ~CET_WAIT_ENDBR;
+
+		__fpu_invalidate_fpregs_state(fpu);
+		fpregs_unlock();
+	}
+#endif
+
  	return true;

  sigsegv:
diff --git a/arch/x86/entry/vsyscall/vsyscall_emu_64.S 
b/arch/x86/entry/vsyscall/vsyscall_emu_64.S
index 2e203f3a25a7..b2fa92104cdb 100644
--- a/arch/x86/entry/vsyscall/vsyscall_emu_64.S
+++ b/arch/x86/entry/vsyscall/vsyscall_emu_64.S
@@ -17,16 +17,25 @@ __PAGE_ALIGNED_DATA
  	.type __vsyscall_page, @object
  __vsyscall_page:

+#ifdef CONFIG_X86_BRANCH_TRACKING_USER
+	endbr64
+#endif
  	mov $__NR_gettimeofday, %rax
  	syscall
  	ret

  	.balign 1024, 0xcc
+#ifdef CONFIG_X86_BRANCH_TRACKING_USER
+	endbr64
+#endif
  	mov $__NR_time, %rax
  	syscall
  	ret

  	.balign 1024, 0xcc
+#ifdef CONFIG_X86_BRANCH_TRACKING_USER
+	endbr64
+#endif
  	mov $__NR_getcpu, %rax
  	syscall
  	ret
diff --git a/arch/x86/entry/vsyscall/vsyscall_trace.h 
b/arch/x86/entry/vsyscall/vsyscall_trace.h
index 3c3f9765a85c..7aa2101ada44 100644
--- a/arch/x86/entry/vsyscall/vsyscall_trace.h
+++ b/arch/x86/entry/vsyscall/vsyscall_trace.h
@@ -25,6 +25,7 @@ TRACE_EVENT(emulate_vsyscall,
  #endif

  #undef TRACE_INCLUDE_PATH
+#undef TRACE_INCLUDE_FILE
  #define TRACE_INCLUDE_PATH ../../arch/x86/entry/vsyscall/
  #define TRACE_INCLUDE_FILE vsyscall_trace
  #include <trace/define_trace.h>
-- 
2.21.0

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH v13 8/8] x86/vsyscall/64: Fixup Shadow Stack and Indirect Branch Tracking for vsyscall emulation
  2020-09-29 18:37             ` Yu, Yu-cheng
@ 2020-09-29 19:57               ` Andy Lutomirski
  2020-09-29 20:00                 ` Andy Lutomirski
  0 siblings, 1 reply; 29+ messages in thread
From: Andy Lutomirski @ 2020-09-29 19:57 UTC (permalink / raw)
  To: Yu, Yu-cheng
  Cc: Andy Lutomirski, X86 ML, H. Peter Anvin, Thomas Gleixner,
	Ingo Molnar, LKML, open list:DOCUMENTATION, Linux-MM, linux-arch,
	Linux API, Arnd Bergmann, Balbir Singh, Borislav Petkov,
	Cyrill Gorcunov, Dave Hansen, Eugene Syromiatnikov,
	Florian Weimer, H.J. Lu, Jann Horn, Jonathan Corbet, Kees Cook,
	Mike Kravetz, Nadav Amit, Oleg Nesterov, Pavel Machek,
	Peter Zijlstra, Randy Dunlap, Ravi V. Shankar,
	Vedvyas Shanbhogue, Dave Martin, Weijiang Yang, Pengfei Xu

On Tue, Sep 29, 2020 at 11:37 AM Yu, Yu-cheng <yu-cheng.yu@intel.com> wrote:
>
> On 9/28/2020 10:37 AM, Andy Lutomirski wrote:
> > On Mon, Sep 28, 2020 at 9:59 AM Yu-cheng Yu <yu-cheng.yu@intel.com> wrote:
> >>
> >> On Fri, 2020-09-25 at 09:51 -0700, Andy Lutomirski wrote:
> >>>> On Sep 25, 2020, at 9:48 AM, Yu, Yu-cheng <yu-cheng.yu@intel.com> wrote:
> >> +
> >> +               cet = get_xsave_addr(&fpu->state.xsave, XFEATURE_CET_USER);
> >> +               if (!cet) {
> >> +                       /*
> >> +                        * This is an unlikely case where the task is
> >> +                        * CET-enabled, but CET xstate is in INIT.
> >> +                        */
> >> +                       WARN_ONCE(1, "CET is enabled, but no xstates");
> >
> > "unlikely" doesn't really cover this.
> >
> >> +                       fpregs_unlock();
> >> +                       goto sigsegv;
> >> +               }
> >> +
> >> +               if (cet->user_ssp && ((cet->user_ssp + 8) < TASK_SIZE_MAX))
> >> +                       cet->user_ssp += 8;
> >
> > This looks buggy.  The condition should be "if SHSTK is on, then add 8
> > to user_ssp".  If the result is noncanonical, then some appropriate
> > exception should be generated, probably by the FPU restore code -- see
> > below.  You should be checking the SHSTK_EN bit, not SSP.
>
> Updated.  Is this OK?  I will resend the whole series later.
>
> Thanks,
> Yu-cheng
>
> ======
>
>  From 09803e66dca38d7784e32687d0693550948199ed Mon Sep 17 00:00:00 2001
> From: Yu-cheng Yu <yu-cheng.yu@intel.com>
> Date: Thu, 29 Nov 2018 14:15:38 -0800
> Subject: [PATCH v13 8/8] x86/vsyscall/64: Fixup Shadow Stack and
> Indirect Branch
>   Tracking for vsyscall emulation
>
> Vsyscall entry points are effectively branch targets.  Mark them with
> ENDBR64 opcodes.  When emulating the RET instruction, unwind shadow stack
> and reset IBT state machine.
>
> Signed-off-by: Yu-cheng Yu <yu-cheng.yu@intel.com>
> ---
> v13:
> - Check shadow stack address is canonical.
> - Change from writing to MSRs to writing to CET xstate.
>
>   arch/x86/entry/vsyscall/vsyscall_64.c     | 34 +++++++++++++++++++++++
>   arch/x86/entry/vsyscall/vsyscall_emu_64.S |  9 ++++++
>   arch/x86/entry/vsyscall/vsyscall_trace.h  |  1 +
>   3 files changed, 44 insertions(+)
>
> diff --git a/arch/x86/entry/vsyscall/vsyscall_64.c
> b/arch/x86/entry/vsyscall/vsyscall_64.c
> index 44c33103a955..30b166091d46 100644
> --- a/arch/x86/entry/vsyscall/vsyscall_64.c
> +++ b/arch/x86/entry/vsyscall/vsyscall_64.c
> @@ -38,6 +38,9 @@
>   #include <asm/fixmap.h>
>   #include <asm/traps.h>
>   #include <asm/paravirt.h>
> +#include <asm/fpu/xstate.h>
> +#include <asm/fpu/types.h>
> +#include <asm/fpu/internal.h>
>
>   #define CREATE_TRACE_POINTS
>   #include "vsyscall_trace.h"
> @@ -286,6 +289,44 @@ bool emulate_vsyscall(unsigned long error_code,
>         /* Emulate a ret instruction. */
>         regs->ip = caller;
>         regs->sp += 8;
> +
> +#ifdef CONFIG_X86_CET
> +       if (tsk->thread.cet.shstk_size || tsk->thread.cet.ibt_enabled) {
> +               struct cet_user_state *cet;
> +               struct fpu *fpu;
> +
> +               fpu = &tsk->thread.fpu;
> +               fpregs_lock();
> +
> +               if (!test_thread_flag(TIF_NEED_FPU_LOAD)) {
> +                       copy_fpregs_to_fpstate(fpu);
> +                       set_thread_flag(TIF_NEED_FPU_LOAD);
> +               }
> +
> +               cet = get_xsave_addr(&fpu->state.xsave, XFEATURE_CET_USER);
> +               if (!cet) {
> +                       /*
> +                        * This should not happen.  The task is
> +                        * CET-enabled, but CET xstate is in INIT.
> +                        */

Can the comment explain better, please?  I would say something like:

If the kernel thinks this task has CET enabled (because
tsk->thread.cet has one of the features enabled), then the
corresponding bits must also be set in the CET XSAVES region.  If the
CET XSAVES region is in the INIT state, then the kernel's concept of
the task's CET state is corrupt.

> +                       WARN_ONCE(1, "CET is enabled, but no xstates");
> +                       fpregs_unlock();
> +                       goto sigsegv;
> +               }
> +
> +               if (cet->user_cet & CET_SHSTK_EN) {
> +                       if (cet->user_ssp && (cet->user_ssp + 8 < TASK_SIZE_MAX))
> +                               cet->user_ssp += 8;
> +               }

This makes so sense to me.  Also, the vsyscall emulation code is
intended to be as rigid as possible to minimize the chance that it
gets used as an exploit gadget.  So we should not silently corrupt
anything.  Moreover, this code seems quite dangerous -- you've created
a gadget that does RET without actually verifying the SHSTK token.  If
SHSTK and some form of strong indirect branch/call CFI is in use, then
the existance of a CFI-bypassing return primitive at a fixed address
seems quite problematic.

So I think you need to write a function that reasonably accurately
emulates a usermode RET.

--Andy

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH v13 8/8] x86/vsyscall/64: Fixup Shadow Stack and Indirect Branch Tracking for vsyscall emulation
  2020-09-29 19:57               ` Andy Lutomirski
@ 2020-09-29 20:00                 ` Andy Lutomirski
  2020-09-30 22:33                   ` Yu, Yu-cheng
  0 siblings, 1 reply; 29+ messages in thread
From: Andy Lutomirski @ 2020-09-29 20:00 UTC (permalink / raw)
  To: Andy Lutomirski
  Cc: Yu, Yu-cheng, X86 ML, H. Peter Anvin, Thomas Gleixner,
	Ingo Molnar, LKML, open list:DOCUMENTATION, Linux-MM, linux-arch,
	Linux API, Arnd Bergmann, Balbir Singh, Borislav Petkov,
	Cyrill Gorcunov, Dave Hansen, Eugene Syromiatnikov,
	Florian Weimer, H.J. Lu, Jann Horn, Jonathan Corbet, Kees Cook,
	Mike Kravetz, Nadav Amit, Oleg Nesterov, Pavel Machek,
	Peter Zijlstra, Randy Dunlap, Ravi V. Shankar,
	Vedvyas Shanbhogue, Dave Martin, Weijiang Yang, Pengfei Xu

On Tue, Sep 29, 2020 at 12:57 PM Andy Lutomirski <luto@kernel.org> wrote:
>
> On Tue, Sep 29, 2020 at 11:37 AM Yu, Yu-cheng <yu-cheng.yu@intel.com> wrote:
> >
> > On 9/28/2020 10:37 AM, Andy Lutomirski wrote:
> > > On Mon, Sep 28, 2020 at 9:59 AM Yu-cheng Yu <yu-cheng.yu@intel.com> wrote:
> > >>
> > >> On Fri, 2020-09-25 at 09:51 -0700, Andy Lutomirski wrote:
> > >>>> On Sep 25, 2020, at 9:48 AM, Yu, Yu-cheng <yu-cheng.yu@intel.com> wrote:
> > >> +
> > >> +               cet = get_xsave_addr(&fpu->state.xsave, XFEATURE_CET_USER);
> > >> +               if (!cet) {
> > >> +                       /*
> > >> +                        * This is an unlikely case where the task is
> > >> +                        * CET-enabled, but CET xstate is in INIT.
> > >> +                        */
> > >> +                       WARN_ONCE(1, "CET is enabled, but no xstates");
> > >
> > > "unlikely" doesn't really cover this.
> > >
> > >> +                       fpregs_unlock();
> > >> +                       goto sigsegv;
> > >> +               }
> > >> +
> > >> +               if (cet->user_ssp && ((cet->user_ssp + 8) < TASK_SIZE_MAX))
> > >> +                       cet->user_ssp += 8;
> > >
> > > This looks buggy.  The condition should be "if SHSTK is on, then add 8
> > > to user_ssp".  If the result is noncanonical, then some appropriate
> > > exception should be generated, probably by the FPU restore code -- see
> > > below.  You should be checking the SHSTK_EN bit, not SSP.
> >
> > Updated.  Is this OK?  I will resend the whole series later.
> >
> > Thanks,
> > Yu-cheng
> >
> > ======
> >
> >  From 09803e66dca38d7784e32687d0693550948199ed Mon Sep 17 00:00:00 2001
> > From: Yu-cheng Yu <yu-cheng.yu@intel.com>
> > Date: Thu, 29 Nov 2018 14:15:38 -0800
> > Subject: [PATCH v13 8/8] x86/vsyscall/64: Fixup Shadow Stack and
> > Indirect Branch
> >   Tracking for vsyscall emulation
> >
> > Vsyscall entry points are effectively branch targets.  Mark them with
> > ENDBR64 opcodes.  When emulating the RET instruction, unwind shadow stack
> > and reset IBT state machine.
> >
> > Signed-off-by: Yu-cheng Yu <yu-cheng.yu@intel.com>
> > ---
> > v13:
> > - Check shadow stack address is canonical.
> > - Change from writing to MSRs to writing to CET xstate.
> >
> >   arch/x86/entry/vsyscall/vsyscall_64.c     | 34 +++++++++++++++++++++++
> >   arch/x86/entry/vsyscall/vsyscall_emu_64.S |  9 ++++++
> >   arch/x86/entry/vsyscall/vsyscall_trace.h  |  1 +
> >   3 files changed, 44 insertions(+)
> >
> > diff --git a/arch/x86/entry/vsyscall/vsyscall_64.c
> > b/arch/x86/entry/vsyscall/vsyscall_64.c
> > index 44c33103a955..30b166091d46 100644
> > --- a/arch/x86/entry/vsyscall/vsyscall_64.c
> > +++ b/arch/x86/entry/vsyscall/vsyscall_64.c
> > @@ -38,6 +38,9 @@
> >   #include <asm/fixmap.h>
> >   #include <asm/traps.h>
> >   #include <asm/paravirt.h>
> > +#include <asm/fpu/xstate.h>
> > +#include <asm/fpu/types.h>
> > +#include <asm/fpu/internal.h>
> >
> >   #define CREATE_TRACE_POINTS
> >   #include "vsyscall_trace.h"
> > @@ -286,6 +289,44 @@ bool emulate_vsyscall(unsigned long error_code,
> >         /* Emulate a ret instruction. */
> >         regs->ip = caller;
> >         regs->sp += 8;
> > +
> > +#ifdef CONFIG_X86_CET
> > +       if (tsk->thread.cet.shstk_size || tsk->thread.cet.ibt_enabled) {
> > +               struct cet_user_state *cet;
> > +               struct fpu *fpu;
> > +
> > +               fpu = &tsk->thread.fpu;
> > +               fpregs_lock();
> > +
> > +               if (!test_thread_flag(TIF_NEED_FPU_LOAD)) {
> > +                       copy_fpregs_to_fpstate(fpu);
> > +                       set_thread_flag(TIF_NEED_FPU_LOAD);
> > +               }
> > +
> > +               cet = get_xsave_addr(&fpu->state.xsave, XFEATURE_CET_USER);
> > +               if (!cet) {
> > +                       /*
> > +                        * This should not happen.  The task is
> > +                        * CET-enabled, but CET xstate is in INIT.
> > +                        */
>
> Can the comment explain better, please?  I would say something like:
>
> If the kernel thinks this task has CET enabled (because
> tsk->thread.cet has one of the features enabled), then the
> corresponding bits must also be set in the CET XSAVES region.  If the
> CET XSAVES region is in the INIT state, then the kernel's concept of
> the task's CET state is corrupt.
>
> > +                       WARN_ONCE(1, "CET is enabled, but no xstates");
> > +                       fpregs_unlock();
> > +                       goto sigsegv;
> > +               }
> > +
> > +               if (cet->user_cet & CET_SHSTK_EN) {
> > +                       if (cet->user_ssp && (cet->user_ssp + 8 < TASK_SIZE_MAX))
> > +                               cet->user_ssp += 8;
> > +               }
>
> This makes so sense to me.  Also, the vsyscall emulation code is
> intended to be as rigid as possible to minimize the chance that it
> gets used as an exploit gadget.  So we should not silently corrupt
> anything.  Moreover, this code seems quite dangerous -- you've created
> a gadget that does RET without actually verifying the SHSTK token.  If
> SHSTK and some form of strong indirect branch/call CFI is in use, then
> the existance of a CFI-bypassing return primitive at a fixed address
> seems quite problematic.
>
> So I think you need to write a function that reasonably accurately
> emulates a usermode RET.
>

For what it's worth, I think there is an alternative.  If you all
(userspace people, etc) can come up with a credible way for a user
program to statically declare that it doesn't need vsyscalls, then we
could make SHSTK depend on *that*, and we could avoid this mess.  This
breaks orthogonality, but it's probably a decent outcome.

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH v13 8/8] x86/vsyscall/64: Fixup Shadow Stack and Indirect Branch Tracking for vsyscall emulation
  2020-09-29 20:00                 ` Andy Lutomirski
@ 2020-09-30 22:33                   ` Yu, Yu-cheng
  2020-09-30 23:44                     ` Andy Lutomirski
  0 siblings, 1 reply; 29+ messages in thread
From: Yu, Yu-cheng @ 2020-09-30 22:33 UTC (permalink / raw)
  To: Andy Lutomirski
  Cc: X86 ML, H. Peter Anvin, Thomas Gleixner, Ingo Molnar, LKML,
	open list:DOCUMENTATION, Linux-MM, linux-arch, Linux API,
	Arnd Bergmann, Balbir Singh, Borislav Petkov, Cyrill Gorcunov,
	Dave Hansen, Eugene Syromiatnikov, Florian Weimer, H.J. Lu,
	Jann Horn, Jonathan Corbet, Kees Cook, Mike Kravetz, Nadav Amit,
	Oleg Nesterov, Pavel Machek, Peter Zijlstra, Randy Dunlap,
	Ravi V. Shankar, Vedvyas Shanbhogue, Dave Martin, Weijiang Yang,
	Pengfei Xu

On 9/29/2020 1:00 PM, Andy Lutomirski wrote:
> On Tue, Sep 29, 2020 at 12:57 PM Andy Lutomirski <luto@kernel.org> wrote:
>>
>> On Tue, Sep 29, 2020 at 11:37 AM Yu, Yu-cheng <yu-cheng.yu@intel.com> wrote:
>>>
>>> On 9/28/2020 10:37 AM, Andy Lutomirski wrote:
>>>> On Mon, Sep 28, 2020 at 9:59 AM Yu-cheng Yu <yu-cheng.yu@intel.com> wrote:
>>>>>
>>>>> On Fri, 2020-09-25 at 09:51 -0700, Andy Lutomirski wrote:
>>>>>>> On Sep 25, 2020, at 9:48 AM, Yu, Yu-cheng <yu-cheng.yu@intel.com> wrote:
>>>>> +
>>>>> +               cet = get_xsave_addr(&fpu->state.xsave, XFEATURE_CET_USER);
>>>>> +               if (!cet) {
>>>>> +                       /*
>>>>> +                        * This is an unlikely case where the task is
>>>>> +                        * CET-enabled, but CET xstate is in INIT.
>>>>> +                        */
>>>>> +                       WARN_ONCE(1, "CET is enabled, but no xstates");
>>>>
>>>> "unlikely" doesn't really cover this.
>>>>
>>>>> +                       fpregs_unlock();
>>>>> +                       goto sigsegv;
>>>>> +               }
>>>>> +
>>>>> +               if (cet->user_ssp && ((cet->user_ssp + 8) < TASK_SIZE_MAX))
>>>>> +                       cet->user_ssp += 8;
>>>>
>>>> This looks buggy.  The condition should be "if SHSTK is on, then add 8
>>>> to user_ssp".  If the result is noncanonical, then some appropriate
>>>> exception should be generated, probably by the FPU restore code -- see
>>>> below.  You should be checking the SHSTK_EN bit, not SSP.
>>>
>>> Updated.  Is this OK?  I will resend the whole series later.
>>>
>>> Thanks,
>>> Yu-cheng
>>>
>>> ======
>>>
>>>   From 09803e66dca38d7784e32687d0693550948199ed Mon Sep 17 00:00:00 2001
>>> From: Yu-cheng Yu <yu-cheng.yu@intel.com>
>>> Date: Thu, 29 Nov 2018 14:15:38 -0800
>>> Subject: [PATCH v13 8/8] x86/vsyscall/64: Fixup Shadow Stack and
>>> Indirect Branch
>>>    Tracking for vsyscall emulation
>>>
>>> Vsyscall entry points are effectively branch targets.  Mark them with
>>> ENDBR64 opcodes.  When emulating the RET instruction, unwind shadow stack
>>> and reset IBT state machine.
>>>
>>> Signed-off-by: Yu-cheng Yu <yu-cheng.yu@intel.com>
>>> ---
>>> v13:
>>> - Check shadow stack address is canonical.
>>> - Change from writing to MSRs to writing to CET xstate.
>>>
>>>    arch/x86/entry/vsyscall/vsyscall_64.c     | 34 +++++++++++++++++++++++
>>>    arch/x86/entry/vsyscall/vsyscall_emu_64.S |  9 ++++++
>>>    arch/x86/entry/vsyscall/vsyscall_trace.h  |  1 +
>>>    3 files changed, 44 insertions(+)
>>>
>>> diff --git a/arch/x86/entry/vsyscall/vsyscall_64.c
>>> b/arch/x86/entry/vsyscall/vsyscall_64.c
>>> index 44c33103a955..30b166091d46 100644
>>> --- a/arch/x86/entry/vsyscall/vsyscall_64.c
>>> +++ b/arch/x86/entry/vsyscall/vsyscall_64.c
>>> @@ -38,6 +38,9 @@
>>>    #include <asm/fixmap.h>
>>>    #include <asm/traps.h>
>>>    #include <asm/paravirt.h>
>>> +#include <asm/fpu/xstate.h>
>>> +#include <asm/fpu/types.h>
>>> +#include <asm/fpu/internal.h>
>>>
>>>    #define CREATE_TRACE_POINTS
>>>    #include "vsyscall_trace.h"
>>> @@ -286,6 +289,44 @@ bool emulate_vsyscall(unsigned long error_code,
>>>          /* Emulate a ret instruction. */
>>>          regs->ip = caller;
>>>          regs->sp += 8;
>>> +
>>> +#ifdef CONFIG_X86_CET
>>> +       if (tsk->thread.cet.shstk_size || tsk->thread.cet.ibt_enabled) {
>>> +               struct cet_user_state *cet;
>>> +               struct fpu *fpu;
>>> +
>>> +               fpu = &tsk->thread.fpu;
>>> +               fpregs_lock();
>>> +
>>> +               if (!test_thread_flag(TIF_NEED_FPU_LOAD)) {
>>> +                       copy_fpregs_to_fpstate(fpu);
>>> +                       set_thread_flag(TIF_NEED_FPU_LOAD);
>>> +               }
>>> +
>>> +               cet = get_xsave_addr(&fpu->state.xsave, XFEATURE_CET_USER);
>>> +               if (!cet) {
>>> +                       /*
>>> +                        * This should not happen.  The task is
>>> +                        * CET-enabled, but CET xstate is in INIT.
>>> +                        */
>>
>> Can the comment explain better, please?  I would say something like:
>>
>> If the kernel thinks this task has CET enabled (because
>> tsk->thread.cet has one of the features enabled), then the
>> corresponding bits must also be set in the CET XSAVES region.  If the
>> CET XSAVES region is in the INIT state, then the kernel's concept of
>> the task's CET state is corrupt.
>>
>>> +                       WARN_ONCE(1, "CET is enabled, but no xstates");
>>> +                       fpregs_unlock();
>>> +                       goto sigsegv;
>>> +               }
>>> +
>>> +               if (cet->user_cet & CET_SHSTK_EN) {
>>> +                       if (cet->user_ssp && (cet->user_ssp + 8 < TASK_SIZE_MAX))
>>> +                               cet->user_ssp += 8;
>>> +               }
>>
>> This makes so sense to me.  Also, the vsyscall emulation code is
>> intended to be as rigid as possible to minimize the chance that it
>> gets used as an exploit gadget.  So we should not silently corrupt
>> anything.  Moreover, this code seems quite dangerous -- you've created
>> a gadget that does RET without actually verifying the SHSTK token.  If
>> SHSTK and some form of strong indirect branch/call CFI is in use, then
>> the existance of a CFI-bypassing return primitive at a fixed address
>> seems quite problematic.
>>
>> So I think you need to write a function that reasonably accurately
>> emulates a usermode RET.
>>
> 
> For what it's worth, I think there is an alternative.  If you all
> (userspace people, etc) can come up with a credible way for a user
> program to statically declare that it doesn't need vsyscalls, then we
> could make SHSTK depend on *that*, and we could avoid this mess.  This
> breaks orthogonality, but it's probably a decent outcome.
> 

Would an arch_prctl(DISABLE_VSYSCALL) work?  The kernel then sets a 
thread flag, and in emulate_vsyscall(), checks the flag.

When CET is enabled, ld-linux will do DISABLE_VSYSCALL.

How is that?

Yu-cheng

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH v13 8/8] x86/vsyscall/64: Fixup Shadow Stack and Indirect Branch Tracking for vsyscall emulation
  2020-09-30 22:33                   ` Yu, Yu-cheng
@ 2020-09-30 23:44                     ` Andy Lutomirski
  2020-10-01  1:00                       ` H.J. Lu
  0 siblings, 1 reply; 29+ messages in thread
From: Andy Lutomirski @ 2020-09-30 23:44 UTC (permalink / raw)
  To: Yu, Yu-cheng
  Cc: Andy Lutomirski, X86 ML, H. Peter Anvin, Thomas Gleixner,
	Ingo Molnar, LKML, open list:DOCUMENTATION, Linux-MM, linux-arch,
	Linux API, Arnd Bergmann, Balbir Singh, Borislav Petkov,
	Cyrill Gorcunov, Dave Hansen, Eugene Syromiatnikov,
	Florian Weimer, H.J. Lu, Jann Horn, Jonathan Corbet, Kees Cook,
	Mike Kravetz, Nadav Amit, Oleg Nesterov, Pavel Machek,
	Peter Zijlstra, Randy Dunlap, Ravi V. Shankar,
	Vedvyas Shanbhogue, Dave Martin, Weijiang Yang, Pengfei Xu

On Wed, Sep 30, 2020 at 3:33 PM Yu, Yu-cheng <yu-cheng.yu@intel.com> wrote:
>
> On 9/29/2020 1:00 PM, Andy Lutomirski wrote:
> > On Tue, Sep 29, 2020 at 12:57 PM Andy Lutomirski <luto@kernel.org> wrote:
> >>
> >> On Tue, Sep 29, 2020 at 11:37 AM Yu, Yu-cheng <yu-cheng.yu@intel.com> wrote:
> >>>
> >>> On 9/28/2020 10:37 AM, Andy Lutomirski wrote:
> >>>> On Mon, Sep 28, 2020 at 9:59 AM Yu-cheng Yu <yu-cheng.yu@intel.com> wrote:
> >>>>>
> >>>>> On Fri, 2020-09-25 at 09:51 -0700, Andy Lutomirski wrote:
> >>>>>>> On Sep 25, 2020, at 9:48 AM, Yu, Yu-cheng <yu-cheng.yu@intel.com> wrote:
> >>>>> +
> >>>>> +               cet = get_xsave_addr(&fpu->state.xsave, XFEATURE_CET_USER);
> >>>>> +               if (!cet) {
> >>>>> +                       /*
> >>>>> +                        * This is an unlikely case where the task is
> >>>>> +                        * CET-enabled, but CET xstate is in INIT.
> >>>>> +                        */
> >>>>> +                       WARN_ONCE(1, "CET is enabled, but no xstates");
> >>>>
> >>>> "unlikely" doesn't really cover this.
> >>>>
> >>>>> +                       fpregs_unlock();
> >>>>> +                       goto sigsegv;
> >>>>> +               }
> >>>>> +
> >>>>> +               if (cet->user_ssp && ((cet->user_ssp + 8) < TASK_SIZE_MAX))
> >>>>> +                       cet->user_ssp += 8;
> >>>>
> >>>> This looks buggy.  The condition should be "if SHSTK is on, then add 8
> >>>> to user_ssp".  If the result is noncanonical, then some appropriate
> >>>> exception should be generated, probably by the FPU restore code -- see
> >>>> below.  You should be checking the SHSTK_EN bit, not SSP.
> >>>
> >>> Updated.  Is this OK?  I will resend the whole series later.
> >>>
> >>> Thanks,
> >>> Yu-cheng
> >>>
> >>> ======
> >>>
> >>>   From 09803e66dca38d7784e32687d0693550948199ed Mon Sep 17 00:00:00 2001
> >>> From: Yu-cheng Yu <yu-cheng.yu@intel.com>
> >>> Date: Thu, 29 Nov 2018 14:15:38 -0800
> >>> Subject: [PATCH v13 8/8] x86/vsyscall/64: Fixup Shadow Stack and
> >>> Indirect Branch
> >>>    Tracking for vsyscall emulation
> >>>
> >>> Vsyscall entry points are effectively branch targets.  Mark them with
> >>> ENDBR64 opcodes.  When emulating the RET instruction, unwind shadow stack
> >>> and reset IBT state machine.
> >>>
> >>> Signed-off-by: Yu-cheng Yu <yu-cheng.yu@intel.com>
> >>> ---
> >>> v13:
> >>> - Check shadow stack address is canonical.
> >>> - Change from writing to MSRs to writing to CET xstate.
> >>>
> >>>    arch/x86/entry/vsyscall/vsyscall_64.c     | 34 +++++++++++++++++++++++
> >>>    arch/x86/entry/vsyscall/vsyscall_emu_64.S |  9 ++++++
> >>>    arch/x86/entry/vsyscall/vsyscall_trace.h  |  1 +
> >>>    3 files changed, 44 insertions(+)
> >>>
> >>> diff --git a/arch/x86/entry/vsyscall/vsyscall_64.c
> >>> b/arch/x86/entry/vsyscall/vsyscall_64.c
> >>> index 44c33103a955..30b166091d46 100644
> >>> --- a/arch/x86/entry/vsyscall/vsyscall_64.c
> >>> +++ b/arch/x86/entry/vsyscall/vsyscall_64.c
> >>> @@ -38,6 +38,9 @@
> >>>    #include <asm/fixmap.h>
> >>>    #include <asm/traps.h>
> >>>    #include <asm/paravirt.h>
> >>> +#include <asm/fpu/xstate.h>
> >>> +#include <asm/fpu/types.h>
> >>> +#include <asm/fpu/internal.h>
> >>>
> >>>    #define CREATE_TRACE_POINTS
> >>>    #include "vsyscall_trace.h"
> >>> @@ -286,6 +289,44 @@ bool emulate_vsyscall(unsigned long error_code,
> >>>          /* Emulate a ret instruction. */
> >>>          regs->ip = caller;
> >>>          regs->sp += 8;
> >>> +
> >>> +#ifdef CONFIG_X86_CET
> >>> +       if (tsk->thread.cet.shstk_size || tsk->thread.cet.ibt_enabled) {
> >>> +               struct cet_user_state *cet;
> >>> +               struct fpu *fpu;
> >>> +
> >>> +               fpu = &tsk->thread.fpu;
> >>> +               fpregs_lock();
> >>> +
> >>> +               if (!test_thread_flag(TIF_NEED_FPU_LOAD)) {
> >>> +                       copy_fpregs_to_fpstate(fpu);
> >>> +                       set_thread_flag(TIF_NEED_FPU_LOAD);
> >>> +               }
> >>> +
> >>> +               cet = get_xsave_addr(&fpu->state.xsave, XFEATURE_CET_USER);
> >>> +               if (!cet) {
> >>> +                       /*
> >>> +                        * This should not happen.  The task is
> >>> +                        * CET-enabled, but CET xstate is in INIT.
> >>> +                        */
> >>
> >> Can the comment explain better, please?  I would say something like:
> >>
> >> If the kernel thinks this task has CET enabled (because
> >> tsk->thread.cet has one of the features enabled), then the
> >> corresponding bits must also be set in the CET XSAVES region.  If the
> >> CET XSAVES region is in the INIT state, then the kernel's concept of
> >> the task's CET state is corrupt.
> >>
> >>> +                       WARN_ONCE(1, "CET is enabled, but no xstates");
> >>> +                       fpregs_unlock();
> >>> +                       goto sigsegv;
> >>> +               }
> >>> +
> >>> +               if (cet->user_cet & CET_SHSTK_EN) {
> >>> +                       if (cet->user_ssp && (cet->user_ssp + 8 < TASK_SIZE_MAX))
> >>> +                               cet->user_ssp += 8;
> >>> +               }
> >>
> >> This makes so sense to me.  Also, the vsyscall emulation code is
> >> intended to be as rigid as possible to minimize the chance that it
> >> gets used as an exploit gadget.  So we should not silently corrupt
> >> anything.  Moreover, this code seems quite dangerous -- you've created
> >> a gadget that does RET without actually verifying the SHSTK token.  If
> >> SHSTK and some form of strong indirect branch/call CFI is in use, then
> >> the existance of a CFI-bypassing return primitive at a fixed address
> >> seems quite problematic.
> >>
> >> So I think you need to write a function that reasonably accurately
> >> emulates a usermode RET.
> >>
> >
> > For what it's worth, I think there is an alternative.  If you all
> > (userspace people, etc) can come up with a credible way for a user
> > program to statically declare that it doesn't need vsyscalls, then we
> > could make SHSTK depend on *that*, and we could avoid this mess.  This
> > breaks orthogonality, but it's probably a decent outcome.
> >
>
> Would an arch_prctl(DISABLE_VSYSCALL) work?  The kernel then sets a
> thread flag, and in emulate_vsyscall(), checks the flag.
>
> When CET is enabled, ld-linux will do DISABLE_VSYSCALL.
>
> How is that?

Backwards, no?  Presumably vsyscall needs to be disabled before or
concurrently with CET being enabled, not after.

I think the solution of making vsyscall emulation work correctly with
CET is going to be better and possibly more straightforward.

>
> Yu-cheng

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH v13 8/8] x86/vsyscall/64: Fixup Shadow Stack and Indirect Branch Tracking for vsyscall emulation
  2020-09-30 23:44                     ` Andy Lutomirski
@ 2020-10-01  1:00                       ` H.J. Lu
  2020-10-01  1:10                         ` Andy Lutomirski
  0 siblings, 1 reply; 29+ messages in thread
From: H.J. Lu @ 2020-10-01  1:00 UTC (permalink / raw)
  To: Andy Lutomirski
  Cc: Yu, Yu-cheng, X86 ML, H. Peter Anvin, Thomas Gleixner,
	Ingo Molnar, LKML, open list:DOCUMENTATION, Linux-MM, linux-arch,
	Linux API, Arnd Bergmann, Balbir Singh, Borislav Petkov,
	Cyrill Gorcunov, Dave Hansen, Eugene Syromiatnikov,
	Florian Weimer, Jann Horn, Jonathan Corbet, Kees Cook,
	Mike Kravetz, Nadav Amit, Oleg Nesterov, Pavel Machek,
	Peter Zijlstra, Randy Dunlap, Ravi V. Shankar,
	Vedvyas Shanbhogue, Dave Martin, Weijiang Yang, Pengfei Xu

On Wed, Sep 30, 2020 at 4:44 PM Andy Lutomirski <luto@kernel.org> wrote:
>
> On Wed, Sep 30, 2020 at 3:33 PM Yu, Yu-cheng <yu-cheng.yu@intel.com> wrote:
> >
> > On 9/29/2020 1:00 PM, Andy Lutomirski wrote:
> > > On Tue, Sep 29, 2020 at 12:57 PM Andy Lutomirski <luto@kernel.org> wrote:
> > >>
> > >> On Tue, Sep 29, 2020 at 11:37 AM Yu, Yu-cheng <yu-cheng.yu@intel.com> wrote:
> > >>>
> > >>> On 9/28/2020 10:37 AM, Andy Lutomirski wrote:
> > >>>> On Mon, Sep 28, 2020 at 9:59 AM Yu-cheng Yu <yu-cheng.yu@intel.com> wrote:
> > >>>>>
> > >>>>> On Fri, 2020-09-25 at 09:51 -0700, Andy Lutomirski wrote:
> > >>>>>>> On Sep 25, 2020, at 9:48 AM, Yu, Yu-cheng <yu-cheng.yu@intel.com> wrote:
> > >>>>> +
> > >>>>> +               cet = get_xsave_addr(&fpu->state.xsave, XFEATURE_CET_USER);
> > >>>>> +               if (!cet) {
> > >>>>> +                       /*
> > >>>>> +                        * This is an unlikely case where the task is
> > >>>>> +                        * CET-enabled, but CET xstate is in INIT.
> > >>>>> +                        */
> > >>>>> +                       WARN_ONCE(1, "CET is enabled, but no xstates");
> > >>>>
> > >>>> "unlikely" doesn't really cover this.
> > >>>>
> > >>>>> +                       fpregs_unlock();
> > >>>>> +                       goto sigsegv;
> > >>>>> +               }
> > >>>>> +
> > >>>>> +               if (cet->user_ssp && ((cet->user_ssp + 8) < TASK_SIZE_MAX))
> > >>>>> +                       cet->user_ssp += 8;
> > >>>>
> > >>>> This looks buggy.  The condition should be "if SHSTK is on, then add 8
> > >>>> to user_ssp".  If the result is noncanonical, then some appropriate
> > >>>> exception should be generated, probably by the FPU restore code -- see
> > >>>> below.  You should be checking the SHSTK_EN bit, not SSP.
> > >>>
> > >>> Updated.  Is this OK?  I will resend the whole series later.
> > >>>
> > >>> Thanks,
> > >>> Yu-cheng
> > >>>
> > >>> ======
> > >>>
> > >>>   From 09803e66dca38d7784e32687d0693550948199ed Mon Sep 17 00:00:00 2001
> > >>> From: Yu-cheng Yu <yu-cheng.yu@intel.com>
> > >>> Date: Thu, 29 Nov 2018 14:15:38 -0800
> > >>> Subject: [PATCH v13 8/8] x86/vsyscall/64: Fixup Shadow Stack and
> > >>> Indirect Branch
> > >>>    Tracking for vsyscall emulation
> > >>>
> > >>> Vsyscall entry points are effectively branch targets.  Mark them with
> > >>> ENDBR64 opcodes.  When emulating the RET instruction, unwind shadow stack
> > >>> and reset IBT state machine.
> > >>>
> > >>> Signed-off-by: Yu-cheng Yu <yu-cheng.yu@intel.com>
> > >>> ---
> > >>> v13:
> > >>> - Check shadow stack address is canonical.
> > >>> - Change from writing to MSRs to writing to CET xstate.
> > >>>
> > >>>    arch/x86/entry/vsyscall/vsyscall_64.c     | 34 +++++++++++++++++++++++
> > >>>    arch/x86/entry/vsyscall/vsyscall_emu_64.S |  9 ++++++
> > >>>    arch/x86/entry/vsyscall/vsyscall_trace.h  |  1 +
> > >>>    3 files changed, 44 insertions(+)
> > >>>
> > >>> diff --git a/arch/x86/entry/vsyscall/vsyscall_64.c
> > >>> b/arch/x86/entry/vsyscall/vsyscall_64.c
> > >>> index 44c33103a955..30b166091d46 100644
> > >>> --- a/arch/x86/entry/vsyscall/vsyscall_64.c
> > >>> +++ b/arch/x86/entry/vsyscall/vsyscall_64.c
> > >>> @@ -38,6 +38,9 @@
> > >>>    #include <asm/fixmap.h>
> > >>>    #include <asm/traps.h>
> > >>>    #include <asm/paravirt.h>
> > >>> +#include <asm/fpu/xstate.h>
> > >>> +#include <asm/fpu/types.h>
> > >>> +#include <asm/fpu/internal.h>
> > >>>
> > >>>    #define CREATE_TRACE_POINTS
> > >>>    #include "vsyscall_trace.h"
> > >>> @@ -286,6 +289,44 @@ bool emulate_vsyscall(unsigned long error_code,
> > >>>          /* Emulate a ret instruction. */
> > >>>          regs->ip = caller;
> > >>>          regs->sp += 8;
> > >>> +
> > >>> +#ifdef CONFIG_X86_CET
> > >>> +       if (tsk->thread.cet.shstk_size || tsk->thread.cet.ibt_enabled) {
> > >>> +               struct cet_user_state *cet;
> > >>> +               struct fpu *fpu;
> > >>> +
> > >>> +               fpu = &tsk->thread.fpu;
> > >>> +               fpregs_lock();
> > >>> +
> > >>> +               if (!test_thread_flag(TIF_NEED_FPU_LOAD)) {
> > >>> +                       copy_fpregs_to_fpstate(fpu);
> > >>> +                       set_thread_flag(TIF_NEED_FPU_LOAD);
> > >>> +               }
> > >>> +
> > >>> +               cet = get_xsave_addr(&fpu->state.xsave, XFEATURE_CET_USER);
> > >>> +               if (!cet) {
> > >>> +                       /*
> > >>> +                        * This should not happen.  The task is
> > >>> +                        * CET-enabled, but CET xstate is in INIT.
> > >>> +                        */
> > >>
> > >> Can the comment explain better, please?  I would say something like:
> > >>
> > >> If the kernel thinks this task has CET enabled (because
> > >> tsk->thread.cet has one of the features enabled), then the
> > >> corresponding bits must also be set in the CET XSAVES region.  If the
> > >> CET XSAVES region is in the INIT state, then the kernel's concept of
> > >> the task's CET state is corrupt.
> > >>
> > >>> +                       WARN_ONCE(1, "CET is enabled, but no xstates");
> > >>> +                       fpregs_unlock();
> > >>> +                       goto sigsegv;
> > >>> +               }
> > >>> +
> > >>> +               if (cet->user_cet & CET_SHSTK_EN) {
> > >>> +                       if (cet->user_ssp && (cet->user_ssp + 8 < TASK_SIZE_MAX))
> > >>> +                               cet->user_ssp += 8;
> > >>> +               }
> > >>
> > >> This makes so sense to me.  Also, the vsyscall emulation code is
> > >> intended to be as rigid as possible to minimize the chance that it
> > >> gets used as an exploit gadget.  So we should not silently corrupt
> > >> anything.  Moreover, this code seems quite dangerous -- you've created
> > >> a gadget that does RET without actually verifying the SHSTK token.  If
> > >> SHSTK and some form of strong indirect branch/call CFI is in use, then
> > >> the existance of a CFI-bypassing return primitive at a fixed address
> > >> seems quite problematic.
> > >>
> > >> So I think you need to write a function that reasonably accurately
> > >> emulates a usermode RET.
> > >>
> > >
> > > For what it's worth, I think there is an alternative.  If you all
> > > (userspace people, etc) can come up with a credible way for a user
> > > program to statically declare that it doesn't need vsyscalls, then we
> > > could make SHSTK depend on *that*, and we could avoid this mess.  This
> > > breaks orthogonality, but it's probably a decent outcome.
> > >
> >
> > Would an arch_prctl(DISABLE_VSYSCALL) work?  The kernel then sets a
> > thread flag, and in emulate_vsyscall(), checks the flag.
> >
> > When CET is enabled, ld-linux will do DISABLE_VSYSCALL.
> >
> > How is that?
>
> Backwards, no?  Presumably vsyscall needs to be disabled before or
> concurrently with CET being enabled, not after.
>
> I think the solution of making vsyscall emulation work correctly with
> CET is going to be better and possibly more straightforward.
>

We can do

1. Add ARCH_X86_DISABLE_VSYSCALL to disable the vsyscall page.
2. If CPU supports CET and the program is CET enabled:
    a. Disable the vsyscall page.
    b. Pass control to user.
    c. Enable the vsyscall page when ARCH_X86_CET_DISABLE is called.

So when control is passed from kernel to user, the vsyscall page is
disabled if the program
is CET enabled.

-- 
H.J.

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH v13 8/8] x86/vsyscall/64: Fixup Shadow Stack and Indirect Branch Tracking for vsyscall emulation
  2020-10-01  1:00                       ` H.J. Lu
@ 2020-10-01  1:10                         ` Andy Lutomirski
  2020-10-01  1:21                           ` H.J. Lu
  2020-10-01 16:51                           ` Yu, Yu-cheng
  0 siblings, 2 replies; 29+ messages in thread
From: Andy Lutomirski @ 2020-10-01  1:10 UTC (permalink / raw)
  To: H.J. Lu
  Cc: Andy Lutomirski, Yu, Yu-cheng, X86 ML, H. Peter Anvin,
	Thomas Gleixner, Ingo Molnar, LKML, open list:DOCUMENTATION,
	Linux-MM, linux-arch, Linux API, Arnd Bergmann, Balbir Singh,
	Borislav Petkov, Cyrill Gorcunov, Dave Hansen,
	Eugene Syromiatnikov, Florian Weimer, Jann Horn, Jonathan Corbet,
	Kees Cook, Mike Kravetz, Nadav Amit, Oleg Nesterov, Pavel Machek,
	Peter Zijlstra, Randy Dunlap, Ravi V. Shankar,
	Vedvyas Shanbhogue, Dave Martin, Weijiang Yang, Pengfei Xu

On Wed, Sep 30, 2020 at 6:01 PM H.J. Lu <hjl.tools@gmail.com> wrote:
>
> On Wed, Sep 30, 2020 at 4:44 PM Andy Lutomirski <luto@kernel.org> wrote:
> >
> > On Wed, Sep 30, 2020 at 3:33 PM Yu, Yu-cheng <yu-cheng.yu@intel.com> wrote:
> > >
> > > On 9/29/2020 1:00 PM, Andy Lutomirski wrote:
> > > > On Tue, Sep 29, 2020 at 12:57 PM Andy Lutomirski <luto@kernel.org> wrote:
> > > >>
> > > >> On Tue, Sep 29, 2020 at 11:37 AM Yu, Yu-cheng <yu-cheng.yu@intel.com> wrote:
> > > >>>
> > > >>> On 9/28/2020 10:37 AM, Andy Lutomirski wrote:
> > > >>>> On Mon, Sep 28, 2020 at 9:59 AM Yu-cheng Yu <yu-cheng.yu@intel.com> wrote:
> > > >>>>>
> > > >>>>> On Fri, 2020-09-25 at 09:51 -0700, Andy Lutomirski wrote:
> > > >>>>>>> On Sep 25, 2020, at 9:48 AM, Yu, Yu-cheng <yu-cheng.yu@intel.com> wrote:
> > > >>>>> +
> > > >>>>> +               cet = get_xsave_addr(&fpu->state.xsave, XFEATURE_CET_USER);
> > > >>>>> +               if (!cet) {
> > > >>>>> +                       /*
> > > >>>>> +                        * This is an unlikely case where the task is
> > > >>>>> +                        * CET-enabled, but CET xstate is in INIT.
> > > >>>>> +                        */
> > > >>>>> +                       WARN_ONCE(1, "CET is enabled, but no xstates");
> > > >>>>
> > > >>>> "unlikely" doesn't really cover this.
> > > >>>>
> > > >>>>> +                       fpregs_unlock();
> > > >>>>> +                       goto sigsegv;
> > > >>>>> +               }
> > > >>>>> +
> > > >>>>> +               if (cet->user_ssp && ((cet->user_ssp + 8) < TASK_SIZE_MAX))
> > > >>>>> +                       cet->user_ssp += 8;
> > > >>>>
> > > >>>> This looks buggy.  The condition should be "if SHSTK is on, then add 8
> > > >>>> to user_ssp".  If the result is noncanonical, then some appropriate
> > > >>>> exception should be generated, probably by the FPU restore code -- see
> > > >>>> below.  You should be checking the SHSTK_EN bit, not SSP.
> > > >>>
> > > >>> Updated.  Is this OK?  I will resend the whole series later.
> > > >>>
> > > >>> Thanks,
> > > >>> Yu-cheng
> > > >>>
> > > >>> ======
> > > >>>
> > > >>>   From 09803e66dca38d7784e32687d0693550948199ed Mon Sep 17 00:00:00 2001
> > > >>> From: Yu-cheng Yu <yu-cheng.yu@intel.com>
> > > >>> Date: Thu, 29 Nov 2018 14:15:38 -0800
> > > >>> Subject: [PATCH v13 8/8] x86/vsyscall/64: Fixup Shadow Stack and
> > > >>> Indirect Branch
> > > >>>    Tracking for vsyscall emulation
> > > >>>
> > > >>> Vsyscall entry points are effectively branch targets.  Mark them with
> > > >>> ENDBR64 opcodes.  When emulating the RET instruction, unwind shadow stack
> > > >>> and reset IBT state machine.
> > > >>>
> > > >>> Signed-off-by: Yu-cheng Yu <yu-cheng.yu@intel.com>
> > > >>> ---
> > > >>> v13:
> > > >>> - Check shadow stack address is canonical.
> > > >>> - Change from writing to MSRs to writing to CET xstate.
> > > >>>
> > > >>>    arch/x86/entry/vsyscall/vsyscall_64.c     | 34 +++++++++++++++++++++++
> > > >>>    arch/x86/entry/vsyscall/vsyscall_emu_64.S |  9 ++++++
> > > >>>    arch/x86/entry/vsyscall/vsyscall_trace.h  |  1 +
> > > >>>    3 files changed, 44 insertions(+)
> > > >>>
> > > >>> diff --git a/arch/x86/entry/vsyscall/vsyscall_64.c
> > > >>> b/arch/x86/entry/vsyscall/vsyscall_64.c
> > > >>> index 44c33103a955..30b166091d46 100644
> > > >>> --- a/arch/x86/entry/vsyscall/vsyscall_64.c
> > > >>> +++ b/arch/x86/entry/vsyscall/vsyscall_64.c
> > > >>> @@ -38,6 +38,9 @@
> > > >>>    #include <asm/fixmap.h>
> > > >>>    #include <asm/traps.h>
> > > >>>    #include <asm/paravirt.h>
> > > >>> +#include <asm/fpu/xstate.h>
> > > >>> +#include <asm/fpu/types.h>
> > > >>> +#include <asm/fpu/internal.h>
> > > >>>
> > > >>>    #define CREATE_TRACE_POINTS
> > > >>>    #include "vsyscall_trace.h"
> > > >>> @@ -286,6 +289,44 @@ bool emulate_vsyscall(unsigned long error_code,
> > > >>>          /* Emulate a ret instruction. */
> > > >>>          regs->ip = caller;
> > > >>>          regs->sp += 8;
> > > >>> +
> > > >>> +#ifdef CONFIG_X86_CET
> > > >>> +       if (tsk->thread.cet.shstk_size || tsk->thread.cet.ibt_enabled) {
> > > >>> +               struct cet_user_state *cet;
> > > >>> +               struct fpu *fpu;
> > > >>> +
> > > >>> +               fpu = &tsk->thread.fpu;
> > > >>> +               fpregs_lock();
> > > >>> +
> > > >>> +               if (!test_thread_flag(TIF_NEED_FPU_LOAD)) {
> > > >>> +                       copy_fpregs_to_fpstate(fpu);
> > > >>> +                       set_thread_flag(TIF_NEED_FPU_LOAD);
> > > >>> +               }
> > > >>> +
> > > >>> +               cet = get_xsave_addr(&fpu->state.xsave, XFEATURE_CET_USER);
> > > >>> +               if (!cet) {
> > > >>> +                       /*
> > > >>> +                        * This should not happen.  The task is
> > > >>> +                        * CET-enabled, but CET xstate is in INIT.
> > > >>> +                        */
> > > >>
> > > >> Can the comment explain better, please?  I would say something like:
> > > >>
> > > >> If the kernel thinks this task has CET enabled (because
> > > >> tsk->thread.cet has one of the features enabled), then the
> > > >> corresponding bits must also be set in the CET XSAVES region.  If the
> > > >> CET XSAVES region is in the INIT state, then the kernel's concept of
> > > >> the task's CET state is corrupt.
> > > >>
> > > >>> +                       WARN_ONCE(1, "CET is enabled, but no xstates");
> > > >>> +                       fpregs_unlock();
> > > >>> +                       goto sigsegv;
> > > >>> +               }
> > > >>> +
> > > >>> +               if (cet->user_cet & CET_SHSTK_EN) {
> > > >>> +                       if (cet->user_ssp && (cet->user_ssp + 8 < TASK_SIZE_MAX))
> > > >>> +                               cet->user_ssp += 8;
> > > >>> +               }
> > > >>
> > > >> This makes so sense to me.  Also, the vsyscall emulation code is
> > > >> intended to be as rigid as possible to minimize the chance that it
> > > >> gets used as an exploit gadget.  So we should not silently corrupt
> > > >> anything.  Moreover, this code seems quite dangerous -- you've created
> > > >> a gadget that does RET without actually verifying the SHSTK token.  If
> > > >> SHSTK and some form of strong indirect branch/call CFI is in use, then
> > > >> the existance of a CFI-bypassing return primitive at a fixed address
> > > >> seems quite problematic.
> > > >>
> > > >> So I think you need to write a function that reasonably accurately
> > > >> emulates a usermode RET.
> > > >>
> > > >
> > > > For what it's worth, I think there is an alternative.  If you all
> > > > (userspace people, etc) can come up with a credible way for a user
> > > > program to statically declare that it doesn't need vsyscalls, then we
> > > > could make SHSTK depend on *that*, and we could avoid this mess.  This
> > > > breaks orthogonality, but it's probably a decent outcome.
> > > >
> > >
> > > Would an arch_prctl(DISABLE_VSYSCALL) work?  The kernel then sets a
> > > thread flag, and in emulate_vsyscall(), checks the flag.
> > >
> > > When CET is enabled, ld-linux will do DISABLE_VSYSCALL.
> > >
> > > How is that?
> >
> > Backwards, no?  Presumably vsyscall needs to be disabled before or
> > concurrently with CET being enabled, not after.
> >
> > I think the solution of making vsyscall emulation work correctly with
> > CET is going to be better and possibly more straightforward.
> >
>
> We can do
>
> 1. Add ARCH_X86_DISABLE_VSYSCALL to disable the vsyscall page.
> 2. If CPU supports CET and the program is CET enabled:
>     a. Disable the vsyscall page.
>     b. Pass control to user.
>     c. Enable the vsyscall page when ARCH_X86_CET_DISABLE is called.
>
> So when control is passed from kernel to user, the vsyscall page is
> disabled if the program
> is CET enabled.

Let me say this one more time:

If we have a per-process vsyscall disable control and a per-process
CET control, we are going to keep those settings orthogonal.  I'm
willing to entertain an option in which enabling SHSTK without also
disabling vsyscalls is disallowed, We are *not* going to have any CET
flags magically disable vsyscalls, though, and we are not going to
have a situation where disabling vsyscalls on process startup requires
enabling SHSTK.

Any possible static vsyscall controls (and CET controls, for that
matter) also need to come with some explanation of whether they are
properties set on the ELF loader, the ELF program being loaded, or
both.  And this explanation needs to cover what happens when old
binaries link against new libc versions and vice versa.  A new
CET-enabled binary linked against old libc running on a new kernel
that is expected to work on a non-CET CPU MUST work on a CET CPU, too.

Right now, literally the only thing preventing vsyscall emulation from
coexisting with SHSTK is that the implementation eeds work.

So your proposal is rejected.  Sorry.

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH v13 8/8] x86/vsyscall/64: Fixup Shadow Stack and Indirect Branch Tracking for vsyscall emulation
  2020-10-01  1:10                         ` Andy Lutomirski
@ 2020-10-01  1:21                           ` H.J. Lu
  2020-10-01 16:51                           ` Yu, Yu-cheng
  1 sibling, 0 replies; 29+ messages in thread
From: H.J. Lu @ 2020-10-01  1:21 UTC (permalink / raw)
  To: Andy Lutomirski
  Cc: Yu, Yu-cheng, X86 ML, H. Peter Anvin, Thomas Gleixner,
	Ingo Molnar, LKML, open list:DOCUMENTATION, Linux-MM, linux-arch,
	Linux API, Arnd Bergmann, Balbir Singh, Borislav Petkov,
	Cyrill Gorcunov, Dave Hansen, Eugene Syromiatnikov,
	Florian Weimer, Jann Horn, Jonathan Corbet, Kees Cook,
	Mike Kravetz, Nadav Amit, Oleg Nesterov, Pavel Machek,
	Peter Zijlstra, Randy Dunlap, Ravi V. Shankar,
	Vedvyas Shanbhogue, Dave Martin, Weijiang Yang, Pengfei Xu

On Wed, Sep 30, 2020 at 6:10 PM Andy Lutomirski <luto@kernel.org> wrote:
>
> On Wed, Sep 30, 2020 at 6:01 PM H.J. Lu <hjl.tools@gmail.com> wrote:
> >
> > On Wed, Sep 30, 2020 at 4:44 PM Andy Lutomirski <luto@kernel.org> wrote:
> > >
> > > On Wed, Sep 30, 2020 at 3:33 PM Yu, Yu-cheng <yu-cheng.yu@intel.com> wrote:
> > > >
> > > > On 9/29/2020 1:00 PM, Andy Lutomirski wrote:
> > > > > On Tue, Sep 29, 2020 at 12:57 PM Andy Lutomirski <luto@kernel.org> wrote:
> > > > >>
> > > > >> On Tue, Sep 29, 2020 at 11:37 AM Yu, Yu-cheng <yu-cheng.yu@intel.com> wrote:
> > > > >>>
> > > > >>> On 9/28/2020 10:37 AM, Andy Lutomirski wrote:
> > > > >>>> On Mon, Sep 28, 2020 at 9:59 AM Yu-cheng Yu <yu-cheng.yu@intel.com> wrote:
> > > > >>>>>
> > > > >>>>> On Fri, 2020-09-25 at 09:51 -0700, Andy Lutomirski wrote:
> > > > >>>>>>> On Sep 25, 2020, at 9:48 AM, Yu, Yu-cheng <yu-cheng.yu@intel.com> wrote:
> > > > >>>>> +
> > > > >>>>> +               cet = get_xsave_addr(&fpu->state.xsave, XFEATURE_CET_USER);
> > > > >>>>> +               if (!cet) {
> > > > >>>>> +                       /*
> > > > >>>>> +                        * This is an unlikely case where the task is
> > > > >>>>> +                        * CET-enabled, but CET xstate is in INIT.
> > > > >>>>> +                        */
> > > > >>>>> +                       WARN_ONCE(1, "CET is enabled, but no xstates");
> > > > >>>>
> > > > >>>> "unlikely" doesn't really cover this.
> > > > >>>>
> > > > >>>>> +                       fpregs_unlock();
> > > > >>>>> +                       goto sigsegv;
> > > > >>>>> +               }
> > > > >>>>> +
> > > > >>>>> +               if (cet->user_ssp && ((cet->user_ssp + 8) < TASK_SIZE_MAX))
> > > > >>>>> +                       cet->user_ssp += 8;
> > > > >>>>
> > > > >>>> This looks buggy.  The condition should be "if SHSTK is on, then add 8
> > > > >>>> to user_ssp".  If the result is noncanonical, then some appropriate
> > > > >>>> exception should be generated, probably by the FPU restore code -- see
> > > > >>>> below.  You should be checking the SHSTK_EN bit, not SSP.
> > > > >>>
> > > > >>> Updated.  Is this OK?  I will resend the whole series later.
> > > > >>>
> > > > >>> Thanks,
> > > > >>> Yu-cheng
> > > > >>>
> > > > >>> ======
> > > > >>>
> > > > >>>   From 09803e66dca38d7784e32687d0693550948199ed Mon Sep 17 00:00:00 2001
> > > > >>> From: Yu-cheng Yu <yu-cheng.yu@intel.com>
> > > > >>> Date: Thu, 29 Nov 2018 14:15:38 -0800
> > > > >>> Subject: [PATCH v13 8/8] x86/vsyscall/64: Fixup Shadow Stack and
> > > > >>> Indirect Branch
> > > > >>>    Tracking for vsyscall emulation
> > > > >>>
> > > > >>> Vsyscall entry points are effectively branch targets.  Mark them with
> > > > >>> ENDBR64 opcodes.  When emulating the RET instruction, unwind shadow stack
> > > > >>> and reset IBT state machine.
> > > > >>>
> > > > >>> Signed-off-by: Yu-cheng Yu <yu-cheng.yu@intel.com>
> > > > >>> ---
> > > > >>> v13:
> > > > >>> - Check shadow stack address is canonical.
> > > > >>> - Change from writing to MSRs to writing to CET xstate.
> > > > >>>
> > > > >>>    arch/x86/entry/vsyscall/vsyscall_64.c     | 34 +++++++++++++++++++++++
> > > > >>>    arch/x86/entry/vsyscall/vsyscall_emu_64.S |  9 ++++++
> > > > >>>    arch/x86/entry/vsyscall/vsyscall_trace.h  |  1 +
> > > > >>>    3 files changed, 44 insertions(+)
> > > > >>>
> > > > >>> diff --git a/arch/x86/entry/vsyscall/vsyscall_64.c
> > > > >>> b/arch/x86/entry/vsyscall/vsyscall_64.c
> > > > >>> index 44c33103a955..30b166091d46 100644
> > > > >>> --- a/arch/x86/entry/vsyscall/vsyscall_64.c
> > > > >>> +++ b/arch/x86/entry/vsyscall/vsyscall_64.c
> > > > >>> @@ -38,6 +38,9 @@
> > > > >>>    #include <asm/fixmap.h>
> > > > >>>    #include <asm/traps.h>
> > > > >>>    #include <asm/paravirt.h>
> > > > >>> +#include <asm/fpu/xstate.h>
> > > > >>> +#include <asm/fpu/types.h>
> > > > >>> +#include <asm/fpu/internal.h>
> > > > >>>
> > > > >>>    #define CREATE_TRACE_POINTS
> > > > >>>    #include "vsyscall_trace.h"
> > > > >>> @@ -286,6 +289,44 @@ bool emulate_vsyscall(unsigned long error_code,
> > > > >>>          /* Emulate a ret instruction. */
> > > > >>>          regs->ip = caller;
> > > > >>>          regs->sp += 8;
> > > > >>> +
> > > > >>> +#ifdef CONFIG_X86_CET
> > > > >>> +       if (tsk->thread.cet.shstk_size || tsk->thread.cet.ibt_enabled) {
> > > > >>> +               struct cet_user_state *cet;
> > > > >>> +               struct fpu *fpu;
> > > > >>> +
> > > > >>> +               fpu = &tsk->thread.fpu;
> > > > >>> +               fpregs_lock();
> > > > >>> +
> > > > >>> +               if (!test_thread_flag(TIF_NEED_FPU_LOAD)) {
> > > > >>> +                       copy_fpregs_to_fpstate(fpu);
> > > > >>> +                       set_thread_flag(TIF_NEED_FPU_LOAD);
> > > > >>> +               }
> > > > >>> +
> > > > >>> +               cet = get_xsave_addr(&fpu->state.xsave, XFEATURE_CET_USER);
> > > > >>> +               if (!cet) {
> > > > >>> +                       /*
> > > > >>> +                        * This should not happen.  The task is
> > > > >>> +                        * CET-enabled, but CET xstate is in INIT.
> > > > >>> +                        */
> > > > >>
> > > > >> Can the comment explain better, please?  I would say something like:
> > > > >>
> > > > >> If the kernel thinks this task has CET enabled (because
> > > > >> tsk->thread.cet has one of the features enabled), then the
> > > > >> corresponding bits must also be set in the CET XSAVES region.  If the
> > > > >> CET XSAVES region is in the INIT state, then the kernel's concept of
> > > > >> the task's CET state is corrupt.
> > > > >>
> > > > >>> +                       WARN_ONCE(1, "CET is enabled, but no xstates");
> > > > >>> +                       fpregs_unlock();
> > > > >>> +                       goto sigsegv;
> > > > >>> +               }
> > > > >>> +
> > > > >>> +               if (cet->user_cet & CET_SHSTK_EN) {
> > > > >>> +                       if (cet->user_ssp && (cet->user_ssp + 8 < TASK_SIZE_MAX))
> > > > >>> +                               cet->user_ssp += 8;
> > > > >>> +               }
> > > > >>
> > > > >> This makes so sense to me.  Also, the vsyscall emulation code is
> > > > >> intended to be as rigid as possible to minimize the chance that it
> > > > >> gets used as an exploit gadget.  So we should not silently corrupt
> > > > >> anything.  Moreover, this code seems quite dangerous -- you've created
> > > > >> a gadget that does RET without actually verifying the SHSTK token.  If
> > > > >> SHSTK and some form of strong indirect branch/call CFI is in use, then
> > > > >> the existance of a CFI-bypassing return primitive at a fixed address
> > > > >> seems quite problematic.
> > > > >>
> > > > >> So I think you need to write a function that reasonably accurately
> > > > >> emulates a usermode RET.
> > > > >>
> > > > >
> > > > > For what it's worth, I think there is an alternative.  If you all
> > > > > (userspace people, etc) can come up with a credible way for a user
> > > > > program to statically declare that it doesn't need vsyscalls, then we
> > > > > could make SHSTK depend on *that*, and we could avoid this mess.  This
> > > > > breaks orthogonality, but it's probably a decent outcome.
> > > > >
> > > >
> > > > Would an arch_prctl(DISABLE_VSYSCALL) work?  The kernel then sets a
> > > > thread flag, and in emulate_vsyscall(), checks the flag.
> > > >
> > > > When CET is enabled, ld-linux will do DISABLE_VSYSCALL.
> > > >
> > > > How is that?
> > >
> > > Backwards, no?  Presumably vsyscall needs to be disabled before or
> > > concurrently with CET being enabled, not after.
> > >
> > > I think the solution of making vsyscall emulation work correctly with
> > > CET is going to be better and possibly more straightforward.
> > >
> >
> > We can do
> >
> > 1. Add ARCH_X86_DISABLE_VSYSCALL to disable the vsyscall page.
> > 2. If CPU supports CET and the program is CET enabled:
> >     a. Disable the vsyscall page.
> >     b. Pass control to user.
> >     c. Enable the vsyscall page when ARCH_X86_CET_DISABLE is called.
> >
> > So when control is passed from kernel to user, the vsyscall page is
> > disabled if the program
> > is CET enabled.
>
> Let me say this one more time:
>
> If we have a per-process vsyscall disable control and a per-process
> CET control, we are going to keep those settings orthogonal.  I'm
> willing to entertain an option in which enabling SHSTK without also
> disabling vsyscalls is disallowed, We are *not* going to have any CET
> flags magically disable vsyscalls, though, and we are not going to
> have a situation where disabling vsyscalls on process startup requires
> enabling SHSTK.
>
> Any possible static vsyscall controls (and CET controls, for that
> matter) also need to come with some explanation of whether they are
> properties set on the ELF loader, the ELF program being loaded, or

Kernel enables CET on CET processors only if ld.so is CET enabled.
Kernel passes control to CET enabled ld.so with CET enabled and
ld.so will check if CET should be disabled because of legacy program
or dependency libraries.

> both.  And this explanation needs to cover what happens when old
> binaries link against new libc versions and vice versa.  A new
> CET-enabled binary linked against old libc running on a new kernel
> that is expected to work on a non-CET CPU MUST work on a CET CPU, too.

Since kernel doesn't enable CET on ld.so from the old libc, the new
CET-enabled binary will start with CET disabled regardless whatever the
processor the binary runs on.

> Right now, literally the only thing preventing vsyscall emulation from
> coexisting with SHSTK is that the implementation eeds work.
>
> So your proposal is rejected.  Sorry.



-- 
H.J.

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH v13 8/8] x86/vsyscall/64: Fixup Shadow Stack and Indirect Branch Tracking for vsyscall emulation
  2020-10-01  1:10                         ` Andy Lutomirski
  2020-10-01  1:21                           ` H.J. Lu
@ 2020-10-01 16:51                           ` Yu, Yu-cheng
  2020-10-01 17:26                             ` Andy Lutomirski
  1 sibling, 1 reply; 29+ messages in thread
From: Yu, Yu-cheng @ 2020-10-01 16:51 UTC (permalink / raw)
  To: Andy Lutomirski, H.J. Lu
  Cc: X86 ML, H. Peter Anvin, Thomas Gleixner, Ingo Molnar, LKML,
	open list:DOCUMENTATION, Linux-MM, linux-arch, Linux API,
	Arnd Bergmann, Balbir Singh, Borislav Petkov, Cyrill Gorcunov,
	Dave Hansen, Eugene Syromiatnikov, Florian Weimer, Jann Horn,
	Jonathan Corbet, Kees Cook, Mike Kravetz, Nadav Amit,
	Oleg Nesterov, Pavel Machek, Peter Zijlstra, Randy Dunlap,
	Ravi V. Shankar, Vedvyas Shanbhogue, Dave Martin, Weijiang Yang,
	Pengfei Xu

On 9/30/2020 6:10 PM, Andy Lutomirski wrote:
> On Wed, Sep 30, 2020 at 6:01 PM H.J. Lu <hjl.tools@gmail.com> wrote:
>>
>> On Wed, Sep 30, 2020 at 4:44 PM Andy Lutomirski <luto@kernel.org> wrote:

[...]

>>>>>>>    From 09803e66dca38d7784e32687d0693550948199ed Mon Sep 17 00:00:00 2001
>>>>>>> From: Yu-cheng Yu <yu-cheng.yu@intel.com>
>>>>>>> Date: Thu, 29 Nov 2018 14:15:38 -0800
>>>>>>> Subject: [PATCH v13 8/8] x86/vsyscall/64: Fixup Shadow Stack and
>>>>>>> Indirect Branch
>>>>>>>     Tracking for vsyscall emulation
>>>>>>>
>>>>>>> Vsyscall entry points are effectively branch targets.  Mark them with
>>>>>>> ENDBR64 opcodes.  When emulating the RET instruction, unwind shadow stack
>>>>>>> and reset IBT state machine.
>>>>>>>
>>>>>>> Signed-off-by: Yu-cheng Yu <yu-cheng.yu@intel.com>
>>>>>>> ---
>>>>>>> v13:
>>>>>>> - Check shadow stack address is canonical.
>>>>>>> - Change from writing to MSRs to writing to CET xstate.
>>>>>>>
>>>>>>>     arch/x86/entry/vsyscall/vsyscall_64.c     | 34 +++++++++++++++++++++++
>>>>>>>     arch/x86/entry/vsyscall/vsyscall_emu_64.S |  9 ++++++
>>>>>>>     arch/x86/entry/vsyscall/vsyscall_trace.h  |  1 +
>>>>>>>     3 files changed, 44 insertions(+)
>>>>>>>
>>>>>>> diff --git a/arch/x86/entry/vsyscall/vsyscall_64.c
>>>>>>> b/arch/x86/entry/vsyscall/vsyscall_64.c
>>>>>>> index 44c33103a955..30b166091d46 100644
>>>>>>> --- a/arch/x86/entry/vsyscall/vsyscall_64.c
>>>>>>> +++ b/arch/x86/entry/vsyscall/vsyscall_64.c
>>>>>>> @@ -38,6 +38,9 @@
>>>>>>>     #include <asm/fixmap.h>
>>>>>>>     #include <asm/traps.h>
>>>>>>>     #include <asm/paravirt.h>
>>>>>>> +#include <asm/fpu/xstate.h>
>>>>>>> +#include <asm/fpu/types.h>
>>>>>>> +#include <asm/fpu/internal.h>
>>>>>>>
>>>>>>>     #define CREATE_TRACE_POINTS
>>>>>>>     #include "vsyscall_trace.h"
>>>>>>> @@ -286,6 +289,44 @@ bool emulate_vsyscall(unsigned long error_code,
>>>>>>>           /* Emulate a ret instruction. */
>>>>>>>           regs->ip = caller;
>>>>>>>           regs->sp += 8;
>>>>>>> +
>>>>>>> +#ifdef CONFIG_X86_CET
>>>>>>> +       if (tsk->thread.cet.shstk_size || tsk->thread.cet.ibt_enabled) {
>>>>>>> +               struct cet_user_state *cet;
>>>>>>> +               struct fpu *fpu;
>>>>>>> +
>>>>>>> +               fpu = &tsk->thread.fpu;
>>>>>>> +               fpregs_lock();
>>>>>>> +
>>>>>>> +               if (!test_thread_flag(TIF_NEED_FPU_LOAD)) {
>>>>>>> +                       copy_fpregs_to_fpstate(fpu);
>>>>>>> +                       set_thread_flag(TIF_NEED_FPU_LOAD);
>>>>>>> +               }
>>>>>>> +
>>>>>>> +               cet = get_xsave_addr(&fpu->state.xsave, XFEATURE_CET_USER);
>>>>>>> +               if (!cet) {
>>>>>>> +                       /*
>>>>>>> +                        * This should not happen.  The task is
>>>>>>> +                        * CET-enabled, but CET xstate is in INIT.
>>>>>>> +                        */
>>>>>>
[...]
>>>>>>
>>>>>
>>>>> For what it's worth, I think there is an alternative.  If you all
>>>>> (userspace people, etc) can come up with a credible way for a user
>>>>> program to statically declare that it doesn't need vsyscalls, then we
>>>>> could make SHSTK depend on *that*, and we could avoid this mess.  This
>>>>> breaks orthogonality, but it's probably a decent outcome.
>>>>>
>>>>
>>>> Would an arch_prctl(DISABLE_VSYSCALL) work?  The kernel then sets a
>>>> thread flag, and in emulate_vsyscall(), checks the flag.
>>>>
>>>> When CET is enabled, ld-linux will do DISABLE_VSYSCALL.
>>>>
>>>> How is that?
>>>
>>> Backwards, no?  Presumably vsyscall needs to be disabled before or
>>> concurrently with CET being enabled, not after.
>>>
>>> I think the solution of making vsyscall emulation work correctly with
>>> CET is going to be better and possibly more straightforward.
>>>
>>
>> We can do
>>
>> 1. Add ARCH_X86_DISABLE_VSYSCALL to disable the vsyscall page.
>> 2. If CPU supports CET and the program is CET enabled:
>>      a. Disable the vsyscall page.
>>      b. Pass control to user.
>>      c. Enable the vsyscall page when ARCH_X86_CET_DISABLE is called.
>>
>> So when control is passed from kernel to user, the vsyscall page is
>> disabled if the program
>> is CET enabled.
> 
> Let me say this one more time:
> 
> If we have a per-process vsyscall disable control and a per-process
> CET control, we are going to keep those settings orthogonal.  I'm
> willing to entertain an option in which enabling SHSTK without also
> disabling vsyscalls is disallowed, We are *not* going to have any CET
> flags magically disable vsyscalls, though, and we are not going to
> have a situation where disabling vsyscalls on process startup requires
> enabling SHSTK.
> 
> Any possible static vsyscall controls (and CET controls, for that
> matter) also need to come with some explanation of whether they are
> properties set on the ELF loader, the ELF program being loaded, or
> both.  And this explanation needs to cover what happens when old
> binaries link against new libc versions and vice versa.  A new
> CET-enabled binary linked against old libc running on a new kernel
> that is expected to work on a non-CET CPU MUST work on a CET CPU, too.
> 
> Right now, literally the only thing preventing vsyscall emulation from
> coexisting with SHSTK is that the implementation eeds work.
> 
> So your proposal is rejected.  Sorry.
>
I think, even with shadow stack/ibt enabled, we can still allow XONLY 
without too much mess.

What about this?

Thanks,
Yu-cheng

======

diff --git a/arch/x86/entry/vsyscall/vsyscall_64.c 
b/arch/x86/entry/vsyscall/vsyscall_64.c
index 8b0b32ac7791..d39da0a15521 100644
--- a/arch/x86/entry/vsyscall/vsyscall_64.c
+++ b/arch/x86/entry/vsyscall/vsyscall_64.c
@@ -48,16 +48,16 @@
  static enum { EMULATE, XONLY, NONE } vsyscall_mode __ro_after_init =
  #ifdef CONFIG_LEGACY_VSYSCALL_NONE
         NONE;
-#elif defined(CONFIG_LEGACY_VSYSCALL_XONLY)
+#elif defined(CONFIG_LEGACY_VSYSCALL_XONLY) || defined(CONFIG_X86_CET)
         XONLY;
-#else
+#else
         EMULATE;
  #endif

  static int __init vsyscall_setup(char *str)
  {
         if (str) {
-               if (!strcmp("emulate", str))
+               if (!strcmp("emulate", str) && !IS_ENABLED(CONFIG_X86_CET))
                         vsyscall_mode = EMULATE;
                 else if (!strcmp("xonly", str))
                         vsyscall_mode = XONLY;

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH v13 8/8] x86/vsyscall/64: Fixup Shadow Stack and Indirect Branch Tracking for vsyscall emulation
  2020-10-01 16:51                           ` Yu, Yu-cheng
@ 2020-10-01 17:26                             ` Andy Lutomirski
  2020-10-06 19:09                               ` Yu, Yu-cheng
  0 siblings, 1 reply; 29+ messages in thread
From: Andy Lutomirski @ 2020-10-01 17:26 UTC (permalink / raw)
  To: Yu, Yu-cheng
  Cc: Andy Lutomirski, H.J. Lu, X86 ML, H. Peter Anvin,
	Thomas Gleixner, Ingo Molnar, LKML, open list:DOCUMENTATION,
	Linux-MM, linux-arch, Linux API, Arnd Bergmann, Balbir Singh,
	Borislav Petkov, Cyrill Gorcunov, Dave Hansen,
	Eugene Syromiatnikov, Florian Weimer, Jann Horn, Jonathan Corbet,
	Kees Cook, Mike Kravetz, Nadav Amit, Oleg Nesterov, Pavel Machek,
	Peter Zijlstra, Randy Dunlap, Ravi V. Shankar,
	Vedvyas Shanbhogue, Dave Martin, Weijiang Yang, Pengfei Xu

On Thu, Oct 1, 2020 at 9:51 AM Yu, Yu-cheng <yu-cheng.yu@intel.com> wrote:
>
> On 9/30/2020 6:10 PM, Andy Lutomirski wrote:
> > On Wed, Sep 30, 2020 at 6:01 PM H.J. Lu <hjl.tools@gmail.com> wrote:
> >>
> >> On Wed, Sep 30, 2020 at 4:44 PM Andy Lutomirski <luto@kernel.org> wrote:
>
> [...]
>
> >>>>>>>    From 09803e66dca38d7784e32687d0693550948199ed Mon Sep 17 00:00:00 2001
> >>>>>>> From: Yu-cheng Yu <yu-cheng.yu@intel.com>
> >>>>>>> Date: Thu, 29 Nov 2018 14:15:38 -0800
> >>>>>>> Subject: [PATCH v13 8/8] x86/vsyscall/64: Fixup Shadow Stack and
> >>>>>>> Indirect Branch
> >>>>>>>     Tracking for vsyscall emulation
> >>>>>>>
> >>>>>>> Vsyscall entry points are effectively branch targets.  Mark them with
> >>>>>>> ENDBR64 opcodes.  When emulating the RET instruction, unwind shadow stack
> >>>>>>> and reset IBT state machine.
> >>>>>>>
> >>>>>>> Signed-off-by: Yu-cheng Yu <yu-cheng.yu@intel.com>
> >>>>>>> ---
> >>>>>>> v13:
> >>>>>>> - Check shadow stack address is canonical.
> >>>>>>> - Change from writing to MSRs to writing to CET xstate.
> >>>>>>>
> >>>>>>>     arch/x86/entry/vsyscall/vsyscall_64.c     | 34 +++++++++++++++++++++++
> >>>>>>>     arch/x86/entry/vsyscall/vsyscall_emu_64.S |  9 ++++++
> >>>>>>>     arch/x86/entry/vsyscall/vsyscall_trace.h  |  1 +
> >>>>>>>     3 files changed, 44 insertions(+)
> >>>>>>>
> >>>>>>> diff --git a/arch/x86/entry/vsyscall/vsyscall_64.c
> >>>>>>> b/arch/x86/entry/vsyscall/vsyscall_64.c
> >>>>>>> index 44c33103a955..30b166091d46 100644
> >>>>>>> --- a/arch/x86/entry/vsyscall/vsyscall_64.c
> >>>>>>> +++ b/arch/x86/entry/vsyscall/vsyscall_64.c
> >>>>>>> @@ -38,6 +38,9 @@
> >>>>>>>     #include <asm/fixmap.h>
> >>>>>>>     #include <asm/traps.h>
> >>>>>>>     #include <asm/paravirt.h>
> >>>>>>> +#include <asm/fpu/xstate.h>
> >>>>>>> +#include <asm/fpu/types.h>
> >>>>>>> +#include <asm/fpu/internal.h>
> >>>>>>>
> >>>>>>>     #define CREATE_TRACE_POINTS
> >>>>>>>     #include "vsyscall_trace.h"
> >>>>>>> @@ -286,6 +289,44 @@ bool emulate_vsyscall(unsigned long error_code,
> >>>>>>>           /* Emulate a ret instruction. */
> >>>>>>>           regs->ip = caller;
> >>>>>>>           regs->sp += 8;
> >>>>>>> +
> >>>>>>> +#ifdef CONFIG_X86_CET
> >>>>>>> +       if (tsk->thread.cet.shstk_size || tsk->thread.cet.ibt_enabled) {
> >>>>>>> +               struct cet_user_state *cet;
> >>>>>>> +               struct fpu *fpu;
> >>>>>>> +
> >>>>>>> +               fpu = &tsk->thread.fpu;
> >>>>>>> +               fpregs_lock();
> >>>>>>> +
> >>>>>>> +               if (!test_thread_flag(TIF_NEED_FPU_LOAD)) {
> >>>>>>> +                       copy_fpregs_to_fpstate(fpu);
> >>>>>>> +                       set_thread_flag(TIF_NEED_FPU_LOAD);
> >>>>>>> +               }
> >>>>>>> +
> >>>>>>> +               cet = get_xsave_addr(&fpu->state.xsave, XFEATURE_CET_USER);
> >>>>>>> +               if (!cet) {
> >>>>>>> +                       /*
> >>>>>>> +                        * This should not happen.  The task is
> >>>>>>> +                        * CET-enabled, but CET xstate is in INIT.
> >>>>>>> +                        */
> >>>>>>
> [...]
> >>>>>>
> >>>>>
> >>>>> For what it's worth, I think there is an alternative.  If you all
> >>>>> (userspace people, etc) can come up with a credible way for a user
> >>>>> program to statically declare that it doesn't need vsyscalls, then we
> >>>>> could make SHSTK depend on *that*, and we could avoid this mess.  This
> >>>>> breaks orthogonality, but it's probably a decent outcome.
> >>>>>
> >>>>
> >>>> Would an arch_prctl(DISABLE_VSYSCALL) work?  The kernel then sets a
> >>>> thread flag, and in emulate_vsyscall(), checks the flag.
> >>>>
> >>>> When CET is enabled, ld-linux will do DISABLE_VSYSCALL.
> >>>>
> >>>> How is that?
> >>>
> >>> Backwards, no?  Presumably vsyscall needs to be disabled before or
> >>> concurrently with CET being enabled, not after.
> >>>
> >>> I think the solution of making vsyscall emulation work correctly with
> >>> CET is going to be better and possibly more straightforward.
> >>>
> >>
> >> We can do
> >>
> >> 1. Add ARCH_X86_DISABLE_VSYSCALL to disable the vsyscall page.
> >> 2. If CPU supports CET and the program is CET enabled:
> >>      a. Disable the vsyscall page.
> >>      b. Pass control to user.
> >>      c. Enable the vsyscall page when ARCH_X86_CET_DISABLE is called.
> >>
> >> So when control is passed from kernel to user, the vsyscall page is
> >> disabled if the program
> >> is CET enabled.
> >
> > Let me say this one more time:
> >
> > If we have a per-process vsyscall disable control and a per-process
> > CET control, we are going to keep those settings orthogonal.  I'm
> > willing to entertain an option in which enabling SHSTK without also
> > disabling vsyscalls is disallowed, We are *not* going to have any CET
> > flags magically disable vsyscalls, though, and we are not going to
> > have a situation where disabling vsyscalls on process startup requires
> > enabling SHSTK.
> >
> > Any possible static vsyscall controls (and CET controls, for that
> > matter) also need to come with some explanation of whether they are
> > properties set on the ELF loader, the ELF program being loaded, or
> > both.  And this explanation needs to cover what happens when old
> > binaries link against new libc versions and vice versa.  A new
> > CET-enabled binary linked against old libc running on a new kernel
> > that is expected to work on a non-CET CPU MUST work on a CET CPU, too.
> >
> > Right now, literally the only thing preventing vsyscall emulation from
> > coexisting with SHSTK is that the implementation eeds work.
> >
> > So your proposal is rejected.  Sorry.
> >
> I think, even with shadow stack/ibt enabled, we can still allow XONLY
> without too much mess.
>
> What about this?
>
> Thanks,
> Yu-cheng
>
> ======
>
> diff --git a/arch/x86/entry/vsyscall/vsyscall_64.c
> b/arch/x86/entry/vsyscall/vsyscall_64.c
> index 8b0b32ac7791..d39da0a15521 100644
> --- a/arch/x86/entry/vsyscall/vsyscall_64.c
> +++ b/arch/x86/entry/vsyscall/vsyscall_64.c
> @@ -48,16 +48,16 @@
>   static enum { EMULATE, XONLY, NONE } vsyscall_mode __ro_after_init =
>   #ifdef CONFIG_LEGACY_VSYSCALL_NONE
>          NONE;
> -#elif defined(CONFIG_LEGACY_VSYSCALL_XONLY)
> +#elif defined(CONFIG_LEGACY_VSYSCALL_XONLY) || defined(CONFIG_X86_CET)
>          XONLY;
> -#else
> +#else
>          EMULATE;
>   #endif

I don't get it.

First, you can't do any of this based on config -- it must be runtime.

Second, and more importantly, I don't see how XONLY helps at all.  The
(non-executable) text that's exposed to user code in EMULATE mode is
trivial to get right with CET -- your code already handles it.  It's
the emulation code (that runs identically in EMULATE and XONLY mode)
that's tricky.

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH v13 8/8] x86/vsyscall/64: Fixup Shadow Stack and Indirect Branch Tracking for vsyscall emulation
  2020-10-01 17:26                             ` Andy Lutomirski
@ 2020-10-06 19:09                               ` Yu, Yu-cheng
  2020-10-09 17:42                                 ` Andy Lutomirski
  0 siblings, 1 reply; 29+ messages in thread
From: Yu, Yu-cheng @ 2020-10-06 19:09 UTC (permalink / raw)
  To: Andy Lutomirski
  Cc: H.J. Lu, X86 ML, H. Peter Anvin, Thomas Gleixner, Ingo Molnar,
	LKML, open list:DOCUMENTATION, Linux-MM, linux-arch, Linux API,
	Arnd Bergmann, Balbir Singh, Borislav Petkov, Cyrill Gorcunov,
	Dave Hansen, Eugene Syromiatnikov, Florian Weimer, Jann Horn,
	Jonathan Corbet, Kees Cook, Mike Kravetz, Nadav Amit,
	Oleg Nesterov, Pavel Machek, Peter Zijlstra, Randy Dunlap,
	Ravi V. Shankar, Vedvyas Shanbhogue, Dave Martin, Weijiang Yang,
	Pengfei Xu

On 10/1/2020 10:26 AM, Andy Lutomirski wrote:
> On Thu, Oct 1, 2020 at 9:51 AM Yu, Yu-cheng <yu-cheng.yu@intel.com> wrote:
>>
>> On 9/30/2020 6:10 PM, Andy Lutomirski wrote:
>>> On Wed, Sep 30, 2020 at 6:01 PM H.J. Lu <hjl.tools@gmail.com> wrote:
>>>>
>>>> On Wed, Sep 30, 2020 at 4:44 PM Andy Lutomirski <luto@kernel.org> wrote:
>>
>> [...]
>>
>>>>>>>>>     From 09803e66dca38d7784e32687d0693550948199ed Mon Sep 17 00:00:00 2001
>>>>>>>>> From: Yu-cheng Yu <yu-cheng.yu@intel.com>
>>>>>>>>> Date: Thu, 29 Nov 2018 14:15:38 -0800
>>>>>>>>> Subject: [PATCH v13 8/8] x86/vsyscall/64: Fixup Shadow Stack and
>>>>>>>>> Indirect Branch
>>>>>>>>>      Tracking for vsyscall emulation
>>>>>>>>>
>>>>>>>>> Vsyscall entry points are effectively branch targets.  Mark them with
>>>>>>>>> ENDBR64 opcodes.  When emulating the RET instruction, unwind shadow stack
>>>>>>>>> and reset IBT state machine.
>>>>>>>>>
>>>>>>>>> Signed-off-by: Yu-cheng Yu <yu-cheng.yu@intel.com>

[...]

>>>>>>>>
>>>>>>>
>>>>>>> For what it's worth, I think there is an alternative.  If you all
>>>>>>> (userspace people, etc) can come up with a credible way for a user
>>>>>>> program to statically declare that it doesn't need vsyscalls, then we
>>>>>>> could make SHSTK depend on *that*, and we could avoid this mess.  This
>>>>>>> breaks orthogonality, but it's probably a decent outcome.
>>>>>>>
>>>>>>
>>>>>> Would an arch_prctl(DISABLE_VSYSCALL) work?  The kernel then sets a
>>>>>> thread flag, and in emulate_vsyscall(), checks the flag.
>>>>>>
>>>>>> When CET is enabled, ld-linux will do DISABLE_VSYSCALL.
>>>>>>
>>>>>> How is that?
>>>>>
>>>>> Backwards, no?  Presumably vsyscall needs to be disabled before or
>>>>> concurrently with CET being enabled, not after.
>>>>>
>>>>> I think the solution of making vsyscall emulation work correctly with
>>>>> CET is going to be better and possibly more straightforward.
>>>>>
>>>>
>>>> We can do
>>>>
>>>> 1. Add ARCH_X86_DISABLE_VSYSCALL to disable the vsyscall page.
>>>> 2. If CPU supports CET and the program is CET enabled:
>>>>       a. Disable the vsyscall page.
>>>>       b. Pass control to user.
>>>>       c. Enable the vsyscall page when ARCH_X86_CET_DISABLE is called.
>>>>
>>>> So when control is passed from kernel to user, the vsyscall page is
>>>> disabled if the program
>>>> is CET enabled.
>>>
>>> Let me say this one more time:
>>>
>>> If we have a per-process vsyscall disable control and a per-process
>>> CET control, we are going to keep those settings orthogonal.  I'm
>>> willing to entertain an option in which enabling SHSTK without also
>>> disabling vsyscalls is disallowed, We are *not* going to have any CET
>>> flags magically disable vsyscalls, though, and we are not going to
>>> have a situation where disabling vsyscalls on process startup requires
>>> enabling SHSTK.
>>>
>>> Any possible static vsyscall controls (and CET controls, for that
>>> matter) also need to come with some explanation of whether they are
>>> properties set on the ELF loader, the ELF program being loaded, or
>>> both.  And this explanation needs to cover what happens when old
>>> binaries link against new libc versions and vice versa.  A new
>>> CET-enabled binary linked against old libc running on a new kernel
>>> that is expected to work on a non-CET CPU MUST work on a CET CPU, too.
>>>
>>> Right now, literally the only thing preventing vsyscall emulation from
>>> coexisting with SHSTK is that the implementation eeds work.
>>>
>>> So your proposal is rejected.  Sorry.
>>>
>> I think, even with shadow stack/ibt enabled, we can still allow XONLY
>> without too much mess.
>>
>> What about this?
>>
>> Thanks,
>> Yu-cheng
>>
>> ======
>>
>> diff --git a/arch/x86/entry/vsyscall/vsyscall_64.c
>> b/arch/x86/entry/vsyscall/vsyscall_64.c
>> index 8b0b32ac7791..d39da0a15521 100644
>> --- a/arch/x86/entry/vsyscall/vsyscall_64.c
>> +++ b/arch/x86/entry/vsyscall/vsyscall_64.c
>> @@ -48,16 +48,16 @@
>>    static enum { EMULATE, XONLY, NONE } vsyscall_mode __ro_after_init =
>>    #ifdef CONFIG_LEGACY_VSYSCALL_NONE
>>           NONE;
>> -#elif defined(CONFIG_LEGACY_VSYSCALL_XONLY)
>> +#elif defined(CONFIG_LEGACY_VSYSCALL_XONLY) || defined(CONFIG_X86_CET)
>>           XONLY;
>> -#else
>> +#else
>>           EMULATE;
>>    #endif
> 
> I don't get it.
> 
> First, you can't do any of this based on config -- it must be runtime.
> 
> Second, and more importantly, I don't see how XONLY helps at all.  The
> (non-executable) text that's exposed to user code in EMULATE mode is
> trivial to get right with CET -- your code already handles it.  It's
> the emulation code (that runs identically in EMULATE and XONLY mode)
> that's tricky.
> 

Hi,

There has been some ambiguity in my previous proposals.  To make things 
clear, I created a patch for arch_prctl(VSYSCALL_CTL), which controls 
the TIF_VSYSCALL_DISABLE flag.  It is entirely orthogonal to shadow 
stack or IBT.  On top of the patch, we can do SET_PERSONALITY2() to 
disable vsyscall, e.g.

======
diff --git a/arch/x86/include/asm/elf.h b/arch/x86/include/asm/elf.h
index 0e1be2a13359..c730ff00bc62 100644
--- a/arch/x86/include/asm/elf.h
+++ b/arch/x86/include/asm/elf.h
@@ -394,6 +394,19 @@ struct arch_elf_state {
  	.gnu_property = 0,	\
  }

+#define SET_PERSONALITY2(ex, state)				\
+do {								\
+	unsigned int has_cet;					\
+								\
+	has_cet = GNU_PROPERTY_X86_FEATURE_1_SHSTK |		\
+		  GNU_PROPERTY_X86_FEATURE_1_IBT;		\
+								\
+	if ((state)->gnu_property & has_cet)			\
+		set_thread_flag(TIF_VSYSCALL_DISABLE);		\
+								\
+	SET_PERSONALITY(ex);					\
+} while (0)
+
  #define arch_elf_pt_proc(ehdr, phdr, elf, interp, state) (0)
  #define arch_check_elf(ehdr, interp, interp_ehdr, state) (0)
  #endif

======
The is the patch.

 From a124b81086122495d6837f26df99db619cd5402a Mon Sep 17 00:00:00 2001
From: Yu-cheng Yu <yu-cheng.yu@intel.com>
Date: Mon, 5 Oct 2020 12:10:26 -0700
Subject: [PATCH 34/45] x86/vsyscall/64: Introduce arch_prctl(VSYCALL_CTL)

Vsyscall emulation provides compatibility to older applications.  Newer
applications use the vDSO interface and do not use vsyscalls, and it is
desirable to have a per-task control of vsyscall.

One use case of the interface is when shadow stack and/or indirect branch
tracking is enabled and vsyscall emulation needs to cancel out the control-
flow protection.  The cancelling code, if implemented, could become a back
door for evading the protection.  Disabling vsyscall eliminates the risk.

Introduce arch_prctl(VSYSCALL_CTL), which sets/clears TIF_VSYSCALL_DISABLE
flag.  When the flag is set, vsyscall is disabled.

Signed-off-by: Yu-cheng Yu <yu-cheng.yu@intel.com>
---
  arch/x86/entry/vsyscall/vsyscall_64.c   |  3 +++
  arch/x86/include/asm/thread_info.h      |  2 ++
  arch/x86/include/uapi/asm/prctl.h       |  1 +
  arch/x86/kernel/process_64.c            | 17 +++++++++++++++++
  tools/arch/x86/include/uapi/asm/prctl.h |  1 +
  5 files changed, 24 insertions(+)

diff --git a/arch/x86/entry/vsyscall/vsyscall_64.c 
b/arch/x86/entry/vsyscall/vsyscall_64.c
index 44c33103a955..fe8f3db6d21b 100644
--- a/arch/x86/entry/vsyscall/vsyscall_64.c
+++ b/arch/x86/entry/vsyscall/vsyscall_64.c
@@ -127,6 +127,9 @@ bool emulate_vsyscall(unsigned long error_code,
  	long ret;
  	unsigned long orig_dx;

+	if (test_thread_flag(TIF_VSYSCALL_DISABLE))
+		return false;
+
  	/* Write faults or kernel-privilege faults never get fixed up. */
  	if ((error_code & (X86_PF_WRITE | X86_PF_USER)) != X86_PF_USER)
  		return false;
diff --git a/arch/x86/include/asm/thread_info.h 
b/arch/x86/include/asm/thread_info.h
index 267701ae3d86..c0cce3401c0f 100644
--- a/arch/x86/include/asm/thread_info.h
+++ b/arch/x86/include/asm/thread_info.h
@@ -98,6 +98,7 @@ struct thread_info {
  #define TIF_IO_BITMAP		22	/* uses I/O bitmap */
  #define TIF_FORCED_TF		24	/* true if TF in eflags artificially */
  #define TIF_BLOCKSTEP		25	/* set when we want DEBUGCTLMSR_BTF */
+#define TIF_VSYSCALL_DISABLE	26	/* set when vsyscall is disallowed */
  #define TIF_LAZY_MMU_UPDATES	27	/* task is updating the mmu lazily */
  #define TIF_SYSCALL_TRACEPOINT	28	/* syscall tracepoint instrumentation */
  #define TIF_ADDR32		29	/* 32-bit address space on 64 bits */
@@ -127,6 +128,7 @@ struct thread_info {
  #define _TIF_IO_BITMAP		(1 << TIF_IO_BITMAP)
  #define _TIF_FORCED_TF		(1 << TIF_FORCED_TF)
  #define _TIF_BLOCKSTEP		(1 << TIF_BLOCKSTEP)
+#define _TIF_VSYSCALL_DISABLE	(1 << TIF_VSYSCALL_DISABLE)
  #define _TIF_LAZY_MMU_UPDATES	(1 << TIF_LAZY_MMU_UPDATES)
  #define _TIF_SYSCALL_TRACEPOINT	(1 << TIF_SYSCALL_TRACEPOINT)
  #define _TIF_ADDR32		(1 << TIF_ADDR32)
diff --git a/arch/x86/include/uapi/asm/prctl.h 
b/arch/x86/include/uapi/asm/prctl.h
index 9245bf629120..223fa382a81e 100644
--- a/arch/x86/include/uapi/asm/prctl.h
+++ b/arch/x86/include/uapi/asm/prctl.h
@@ -13,6 +13,7 @@
  #define ARCH_MAP_VDSO_X32	0x2001
  #define ARCH_MAP_VDSO_32	0x2002
  #define ARCH_MAP_VDSO_64	0x2003
+#define ARCH_VSYSCALL_CTRL	0x2004

  #define ARCH_X86_CET_STATUS		0x3001
  #define ARCH_X86_CET_DISABLE		0x3002
diff --git a/arch/x86/kernel/process_64.c b/arch/x86/kernel/process_64.c
index 1147a1052a07..eba61791c9cf 100644
--- a/arch/x86/kernel/process_64.c
+++ b/arch/x86/kernel/process_64.c
@@ -719,6 +719,20 @@ static long prctl_map_vdso(const struct vdso_image 
*image, unsigned long addr)
  }
  #endif

+static long prctl_vsyscall_ctrl(unsigned int disable)
+{
+	if (IS_ENABLED(CONFIG_X86_VSYSCALL_EMULATION)) {
+		if (disable)
+			set_thread_flag(TIF_VSYSCALL_DISABLE);
+		else
+			clear_thread_flag(TIF_VSYSCALL_DISABLE);
+
+		return 0;
+	} else {
+		return disable ? 0 : -EINVAL;
+	}
+}
+
  long do_arch_prctl_64(struct task_struct *task, int option, unsigned 
long arg2)
  {
  	int ret = 0;
@@ -807,6 +821,9 @@ long do_arch_prctl_64(struct task_struct *task, int 
option, unsigned long arg2)
  		return prctl_map_vdso(&vdso_image_64, arg2);
  #endif

+	case ARCH_VSYSCALL_CTRL:
+		return prctl_vsyscall_ctrl(arg2);
+
  	default:
  		ret = -EINVAL;
  		break;
diff --git a/tools/arch/x86/include/uapi/asm/prctl.h 
b/tools/arch/x86/include/uapi/asm/prctl.h
index 9245bf629120..2476f46fa51f 100644
--- a/tools/arch/x86/include/uapi/asm/prctl.h
+++ b/tools/arch/x86/include/uapi/asm/prctl.h
@@ -13,6 +13,7 @@
  #define ARCH_MAP_VDSO_X32	0x2001
  #define ARCH_MAP_VDSO_32	0x2002
  #define ARCH_MAP_VDSO_64	0x2003
+#define ARCH_VSYSCALL_CTL	0x2004

  #define ARCH_X86_CET_STATUS		0x3001
  #define ARCH_X86_CET_DISABLE		0x3002
-- 
2.21.0



^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH v13 8/8] x86/vsyscall/64: Fixup Shadow Stack and Indirect Branch Tracking for vsyscall emulation
  2020-10-06 19:09                               ` Yu, Yu-cheng
@ 2020-10-09 17:42                                 ` Andy Lutomirski
  0 siblings, 0 replies; 29+ messages in thread
From: Andy Lutomirski @ 2020-10-09 17:42 UTC (permalink / raw)
  To: Yu, Yu-cheng
  Cc: Andy Lutomirski, H.J. Lu, X86 ML, H. Peter Anvin,
	Thomas Gleixner, Ingo Molnar, LKML, open list:DOCUMENTATION,
	Linux-MM, linux-arch, Linux API, Arnd Bergmann, Balbir Singh,
	Borislav Petkov, Cyrill Gorcunov, Dave Hansen,
	Eugene Syromiatnikov, Florian Weimer, Jann Horn, Jonathan Corbet,
	Kees Cook, Mike Kravetz, Nadav Amit, Oleg Nesterov, Pavel Machek,
	Peter Zijlstra, Randy Dunlap, Ravi V. Shankar,
	Vedvyas Shanbhogue, Dave Martin, Weijiang Yang, Pengfei Xu

On Tue, Oct 6, 2020 at 12:09 PM Yu, Yu-cheng <yu-cheng.yu@intel.com> wrote:
>
> On 10/1/2020 10:26 AM, Andy Lutomirski wrote:
> > On Thu, Oct 1, 2020 at 9:51 AM Yu, Yu-cheng <yu-cheng.yu@intel.com> wrote:
> >>
> >> On 9/30/2020 6:10 PM, Andy Lutomirski wrote:
> >>> On Wed, Sep 30, 2020 at 6:01 PM H.J. Lu <hjl.tools@gmail.com> wrote:
> >>>>
> >>>> On Wed, Sep 30, 2020 at 4:44 PM Andy Lutomirski <luto@kernel.org> wrote:
> >>
> >> [...]
> >>
> >>>>>>>>>     From 09803e66dca38d7784e32687d0693550948199ed Mon Sep 17 00:00:00 2001
> >>>>>>>>> From: Yu-cheng Yu <yu-cheng.yu@intel.com>
> >>>>>>>>> Date: Thu, 29 Nov 2018 14:15:38 -0800
> >>>>>>>>> Subject: [PATCH v13 8/8] x86/vsyscall/64: Fixup Shadow Stack and
> >>>>>>>>> Indirect Branch
> >>>>>>>>>      Tracking for vsyscall emulation
> >>>>>>>>>
> >>>>>>>>> Vsyscall entry points are effectively branch targets.  Mark them with
> >>>>>>>>> ENDBR64 opcodes.  When emulating the RET instruction, unwind shadow stack
> >>>>>>>>> and reset IBT state machine.
> >>>>>>>>>
> >>>>>>>>> Signed-off-by: Yu-cheng Yu <yu-cheng.yu@intel.com>
>
> [...]
>
> >>>>>>>>
> >>>>>>>
> >>>>>>> For what it's worth, I think there is an alternative.  If you all
> >>>>>>> (userspace people, etc) can come up with a credible way for a user
> >>>>>>> program to statically declare that it doesn't need vsyscalls, then we
> >>>>>>> could make SHSTK depend on *that*, and we could avoid this mess.  This
> >>>>>>> breaks orthogonality, but it's probably a decent outcome.
> >>>>>>>
> >>>>>>
> >>>>>> Would an arch_prctl(DISABLE_VSYSCALL) work?  The kernel then sets a
> >>>>>> thread flag, and in emulate_vsyscall(), checks the flag.
> >>>>>>
> >>>>>> When CET is enabled, ld-linux will do DISABLE_VSYSCALL.
> >>>>>>
> >>>>>> How is that?
> >>>>>
> >>>>> Backwards, no?  Presumably vsyscall needs to be disabled before or
> >>>>> concurrently with CET being enabled, not after.
> >>>>>
> >>>>> I think the solution of making vsyscall emulation work correctly with
> >>>>> CET is going to be better and possibly more straightforward.
> >>>>>
> >>>>
> >>>> We can do
> >>>>
> >>>> 1. Add ARCH_X86_DISABLE_VSYSCALL to disable the vsyscall page.
> >>>> 2. If CPU supports CET and the program is CET enabled:
> >>>>       a. Disable the vsyscall page.
> >>>>       b. Pass control to user.
> >>>>       c. Enable the vsyscall page when ARCH_X86_CET_DISABLE is called.
> >>>>
> >>>> So when control is passed from kernel to user, the vsyscall page is
> >>>> disabled if the program
> >>>> is CET enabled.
> >>>
> >>> Let me say this one more time:
> >>>
> >>> If we have a per-process vsyscall disable control and a per-process
> >>> CET control, we are going to keep those settings orthogonal.  I'm
> >>> willing to entertain an option in which enabling SHSTK without also
> >>> disabling vsyscalls is disallowed, We are *not* going to have any CET
> >>> flags magically disable vsyscalls, though, and we are not going to
> >>> have a situation where disabling vsyscalls on process startup requires
> >>> enabling SHSTK.
> >>>
> >>> Any possible static vsyscall controls (and CET controls, for that
> >>> matter) also need to come with some explanation of whether they are
> >>> properties set on the ELF loader, the ELF program being loaded, or
> >>> both.  And this explanation needs to cover what happens when old
> >>> binaries link against new libc versions and vice versa.  A new
> >>> CET-enabled binary linked against old libc running on a new kernel
> >>> that is expected to work on a non-CET CPU MUST work on a CET CPU, too.
> >>>
> >>> Right now, literally the only thing preventing vsyscall emulation from
> >>> coexisting with SHSTK is that the implementation eeds work.
> >>>
> >>> So your proposal is rejected.  Sorry.
> >>>
> >> I think, even with shadow stack/ibt enabled, we can still allow XONLY
> >> without too much mess.
> >>
> >> What about this?
> >>
> >> Thanks,
> >> Yu-cheng
> >>
> >> ======
> >>
> >> diff --git a/arch/x86/entry/vsyscall/vsyscall_64.c
> >> b/arch/x86/entry/vsyscall/vsyscall_64.c
> >> index 8b0b32ac7791..d39da0a15521 100644
> >> --- a/arch/x86/entry/vsyscall/vsyscall_64.c
> >> +++ b/arch/x86/entry/vsyscall/vsyscall_64.c
> >> @@ -48,16 +48,16 @@
> >>    static enum { EMULATE, XONLY, NONE } vsyscall_mode __ro_after_init =
> >>    #ifdef CONFIG_LEGACY_VSYSCALL_NONE
> >>           NONE;
> >> -#elif defined(CONFIG_LEGACY_VSYSCALL_XONLY)
> >> +#elif defined(CONFIG_LEGACY_VSYSCALL_XONLY) || defined(CONFIG_X86_CET)
> >>           XONLY;
> >> -#else
> >> +#else
> >>           EMULATE;
> >>    #endif
> >
> > I don't get it.
> >
> > First, you can't do any of this based on config -- it must be runtime.
> >
> > Second, and more importantly, I don't see how XONLY helps at all.  The
> > (non-executable) text that's exposed to user code in EMULATE mode is
> > trivial to get right with CET -- your code already handles it.  It's
> > the emulation code (that runs identically in EMULATE and XONLY mode)
> > that's tricky.
> >
>
> Hi,
>
> There has been some ambiguity in my previous proposals.  To make things
> clear, I created a patch for arch_prctl(VSYSCALL_CTL), which controls
> the TIF_VSYSCALL_DISABLE flag.  It is entirely orthogonal to shadow
> stack or IBT.  On top of the patch, we can do SET_PERSONALITY2() to
> disable vsyscall, e.g.

NAK.  Let me try explaining again.

>
> ======
> diff --git a/arch/x86/include/asm/elf.h b/arch/x86/include/asm/elf.h
> index 0e1be2a13359..c730ff00bc62 100644
> --- a/arch/x86/include/asm/elf.h
> +++ b/arch/x86/include/asm/elf.h
> @@ -394,6 +394,19 @@ struct arch_elf_state {
>         .gnu_property = 0,      \
>   }
>
> +#define SET_PERSONALITY2(ex, state)                            \
> +do {                                                           \
> +       unsigned int has_cet;                                   \
> +                                                               \
> +       has_cet = GNU_PROPERTY_X86_FEATURE_1_SHSTK |            \
> +                 GNU_PROPERTY_X86_FEATURE_1_IBT;               \
> +                                                               \
> +       if ((state)->gnu_property & has_cet)                    \
> +               set_thread_flag(TIF_VSYSCALL_DISABLE);          \
> +                                                               \
> +       SET_PERSONALITY(ex);                                    \
> +} while (0)
> +

This is not what "orthogonal" means.  If the bits were orthogonal, the
logic would be:

if (gnu_property & DISABLE_VSYSCALL)
  disable vsyscall;
if (gnu_property & SHSTK)
  enable SHSTK;
if (gnu_property & IBT);
  enable IBT;

and, if necessarily (although I still think it would be preferable not
to do this):

if ((gnu_property & (DISABLE_VSYSCALL | SHSTK)) == SHSTK)
  return -EINVAL;

As far as I'm concerned, you have two choices:

a) Make SHSTK work *correctly* with vsyscall emulation.

b) Add a high quality mechanism to disable vsyscall emulation and make
SHSTK depend on that.

As far as I'm concerned, (a) is preferable.  Ideally we'd get (a)
*and* a high quality vsyscall emulation disable mechanism with no
dependencies.


> diff --git a/arch/x86/entry/vsyscall/vsyscall_64.c
> b/arch/x86/entry/vsyscall/vsyscall_64.c
> index 44c33103a955..fe8f3db6d21b 100644
> --- a/arch/x86/entry/vsyscall/vsyscall_64.c
> +++ b/arch/x86/entry/vsyscall/vsyscall_64.c
> @@ -127,6 +127,9 @@ bool emulate_vsyscall(unsigned long error_code,
>         long ret;
>         unsigned long orig_dx;
>
> +       if (test_thread_flag(TIF_VSYSCALL_DISABLE))
> +               return false;
> +

This needs to be per-mm, not per-thread.  There's a patch floating
around that gets us about a quarter of the way there.  I'm not
convinced that CET should wait for this to finish.

^ permalink raw reply	[flat|nested] 29+ messages in thread

end of thread, back to index

Thread overview: 29+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-09-25 14:57 [PATCH v13 0/8] Control-flow Enforcement: Indirect Branch Tracking Yu-cheng Yu
2020-09-25 14:57 ` [PATCH v13 1/8] x86/cet/ibt: Add Kconfig option for user-mode " Yu-cheng Yu
2020-09-25 14:57 ` [PATCH v13 2/8] x86/cet/ibt: User-mode Indirect Branch Tracking support Yu-cheng Yu
2020-09-25 14:57 ` [PATCH v13 3/8] x86/cet/ibt: Handle signals for Indirect Branch Tracking Yu-cheng Yu
2020-09-25 14:58 ` [PATCH v13 4/8] x86/cet/ibt: ELF header parsing " Yu-cheng Yu
2020-09-25 14:58 ` [PATCH v13 5/8] x86/cet/ibt: Update arch_prctl functions " Yu-cheng Yu
2020-09-25 14:58 ` [PATCH v13 6/8] x86/vdso/32: Add ENDBR32 to __kernel_vsyscall entry point Yu-cheng Yu
2020-09-25 14:58 ` [PATCH v13 7/8] x86/vdso: Insert endbr32/endbr64 to vDSO Yu-cheng Yu
2020-09-25 16:18   ` Andy Lutomirski
2020-09-25 16:24     ` Yu, Yu-cheng
2020-09-25 14:58 ` [PATCH v13 8/8] x86/vsyscall/64: Fixup Shadow Stack and Indirect Branch Tracking for vsyscall emulation Yu-cheng Yu
2020-09-25 16:31   ` Andy Lutomirski
2020-09-25 16:47     ` Yu, Yu-cheng
2020-09-25 16:51       ` Andy Lutomirski
2020-09-28 16:59         ` Yu-cheng Yu
2020-09-28 17:37           ` Andy Lutomirski
2020-09-28 19:04             ` Yu, Yu-cheng
2020-09-29 18:37             ` Yu, Yu-cheng
2020-09-29 19:57               ` Andy Lutomirski
2020-09-29 20:00                 ` Andy Lutomirski
2020-09-30 22:33                   ` Yu, Yu-cheng
2020-09-30 23:44                     ` Andy Lutomirski
2020-10-01  1:00                       ` H.J. Lu
2020-10-01  1:10                         ` Andy Lutomirski
2020-10-01  1:21                           ` H.J. Lu
2020-10-01 16:51                           ` Yu, Yu-cheng
2020-10-01 17:26                             ` Andy Lutomirski
2020-10-06 19:09                               ` Yu, Yu-cheng
2020-10-09 17:42                                 ` Andy Lutomirski

Linux-arch Archive on lore.kernel.org

Archives are clonable:
	git clone --mirror https://lore.kernel.org/linux-arch/0 linux-arch/git/0.git

	# If you have public-inbox 1.1+ installed, you may
	# initialize and index your mirror using the following commands:
	public-inbox-init -V2 linux-arch linux-arch/ https://lore.kernel.org/linux-arch \
		linux-arch@vger.kernel.org
	public-inbox-index linux-arch

Example config snippet for mirrors

Newsgroup available over NNTP:
	nntp://nntp.lore.kernel.org/org.kernel.vger.linux-arch


AGPL code for this site: git clone https://public-inbox.org/public-inbox.git