linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH 0/6] x86: Clean up fast syscall return validation
@ 2023-07-18 13:44 Brian Gerst
  2023-07-18 13:44 ` [PATCH 1/6] x86/entry/64: Remove obsolete comment on tracing vs. SYSRET Brian Gerst
                   ` (5 more replies)
  0 siblings, 6 replies; 12+ messages in thread
From: Brian Gerst @ 2023-07-18 13:44 UTC (permalink / raw)
  To: linux-kernel, x86
  Cc: Thomas Gleixner, Borislav Petkov, H . Peter Anvin,
	Andy Lutomirski, Brian Gerst

This patch set cleans up the tests done to determine if a fast syscall
return instruction can be used to return to userspace.  It converts the
code to C, and refactors existing code to be more readable.

Brian Gerst (6):
  x86/entry/64: Remove obsolete comment on tracing vs. SYSRET
  x86/entry/64: Convert SYSRET validation tests to C
  x86/entry/compat: Combine return value test from syscall handler
  x86/entry/32: Convert do_fast_syscall_32() to bool return type
  x86/entry/32: Remove SEP test for SYSEXIT
  x86/entry/32: Clean up syscall fast exit tests

 arch/x86/entry/common.c          | 109 +++++++++++++++++++++----------
 arch/x86/entry/entry_32.S        |   2 +-
 arch/x86/entry/entry_64.S        |  68 +------------------
 arch/x86/entry/entry_64_compat.S |  12 ++--
 arch/x86/include/asm/syscall.h   |   6 +-
 5 files changed, 87 insertions(+), 110 deletions(-)

-- 
2.41.0


^ permalink raw reply	[flat|nested] 12+ messages in thread

* [PATCH 1/6] x86/entry/64: Remove obsolete comment on tracing vs. SYSRET
  2023-07-18 13:44 [PATCH 0/6] x86: Clean up fast syscall return validation Brian Gerst
@ 2023-07-18 13:44 ` Brian Gerst
  2023-07-18 13:44 ` [PATCH 2/6] x86/entry/64: Convert SYSRET validation tests to C Brian Gerst
                   ` (4 subsequent siblings)
  5 siblings, 0 replies; 12+ messages in thread
From: Brian Gerst @ 2023-07-18 13:44 UTC (permalink / raw)
  To: linux-kernel, x86
  Cc: Thomas Gleixner, Borislav Petkov, H . Peter Anvin,
	Andy Lutomirski, Brian Gerst

This comment comes from a time when the kernel attempted to use SYSRET
on all returns to userspace, including interrupts and exceptions.  Ever
since commit fffbb5dc ("Move opportunistic sysret code to syscall code
path"), SYSRET is only used for returning from system calls. The
specific tracing issue listed in this comment is not possible anymore.

Signed-off-by: Brian Gerst <brgerst@gmail.com>
---
 arch/x86/entry/entry_64.S | 19 +++----------------
 1 file changed, 3 insertions(+), 16 deletions(-)

diff --git a/arch/x86/entry/entry_64.S b/arch/x86/entry/entry_64.S
index 91f6818884fa..c01776a51545 100644
--- a/arch/x86/entry/entry_64.S
+++ b/arch/x86/entry/entry_64.S
@@ -166,22 +166,9 @@ SYM_INNER_LABEL(entry_SYSCALL_64_after_hwframe, SYM_L_GLOBAL)
 	jne	swapgs_restore_regs_and_return_to_usermode
 
 	/*
-	 * SYSCALL clears RF when it saves RFLAGS in R11 and SYSRET cannot
-	 * restore RF properly. If the slowpath sets it for whatever reason, we
-	 * need to restore it correctly.
-	 *
-	 * SYSRET can restore TF, but unlike IRET, restoring TF results in a
-	 * trap from userspace immediately after SYSRET.  This would cause an
-	 * infinite loop whenever #DB happens with register state that satisfies
-	 * the opportunistic SYSRET conditions.  For example, single-stepping
-	 * this user code:
-	 *
-	 *           movq	$stuck_here, %rcx
-	 *           pushfq
-	 *           popq %r11
-	 *   stuck_here:
-	 *
-	 * would never get past 'stuck_here'.
+	 * SYSRET cannot restore RF.  It can restore TF, but unlike IRET,
+	 * restoring TF results in a trap from userspace immediately after
+	 * SYSRET.
 	 */
 	testq	$(X86_EFLAGS_RF|X86_EFLAGS_TF), %r11
 	jnz	swapgs_restore_regs_and_return_to_usermode
-- 
2.41.0


^ permalink raw reply related	[flat|nested] 12+ messages in thread

* [PATCH 2/6] x86/entry/64: Convert SYSRET validation tests to C
  2023-07-18 13:44 [PATCH 0/6] x86: Clean up fast syscall return validation Brian Gerst
  2023-07-18 13:44 ` [PATCH 1/6] x86/entry/64: Remove obsolete comment on tracing vs. SYSRET Brian Gerst
@ 2023-07-18 13:44 ` Brian Gerst
  2023-07-18 14:16   ` Mika Penttilä
  2023-07-18 13:44 ` [PATCH 3/6] x86/entry/compat: Combine return value test from syscall handler Brian Gerst
                   ` (3 subsequent siblings)
  5 siblings, 1 reply; 12+ messages in thread
From: Brian Gerst @ 2023-07-18 13:44 UTC (permalink / raw)
  To: linux-kernel, x86
  Cc: Thomas Gleixner, Borislav Petkov, H . Peter Anvin,
	Andy Lutomirski, Brian Gerst

Signed-off-by: Brian Gerst <brgerst@gmail.com>
---
 arch/x86/entry/common.c        | 50 ++++++++++++++++++++++++++++++-
 arch/x86/entry/entry_64.S      | 55 ++--------------------------------
 arch/x86/include/asm/syscall.h |  2 +-
 3 files changed, 52 insertions(+), 55 deletions(-)

diff --git a/arch/x86/entry/common.c b/arch/x86/entry/common.c
index 6c2826417b33..afe79c3f1c5b 100644
--- a/arch/x86/entry/common.c
+++ b/arch/x86/entry/common.c
@@ -70,8 +70,12 @@ static __always_inline bool do_syscall_x32(struct pt_regs *regs, int nr)
 	return false;
 }
 
-__visible noinstr void do_syscall_64(struct pt_regs *regs, int nr)
+/* Returns true to return using SYSRET, or false to use IRET */
+__visible noinstr bool do_syscall_64(struct pt_regs *regs, int nr)
 {
+	long rip;
+	unsigned int shift_rip;
+
 	add_random_kstack_offset();
 	nr = syscall_enter_from_user_mode(regs, nr);
 
@@ -84,6 +88,50 @@ __visible noinstr void do_syscall_64(struct pt_regs *regs, int nr)
 
 	instrumentation_end();
 	syscall_exit_to_user_mode(regs);
+
+	/*
+	 * Check that the register state is valid for using SYSRET to exit
+	 * to userspace.  Otherwise use the slower but fully capable IRET
+	 * exit path.
+	 */
+
+	/* XEN PV guests always use IRET path */
+	if (cpu_feature_enabled(X86_FEATURE_XENPV))
+		return false;
+
+	/* SYSRET requires RCX == RIP and R11 == EFLAGS */
+	if (unlikely(regs->cx != regs->ip || regs->r11 != regs->flags))
+		return false;
+
+	/* CS and SS must match the values set in MSR_STAR */
+	if (unlikely(regs->cs != __USER_CS || regs->ss != __USER_DS))
+		return false;
+
+	/*
+	 * On Intel CPUs, SYSRET with non-canonical RCX/RIP will #GP
+	 * in kernel space.  This essentially lets the user take over
+	 * the kernel, since userspace controls RSP.
+	 *
+	 * Change top bits to match most significant bit (47th or 56th bit
+	 * depending on paging mode) in the address.
+	 */
+	shift_rip = (64 - __VIRTUAL_MASK_SHIFT + 1);
+	rip = (long) regs->ip;
+	rip <<= shift_rip;
+	rip >>= shift_rip;
+	if (unlikely((unsigned long) rip != regs->ip))
+		return false;
+
+	/*
+	 * SYSRET cannot restore RF.  It can restore TF, but unlike IRET,
+	 * restoring TF results in a trap from userspace immediately after
+	 * SYSRET.
+	 */
+	if (unlikely(regs->flags & (X86_EFLAGS_RF | X86_EFLAGS_TF)))
+		return false;
+
+	/* Use SYSRET to exit to userspace */
+	return true;
 }
 #endif
 
diff --git a/arch/x86/entry/entry_64.S b/arch/x86/entry/entry_64.S
index c01776a51545..b1288e22cae8 100644
--- a/arch/x86/entry/entry_64.S
+++ b/arch/x86/entry/entry_64.S
@@ -123,60 +123,9 @@ SYM_INNER_LABEL(entry_SYSCALL_64_after_hwframe, SYM_L_GLOBAL)
 	 * Try to use SYSRET instead of IRET if we're returning to
 	 * a completely clean 64-bit userspace context.  If we're not,
 	 * go to the slow exit path.
-	 * In the Xen PV case we must use iret anyway.
 	 */
-
-	ALTERNATIVE "", "jmp	swapgs_restore_regs_and_return_to_usermode", \
-		X86_FEATURE_XENPV
-
-	movq	RCX(%rsp), %rcx
-	movq	RIP(%rsp), %r11
-
-	cmpq	%rcx, %r11	/* SYSRET requires RCX == RIP */
-	jne	swapgs_restore_regs_and_return_to_usermode
-
-	/*
-	 * On Intel CPUs, SYSRET with non-canonical RCX/RIP will #GP
-	 * in kernel space.  This essentially lets the user take over
-	 * the kernel, since userspace controls RSP.
-	 *
-	 * If width of "canonical tail" ever becomes variable, this will need
-	 * to be updated to remain correct on both old and new CPUs.
-	 *
-	 * Change top bits to match most significant bit (47th or 56th bit
-	 * depending on paging mode) in the address.
-	 */
-#ifdef CONFIG_X86_5LEVEL
-	ALTERNATIVE "shl $(64 - 48), %rcx; sar $(64 - 48), %rcx", \
-		"shl $(64 - 57), %rcx; sar $(64 - 57), %rcx", X86_FEATURE_LA57
-#else
-	shl	$(64 - (__VIRTUAL_MASK_SHIFT+1)), %rcx
-	sar	$(64 - (__VIRTUAL_MASK_SHIFT+1)), %rcx
-#endif
-
-	/* If this changed %rcx, it was not canonical */
-	cmpq	%rcx, %r11
-	jne	swapgs_restore_regs_and_return_to_usermode
-
-	cmpq	$__USER_CS, CS(%rsp)		/* CS must match SYSRET */
-	jne	swapgs_restore_regs_and_return_to_usermode
-
-	movq	R11(%rsp), %r11
-	cmpq	%r11, EFLAGS(%rsp)		/* R11 == RFLAGS */
-	jne	swapgs_restore_regs_and_return_to_usermode
-
-	/*
-	 * SYSRET cannot restore RF.  It can restore TF, but unlike IRET,
-	 * restoring TF results in a trap from userspace immediately after
-	 * SYSRET.
-	 */
-	testq	$(X86_EFLAGS_RF|X86_EFLAGS_TF), %r11
-	jnz	swapgs_restore_regs_and_return_to_usermode
-
-	/* nothing to check for RSP */
-
-	cmpq	$__USER_DS, SS(%rsp)		/* SS must match SYSRET */
-	jne	swapgs_restore_regs_and_return_to_usermode
+	testb	%al, %al
+	jz	swapgs_restore_regs_and_return_to_usermode
 
 	/*
 	 * We win! This label is here just for ease of understanding
diff --git a/arch/x86/include/asm/syscall.h b/arch/x86/include/asm/syscall.h
index 4fb36fba4b5a..be6c5515e0b9 100644
--- a/arch/x86/include/asm/syscall.h
+++ b/arch/x86/include/asm/syscall.h
@@ -126,7 +126,7 @@ static inline int syscall_get_arch(struct task_struct *task)
 		? AUDIT_ARCH_I386 : AUDIT_ARCH_X86_64;
 }
 
-void do_syscall_64(struct pt_regs *regs, int nr);
+bool do_syscall_64(struct pt_regs *regs, int nr);
 
 #endif	/* CONFIG_X86_32 */
 
-- 
2.41.0


^ permalink raw reply related	[flat|nested] 12+ messages in thread

* [PATCH 3/6] x86/entry/compat: Combine return value test from syscall handler
  2023-07-18 13:44 [PATCH 0/6] x86: Clean up fast syscall return validation Brian Gerst
  2023-07-18 13:44 ` [PATCH 1/6] x86/entry/64: Remove obsolete comment on tracing vs. SYSRET Brian Gerst
  2023-07-18 13:44 ` [PATCH 2/6] x86/entry/64: Convert SYSRET validation tests to C Brian Gerst
@ 2023-07-18 13:44 ` Brian Gerst
  2023-07-18 13:44 ` [PATCH 4/6] x86/entry/32: Convert do_fast_syscall_32() to bool return type Brian Gerst
                   ` (2 subsequent siblings)
  5 siblings, 0 replies; 12+ messages in thread
From: Brian Gerst @ 2023-07-18 13:44 UTC (permalink / raw)
  To: linux-kernel, x86
  Cc: Thomas Gleixner, Borislav Petkov, H . Peter Anvin,
	Andy Lutomirski, Brian Gerst

Move the sysret32_from_system_call label to remove a duplicate test of
the return value from the syscall handler.

Signed-off-by: Brian Gerst <brgerst@gmail.com>
---
 arch/x86/entry/entry_64_compat.S | 9 ++++-----
 1 file changed, 4 insertions(+), 5 deletions(-)

diff --git a/arch/x86/entry/entry_64_compat.S b/arch/x86/entry/entry_64_compat.S
index 70150298f8bd..b16272395f1a 100644
--- a/arch/x86/entry/entry_64_compat.S
+++ b/arch/x86/entry/entry_64_compat.S
@@ -118,9 +118,6 @@ SYM_INNER_LABEL(entry_SYSENTER_compat_after_hwframe, SYM_L_GLOBAL)
 
 	movq	%rsp, %rdi
 	call	do_SYSENTER_32
-	/* XEN PV guests always use IRET path */
-	ALTERNATIVE "testl %eax, %eax; jz swapgs_restore_regs_and_return_to_usermode", \
-		    "jmp swapgs_restore_regs_and_return_to_usermode", X86_FEATURE_XENPV
 	jmp	sysret32_from_system_call
 
 .Lsysenter_fix_flags:
@@ -212,13 +209,15 @@ SYM_INNER_LABEL(entry_SYSCALL_compat_after_hwframe, SYM_L_GLOBAL)
 
 	movq	%rsp, %rdi
 	call	do_fast_syscall_32
+
+sysret32_from_system_call:
 	/* XEN PV guests always use IRET path */
 	ALTERNATIVE "testl %eax, %eax; jz swapgs_restore_regs_and_return_to_usermode", \
 		    "jmp swapgs_restore_regs_and_return_to_usermode", X86_FEATURE_XENPV
 
-	/* Opportunistic SYSRET */
-sysret32_from_system_call:
 	/*
+	 * Opportunistic SYSRET
+	 *
 	 * We are not going to return to userspace from the trampoline
 	 * stack. So let's erase the thread stack right now.
 	 */
-- 
2.41.0


^ permalink raw reply related	[flat|nested] 12+ messages in thread

* [PATCH 4/6] x86/entry/32: Convert do_fast_syscall_32() to bool return type
  2023-07-18 13:44 [PATCH 0/6] x86: Clean up fast syscall return validation Brian Gerst
                   ` (2 preceding siblings ...)
  2023-07-18 13:44 ` [PATCH 3/6] x86/entry/compat: Combine return value test from syscall handler Brian Gerst
@ 2023-07-18 13:44 ` Brian Gerst
  2023-07-18 13:44 ` [PATCH 5/6] x86/entry/32: Remove SEP test for SYSEXIT Brian Gerst
  2023-07-18 13:44 ` [PATCH 6/6] x86/entry/32: Clean up syscall fast exit tests Brian Gerst
  5 siblings, 0 replies; 12+ messages in thread
From: Brian Gerst @ 2023-07-18 13:44 UTC (permalink / raw)
  To: linux-kernel, x86
  Cc: Thomas Gleixner, Borislav Petkov, H . Peter Anvin,
	Andy Lutomirski, Brian Gerst

Signed-off-by: Brian Gerst <brgerst@gmail.com>
---
 arch/x86/entry/common.c          | 10 +++++-----
 arch/x86/entry/entry_32.S        |  2 +-
 arch/x86/entry/entry_64_compat.S |  2 +-
 arch/x86/include/asm/syscall.h   |  4 ++--
 4 files changed, 9 insertions(+), 9 deletions(-)

diff --git a/arch/x86/entry/common.c b/arch/x86/entry/common.c
index afe79c3f1c5b..15660f936ede 100644
--- a/arch/x86/entry/common.c
+++ b/arch/x86/entry/common.c
@@ -230,8 +230,8 @@ static noinstr bool __do_fast_syscall_32(struct pt_regs *regs)
 	return true;
 }
 
-/* Returns 0 to return using IRET or 1 to return using SYSEXIT/SYSRETL. */
-__visible noinstr long do_fast_syscall_32(struct pt_regs *regs)
+/* Returns true to return using SYSEXIT/SYSRETL, or false to use IRET */
+__visible noinstr bool do_fast_syscall_32(struct pt_regs *regs)
 {
 	/*
 	 * Called using the internal vDSO SYSENTER/SYSCALL32 calling
@@ -249,7 +249,7 @@ __visible noinstr long do_fast_syscall_32(struct pt_regs *regs)
 
 	/* Invoke the syscall. If it failed, keep it simple: use IRET. */
 	if (!__do_fast_syscall_32(regs))
-		return 0;
+		return false;
 
 #ifdef CONFIG_X86_64
 	/*
@@ -282,8 +282,8 @@ __visible noinstr long do_fast_syscall_32(struct pt_regs *regs)
 #endif
 }
 
-/* Returns 0 to return using IRET or 1 to return using SYSEXIT/SYSRETL. */
-__visible noinstr long do_SYSENTER_32(struct pt_regs *regs)
+/* Returns true to return using SYSEXIT/SYSRETL, or false to use IRET */
+__visible noinstr bool do_SYSENTER_32(struct pt_regs *regs)
 {
 	/* SYSENTER loses RSP, but the vDSO saved it in RBP. */
 	regs->sp = regs->bp;
diff --git a/arch/x86/entry/entry_32.S b/arch/x86/entry/entry_32.S
index 6e6af42e044a..c73047bf9f4b 100644
--- a/arch/x86/entry/entry_32.S
+++ b/arch/x86/entry/entry_32.S
@@ -837,7 +837,7 @@ SYM_FUNC_START(entry_SYSENTER_32)
 
 	movl	%esp, %eax
 	call	do_SYSENTER_32
-	testl	%eax, %eax
+	testb	%al, %al
 	jz	.Lsyscall_32_done
 
 	STACKLEAK_ERASE
diff --git a/arch/x86/entry/entry_64_compat.S b/arch/x86/entry/entry_64_compat.S
index b16272395f1a..27c05d08558a 100644
--- a/arch/x86/entry/entry_64_compat.S
+++ b/arch/x86/entry/entry_64_compat.S
@@ -212,7 +212,7 @@ SYM_INNER_LABEL(entry_SYSCALL_compat_after_hwframe, SYM_L_GLOBAL)
 
 sysret32_from_system_call:
 	/* XEN PV guests always use IRET path */
-	ALTERNATIVE "testl %eax, %eax; jz swapgs_restore_regs_and_return_to_usermode", \
+	ALTERNATIVE "testb %al, %al; jz swapgs_restore_regs_and_return_to_usermode", \
 		    "jmp swapgs_restore_regs_and_return_to_usermode", X86_FEATURE_XENPV
 
 	/*
diff --git a/arch/x86/include/asm/syscall.h b/arch/x86/include/asm/syscall.h
index be6c5515e0b9..f44e2f9ab65d 100644
--- a/arch/x86/include/asm/syscall.h
+++ b/arch/x86/include/asm/syscall.h
@@ -131,7 +131,7 @@ bool do_syscall_64(struct pt_regs *regs, int nr);
 #endif	/* CONFIG_X86_32 */
 
 void do_int80_syscall_32(struct pt_regs *regs);
-long do_fast_syscall_32(struct pt_regs *regs);
-long do_SYSENTER_32(struct pt_regs *regs);
+bool do_fast_syscall_32(struct pt_regs *regs);
+bool do_SYSENTER_32(struct pt_regs *regs);
 
 #endif	/* _ASM_X86_SYSCALL_H */
-- 
2.41.0


^ permalink raw reply related	[flat|nested] 12+ messages in thread

* [PATCH 5/6] x86/entry/32: Remove SEP test for SYSEXIT
  2023-07-18 13:44 [PATCH 0/6] x86: Clean up fast syscall return validation Brian Gerst
                   ` (3 preceding siblings ...)
  2023-07-18 13:44 ` [PATCH 4/6] x86/entry/32: Convert do_fast_syscall_32() to bool return type Brian Gerst
@ 2023-07-18 13:44 ` Brian Gerst
  2023-07-18 13:44 ` [PATCH 6/6] x86/entry/32: Clean up syscall fast exit tests Brian Gerst
  5 siblings, 0 replies; 12+ messages in thread
From: Brian Gerst @ 2023-07-18 13:44 UTC (permalink / raw)
  To: linux-kernel, x86
  Cc: Thomas Gleixner, Borislav Petkov, H . Peter Anvin,
	Andy Lutomirski, Brian Gerst

SEP must be already be present in order for do_fast_syscall_32() to be
called on native 32-bit, so checking it again is unnecessary.

Signed-off-by: Brian Gerst <brgerst@gmail.com>
---
 arch/x86/entry/common.c | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/arch/x86/entry/common.c b/arch/x86/entry/common.c
index 15660f936ede..fca6f2b7daf3 100644
--- a/arch/x86/entry/common.c
+++ b/arch/x86/entry/common.c
@@ -275,8 +275,7 @@ __visible noinstr bool do_fast_syscall_32(struct pt_regs *regs)
 	 * We don't allow syscalls at all from VM86 mode, but we still
 	 * need to check VM, because we might be returning from sys_vm86.
 	 */
-	return static_cpu_has(X86_FEATURE_SEP) &&
-		regs->cs == __USER_CS && regs->ss == __USER_DS &&
+	return regs->cs == __USER_CS && regs->ss == __USER_DS &&
 		regs->ip == landing_pad &&
 		(regs->flags & (X86_EFLAGS_RF | X86_EFLAGS_TF | X86_EFLAGS_VM)) == 0;
 #endif
-- 
2.41.0


^ permalink raw reply related	[flat|nested] 12+ messages in thread

* [PATCH 6/6] x86/entry/32: Clean up syscall fast exit tests
  2023-07-18 13:44 [PATCH 0/6] x86: Clean up fast syscall return validation Brian Gerst
                   ` (4 preceding siblings ...)
  2023-07-18 13:44 ` [PATCH 5/6] x86/entry/32: Remove SEP test for SYSEXIT Brian Gerst
@ 2023-07-18 13:44 ` Brian Gerst
  5 siblings, 0 replies; 12+ messages in thread
From: Brian Gerst @ 2023-07-18 13:44 UTC (permalink / raw)
  To: linux-kernel, x86
  Cc: Thomas Gleixner, Borislav Petkov, H . Peter Anvin,
	Andy Lutomirski, Brian Gerst

Merge compat and native code and clarify comments.

Signed-off-by: Brian Gerst <brgerst@gmail.com>
---
 arch/x86/entry/common.c          | 48 +++++++++++++++-----------------
 arch/x86/entry/entry_64_compat.S |  5 ++--
 2 files changed, 24 insertions(+), 29 deletions(-)

diff --git a/arch/x86/entry/common.c b/arch/x86/entry/common.c
index fca6f2b7daf3..b975dc1d0812 100644
--- a/arch/x86/entry/common.c
+++ b/arch/x86/entry/common.c
@@ -251,34 +251,30 @@ __visible noinstr bool do_fast_syscall_32(struct pt_regs *regs)
 	if (!__do_fast_syscall_32(regs))
 		return false;
 
-#ifdef CONFIG_X86_64
 	/*
-	 * Opportunistic SYSRETL: if possible, try to return using SYSRETL.
-	 * SYSRETL is available on all 64-bit CPUs, so we don't need to
-	 * bother with SYSEXIT.
-	 *
-	 * Unlike 64-bit opportunistic SYSRET, we can't check that CX == IP,
-	 * because the ECX fixup above will ensure that this is essentially
-	 * never the case.
+	 * Check that the register state is valid for using SYSRETL/SYSEXIT
+	 * to exit to userspace.  Otherwise use the slower but fully capable
+	 * IRET exit path.
 	 */
-	return regs->cs == __USER32_CS && regs->ss == __USER_DS &&
-		regs->ip == landing_pad &&
-		(regs->flags & (X86_EFLAGS_RF | X86_EFLAGS_TF)) == 0;
-#else
-	/*
-	 * Opportunistic SYSEXIT: if possible, try to return using SYSEXIT.
-	 *
-	 * Unlike 64-bit opportunistic SYSRET, we can't check that CX == IP,
-	 * because the ECX fixup above will ensure that this is essentially
-	 * never the case.
-	 *
-	 * We don't allow syscalls at all from VM86 mode, but we still
-	 * need to check VM, because we might be returning from sys_vm86.
-	 */
-	return regs->cs == __USER_CS && regs->ss == __USER_DS &&
-		regs->ip == landing_pad &&
-		(regs->flags & (X86_EFLAGS_RF | X86_EFLAGS_TF | X86_EFLAGS_VM)) == 0;
-#endif
+
+	/* XEN PV guests always use IRET path */
+	if (cpu_feature_enabled(X86_FEATURE_XENPV))
+		return false;
+
+	/* EIP must point to the VDSO landing pad */
+	if (unlikely(regs->ip != landing_pad))
+		return false;
+
+	/* CS and SS must match the values set in MSR_STAR */
+	if (unlikely(regs->cs != __USER32_CS || regs->ss != __USER_DS))
+		return false;
+
+	/* If the TF, RF, or VM flags are set, use IRET */
+	if (unlikely(regs->flags & (X86_EFLAGS_RF | X86_EFLAGS_TF | X86_EFLAGS_VM)))
+		return false;
+
+	/* Use SYSRETL/SYSEXIT to exit to userspace */
+	return true;
 }
 
 /* Returns true to return using SYSEXIT/SYSRETL, or false to use IRET */
diff --git a/arch/x86/entry/entry_64_compat.S b/arch/x86/entry/entry_64_compat.S
index 27c05d08558a..84e21d1ebf10 100644
--- a/arch/x86/entry/entry_64_compat.S
+++ b/arch/x86/entry/entry_64_compat.S
@@ -211,9 +211,8 @@ SYM_INNER_LABEL(entry_SYSCALL_compat_after_hwframe, SYM_L_GLOBAL)
 	call	do_fast_syscall_32
 
 sysret32_from_system_call:
-	/* XEN PV guests always use IRET path */
-	ALTERNATIVE "testb %al, %al; jz swapgs_restore_regs_and_return_to_usermode", \
-		    "jmp swapgs_restore_regs_and_return_to_usermode", X86_FEATURE_XENPV
+	testb	%al, %al		/* Is SYSRET allowed? */
+	jz	swapgs_restore_regs_and_return_to_usermode
 
 	/*
 	 * Opportunistic SYSRET
-- 
2.41.0


^ permalink raw reply related	[flat|nested] 12+ messages in thread

* Re: [PATCH 2/6] x86/entry/64: Convert SYSRET validation tests to C
  2023-07-18 13:44 ` [PATCH 2/6] x86/entry/64: Convert SYSRET validation tests to C Brian Gerst
@ 2023-07-18 14:16   ` Mika Penttilä
  2023-07-18 14:25     ` Brian Gerst
  0 siblings, 1 reply; 12+ messages in thread
From: Mika Penttilä @ 2023-07-18 14:16 UTC (permalink / raw)
  To: Brian Gerst, linux-kernel, x86
  Cc: Thomas Gleixner, Borislav Petkov, H . Peter Anvin, Andy Lutomirski

Hi,


On 18.7.2023 16.44, Brian Gerst wrote:
> Signed-off-by: Brian Gerst <brgerst@gmail.com>
> ---
>   arch/x86/entry/common.c        | 50 ++++++++++++++++++++++++++++++-
>   arch/x86/entry/entry_64.S      | 55 ++--------------------------------
>   arch/x86/include/asm/syscall.h |  2 +-
>   3 files changed, 52 insertions(+), 55 deletions(-)
> 
> diff --git a/arch/x86/entry/common.c b/arch/x86/entry/common.c
> index 6c2826417b33..afe79c3f1c5b 100644
> --- a/arch/x86/entry/common.c
> +++ b/arch/x86/entry/common.c
> @@ -70,8 +70,12 @@ static __always_inline bool do_syscall_x32(struct pt_regs *regs, int nr)
>   	return false;
>   }
>   
> -__visible noinstr void do_syscall_64(struct pt_regs *regs, int nr)
> +/* Returns true to return using SYSRET, or false to use IRET */
> +__visible noinstr bool do_syscall_64(struct pt_regs *regs, int nr)
>   {
> +	long rip;
> +	unsigned int shift_rip;
> +
>   	add_random_kstack_offset();
>   	nr = syscall_enter_from_user_mode(regs, nr);
>   
> @@ -84,6 +88,50 @@ __visible noinstr void do_syscall_64(struct pt_regs *regs, int nr)
>   
>   	instrumentation_end();
>   	syscall_exit_to_user_mode(regs);
> +
> +	/*
> +	 * Check that the register state is valid for using SYSRET to exit
> +	 * to userspace.  Otherwise use the slower but fully capable IRET
> +	 * exit path.
> +	 */
> +
> +	/* XEN PV guests always use IRET path */
> +	if (cpu_feature_enabled(X86_FEATURE_XENPV))
> +		return false;
> +
> +	/* SYSRET requires RCX == RIP and R11 == EFLAGS */
> +	if (unlikely(regs->cx != regs->ip || regs->r11 != regs->flags))
> +		return false;
> +
> +	/* CS and SS must match the values set in MSR_STAR */
> +	if (unlikely(regs->cs != __USER_CS || regs->ss != __USER_DS))
> +		return false;
> +
> +	/*
> +	 * On Intel CPUs, SYSRET with non-canonical RCX/RIP will #GP
> +	 * in kernel space.  This essentially lets the user take over
> +	 * the kernel, since userspace controls RSP.
> +	 *
> +	 * Change top bits to match most significant bit (47th or 56th bit
> +	 * depending on paging mode) in the address.
> +	 */
> +	shift_rip = (64 - __VIRTUAL_MASK_SHIFT + 1);

Should this be:

	shift_rip = (64 - __VIRTUAL_MASK_SHIFT - 1);
?

> +	rip = (long) regs->ip;
> +	rip <<= shift_rip;
> +	rip >>= shift_rip;
> +	if (unlikely((unsigned long) rip != regs->ip))
> +		return false;
> +
> +	/*
> +	 * SYSRET cannot restore RF.  It can restore TF, but unlike IRET,
> +	 * restoring TF results in a trap from userspace immediately after
> +	 * SYSRET.
> +	 */
> +	if (unlikely(regs->flags & (X86_EFLAGS_RF | X86_EFLAGS_TF)))
> +		return false;
> +
> +	/* Use SYSRET to exit to userspace */
> +	return true;
>   }
>   #endif
>   
> diff --git a/arch/x86/entry/entry_64.S b/arch/x86/entry/entry_64.S
> index c01776a51545..b1288e22cae8 100644
> --- a/arch/x86/entry/entry_64.S
> +++ b/arch/x86/entry/entry_64.S
> @@ -123,60 +123,9 @@ SYM_INNER_LABEL(entry_SYSCALL_64_after_hwframe, SYM_L_GLOBAL)
>   	 * Try to use SYSRET instead of IRET if we're returning to
>   	 * a completely clean 64-bit userspace context.  If we're not,
>   	 * go to the slow exit path.
> -	 * In the Xen PV case we must use iret anyway.
>   	 */
> -
> -	ALTERNATIVE "", "jmp	swapgs_restore_regs_and_return_to_usermode", \
> -		X86_FEATURE_XENPV
> -
> -	movq	RCX(%rsp), %rcx
> -	movq	RIP(%rsp), %r11
> -
> -	cmpq	%rcx, %r11	/* SYSRET requires RCX == RIP */
> -	jne	swapgs_restore_regs_and_return_to_usermode
> -
> -	/*
> -	 * On Intel CPUs, SYSRET with non-canonical RCX/RIP will #GP
> -	 * in kernel space.  This essentially lets the user take over
> -	 * the kernel, since userspace controls RSP.
> -	 *
> -	 * If width of "canonical tail" ever becomes variable, this will need
> -	 * to be updated to remain correct on both old and new CPUs.
> -	 *
> -	 * Change top bits to match most significant bit (47th or 56th bit
> -	 * depending on paging mode) in the address.
> -	 */
> -#ifdef CONFIG_X86_5LEVEL
> -	ALTERNATIVE "shl $(64 - 48), %rcx; sar $(64 - 48), %rcx", \
> -		"shl $(64 - 57), %rcx; sar $(64 - 57), %rcx", X86_FEATURE_LA57
> -#else
> -	shl	$(64 - (__VIRTUAL_MASK_SHIFT+1)), %rcx
> -	sar	$(64 - (__VIRTUAL_MASK_SHIFT+1)), %rcx
> -#endif
> -
> -	/* If this changed %rcx, it was not canonical */
> -	cmpq	%rcx, %r11
> -	jne	swapgs_restore_regs_and_return_to_usermode
> -
> -	cmpq	$__USER_CS, CS(%rsp)		/* CS must match SYSRET */
> -	jne	swapgs_restore_regs_and_return_to_usermode
> -
> -	movq	R11(%rsp), %r11
> -	cmpq	%r11, EFLAGS(%rsp)		/* R11 == RFLAGS */
> -	jne	swapgs_restore_regs_and_return_to_usermode
> -
> -	/*
> -	 * SYSRET cannot restore RF.  It can restore TF, but unlike IRET,
> -	 * restoring TF results in a trap from userspace immediately after
> -	 * SYSRET.
> -	 */
> -	testq	$(X86_EFLAGS_RF|X86_EFLAGS_TF), %r11
> -	jnz	swapgs_restore_regs_and_return_to_usermode
> -
> -	/* nothing to check for RSP */
> -
> -	cmpq	$__USER_DS, SS(%rsp)		/* SS must match SYSRET */
> -	jne	swapgs_restore_regs_and_return_to_usermode
> +	testb	%al, %al
> +	jz	swapgs_restore_regs_and_return_to_usermode
>   
>   	/*
>   	 * We win! This label is here just for ease of understanding
> diff --git a/arch/x86/include/asm/syscall.h b/arch/x86/include/asm/syscall.h
> index 4fb36fba4b5a..be6c5515e0b9 100644
> --- a/arch/x86/include/asm/syscall.h
> +++ b/arch/x86/include/asm/syscall.h
> @@ -126,7 +126,7 @@ static inline int syscall_get_arch(struct task_struct *task)
>   		? AUDIT_ARCH_I386 : AUDIT_ARCH_X86_64;
>   }
>   
> -void do_syscall_64(struct pt_regs *regs, int nr);
> +bool do_syscall_64(struct pt_regs *regs, int nr);
>   
>   #endif	/* CONFIG_X86_32 */
>   


--Mika


^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH 2/6] x86/entry/64: Convert SYSRET validation tests to C
  2023-07-18 14:16   ` Mika Penttilä
@ 2023-07-18 14:25     ` Brian Gerst
  2023-07-18 14:49       ` Mika Penttilä
  0 siblings, 1 reply; 12+ messages in thread
From: Brian Gerst @ 2023-07-18 14:25 UTC (permalink / raw)
  To: Mika Penttilä
  Cc: linux-kernel, x86, Thomas Gleixner, Borislav Petkov,
	H . Peter Anvin, Andy Lutomirski

On Tue, Jul 18, 2023 at 10:17 AM Mika Penttilä <mpenttil@redhat.com> wrote:
>
> Hi,
>
>
> On 18.7.2023 16.44, Brian Gerst wrote:
> > Signed-off-by: Brian Gerst <brgerst@gmail.com>
> > ---
> >   arch/x86/entry/common.c        | 50 ++++++++++++++++++++++++++++++-
> >   arch/x86/entry/entry_64.S      | 55 ++--------------------------------
> >   arch/x86/include/asm/syscall.h |  2 +-
> >   3 files changed, 52 insertions(+), 55 deletions(-)
> >
> > diff --git a/arch/x86/entry/common.c b/arch/x86/entry/common.c
> > index 6c2826417b33..afe79c3f1c5b 100644
> > --- a/arch/x86/entry/common.c
> > +++ b/arch/x86/entry/common.c
> > @@ -70,8 +70,12 @@ static __always_inline bool do_syscall_x32(struct pt_regs *regs, int nr)
> >       return false;
> >   }
> >
> > -__visible noinstr void do_syscall_64(struct pt_regs *regs, int nr)
> > +/* Returns true to return using SYSRET, or false to use IRET */
> > +__visible noinstr bool do_syscall_64(struct pt_regs *regs, int nr)
> >   {
> > +     long rip;
> > +     unsigned int shift_rip;
> > +
> >       add_random_kstack_offset();
> >       nr = syscall_enter_from_user_mode(regs, nr);
> >
> > @@ -84,6 +88,50 @@ __visible noinstr void do_syscall_64(struct pt_regs *regs, int nr)
> >
> >       instrumentation_end();
> >       syscall_exit_to_user_mode(regs);
> > +
> > +     /*
> > +      * Check that the register state is valid for using SYSRET to exit
> > +      * to userspace.  Otherwise use the slower but fully capable IRET
> > +      * exit path.
> > +      */
> > +
> > +     /* XEN PV guests always use IRET path */
> > +     if (cpu_feature_enabled(X86_FEATURE_XENPV))
> > +             return false;
> > +
> > +     /* SYSRET requires RCX == RIP and R11 == EFLAGS */
> > +     if (unlikely(regs->cx != regs->ip || regs->r11 != regs->flags))
> > +             return false;
> > +
> > +     /* CS and SS must match the values set in MSR_STAR */
> > +     if (unlikely(regs->cs != __USER_CS || regs->ss != __USER_DS))
> > +             return false;
> > +
> > +     /*
> > +      * On Intel CPUs, SYSRET with non-canonical RCX/RIP will #GP
> > +      * in kernel space.  This essentially lets the user take over
> > +      * the kernel, since userspace controls RSP.
> > +      *
> > +      * Change top bits to match most significant bit (47th or 56th bit
> > +      * depending on paging mode) in the address.
> > +      */
> > +     shift_rip = (64 - __VIRTUAL_MASK_SHIFT + 1);
>
> Should this be:
>
>         shift_rip = (64 - __VIRTUAL_MASK_SHIFT - 1);
> ?

I removed a set of parentheses, which switched the sign from -1 to +1.
I could put it back if that's less confusing.

Brian Gerst

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH 2/6] x86/entry/64: Convert SYSRET validation tests to C
  2023-07-18 14:25     ` Brian Gerst
@ 2023-07-18 14:49       ` Mika Penttilä
  2023-07-18 15:21         ` Brian Gerst
  0 siblings, 1 reply; 12+ messages in thread
From: Mika Penttilä @ 2023-07-18 14:49 UTC (permalink / raw)
  To: Brian Gerst
  Cc: linux-kernel, x86, Thomas Gleixner, Borislav Petkov,
	H . Peter Anvin, Andy Lutomirski



On 18.7.2023 17.25, Brian Gerst wrote:
> On Tue, Jul 18, 2023 at 10:17 AM Mika Penttilä <mpenttil@redhat.com> wrote:
>>
>> Hi,
>>
>>
>> On 18.7.2023 16.44, Brian Gerst wrote:
>>> Signed-off-by: Brian Gerst <brgerst@gmail.com>
>>> ---
>>>    arch/x86/entry/common.c        | 50 ++++++++++++++++++++++++++++++-
>>>    arch/x86/entry/entry_64.S      | 55 ++--------------------------------
>>>    arch/x86/include/asm/syscall.h |  2 +-
>>>    3 files changed, 52 insertions(+), 55 deletions(-)
>>>
>>> diff --git a/arch/x86/entry/common.c b/arch/x86/entry/common.c
>>> index 6c2826417b33..afe79c3f1c5b 100644
>>> --- a/arch/x86/entry/common.c
>>> +++ b/arch/x86/entry/common.c
>>> @@ -70,8 +70,12 @@ static __always_inline bool do_syscall_x32(struct pt_regs *regs, int nr)
>>>        return false;
>>>    }
>>>
>>> -__visible noinstr void do_syscall_64(struct pt_regs *regs, int nr)
>>> +/* Returns true to return using SYSRET, or false to use IRET */
>>> +__visible noinstr bool do_syscall_64(struct pt_regs *regs, int nr)
>>>    {
>>> +     long rip;
>>> +     unsigned int shift_rip;
>>> +
>>>        add_random_kstack_offset();
>>>        nr = syscall_enter_from_user_mode(regs, nr);
>>>
>>> @@ -84,6 +88,50 @@ __visible noinstr void do_syscall_64(struct pt_regs *regs, int nr)
>>>
>>>        instrumentation_end();
>>>        syscall_exit_to_user_mode(regs);
>>> +
>>> +     /*
>>> +      * Check that the register state is valid for using SYSRET to exit
>>> +      * to userspace.  Otherwise use the slower but fully capable IRET
>>> +      * exit path.
>>> +      */
>>> +
>>> +     /* XEN PV guests always use IRET path */
>>> +     if (cpu_feature_enabled(X86_FEATURE_XENPV))
>>> +             return false;
>>> +
>>> +     /* SYSRET requires RCX == RIP and R11 == EFLAGS */
>>> +     if (unlikely(regs->cx != regs->ip || regs->r11 != regs->flags))
>>> +             return false;
>>> +
>>> +     /* CS and SS must match the values set in MSR_STAR */
>>> +     if (unlikely(regs->cs != __USER_CS || regs->ss != __USER_DS))
>>> +             return false;
>>> +
>>> +     /*
>>> +      * On Intel CPUs, SYSRET with non-canonical RCX/RIP will #GP
>>> +      * in kernel space.  This essentially lets the user take over
>>> +      * the kernel, since userspace controls RSP.
>>> +      *
>>> +      * Change top bits to match most significant bit (47th or 56th bit
>>> +      * depending on paging mode) in the address.
>>> +      */
>>> +     shift_rip = (64 - __VIRTUAL_MASK_SHIFT + 1);
>>
>> Should this be:
>>
>>          shift_rip = (64 - __VIRTUAL_MASK_SHIFT - 1);
>> ?
> 
> I removed a set of parentheses, which switched the sign from -1 to +1.
> I could put it back if that's less confusing.
> 

I mean isn't it supposed to be:
shift_rip = (64 - 48) for 4 level, now it's
shift_rip = (64 - 46)

__VIRTUAL_MASK_SHIFT == 47


> Brian Gerst
> 


^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH 2/6] x86/entry/64: Convert SYSRET validation tests to C
  2023-07-18 14:49       ` Mika Penttilä
@ 2023-07-18 15:21         ` Brian Gerst
  2023-07-18 15:46           ` Brian Gerst
  0 siblings, 1 reply; 12+ messages in thread
From: Brian Gerst @ 2023-07-18 15:21 UTC (permalink / raw)
  To: Mika Penttilä
  Cc: linux-kernel, x86, Thomas Gleixner, Borislav Petkov,
	H . Peter Anvin, Andy Lutomirski

On Tue, Jul 18, 2023 at 10:49 AM Mika Penttilä <mpenttil@redhat.com> wrote:
>
>
>
> On 18.7.2023 17.25, Brian Gerst wrote:
> > On Tue, Jul 18, 2023 at 10:17 AM Mika Penttilä <mpenttil@redhat.com> wrote:
> >>
> >> Hi,
> >>
> >>
> >> On 18.7.2023 16.44, Brian Gerst wrote:
> >>> Signed-off-by: Brian Gerst <brgerst@gmail.com>
> >>> ---
> >>>    arch/x86/entry/common.c        | 50 ++++++++++++++++++++++++++++++-
> >>>    arch/x86/entry/entry_64.S      | 55 ++--------------------------------
> >>>    arch/x86/include/asm/syscall.h |  2 +-
> >>>    3 files changed, 52 insertions(+), 55 deletions(-)
> >>>
> >>> diff --git a/arch/x86/entry/common.c b/arch/x86/entry/common.c
> >>> index 6c2826417b33..afe79c3f1c5b 100644
> >>> --- a/arch/x86/entry/common.c
> >>> +++ b/arch/x86/entry/common.c
> >>> @@ -70,8 +70,12 @@ static __always_inline bool do_syscall_x32(struct pt_regs *regs, int nr)
> >>>        return false;
> >>>    }
> >>>
> >>> -__visible noinstr void do_syscall_64(struct pt_regs *regs, int nr)
> >>> +/* Returns true to return using SYSRET, or false to use IRET */
> >>> +__visible noinstr bool do_syscall_64(struct pt_regs *regs, int nr)
> >>>    {
> >>> +     long rip;
> >>> +     unsigned int shift_rip;
> >>> +
> >>>        add_random_kstack_offset();
> >>>        nr = syscall_enter_from_user_mode(regs, nr);
> >>>
> >>> @@ -84,6 +88,50 @@ __visible noinstr void do_syscall_64(struct pt_regs *regs, int nr)
> >>>
> >>>        instrumentation_end();
> >>>        syscall_exit_to_user_mode(regs);
> >>> +
> >>> +     /*
> >>> +      * Check that the register state is valid for using SYSRET to exit
> >>> +      * to userspace.  Otherwise use the slower but fully capable IRET
> >>> +      * exit path.
> >>> +      */
> >>> +
> >>> +     /* XEN PV guests always use IRET path */
> >>> +     if (cpu_feature_enabled(X86_FEATURE_XENPV))
> >>> +             return false;
> >>> +
> >>> +     /* SYSRET requires RCX == RIP and R11 == EFLAGS */
> >>> +     if (unlikely(regs->cx != regs->ip || regs->r11 != regs->flags))
> >>> +             return false;
> >>> +
> >>> +     /* CS and SS must match the values set in MSR_STAR */
> >>> +     if (unlikely(regs->cs != __USER_CS || regs->ss != __USER_DS))
> >>> +             return false;
> >>> +
> >>> +     /*
> >>> +      * On Intel CPUs, SYSRET with non-canonical RCX/RIP will #GP
> >>> +      * in kernel space.  This essentially lets the user take over
> >>> +      * the kernel, since userspace controls RSP.
> >>> +      *
> >>> +      * Change top bits to match most significant bit (47th or 56th bit
> >>> +      * depending on paging mode) in the address.
> >>> +      */
> >>> +     shift_rip = (64 - __VIRTUAL_MASK_SHIFT + 1);
> >>
> >> Should this be:
> >>
> >>          shift_rip = (64 - __VIRTUAL_MASK_SHIFT - 1);
> >> ?
> >
> > I removed a set of parentheses, which switched the sign from -1 to +1.
> > I could put it back if that's less confusing.
> >
>
> I mean isn't it supposed to be:
> shift_rip = (64 - 48) for 4 level, now it's
> shift_rip = (64 - 46)
>
> __VIRTUAL_MASK_SHIFT == 47

Original:
(64 - (47 + 1)) = (64 - 48) = 16

  c5:   48 c1 e1 10             shl    $0x10,%rcx
  c9:   48 c1 f9 10             sar    $0x10,%rcx

New:
(64 - 47 - 1) = (17 - 1) = 16

 18b:   b9 10 00 00 00          mov    $0x10,%ecx
 193:   48 d3 e2                shl    %cl,%rdx
 196:   48 d3 fa                sar    %cl,%rdx

Anyways, I'll switch it back to the original formula.  I'm not going
to argue any more about basic math.

Brian Gerst

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH 2/6] x86/entry/64: Convert SYSRET validation tests to C
  2023-07-18 15:21         ` Brian Gerst
@ 2023-07-18 15:46           ` Brian Gerst
  0 siblings, 0 replies; 12+ messages in thread
From: Brian Gerst @ 2023-07-18 15:46 UTC (permalink / raw)
  To: Mika Penttilä
  Cc: linux-kernel, x86, Thomas Gleixner, Borislav Petkov,
	H . Peter Anvin, Andy Lutomirski

On Tue, Jul 18, 2023 at 11:21 AM Brian Gerst <brgerst@gmail.com> wrote:
>
> On Tue, Jul 18, 2023 at 10:49 AM Mika Penttilä <mpenttil@redhat.com> wrote:
> >
> >
> >
> > On 18.7.2023 17.25, Brian Gerst wrote:
> > > On Tue, Jul 18, 2023 at 10:17 AM Mika Penttilä <mpenttil@redhat.com> wrote:
> > >>
> > >> Hi,
> > >>
> > >>
> > >> On 18.7.2023 16.44, Brian Gerst wrote:
> > >>> Signed-off-by: Brian Gerst <brgerst@gmail.com>
> > >>> ---
> > >>>    arch/x86/entry/common.c        | 50 ++++++++++++++++++++++++++++++-
> > >>>    arch/x86/entry/entry_64.S      | 55 ++--------------------------------
> > >>>    arch/x86/include/asm/syscall.h |  2 +-
> > >>>    3 files changed, 52 insertions(+), 55 deletions(-)
> > >>>
> > >>> diff --git a/arch/x86/entry/common.c b/arch/x86/entry/common.c
> > >>> index 6c2826417b33..afe79c3f1c5b 100644
> > >>> --- a/arch/x86/entry/common.c
> > >>> +++ b/arch/x86/entry/common.c
> > >>> @@ -70,8 +70,12 @@ static __always_inline bool do_syscall_x32(struct pt_regs *regs, int nr)
> > >>>        return false;
> > >>>    }
> > >>>
> > >>> -__visible noinstr void do_syscall_64(struct pt_regs *regs, int nr)
> > >>> +/* Returns true to return using SYSRET, or false to use IRET */
> > >>> +__visible noinstr bool do_syscall_64(struct pt_regs *regs, int nr)
> > >>>    {
> > >>> +     long rip;
> > >>> +     unsigned int shift_rip;
> > >>> +
> > >>>        add_random_kstack_offset();
> > >>>        nr = syscall_enter_from_user_mode(regs, nr);
> > >>>
> > >>> @@ -84,6 +88,50 @@ __visible noinstr void do_syscall_64(struct pt_regs *regs, int nr)
> > >>>
> > >>>        instrumentation_end();
> > >>>        syscall_exit_to_user_mode(regs);
> > >>> +
> > >>> +     /*
> > >>> +      * Check that the register state is valid for using SYSRET to exit
> > >>> +      * to userspace.  Otherwise use the slower but fully capable IRET
> > >>> +      * exit path.
> > >>> +      */
> > >>> +
> > >>> +     /* XEN PV guests always use IRET path */
> > >>> +     if (cpu_feature_enabled(X86_FEATURE_XENPV))
> > >>> +             return false;
> > >>> +
> > >>> +     /* SYSRET requires RCX == RIP and R11 == EFLAGS */
> > >>> +     if (unlikely(regs->cx != regs->ip || regs->r11 != regs->flags))
> > >>> +             return false;
> > >>> +
> > >>> +     /* CS and SS must match the values set in MSR_STAR */
> > >>> +     if (unlikely(regs->cs != __USER_CS || regs->ss != __USER_DS))
> > >>> +             return false;
> > >>> +
> > >>> +     /*
> > >>> +      * On Intel CPUs, SYSRET with non-canonical RCX/RIP will #GP
> > >>> +      * in kernel space.  This essentially lets the user take over
> > >>> +      * the kernel, since userspace controls RSP.
> > >>> +      *
> > >>> +      * Change top bits to match most significant bit (47th or 56th bit
> > >>> +      * depending on paging mode) in the address.
> > >>> +      */
> > >>> +     shift_rip = (64 - __VIRTUAL_MASK_SHIFT + 1);
> > >>
> > >> Should this be:
> > >>
> > >>          shift_rip = (64 - __VIRTUAL_MASK_SHIFT - 1);
> > >> ?
> > >
> > > I removed a set of parentheses, which switched the sign from -1 to +1.
> > > I could put it back if that's less confusing.
> > >
> >
> > I mean isn't it supposed to be:
> > shift_rip = (64 - 48) for 4 level, now it's
> > shift_rip = (64 - 46)
> >
> > __VIRTUAL_MASK_SHIFT == 47

My apologies, you were right.  I've been sitting on this series for a
while and finally got around to posting it and didn't catch that
error.

>
> Original:
> (64 - (47 + 1)) = (64 - 48) = 16
>
>   c5:   48 c1 e1 10             shl    $0x10,%rcx
>   c9:   48 c1 f9 10             sar    $0x10,%rcx

This was wrong.  I hastily compiled this after I had reverted to the
original formula.

> New:
> (64 - 47 - 1) = (17 - 1) = 16
>
>  18b:   b9 10 00 00 00          mov    $0x10,%ecx
>  193:   48 d3 e2                shl    %cl,%rdx
>  196:   48 d3 fa                sar    %cl,%rdx
>
> Anyways, I'll switch it back to the original formula.  I'm not going
> to argue any more about basic math.

I'll send a v2 later after any more feedback.  Thanks.

 Brian Gerst

^ permalink raw reply	[flat|nested] 12+ messages in thread

end of thread, other threads:[~2023-07-18 15:47 UTC | newest]

Thread overview: 12+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2023-07-18 13:44 [PATCH 0/6] x86: Clean up fast syscall return validation Brian Gerst
2023-07-18 13:44 ` [PATCH 1/6] x86/entry/64: Remove obsolete comment on tracing vs. SYSRET Brian Gerst
2023-07-18 13:44 ` [PATCH 2/6] x86/entry/64: Convert SYSRET validation tests to C Brian Gerst
2023-07-18 14:16   ` Mika Penttilä
2023-07-18 14:25     ` Brian Gerst
2023-07-18 14:49       ` Mika Penttilä
2023-07-18 15:21         ` Brian Gerst
2023-07-18 15:46           ` Brian Gerst
2023-07-18 13:44 ` [PATCH 3/6] x86/entry/compat: Combine return value test from syscall handler Brian Gerst
2023-07-18 13:44 ` [PATCH 4/6] x86/entry/32: Convert do_fast_syscall_32() to bool return type Brian Gerst
2023-07-18 13:44 ` [PATCH 5/6] x86/entry/32: Remove SEP test for SYSEXIT Brian Gerst
2023-07-18 13:44 ` [PATCH 6/6] x86/entry/32: Clean up syscall fast exit tests Brian Gerst

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).