linux-kernel.vger.kernel.org archive mirror
* [PATCH V6 00/49] x86/entry/64: Convert a bunch of ASM entry code into C code
@ 2021-11-26 10:11 Lai Jiangshan
  2021-11-26 10:11 ` [PATCH V6 01/49] x86/entry: Add fence for kernel entry swapgs in paranoid_entry() Lai Jiangshan
                   ` (50 more replies)
  0 siblings, 51 replies; 58+ messages in thread
From: Lai Jiangshan @ 2021-11-26 10:11 UTC (permalink / raw)
  To: linux-kernel
  Cc: x86, Lai Jiangshan, Thomas Gleixner, Ingo Molnar,
	Borislav Petkov, Peter Zijlstra, Andy Lutomirski, H. Peter Anvin,
	Joerg Roedel

From: Lai Jiangshan <laijs@linux.alibaba.com>

Changed from V5:
	Fix the code order of FENCE_SWAPGS_KERNEL_ENTRY in patch 1 and
	change the new corresponding C entry code to match the asm code.

	Squash the patch of removing stack-protector from traps.c into
	a later patch that uses C entry code for #DB and #MCE

	Kill .Lgs_change and use the new asm_load_gs_index_gs_change in
	_ASM_EXTABLE

	s/ETNRY/ENTRY/g for DEFINE_IDTENTRY_IST_ENTRY macros
----

Much of the ASM code in entry_64.S can be rewritten in C if it is
written to be non-instrumentable and is called in the right order with
respect to whether CR3/gsbase has been switched to the kernel
CR3/gsbase.

This patchset converts some of it to C code.

Patch 23 converts error_entry() to C code, and patches 1-22 are fixes
and preparation for it.

Patches 24-26 convert entry_INT80_compat and do cleanups.

Patches 27-45 convert the IST entry code to C code.  Many of them are
preparation for the actual conversion.

Patches 46-48 do cleanups.

Patch 49 converts a small part of the syscall ASM code to C: the check
for whether SYSRET can be used to return to userspace.

Some other paths could also be moved to C code, for example the error
exit and the syscall entry/exit; their PTI handling could be done in C
as well.  But that would require pt_regs to be copied/pushed to the
entry stack, which means the C code would not be efficient.

When converting ASM to C, most of the effort went into keeping the two
the same.  Almost no creativity was involved: the code is kept as close
to the ASM as possible, and no functional change is intended unless I
misunderstood the ASM code.  The functions called by the C entry code
are checked to ensure they are noinstr or __always_inline.  Some of them
have more than one definition and need extra care from reviewers.  The
comments in the ASM are also copied to the right places in the C code.

Changed from V4:
	Move FENCE_SWAPGS_KERNEL_ENTRY up in patch 1, and change the
	corresponding C code in later patches to keep them coherent.

	Jmp to xenpv_restore_regs_and_return_to_usermode in
	swapgs_restore_regs_and_return_to_usermode instead of calling
	it everywhere.

	Add Miguel Ojeda's Reviewed-by.

Changed from V3:
	Add a "Reviewed-by" for the xenpv fix
	Reviewed-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>

	Change __attribute((__section__(section))) to __section(section)

	Move a part of ist_paranoid_exit() as a new ist_restore_gsbase()

	Add a new commit (patch 32) to change the ASM RESTORE_CR3, the
		corresponding C version ist_restore_cr3() is changed too.

Changed from V2:
	Fix two places with missed FENCE_SWAPGS_KERNEL_ENTRY.

	Fix swapgs_restore_regs_and_return_to_usermode for XENPV.

	Update the C error_entry()/paranoid_entry() to use
		fence_swapgs_kernel_entry() when running with user gsbase
		in kernel CR3.

	Simplify removing stack-protector in the Makefile.

	Squash the commits about removing stack-protector in the Makefile.

	In V2 the C error_entry() checked xenpv first and used native_swapgs,
		but the ASM error_entry() used the pv-aware SWAPGS.  In V3,
		the commit is split into 3 commits, so the conversion has no
		semantic change.

	Move cld to the start of idtentry.

	Use idtentry macro for entry_INT80_compat and remove the old one.

	Add cleanup for PTI_USER_PGTABLE_BIT when it is moved to header
	file.

	Remove pv-aware SWAPGS.

Changed from V1:
	Add a fix as patch 1.  Found by trying to apply Peterz's
		suggestion in patch 11.
	The whole error_entry() is converted to C instead of only part of it.
	The whole paranoid_entry() is converted to C instead of only part of it.
	The asm code of "paranoid_entry() cfunc() paranoid_exit()" is
		converted to C as suggested by Peterz.
	Add entry64.c rather than move traps.c to arch/x86/entry/
	The order of some commits is changed.
	Remove two cleanups

[V1]: https://lore.kernel.org/all/20210831175025.27570-1-jiangshanlai@gmail.com/
[V2]: https://lore.kernel.org/lkml/20210926150838.197719-1-jiangshanlai@gmail.com/
[V3]: https://lore.kernel.org/lkml/20211014031413.14471-1-jiangshanlai@gmail.com/
[V4]: https://lore.kernel.org/lkml/20211026141420.17138-1-jiangshanlai@gmail.com/
[V5]: https://lore.kernel.org/lkml/20211110115736.3776-1-jiangshanlai@gmail.com/

Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Joerg Roedel <jroedel@suse.de>

Lai Jiangshan (49):
  x86/entry: Add fence for kernel entry swapgs in paranoid_entry()
  x86/entry: Use the correct fence macro after swapgs in kernel CR3
  x86/xen: Add xenpv_restore_regs_and_return_to_usermode()
  x86/entry: Use swapgs and native_iret directly in
    swapgs_restore_regs_and_return_to_usermode
  compiler_types.h: Add __noinstr_section() for noinstr
  x86/entry: Introduce __entry_text for entry code written in C
  x86/entry: Move PTI_USER_* to arch/x86/include/asm/processor-flags.h
  x86: Remove unused kernel_to_user_p4dp() and user_to_kernel_p4dp()
  x86: Replace PTI_PGTABLE_SWITCH_BIT with PTI_USER_PGTABLE_BIT
  x86: Mark __native_read_cr3() & native_write_cr3() as __always_inline
  x86/traps: Move the declaration of native_irq_return_iret into proto.h
  x86/entry: Add arch/x86/entry/entry64.c for C entry code
  x86/entry: Expose the address of .Lgs_change to entry64.c
  x86/entry: Add C version of SWITCH_TO_KERNEL_CR3 as
    switch_to_kernel_cr3()
  x86/traps: Add fence_swapgs_{user,kernel}_entry()
  x86/entry: Add C user_entry_swapgs_and_fence()
  x86/traps: Move pt_regs only in fixup_bad_iret()
  x86/entry: Switch the stack after error_entry() returns
  x86/entry: move PUSH_AND_CLEAR_REGS out of error_entry
  x86/entry: Move cld to the start of idtentry
  x86/entry: Don't call error_entry for XENPV
  x86/entry: Convert SWAPGS to swapgs in error_entry()
  x86/entry: Implement the whole error_entry() as C code
  x86/entry: Use idtentry macro for entry_INT80_compat
  x86/entry: Convert SWAPGS to swapgs in entry_SYSENTER_compat()
  x86: Remove the definition of SWAPGS
  x86/entry: Make paranoid_exit() callable
  x86/entry: Call paranoid_exit() in asm_exc_nmi()
  x86/entry: move PUSH_AND_CLEAR_REGS out of paranoid_entry
  x86/entry: Add the C version ist_switch_to_kernel_cr3()
  x86/entry: Skip CR3 write when the saved CR3 is kernel CR3 in
    RESTORE_CR3
  x86/entry: Add the C version ist_restore_cr3()
  x86/entry: Add the C version get_percpu_base()
  x86/entry: Add the C version ist_switch_to_kernel_gsbase()
  x86/entry: Implement the C version ist_paranoid_entry()
  x86/entry: Implement the C version ist_paranoid_exit()
  x86/entry: Add a C macro to define the function body for IST in
    .entry.text
  x86/debug, mce: Use C entry code
  x86/idtentry.h: Move the definitions *IDTENTRY_{MCE|DEBUG}* up
  x86/nmi: Use DEFINE_IDTENTRY_NMI for nmi
  x86/nmi: Use C entry code
  x86/entry: Add a C macro to define the function body for IST in
    .entry.text with an error code
  x86/doublefault: Use C entry code
  x86/sev: Add and use ist_vc_switch_off_ist()
  x86/sev: Use C entry code
  x86/entry: Remove ASM function paranoid_entry() and paranoid_exit()
  x86/entry: Remove the unused ASM macros
  x86/entry: Remove save_ret from PUSH_AND_CLEAR_REGS
  x86/syscall/64: Move the checking for sysret to C code

 arch/x86/entry/Makefile                |   3 +-
 arch/x86/entry/calling.h               | 142 +-------
 arch/x86/entry/common.c                |  73 +++-
 arch/x86/entry/entry64.c               | 348 +++++++++++++++++++
 arch/x86/entry/entry_64.S              | 448 ++++---------------------
 arch/x86/entry/entry_64_compat.S       | 104 +-----
 arch/x86/include/asm/idtentry.h        | 111 +++++-
 arch/x86/include/asm/irqflags.h        |   8 -
 arch/x86/include/asm/pgtable.h         |  23 +-
 arch/x86/include/asm/processor-flags.h |  15 +
 arch/x86/include/asm/proto.h           |   5 +-
 arch/x86/include/asm/special_insns.h   |   4 +-
 arch/x86/include/asm/syscall.h         |   2 +-
 arch/x86/include/asm/traps.h           |   6 +-
 arch/x86/kernel/Makefile               |   3 +
 arch/x86/kernel/cpu/mce/Makefile       |   3 +
 arch/x86/kernel/nmi.c                  |   2 +-
 arch/x86/kernel/traps.c                |  33 +-
 arch/x86/xen/xen-asm.S                 |  20 ++
 include/linux/compiler_types.h         |   8 +-
 20 files changed, 677 insertions(+), 684 deletions(-)
 create mode 100644 arch/x86/entry/entry64.c

-- 
2.19.1.6.gb485710b



* [PATCH V6 01/49] x86/entry: Add fence for kernel entry swapgs in paranoid_entry()
  2021-11-26 10:11 [PATCH V6 00/49] x86/entry/64: Convert a bunch of ASM entry code into C code Lai Jiangshan
@ 2021-11-26 10:11 ` Lai Jiangshan
  2021-12-04 11:45   ` [tip: x86/urgent] x86/entry: Add a fence for kernel entry SWAPGS " tip-bot2 for Lai Jiangshan
  2021-11-26 10:11 ` [PATCH V6 02/49] x86/entry: Use the correct fence macro after swapgs in kernel CR3 Lai Jiangshan
                   ` (49 subsequent siblings)
  50 siblings, 1 reply; 58+ messages in thread
From: Lai Jiangshan @ 2021-11-26 10:11 UTC (permalink / raw)
  To: linux-kernel
  Cc: x86, Lai Jiangshan, Josh Poimboeuf, Chang S . Bae, Sasha Levin,
	Andy Lutomirski, Thomas Gleixner, Ingo Molnar, Borislav Petkov,
	Dave Hansen, H. Peter Anvin

From: Lai Jiangshan <laijs@linux.alibaba.com>

Commit 18ec54fdd6d18 ("x86/speculation: Prepare entry code for Spectre
v1 swapgs mitigations") added FENCE_SWAPGS_{KERNEL|USER}_ENTRY
for conditional swapgs.  In paranoid_entry(), it used only
FENCE_SWAPGS_KERNEL_ENTRY for both branches, because the fence is
required in both cases: the CR3 write is conditional even when PTI is
enabled.

But commit 96b2371413e8f ("x86/entry/64: Switch CR3 before SWAPGS in
paranoid entry") changed the code order and the branches, and it missed
the FENCE_SWAPGS_KERNEL_ENTRY needed for the user gsbase case.

Add it back by changing the branches so that FENCE_SWAPGS_KERNEL_ENTRY
can cover both branches.

Fixes: 96b2371413e8f ("x86/entry/64: Switch CR3 before SWAPGS in paranoid entry")
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Chang S. Bae <chang.seok.bae@intel.com>
Cc: Sasha Levin <sashal@kernel.org>
Signed-off-by: Lai Jiangshan <laijs@linux.alibaba.com>
---
 arch/x86/entry/entry_64.S | 10 ++++------
 1 file changed, 4 insertions(+), 6 deletions(-)

diff --git a/arch/x86/entry/entry_64.S b/arch/x86/entry/entry_64.S
index e38a4cf795d9..8582709576bf 100644
--- a/arch/x86/entry/entry_64.S
+++ b/arch/x86/entry/entry_64.S
@@ -897,11 +897,12 @@ SYM_CODE_START_LOCAL(paranoid_entry)
 	movl	$MSR_GS_BASE, %ecx
 	rdmsr
 	testl	%edx, %edx
-	jns	.Lparanoid_entry_swapgs
-	ret
+	js	.Lparanoid_kernel_gsbase
 
-.Lparanoid_entry_swapgs:
+	/* EBX = 0 -> SWAPGS required on exit */
+	xorl	%ebx, %ebx
 	swapgs
+.Lparanoid_kernel_gsbase:
 
 	/*
 	 * The above SAVE_AND_SWITCH_TO_KERNEL_CR3 macro doesn't do an
@@ -909,9 +910,6 @@ SYM_CODE_START_LOCAL(paranoid_entry)
 	 * to prevent GS speculation, regardless of whether PTI is enabled.
 	 */
 	FENCE_SWAPGS_KERNEL_ENTRY
-
-	/* EBX = 0 -> SWAPGS required on exit */
-	xorl	%ebx, %ebx
 	ret
 SYM_CODE_END(paranoid_entry)
 
-- 
2.19.1.6.gb485710b



* [PATCH V6 02/49] x86/entry: Use the correct fence macro after swapgs in kernel CR3
  2021-11-26 10:11 [PATCH V6 00/49] x86/entry/64: Convert a bunch of ASM entry code into C code Lai Jiangshan
  2021-11-26 10:11 ` [PATCH V6 01/49] x86/entry: Add fence for kernel entry swapgs in paranoid_entry() Lai Jiangshan
@ 2021-11-26 10:11 ` Lai Jiangshan
  2021-12-04 11:45   ` [tip: x86/urgent] " tip-bot2 for Lai Jiangshan
  2021-11-26 10:11 ` [PATCH V6 03/49] x86/xen: Add xenpv_restore_regs_and_return_to_usermode() Lai Jiangshan
                   ` (48 subsequent siblings)
  50 siblings, 1 reply; 58+ messages in thread
From: Lai Jiangshan @ 2021-11-26 10:11 UTC (permalink / raw)
  To: linux-kernel
  Cc: x86, Lai Jiangshan, Andy Lutomirski, Thomas Gleixner,
	Ingo Molnar, Borislav Petkov, Dave Hansen, H. Peter Anvin

From: Lai Jiangshan <laijs@linux.alibaba.com>

Commit c75890700455 ("x86/entry/64: Remove unneeded kernel CR3
switching") removed a CR3 write in the faulting path of load_gs_index().

But the path's FENCE_SWAPGS_USER_ENTRY has no fence operation when PTI
is enabled; rather, it relies on the serializing CR3 write of
SWITCH_TO_KERNEL_CR3.  So the path should use FENCE_SWAPGS_KERNEL_ENTRY
once SWITCH_TO_KERNEL_CR3 is removed.

Fixes: c75890700455 ("x86/entry/64: Remove unneeded kernel CR3 switching")
Signed-off-by: Lai Jiangshan <laijs@linux.alibaba.com>
---
 arch/x86/entry/entry_64.S | 16 +++++++++-------
 1 file changed, 9 insertions(+), 7 deletions(-)

diff --git a/arch/x86/entry/entry_64.S b/arch/x86/entry/entry_64.S
index 8582709576bf..4967edded48d 100644
--- a/arch/x86/entry/entry_64.S
+++ b/arch/x86/entry/entry_64.S
@@ -991,11 +991,6 @@ SYM_CODE_START_LOCAL(error_entry)
 	pushq	%r12
 	ret
 
-.Lerror_entry_done_lfence:
-	FENCE_SWAPGS_KERNEL_ENTRY
-.Lerror_entry_done:
-	ret
-
 	/*
 	 * There are two places in the kernel that can potentially fault with
 	 * usergs. Handle them here.  B stepping K8s sometimes report a
@@ -1018,8 +1013,15 @@ SYM_CODE_START_LOCAL(error_entry)
 	 * .Lgs_change's error handler with kernel gsbase.
 	 */
 	SWAPGS
-	FENCE_SWAPGS_USER_ENTRY
-	jmp .Lerror_entry_done
+
+	/*
+	 * The above code has no serializing instruction.  So do an lfence
+	 * to prevent GS speculation, regardless of whether it is kernel
+	 * gsbase or user gsbase.
+	 */
+.Lerror_entry_done_lfence:
+	FENCE_SWAPGS_KERNEL_ENTRY
+	ret
 
 .Lbstep_iret:
 	/* Fix truncated RIP */
-- 
2.19.1.6.gb485710b



* [PATCH V6 03/49] x86/xen: Add xenpv_restore_regs_and_return_to_usermode()
  2021-11-26 10:11 [PATCH V6 00/49] x86/entry/64: Convert a bunch of ASM entry code into C code Lai Jiangshan
  2021-11-26 10:11 ` [PATCH V6 01/49] x86/entry: Add fence for kernel entry swapgs in paranoid_entry() Lai Jiangshan
  2021-11-26 10:11 ` [PATCH V6 02/49] x86/entry: Use the correct fence macro after swapgs in kernel CR3 Lai Jiangshan
@ 2021-11-26 10:11 ` Lai Jiangshan
  2021-12-04 11:45   ` [tip: x86/urgent] " tip-bot2 for Lai Jiangshan
  2021-11-26 10:11 ` [PATCH V6 04/49] x86/entry: Use swapgs and native_iret directly in swapgs_restore_regs_and_return_to_usermode Lai Jiangshan
                   ` (47 subsequent siblings)
  50 siblings, 1 reply; 58+ messages in thread
From: Lai Jiangshan @ 2021-11-26 10:11 UTC (permalink / raw)
  To: linux-kernel
  Cc: x86, Lai Jiangshan, Jan Beulich, Thomas Gleixner, Juergen Gross,
	Peter Anvin, xen-devel, Andy Lutomirski, Ingo Molnar,
	Borislav Petkov, Dave Hansen, Boris Ostrovsky,
	Stefano Stabellini

From: Lai Jiangshan <laijs@linux.alibaba.com>

In the native case, PER_CPU_VAR(cpu_tss_rw + TSS_sp0) is the trampoline
stack.  But XEN pv doesn't use a trampoline stack, so there
PER_CPU_VAR(cpu_tss_rw + TSS_sp0) is also the kernel stack.  Hence the
source and destination stacks are identical in that case, which means
that reusing swapgs_restore_regs_and_return_to_usermode() in XEN pv
would cause %rsp to move up to the top of the kernel stack and leave the
IRET frame below %rsp, where it could be corrupted if an #NMI / #MC hits,
as either of these events occurring in the middle of the stack pushes
would clobber data on the (original) stack.

Also, under XEN pv, swapgs_restore_regs_and_return_to_usermode() pushing
the IRET frame back to the same address is useless and error-prone for
any future attempt to modify the code.

Fixes: 7f2590a110b8 ("x86/entry/64: Use a per-CPU trampoline stack for IDT entries")
Cc: Jan Beulich <jbeulich@suse.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Juergen Gross <jgross@suse.com>
Cc: Peter Anvin <hpa@zytor.com>
Cc: xen-devel@lists.xenproject.org
Reviewed-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Signed-off-by: Lai Jiangshan <laijs@linux.alibaba.com>
---
 arch/x86/entry/entry_64.S |  4 ++++
 arch/x86/xen/xen-asm.S    | 20 ++++++++++++++++++++
 2 files changed, 24 insertions(+)

diff --git a/arch/x86/entry/entry_64.S b/arch/x86/entry/entry_64.S
index 4967edded48d..68e697acefac 100644
--- a/arch/x86/entry/entry_64.S
+++ b/arch/x86/entry/entry_64.S
@@ -574,6 +574,10 @@ SYM_INNER_LABEL(swapgs_restore_regs_and_return_to_usermode, SYM_L_GLOBAL)
 	ud2
 1:
 #endif
+#ifdef CONFIG_XEN_PV
+	ALTERNATIVE "", "jmp xenpv_restore_regs_and_return_to_usermode", X86_FEATURE_XENPV
+#endif
+
 	POP_REGS pop_rdi=0
 
 	/*
diff --git a/arch/x86/xen/xen-asm.S b/arch/x86/xen/xen-asm.S
index 220dd9678494..444d824775f6 100644
--- a/arch/x86/xen/xen-asm.S
+++ b/arch/x86/xen/xen-asm.S
@@ -20,6 +20,7 @@
 
 #include <linux/init.h>
 #include <linux/linkage.h>
+#include <../entry/calling.h>
 
 .pushsection .noinstr.text, "ax"
 /*
@@ -192,6 +193,25 @@ SYM_CODE_START(xen_iret)
 	jmp hypercall_iret
 SYM_CODE_END(xen_iret)
 
+/*
+ * XEN pv doesn't use trampoline stack, PER_CPU_VAR(cpu_tss_rw + TSS_sp0) is
+ * also the kernel stack.  Reusing swapgs_restore_regs_and_return_to_usermode()
+ * in XEN pv would cause %rsp to move up to the top of the kernel stack and
+ * leave the IRET frame below %rsp, which is dangerous to be corrupted if #NMI
+ * interrupts. And swapgs_restore_regs_and_return_to_usermode() pushing the IRET
+ * frame at the same address is useless.
+ */
+SYM_CODE_START(xenpv_restore_regs_and_return_to_usermode)
+	UNWIND_HINT_REGS
+	POP_REGS
+
+	/* stackleak_erase() can work safely on the kernel stack. */
+	STACKLEAK_ERASE_NOCLOBBER
+
+	addq	$8, %rsp	/* skip regs->orig_ax */
+	jmp xen_iret
+SYM_CODE_END(xenpv_restore_regs_and_return_to_usermode)
+
 /*
  * Xen handles syscall callbacks much like ordinary exceptions, which
  * means we have:
-- 
2.19.1.6.gb485710b



* [PATCH V6 04/49] x86/entry: Use swapgs and native_iret directly in swapgs_restore_regs_and_return_to_usermode
  2021-11-26 10:11 [PATCH V6 00/49] x86/entry/64: Convert a bunch of ASM entry code into C code Lai Jiangshan
                   ` (2 preceding siblings ...)
  2021-11-26 10:11 ` [PATCH V6 03/49] x86/xen: Add xenpv_restore_regs_and_return_to_usermode() Lai Jiangshan
@ 2021-11-26 10:11 ` Lai Jiangshan
  2021-11-26 10:11 ` [PATCH V6 05/49] compiler_types.h: Add __noinstr_section() for noinstr Lai Jiangshan
                   ` (46 subsequent siblings)
  50 siblings, 0 replies; 58+ messages in thread
From: Lai Jiangshan @ 2021-11-26 10:11 UTC (permalink / raw)
  To: linux-kernel
  Cc: x86, Lai Jiangshan, Andy Lutomirski, Thomas Gleixner,
	Ingo Molnar, Borislav Petkov, Dave Hansen, H. Peter Anvin

From: Lai Jiangshan <laijs@linux.alibaba.com>

swapgs_restore_regs_and_return_to_usermode() is now used only by native
(non-xenpv) code, so it doesn't need the PV-aware SWAPGS and
INTERRUPT_RETURN.

Signed-off-by: Lai Jiangshan <laijs@linux.alibaba.com>
---
 arch/x86/entry/entry_64.S | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/arch/x86/entry/entry_64.S b/arch/x86/entry/entry_64.S
index 68e697acefac..44774cc5bcc9 100644
--- a/arch/x86/entry/entry_64.S
+++ b/arch/x86/entry/entry_64.S
@@ -608,8 +608,8 @@ SYM_INNER_LABEL(swapgs_restore_regs_and_return_to_usermode, SYM_L_GLOBAL)
 
 	/* Restore RDI. */
 	popq	%rdi
-	SWAPGS
-	INTERRUPT_RETURN
+	swapgs
+	jmp	native_iret
 
 
 SYM_INNER_LABEL(restore_regs_and_return_to_kernel, SYM_L_GLOBAL)
-- 
2.19.1.6.gb485710b



* [PATCH V6 05/49] compiler_types.h: Add __noinstr_section() for noinstr
  2021-11-26 10:11 [PATCH V6 00/49] x86/entry/64: Convert a bunch of ASM entry code into C code Lai Jiangshan
                   ` (3 preceding siblings ...)
  2021-11-26 10:11 ` [PATCH V6 04/49] x86/entry: Use swapgs and native_iret directly in swapgs_restore_regs_and_return_to_usermode Lai Jiangshan
@ 2021-11-26 10:11 ` Lai Jiangshan
  2021-11-26 10:11 ` [PATCH V6 06/49] x86/entry: Introduce __entry_text for entry code written in C Lai Jiangshan
                   ` (45 subsequent siblings)
  50 siblings, 0 replies; 58+ messages in thread
From: Lai Jiangshan @ 2021-11-26 10:11 UTC (permalink / raw)
  To: linux-kernel
  Cc: x86, Lai Jiangshan, Borislav Petkov, Kees Cook, Miguel Ojeda,
	Nathan Chancellor, Nick Desaulniers, Andrew Morton,
	Sami Tolvanen, Marco Elver, Masahiro Yamada

From: Lai Jiangshan <laijs@linux.alibaba.com>

The new __noinstr_section() macro will be used for the C entry code in a
later patch.
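
For example, a later patch in this series (patch 6) builds the
entry-code variant on top of it:

	/* Entry code written in C. */
	#define __entry_text __noinstr_section(".entry.text")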

Cc: Borislav Petkov <bp@alien8.de>
Reviewed-by: Miguel Ojeda <ojeda@kernel.org>
Reviewed-by: Kees Cook <keescook@chromium.org>
Suggested-by: Nick Desaulniers <ndesaulniers@google.com>
Suggested-by: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Lai Jiangshan <laijs@linux.alibaba.com>
---
 include/linux/compiler_types.h | 8 +++++---
 1 file changed, 5 insertions(+), 3 deletions(-)

diff --git a/include/linux/compiler_types.h b/include/linux/compiler_types.h
index 1d32f4c03c9e..1c9ca1e3ad26 100644
--- a/include/linux/compiler_types.h
+++ b/include/linux/compiler_types.h
@@ -208,9 +208,11 @@ struct ftrace_likely_data {
 #endif
 
 /* Section for code which can't be instrumented at all */
-#define noinstr								\
-	noinline notrace __attribute((__section__(".noinstr.text")))	\
-	__no_kcsan __no_sanitize_address __no_profile __no_sanitize_coverage
+#define __noinstr_section(section)				\
+	noinline notrace __section(section) __no_profile	\
+	__no_kcsan __no_sanitize_address __no_sanitize_coverage
+
+#define noinstr __noinstr_section(".noinstr.text")
 
 #endif /* __KERNEL__ */
 
-- 
2.19.1.6.gb485710b



* [PATCH V6 06/49] x86/entry: Introduce __entry_text for entry code written in C
  2021-11-26 10:11 [PATCH V6 00/49] x86/entry/64: Convert a bunch of ASM entry code into C code Lai Jiangshan
                   ` (4 preceding siblings ...)
  2021-11-26 10:11 ` [PATCH V6 05/49] compiler_types.h: Add __noinstr_section() for noinstr Lai Jiangshan
@ 2021-11-26 10:11 ` Lai Jiangshan
  2021-11-26 10:11 ` [PATCH V6 07/49] x86/entry: Move PTI_USER_* to arch/x86/include/asm/processor-flags.h Lai Jiangshan
                   ` (44 subsequent siblings)
  50 siblings, 0 replies; 58+ messages in thread
From: Lai Jiangshan @ 2021-11-26 10:11 UTC (permalink / raw)
  To: linux-kernel
  Cc: x86, Lai Jiangshan, Thomas Gleixner, Ingo Molnar,
	Borislav Petkov, Dave Hansen, H. Peter Anvin, Juergen Gross,
	Peter Zijlstra (Intel),
	Joerg Roedel

From: Lai Jiangshan <laijs@linux.alibaba.com>

Some entry code will be implemented in C files.  __entry_text is needed
to place that code in the .entry.text section.  __entry_text disables
instrumentation like noinstr does, but it doesn't disable the stack
protector, since not all compilers supported by the kernel can disable
the stack protector with function-level granularity.  It will instead be
disabled at the C-file level.
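
As a rough userspace analogy (a sketch only, not kernel code; the
attribute combination below is simplified compared to the real
__noinstr_section), placing a function into a dedicated section via
per-function attributes looks like this:

	#include <stdio.h>

	/* simplified stand-in for __entry_text: noinline + named section */
	#define my_entry_text __attribute__((noinline, section(".my.entry.text")))

	static my_entry_text int entry_demo(int x)
	{
		/* this code ends up in .my.entry.text instead of .text */
		return x + 1;
	}

	int main(void)
	{
		printf("%d\n", entry_demo(41));	/* prints 42 */
		return 0;
	}

Running objdump -d -j .my.entry.text on the resulting binary shows the
function placed in the dedicated section.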

Signed-off-by: Lai Jiangshan <laijs@linux.alibaba.com>
---
 arch/x86/include/asm/idtentry.h | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/arch/x86/include/asm/idtentry.h b/arch/x86/include/asm/idtentry.h
index 1345088e9902..6779def97591 100644
--- a/arch/x86/include/asm/idtentry.h
+++ b/arch/x86/include/asm/idtentry.h
@@ -11,6 +11,9 @@
 
 #include <asm/irq_stack.h>
 
+/* Entry code written in C. */
+#define __entry_text __noinstr_section(".entry.text")
+
 /**
  * DECLARE_IDTENTRY - Declare functions for simple IDT entry points
  *		      No error code pushed by hardware
-- 
2.19.1.6.gb485710b



* [PATCH V6 07/49] x86/entry: Move PTI_USER_* to arch/x86/include/asm/processor-flags.h
  2021-11-26 10:11 [PATCH V6 00/49] x86/entry/64: Convert a bunch of ASM entry code into C code Lai Jiangshan
                   ` (5 preceding siblings ...)
  2021-11-26 10:11 ` [PATCH V6 06/49] x86/entry: Introduce __entry_text for entry code written in C Lai Jiangshan
@ 2021-11-26 10:11 ` Lai Jiangshan
  2021-11-26 10:11 ` [PATCH V6 08/49] x86: Remove unused kernel_to_user_p4dp() and user_to_kernel_p4dp() Lai Jiangshan
                   ` (43 subsequent siblings)
  50 siblings, 0 replies; 58+ messages in thread
From: Lai Jiangshan @ 2021-11-26 10:11 UTC (permalink / raw)
  To: linux-kernel
  Cc: x86, Lai Jiangshan, Andy Lutomirski, Thomas Gleixner,
	Ingo Molnar, Borislav Petkov, Dave Hansen, H. Peter Anvin

From: Lai Jiangshan <laijs@linux.alibaba.com>

These constants will also be used in a C file, so move them to
arch/x86/include/asm/processor-flags.h, which already has the related
X86_CR3_PTI_PCID_USER_BIT defined in it.

Signed-off-by: Lai Jiangshan <laijs@linux.alibaba.com>
---
 arch/x86/entry/calling.h               | 10 ----------
 arch/x86/include/asm/processor-flags.h | 15 +++++++++++++++
 2 files changed, 15 insertions(+), 10 deletions(-)

diff --git a/arch/x86/entry/calling.h b/arch/x86/entry/calling.h
index a4c061fb7c6e..996b041e92d2 100644
--- a/arch/x86/entry/calling.h
+++ b/arch/x86/entry/calling.h
@@ -149,16 +149,6 @@ For 32-bit we have the following conventions - kernel is built with
 
 #ifdef CONFIG_PAGE_TABLE_ISOLATION
 
-/*
- * PAGE_TABLE_ISOLATION PGDs are 8k.  Flip bit 12 to switch between the two
- * halves:
- */
-#define PTI_USER_PGTABLE_BIT		PAGE_SHIFT
-#define PTI_USER_PGTABLE_MASK		(1 << PTI_USER_PGTABLE_BIT)
-#define PTI_USER_PCID_BIT		X86_CR3_PTI_PCID_USER_BIT
-#define PTI_USER_PCID_MASK		(1 << PTI_USER_PCID_BIT)
-#define PTI_USER_PGTABLE_AND_PCID_MASK  (PTI_USER_PCID_MASK | PTI_USER_PGTABLE_MASK)
-
 .macro SET_NOFLUSH_BIT	reg:req
 	bts	$X86_CR3_PCID_NOFLUSH_BIT, \reg
 .endm
diff --git a/arch/x86/include/asm/processor-flags.h b/arch/x86/include/asm/processor-flags.h
index 02c2cbda4a74..4dd2fbbc861a 100644
--- a/arch/x86/include/asm/processor-flags.h
+++ b/arch/x86/include/asm/processor-flags.h
@@ -4,6 +4,7 @@
 
 #include <uapi/asm/processor-flags.h>
 #include <linux/mem_encrypt.h>
+#include <asm/page_types.h>
 
 #ifdef CONFIG_VM86
 #define X86_VM_MASK	X86_EFLAGS_VM
@@ -50,7 +51,21 @@
 #endif
 
 #ifdef CONFIG_PAGE_TABLE_ISOLATION
+
 # define X86_CR3_PTI_PCID_USER_BIT	11
+
+#ifdef CONFIG_X86_64
+/*
+ * PAGE_TABLE_ISOLATION PGDs are 8k.  Flip bit 12 to switch between the two
+ * halves:
+ */
+#define PTI_USER_PGTABLE_BIT		PAGE_SHIFT
+#define PTI_USER_PGTABLE_MASK		(1 << PTI_USER_PGTABLE_BIT)
+#define PTI_USER_PCID_BIT		X86_CR3_PTI_PCID_USER_BIT
+#define PTI_USER_PCID_MASK		(1 << PTI_USER_PCID_BIT)
+#define PTI_USER_PGTABLE_AND_PCID_MASK  (PTI_USER_PCID_MASK | PTI_USER_PGTABLE_MASK)
+#endif
+
 #endif
 
 #endif /* _ASM_X86_PROCESSOR_FLAGS_H */
-- 
2.19.1.6.gb485710b



* [PATCH V6 08/49] x86: Remove unused kernel_to_user_p4dp() and user_to_kernel_p4dp()
  2021-11-26 10:11 [PATCH V6 00/49] x86/entry/64: Convert a bunch of ASM entry code into C code Lai Jiangshan
                   ` (6 preceding siblings ...)
  2021-11-26 10:11 ` [PATCH V6 07/49] x86/entry: Move PTI_USER_* to arch/x86/include/asm/processor-flags.h Lai Jiangshan
@ 2021-11-26 10:11 ` Lai Jiangshan
  2021-11-26 10:11 ` [PATCH V6 09/49] x86: Replace PTI_PGTABLE_SWITCH_BIT with PTI_USER_PGTABLE_BIT Lai Jiangshan
                   ` (42 subsequent siblings)
  50 siblings, 0 replies; 58+ messages in thread
From: Lai Jiangshan @ 2021-11-26 10:11 UTC (permalink / raw)
  To: linux-kernel
  Cc: x86, Lai Jiangshan, Thomas Gleixner, Ingo Molnar,
	Borislav Petkov, Dave Hansen, H. Peter Anvin, Andrew Morton,
	Aneesh Kumar K.V

From: Lai Jiangshan <laijs@linux.alibaba.com>

kernel_to_user_p4dp() and user_to_kernel_p4dp() have no callers and can
be removed.

Signed-off-by: Lai Jiangshan <laijs@linux.alibaba.com>
---
 arch/x86/include/asm/pgtable.h | 10 ----------
 1 file changed, 10 deletions(-)

diff --git a/arch/x86/include/asm/pgtable.h b/arch/x86/include/asm/pgtable.h
index 448cd01eb3ec..65542106464b 100644
--- a/arch/x86/include/asm/pgtable.h
+++ b/arch/x86/include/asm/pgtable.h
@@ -1200,16 +1200,6 @@ static inline pgd_t *user_to_kernel_pgdp(pgd_t *pgdp)
 {
 	return ptr_clear_bit(pgdp, PTI_PGTABLE_SWITCH_BIT);
 }
-
-static inline p4d_t *kernel_to_user_p4dp(p4d_t *p4dp)
-{
-	return ptr_set_bit(p4dp, PTI_PGTABLE_SWITCH_BIT);
-}
-
-static inline p4d_t *user_to_kernel_p4dp(p4d_t *p4dp)
-{
-	return ptr_clear_bit(p4dp, PTI_PGTABLE_SWITCH_BIT);
-}
 #endif /* CONFIG_PAGE_TABLE_ISOLATION */
 
 /*
-- 
2.19.1.6.gb485710b



* [PATCH V6 09/49] x86: Replace PTI_PGTABLE_SWITCH_BIT with PTI_USER_PGTABLE_BIT
  2021-11-26 10:11 [PATCH V6 00/49] x86/entry/64: Convert a bunch of ASM entry code into C code Lai Jiangshan
                   ` (7 preceding siblings ...)
  2021-11-26 10:11 ` [PATCH V6 08/49] x86: Remove unused kernel_to_user_p4dp() and user_to_kernel_p4dp() Lai Jiangshan
@ 2021-11-26 10:11 ` Lai Jiangshan
  2021-11-26 10:11 ` [PATCH V6 10/49] x86: Mark __native_read_cr3() & native_write_cr3() as __always_inline Lai Jiangshan
                   ` (41 subsequent siblings)
  50 siblings, 0 replies; 58+ messages in thread
From: Lai Jiangshan @ 2021-11-26 10:11 UTC (permalink / raw)
  To: linux-kernel
  Cc: x86, Lai Jiangshan, Thomas Gleixner, Ingo Molnar,
	Borislav Petkov, Dave Hansen, H. Peter Anvin, Andrew Morton,
	Aneesh Kumar K.V

From: Lai Jiangshan <laijs@linux.alibaba.com>

They are the same in meaning and value.

Signed-off-by: Lai Jiangshan <laijs@linux.alibaba.com>
---
 arch/x86/include/asm/pgtable.h | 13 +++----------
 1 file changed, 3 insertions(+), 10 deletions(-)

diff --git a/arch/x86/include/asm/pgtable.h b/arch/x86/include/asm/pgtable.h
index 65542106464b..c8909457574a 100644
--- a/arch/x86/include/asm/pgtable.h
+++ b/arch/x86/include/asm/pgtable.h
@@ -5,6 +5,7 @@
 #include <linux/mem_encrypt.h>
 #include <asm/page.h>
 #include <asm/pgtable_types.h>
+#include <asm/processor-flags.h>
 
 /*
  * Macro to mark a page protection value as UC-
@@ -1164,14 +1165,6 @@ static inline bool pgdp_maps_userspace(void *__ptr)
 static inline int pgd_large(pgd_t pgd) { return 0; }
 
 #ifdef CONFIG_PAGE_TABLE_ISOLATION
-/*
- * All top-level PAGE_TABLE_ISOLATION page tables are order-1 pages
- * (8k-aligned and 8k in size).  The kernel one is at the beginning 4k and
- * the user one is in the last 4k.  To switch between them, you
- * just need to flip the 12th bit in their addresses.
- */
-#define PTI_PGTABLE_SWITCH_BIT	PAGE_SHIFT
-
 /*
  * This generates better code than the inline assembly in
  * __set_bit().
@@ -1193,12 +1186,12 @@ static inline void *ptr_clear_bit(void *ptr, int bit)
 
 static inline pgd_t *kernel_to_user_pgdp(pgd_t *pgdp)
 {
-	return ptr_set_bit(pgdp, PTI_PGTABLE_SWITCH_BIT);
+	return ptr_set_bit(pgdp, PTI_USER_PGTABLE_BIT);
 }
 
 static inline pgd_t *user_to_kernel_pgdp(pgd_t *pgdp)
 {
-	return ptr_clear_bit(pgdp, PTI_PGTABLE_SWITCH_BIT);
+	return ptr_clear_bit(pgdp, PTI_USER_PGTABLE_BIT);
 }
 #endif /* CONFIG_PAGE_TABLE_ISOLATION */
 
-- 
2.19.1.6.gb485710b



* [PATCH V6 10/49] x86: Mark __native_read_cr3() & native_write_cr3() as __always_inline
  2021-11-26 10:11 [PATCH V6 00/49] x86/entry/64: Convert a bunch of ASM entry code into C code Lai Jiangshan
                   ` (8 preceding siblings ...)
  2021-11-26 10:11 ` [PATCH V6 09/49] x86: Replace PTI_PGTABLE_SWITCH_BIT with PTI_USER_PGTABLE_BIT Lai Jiangshan
@ 2021-11-26 10:11 ` Lai Jiangshan
  2021-11-26 10:11 ` [PATCH V6 11/49] x86/traps: Move the declaration of native_irq_return_iret into proto.h Lai Jiangshan
                   ` (40 subsequent siblings)
  50 siblings, 0 replies; 58+ messages in thread
From: Lai Jiangshan @ 2021-11-26 10:11 UTC (permalink / raw)
  To: linux-kernel
  Cc: x86, Lai Jiangshan, Thomas Gleixner, Ingo Molnar,
	Borislav Petkov, Dave Hansen, H. Peter Anvin, Ben Widawsky,
	Dave Jiang, Dan Williams, Peter Zijlstra, Kees Cook

From: Lai Jiangshan <laijs@linux.alibaba.com>

__native_read_cr3() and native_write_cr3() need to be guaranteed noinstr.

This prepares for later patches which implement entry code in a C file.
Some of that code needs to handle KPTI and has to read/write CR3.

Signed-off-by: Lai Jiangshan <laijs@linux.alibaba.com>
---
 arch/x86/include/asm/special_insns.h | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/arch/x86/include/asm/special_insns.h b/arch/x86/include/asm/special_insns.h
index 68c257a3de0d..fbb057ba60e6 100644
--- a/arch/x86/include/asm/special_insns.h
+++ b/arch/x86/include/asm/special_insns.h
@@ -42,14 +42,14 @@ static __always_inline void native_write_cr2(unsigned long val)
 	asm volatile("mov %0,%%cr2": : "r" (val) : "memory");
 }
 
-static inline unsigned long __native_read_cr3(void)
+static __always_inline unsigned long __native_read_cr3(void)
 {
 	unsigned long val;
 	asm volatile("mov %%cr3,%0\n\t" : "=r" (val) : __FORCE_ORDER);
 	return val;
 }
 
-static inline void native_write_cr3(unsigned long val)
+static __always_inline void native_write_cr3(unsigned long val)
 {
 	asm volatile("mov %0,%%cr3": : "r" (val) : "memory");
 }
-- 
2.19.1.6.gb485710b



* [PATCH V6 11/49] x86/traps: Move the declaration of native_irq_return_iret into proto.h
  2021-11-26 10:11 [PATCH V6 00/49] x86/entry/64: Convert a bunch of ASM entry code into C code Lai Jiangshan
                   ` (9 preceding siblings ...)
  2021-11-26 10:11 ` [PATCH V6 10/49] x86: Mark __native_read_cr3() & native_write_cr3() as __always_inline Lai Jiangshan
@ 2021-11-26 10:11 ` Lai Jiangshan
  2021-11-26 10:11 ` [PATCH V6 12/49] x86/entry: Add arch/x86/entry/entry64.c for C entry code Lai Jiangshan
                   ` (39 subsequent siblings)
  50 siblings, 0 replies; 58+ messages in thread
From: Lai Jiangshan @ 2021-11-26 10:11 UTC (permalink / raw)
  To: linux-kernel
  Cc: x86, Lai Jiangshan, Thomas Gleixner, Ingo Molnar,
	Borislav Petkov, Dave Hansen, H. Peter Anvin, Chang S. Bae,
	Jan Kiszka, Joerg Roedel, Peter Zijlstra

From: Lai Jiangshan <laijs@linux.alibaba.com>

The declaration of native_irq_return_iret is currently used only in
exc_double_fault().  But it will be used in other places later, so move
the declaration to a header file in preparation.

Signed-off-by: Lai Jiangshan <laijs@linux.alibaba.com>
---
 arch/x86/include/asm/proto.h | 1 +
 arch/x86/kernel/traps.c      | 2 --
 2 files changed, 1 insertion(+), 2 deletions(-)

diff --git a/arch/x86/include/asm/proto.h b/arch/x86/include/asm/proto.h
index feed36d44d04..33ae276c8b34 100644
--- a/arch/x86/include/asm/proto.h
+++ b/arch/x86/include/asm/proto.h
@@ -13,6 +13,7 @@ void syscall_init(void);
 #ifdef CONFIG_X86_64
 void entry_SYSCALL_64(void);
 void entry_SYSCALL_64_safe_stack(void);
+extern unsigned char native_irq_return_iret[];
 long do_arch_prctl_64(struct task_struct *task, int option, unsigned long arg2);
 #endif
 
diff --git a/arch/x86/kernel/traps.c b/arch/x86/kernel/traps.c
index c9d566dcf89a..1be5c1edad6b 100644
--- a/arch/x86/kernel/traps.c
+++ b/arch/x86/kernel/traps.c
@@ -359,8 +359,6 @@ DEFINE_IDTENTRY_DF(exc_double_fault)
 #endif
 
 #ifdef CONFIG_X86_ESPFIX64
-	extern unsigned char native_irq_return_iret[];
-
 	/*
 	 * If IRET takes a non-IST fault on the espfix64 stack, then we
 	 * end up promoting it to a doublefault.  In that case, take
-- 
2.19.1.6.gb485710b



* [PATCH V6 12/49] x86/entry: Add arch/x86/entry/entry64.c for C entry code
  2021-11-26 10:11 [PATCH V6 00/49] x86/entry/64: Convert a bunch of ASM entry code into C code Lai Jiangshan
                   ` (10 preceding siblings ...)
  2021-11-26 10:11 ` [PATCH V6 11/49] x86/traps: Move the declaration of native_irq_return_iret into proto.h Lai Jiangshan
@ 2021-11-26 10:11 ` Lai Jiangshan
  2021-11-26 10:11 ` [PATCH V6 13/49] x86/entry: Expose the address of .Lgs_change to entry64.c Lai Jiangshan
                   ` (38 subsequent siblings)
  50 siblings, 0 replies; 58+ messages in thread
From: Lai Jiangshan @ 2021-11-26 10:11 UTC (permalink / raw)
  To: linux-kernel
  Cc: x86, Lai Jiangshan, Andy Lutomirski, Thomas Gleixner,
	Ingo Molnar, Borislav Petkov, Dave Hansen, H. Peter Anvin

From: Lai Jiangshan <laijs@linux.alibaba.com>

Add a C file, "entry64.c", to hold C entry code for traps and faults;
it will have the same logic as the existing ASM code in entry_64.S.

The file is as low level as entry_64.S: its code can run in environments
where the GS base is a user-controlled value, or the CR3 is the PTI user
CR3, or both.

None of the code in this file may be instrumented.  Many instrumentation
facilities can be disabled by the per-function attributes included in
__noinstr_section.  But the stack protector cannot be disabled
per-function by many GCC versions that the kernel supports, so the stack
protector is disabled for the whole file in the Makefile.

This prepares for later patches that implement the C version of the
entry code in entry64.c.

Suggested-by: Joerg Roedel <jroedel@suse.de>
Signed-off-by: Lai Jiangshan <laijs@linux.alibaba.com>
---
 arch/x86/entry/Makefile  |  3 ++-
 arch/x86/entry/entry64.c | 14 ++++++++++++++
 2 files changed, 16 insertions(+), 1 deletion(-)
 create mode 100644 arch/x86/entry/entry64.c

diff --git a/arch/x86/entry/Makefile b/arch/x86/entry/Makefile
index 7fec5dcf6438..792f7009ff32 100644
--- a/arch/x86/entry/Makefile
+++ b/arch/x86/entry/Makefile
@@ -10,13 +10,14 @@ KCOV_INSTRUMENT := n
 CFLAGS_REMOVE_common.o		= $(CC_FLAGS_FTRACE)
 
 CFLAGS_common.o			+= -fno-stack-protector
+CFLAGS_entry64.o		+= -fno-stack-protector
 
 obj-y				:= entry_$(BITS).o thunk_$(BITS).o syscall_$(BITS).o
 obj-y				+= common.o
+obj-$(CONFIG_X86_64)		+= entry64.o
 
 obj-y				+= vdso/
 obj-y				+= vsyscall/
 
 obj-$(CONFIG_IA32_EMULATION)	+= entry_64_compat.o syscall_32.o
 obj-$(CONFIG_X86_X32_ABI)	+= syscall_x32.o
-
diff --git a/arch/x86/entry/entry64.c b/arch/x86/entry/entry64.c
new file mode 100644
index 000000000000..762595603ce7
--- /dev/null
+++ b/arch/x86/entry/entry64.c
@@ -0,0 +1,14 @@
+// SPDX-License-Identifier: GPL-2.0-only
+/*
+ *  Copyright (C) 1991, 1992  Linus Torvalds
+ *  Copyright (C) 2000, 2001, 2002  Andi Kleen SuSE Labs
+ *  Copyright (C) 2000  Pavel Machek <pavel@suse.cz>
+ *  Copyright (C) 2021 Lai Jiangshan, Alibaba
+ *
+ * Handle entries and exits for hardware traps and faults.
+ *
+ * It is as low level as entry_64.S and its code can be running in the
+ * environments that the GS base is user controlled value, or the CR3
+ * is PTI user CR3 or both.
+ */
+#include <asm/traps.h>
-- 
2.19.1.6.gb485710b



* [PATCH V6 13/49] x86/entry: Expose the address of .Lgs_change to entry64.c
  2021-11-26 10:11 [PATCH V6 00/49] x86/entry/64: Convert a bunch of ASM entry code into C code Lai Jiangshan
                   ` (11 preceding siblings ...)
  2021-11-26 10:11 ` [PATCH V6 12/49] x86/entry: Add arch/x86/entry/entry64.c for C entry code Lai Jiangshan
@ 2021-11-26 10:11 ` Lai Jiangshan
  2021-11-26 10:11 ` [PATCH V6 14/49] x86/entry: Add C version of SWITCH_TO_KERNEL_CR3 as switch_to_kernel_cr3() Lai Jiangshan
                   ` (37 subsequent siblings)
  50 siblings, 0 replies; 58+ messages in thread
From: Lai Jiangshan @ 2021-11-26 10:11 UTC (permalink / raw)
  To: linux-kernel
  Cc: x86, Lai Jiangshan, Andy Lutomirski, Thomas Gleixner,
	Ingo Molnar, Borislav Petkov, Dave Hansen, H. Peter Anvin

From: Lai Jiangshan <laijs@linux.alibaba.com>

The address of .Lgs_change will be used in entry64.c in a later patch
when some entry code is implemented there.  So expose it to entry64.c
(as asm_load_gs_index_gs_change) in preparation.

Signed-off-by: Lai Jiangshan <laijs@linux.alibaba.com>
---
 arch/x86/entry/entry64.c  | 2 ++
 arch/x86/entry/entry_64.S | 6 +++---
 2 files changed, 5 insertions(+), 3 deletions(-)

diff --git a/arch/x86/entry/entry64.c b/arch/x86/entry/entry64.c
index 762595603ce7..9813a30dbadb 100644
--- a/arch/x86/entry/entry64.c
+++ b/arch/x86/entry/entry64.c
@@ -12,3 +12,5 @@
  * is PTI user CR3 or both.
  */
 #include <asm/traps.h>
+
+extern unsigned char asm_load_gs_index_gs_change[];
diff --git a/arch/x86/entry/entry_64.S b/arch/x86/entry/entry_64.S
index 44774cc5bcc9..5db0196835cd 100644
--- a/arch/x86/entry/entry_64.S
+++ b/arch/x86/entry/entry_64.S
@@ -733,7 +733,7 @@ _ASM_NOKPROBE(common_interrupt_return)
 SYM_FUNC_START(asm_load_gs_index)
 	FRAME_BEGIN
 	swapgs
-.Lgs_change:
+SYM_INNER_LABEL(asm_load_gs_index_gs_change, SYM_L_GLOBAL)
 	movl	%edi, %gs
 2:	ALTERNATIVE "", "mfence", X86_BUG_SWAPGS_FENCE
 	swapgs
@@ -742,7 +742,7 @@ SYM_FUNC_START(asm_load_gs_index)
 SYM_FUNC_END(asm_load_gs_index)
 EXPORT_SYMBOL(asm_load_gs_index)
 
-	_ASM_EXTABLE(.Lgs_change, .Lbad_gs)
+	_ASM_EXTABLE(asm_load_gs_index_gs_change, .Lbad_gs)
 	.section .fixup, "ax"
 	/* running with kernelgs */
 SYM_CODE_START_LOCAL_NOALIGN(.Lbad_gs)
@@ -1008,7 +1008,7 @@ SYM_CODE_START_LOCAL(error_entry)
 	movl	%ecx, %eax			/* zero extend */
 	cmpq	%rax, RIP+8(%rsp)
 	je	.Lbstep_iret
-	cmpq	$.Lgs_change, RIP+8(%rsp)
+	cmpq	$asm_load_gs_index_gs_change, RIP+8(%rsp)
 	jne	.Lerror_entry_done_lfence
 
 	/*
-- 
2.19.1.6.gb485710b



* [PATCH V6 14/49] x86/entry: Add C version of SWITCH_TO_KERNEL_CR3 as switch_to_kernel_cr3()
  2021-11-26 10:11 [PATCH V6 00/49] x86/entry/64: Convert a bunch of ASM entry code into C code Lai Jiangshan
                   ` (12 preceding siblings ...)
  2021-11-26 10:11 ` [PATCH V6 13/49] x86/entry: Expose the address of .Lgs_change to entry64.c Lai Jiangshan
@ 2021-11-26 10:11 ` Lai Jiangshan
  2021-11-26 10:11 ` [PATCH V6 15/49] x86/traps: Add fence_swapgs_{user,kernel}_entry() Lai Jiangshan
                   ` (36 subsequent siblings)
  50 siblings, 0 replies; 58+ messages in thread
From: Lai Jiangshan @ 2021-11-26 10:11 UTC (permalink / raw)
  To: linux-kernel
  Cc: x86, Lai Jiangshan, Andy Lutomirski, Thomas Gleixner,
	Ingo Molnar, Borislav Petkov, Dave Hansen, H. Peter Anvin

From: Lai Jiangshan <laijs@linux.alibaba.com>

The C version switch_to_kernel_cr3() implements SWITCH_TO_KERNEL_CR3().

No functional difference intended.
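
Purely as an illustration of the bit manipulation involved, here is a
userspace sketch (not kernel code; the constants are copied only for the
demonstration and the sample CR3 value is made up):

	#include <stdio.h>
	#include <stdint.h>

	#define PTI_USER_PGTABLE_BIT		12	/* PAGE_SHIFT */
	#define PTI_USER_PCID_BIT		11	/* X86_CR3_PTI_PCID_USER_BIT */
	#define PTI_USER_PGTABLE_MASK		(1ULL << PTI_USER_PGTABLE_BIT)
	#define PTI_USER_PCID_MASK		(1ULL << PTI_USER_PCID_BIT)
	#define PTI_USER_PGTABLE_AND_PCID_MASK	(PTI_USER_PCID_MASK | PTI_USER_PGTABLE_MASK)
	#define X86_CR3_PCID_NOFLUSH		(1ULL << 63)

	int main(void)
	{
		/* made-up user CR3: user PGD half (bit 12) + user PCID (bit 11) */
		uint64_t user_cr3 = 0x154000ULL | PTI_USER_PGTABLE_AND_PCID_MASK;
		/* what pti_switch_to_kernel_cr3() computes when PCID is available */
		uint64_t kernel_cr3 = (user_cr3 & ~PTI_USER_PGTABLE_AND_PCID_MASK) |
				      X86_CR3_PCID_NOFLUSH;

		printf("user   CR3: %#llx\n", (unsigned long long)user_cr3);
		printf("kernel CR3: %#llx\n", (unsigned long long)kernel_cr3);
		return 0;
	}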

Signed-off-by: Lai Jiangshan <laijs@linux.alibaba.com>
---
 arch/x86/entry/entry64.c | 24 ++++++++++++++++++++++++
 1 file changed, 24 insertions(+)

diff --git a/arch/x86/entry/entry64.c b/arch/x86/entry/entry64.c
index 9813a30dbadb..9a5c535b1ddf 100644
--- a/arch/x86/entry/entry64.c
+++ b/arch/x86/entry/entry64.c
@@ -14,3 +14,27 @@
 #include <asm/traps.h>
 
 extern unsigned char asm_load_gs_index_gs_change[];
+
+#ifdef CONFIG_PAGE_TABLE_ISOLATION
+static __always_inline void pti_switch_to_kernel_cr3(unsigned long user_cr3)
+{
+	/*
+	 * Clear PCID and "PAGE_TABLE_ISOLATION bit", point CR3
+	 * at kernel pagetables:
+	 */
+	unsigned long cr3 = user_cr3 & ~PTI_USER_PGTABLE_AND_PCID_MASK;
+
+	if (static_cpu_has(X86_FEATURE_PCID))
+		cr3 |= X86_CR3_PCID_NOFLUSH;
+
+	native_write_cr3(cr3);
+}
+
+static __always_inline void switch_to_kernel_cr3(void)
+{
+	if (static_cpu_has(X86_FEATURE_PTI))
+		pti_switch_to_kernel_cr3(__native_read_cr3());
+}
+#else
+static __always_inline void switch_to_kernel_cr3(void) {}
+#endif
-- 
2.19.1.6.gb485710b



* [PATCH V6 15/49] x86/traps: Add fence_swapgs_{user,kernel}_entry()
  2021-11-26 10:11 [PATCH V6 00/49] x86/entry/64: Convert a bunch of ASM entry code into C code Lai Jiangshan
                   ` (13 preceding siblings ...)
  2021-11-26 10:11 ` [PATCH V6 14/49] x86/entry: Add C version of SWITCH_TO_KERNEL_CR3 as switch_to_kernel_cr3() Lai Jiangshan
@ 2021-11-26 10:11 ` Lai Jiangshan
  2021-11-26 10:11 ` [PATCH V6 16/49] x86/entry: Add C user_entry_swapgs_and_fence() Lai Jiangshan
                   ` (35 subsequent siblings)
  50 siblings, 0 replies; 58+ messages in thread
From: Lai Jiangshan @ 2021-11-26 10:11 UTC (permalink / raw)
  To: linux-kernel
  Cc: x86, Lai Jiangshan, Josh Poimboeuf, Andy Lutomirski,
	Thomas Gleixner, Ingo Molnar, Borislav Petkov, Dave Hansen,
	H. Peter Anvin

From: Lai Jiangshan <laijs@linux.alibaba.com>

fence_swapgs_{user,kernel}_entry() in entry64.c are the same as the
ASM macros FENCE_SWAPGS_{USER,KERNEL}_ENTRY.

fence_swapgs_user_entry is used in the user entry swapgs code path,
to prevent a speculative swapgs when coming from kernel space.

fence_swapgs_kernel_entry is used in the kernel entry code path,
to prevent the swapgs from getting speculatively skipped when
coming from user space.

Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Signed-off-by: Lai Jiangshan <laijs@linux.alibaba.com>
---
 arch/x86/entry/entry64.c | 21 +++++++++++++++++++++
 1 file changed, 21 insertions(+)

diff --git a/arch/x86/entry/entry64.c b/arch/x86/entry/entry64.c
index 9a5c535b1ddf..bdc9540f25d3 100644
--- a/arch/x86/entry/entry64.c
+++ b/arch/x86/entry/entry64.c
@@ -38,3 +38,24 @@ static __always_inline void switch_to_kernel_cr3(void)
 #else
 static __always_inline void switch_to_kernel_cr3(void) {}
 #endif
+
+/*
+ * Mitigate Spectre v1 for conditional swapgs code paths.
+ *
+ * fence_swapgs_user_entry is used in the user entry swapgs code path, to
+ * prevent a speculative swapgs when coming from kernel space.  It must be
+ * used with switch_to_kernel_cr3() in the same path.
+ *
+ * fence_swapgs_kernel_entry is used in the kernel entry code path without
+ * CR3 write or with conditinal CR3 write only, to prevent the swapgs from
+ * getting speculatively skipped when coming from user space.
+ */
+static __always_inline void fence_swapgs_user_entry(void)
+{
+	alternative("", "lfence", X86_FEATURE_FENCE_SWAPGS_USER);
+}
+
+static __always_inline void fence_swapgs_kernel_entry(void)
+{
+	alternative("", "lfence", X86_FEATURE_FENCE_SWAPGS_KERNEL);
+}
-- 
2.19.1.6.gb485710b



* [PATCH V6 16/49] x86/entry: Add C user_entry_swapgs_and_fence()
  2021-11-26 10:11 [PATCH V6 00/49] x86/entry/64: Convert a bunch of ASM entry code into C code Lai Jiangshan
                   ` (14 preceding siblings ...)
  2021-11-26 10:11 ` [PATCH V6 15/49] x86/traps: Add fence_swapgs_{user,kernel}_entry() Lai Jiangshan
@ 2021-11-26 10:11 ` Lai Jiangshan
  2021-11-26 10:11 ` [PATCH V6 17/49] x86/traps: Move pt_regs only in fixup_bad_iret() Lai Jiangshan
                   ` (34 subsequent siblings)
  50 siblings, 0 replies; 58+ messages in thread
From: Lai Jiangshan @ 2021-11-26 10:11 UTC (permalink / raw)
  To: linux-kernel
  Cc: x86, Lai Jiangshan, Josh Poimboeuf, Andy Lutomirski,
	Thomas Gleixner, Ingo Molnar, Borislav Petkov, Dave Hansen,
	H. Peter Anvin

From: Lai Jiangshan <laijs@linux.alibaba.com>

The C user_entry_swapgs_and_fence() implements the ASM code:
        swapgs
        FENCE_SWAPGS_USER_ENTRY

It will be used in the user entry swapgs code path, doing the swapgs and
lfence to prevent a speculative swapgs when coming from kernel space.

Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Suggested-by: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Lai Jiangshan <laijs@linux.alibaba.com>
---
 arch/x86/entry/entry64.c | 9 +++++++++
 1 file changed, 9 insertions(+)

diff --git a/arch/x86/entry/entry64.c b/arch/x86/entry/entry64.c
index bdc9540f25d3..3db503ea0703 100644
--- a/arch/x86/entry/entry64.c
+++ b/arch/x86/entry/entry64.c
@@ -49,6 +49,9 @@ static __always_inline void switch_to_kernel_cr3(void) {}
  * fence_swapgs_kernel_entry is used in the kernel entry code path without
  * CR3 write or with conditinal CR3 write only, to prevent the swapgs from
  * getting speculatively skipped when coming from user space.
+ *
+ * user_entry_swapgs_and_fence is a wrapper of swapgs and fence for user entry
+ * code path.
  */
 static __always_inline void fence_swapgs_user_entry(void)
 {
@@ -59,3 +62,9 @@ static __always_inline void fence_swapgs_kernel_entry(void)
 {
 	alternative("", "lfence", X86_FEATURE_FENCE_SWAPGS_KERNEL);
 }
+
+static __always_inline void user_entry_swapgs_and_fence(void)
+{
+	native_swapgs();
+	fence_swapgs_user_entry();
+}
-- 
2.19.1.6.gb485710b



* [PATCH V6 17/49] x86/traps: Move pt_regs only in fixup_bad_iret()
  2021-11-26 10:11 [PATCH V6 00/49] x86/entry/64: Convert a bunch of ASM entry code into C code Lai Jiangshan
                   ` (15 preceding siblings ...)
  2021-11-26 10:11 ` [PATCH V6 16/49] x86/entry: Add C user_entry_swapgs_and_fence() Lai Jiangshan
@ 2021-11-26 10:11 ` Lai Jiangshan
  2021-11-26 10:11 ` [PATCH V6 18/49] x86/entry: Switch the stack after error_entry() returns Lai Jiangshan
                   ` (33 subsequent siblings)
  50 siblings, 0 replies; 58+ messages in thread
From: Lai Jiangshan @ 2021-11-26 10:11 UTC (permalink / raw)
  To: linux-kernel
  Cc: x86, Lai Jiangshan, Andy Lutomirski, Thomas Gleixner,
	Ingo Molnar, Borislav Petkov, Dave Hansen, H. Peter Anvin,
	Peter Zijlstra, Joerg Roedel, Chang S. Bae

From: Lai Jiangshan <laijs@linux.alibaba.com>

Make fixup_bad_iret() work like sync_regs(), which doesn't move the
return address of error_entry().

This prepares for a later patch which implements the body of
error_entry() in C code; fixup_bad_iret() can't handle the return
address when it is called from C code.

Signed-off-by: Lai Jiangshan <laijs@linux.alibaba.com>
---
 arch/x86/entry/entry_64.S    |  5 ++++-
 arch/x86/include/asm/traps.h |  2 +-
 arch/x86/kernel/traps.c      | 17 ++++++-----------
 3 files changed, 11 insertions(+), 13 deletions(-)

diff --git a/arch/x86/entry/entry_64.S b/arch/x86/entry/entry_64.S
index 5db0196835cd..0d81f9b77367 100644
--- a/arch/x86/entry/entry_64.S
+++ b/arch/x86/entry/entry_64.S
@@ -1045,9 +1045,12 @@ SYM_CODE_START_LOCAL(error_entry)
 	 * Pretend that the exception came from user mode: set up pt_regs
 	 * as if we faulted immediately after IRET.
 	 */
-	mov	%rsp, %rdi
+	popq	%r12				/* save return addr in %12 */
+	movq	%rsp, %rdi			/* arg0 = pt_regs pointer */
 	call	fixup_bad_iret
 	mov	%rax, %rsp
+	ENCODE_FRAME_POINTER
+	pushq	%r12
 	jmp	.Lerror_entry_from_usermode_after_swapgs
 SYM_CODE_END(error_entry)
 
diff --git a/arch/x86/include/asm/traps.h b/arch/x86/include/asm/traps.h
index 6221be7cafc3..1cdd7e8bcba7 100644
--- a/arch/x86/include/asm/traps.h
+++ b/arch/x86/include/asm/traps.h
@@ -13,7 +13,7 @@
 #ifdef CONFIG_X86_64
 asmlinkage __visible notrace struct pt_regs *sync_regs(struct pt_regs *eregs);
 asmlinkage __visible notrace
-struct bad_iret_stack *fixup_bad_iret(struct bad_iret_stack *s);
+struct pt_regs *fixup_bad_iret(struct pt_regs *bad_regs);
 void __init trap_init(void);
 asmlinkage __visible noinstr struct pt_regs *vc_switch_off_ist(struct pt_regs *eregs);
 #endif
diff --git a/arch/x86/kernel/traps.c b/arch/x86/kernel/traps.c
index 1be5c1edad6b..4e9d306f313c 100644
--- a/arch/x86/kernel/traps.c
+++ b/arch/x86/kernel/traps.c
@@ -759,13 +759,8 @@ asmlinkage __visible noinstr struct pt_regs *vc_switch_off_ist(struct pt_regs *r
 }
 #endif
 
-struct bad_iret_stack {
-	void *error_entry_ret;
-	struct pt_regs regs;
-};
-
 asmlinkage __visible noinstr
-struct bad_iret_stack *fixup_bad_iret(struct bad_iret_stack *s)
+struct pt_regs *fixup_bad_iret(struct pt_regs *bad_regs)
 {
 	/*
 	 * This is called from entry_64.S early in handling a fault
@@ -775,19 +770,19 @@ struct bad_iret_stack *fixup_bad_iret(struct bad_iret_stack *s)
 	 * just below the IRET frame) and we want to pretend that the
 	 * exception came from the IRET target.
 	 */
-	struct bad_iret_stack tmp, *new_stack =
-		(struct bad_iret_stack *)__this_cpu_read(cpu_tss_rw.x86_tss.sp0) - 1;
+	struct pt_regs tmp, *new_stack =
+		(struct pt_regs *)__this_cpu_read(cpu_tss_rw.x86_tss.sp0) - 1;
 
 	/* Copy the IRET target to the temporary storage. */
-	__memcpy(&tmp.regs.ip, (void *)s->regs.sp, 5*8);
+	__memcpy(&tmp.ip, (void *)bad_regs->sp, 5*8);
 
 	/* Copy the remainder of the stack from the current stack. */
-	__memcpy(&tmp, s, offsetof(struct bad_iret_stack, regs.ip));
+	__memcpy(&tmp, bad_regs, offsetof(struct pt_regs, ip));
 
 	/* Update the entry stack */
 	__memcpy(new_stack, &tmp, sizeof(tmp));
 
-	BUG_ON(!user_mode(&new_stack->regs));
+	BUG_ON(!user_mode(new_stack));
 	return new_stack;
 }
 #endif
-- 
2.19.1.6.gb485710b



* [PATCH V6 18/49] x86/entry: Switch the stack after error_entry() returns
  2021-11-26 10:11 [PATCH V6 00/49] x86/entry/64: Convert a bunch of ASM entry code into C code Lai Jiangshan
                   ` (16 preceding siblings ...)
  2021-11-26 10:11 ` [PATCH V6 17/49] x86/traps: Move pt_regs only in fixup_bad_iret() Lai Jiangshan
@ 2021-11-26 10:11 ` Lai Jiangshan
  2021-11-26 10:11 ` [PATCH V6 19/49] x86/entry: move PUSH_AND_CLEAR_REGS out of error_entry Lai Jiangshan
                   ` (32 subsequent siblings)
  50 siblings, 0 replies; 58+ messages in thread
From: Lai Jiangshan @ 2021-11-26 10:11 UTC (permalink / raw)
  To: linux-kernel
  Cc: x86, Lai Jiangshan, Andy Lutomirski, Thomas Gleixner,
	Ingo Molnar, Borislav Petkov, Dave Hansen, H. Peter Anvin

From: Lai Jiangshan <laijs@linux.alibaba.com>

error_entry() calls sync_regs() to settle/copy the pt_regs and switches
the stack directly after sync_regs().  But because error_entry() is
itself entered via a call instruction, the stack switch also has to
handle the return address, which makes the behavior tangled.

Switching the stack after error_entry() returns makes the code simpler
and more intuitive.

Signed-off-by: Lai Jiangshan <laijs@linux.alibaba.com>
---
 arch/x86/entry/entry_64.S | 16 ++++++----------
 1 file changed, 6 insertions(+), 10 deletions(-)

diff --git a/arch/x86/entry/entry_64.S b/arch/x86/entry/entry_64.S
index 0d81f9b77367..e5d69604322d 100644
--- a/arch/x86/entry/entry_64.S
+++ b/arch/x86/entry/entry_64.S
@@ -323,6 +323,8 @@ SYM_CODE_END(ret_from_fork)
 .macro idtentry_body cfunc has_error_code:req
 
 	call	error_entry
+	movq	%rax, %rsp			/* switch stack settled by sync_regs() */
+	ENCODE_FRAME_POINTER
 	UNWIND_HINT_REGS
 
 	movq	%rsp, %rdi			/* pt_regs pointer into 1st argument*/
@@ -985,14 +987,10 @@ SYM_CODE_START_LOCAL(error_entry)
 	/* We have user CR3.  Change to kernel CR3. */
 	SWITCH_TO_KERNEL_CR3 scratch_reg=%rax
 
+	leaq	8(%rsp), %rdi			/* arg0 = pt_regs pointer */
 .Lerror_entry_from_usermode_after_swapgs:
 	/* Put us onto the real thread stack. */
-	popq	%r12				/* save return addr in %12 */
-	movq	%rsp, %rdi			/* arg0 = pt_regs pointer */
 	call	sync_regs
-	movq	%rax, %rsp			/* switch stack */
-	ENCODE_FRAME_POINTER
-	pushq	%r12
 	ret
 
 	/*
@@ -1025,6 +1023,7 @@ SYM_CODE_START_LOCAL(error_entry)
 	 */
 .Lerror_entry_done_lfence:
 	FENCE_SWAPGS_KERNEL_ENTRY
+	leaq	8(%rsp), %rax			/* return pt_regs pointer */
 	ret
 
 .Lbstep_iret:
@@ -1045,12 +1044,9 @@ SYM_CODE_START_LOCAL(error_entry)
 	 * Pretend that the exception came from user mode: set up pt_regs
 	 * as if we faulted immediately after IRET.
 	 */
-	popq	%r12				/* save return addr in %12 */
-	movq	%rsp, %rdi			/* arg0 = pt_regs pointer */
+	leaq	8(%rsp), %rdi			/* arg0 = pt_regs pointer */
 	call	fixup_bad_iret
-	mov	%rax, %rsp
-	ENCODE_FRAME_POINTER
-	pushq	%r12
+	mov	%rax, %rdi
 	jmp	.Lerror_entry_from_usermode_after_swapgs
 SYM_CODE_END(error_entry)
 
-- 
2.19.1.6.gb485710b



* [PATCH V6 19/49] x86/entry: move PUSH_AND_CLEAR_REGS out of error_entry
  2021-11-26 10:11 [PATCH V6 00/49] x86/entry/64: Convert a bunch of ASM entry code into C code Lai Jiangshan
                   ` (17 preceding siblings ...)
  2021-11-26 10:11 ` [PATCH V6 18/49] x86/entry: Switch the stack after error_entry() returns Lai Jiangshan
@ 2021-11-26 10:11 ` Lai Jiangshan
  2021-11-26 10:11 ` [PATCH V6 20/49] x86/entry: Move cld to the start of idtentry Lai Jiangshan
                   ` (31 subsequent siblings)
  50 siblings, 0 replies; 58+ messages in thread
From: Lai Jiangshan @ 2021-11-26 10:11 UTC (permalink / raw)
  To: linux-kernel
  Cc: x86, Lai Jiangshan, Andy Lutomirski, Thomas Gleixner,
	Ingo Molnar, Borislav Petkov, Dave Hansen, H. Peter Anvin

From: Lai Jiangshan <laijs@linux.alibaba.com>

Moving PUSH_AND_CLEAR_REGS out of error_entry doesn't change any
functionality, but it does enlarge the code size:

size arch/x86/entry/entry_64.o.before:
   text	   data	    bss	    dec	    hex	filename
  17916	    384	      0	  18300	   477c	arch/x86/entry/entry_64.o

size --format=SysV arch/x86/entry/entry_64.o.before:
.entry.text                      5528      0
.orc_unwind                      6456      0
.orc_unwind_ip                   4304      0

size arch/x86/entry/entry_64.o.after:
   text	   data	    bss	    dec	    hex	filename
  26868	    384	      0	  27252	   6a74	arch/x86/entry/entry_64.o

size --format=SysV arch/x86/entry/entry_64.o.after:
.entry.text                      8200      0
.orc_unwind                     10224      0
.orc_unwind_ip                   6816      0

But .entry.text on x86_64 is 2MB aligned, so growing it from ~5.5KB to
~8.2KB doesn't enlarge the final text size.

The .orc_unwind[_ip] tables grow because every idtentry now carries its
own copy of the register pushes.

This prepares for converting the whole error_entry() into C code.

Signed-off-by: Lai Jiangshan <laijs@linux.alibaba.com>
---
 arch/x86/entry/entry_64.S | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/arch/x86/entry/entry_64.S b/arch/x86/entry/entry_64.S
index e5d69604322d..4781ffbe39ba 100644
--- a/arch/x86/entry/entry_64.S
+++ b/arch/x86/entry/entry_64.S
@@ -322,6 +322,9 @@ SYM_CODE_END(ret_from_fork)
  */
 .macro idtentry_body cfunc has_error_code:req
 
+	PUSH_AND_CLEAR_REGS
+	ENCODE_FRAME_POINTER
+
 	call	error_entry
 	movq	%rax, %rsp			/* switch stack settled by sync_regs() */
 	ENCODE_FRAME_POINTER
@@ -973,8 +976,6 @@ SYM_CODE_END(paranoid_exit)
 SYM_CODE_START_LOCAL(error_entry)
 	UNWIND_HINT_FUNC
 	cld
-	PUSH_AND_CLEAR_REGS save_ret=1
-	ENCODE_FRAME_POINTER 8
 	testb	$3, CS+8(%rsp)
 	jz	.Lerror_kernelspace
 
-- 
2.19.1.6.gb485710b


^ permalink raw reply related	[flat|nested] 58+ messages in thread

* [PATCH V6 20/49] x86/entry: Move cld to the start of idtentry
  2021-11-26 10:11 [PATCH V6 00/49] x86/entry/64: Convert a bunch of ASM entry code into C code Lai Jiangshan
                   ` (18 preceding siblings ...)
  2021-11-26 10:11 ` [PATCH V6 19/49] x86/entry: move PUSH_AND_CLEAR_REGS out of error_entry Lai Jiangshan
@ 2021-11-26 10:11 ` Lai Jiangshan
  2021-11-26 10:11 ` [PATCH V6 21/49] x86/entry: Don't call error_entry for XENPV Lai Jiangshan
                   ` (30 subsequent siblings)
  50 siblings, 0 replies; 58+ messages in thread
From: Lai Jiangshan @ 2021-11-26 10:11 UTC (permalink / raw)
  To: linux-kernel
  Cc: x86, Lai Jiangshan, Andy Lutomirski, Thomas Gleixner,
	Ingo Molnar, Borislav Petkov, Dave Hansen, H. Peter Anvin

From: Lai Jiangshan <laijs@linux.alibaba.com>

Move CLD to the start of the idtentry code so that it sits right next
to ASM_CLAC.

Suggested-by: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Lai Jiangshan <laijs@linux.alibaba.com>
---
 arch/x86/entry/entry_64.S | 8 +++++---
 1 file changed, 5 insertions(+), 3 deletions(-)

diff --git a/arch/x86/entry/entry_64.S b/arch/x86/entry/entry_64.S
index 4781ffbe39ba..09bd77e49249 100644
--- a/arch/x86/entry/entry_64.S
+++ b/arch/x86/entry/entry_64.S
@@ -356,6 +356,7 @@ SYM_CODE_END(ret_from_fork)
 SYM_CODE_START(\asmsym)
 	UNWIND_HINT_IRET_REGS offset=\has_error_code*8
 	ASM_CLAC
+	cld
 
 	.if \has_error_code == 0
 		pushq	$-1			/* ORIG_RAX: no syscall to restart */
@@ -423,6 +424,7 @@ SYM_CODE_END(\asmsym)
 SYM_CODE_START(\asmsym)
 	UNWIND_HINT_IRET_REGS
 	ASM_CLAC
+	cld
 
 	pushq	$-1			/* ORIG_RAX: no syscall to restart */
 
@@ -478,6 +480,7 @@ SYM_CODE_END(\asmsym)
 SYM_CODE_START(\asmsym)
 	UNWIND_HINT_IRET_REGS
 	ASM_CLAC
+	cld
 
 	/*
 	 * If the entry is from userspace, switch stacks and treat it as
@@ -539,6 +542,7 @@ SYM_CODE_END(\asmsym)
 SYM_CODE_START(\asmsym)
 	UNWIND_HINT_IRET_REGS offset=8
 	ASM_CLAC
+	cld
 
 	/* paranoid_entry returns GS information for paranoid_exit in EBX. */
 	call	paranoid_entry
@@ -853,7 +857,6 @@ SYM_CODE_END(xen_failsafe_callback)
  */
 SYM_CODE_START_LOCAL(paranoid_entry)
 	UNWIND_HINT_FUNC
-	cld
 	PUSH_AND_CLEAR_REGS save_ret=1
 	ENCODE_FRAME_POINTER 8
 
@@ -975,7 +978,6 @@ SYM_CODE_END(paranoid_exit)
  */
 SYM_CODE_START_LOCAL(error_entry)
 	UNWIND_HINT_FUNC
-	cld
 	testb	$3, CS+8(%rsp)
 	jz	.Lerror_kernelspace
 
@@ -1109,6 +1111,7 @@ SYM_CODE_START(asm_exc_nmi)
 	 */
 
 	ASM_CLAC
+	cld
 
 	/* Use %rdx as our temp variable throughout */
 	pushq	%rdx
@@ -1128,7 +1131,6 @@ SYM_CODE_START(asm_exc_nmi)
 	 */
 
 	swapgs
-	cld
 	FENCE_SWAPGS_USER_ENTRY
 	SWITCH_TO_KERNEL_CR3 scratch_reg=%rdx
 	movq	%rsp, %rdx
-- 
2.19.1.6.gb485710b


^ permalink raw reply related	[flat|nested] 58+ messages in thread

* [PATCH V6 21/49] x86/entry: Don't call error_entry for XENPV
  2021-11-26 10:11 [PATCH V6 00/49] x86/entry/64: Convert a bunch of ASM entry code into C code Lai Jiangshan
                   ` (19 preceding siblings ...)
  2021-11-26 10:11 ` [PATCH V6 20/49] x86/entry: Move cld to the start of idtentry Lai Jiangshan
@ 2021-11-26 10:11 ` Lai Jiangshan
  2021-11-26 10:11 ` [PATCH V6 22/49] x86/entry: Convert SWAPGS to swapgs in error_entry() Lai Jiangshan
                   ` (29 subsequent siblings)
  50 siblings, 0 replies; 58+ messages in thread
From: Lai Jiangshan @ 2021-11-26 10:11 UTC (permalink / raw)
  To: linux-kernel
  Cc: x86, Lai Jiangshan, Andy Lutomirski, Thomas Gleixner,
	Ingo Molnar, Borislav Petkov, Dave Hansen, H. Peter Anvin

From: Lai Jiangshan <laijs@linux.alibaba.com>

When running as a XENPV guest, the kernel is already on the task stack,
and it can't fault at native_irq_return_iret nor
asm_load_gs_index_gs_change since XENPV uses its own pvops for iret and
load_gs_index().  It also doesn't need to switch CR3, so it can skip
invoking error_entry().

Signed-off-by: Lai Jiangshan <laijs@linux.alibaba.com>
---
 arch/x86/entry/entry_64.S | 13 +++++++++++--
 1 file changed, 11 insertions(+), 2 deletions(-)

diff --git a/arch/x86/entry/entry_64.S b/arch/x86/entry/entry_64.S
index 09bd77e49249..c09e5a4dfbbf 100644
--- a/arch/x86/entry/entry_64.S
+++ b/arch/x86/entry/entry_64.S
@@ -325,8 +325,17 @@ SYM_CODE_END(ret_from_fork)
 	PUSH_AND_CLEAR_REGS
 	ENCODE_FRAME_POINTER
 
-	call	error_entry
-	movq	%rax, %rsp			/* switch stack settled by sync_regs() */
+	/*
+	 * Call error_entry and switch stack settled by sync_regs().
+	 *
+	 * When in XENPV, it is already in the task stack, and it can't fault
+	 * at native_irq_return_iret nor asm_load_gs_index_gs_change since
+	 * XENPV uses its own pvops for iret and load_gs_index().  And it
+	 * doesn't need to switch CR3.  So it can skip invoking error_entry().
+	 */
+	ALTERNATIVE "call error_entry; movq %rax, %rsp", \
+		"", X86_FEATURE_XENPV
+
 	ENCODE_FRAME_POINTER
 	UNWIND_HINT_REGS
 
-- 
2.19.1.6.gb485710b


^ permalink raw reply related	[flat|nested] 58+ messages in thread

* [PATCH V6 22/49] x86/entry: Convert SWAPGS to swapgs in error_entry()
  2021-11-26 10:11 [PATCH V6 00/49] x86/entry/64: Convert a bunch of ASM entry code into C code Lai Jiangshan
                   ` (20 preceding siblings ...)
  2021-11-26 10:11 ` [PATCH V6 21/49] x86/entry: Don't call error_entry for XENPV Lai Jiangshan
@ 2021-11-26 10:11 ` Lai Jiangshan
  2021-11-26 10:11 ` [PATCH V6 23/49] x86/entry: Implement the whole error_entry() as C code Lai Jiangshan
                   ` (28 subsequent siblings)
  50 siblings, 0 replies; 58+ messages in thread
From: Lai Jiangshan @ 2021-11-26 10:11 UTC (permalink / raw)
  To: linux-kernel
  Cc: x86, Lai Jiangshan, Andy Lutomirski, Thomas Gleixner,
	Ingo Molnar, Borislav Petkov, Dave Hansen, H. Peter Anvin

From: Lai Jiangshan <laijs@linux.alibaba.com>

XENPV doesn't use error_entry() anymore, so the pv-aware SWAPGS can be
changed to native swapgs.

This prepares for a later patch that converts error_entry() to C code,
which uses native_swapgs() directly.  Converting SWAPGS to swapgs in the
ASM error_entry() first ensures that the later conversion has zero
semantic change.

Signed-off-by: Lai Jiangshan <laijs@linux.alibaba.com>
---
 arch/x86/entry/entry_64.S | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/arch/x86/entry/entry_64.S b/arch/x86/entry/entry_64.S
index c09e5a4dfbbf..4d88cd0c46c6 100644
--- a/arch/x86/entry/entry_64.S
+++ b/arch/x86/entry/entry_64.S
@@ -994,7 +994,7 @@ SYM_CODE_START_LOCAL(error_entry)
 	 * We entered from user mode or we're pretending to have entered
 	 * from user mode due to an IRET fault.
 	 */
-	SWAPGS
+	swapgs
 	FENCE_SWAPGS_USER_ENTRY
 	/* We have user CR3.  Change to kernel CR3. */
 	SWITCH_TO_KERNEL_CR3 scratch_reg=%rax
@@ -1026,7 +1026,7 @@ SYM_CODE_START_LOCAL(error_entry)
 	 * gsbase and proceed.  We'll fix up the exception and land in
 	 * .Lgs_change's error handler with kernel gsbase.
 	 */
-	SWAPGS
+	swapgs
 
 	/*
 	 * The above code has no serializing instruction.  So do an lfence
@@ -1048,7 +1048,7 @@ SYM_CODE_START_LOCAL(error_entry)
 	 * We came from an IRET to user mode, so we have user
 	 * gsbase and CR3.  Switch to kernel gsbase and CR3:
 	 */
-	SWAPGS
+	swapgs
 	FENCE_SWAPGS_USER_ENTRY
 	SWITCH_TO_KERNEL_CR3 scratch_reg=%rax
 
-- 
2.19.1.6.gb485710b


^ permalink raw reply related	[flat|nested] 58+ messages in thread

* [PATCH V6 23/49] x86/entry: Implement the whole error_entry() as C code
  2021-11-26 10:11 [PATCH V6 00/49] x86/entry/64: Convert a bunch of ASM entry code into C code Lai Jiangshan
                   ` (21 preceding siblings ...)
  2021-11-26 10:11 ` [PATCH V6 22/49] x86/entry: Convert SWAPGS to swapgs in error_entry() Lai Jiangshan
@ 2021-11-26 10:11 ` Lai Jiangshan
  2021-11-26 10:11 ` [PATCH V6 24/49] x86/entry: Use idtentry macro for entry_INT80_compat Lai Jiangshan
                   ` (27 subsequent siblings)
  50 siblings, 0 replies; 58+ messages in thread
From: Lai Jiangshan @ 2021-11-26 10:11 UTC (permalink / raw)
  To: linux-kernel
  Cc: x86, Lai Jiangshan, Andy Lutomirski, Thomas Gleixner,
	Ingo Molnar, Borislav Petkov, Dave Hansen, H. Peter Anvin,
	Peter Zijlstra

From: Lai Jiangshan <laijs@linux.alibaba.com>

All the needed facilities are now in place in entry64.c, so the whole
error_entry() can be implemented in C there.  The C version is generally
more readable and easier to update/improve.

No functional change intended.

Signed-off-by: Lai Jiangshan <laijs@linux.alibaba.com>
---
 arch/x86/entry/entry64.c     | 68 ++++++++++++++++++++++++++++++
 arch/x86/entry/entry_64.S    | 82 +-----------------------------------
 arch/x86/include/asm/traps.h |  1 +
 3 files changed, 70 insertions(+), 81 deletions(-)

diff --git a/arch/x86/entry/entry64.c b/arch/x86/entry/entry64.c
index 3db503ea0703..0dc63ae8153a 100644
--- a/arch/x86/entry/entry64.c
+++ b/arch/x86/entry/entry64.c
@@ -68,3 +68,71 @@ static __always_inline void user_entry_swapgs_and_fence(void)
 	native_swapgs();
 	fence_swapgs_user_entry();
 }
+
+/*
+ * Put pt_regs onto the task stack and switch GS and CR3 if needed.
+ * The actual stack switch is done in entry_64.S.
+ *
+ * Be careful, it might be in the user CR3 and user GS base at the start
+ * of the function.
+ */
+asmlinkage __visible __entry_text
+struct pt_regs *error_entry(struct pt_regs *eregs)
+{
+	unsigned long iret_ip = (unsigned long)native_irq_return_iret;
+
+	if (user_mode(eregs)) {
+		/*
+		 * We entered from user mode.
+		 * Switch to kernel gsbase and CR3.
+		 */
+		user_entry_swapgs_and_fence();
+		switch_to_kernel_cr3();
+
+		/* Put pt_regs onto the task stack. */
+		return sync_regs(eregs);
+	}
+
+	/*
+	 * There are two places in the kernel that can potentially fault with
+	 * usergs. Handle them here.  B stepping K8s sometimes report a
+	 * truncated RIP for IRET exceptions returning to compat mode. Check
+	 * for these here too.
+	 */
+	if ((eregs->ip == iret_ip) || (eregs->ip == (unsigned int)iret_ip)) {
+		eregs->ip = iret_ip; /* Fix truncated RIP */
+
+		/*
+		 * We came from an IRET to user mode, so we have user
+		 * gsbase and CR3.  Switch to kernel gsbase and CR3:
+		 */
+		user_entry_swapgs_and_fence();
+		switch_to_kernel_cr3();
+
+		/*
+		 * Pretend that the exception came from user mode: set up
+		 * pt_regs as if we faulted immediately after IRET and put
+		 * pt_regs onto the real task stack.
+		 */
+		return sync_regs(fixup_bad_iret(eregs));
+	}
+
+	/*
+	 * Hack: asm_load_gs_index_gs_change can fail with user gsbase.
+	 * If this happens, fix up gsbase and proceed.  We'll fix up the
+	 * exception and land in asm_load_gs_index_gs_change's error
+	 * handler with kernel gsbase.
+	 */
+	if (eregs->ip == (unsigned long)asm_load_gs_index_gs_change)
+		native_swapgs();
+
+	/*
+	 * The above code has no serializing instruction.  So do an lfence
+	 * to prevent GS speculation, regardless of whether it is kernel
+	 * gsbase or user gsbase.
+	 */
+	fence_swapgs_kernel_entry();
+
+	/* Enter from kernel, don't move pt_regs */
+	return eregs;
+}
diff --git a/arch/x86/entry/entry_64.S b/arch/x86/entry/entry_64.S
index 4d88cd0c46c6..16b2215bdb23 100644
--- a/arch/x86/entry/entry_64.S
+++ b/arch/x86/entry/entry_64.S
@@ -333,7 +333,7 @@ SYM_CODE_END(ret_from_fork)
 	 * XENPV uses its own pvops for iret and load_gs_index().  And it
 	 * doesn't need to switch CR3.  So it can skip invoking error_entry().
 	 */
-	ALTERNATIVE "call error_entry; movq %rax, %rsp", \
+	ALTERNATIVE "movq %rsp, %rdi; call error_entry; movq %rax, %rsp", \
 		"", X86_FEATURE_XENPV
 
 	ENCODE_FRAME_POINTER
@@ -982,86 +982,6 @@ SYM_CODE_START_LOCAL(paranoid_exit)
 	jmp		restore_regs_and_return_to_kernel
 SYM_CODE_END(paranoid_exit)
 
-/*
- * Save all registers in pt_regs, and switch GS if needed.
- */
-SYM_CODE_START_LOCAL(error_entry)
-	UNWIND_HINT_FUNC
-	testb	$3, CS+8(%rsp)
-	jz	.Lerror_kernelspace
-
-	/*
-	 * We entered from user mode or we're pretending to have entered
-	 * from user mode due to an IRET fault.
-	 */
-	swapgs
-	FENCE_SWAPGS_USER_ENTRY
-	/* We have user CR3.  Change to kernel CR3. */
-	SWITCH_TO_KERNEL_CR3 scratch_reg=%rax
-
-	leaq	8(%rsp), %rdi			/* arg0 = pt_regs pointer */
-.Lerror_entry_from_usermode_after_swapgs:
-	/* Put us onto the real thread stack. */
-	call	sync_regs
-	ret
-
-	/*
-	 * There are two places in the kernel that can potentially fault with
-	 * usergs. Handle them here.  B stepping K8s sometimes report a
-	 * truncated RIP for IRET exceptions returning to compat mode. Check
-	 * for these here too.
-	 */
-.Lerror_kernelspace:
-	leaq	native_irq_return_iret(%rip), %rcx
-	cmpq	%rcx, RIP+8(%rsp)
-	je	.Lerror_bad_iret
-	movl	%ecx, %eax			/* zero extend */
-	cmpq	%rax, RIP+8(%rsp)
-	je	.Lbstep_iret
-	cmpq	$asm_load_gs_index_gs_change, RIP+8(%rsp)
-	jne	.Lerror_entry_done_lfence
-
-	/*
-	 * hack: .Lgs_change can fail with user gsbase.  If this happens, fix up
-	 * gsbase and proceed.  We'll fix up the exception and land in
-	 * .Lgs_change's error handler with kernel gsbase.
-	 */
-	swapgs
-
-	/*
-	 * The above code has no serializing instruction.  So do an lfence
-	 * to prevent GS speculation, regardless of whether it is kernel
-	 * gsbase or user gsbase.
-	 */
-.Lerror_entry_done_lfence:
-	FENCE_SWAPGS_KERNEL_ENTRY
-	leaq	8(%rsp), %rax			/* return pt_regs pointer */
-	ret
-
-.Lbstep_iret:
-	/* Fix truncated RIP */
-	movq	%rcx, RIP+8(%rsp)
-	/* fall through */
-
-.Lerror_bad_iret:
-	/*
-	 * We came from an IRET to user mode, so we have user
-	 * gsbase and CR3.  Switch to kernel gsbase and CR3:
-	 */
-	swapgs
-	FENCE_SWAPGS_USER_ENTRY
-	SWITCH_TO_KERNEL_CR3 scratch_reg=%rax
-
-	/*
-	 * Pretend that the exception came from user mode: set up pt_regs
-	 * as if we faulted immediately after IRET.
-	 */
-	leaq	8(%rsp), %rdi			/* arg0 = pt_regs pointer */
-	call	fixup_bad_iret
-	mov	%rax, %rdi
-	jmp	.Lerror_entry_from_usermode_after_swapgs
-SYM_CODE_END(error_entry)
-
 SYM_CODE_START_LOCAL(error_return)
 	UNWIND_HINT_REGS
 	DEBUG_ENTRY_ASSERT_IRQS_OFF
diff --git a/arch/x86/include/asm/traps.h b/arch/x86/include/asm/traps.h
index 1cdd7e8bcba7..686461ac9803 100644
--- a/arch/x86/include/asm/traps.h
+++ b/arch/x86/include/asm/traps.h
@@ -14,6 +14,7 @@
 asmlinkage __visible notrace struct pt_regs *sync_regs(struct pt_regs *eregs);
 asmlinkage __visible notrace
 struct pt_regs *fixup_bad_iret(struct pt_regs *bad_regs);
+asmlinkage __visible notrace struct pt_regs *error_entry(struct pt_regs *eregs);
 void __init trap_init(void);
 asmlinkage __visible noinstr struct pt_regs *vc_switch_off_ist(struct pt_regs *eregs);
 #endif
-- 
2.19.1.6.gb485710b


^ permalink raw reply related	[flat|nested] 58+ messages in thread

* [PATCH V6 24/49] x86/entry: Use idtentry macro for entry_INT80_compat
  2021-11-26 10:11 [PATCH V6 00/49] x86/entry/64: Convert a bunch of ASM entry code into C code Lai Jiangshan
                   ` (22 preceding siblings ...)
  2021-11-26 10:11 ` [PATCH V6 23/49] x86/entry: Implement the whole error_entry() as C code Lai Jiangshan
@ 2021-11-26 10:11 ` Lai Jiangshan
  2021-11-26 10:11 ` [PATCH V6 25/49] x86/entry: Convert SWAPGS to swapgs in entry_SYSENTER_compat() Lai Jiangshan
                   ` (26 subsequent siblings)
  50 siblings, 0 replies; 58+ messages in thread
From: Lai Jiangshan @ 2021-11-26 10:11 UTC (permalink / raw)
  To: linux-kernel
  Cc: x86, Lai Jiangshan, Andy Lutomirski, Thomas Gleixner,
	Ingo Molnar, Borislav Petkov, Dave Hansen, H. Peter Anvin,
	Juergen Gross, Peter Zijlstra (Intel),
	Joerg Roedel, Chang S. Bae, Jan Kiszka

From: Lai Jiangshan <laijs@linux.alibaba.com>

entry_INT80_compat is identical to the idtentry macro except for some
special handling of %rax in the prologue.

Add that prologue handling to idtentry and use idtentry for
entry_INT80_compat.

Signed-off-by: Lai Jiangshan <laijs@linux.alibaba.com>
---
 arch/x86/entry/entry_64.S        |  18 ++++++
 arch/x86/entry/entry_64_compat.S | 102 -------------------------------
 arch/x86/include/asm/idtentry.h  |  47 ++++++++++++++
 arch/x86/include/asm/proto.h     |   4 --
 4 files changed, 65 insertions(+), 106 deletions(-)

diff --git a/arch/x86/entry/entry_64.S b/arch/x86/entry/entry_64.S
index 16b2215bdb23..dd0cb43627a3 100644
--- a/arch/x86/entry/entry_64.S
+++ b/arch/x86/entry/entry_64.S
@@ -371,6 +371,24 @@ SYM_CODE_START(\asmsym)
 		pushq	$-1			/* ORIG_RAX: no syscall to restart */
 	.endif
 
+	.if \vector == IA32_SYSCALL_VECTOR
+		/*
+		 * User tracing code (ptrace or signal handlers) might assume
+		 * that the saved RAX contains a 32-bit number when we're
+		 * invoking a 32-bit syscall.  Just in case the high bits are
+		 * nonzero, zero-extend the syscall number.  (This could almost
+		 * certainly be deleted with no ill effects.)
+		 */
+		movl	%eax, %eax
+
+		/*
+		 * do_int80_syscall_32() expects regs->orig_ax to be user ax,
+		 * and regs->ax to be $-ENOSYS.
+		 */
+		movq	%rax, (%rsp)
+		movq	$-ENOSYS, %rax
+	.endif
+
 	.if \vector == X86_TRAP_BP
 		/*
 		 * If coming from kernel space, create a 6-word gap to allow the
diff --git a/arch/x86/entry/entry_64_compat.S b/arch/x86/entry/entry_64_compat.S
index 0051cf5c792d..a4fcea0cab14 100644
--- a/arch/x86/entry/entry_64_compat.S
+++ b/arch/x86/entry/entry_64_compat.S
@@ -311,105 +311,3 @@ sysret32_from_system_call:
 	swapgs
 	sysretl
 SYM_CODE_END(entry_SYSCALL_compat)
-
-/*
- * 32-bit legacy system call entry.
- *
- * 32-bit x86 Linux system calls traditionally used the INT $0x80
- * instruction.  INT $0x80 lands here.
- *
- * This entry point can be used by 32-bit and 64-bit programs to perform
- * 32-bit system calls.  Instances of INT $0x80 can be found inline in
- * various programs and libraries.  It is also used by the vDSO's
- * __kernel_vsyscall fallback for hardware that doesn't support a faster
- * entry method.  Restarted 32-bit system calls also fall back to INT
- * $0x80 regardless of what instruction was originally used to do the
- * system call.
- *
- * This is considered a slow path.  It is not used by most libc
- * implementations on modern hardware except during process startup.
- *
- * Arguments:
- * eax  system call number
- * ebx  arg1
- * ecx  arg2
- * edx  arg3
- * esi  arg4
- * edi  arg5
- * ebp  arg6
- */
-SYM_CODE_START(entry_INT80_compat)
-	UNWIND_HINT_EMPTY
-	/*
-	 * Interrupts are off on entry.
-	 */
-	ASM_CLAC			/* Do this early to minimize exposure */
-	SWAPGS
-
-	/*
-	 * User tracing code (ptrace or signal handlers) might assume that
-	 * the saved RAX contains a 32-bit number when we're invoking a 32-bit
-	 * syscall.  Just in case the high bits are nonzero, zero-extend
-	 * the syscall number.  (This could almost certainly be deleted
-	 * with no ill effects.)
-	 */
-	movl	%eax, %eax
-
-	/* switch to thread stack expects orig_ax and rdi to be pushed */
-	pushq	%rax			/* pt_regs->orig_ax */
-	pushq	%rdi			/* pt_regs->di */
-
-	/* Need to switch before accessing the thread stack. */
-	SWITCH_TO_KERNEL_CR3 scratch_reg=%rdi
-
-	/* In the Xen PV case we already run on the thread stack. */
-	ALTERNATIVE "", "jmp .Lint80_keep_stack", X86_FEATURE_XENPV
-
-	movq	%rsp, %rdi
-	movq	PER_CPU_VAR(cpu_current_top_of_stack), %rsp
-
-	pushq	6*8(%rdi)		/* regs->ss */
-	pushq	5*8(%rdi)		/* regs->rsp */
-	pushq	4*8(%rdi)		/* regs->eflags */
-	pushq	3*8(%rdi)		/* regs->cs */
-	pushq	2*8(%rdi)		/* regs->ip */
-	pushq	1*8(%rdi)		/* regs->orig_ax */
-	pushq	(%rdi)			/* pt_regs->di */
-.Lint80_keep_stack:
-
-	pushq	%rsi			/* pt_regs->si */
-	xorl	%esi, %esi		/* nospec   si */
-	pushq	%rdx			/* pt_regs->dx */
-	xorl	%edx, %edx		/* nospec   dx */
-	pushq	%rcx			/* pt_regs->cx */
-	xorl	%ecx, %ecx		/* nospec   cx */
-	pushq	$-ENOSYS		/* pt_regs->ax */
-	pushq   %r8			/* pt_regs->r8 */
-	xorl	%r8d, %r8d		/* nospec   r8 */
-	pushq   %r9			/* pt_regs->r9 */
-	xorl	%r9d, %r9d		/* nospec   r9 */
-	pushq   %r10			/* pt_regs->r10*/
-	xorl	%r10d, %r10d		/* nospec   r10 */
-	pushq   %r11			/* pt_regs->r11 */
-	xorl	%r11d, %r11d		/* nospec   r11 */
-	pushq   %rbx                    /* pt_regs->rbx */
-	xorl	%ebx, %ebx		/* nospec   rbx */
-	pushq   %rbp                    /* pt_regs->rbp */
-	xorl	%ebp, %ebp		/* nospec   rbp */
-	pushq   %r12                    /* pt_regs->r12 */
-	xorl	%r12d, %r12d		/* nospec   r12 */
-	pushq   %r13                    /* pt_regs->r13 */
-	xorl	%r13d, %r13d		/* nospec   r13 */
-	pushq   %r14                    /* pt_regs->r14 */
-	xorl	%r14d, %r14d		/* nospec   r14 */
-	pushq   %r15                    /* pt_regs->r15 */
-	xorl	%r15d, %r15d		/* nospec   r15 */
-
-	UNWIND_HINT_REGS
-
-	cld
-
-	movq	%rsp, %rdi
-	call	do_int80_syscall_32
-	jmp	swapgs_restore_regs_and_return_to_usermode
-SYM_CODE_END(entry_INT80_compat)
diff --git a/arch/x86/include/asm/idtentry.h b/arch/x86/include/asm/idtentry.h
index 6779def97591..49fabc3e3f0d 100644
--- a/arch/x86/include/asm/idtentry.h
+++ b/arch/x86/include/asm/idtentry.h
@@ -207,6 +207,20 @@ __visible noinstr void func(struct pt_regs *regs,			\
 									\
 static noinline void __##func(struct pt_regs *regs, u32 vector)
 
+/**
+ * DECLARE_IDTENTRY_IA32_EMULATION - Declare functions for int80
+ * @vector:	Vector number (ignored for C)
+ * @asm_func:	Function name of the entry point
+ * @cfunc:	The C handler called from the ASM entry point (ignored for C)
+ *
+ * Declares two functions:
+ * - The ASM entry point: asm_func
+ * - The XEN PV trap entry point: xen_##asm_func (maybe unused)
+ */
+#define DECLARE_IDTENTRY_IA32_EMULATION(vector, asm_func, cfunc)	\
+	asmlinkage void asm_func(void);					\
+	asmlinkage void xen_##asm_func(void)
+
 /**
  * DECLARE_IDTENTRY_SYSVEC - Declare functions for system vector entry points
  * @vector:	Vector number (ignored for C)
@@ -433,6 +447,35 @@ __visible noinstr void func(struct pt_regs *regs,			\
 #define DECLARE_IDTENTRY_ERRORCODE(vector, func)			\
 	idtentry vector asm_##func func has_error_code=1
 
+/*
+ * 32-bit legacy system call entry.
+ *
+ * 32-bit x86 Linux system calls traditionally used the INT $0x80
+ * instruction.  INT $0x80 lands here.
+ *
+ * This entry point can be used by 32-bit and 64-bit programs to perform
+ * 32-bit system calls.  Instances of INT $0x80 can be found inline in
+ * various programs and libraries.  It is also used by the vDSO's
+ * __kernel_vsyscall fallback for hardware that doesn't support a faster
+ * entry method.  Restarted 32-bit system calls also fall back to INT
+ * $0x80 regardless of what instruction was originally used to do the
+ * system call.
+ *
+ * This is considered a slow path.  It is not used by most libc
+ * implementations on modern hardware except during process startup.
+ *
+ * Arguments:
+ * eax  system call number
+ * ebx  arg1
+ * ecx  arg2
+ * edx  arg3
+ * esi  arg4
+ * edi  arg5
+ * ebp  arg6
+ */
+#define DECLARE_IDTENTRY_IA32_EMULATION(vector, asm_func, cfunc)	\
+	idtentry vector asm_func cfunc has_error_code=0
+
 /* Special case for 32bit IRET 'trap'. Do not emit ASM code */
 #define DECLARE_IDTENTRY_SW(vector, func)
 
@@ -634,6 +677,10 @@ DECLARE_IDTENTRY_IRQ(X86_TRAP_OTHER,	common_interrupt);
 DECLARE_IDTENTRY_IRQ(X86_TRAP_OTHER,	spurious_interrupt);
 #endif
 
+#ifdef CONFIG_IA32_EMULATION
+DECLARE_IDTENTRY_IA32_EMULATION(IA32_SYSCALL_VECTOR,	entry_INT80_compat, do_int80_syscall_32);
+#endif
+
 /* System vector entry points */
 #ifdef CONFIG_X86_LOCAL_APIC
 DECLARE_IDTENTRY_SYSVEC(ERROR_APIC_VECTOR,		sysvec_error_interrupt);
diff --git a/arch/x86/include/asm/proto.h b/arch/x86/include/asm/proto.h
index 33ae276c8b34..597c767091cb 100644
--- a/arch/x86/include/asm/proto.h
+++ b/arch/x86/include/asm/proto.h
@@ -29,10 +29,6 @@ void entry_SYSENTER_compat(void);
 void __end_entry_SYSENTER_compat(void);
 void entry_SYSCALL_compat(void);
 void entry_SYSCALL_compat_safe_stack(void);
-void entry_INT80_compat(void);
-#ifdef CONFIG_XEN_PV
-void xen_entry_INT80_compat(void);
-#endif
 #endif
 
 void x86_configure_nx(void);
-- 
2.19.1.6.gb485710b


^ permalink raw reply related	[flat|nested] 58+ messages in thread

* [PATCH V6 25/49] x86/entry: Convert SWAPGS to swapgs in entry_SYSENTER_compat()
  2021-11-26 10:11 [PATCH V6 00/49] x86/entry/64: Convert a bunch of ASM entry code into C code Lai Jiangshan
                   ` (23 preceding siblings ...)
  2021-11-26 10:11 ` [PATCH V6 24/49] x86/entry: Use idtentry macro for entry_INT80_compat Lai Jiangshan
@ 2021-11-26 10:11 ` Lai Jiangshan
  2021-11-26 10:11 ` [PATCH V6 26/49] x86: Remove the definition of SWAPGS Lai Jiangshan
                   ` (25 subsequent siblings)
  50 siblings, 0 replies; 58+ messages in thread
From: Lai Jiangshan @ 2021-11-26 10:11 UTC (permalink / raw)
  To: linux-kernel
  Cc: x86, Lai Jiangshan, Andy Lutomirski, Thomas Gleixner,
	Ingo Molnar, Borislav Petkov, Dave Hansen, H. Peter Anvin

From: Lai Jiangshan <laijs@linux.alibaba.com>

XENPV has its own entry point for SYSENTER and doesn't use
entry_SYSENTER_compat, so the pv-aware SWAPGS can be changed to a plain
swapgs.

Signed-off-by: Lai Jiangshan <laijs@linux.alibaba.com>
---
 arch/x86/entry/entry_64_compat.S | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/x86/entry/entry_64_compat.S b/arch/x86/entry/entry_64_compat.S
index a4fcea0cab14..72e017c3941f 100644
--- a/arch/x86/entry/entry_64_compat.S
+++ b/arch/x86/entry/entry_64_compat.S
@@ -49,7 +49,7 @@
 SYM_CODE_START(entry_SYSENTER_compat)
 	UNWIND_HINT_EMPTY
 	/* Interrupts are off on entry. */
-	SWAPGS
+	swapgs
 
 	pushq	%rax
 	SWITCH_TO_KERNEL_CR3 scratch_reg=%rax
-- 
2.19.1.6.gb485710b


^ permalink raw reply related	[flat|nested] 58+ messages in thread

* [PATCH V6 26/49] x86: Remove the definition of SWAPGS
  2021-11-26 10:11 [PATCH V6 00/49] x86/entry/64: Convert a bunch of ASM entry code into C code Lai Jiangshan
                   ` (24 preceding siblings ...)
  2021-11-26 10:11 ` [PATCH V6 25/49] x86/entry: Convert SWAPGS to swapgs in entry_SYSENTER_compat() Lai Jiangshan
@ 2021-11-26 10:11 ` Lai Jiangshan
  2021-11-26 10:11 ` [PATCH V6 27/49] x86/entry: Make paranoid_exit() callable Lai Jiangshan
                   ` (24 subsequent siblings)
  50 siblings, 0 replies; 58+ messages in thread
From: Lai Jiangshan @ 2021-11-26 10:11 UTC (permalink / raw)
  To: linux-kernel
  Cc: x86, Lai Jiangshan, Thomas Gleixner, Ingo Molnar,
	Borislav Petkov, Dave Hansen, H. Peter Anvin, Juergen Gross,
	Peter Zijlstra (Intel)

From: Lai Jiangshan <laijs@linux.alibaba.com>

There is no user of the pv-aware SWAPGS anymore.

Signed-off-by: Lai Jiangshan <laijs@linux.alibaba.com>
---
 arch/x86/include/asm/irqflags.h | 8 --------
 1 file changed, 8 deletions(-)

diff --git a/arch/x86/include/asm/irqflags.h b/arch/x86/include/asm/irqflags.h
index c5ce9845c999..da41a80eb912 100644
--- a/arch/x86/include/asm/irqflags.h
+++ b/arch/x86/include/asm/irqflags.h
@@ -139,14 +139,6 @@ static __always_inline void arch_local_irq_restore(unsigned long flags)
 	if (!arch_irqs_disabled_flags(flags))
 		arch_local_irq_enable();
 }
-#else
-#ifdef CONFIG_X86_64
-#ifdef CONFIG_XEN_PV
-#define SWAPGS	ALTERNATIVE "swapgs", "", X86_FEATURE_XENPV
-#else
-#define SWAPGS	swapgs
-#endif
-#endif
 #endif /* !__ASSEMBLY__ */
 
 #endif
-- 
2.19.1.6.gb485710b


^ permalink raw reply related	[flat|nested] 58+ messages in thread

* [PATCH V6 27/49] x86/entry: Make paranoid_exit() callable
  2021-11-26 10:11 [PATCH V6 00/49] x86/entry/64: Convert a bunch of ASM entry code into C code Lai Jiangshan
                   ` (25 preceding siblings ...)
  2021-11-26 10:11 ` [PATCH V6 26/49] x86: Remove the definition of SWAPGS Lai Jiangshan
@ 2021-11-26 10:11 ` Lai Jiangshan
  2021-11-26 10:11 ` [PATCH V6 28/49] x86/entry: Call paranoid_exit() in asm_exc_nmi() Lai Jiangshan
                   ` (23 subsequent siblings)
  50 siblings, 0 replies; 58+ messages in thread
From: Lai Jiangshan @ 2021-11-26 10:11 UTC (permalink / raw)
  To: linux-kernel
  Cc: x86, Lai Jiangshan, Andy Lutomirski, Thomas Gleixner,
	Ingo Molnar, Borislav Petkov, Dave Hansen, H. Peter Anvin

From: Lai Jiangshan <laijs@linux.alibaba.com>

Move the last JMP out of paranoid_exit() and make it callable.

This allows paranoid_exit() to be rewritten in C later and also allows
asm_exc_nmi() to call it to avoid duplicated code.

No functional change intended.

Signed-off-by: Lai Jiangshan <laijs@linux.alibaba.com>
---
 arch/x86/entry/entry_64.S | 18 +++++++++++-------
 1 file changed, 11 insertions(+), 7 deletions(-)

diff --git a/arch/x86/entry/entry_64.S b/arch/x86/entry/entry_64.S
index dd0cb43627a3..897892dc563c 100644
--- a/arch/x86/entry/entry_64.S
+++ b/arch/x86/entry/entry_64.S
@@ -471,7 +471,8 @@ SYM_CODE_START(\asmsym)
 
 	call	\cfunc
 
-	jmp	paranoid_exit
+	call	paranoid_exit
+	jmp	restore_regs_and_return_to_kernel
 
 	/* Switch to the regular task stack and use the noist entry point */
 .Lfrom_usermode_switch_stack_\@:
@@ -549,7 +550,8 @@ SYM_CODE_START(\asmsym)
 	 * identical to the stack in the IRET frame or the VC fall-back stack,
 	 * so it is definitely mapped even with PTI enabled.
 	 */
-	jmp	paranoid_exit
+	call	paranoid_exit
+	jmp	restore_regs_and_return_to_kernel
 
 	/* Switch to the regular task stack */
 .Lfrom_usermode_switch_stack_\@:
@@ -580,7 +582,8 @@ SYM_CODE_START(\asmsym)
 	movq	$-1, ORIG_RAX(%rsp)	/* no syscall to restart */
 	call	\cfunc
 
-	jmp	paranoid_exit
+	call	paranoid_exit
+	jmp	restore_regs_and_return_to_kernel
 
 _ASM_NOKPROBE(\asmsym)
 SYM_CODE_END(\asmsym)
@@ -972,7 +975,7 @@ SYM_CODE_END(paranoid_entry)
  *     Y        User space GSBASE, must be restored unconditionally
  */
 SYM_CODE_START_LOCAL(paranoid_exit)
-	UNWIND_HINT_REGS
+	UNWIND_HINT_REGS offset=8
 	/*
 	 * The order of operations is important. RESTORE_CR3 requires
 	 * kernel GSBASE.
@@ -988,16 +991,17 @@ SYM_CODE_START_LOCAL(paranoid_exit)
 
 	/* With FSGSBASE enabled, unconditionally restore GSBASE */
 	wrgsbase	%rbx
-	jmp		restore_regs_and_return_to_kernel
+	ret
 
 .Lparanoid_exit_checkgs:
 	/* On non-FSGSBASE systems, conditionally do SWAPGS */
 	testl		%ebx, %ebx
-	jnz		restore_regs_and_return_to_kernel
+	jnz		.Lparanoid_exit_done
 
 	/* We are returning to a context with user GSBASE */
 	swapgs
-	jmp		restore_regs_and_return_to_kernel
+.Lparanoid_exit_done:
+	ret
 SYM_CODE_END(paranoid_exit)
 
 SYM_CODE_START_LOCAL(error_return)
-- 
2.19.1.6.gb485710b


^ permalink raw reply related	[flat|nested] 58+ messages in thread

* [PATCH V6 28/49] x86/entry: Call paranoid_exit() in asm_exc_nmi()
  2021-11-26 10:11 [PATCH V6 00/49] x86/entry/64: Convert a bunch of ASM entry code into C code Lai Jiangshan
                   ` (26 preceding siblings ...)
  2021-11-26 10:11 ` [PATCH V6 27/49] x86/entry: Make paranoid_exit() callable Lai Jiangshan
@ 2021-11-26 10:11 ` Lai Jiangshan
  2021-11-26 10:11 ` [PATCH V6 29/49] x86/entry: move PUSH_AND_CLEAR_REGS out of paranoid_entry Lai Jiangshan
                   ` (22 subsequent siblings)
  50 siblings, 0 replies; 58+ messages in thread
From: Lai Jiangshan @ 2021-11-26 10:11 UTC (permalink / raw)
  To: linux-kernel
  Cc: x86, Lai Jiangshan, Andy Lutomirski, Thomas Gleixner,
	Ingo Molnar, Borislav Petkov, Dave Hansen, H. Peter Anvin

From: Lai Jiangshan <laijs@linux.alibaba.com>

The code between "call exc_nmi" and nmi_restore is the same as
paranoid_exit(), so paranoid_exit() can be used instead of the
open-coded duplicate.

No functional change intended.

Signed-off-by: Lai Jiangshan <laijs@linux.alibaba.com>
---
 arch/x86/entry/entry_64.S | 34 +++++-----------------------------
 1 file changed, 5 insertions(+), 29 deletions(-)

diff --git a/arch/x86/entry/entry_64.S b/arch/x86/entry/entry_64.S
index 897892dc563c..9fda034f3e92 100644
--- a/arch/x86/entry/entry_64.S
+++ b/arch/x86/entry/entry_64.S
@@ -957,8 +957,7 @@ SYM_CODE_END(paranoid_entry)
 
 /*
  * "Paranoid" exit path from exception stack.  This is invoked
- * only on return from non-NMI IST interrupts that came
- * from kernel space.
+ * only on return from IST interrupts that came from kernel space.
  *
  * We may be returning to very strange contexts (e.g. very early
  * in syscall entry), so checking for preemption here would
@@ -1306,11 +1305,7 @@ end_repeat_nmi:
 	pushq	$-1				/* ORIG_RAX: no syscall to restart */
 
 	/*
-	 * Use paranoid_entry to handle SWAPGS, but no need to use paranoid_exit
-	 * as we should not be calling schedule in NMI context.
-	 * Even with normal interrupts enabled. An NMI should not be
-	 * setting NEED_RESCHED or anything that normal interrupts and
-	 * exceptions might do.
+	 * Use paranoid_entry to handle SWAPGS and CR3.
 	 */
 	call	paranoid_entry
 	UNWIND_HINT_REGS
@@ -1319,31 +1314,12 @@ end_repeat_nmi:
 	movq	$-1, %rsi
 	call	exc_nmi
 
-	/* Always restore stashed CR3 value (see paranoid_entry) */
-	RESTORE_CR3 scratch_reg=%r15 save_reg=%r14
-
 	/*
-	 * The above invocation of paranoid_entry stored the GSBASE
-	 * related information in R/EBX depending on the availability
-	 * of FSGSBASE.
-	 *
-	 * If FSGSBASE is enabled, restore the saved GSBASE value
-	 * unconditionally, otherwise take the conditional SWAPGS path.
+	 * Use paranoid_exit to handle SWAPGS and CR3, but no need to use
+	 * restore_regs_and_return_to_kernel as we must handle nested NMI.
 	 */
-	ALTERNATIVE "jmp nmi_no_fsgsbase", "", X86_FEATURE_FSGSBASE
-
-	wrgsbase	%rbx
-	jmp	nmi_restore
-
-nmi_no_fsgsbase:
-	/* EBX == 0 -> invoke SWAPGS */
-	testl	%ebx, %ebx
-	jnz	nmi_restore
-
-nmi_swapgs:
-	swapgs
+	call	paranoid_exit
 
-nmi_restore:
 	POP_REGS
 
 	/*
-- 
2.19.1.6.gb485710b


^ permalink raw reply related	[flat|nested] 58+ messages in thread

* [PATCH V6 29/49] x86/entry: move PUSH_AND_CLEAR_REGS out of paranoid_entry
  2021-11-26 10:11 [PATCH V6 00/49] x86/entry/64: Convert a bunch of ASM entry code into C code Lai Jiangshan
                   ` (27 preceding siblings ...)
  2021-11-26 10:11 ` [PATCH V6 28/49] x86/entry: Call paranoid_exit() in asm_exc_nmi() Lai Jiangshan
@ 2021-11-26 10:11 ` Lai Jiangshan
  2021-11-26 10:11 ` [PATCH V6 30/49] x86/entry: Add the C version ist_switch_to_kernel_cr3() Lai Jiangshan
                   ` (21 subsequent siblings)
  50 siblings, 0 replies; 58+ messages in thread
From: Lai Jiangshan @ 2021-11-26 10:11 UTC (permalink / raw)
  To: linux-kernel
  Cc: x86, Lai Jiangshan, Andy Lutomirski, Thomas Gleixner,
	Ingo Molnar, Borislav Petkov, Dave Hansen, H. Peter Anvin

From: Lai Jiangshan <laijs@linux.alibaba.com>

This prepares for converting the whole paranoid_entry() into C code.

No functional change intended.

Signed-off-by: Lai Jiangshan <laijs@linux.alibaba.com>
---
 arch/x86/entry/entry_64.S | 24 +++++++++++++++++-------
 1 file changed, 17 insertions(+), 7 deletions(-)

diff --git a/arch/x86/entry/entry_64.S b/arch/x86/entry/entry_64.S
index 9fda034f3e92..bd5e005316a3 100644
--- a/arch/x86/entry/entry_64.S
+++ b/arch/x86/entry/entry_64.S
@@ -322,9 +322,6 @@ SYM_CODE_END(ret_from_fork)
  */
 .macro idtentry_body cfunc has_error_code:req
 
-	PUSH_AND_CLEAR_REGS
-	ENCODE_FRAME_POINTER
-
 	/*
 	 * Call error_entry and switch stack settled by sync_regs().
 	 *
@@ -403,6 +400,9 @@ SYM_CODE_START(\asmsym)
 .Lfrom_usermode_no_gap_\@:
 	.endif
 
+	PUSH_AND_CLEAR_REGS
+	ENCODE_FRAME_POINTER
+
 	idtentry_body \cfunc \has_error_code
 
 _ASM_NOKPROBE(\asmsym)
@@ -455,11 +455,14 @@ SYM_CODE_START(\asmsym)
 
 	pushq	$-1			/* ORIG_RAX: no syscall to restart */
 
+	PUSH_AND_CLEAR_REGS
+	ENCODE_FRAME_POINTER
+
 	/*
 	 * If the entry is from userspace, switch stacks and treat it as
 	 * a normal entry.
 	 */
-	testb	$3, CS-ORIG_RAX(%rsp)
+	testb	$3, CS(%rsp)
 	jnz	.Lfrom_usermode_switch_stack_\@
 
 	/* paranoid_entry returns GS information for paranoid_exit in EBX. */
@@ -510,11 +513,14 @@ SYM_CODE_START(\asmsym)
 	ASM_CLAC
 	cld
 
+	PUSH_AND_CLEAR_REGS
+	ENCODE_FRAME_POINTER
+
 	/*
 	 * If the entry is from userspace, switch stacks and treat it as
 	 * a normal entry.
 	 */
-	testb	$3, CS-ORIG_RAX(%rsp)
+	testb	$3, CS(%rsp)
 	jnz	.Lfrom_usermode_switch_stack_\@
 
 	/*
@@ -573,6 +579,9 @@ SYM_CODE_START(\asmsym)
 	ASM_CLAC
 	cld
 
+	PUSH_AND_CLEAR_REGS
+	ENCODE_FRAME_POINTER
+
 	/* paranoid_entry returns GS information for paranoid_exit in EBX. */
 	call	paranoid_entry
 	UNWIND_HINT_REGS
@@ -887,8 +896,6 @@ SYM_CODE_END(xen_failsafe_callback)
  */
 SYM_CODE_START_LOCAL(paranoid_entry)
 	UNWIND_HINT_FUNC
-	PUSH_AND_CLEAR_REGS save_ret=1
-	ENCODE_FRAME_POINTER 8
 
 	/*
 	 * Always stash CR3 in %r14.  This value will be restored,
@@ -1304,6 +1311,9 @@ end_repeat_nmi:
 	 */
 	pushq	$-1				/* ORIG_RAX: no syscall to restart */
 
+	PUSH_AND_CLEAR_REGS
+	ENCODE_FRAME_POINTER
+
 	/*
 	 * Use paranoid_entry to handle SWAPGS and CR3.
 	 */
-- 
2.19.1.6.gb485710b


^ permalink raw reply related	[flat|nested] 58+ messages in thread

* [PATCH V6 30/49] x86/entry: Add the C version ist_switch_to_kernel_cr3()
  2021-11-26 10:11 [PATCH V6 00/49] x86/entry/64: Convert a bunch of ASM entry code into C code Lai Jiangshan
                   ` (28 preceding siblings ...)
  2021-11-26 10:11 ` [PATCH V6 29/49] x86/entry: move PUSH_AND_CLEAR_REGS out of paranoid_entry Lai Jiangshan
@ 2021-11-26 10:11 ` Lai Jiangshan
  2021-11-26 10:11 ` [PATCH V6 31/49] x86/entry: Skip CR3 write when the saved CR3 is kernel CR3 in RESTORE_CR3 Lai Jiangshan
                   ` (20 subsequent siblings)
  50 siblings, 0 replies; 58+ messages in thread
From: Lai Jiangshan @ 2021-11-26 10:11 UTC (permalink / raw)
  To: linux-kernel
  Cc: x86, Lai Jiangshan, Andy Lutomirski, Thomas Gleixner,
	Ingo Molnar, Borislav Petkov, Dave Hansen, H. Peter Anvin

From: Lai Jiangshan <laijs@linux.alibaba.com>

ist_switch_to_kernel_cr3() switches to the kernel CR3 and returns the
original CR3; the caller should save the return value.

It is the C version of SAVE_AND_SWITCH_TO_KERNEL_CR3.

No functional difference intended.
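
For illustration only, a caller is expected to pair the switch with a
verbatim restore of the stashed value on exit.  A minimal sketch, with
a made-up handle_ist_exception() name and using ist_restore_cr3(),
which is only added two patches later in this series:

  static __always_inline void handle_ist_exception(void)
  {
  	unsigned long saved_cr3 = ist_switch_to_kernel_cr3();

  	/* ... the exception work runs on the kernel CR3 ... */

  	/* Restore the stashed CR3 verbatim (skipped if it was kernel CR3). */
  	ist_restore_cr3(saved_cr3);
  }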

Signed-off-by: Lai Jiangshan <laijs@linux.alibaba.com>
---
 arch/x86/entry/entry64.c | 15 +++++++++++++++
 1 file changed, 15 insertions(+)

diff --git a/arch/x86/entry/entry64.c b/arch/x86/entry/entry64.c
index 0dc63ae8153a..283bd685a275 100644
--- a/arch/x86/entry/entry64.c
+++ b/arch/x86/entry/entry64.c
@@ -35,8 +35,23 @@ static __always_inline void switch_to_kernel_cr3(void)
 	if (static_cpu_has(X86_FEATURE_PTI))
 		pti_switch_to_kernel_cr3(__native_read_cr3());
 }
+
+static __always_inline unsigned long ist_switch_to_kernel_cr3(void)
+{
+	unsigned long cr3 = 0;
+
+	if (static_cpu_has(X86_FEATURE_PTI)) {
+		cr3 = __native_read_cr3();
+
+		if (cr3 & PTI_USER_PGTABLE_MASK)
+			pti_switch_to_kernel_cr3(cr3);
+	}
+
+	return cr3;
+}
 #else
 static __always_inline void switch_to_kernel_cr3(void) {}
+static __always_inline unsigned long ist_switch_to_kernel_cr3(void) { return 0; }
 #endif
 
 /*
-- 
2.19.1.6.gb485710b


^ permalink raw reply related	[flat|nested] 58+ messages in thread

* [PATCH V6 31/49] x86/entry: Skip CR3 write when the saved CR3 is kernel CR3 in RESTORE_CR3
  2021-11-26 10:11 [PATCH V6 00/49] x86/entry/64: Convert a bunch of ASM entry code into C code Lai Jiangshan
                   ` (29 preceding siblings ...)
  2021-11-26 10:11 ` [PATCH V6 30/49] x86/entry: Add the C version ist_switch_to_kernel_cr3() Lai Jiangshan
@ 2021-11-26 10:11 ` Lai Jiangshan
  2021-11-26 10:11 ` [PATCH V6 32/49] x86/entry: Add the C version ist_restore_cr3() Lai Jiangshan
                   ` (19 subsequent siblings)
  50 siblings, 0 replies; 58+ messages in thread
From: Lai Jiangshan @ 2021-11-26 10:11 UTC (permalink / raw)
  To: linux-kernel
  Cc: x86, Lai Jiangshan, Peter Zijlstra, Andy Lutomirski,
	Thomas Gleixner, Ingo Molnar, Borislav Petkov, Dave Hansen,
	H. Peter Anvin

From: Lai Jiangshan <laijs@linux.alibaba.com>

When the original CR3 is the kernel CR3, paranoid_entry() hasn't
changed the CR3, so the CR3 doesn't need to be restored in
paranoid_exit() in this case.

Cc: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Lai Jiangshan <laijs@linux.alibaba.com>
---
 arch/x86/entry/calling.h | 15 ++++-----------
 1 file changed, 4 insertions(+), 11 deletions(-)

diff --git a/arch/x86/entry/calling.h b/arch/x86/entry/calling.h
index 996b041e92d2..9065c31d2875 100644
--- a/arch/x86/entry/calling.h
+++ b/arch/x86/entry/calling.h
@@ -231,14 +231,11 @@ For 32-bit we have the following conventions - kernel is built with
 .macro RESTORE_CR3 scratch_reg:req save_reg:req
 	ALTERNATIVE "jmp .Lend_\@", "", X86_FEATURE_PTI
 
-	ALTERNATIVE "jmp .Lwrcr3_\@", "", X86_FEATURE_PCID
-
-	/*
-	 * KERNEL pages can always resume with NOFLUSH as we do
-	 * explicit flushes.
-	 */
+	/* No need to restore when the saved CR3 is kernel CR3. */
 	bt	$PTI_USER_PGTABLE_BIT, \save_reg
-	jnc	.Lnoflush_\@
+	jnc	.Lend_\@
+
+	ALTERNATIVE "jmp .Lwrcr3_\@", "", X86_FEATURE_PCID
 
 	/*
 	 * Check if there's a pending flush for the user ASID we're
@@ -256,10 +253,6 @@ For 32-bit we have the following conventions - kernel is built with
 	SET_NOFLUSH_BIT \save_reg
 
 .Lwrcr3_\@:
-	/*
-	 * The CR3 write could be avoided when not changing its value,
-	 * but would require a CR3 read *and* a scratch register.
-	 */
 	movq	\save_reg, %cr3
 .Lend_\@:
 .endm
-- 
2.19.1.6.gb485710b


^ permalink raw reply related	[flat|nested] 58+ messages in thread

* [PATCH V6 32/49] x86/entry: Add the C version ist_restore_cr3()
  2021-11-26 10:11 [PATCH V6 00/49] x86/entry/64: Convert a bunch of ASM entry code into C code Lai Jiangshan
                   ` (30 preceding siblings ...)
  2021-11-26 10:11 ` [PATCH V6 31/49] x86/entry: Skip CR3 write when the saved CR3 is kernel CR3 in RESTORE_CR3 Lai Jiangshan
@ 2021-11-26 10:11 ` Lai Jiangshan
  2021-11-26 10:11 ` [PATCH V6 33/49] x86/entry: Add the C version get_percpu_base() Lai Jiangshan
                   ` (18 subsequent siblings)
  50 siblings, 0 replies; 58+ messages in thread
From: Lai Jiangshan @ 2021-11-26 10:11 UTC (permalink / raw)
  To: linux-kernel
  Cc: x86, Lai Jiangshan, Peter Zijlstra, Andy Lutomirski,
	Thomas Gleixner, Ingo Molnar, Borislav Petkov, Dave Hansen,
	H. Peter Anvin

From: Lai Jiangshan <laijs@linux.alibaba.com>

It implements the C version of RESTORE_CR3().

No functional difference intended, except that the ASM code uses bit
test and clear operations while the C version uses a mask check and an
'AND' operation.  The resulting machine code of both versions is very
similar.

Cc: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Lai Jiangshan <laijs@linux.alibaba.com>
---
 arch/x86/entry/entry64.c | 32 ++++++++++++++++++++++++++++++++
 1 file changed, 32 insertions(+)

diff --git a/arch/x86/entry/entry64.c b/arch/x86/entry/entry64.c
index 283bd685a275..5f47221d8935 100644
--- a/arch/x86/entry/entry64.c
+++ b/arch/x86/entry/entry64.c
@@ -11,6 +11,7 @@
  * environments that the GS base is user controlled value, or the CR3
  * is PTI user CR3 or both.
  */
+#include <asm/tlbflush.h>
 #include <asm/traps.h>
 
 extern unsigned char asm_load_gs_index_gs_change[];
@@ -30,6 +31,26 @@ static __always_inline void pti_switch_to_kernel_cr3(unsigned long user_cr3)
 	native_write_cr3(cr3);
 }
 
+static __always_inline void pti_switch_to_user_cr3(unsigned long user_cr3)
+{
+#define KERN_PCID_MASK (CR3_PCID_MASK & ~PTI_USER_PCID_MASK)
+
+	if (static_cpu_has(X86_FEATURE_PCID)) {
+		int pcid = user_cr3 & KERN_PCID_MASK;
+		unsigned short pcid_mask = 1ull << pcid;
+
+		/*
+		 * Check if there's a pending flush for the user ASID we're
+		 * about to set.
+		 */
+		if (!(this_cpu_read(cpu_tlbstate.user_pcid_flush_mask) & pcid_mask))
+			user_cr3 |= X86_CR3_PCID_NOFLUSH;
+		else
+			this_cpu_and(cpu_tlbstate.user_pcid_flush_mask, ~pcid_mask);
+	}
+	native_write_cr3(user_cr3);
+}
+
 static __always_inline void switch_to_kernel_cr3(void)
 {
 	if (static_cpu_has(X86_FEATURE_PTI))
@@ -49,9 +70,20 @@ static __always_inline unsigned long ist_switch_to_kernel_cr3(void)
 
 	return cr3;
 }
+
+static __always_inline void ist_restore_cr3(unsigned long cr3)
+{
+	if (!static_cpu_has(X86_FEATURE_PTI))
+		return;
+
+	/* No need to restore when @cr3 is kernel CR3. */
+	if (cr3 & PTI_USER_PGTABLE_MASK)
+		pti_switch_to_user_cr3(cr3);
+}
 #else
 static __always_inline void switch_to_kernel_cr3(void) {}
 static __always_inline unsigned long ist_switch_to_kernel_cr3(void) { return 0; }
+static __always_inline void ist_restore_cr3(unsigned long cr3) {}
 #endif
 
 /*
-- 
2.19.1.6.gb485710b


^ permalink raw reply related	[flat|nested] 58+ messages in thread

* [PATCH V6 33/49] x86/entry: Add the C version get_percpu_base()
  2021-11-26 10:11 [PATCH V6 00/49] x86/entry/64: Convert a bunch of ASM entry code into C code Lai Jiangshan
                   ` (31 preceding siblings ...)
  2021-11-26 10:11 ` [PATCH V6 32/49] x86/entry: Add the C version ist_restore_cr3() Lai Jiangshan
@ 2021-11-26 10:11 ` Lai Jiangshan
  2021-11-26 10:11 ` [PATCH V6 34/49] x86/entry: Add the C version ist_switch_to_kernel_gsbase() Lai Jiangshan
                   ` (17 subsequent siblings)
  50 siblings, 0 replies; 58+ messages in thread
From: Lai Jiangshan @ 2021-11-26 10:11 UTC (permalink / raw)
  To: linux-kernel
  Cc: x86, Lai Jiangshan, Andy Lutomirski, Thomas Gleixner,
	Ingo Molnar, Borislav Petkov, Dave Hansen, H. Peter Anvin

From: Lai Jiangshan <laijs@linux.alibaba.com>

It implements the C version of the ASM macro GET_PERCPU_BASE().

No functional difference intended.

Signed-off-by: Lai Jiangshan <laijs@linux.alibaba.com>
---
 arch/x86/entry/entry64.c | 36 ++++++++++++++++++++++++++++++++++++
 1 file changed, 36 insertions(+)

diff --git a/arch/x86/entry/entry64.c b/arch/x86/entry/entry64.c
index 5f47221d8935..3ec145c38e9e 100644
--- a/arch/x86/entry/entry64.c
+++ b/arch/x86/entry/entry64.c
@@ -183,3 +183,39 @@ struct pt_regs *error_entry(struct pt_regs *eregs)
 	/* Enter from kernel, don't move pt_regs */
 	return eregs;
 }
+
+#ifdef CONFIG_SMP
+/*
+ * CPU/node NR is loaded from the limit (size) field of a special segment
+ * descriptor entry in GDT.
+ *
+ * Do not use RDPID, because KVM loads guest's TSC_AUX on vm-entry and
+ * may not restore the host's value until the CPU returns to userspace.
+ * Thus the kernel would consume a guest's TSC_AUX if an NMI arrives
+ * while running KVM's run loop.
+ */
+static __always_inline unsigned int gdt_get_cpu(void)
+{
+	unsigned int p;
+
+	asm ("lsl %[seg],%[p]" : [p] "=a" (p) : [seg] "r" (__CPUNODE_SEG));
+
+	return p & VDSO_CPUNODE_MASK;
+}
+
+/*
+ * Fetch the per-CPU GSBASE value for this processor.
+ *
+ * We normally use %gs for accessing per-CPU data, but we are setting up
+ * %gs here and obviously can not use %gs itself to access per-CPU data.
+ */
+static __always_inline unsigned long get_percpu_base(void)
+{
+	return __per_cpu_offset[gdt_get_cpu()];
+}
+#else
+static __always_inline unsigned long get_percpu_base(void)
+{
+	return pcpu_unit_offsets;
+}
+#endif
-- 
2.19.1.6.gb485710b


^ permalink raw reply related	[flat|nested] 58+ messages in thread

* [PATCH V6 34/49] x86/entry: Add the C version ist_switch_to_kernel_gsbase()
  2021-11-26 10:11 [PATCH V6 00/49] x86/entry/64: Convert a bunch of ASM entry code into C code Lai Jiangshan
                   ` (32 preceding siblings ...)
  2021-11-26 10:11 ` [PATCH V6 33/49] x86/entry: Add the C version get_percpu_base() Lai Jiangshan
@ 2021-11-26 10:11 ` Lai Jiangshan
  2021-11-26 10:11 ` [PATCH V6 35/49] x86/entry: Implement the C version ist_paranoid_entry() Lai Jiangshan
                   ` (16 subsequent siblings)
  50 siblings, 0 replies; 58+ messages in thread
From: Lai Jiangshan @ 2021-11-26 10:11 UTC (permalink / raw)
  To: linux-kernel
  Cc: x86, Lai Jiangshan, Andy Lutomirski, Thomas Gleixner,
	Ingo Molnar, Borislav Petkov, Dave Hansen, H. Peter Anvin

From: Lai Jiangshan <laijs@linux.alibaba.com>

It implements the second half of paranoid_entry(), whose job is to
switch to the kernel GSBASE.

No functional difference intended.

Signed-off-by: Lai Jiangshan <laijs@linux.alibaba.com>
---
 arch/x86/entry/entry64.c | 49 ++++++++++++++++++++++++++++++++++++++++
 1 file changed, 49 insertions(+)

diff --git a/arch/x86/entry/entry64.c b/arch/x86/entry/entry64.c
index 3ec145c38e9e..60c37dbe650b 100644
--- a/arch/x86/entry/entry64.c
+++ b/arch/x86/entry/entry64.c
@@ -219,3 +219,52 @@ static __always_inline unsigned long get_percpu_base(void)
 	return pcpu_unit_offsets;
 }
 #endif
+
+/*
+ * Handle GSBASE depends on the availability of FSGSBASE.
+ *
+ * Without FSGSBASE the kernel enforces that negative GSBASE
+ * values indicate kernel GSBASE. With FSGSBASE no assumptions
+ * can be made about the GSBASE value when entering from user
+ * space.
+ */
+static __always_inline unsigned long ist_switch_to_kernel_gsbase(void)
+{
+	int ret = 1;
+	unsigned long gsbase;
+
+	if (static_cpu_has(X86_FEATURE_FSGSBASE)) {
+		/*
+		 * Read the current GSBASE for return.
+		 * Retrieve and set the current CPUs kernel GSBASE.
+		 *
+		 * The unconditional write to GS base below ensures that
+		 * no subsequent loads based on a mispredicted GS base can
+		 * happen, therefore no LFENCE is needed here.
+		 */
+		gsbase = rdgsbase();
+		wrgsbase(get_percpu_base());
+		return gsbase;
+	}
+
+	gsbase = __rdmsr(MSR_GS_BASE);
+
+	/*
+	 * The kernel-enforced convention is a negative GSBASE indicates
+	 * a kernel value.  No SWAPGS needed on entry and exit.
+	 */
+	if ((long)gsbase >= 0) {
+		/* User GSBASE active, SWAPGS required on exit */
+		ret = 0;
+		native_swapgs();
+	}
+
+	/*
+	 * The above ist_switch_to_kernel_cr3() doesn't do an unconditional
+	 * CR3 write, even in the PTI case.  So do an lfence to prevent GS
+	 * speculation, regardless of whether PTI is enabled.
+	 */
+	fence_swapgs_kernel_entry();
+
+	return ret;
+}
-- 
2.19.1.6.gb485710b


^ permalink raw reply related	[flat|nested] 58+ messages in thread

* [PATCH V6 35/49] x86/entry: Implement the C version ist_paranoid_entry()
  2021-11-26 10:11 [PATCH V6 00/49] x86/entry/64: Convert a bunch of ASM entry code into C code Lai Jiangshan
                   ` (33 preceding siblings ...)
  2021-11-26 10:11 ` [PATCH V6 34/49] x86/entry: Add the C version ist_switch_to_kernel_gsbase() Lai Jiangshan
@ 2021-11-26 10:11 ` Lai Jiangshan
  2021-11-26 10:11 ` [PATCH V6 36/49] x86/entry: Implement the C version ist_paranoid_exit() Lai Jiangshan
                   ` (15 subsequent siblings)
  50 siblings, 0 replies; 58+ messages in thread
From: Lai Jiangshan @ 2021-11-26 10:11 UTC (permalink / raw)
  To: linux-kernel
  Cc: x86, Lai Jiangshan, Andy Lutomirski, Thomas Gleixner,
	Ingo Molnar, Borislav Petkov, Dave Hansen, H. Peter Anvin,
	Juergen Gross, Peter Zijlstra (Intel),
	Joerg Roedel

From: Lai Jiangshan <laijs@linux.alibaba.com>

It implements the whole of the ASM paranoid_entry() in C.

No functional difference intended.

Signed-off-by: Lai Jiangshan <laijs@linux.alibaba.com>
---
 arch/x86/entry/entry64.c        | 37 +++++++++++++++++++++++++++++++++
 arch/x86/include/asm/idtentry.h |  3 +++
 2 files changed, 40 insertions(+)

diff --git a/arch/x86/entry/entry64.c b/arch/x86/entry/entry64.c
index 60c37dbe650b..dc0bd9dc6d48 100644
--- a/arch/x86/entry/entry64.c
+++ b/arch/x86/entry/entry64.c
@@ -268,3 +268,40 @@ static __always_inline unsigned long ist_switch_to_kernel_gsbase(void)
 
 	return ret;
 }
+
+/*
+ * Switch and save CR3 in *@cr3 if PTI enabled. Return GSBASE related
+ * information in *@gsbase depending on the availability of the FSGSBASE
+ * instructions:
+ *
+ * FSGSBASE	*@gsbase
+ *     N        0 -> SWAPGS on exit
+ *              1 -> no SWAPGS on exit
+ *
+ *     Y        GSBASE value at entry, must be restored in ist_paranoid_exit
+ */
+__visible __entry_text
+void ist_paranoid_entry(unsigned long *cr3, unsigned long *gsbase)
+{
+	/*
+	 * Always stash CR3 in *@cr3.  This value will be restored,
+	 * verbatim, at exit.  Needed if ist_paranoid_entry interrupted
+	 * another entry that already switched to the user CR3 value
+	 * but has not yet returned to userspace.
+	 *
+	 * This is also why CS (stashed in the "iret frame" by the
+	 * hardware at entry) can not be used: this may be a return
+	 * to kernel code, but with a user CR3 value.
+	 *
+	 * Switching CR3 does not depend on kernel GSBASE so it can
+	 * be done before switching to the kernel GSBASE. This is
+	 * required for FSGSBASE because the kernel GSBASE has to
+	 * be retrieved from a kernel internal table.
+	 */
+	*cr3 = ist_switch_to_kernel_cr3();
+
+	barrier();
+
+	/* Handle GSBASE, store the return value in *@gsbase for exit. */
+	*gsbase = ist_switch_to_kernel_gsbase();
+}
diff --git a/arch/x86/include/asm/idtentry.h b/arch/x86/include/asm/idtentry.h
index 49fabc3e3f0d..f6efa21ec242 100644
--- a/arch/x86/include/asm/idtentry.h
+++ b/arch/x86/include/asm/idtentry.h
@@ -307,6 +307,9 @@ static __always_inline void __##func(struct pt_regs *regs)
 	DECLARE_IDTENTRY(vector, func)
 
 #ifdef CONFIG_X86_64
+__visible __entry_text
+void ist_paranoid_entry(unsigned long *cr3, unsigned long *gsbase);
+
 /**
  * DECLARE_IDTENTRY_IST - Declare functions for IST handling IDT entry points
  * @vector:	Vector number (ignored for C)
-- 
2.19.1.6.gb485710b


^ permalink raw reply related	[flat|nested] 58+ messages in thread

* [PATCH V6 36/49] x86/entry: Implement the C version ist_paranoid_exit()
  2021-11-26 10:11 [PATCH V6 00/49] x86/entry/64: Convert a bunch of ASM entry code into C code Lai Jiangshan
                   ` (34 preceding siblings ...)
  2021-11-26 10:11 ` [PATCH V6 35/49] x86/entry: Implement the C version ist_paranoid_entry() Lai Jiangshan
@ 2021-11-26 10:11 ` Lai Jiangshan
  2021-11-26 10:11 ` [PATCH V6 37/49] x86/entry: Add a C macro to define the function body for IST in .entry.text Lai Jiangshan
                   ` (14 subsequent siblings)
  50 siblings, 0 replies; 58+ messages in thread
From: Lai Jiangshan @ 2021-11-26 10:11 UTC (permalink / raw)
  To: linux-kernel
  Cc: x86, Lai Jiangshan, Andy Lutomirski, Thomas Gleixner,
	Ingo Molnar, Borislav Petkov, Dave Hansen, H. Peter Anvin,
	Juergen Gross, Peter Zijlstra (Intel),
	Joerg Roedel

From: Lai Jiangshan <laijs@linux.alibaba.com>

It implements the whole of the ASM paranoid_exit() in C.

No functional difference intended.

Signed-off-by: Lai Jiangshan <laijs@linux.alibaba.com>
---
 arch/x86/entry/entry64.c        | 41 +++++++++++++++++++++++++++++++++
 arch/x86/include/asm/idtentry.h |  2 ++
 2 files changed, 43 insertions(+)

diff --git a/arch/x86/entry/entry64.c b/arch/x86/entry/entry64.c
index dc0bd9dc6d48..63a6021a1f70 100644
--- a/arch/x86/entry/entry64.c
+++ b/arch/x86/entry/entry64.c
@@ -269,6 +269,29 @@ static __always_inline unsigned long ist_switch_to_kernel_gsbase(void)
 	return ret;
 }
 
+static __always_inline void ist_restore_gsbase(unsigned long gsbase)
+{
+	/*
+	 * Handle the three GSBASE cases.
+	 *
+	 * @gsbase contains the GSBASE related information depending
+	 * on the availability of the FSGSBASE instructions:
+	 *
+	 * FSGSBASE	@gsbase
+	 *     N        0 -> SWAPGS on exit
+	 *              1 -> no SWAPGS on exit
+	 *
+	 *     Y        User space GSBASE, must be restored unconditionally
+	 */
+	if (static_cpu_has(X86_FEATURE_FSGSBASE)) {
+		wrgsbase(gsbase);
+		return;
+	}
+
+	if (!gsbase)
+		native_swapgs();
+}
+
 /*
  * Switch and save CR3 in *@cr3 if PTI enabled. Return GSBASE related
  * information in *@gsbase depending on the availability of the FSGSBASE
@@ -305,3 +328,21 @@ void ist_paranoid_entry(unsigned long *cr3, unsigned long *gsbase)
 	/* Handle GSBASE, store the return value in *@gsbase for exit. */
 	*gsbase = ist_switch_to_kernel_gsbase();
 }
+
+/*
+ * "Paranoid" exit path from exception stack.  This is invoked
+ * only on return from IST interrupts that came from kernel space.
+ *
+ * We may be returning to very strange contexts (e.g. very early
+ * in syscall entry), so checking for preemption here would
+ * be complicated.  Fortunately, there's no good reason to try
+ * to handle preemption here.
+ */
+__visible __entry_text
+void ist_paranoid_exit(unsigned long cr3, unsigned long gsbase)
+{
+	/* Restore CR3 at first, it can use kernel GSBASE. */
+	ist_restore_cr3(cr3);
+	barrier();
+	ist_restore_gsbase(gsbase);
+}
diff --git a/arch/x86/include/asm/idtentry.h b/arch/x86/include/asm/idtentry.h
index f6efa21ec242..cf41901227ed 100644
--- a/arch/x86/include/asm/idtentry.h
+++ b/arch/x86/include/asm/idtentry.h
@@ -309,6 +309,8 @@ static __always_inline void __##func(struct pt_regs *regs)
 #ifdef CONFIG_X86_64
 __visible __entry_text
 void ist_paranoid_entry(unsigned long *cr3, unsigned long *gsbase);
+__visible __entry_text
+void ist_paranoid_exit(unsigned long cr3, unsigned long gsbase);
 
 /**
  * DECLARE_IDTENTRY_IST - Declare functions for IST handling IDT entry points
-- 
2.19.1.6.gb485710b


^ permalink raw reply related	[flat|nested] 58+ messages in thread

* [PATCH V6 37/49] x86/entry: Add a C macro to define the function body for IST in .entry.text
  2021-11-26 10:11 [PATCH V6 00/49] x86/entry/64: Convert a bunch of ASM entry code into C code Lai Jiangshan
                   ` (35 preceding siblings ...)
  2021-11-26 10:11 ` [PATCH V6 36/49] x86/entry: Implement the C version ist_paranoid_exit() Lai Jiangshan
@ 2021-11-26 10:11 ` Lai Jiangshan
  2021-11-26 10:11 ` [PATCH V6 38/49] x86/debug, mce: Use C entry code Lai Jiangshan
                   ` (13 subsequent siblings)
  50 siblings, 0 replies; 58+ messages in thread
From: Lai Jiangshan @ 2021-11-26 10:11 UTC (permalink / raw)
  To: linux-kernel
  Cc: x86, Lai Jiangshan, Thomas Gleixner, Ingo Molnar,
	Borislav Petkov, Dave Hansen, H. Peter Anvin, Juergen Gross,
	Peter Zijlstra (Intel),
	Joerg Roedel

From: Lai Jiangshan <laijs@linux.alibaba.com>

Add the DEFINE_IDTENTRY_IST_ENTRY() macro to generate the C code that
replaces the ASM sequence which calls paranoid_entry(), cfunc() and
paranoid_exit() in series for IST exceptions without an error code.

No functional difference intended.
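
For illustration, instantiating the macro for a handler such as
exc_machine_check() would expand to roughly the following (a sketch
only; the concrete handler name is merely an example of a later user
of the macro):

	__visible __entry_text void ist_exc_machine_check(struct pt_regs *regs)
	{
		unsigned long cr3, gsbase;

		ist_paranoid_entry(&cr3, &gsbase);
		exc_machine_check(regs);
		ist_paranoid_exit(cr3, gsbase);
	}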

Signed-off-by: Lai Jiangshan <laijs@linux.alibaba.com>
---
 arch/x86/include/asm/idtentry.h | 14 ++++++++++++++
 1 file changed, 14 insertions(+)

diff --git a/arch/x86/include/asm/idtentry.h b/arch/x86/include/asm/idtentry.h
index cf41901227ed..7b17ffa43e10 100644
--- a/arch/x86/include/asm/idtentry.h
+++ b/arch/x86/include/asm/idtentry.h
@@ -337,6 +337,20 @@ void ist_paranoid_exit(unsigned long cr3, unsigned long gsbase);
 	__visible noinstr void kernel_##func(struct pt_regs *regs, unsigned long error_code);	\
 	__visible noinstr void   user_##func(struct pt_regs *regs, unsigned long error_code)
 
+/**
+ * DEFINE_IDTENTRY_IST_ENTRY - Emit __entry_text code for IST entry points
+ * @func:	Function name of the entry point
+ */
+#define DEFINE_IDTENTRY_IST_ENTRY(func)					\
+__visible __entry_text void ist_##func(struct pt_regs *regs)		\
+{									\
+	unsigned long cr3, gsbase;					\
+									\
+	ist_paranoid_entry(&cr3, &gsbase);				\
+	func(regs);							\
+	ist_paranoid_exit(cr3, gsbase);					\
+}
+
 /**
  * DEFINE_IDTENTRY_IST - Emit code for IST entry points
  * @func:	Function name of the entry point
-- 
2.19.1.6.gb485710b


^ permalink raw reply related	[flat|nested] 58+ messages in thread

* [PATCH V6 38/49] x86/debug, mce: Use C entry code
  2021-11-26 10:11 [PATCH V6 00/49] x86/entry/64: Convert a bunch of ASM entry code into C code Lai Jiangshan
                   ` (36 preceding siblings ...)
  2021-11-26 10:11 ` [PATCH V6 37/49] x86/entry: Add a C macro to define the function body for IST in .entry.text Lai Jiangshan
@ 2021-11-26 10:11 ` Lai Jiangshan
  2021-11-26 10:11 ` [PATCH V6 39/49] x86/idtentry.h: Move the definitions *IDTENTRY_{MCE|DEBUG}* up Lai Jiangshan
                   ` (12 subsequent siblings)
  50 siblings, 0 replies; 58+ messages in thread
From: Lai Jiangshan @ 2021-11-26 10:11 UTC (permalink / raw)
  To: linux-kernel
  Cc: x86, Lai Jiangshan, Andy Lutomirski, Thomas Gleixner,
	Ingo Molnar, Borislav Petkov, Dave Hansen, H. Peter Anvin,
	Tony Luck, Juergen Gross, Peter Zijlstra (Intel),
	Joerg Roedel, Javier Martinez Canillas,
	Daniel Bristot de Oliveira, Brijesh Singh, Andy Shevchenko,
	Tom Lendacky, linux-edac

From: Lai Jiangshan <laijs@linux.alibaba.com>

Use DEFINE_IDTENTRY_IST_ENTRY to emit the C entry functions in the C
files and call them directly from entry_64.S.

It also disables the stack protector for those C files because the C
entry code cannot use the percpu register until GSBASE has been
properly switched, and the stack protector depends on the percpu
register to work.
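
A rough sketch (not part of the patch; the handler and percpu symbol
names are illustrative) of what the compiler-inserted canary check
conceptually does, and why it is unsafe in these files:

	__visible __entry_text void ist_exc_debug(struct pt_regs *regs)
	{
		/*
		 * Compiler prologue: the canary is read through GSBASE,
		 * which may still hold the user's value because
		 * ist_paranoid_entry() below has not run yet.
		 */
		unsigned long canary = this_cpu_read(fixed_percpu_data.stack_canary);
		unsigned long cr3, gsbase;

		ist_paranoid_entry(&cr3, &gsbase);
		exc_debug(regs);
		ist_paranoid_exit(cr3, gsbase);

		/*
		 * Compiler epilogue: the canary is read through GSBASE
		 * again, after ist_paranoid_exit() has already restored
		 * the interrupted context's GSBASE.
		 */
		if (canary != this_cpu_read(fixed_percpu_data.stack_canary))
			__stack_chk_fail();
	}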

Signed-off-by: Lai Jiangshan <laijs@linux.alibaba.com>
---
 arch/x86/entry/entry_64.S        | 10 +---------
 arch/x86/include/asm/idtentry.h  |  1 +
 arch/x86/kernel/Makefile         |  1 +
 arch/x86/kernel/cpu/mce/Makefile |  3 +++
 4 files changed, 6 insertions(+), 9 deletions(-)

diff --git a/arch/x86/entry/entry_64.S b/arch/x86/entry/entry_64.S
index bd5e005316a3..ac05cbf894f5 100644
--- a/arch/x86/entry/entry_64.S
+++ b/arch/x86/entry/entry_64.S
@@ -465,16 +465,8 @@ SYM_CODE_START(\asmsym)
 	testb	$3, CS(%rsp)
 	jnz	.Lfrom_usermode_switch_stack_\@
 
-	/* paranoid_entry returns GS information for paranoid_exit in EBX. */
-	call	paranoid_entry
-
-	UNWIND_HINT_REGS
-
 	movq	%rsp, %rdi		/* pt_regs pointer */
-
-	call	\cfunc
-
-	call	paranoid_exit
+	call	ist_\cfunc
 	jmp	restore_regs_and_return_to_kernel
 
 	/* Switch to the regular task stack and use the noist entry point */
diff --git a/arch/x86/include/asm/idtentry.h b/arch/x86/include/asm/idtentry.h
index 7b17ffa43e10..f274e4e2ca17 100644
--- a/arch/x86/include/asm/idtentry.h
+++ b/arch/x86/include/asm/idtentry.h
@@ -358,6 +358,7 @@ __visible __entry_text void ist_##func(struct pt_regs *regs)		\
  * Maps to DEFINE_IDTENTRY_RAW
  */
 #define DEFINE_IDTENTRY_IST(func)					\
+	DEFINE_IDTENTRY_IST_ENTRY(func)					\
 	DEFINE_IDTENTRY_RAW(func)
 
 /**
diff --git a/arch/x86/kernel/Makefile b/arch/x86/kernel/Makefile
index 2ff3e600f426..8ac45801ba8b 100644
--- a/arch/x86/kernel/Makefile
+++ b/arch/x86/kernel/Makefile
@@ -50,6 +50,7 @@ KCOV_INSTRUMENT		:= n
 
 CFLAGS_head$(BITS).o	+= -fno-stack-protector
 CFLAGS_cc_platform.o	+= -fno-stack-protector
+CFLAGS_traps.o		+= -fno-stack-protector
 
 CFLAGS_irq.o := -I $(srctree)/$(src)/../include/asm/trace
 
diff --git a/arch/x86/kernel/cpu/mce/Makefile b/arch/x86/kernel/cpu/mce/Makefile
index 015856abdbb1..555963416ec3 100644
--- a/arch/x86/kernel/cpu/mce/Makefile
+++ b/arch/x86/kernel/cpu/mce/Makefile
@@ -1,4 +1,7 @@
 # SPDX-License-Identifier: GPL-2.0
+
+CFLAGS_core.o			+= -fno-stack-protector
+
 obj-y				=  core.o severity.o genpool.o
 
 obj-$(CONFIG_X86_ANCIENT_MCE)	+= winchip.o p5.o
-- 
2.19.1.6.gb485710b


^ permalink raw reply related	[flat|nested] 58+ messages in thread

* [PATCH V6 39/49] x86/idtentry.h: Move the definitions *IDTENTRY_{MCE|DEBUG}* up
  2021-11-26 10:11 [PATCH V6 00/49] x86/entry/64: Convert a bunch of ASM entry code into C code Lai Jiangshan
                   ` (37 preceding siblings ...)
  2021-11-26 10:11 ` [PATCH V6 38/49] x86/debug, mce: Use C entry code Lai Jiangshan
@ 2021-11-26 10:11 ` Lai Jiangshan
  2021-11-26 10:12 ` [PATCH V6 40/49] x86/nmi: Use DEFINE_IDTENTRY_NMI for nmi Lai Jiangshan
                   ` (11 subsequent siblings)
  50 siblings, 0 replies; 58+ messages in thread
From: Lai Jiangshan @ 2021-11-26 10:11 UTC (permalink / raw)
  To: linux-kernel
  Cc: x86, Lai Jiangshan, Thomas Gleixner, Ingo Molnar,
	Borislav Petkov, Dave Hansen, H. Peter Anvin, Juergen Gross,
	Peter Zijlstra (Intel),
	Joerg Roedel

From: Lai Jiangshan <laijs@linux.alibaba.com>

Move them closer to the related definitions and remove one #ifdef block.

Signed-off-by: Lai Jiangshan <laijs@linux.alibaba.com>
---
 arch/x86/include/asm/idtentry.h | 18 ++++++++----------
 1 file changed, 8 insertions(+), 10 deletions(-)

diff --git a/arch/x86/include/asm/idtentry.h b/arch/x86/include/asm/idtentry.h
index f274e4e2ca17..737fbbe19d84 100644
--- a/arch/x86/include/asm/idtentry.h
+++ b/arch/x86/include/asm/idtentry.h
@@ -372,6 +372,14 @@ __visible __entry_text void ist_##func(struct pt_regs *regs)		\
 #define DEFINE_IDTENTRY_NOIST(func)					\
 	DEFINE_IDTENTRY_RAW(noist_##func)
 
+#define DECLARE_IDTENTRY_MCE		DECLARE_IDTENTRY_IST
+#define DEFINE_IDTENTRY_MCE		DEFINE_IDTENTRY_IST
+#define DEFINE_IDTENTRY_MCE_USER	DEFINE_IDTENTRY_NOIST
+
+#define DECLARE_IDTENTRY_DEBUG		DECLARE_IDTENTRY_IST
+#define DEFINE_IDTENTRY_DEBUG		DEFINE_IDTENTRY_IST
+#define DEFINE_IDTENTRY_DEBUG_USER	DEFINE_IDTENTRY_NOIST
+
 /**
  * DECLARE_IDTENTRY_DF - Declare functions for double fault
  * @vector:	Vector number (ignored for C)
@@ -446,16 +454,6 @@ __visible noinstr void func(struct pt_regs *regs,			\
 #define DECLARE_IDTENTRY_NMI		DECLARE_IDTENTRY_RAW
 #define DEFINE_IDTENTRY_NMI		DEFINE_IDTENTRY_RAW
 
-#ifdef CONFIG_X86_64
-#define DECLARE_IDTENTRY_MCE		DECLARE_IDTENTRY_IST
-#define DEFINE_IDTENTRY_MCE		DEFINE_IDTENTRY_IST
-#define DEFINE_IDTENTRY_MCE_USER	DEFINE_IDTENTRY_NOIST
-
-#define DECLARE_IDTENTRY_DEBUG		DECLARE_IDTENTRY_IST
-#define DEFINE_IDTENTRY_DEBUG		DEFINE_IDTENTRY_IST
-#define DEFINE_IDTENTRY_DEBUG_USER	DEFINE_IDTENTRY_NOIST
-#endif
-
 #else /* !__ASSEMBLY__ */
 
 /*
-- 
2.19.1.6.gb485710b


^ permalink raw reply related	[flat|nested] 58+ messages in thread

* [PATCH V6 40/49] x86/nmi: Use DEFINE_IDTENTRY_NMI for nmi
  2021-11-26 10:11 [PATCH V6 00/49] x86/entry/64: Convert a bunch of ASM entry code into C code Lai Jiangshan
                   ` (38 preceding siblings ...)
  2021-11-26 10:11 ` [PATCH V6 39/49] x86/idtentry.h: Move the definitions *IDTENTRY_{MCE|DEBUG}* up Lai Jiangshan
@ 2021-11-26 10:12 ` Lai Jiangshan
  2021-11-26 10:12 ` [PATCH V6 41/49] x86/nmi: Use C entry code Lai Jiangshan
                   ` (10 subsequent siblings)
  50 siblings, 0 replies; 58+ messages in thread
From: Lai Jiangshan @ 2021-11-26 10:12 UTC (permalink / raw)
  To: linux-kernel
  Cc: x86, Lai Jiangshan, Thomas Gleixner, Ingo Molnar,
	Borislav Petkov, Dave Hansen, H. Peter Anvin, Joerg Roedel,
	Brijesh Singh

From: Lai Jiangshan <laijs@linux.alibaba.com>

DEFINE_IDTENTRY_NMI is defined but not used, so use it.

It also prepares for a later patch that defines DEFINE_IDTENTRY_NMI
differently for 32-bit and 64-bit.

Signed-off-by: Lai Jiangshan <laijs@linux.alibaba.com>
---
 arch/x86/kernel/nmi.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/x86/kernel/nmi.c b/arch/x86/kernel/nmi.c
index 4bce802d25fb..44c3adb68282 100644
--- a/arch/x86/kernel/nmi.c
+++ b/arch/x86/kernel/nmi.c
@@ -473,7 +473,7 @@ static DEFINE_PER_CPU(enum nmi_states, nmi_state);
 static DEFINE_PER_CPU(unsigned long, nmi_cr2);
 static DEFINE_PER_CPU(unsigned long, nmi_dr7);
 
-DEFINE_IDTENTRY_RAW(exc_nmi)
+DEFINE_IDTENTRY_NMI(exc_nmi)
 {
 	irqentry_state_t irq_state;
 
-- 
2.19.1.6.gb485710b


^ permalink raw reply related	[flat|nested] 58+ messages in thread

* [PATCH V6 41/49] x86/nmi: Use C entry code
  2021-11-26 10:11 [PATCH V6 00/49] x86/entry/64: Convert a bunch of ASM entry code into C code Lai Jiangshan
                   ` (39 preceding siblings ...)
  2021-11-26 10:12 ` [PATCH V6 40/49] x86/nmi: Use DEFINE_IDTENTRY_NMI for nmi Lai Jiangshan
@ 2021-11-26 10:12 ` Lai Jiangshan
  2021-11-26 10:12 ` [PATCH V6 42/49] x86/entry: Add a C macro to define the function body for IST in .entry.text with an error code Lai Jiangshan
                   ` (9 subsequent siblings)
  50 siblings, 0 replies; 58+ messages in thread
From: Lai Jiangshan @ 2021-11-26 10:12 UTC (permalink / raw)
  To: linux-kernel
  Cc: x86, Lai Jiangshan, Andy Lutomirski, Thomas Gleixner,
	Ingo Molnar, Borislav Petkov, Dave Hansen, H. Peter Anvin,
	Juergen Gross, Peter Zijlstra (Intel),
	Joerg Roedel, Javier Martinez Canillas,
	Daniel Bristot de Oliveira, Brijesh Singh, Andy Shevchenko,
	Tom Lendacky

From: Lai Jiangshan <laijs@linux.alibaba.com>

Use DEFINE_IDTENTRY_IST_ENTRY to emit the C entry function and call it
directly from entry_64.S.

Signed-off-by: Lai Jiangshan <laijs@linux.alibaba.com>
---
 arch/x86/entry/entry_64.S       | 17 ++---------------
 arch/x86/include/asm/idtentry.h |  5 ++++-
 arch/x86/kernel/Makefile        |  1 +
 3 files changed, 7 insertions(+), 16 deletions(-)

diff --git a/arch/x86/entry/entry_64.S b/arch/x86/entry/entry_64.S
index ac05cbf894f5..cc552e23d691 100644
--- a/arch/x86/entry/entry_64.S
+++ b/arch/x86/entry/entry_64.S
@@ -1306,21 +1306,8 @@ end_repeat_nmi:
 	PUSH_AND_CLEAR_REGS
 	ENCODE_FRAME_POINTER
 
-	/*
-	 * Use paranoid_entry to handle SWAPGS and CR3.
-	 */
-	call	paranoid_entry
-	UNWIND_HINT_REGS
-
-	movq	%rsp, %rdi
-	movq	$-1, %rsi
-	call	exc_nmi
-
-	/*
-	 * Use paranoid_exit to handle SWAPGS and CR3, but no need to use
-	 * restore_regs_and_return_to_kernel as we must handle nested NMI.
-	 */
-	call	paranoid_exit
+	movq	%rsp, %rdi		/* pt_regs pointer */
+	call	ist_exc_nmi
 
 	POP_REGS
 
diff --git a/arch/x86/include/asm/idtentry.h b/arch/x86/include/asm/idtentry.h
index 737fbbe19d84..b65cb61aafdc 100644
--- a/arch/x86/include/asm/idtentry.h
+++ b/arch/x86/include/asm/idtentry.h
@@ -372,6 +372,8 @@ __visible __entry_text void ist_##func(struct pt_regs *regs)		\
 #define DEFINE_IDTENTRY_NOIST(func)					\
 	DEFINE_IDTENTRY_RAW(noist_##func)
 
+#define DEFINE_IDTENTRY_NMI		DEFINE_IDTENTRY_IST
+
 #define DECLARE_IDTENTRY_MCE		DECLARE_IDTENTRY_IST
 #define DEFINE_IDTENTRY_MCE		DEFINE_IDTENTRY_IST
 #define DEFINE_IDTENTRY_MCE_USER	DEFINE_IDTENTRY_NOIST
@@ -421,6 +423,8 @@ __visible __entry_text void ist_##func(struct pt_regs *regs)		\
 
 #else	/* CONFIG_X86_64 */
 
+#define DEFINE_IDTENTRY_NMI		DEFINE_IDTENTRY_RAW
+
 /**
  * DECLARE_IDTENTRY_DF - Declare functions for double fault 32bit variant
  * @vector:	Vector number (ignored for C)
@@ -452,7 +456,6 @@ __visible noinstr void func(struct pt_regs *regs,			\
 
 /* C-Code mapping */
 #define DECLARE_IDTENTRY_NMI		DECLARE_IDTENTRY_RAW
-#define DEFINE_IDTENTRY_NMI		DEFINE_IDTENTRY_RAW
 
 #else /* !__ASSEMBLY__ */
 
diff --git a/arch/x86/kernel/Makefile b/arch/x86/kernel/Makefile
index 8ac45801ba8b..28815c2e6cb2 100644
--- a/arch/x86/kernel/Makefile
+++ b/arch/x86/kernel/Makefile
@@ -51,6 +51,7 @@ KCOV_INSTRUMENT		:= n
 CFLAGS_head$(BITS).o	+= -fno-stack-protector
 CFLAGS_cc_platform.o	+= -fno-stack-protector
 CFLAGS_traps.o		+= -fno-stack-protector
+CFLAGS_nmi.o		+= -fno-stack-protector
 
 CFLAGS_irq.o := -I $(srctree)/$(src)/../include/asm/trace
 
-- 
2.19.1.6.gb485710b


^ permalink raw reply related	[flat|nested] 58+ messages in thread

* [PATCH V6 42/49] x86/entry: Add a C macro to define the function body for IST in .entry.text with an error code
  2021-11-26 10:11 [PATCH V6 00/49] x86/entry/64: Convert a bunch of ASM entry code into C code Lai Jiangshan
                   ` (40 preceding siblings ...)
  2021-11-26 10:12 ` [PATCH V6 41/49] x86/nmi: Use C entry code Lai Jiangshan
@ 2021-11-26 10:12 ` Lai Jiangshan
  2021-11-26 10:12 ` [PATCH V6 43/49] x86/doublefault: Use C entry code Lai Jiangshan
                   ` (8 subsequent siblings)
  50 siblings, 0 replies; 58+ messages in thread
From: Lai Jiangshan @ 2021-11-26 10:12 UTC (permalink / raw)
  To: linux-kernel
  Cc: x86, Lai Jiangshan, Thomas Gleixner, Ingo Molnar,
	Borislav Petkov, Dave Hansen, H. Peter Anvin, Juergen Gross,
	Peter Zijlstra (Intel),
	Joerg Roedel

From: Lai Jiangshan <laijs@linux.alibaba.com>

Add the DEFINE_IDTENTRY_IST_ENTRY_ERRORCODE() macro to generate the C
code that replaces the ASM sequence which calls paranoid_entry(),
modifies orig_ax, then calls cfunc() and paranoid_exit() in series for
IST exceptions with an error code.

No functional difference intended.
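
For illustration, instantiating the macro for a handler such as
exc_double_fault() would expand to roughly the following sketch (the
handler name is just an example); compared with
DEFINE_IDTENTRY_IST_ENTRY(), the error code is read from orig_ax, which
is then poisoned with -1 before the handler runs:

	__visible __entry_text void ist_exc_double_fault(struct pt_regs *regs)
	{
		unsigned long cr3, gsbase, error_code = regs->orig_ax;

		ist_paranoid_entry(&cr3, &gsbase);
		regs->orig_ax = -1;	/* no syscall to restart */
		exc_double_fault(regs, error_code);
		ist_paranoid_exit(cr3, gsbase);
	}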

Signed-off-by: Lai Jiangshan <laijs@linux.alibaba.com>
---
 arch/x86/include/asm/idtentry.h | 16 ++++++++++++++++
 1 file changed, 16 insertions(+)

diff --git a/arch/x86/include/asm/idtentry.h b/arch/x86/include/asm/idtentry.h
index b65cb61aafdc..46b2ef021992 100644
--- a/arch/x86/include/asm/idtentry.h
+++ b/arch/x86/include/asm/idtentry.h
@@ -351,6 +351,22 @@ __visible __entry_text void ist_##func(struct pt_regs *regs)		\
 	ist_paranoid_exit(cr3, gsbase);					\
 }
 
+/**
+ * DEFINE_IDTENTRY_IST_ENTRY_ERRORCODE - Emit __entry_text code for IST
+ *					 entry points with an error code
+ * @func:	Function name of the entry point
+ */
+#define DEFINE_IDTENTRY_IST_ENTRY_ERRORCODE(func)			\
+__visible __entry_text void ist_##func(struct pt_regs *regs)		\
+{									\
+	unsigned long cr3, gsbase, error_code = regs->orig_ax;		\
+									\
+	ist_paranoid_entry(&cr3, &gsbase);				\
+	regs->orig_ax = -1;	/* no syscall to restart */		\
+	func(regs, error_code);						\
+	ist_paranoid_exit(cr3, gsbase);					\
+}
+
 /**
  * DEFINE_IDTENTRY_IST - Emit code for IST entry points
  * @func:	Function name of the entry point
-- 
2.19.1.6.gb485710b


^ permalink raw reply related	[flat|nested] 58+ messages in thread

* [PATCH V6 43/49] x86/doublefault: Use C entry code
  2021-11-26 10:11 [PATCH V6 00/49] x86/entry/64: Convert a bunch of ASM entry code into C code Lai Jiangshan
                   ` (41 preceding siblings ...)
  2021-11-26 10:12 ` [PATCH V6 42/49] x86/entry: Add a C macro to define the function body for IST in .entry.text with an error code Lai Jiangshan
@ 2021-11-26 10:12 ` Lai Jiangshan
  2021-11-26 10:12 ` [PATCH V6 44/49] x86/sev: Add and use ist_vc_switch_off_ist() Lai Jiangshan
                   ` (7 subsequent siblings)
  50 siblings, 0 replies; 58+ messages in thread
From: Lai Jiangshan @ 2021-11-26 10:12 UTC (permalink / raw)
  To: linux-kernel
  Cc: x86, Lai Jiangshan, Andy Lutomirski, Thomas Gleixner,
	Ingo Molnar, Borislav Petkov, Dave Hansen, H. Peter Anvin,
	Juergen Gross, Peter Zijlstra (Intel),
	Joerg Roedel

From: Lai Jiangshan <laijs@linux.alibaba.com>

Use DEFINE_IDTENTRY_IST_ENTRY_ERRORCODE to emit the C entry function
and call it directly from entry_64.S.

Signed-off-by: Lai Jiangshan <laijs@linux.alibaba.com>
---
 arch/x86/entry/entry_64.S       | 12 ++----------
 arch/x86/include/asm/idtentry.h |  1 +
 2 files changed, 3 insertions(+), 10 deletions(-)

diff --git a/arch/x86/entry/entry_64.S b/arch/x86/entry/entry_64.S
index cc552e23d691..5a9738218722 100644
--- a/arch/x86/entry/entry_64.S
+++ b/arch/x86/entry/entry_64.S
@@ -574,16 +574,8 @@ SYM_CODE_START(\asmsym)
 	PUSH_AND_CLEAR_REGS
 	ENCODE_FRAME_POINTER
 
-	/* paranoid_entry returns GS information for paranoid_exit in EBX. */
-	call	paranoid_entry
-	UNWIND_HINT_REGS
-
-	movq	%rsp, %rdi		/* pt_regs pointer into first argument */
-	movq	ORIG_RAX(%rsp), %rsi	/* get error code into 2nd argument*/
-	movq	$-1, ORIG_RAX(%rsp)	/* no syscall to restart */
-	call	\cfunc
-
-	call	paranoid_exit
+	movq	%rsp, %rdi		/* pt_regs pointer */
+	call	ist_\cfunc
 	jmp	restore_regs_and_return_to_kernel
 
 _ASM_NOKPROBE(\asmsym)
diff --git a/arch/x86/include/asm/idtentry.h b/arch/x86/include/asm/idtentry.h
index 46b2ef021992..144f3a6d875a 100644
--- a/arch/x86/include/asm/idtentry.h
+++ b/arch/x86/include/asm/idtentry.h
@@ -415,6 +415,7 @@ __visible __entry_text void ist_##func(struct pt_regs *regs)		\
  * Maps to DEFINE_IDTENTRY_RAW_ERRORCODE
  */
 #define DEFINE_IDTENTRY_DF(func)					\
+	DEFINE_IDTENTRY_IST_ENTRY_ERRORCODE(func)			\
 	DEFINE_IDTENTRY_RAW_ERRORCODE(func)
 
 /**
-- 
2.19.1.6.gb485710b


^ permalink raw reply related	[flat|nested] 58+ messages in thread

* [PATCH V6 44/49] x86/sev: Add and use ist_vc_switch_off_ist()
  2021-11-26 10:11 [PATCH V6 00/49] x86/entry/64: Convert a bunch of ASM entry code into C code Lai Jiangshan
                   ` (42 preceding siblings ...)
  2021-11-26 10:12 ` [PATCH V6 43/49] x86/doublefault: Use C entry code Lai Jiangshan
@ 2021-11-26 10:12 ` Lai Jiangshan
  2021-11-26 10:12 ` [PATCH V6 45/49] x86/sev: Use C entry code Lai Jiangshan
                   ` (6 subsequent siblings)
  50 siblings, 0 replies; 58+ messages in thread
From: Lai Jiangshan @ 2021-11-26 10:12 UTC (permalink / raw)
  To: linux-kernel
  Cc: x86, Lai Jiangshan, Andy Lutomirski, Thomas Gleixner,
	Ingo Molnar, Borislav Petkov, Dave Hansen, H. Peter Anvin,
	Peter Zijlstra, Joerg Roedel, Chang S. Bae

From: Lai Jiangshan <laijs@linux.alibaba.com>

ist_vc_switch_off_ist() is the same as vc_switch_off_ist(), but it is
called before CR3 or GSBASE have been switched to their kernel values,
so it has to call ist_paranoid_entry() on its own.

This prepares for using C code for the other parts of idtentry_vc and
for removing the ASM paranoid_entry() and paranoid_exit().

Signed-off-by: Lai Jiangshan <laijs@linux.alibaba.com>
---
 arch/x86/entry/entry_64.S    | 20 ++++++++++----------
 arch/x86/include/asm/traps.h |  3 ++-
 arch/x86/kernel/traps.c      | 14 +++++++++++++-
 3 files changed, 25 insertions(+), 12 deletions(-)

diff --git a/arch/x86/entry/entry_64.S b/arch/x86/entry/entry_64.S
index 5a9738218722..b18df736b981 100644
--- a/arch/x86/entry/entry_64.S
+++ b/arch/x86/entry/entry_64.S
@@ -515,26 +515,26 @@ SYM_CODE_START(\asmsym)
 	testb	$3, CS(%rsp)
 	jnz	.Lfrom_usermode_switch_stack_\@
 
-	/*
-	 * paranoid_entry returns SWAPGS flag for paranoid_exit in EBX.
-	 * EBX == 0 -> SWAPGS, EBX == 1 -> no SWAPGS
-	 */
-	call	paranoid_entry
-
-	UNWIND_HINT_REGS
-
 	/*
 	 * Switch off the IST stack to make it free for nested exceptions. The
-	 * vc_switch_off_ist() function will switch back to the interrupted
+	 * ist_vc_switch_off_ist() function will switch back to the interrupted
 	 * stack if it is safe to do so. If not it switches to the VC fall-back
 	 * stack.
 	 */
 	movq	%rsp, %rdi		/* pt_regs pointer */
-	call	vc_switch_off_ist
+	call	ist_vc_switch_off_ist
 	movq	%rax, %rsp		/* Switch to new stack */
 
 	UNWIND_HINT_REGS
 
+	/*
+	 * paranoid_entry returns SWAPGS flag for paranoid_exit in EBX.
+	 * EBX == 0 -> SWAPGS, EBX == 1 -> no SWAPGS
+	 */
+	call	paranoid_entry
+
+	UNWIND_HINT_REGS
+
 	/* Update pt_regs */
 	movq	ORIG_RAX(%rsp), %rsi	/* get error code into 2nd argument*/
 	movq	$-1, ORIG_RAX(%rsp)	/* no syscall to restart */
diff --git a/arch/x86/include/asm/traps.h b/arch/x86/include/asm/traps.h
index 686461ac9803..1aefc081d763 100644
--- a/arch/x86/include/asm/traps.h
+++ b/arch/x86/include/asm/traps.h
@@ -16,7 +16,8 @@ asmlinkage __visible notrace
 struct pt_regs *fixup_bad_iret(struct pt_regs *bad_regs);
 asmlinkage __visible notrace struct pt_regs *error_entry(struct pt_regs *eregs);
 void __init trap_init(void);
-asmlinkage __visible noinstr struct pt_regs *vc_switch_off_ist(struct pt_regs *eregs);
+asmlinkage __visible __entry_text
+struct pt_regs *ist_vc_switch_off_ist(struct pt_regs *eregs);
 #endif
 
 #ifdef CONFIG_X86_F00F_BUG
diff --git a/arch/x86/kernel/traps.c b/arch/x86/kernel/traps.c
index 4e9d306f313c..1a84587cb4c7 100644
--- a/arch/x86/kernel/traps.c
+++ b/arch/x86/kernel/traps.c
@@ -717,7 +717,7 @@ asmlinkage __visible noinstr struct pt_regs *sync_regs(struct pt_regs *eregs)
 }
 
 #ifdef CONFIG_AMD_MEM_ENCRYPT
-asmlinkage __visible noinstr struct pt_regs *vc_switch_off_ist(struct pt_regs *regs)
+static noinstr struct pt_regs *vc_switch_off_ist(struct pt_regs *regs)
 {
 	unsigned long sp, *stack;
 	struct stack_info info;
@@ -757,6 +757,18 @@ asmlinkage __visible noinstr struct pt_regs *vc_switch_off_ist(struct pt_regs *r
 
 	return regs_ret;
 }
+
+asmlinkage __visible __entry_text
+struct pt_regs *ist_vc_switch_off_ist(struct pt_regs *regs)
+{
+	unsigned long cr3, gsbase;
+
+	ist_paranoid_entry(&cr3, &gsbase);
+	regs = vc_switch_off_ist(regs);
+	ist_paranoid_exit(cr3, gsbase);
+
+	return regs;
+}
 #endif
 
 asmlinkage __visible noinstr
-- 
2.19.1.6.gb485710b


^ permalink raw reply related	[flat|nested] 58+ messages in thread

* [PATCH V6 45/49] x86/sev: Use C entry code
  2021-11-26 10:11 [PATCH V6 00/49] x86/entry/64: Convert a bunch of ASM entry code into C code Lai Jiangshan
                   ` (43 preceding siblings ...)
  2021-11-26 10:12 ` [PATCH V6 44/49] x86/sev: Add and use ist_vc_switch_off_ist() Lai Jiangshan
@ 2021-11-26 10:12 ` Lai Jiangshan
  2021-11-26 10:12 ` [PATCH V6 46/49] x86/entry: Remove ASM function paranoid_entry() and paranoid_exit() Lai Jiangshan
                   ` (5 subsequent siblings)
  50 siblings, 0 replies; 58+ messages in thread
From: Lai Jiangshan @ 2021-11-26 10:12 UTC (permalink / raw)
  To: linux-kernel
  Cc: x86, Lai Jiangshan, Liam Merwick, Andy Lutomirski,
	Thomas Gleixner, Ingo Molnar, Borislav Petkov, Dave Hansen,
	H. Peter Anvin, Juergen Gross, Peter Zijlstra (Intel),
	Joerg Roedel, Javier Martinez Canillas,
	Daniel Bristot de Oliveira, Brijesh Singh, Andy Shevchenko,
	Tom Lendacky

From: Lai Jiangshan <laijs@linux.alibaba.com>

Use DEFINE_IDTENTRY_IST_ENTRY_ERRORCODE to emit the C entry function
and call it directly from entry_64.S.

Cc: Liam Merwick <liam.merwick@oracle.com>
Signed-off-by: Lai Jiangshan <laijs@linux.alibaba.com>
---
 arch/x86/entry/entry_64.S       | 22 +---------------------
 arch/x86/include/asm/idtentry.h |  1 +
 arch/x86/kernel/Makefile        |  1 +
 3 files changed, 3 insertions(+), 21 deletions(-)

diff --git a/arch/x86/entry/entry_64.S b/arch/x86/entry/entry_64.S
index b18df736b981..614e6cbb871b 100644
--- a/arch/x86/entry/entry_64.S
+++ b/arch/x86/entry/entry_64.S
@@ -527,28 +527,8 @@ SYM_CODE_START(\asmsym)
 
 	UNWIND_HINT_REGS
 
-	/*
-	 * paranoid_entry returns SWAPGS flag for paranoid_exit in EBX.
-	 * EBX == 0 -> SWAPGS, EBX == 1 -> no SWAPGS
-	 */
-	call	paranoid_entry
-
-	UNWIND_HINT_REGS
-
-	/* Update pt_regs */
-	movq	ORIG_RAX(%rsp), %rsi	/* get error code into 2nd argument*/
-	movq	$-1, ORIG_RAX(%rsp)	/* no syscall to restart */
-
 	movq	%rsp, %rdi		/* pt_regs pointer */
-
-	call	kernel_\cfunc
-
-	/*
-	 * No need to switch back to the IST stack. The current stack is either
-	 * identical to the stack in the IRET frame or the VC fall-back stack,
-	 * so it is definitely mapped even with PTI enabled.
-	 */
-	call	paranoid_exit
+	call	ist_kernel_\cfunc
 	jmp	restore_regs_and_return_to_kernel
 
 	/* Switch to the regular task stack */
diff --git a/arch/x86/include/asm/idtentry.h b/arch/x86/include/asm/idtentry.h
index 144f3a6d875a..7cfdb898982e 100644
--- a/arch/x86/include/asm/idtentry.h
+++ b/arch/x86/include/asm/idtentry.h
@@ -426,6 +426,7 @@ __visible __entry_text void ist_##func(struct pt_regs *regs)		\
  * Maps to DEFINE_IDTENTRY_RAW_ERRORCODE
  */
 #define DEFINE_IDTENTRY_VC_KERNEL(func)				\
+	DEFINE_IDTENTRY_IST_ENTRY_ERRORCODE(kernel_##func)	\
 	DEFINE_IDTENTRY_RAW_ERRORCODE(kernel_##func)
 
 /**
diff --git a/arch/x86/kernel/Makefile b/arch/x86/kernel/Makefile
index 28815c2e6cb2..9535d03aaa61 100644
--- a/arch/x86/kernel/Makefile
+++ b/arch/x86/kernel/Makefile
@@ -52,6 +52,7 @@ CFLAGS_head$(BITS).o	+= -fno-stack-protector
 CFLAGS_cc_platform.o	+= -fno-stack-protector
 CFLAGS_traps.o		+= -fno-stack-protector
 CFLAGS_nmi.o		+= -fno-stack-protector
+CFLAGS_sev.o		+= -fno-stack-protector
 
 CFLAGS_irq.o := -I $(srctree)/$(src)/../include/asm/trace
 
-- 
2.19.1.6.gb485710b


^ permalink raw reply related	[flat|nested] 58+ messages in thread

* [PATCH V6 46/49] x86/entry: Remove ASM function paranoid_entry() and paranoid_exit()
  2021-11-26 10:11 [PATCH V6 00/49] x86/entry/64: Convert a bunch of ASM entry code into C code Lai Jiangshan
                   ` (44 preceding siblings ...)
  2021-11-26 10:12 ` [PATCH V6 45/49] x86/sev: Use C entry code Lai Jiangshan
@ 2021-11-26 10:12 ` Lai Jiangshan
  2021-11-26 10:12 ` [PATCH V6 47/49] x86/entry: Remove the unused ASM macros Lai Jiangshan
                   ` (4 subsequent siblings)
  50 siblings, 0 replies; 58+ messages in thread
From: Lai Jiangshan @ 2021-11-26 10:12 UTC (permalink / raw)
  To: linux-kernel
  Cc: x86, Lai Jiangshan, Andy Lutomirski, Thomas Gleixner,
	Ingo Molnar, Borislav Petkov, Dave Hansen, H. Peter Anvin

From: Lai Jiangshan <laijs@linux.alibaba.com>

IST exceptions have been converted to the C entry code, which uses the
C functions ist_paranoid_entry() and ist_paranoid_exit().  The ASM
functions paranoid_entry() and paranoid_exit() are no longer used, so
remove them.

Signed-off-by: Lai Jiangshan <laijs@linux.alibaba.com>
---
 arch/x86/entry/entry_64.S | 126 --------------------------------------
 1 file changed, 126 deletions(-)

diff --git a/arch/x86/entry/entry_64.S b/arch/x86/entry/entry_64.S
index 614e6cbb871b..a583089e88c1 100644
--- a/arch/x86/entry/entry_64.S
+++ b/arch/x86/entry/entry_64.S
@@ -848,132 +848,6 @@ SYM_CODE_START(xen_failsafe_callback)
 SYM_CODE_END(xen_failsafe_callback)
 #endif /* CONFIG_XEN_PV */
 
-/*
- * Save all registers in pt_regs. Return GSBASE related information
- * in EBX depending on the availability of the FSGSBASE instructions:
- *
- * FSGSBASE	R/EBX
- *     N        0 -> SWAPGS on exit
- *              1 -> no SWAPGS on exit
- *
- *     Y        GSBASE value at entry, must be restored in paranoid_exit
- */
-SYM_CODE_START_LOCAL(paranoid_entry)
-	UNWIND_HINT_FUNC
-
-	/*
-	 * Always stash CR3 in %r14.  This value will be restored,
-	 * verbatim, at exit.  Needed if paranoid_entry interrupted
-	 * another entry that already switched to the user CR3 value
-	 * but has not yet returned to userspace.
-	 *
-	 * This is also why CS (stashed in the "iret frame" by the
-	 * hardware at entry) can not be used: this may be a return
-	 * to kernel code, but with a user CR3 value.
-	 *
-	 * Switching CR3 does not depend on kernel GSBASE so it can
-	 * be done before switching to the kernel GSBASE. This is
-	 * required for FSGSBASE because the kernel GSBASE has to
-	 * be retrieved from a kernel internal table.
-	 */
-	SAVE_AND_SWITCH_TO_KERNEL_CR3 scratch_reg=%rax save_reg=%r14
-
-	/*
-	 * Handling GSBASE depends on the availability of FSGSBASE.
-	 *
-	 * Without FSGSBASE the kernel enforces that negative GSBASE
-	 * values indicate kernel GSBASE. With FSGSBASE no assumptions
-	 * can be made about the GSBASE value when entering from user
-	 * space.
-	 */
-	ALTERNATIVE "jmp .Lparanoid_entry_checkgs", "", X86_FEATURE_FSGSBASE
-
-	/*
-	 * Read the current GSBASE and store it in %rbx unconditionally,
-	 * retrieve and set the current CPUs kernel GSBASE. The stored value
-	 * has to be restored in paranoid_exit unconditionally.
-	 *
-	 * The unconditional write to GS base below ensures that no subsequent
-	 * loads based on a mispredicted GS base can happen, therefore no LFENCE
-	 * is needed here.
-	 */
-	SAVE_AND_SET_GSBASE scratch_reg=%rax save_reg=%rbx
-	ret
-
-.Lparanoid_entry_checkgs:
-	/* EBX = 1 -> kernel GSBASE active, no restore required */
-	movl	$1, %ebx
-	/*
-	 * The kernel-enforced convention is a negative GSBASE indicates
-	 * a kernel value. No SWAPGS needed on entry and exit.
-	 */
-	movl	$MSR_GS_BASE, %ecx
-	rdmsr
-	testl	%edx, %edx
-	js	.Lparanoid_kernel_gsbase
-
-	/* EBX = 0 -> SWAPGS required on exit */
-	xorl	%ebx, %ebx
-	swapgs
-.Lparanoid_kernel_gsbase:
-
-	/*
-	 * The above SAVE_AND_SWITCH_TO_KERNEL_CR3 macro doesn't do an
-	 * unconditional CR3 write, even in the PTI case.  So do an lfence
-	 * to prevent GS speculation, regardless of whether PTI is enabled.
-	 */
-	FENCE_SWAPGS_KERNEL_ENTRY
-	ret
-SYM_CODE_END(paranoid_entry)
-
-/*
- * "Paranoid" exit path from exception stack.  This is invoked
- * only on return from IST interrupts that came from kernel space.
- *
- * We may be returning to very strange contexts (e.g. very early
- * in syscall entry), so checking for preemption here would
- * be complicated.  Fortunately, there's no good reason to try
- * to handle preemption here.
- *
- * R/EBX contains the GSBASE related information depending on the
- * availability of the FSGSBASE instructions:
- *
- * FSGSBASE	R/EBX
- *     N        0 -> SWAPGS on exit
- *              1 -> no SWAPGS on exit
- *
- *     Y        User space GSBASE, must be restored unconditionally
- */
-SYM_CODE_START_LOCAL(paranoid_exit)
-	UNWIND_HINT_REGS offset=8
-	/*
-	 * The order of operations is important. RESTORE_CR3 requires
-	 * kernel GSBASE.
-	 *
-	 * NB to anyone to try to optimize this code: this code does
-	 * not execute at all for exceptions from user mode. Those
-	 * exceptions go through error_exit instead.
-	 */
-	RESTORE_CR3	scratch_reg=%rax save_reg=%r14
-
-	/* Handle the three GSBASE cases */
-	ALTERNATIVE "jmp .Lparanoid_exit_checkgs", "", X86_FEATURE_FSGSBASE
-
-	/* With FSGSBASE enabled, unconditionally restore GSBASE */
-	wrgsbase	%rbx
-	ret
-
-.Lparanoid_exit_checkgs:
-	/* On non-FSGSBASE systems, conditionally do SWAPGS */
-	testl		%ebx, %ebx
-	jnz		.Lparanoid_exit_done
-
-	/* We are returning to a context with user GSBASE */
-	swapgs
-.Lparanoid_exit_done:
-	ret
-SYM_CODE_END(paranoid_exit)
-
 SYM_CODE_START_LOCAL(error_return)
 	UNWIND_HINT_REGS
 	DEBUG_ENTRY_ASSERT_IRQS_OFF
-- 
2.19.1.6.gb485710b


^ permalink raw reply related	[flat|nested] 58+ messages in thread

* [PATCH V6 47/49] x86/entry: Remove the unused ASM macros
  2021-11-26 10:11 [PATCH V6 00/49] x86/entry/64: Convert a bunch of ASM entry code into C code Lai Jiangshan
                   ` (45 preceding siblings ...)
  2021-11-26 10:12 ` [PATCH V6 46/49] x86/entry: Remove ASM function paranoid_entry() and paranoid_exit() Lai Jiangshan
@ 2021-11-26 10:12 ` Lai Jiangshan
  2021-11-26 10:12 ` [PATCH V6 48/49] x86/entry: Remove save_ret from PUSH_AND_CLEAR_REGS Lai Jiangshan
                   ` (3 subsequent siblings)
  50 siblings, 0 replies; 58+ messages in thread
From: Lai Jiangshan @ 2021-11-26 10:12 UTC (permalink / raw)
  To: linux-kernel
  Cc: x86, Lai Jiangshan, Andy Lutomirski, Thomas Gleixner,
	Ingo Molnar, Borislav Petkov, Dave Hansen, H. Peter Anvin

From: Lai Jiangshan <laijs@linux.alibaba.com>

These macros are now implemented and used in C code, so the ASM
versions are not needed any more.

FENCE_SWAPGS_USER_ENTRY is not removed because it is still used in the
NMI userspace path.  It may become removable with future entry code
enhancements.

Signed-off-by: Lai Jiangshan <laijs@linux.alibaba.com>
---
 arch/x86/entry/calling.h | 99 ----------------------------------------
 1 file changed, 99 deletions(-)

diff --git a/arch/x86/entry/calling.h b/arch/x86/entry/calling.h
index 9065c31d2875..d42012fc694d 100644
--- a/arch/x86/entry/calling.h
+++ b/arch/x86/entry/calling.h
@@ -210,53 +210,6 @@ For 32-bit we have the following conventions - kernel is built with
 	popq	%rax
 .endm
 
-.macro SAVE_AND_SWITCH_TO_KERNEL_CR3 scratch_reg:req save_reg:req
-	ALTERNATIVE "jmp .Ldone_\@", "", X86_FEATURE_PTI
-	movq	%cr3, \scratch_reg
-	movq	\scratch_reg, \save_reg
-	/*
-	 * Test the user pagetable bit. If set, then the user page tables
-	 * are active. If clear CR3 already has the kernel page table
-	 * active.
-	 */
-	bt	$PTI_USER_PGTABLE_BIT, \scratch_reg
-	jnc	.Ldone_\@
-
-	ADJUST_KERNEL_CR3 \scratch_reg
-	movq	\scratch_reg, %cr3
-
-.Ldone_\@:
-.endm
-
-.macro RESTORE_CR3 scratch_reg:req save_reg:req
-	ALTERNATIVE "jmp .Lend_\@", "", X86_FEATURE_PTI
-
-	/* No need to restore when the saved CR3 is kernel CR3. */
-	bt	$PTI_USER_PGTABLE_BIT, \save_reg
-	jnc	.Lend_\@
-
-	ALTERNATIVE "jmp .Lwrcr3_\@", "", X86_FEATURE_PCID
-
-	/*
-	 * Check if there's a pending flush for the user ASID we're
-	 * about to set.
-	 */
-	movq	\save_reg, \scratch_reg
-	andq	$(0x7FF), \scratch_reg
-	bt	\scratch_reg, THIS_CPU_user_pcid_flush_mask
-	jnc	.Lnoflush_\@
-
-	btr	\scratch_reg, THIS_CPU_user_pcid_flush_mask
-	jmp	.Lwrcr3_\@
-
-.Lnoflush_\@:
-	SET_NOFLUSH_BIT \save_reg
-
-.Lwrcr3_\@:
-	movq	\save_reg, %cr3
-.Lend_\@:
-.endm
-
 #else /* CONFIG_PAGE_TABLE_ISOLATION=n: */
 
 .macro SWITCH_TO_KERNEL_CR3 scratch_reg:req
@@ -265,10 +218,6 @@ For 32-bit we have the following conventions - kernel is built with
 .endm
 .macro SWITCH_TO_USER_CR3_STACK scratch_reg:req
 .endm
-.macro SAVE_AND_SWITCH_TO_KERNEL_CR3 scratch_reg:req save_reg:req
-.endm
-.macro RESTORE_CR3 scratch_reg:req save_reg:req
-.endm
 
 #endif
 
@@ -277,17 +226,10 @@ For 32-bit we have the following conventions - kernel is built with
  *
  * FENCE_SWAPGS_USER_ENTRY is used in the user entry swapgs code path, to
  * prevent a speculative swapgs when coming from kernel space.
- *
- * FENCE_SWAPGS_KERNEL_ENTRY is used in the kernel entry non-swapgs code path,
- * to prevent the swapgs from getting speculatively skipped when coming from
- * user space.
  */
 .macro FENCE_SWAPGS_USER_ENTRY
 	ALTERNATIVE "", "lfence", X86_FEATURE_FENCE_SWAPGS_USER
 .endm
-.macro FENCE_SWAPGS_KERNEL_ENTRY
-	ALTERNATIVE "", "lfence", X86_FEATURE_FENCE_SWAPGS_KERNEL
-.endm
 
 .macro STACKLEAK_ERASE_NOCLOBBER
 #ifdef CONFIG_GCC_PLUGIN_STACKLEAK
@@ -297,12 +239,6 @@ For 32-bit we have the following conventions - kernel is built with
 #endif
 .endm
 
-.macro SAVE_AND_SET_GSBASE scratch_reg:req save_reg:req
-	rdgsbase \save_reg
-	GET_PERCPU_BASE \scratch_reg
-	wrgsbase \scratch_reg
-.endm
-
 #else /* CONFIG_X86_64 */
 # undef		UNWIND_HINT_IRET_REGS
 # define	UNWIND_HINT_IRET_REGS
@@ -313,38 +249,3 @@ For 32-bit we have the following conventions - kernel is built with
 	call stackleak_erase
 #endif
 .endm
-
-#ifdef CONFIG_SMP
-
-/*
- * CPU/node NR is loaded from the limit (size) field of a special segment
- * descriptor entry in GDT.
- */
-.macro LOAD_CPU_AND_NODE_SEG_LIMIT reg:req
-	movq	$__CPUNODE_SEG, \reg
-	lsl	\reg, \reg
-.endm
-
-/*
- * Fetch the per-CPU GSBASE value for this processor and put it in @reg.
- * We normally use %gs for accessing per-CPU data, but we are setting up
- * %gs here and obviously can not use %gs itself to access per-CPU data.
- *
- * Do not use RDPID, because KVM loads guest's TSC_AUX on vm-entry and
- * may not restore the host's value until the CPU returns to userspace.
- * Thus the kernel would consume a guest's TSC_AUX if an NMI arrives
- * while running KVM's run loop.
- */
-.macro GET_PERCPU_BASE reg:req
-	LOAD_CPU_AND_NODE_SEG_LIMIT \reg
-	andq	$VDSO_CPUNODE_MASK, \reg
-	movq	__per_cpu_offset(, \reg, 8), \reg
-.endm
-
-#else
-
-.macro GET_PERCPU_BASE reg:req
-	movq	pcpu_unit_offsets(%rip), \reg
-.endm
-
-#endif /* CONFIG_SMP */
-- 
2.19.1.6.gb485710b


^ permalink raw reply related	[flat|nested] 58+ messages in thread

* [PATCH V6 48/49] x86/entry: Remove save_ret from PUSH_AND_CLEAR_REGS
  2021-11-26 10:11 [PATCH V6 00/49] x86/entry/64: Convert a bunch of ASM entry code into C code Lai Jiangshan
                   ` (46 preceding siblings ...)
  2021-11-26 10:12 ` [PATCH V6 47/49] x86/entry: Remove the unused ASM macros Lai Jiangshan
@ 2021-11-26 10:12 ` Lai Jiangshan
  2021-11-26 10:12 ` [PATCH V6 49/49] x86/syscall/64: Move the checking for sysret to C code Lai Jiangshan
                   ` (2 subsequent siblings)
  50 siblings, 0 replies; 58+ messages in thread
From: Lai Jiangshan @ 2021-11-26 10:12 UTC (permalink / raw)
  To: linux-kernel
  Cc: x86, Lai Jiangshan, Andy Lutomirski, Thomas Gleixner,
	Ingo Molnar, Borislav Petkov, Dave Hansen, H. Peter Anvin

From: Lai Jiangshan <laijs@linux.alibaba.com>

PUSH_AND_CLEAR_REGS is no longer used with save_ret, so drop the
parameter.

Signed-off-by: Lai Jiangshan <laijs@linux.alibaba.com>
---
 arch/x86/entry/calling.h | 16 +++-------------
 1 file changed, 3 insertions(+), 13 deletions(-)

diff --git a/arch/x86/entry/calling.h b/arch/x86/entry/calling.h
index d42012fc694d..6f9de1c6da73 100644
--- a/arch/x86/entry/calling.h
+++ b/arch/x86/entry/calling.h
@@ -63,15 +63,9 @@ For 32-bit we have the following conventions - kernel is built with
  * for assembly code:
  */
 
-.macro PUSH_REGS rdx=%rdx rax=%rax save_ret=0
-	.if \save_ret
-	pushq	%rsi		/* pt_regs->si */
-	movq	8(%rsp), %rsi	/* temporarily store the return address in %rsi */
-	movq	%rdi, 8(%rsp)	/* pt_regs->di (overwriting original return address) */
-	.else
+.macro PUSH_REGS rdx=%rdx rax=%rax
 	pushq   %rdi		/* pt_regs->di */
 	pushq   %rsi		/* pt_regs->si */
-	.endif
 	pushq	\rdx		/* pt_regs->dx */
 	pushq   %rcx		/* pt_regs->cx */
 	pushq   \rax		/* pt_regs->ax */
@@ -86,10 +80,6 @@ For 32-bit we have the following conventions - kernel is built with
 	pushq	%r14		/* pt_regs->r14 */
 	pushq	%r15		/* pt_regs->r15 */
 	UNWIND_HINT_REGS
-
-	.if \save_ret
-	pushq	%rsi		/* return address on top of stack */
-	.endif
 .endm
 
 .macro CLEAR_REGS
@@ -114,8 +104,8 @@ For 32-bit we have the following conventions - kernel is built with
 
 .endm
 
-.macro PUSH_AND_CLEAR_REGS rdx=%rdx rax=%rax save_ret=0
-	PUSH_REGS rdx=\rdx, rax=\rax, save_ret=\save_ret
+.macro PUSH_AND_CLEAR_REGS rdx=%rdx rax=%rax
+	PUSH_REGS rdx=\rdx, rax=\rax
 	CLEAR_REGS
 .endm
 
-- 
2.19.1.6.gb485710b


^ permalink raw reply related	[flat|nested] 58+ messages in thread

* [PATCH V6 49/49] x86/syscall/64: Move the checking for sysret to C code
  2021-11-26 10:11 [PATCH V6 00/49] x86/entry/64: Convert a bunch of ASM entry code into C code Lai Jiangshan
                   ` (47 preceding siblings ...)
  2021-11-26 10:12 ` [PATCH V6 48/49] x86/entry: Remove save_ret from PUSH_AND_CLEAR_REGS Lai Jiangshan
@ 2021-11-26 10:12 ` Lai Jiangshan
  2021-11-27 17:46 ` [PATCH V6 00/49] x86/entry/64: Convert a bunch of ASM entry code into " Damian Tometzki
  2021-12-03  9:31 ` Lai Jiangshan
  50 siblings, 0 replies; 58+ messages in thread
From: Lai Jiangshan @ 2021-11-26 10:12 UTC (permalink / raw)
  To: linux-kernel
  Cc: x86, Lai Jiangshan, Andy Lutomirski, Thomas Gleixner,
	Ingo Molnar, Borislav Petkov, Dave Hansen, H. Peter Anvin,
	Peter Collingbourne

From: Lai Jiangshan <laijs@linux.alibaba.com>

Like do_fast_syscall_32(), which checks whether it can return to
userspace via fast instructions before it returns, do_syscall_64() now
also checks in C whether it can use SYSRET to return to userspace
before it returns.  This allows a bunch of ASM code to be removed.

No functional change intended.
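
As a quick illustration of the sign-extension trick that the new
canonical_address() helper relies on, here is a standalone user-space
sketch for the 4-level (48-bit) case (not part of the patch):

	#include <assert.h>
	#include <stdint.h>

	static uint64_t canonical48(uint64_t vaddr)
	{
		/* Sign-extend from bit 47, as canonical_address() does. */
		return (uint64_t)(((int64_t)vaddr << 16) >> 16);
	}

	int main(void)
	{
		/* Highest canonical user address: unchanged, SYSRET is allowed. */
		assert(canonical48(0x00007fffffffffffULL) == 0x00007fffffffffffULL);
		/* First non-canonical address: changed, so fall back to IRET. */
		assert(canonical48(0x0000800000000000ULL) == 0xffff800000000000ULL);
		return 0;
	}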

Signed-off-by: Lai Jiangshan <laijs@linux.alibaba.com>
---
 arch/x86/entry/calling.h       | 10 +----
 arch/x86/entry/common.c        | 73 ++++++++++++++++++++++++++++++-
 arch/x86/entry/entry_64.S      | 78 ++--------------------------------
 arch/x86/include/asm/syscall.h |  2 +-
 4 files changed, 78 insertions(+), 85 deletions(-)

diff --git a/arch/x86/entry/calling.h b/arch/x86/entry/calling.h
index 6f9de1c6da73..05da3ef48ee4 100644
--- a/arch/x86/entry/calling.h
+++ b/arch/x86/entry/calling.h
@@ -109,27 +109,19 @@ For 32-bit we have the following conventions - kernel is built with
 	CLEAR_REGS
 .endm
 
-.macro POP_REGS pop_rdi=1 skip_r11rcx=0
+.macro POP_REGS pop_rdi=1
 	popq %r15
 	popq %r14
 	popq %r13
 	popq %r12
 	popq %rbp
 	popq %rbx
-	.if \skip_r11rcx
-	popq %rsi
-	.else
 	popq %r11
-	.endif
 	popq %r10
 	popq %r9
 	popq %r8
 	popq %rax
-	.if \skip_r11rcx
-	popq %rsi
-	.else
 	popq %rcx
-	.endif
 	popq %rdx
 	popq %rsi
 	.if \pop_rdi
diff --git a/arch/x86/entry/common.c b/arch/x86/entry/common.c
index 6c2826417b33..718045b7a53c 100644
--- a/arch/x86/entry/common.c
+++ b/arch/x86/entry/common.c
@@ -70,7 +70,77 @@ static __always_inline bool do_syscall_x32(struct pt_regs *regs, int nr)
 	return false;
 }
 
-__visible noinstr void do_syscall_64(struct pt_regs *regs, int nr)
+/*
+ * Change top bits to match the most significant bit (47th or 56th bit
+ * depending on paging mode) in the address to get canonical address.
+ *
+ * If width of "canonical tail" ever becomes variable, this will need
+ * to be updated to remain correct on both old and new CPUs.
+ */
+static __always_inline u64 canonical_address(u64 vaddr)
+{
+	if (IS_ENABLED(CONFIG_X86_5LEVEL) && static_cpu_has(X86_FEATURE_LA57))
+		return ((s64)vaddr << (64 - 57)) >> (64 - 57);
+	else
+		return ((s64)vaddr << (64 - 48)) >> (64 - 48);
+}
+
+/*
+ * Check if it can use SYSRET.
+ *
+ * Try to use SYSRET instead of IRET if we're returning to
+ * a completely clean 64-bit userspace context.
+ *
+ * Returns 0 to return using IRET or 1 to return using SYSRET.
+ */
+static __always_inline int can_sysret(struct pt_regs *regs)
+{
+	/* In the Xen PV case we must use iret anyway. */
+	if (static_cpu_has(X86_FEATURE_XENPV))
+		return 0;
+
+	/* SYSRET requires RCX == RIP && R11 == RFLAGS */
+	if (regs->ip != regs->cx || regs->flags != regs->r11)
+		return 0;
+
+	/* CS and SS must match SYSRET */
+	if (regs->cs != __USER_CS || regs->ss != __USER_DS)
+		return 0;
+
+	/*
+	 * On Intel CPUs, SYSRET with non-canonical RCX/RIP will #GP
+	 * in kernel space.  This essentially lets the user take over
+	 * the kernel, since userspace controls RSP.
+	 */
+	if (regs->cx != canonical_address(regs->cx))
+		return 0;
+
+	/*
+	 * SYSCALL clears RF when it saves RFLAGS in R11 and SYSRET cannot
+	 * restore RF properly. If the slowpath sets it for whatever reason, we
+	 * need to restore it correctly.
+	 *
+	 * SYSRET can restore TF, but unlike IRET, restoring TF results in a
+	 * trap from userspace immediately after SYSRET.  This would cause an
+	 * infinite loop whenever #DB happens with register state that satisfies
+	 * the opportunistic SYSRET conditions.  For example, single-stepping
+	 * this user code:
+	 *
+	 *           movq	$stuck_here, %rcx
+	 *           pushfq
+	 *           popq %r11
+	 *   stuck_here:
+	 *
+	 * would never get past 'stuck_here'.
+	 */
+	if (regs->r11 & (X86_EFLAGS_RF | X86_EFLAGS_TF))
+		return 0;
+
+	return 1;
+}
+
+/* Returns 0 to return using IRET or 1 to return using SYSRET. */
+__visible noinstr int do_syscall_64(struct pt_regs *regs, int nr)
 {
 	add_random_kstack_offset();
 	nr = syscall_enter_from_user_mode(regs, nr);
@@ -84,6 +154,7 @@ __visible noinstr void do_syscall_64(struct pt_regs *regs, int nr)
 
 	instrumentation_end();
 	syscall_exit_to_user_mode(regs);
+	return can_sysret(regs);
 }
 #endif
 
diff --git a/arch/x86/entry/entry_64.S b/arch/x86/entry/entry_64.S
index a583089e88c1..77e255e3c6c2 100644
--- a/arch/x86/entry/entry_64.S
+++ b/arch/x86/entry/entry_64.S
@@ -112,85 +112,15 @@ SYM_INNER_LABEL(entry_SYSCALL_64_after_hwframe, SYM_L_GLOBAL)
 	movslq	%eax, %rsi
 	call	do_syscall_64		/* returns with IRQs disabled */
 
-	/*
-	 * Try to use SYSRET instead of IRET if we're returning to
-	 * a completely clean 64-bit userspace context.  If we're not,
-	 * go to the slow exit path.
-	 * In the Xen PV case we must use iret anyway.
-	 */
-
-	ALTERNATIVE "", "jmp	swapgs_restore_regs_and_return_to_usermode", \
-		X86_FEATURE_XENPV
-
-	movq	RCX(%rsp), %rcx
-	movq	RIP(%rsp), %r11
-
-	cmpq	%rcx, %r11	/* SYSRET requires RCX == RIP */
-	jne	swapgs_restore_regs_and_return_to_usermode
+	testl	%eax, %eax
+	jz swapgs_restore_regs_and_return_to_usermode
 
 	/*
-	 * On Intel CPUs, SYSRET with non-canonical RCX/RIP will #GP
-	 * in kernel space.  This essentially lets the user take over
-	 * the kernel, since userspace controls RSP.
-	 *
-	 * If width of "canonical tail" ever becomes variable, this will need
-	 * to be updated to remain correct on both old and new CPUs.
-	 *
-	 * Change top bits to match most significant bit (47th or 56th bit
-	 * depending on paging mode) in the address.
-	 */
-#ifdef CONFIG_X86_5LEVEL
-	ALTERNATIVE "shl $(64 - 48), %rcx; sar $(64 - 48), %rcx", \
-		"shl $(64 - 57), %rcx; sar $(64 - 57), %rcx", X86_FEATURE_LA57
-#else
-	shl	$(64 - (__VIRTUAL_MASK_SHIFT+1)), %rcx
-	sar	$(64 - (__VIRTUAL_MASK_SHIFT+1)), %rcx
-#endif
-
-	/* If this changed %rcx, it was not canonical */
-	cmpq	%rcx, %r11
-	jne	swapgs_restore_regs_and_return_to_usermode
-
-	cmpq	$__USER_CS, CS(%rsp)		/* CS must match SYSRET */
-	jne	swapgs_restore_regs_and_return_to_usermode
-
-	movq	R11(%rsp), %r11
-	cmpq	%r11, EFLAGS(%rsp)		/* R11 == RFLAGS */
-	jne	swapgs_restore_regs_and_return_to_usermode
-
-	/*
-	 * SYSCALL clears RF when it saves RFLAGS in R11 and SYSRET cannot
-	 * restore RF properly. If the slowpath sets it for whatever reason, we
-	 * need to restore it correctly.
-	 *
-	 * SYSRET can restore TF, but unlike IRET, restoring TF results in a
-	 * trap from userspace immediately after SYSRET.  This would cause an
-	 * infinite loop whenever #DB happens with register state that satisfies
-	 * the opportunistic SYSRET conditions.  For example, single-stepping
-	 * this user code:
-	 *
-	 *           movq	$stuck_here, %rcx
-	 *           pushfq
-	 *           popq %r11
-	 *   stuck_here:
-	 *
-	 * would never get past 'stuck_here'.
-	 */
-	testq	$(X86_EFLAGS_RF|X86_EFLAGS_TF), %r11
-	jnz	swapgs_restore_regs_and_return_to_usermode
-
-	/* nothing to check for RSP */
-
-	cmpq	$__USER_DS, SS(%rsp)		/* SS must match SYSRET */
-	jne	swapgs_restore_regs_and_return_to_usermode
-
-	/*
-	 * We win! This label is here just for ease of understanding
+	 * This label is here just for ease of understanding
 	 * perf profiles. Nothing jumps here.
 	 */
 syscall_return_via_sysret:
-	/* rcx and r11 are already restored (see code above) */
-	POP_REGS pop_rdi=0 skip_r11rcx=1
+	POP_REGS pop_rdi=0
 
 	/*
 	 * Now all regs are restored except RSP and RDI.
diff --git a/arch/x86/include/asm/syscall.h b/arch/x86/include/asm/syscall.h
index 5b85987a5e97..efd50437c311 100644
--- a/arch/x86/include/asm/syscall.h
+++ b/arch/x86/include/asm/syscall.h
@@ -126,7 +126,7 @@ static inline int syscall_get_arch(struct task_struct *task)
 		? AUDIT_ARCH_I386 : AUDIT_ARCH_X86_64;
 }
 
-void do_syscall_64(struct pt_regs *regs, int nr);
+int do_syscall_64(struct pt_regs *regs, int nr);
 void do_int80_syscall_32(struct pt_regs *regs);
 long do_fast_syscall_32(struct pt_regs *regs);
 
-- 
2.19.1.6.gb485710b


^ permalink raw reply related	[flat|nested] 58+ messages in thread

* Re: [PATCH V6 00/49] x86/entry/64: Convert a bunch of ASM entry code into C code
  2021-11-26 10:11 [PATCH V6 00/49] x86/entry/64: Convert a bunch of ASM entry code into C code Lai Jiangshan
                   ` (48 preceding siblings ...)
  2021-11-26 10:12 ` [PATCH V6 49/49] x86/syscall/64: Move the checking for sysret to C code Lai Jiangshan
@ 2021-11-27 17:46 ` Damian Tometzki
  2021-12-03  9:31 ` Lai Jiangshan
  50 siblings, 0 replies; 58+ messages in thread
From: Damian Tometzki @ 2021-11-27 17:46 UTC (permalink / raw)
  To: Lai Jiangshan
  Cc: linux-kernel, x86, Lai Jiangshan, Thomas Gleixner, Ingo Molnar,
	Borislav Petkov, Peter Zijlstra, Andy Lutomirski, H. Peter Anvin,
	Joerg Roedel

Hello Lai,

the patches, from my point of view, look good. My QEMU system boots
with these patches. From my side:

Reviewed-by: Damian Tometzki <dtometzki@fedoraproject.org>

Best regards,
Damian


On Fri, 26. Nov 18:11, Lai Jiangshan wrote:
> From: Lai Jiangshan <laijs@linux.alibaba.com>
> 
> Changed from V5:
> 	Fix the code order of FENCE_SWAPGS_KERNEL_ENTRY in patch1 and
> 	change the new corresponding C entry code to match the asm code.
> 
> 	Squash the patch of removing stack-protector from traps.c into
> 	a later patch that uses C entry code for #DB and #MCE
> 
> 	Kill .Lgs_change and use the new asm_load_gs_index_gs_change in
> 	_ASM_EXTABLE
> 
> 	s/ETNRY/ENTRY/g for DEFINE_IDTENTRY_IST_ENTRY macros
> ----
> 
> Many ASM code in entry_64.S can be rewritten in C if they can be written
> to be non-instrumentable and are called in the right order regarding to
> whether CR3/gsbase is changed to kernel CR3/gsbase.
> 
> The patchset covert some of them to C code.
> 
> The patch 23 converts the error_entry() to C code. And patch 1-23
> are fixes and preparation for it.
> 
> The patches 24-26 convert entry_INT80_compat and do cleanup.
> 
> The patches 27-45 convert the IST entry code to C code.  Many of them
> are preparation for the actual conversion.
> 
> The patches 46-48 do cleanup.
> 
> The patch 49 converts a small part of ASM code of syscall to C code which
> does the checking for whether it can use sysret to return to userspace.
> 
> Some other paths can be possible to be in C code, for example: the
> error exit, the syscall entry/exit.  The PTI handling for them can
> be in C code.  But it would required the pt_regs to be copied/pushed
> to the entry stack which means the C code would not be efficient.
> 
> When converting ASM to C, the most effort is to make them the same.
> Almost no creative was involved.  The code are kept as the same as ASM
> as possible and no functional change intended unless my misunderstanding
> in the ASM code was involved.  The functions called by the C entry code
> are checked to be ensured noinstr or __always_inline.  Some of them have
> more than one definitions and require some more cares from reviewers.
> The comments in the ASM are also copied in the right place in the C code.
> 
> Changed from V4:
> 	Move FENCE_SWAPGS_KERNEL_ENTRY up in the patch1. And change the
> 	corresponding C code in later patches to keep coherence.
> 
> 	Jmp to xenpv_restore_regs_and_return_to_usermode in
> 	swapgs_restore_regs_and_return_to_usermode instead of calling
> 	it everywhere.
> 
> 	Add Miguel Ojeda's Reviewed-by.
> 
> Changed from V3:
> 	Add a "Reviewed-by" for the xenpv fix
> 	Reviewed-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
> 
> 	Change __attribute((__section__(section))) to __section(section)
> 
> 	Move a part of ist_paranoid_exit() as a new ist_restore_gsbase()
> 
> 	Add a new commit (patch 32) to change the ASM RESTORE_CR3, the
> 		corresponding C version ist_restore_cr3() is changed too.
> 
> Changed from V2:
> 	Fix two places with missed FENCE_SWAPGS_KERNEL_ENTRY.
> 
> 	Fix swapgs_restore_regs_and_return_to_usermode for XENPV.
> 
> 	Updates the C entry_error()/parnoid_entry() to use
> 		fence_swapgs_kernel_entry when with user gsbase
> 		in kernel CR3.
> 
> 	Simplify removing stack-protector in MAKEFILE.
> 
> 	Squash commits about removing stack-protector in MAKEFILE.
> 
> 	In V2 the C entry_error() checks xenpv first and uses natvie_swapgs
> 		but ASM entry_error() uses pv-aware SWAPGS.  In V3, the
> 		commit is split into 3 commit, so the conversion has no
> 		semantic change.
> 
> 	Move cld to the start of idtentry.
> 
> 	Use idtentry macro for entry_INT80_compat and remove the old one.
> 
> 	Add cleanup for PTI_USER_PGTABLE_BIT when it is moved to header
> 	file.
> 
> 	Remove pv-aware SWAPGS.
> 
> Changed from V1:
> 	Add a fix as the patch1.  Found by trying to applied Peterz's
> 		suggestion in patch11.
> 	The whole entry_error() is converted to C instead of partial.
> 	The whole parnoid_entry() is converted to C instead of partial.
> 	The asm code of "parnoid_entry() cfunc() parnoid_exit()" are
> 		converted to C as suggested by Peterz.
> 	Add entry64.c rather than move traps.c to arch/x86/entry/
> 	The order of some commits is changed.
> 	Remove two cleanups
> 
> [V1]: https://lore.kernel.org/all/20210831175025.27570-1-jiangshanlai@gmail.com/
> [V2]: https://lore.kernel.org/lkml/20210926150838.197719-1-jiangshanlai@gmail.com/
> [V3]: https://lore.kernel.org/lkml/20211014031413.14471-1-jiangshanlai@gmail.com/
> [V4]: https://lore.kernel.org/lkml/20211026141420.17138-1-jiangshanlai@gmail.com/
> [V5]: https://lore.kernel.org/lkml/20211110115736.3776-1-jiangshanlai@gmail.com/
> 
> Cc: Thomas Gleixner <tglx@linutronix.de>
> Cc: Ingo Molnar <mingo@redhat.com>
> Cc: Borislav Petkov <bp@alien8.de>
> Cc: Peter Zijlstra <peterz@infradead.org>
> Cc: Andy Lutomirski <luto@kernel.org>
> Cc: "H. Peter Anvin" <hpa@zytor.com>
> Cc: Joerg Roedel <jroedel@suse.de>
> 
> Lai Jiangshan (49):
>   x86/entry: Add fence for kernel entry swapgs in paranoid_entry()
>   x86/entry: Use the correct fence macro after swapgs in kernel CR3
>   x86/xen: Add xenpv_restore_regs_and_return_to_usermode()
>   x86/entry: Use swapgs and native_iret directly in
>     swapgs_restore_regs_and_return_to_usermode
>   compiler_types.h: Add __noinstr_section() for noinstr
>   x86/entry: Introduce __entry_text for entry code written in C
>   x86/entry: Move PTI_USER_* to arch/x86/include/asm/processor-flags.h
>   x86: Remove unused kernel_to_user_p4dp() and user_to_kernel_p4dp()
>   x86: Replace PTI_PGTABLE_SWITCH_BIT with PTI_USER_PGTABLE_BIT
>   x86: Mark __native_read_cr3() & native_write_cr3() as __always_inline
>   x86/traps: Move the declaration of native_irq_return_iret into proto.h
>   x86/entry: Add arch/x86/entry/entry64.c for C entry code
>   x86/entry: Expose the address of .Lgs_change to entry64.c
>   x86/entry: Add C verion of SWITCH_TO_KERNEL_CR3 as
>     switch_to_kernel_cr3()
>   x86/traps: Add fence_swapgs_{user,kernel}_entry()
>   x86/entry: Add C user_entry_swapgs_and_fence()
>   x86/traps: Move pt_regs only in fixup_bad_iret()
>   x86/entry: Switch the stack after error_entry() returns
>   x86/entry: move PUSH_AND_CLEAR_REGS out of error_entry
>   x86/entry: Move cld to the start of idtentry
>   x86/entry: Don't call error_entry for XENPV
>   x86/entry: Convert SWAPGS to swapgs in error_entry()
>   x86/entry: Implement the whole error_entry() as C code
>   x86/entry: Use idtentry macro for entry_INT80_compat
>   x86/entry: Convert SWAPGS to swapgs in entry_SYSENTER_compat()
>   x86: Remove the definition of SWAPGS
>   x86/entry: Make paranoid_exit() callable
>   x86/entry: Call paranoid_exit() in asm_exc_nmi()
>   x86/entry: move PUSH_AND_CLEAR_REGS out of paranoid_entry
>   x86/entry: Add the C version ist_switch_to_kernel_cr3()
>   x86/entry: Skip CR3 write when the saved CR3 is kernel CR3 in
>     RESTORE_CR3
>   x86/entry: Add the C version ist_restore_cr3()
>   x86/entry: Add the C version get_percpu_base()
>   x86/entry: Add the C version ist_switch_to_kernel_gsbase()
>   x86/entry: Implement the C version ist_paranoid_entry()
>   x86/entry: Implement the C version ist_paranoid_exit()
>   x86/entry: Add a C macro to define the function body for IST in
>     .entry.text
>   x86/debug, mce: Use C entry code
>   x86/idtentry.h: Move the definitions *IDTENTRY_{MCE|DEBUG}* up
>   x86/nmi: Use DEFINE_IDTENTRY_NMI for nmi
>   x86/nmi: Use C entry code
>   x86/entry: Add a C macro to define the function body for IST in
>     .entry.text with an error code
>   x86/doublefault: Use C entry code
>   x86/sev: Add and use ist_vc_switch_off_ist()
>   x86/sev: Use C entry code
>   x86/entry: Remove ASM function paranoid_entry() and paranoid_exit()
>   x86/entry: Remove the unused ASM macros
>   x86/entry: Remove save_ret from PUSH_AND_CLEAR_REGS
>   x86/syscall/64: Move the checking for sysret to C code
> 
>  arch/x86/entry/Makefile                |   3 +-
>  arch/x86/entry/calling.h               | 142 +-------
>  arch/x86/entry/common.c                |  73 +++-
>  arch/x86/entry/entry64.c               | 348 +++++++++++++++++++
>  arch/x86/entry/entry_64.S              | 448 ++++---------------------
>  arch/x86/entry/entry_64_compat.S       | 104 +-----
>  arch/x86/include/asm/idtentry.h        | 111 +++++-
>  arch/x86/include/asm/irqflags.h        |   8 -
>  arch/x86/include/asm/pgtable.h         |  23 +-
>  arch/x86/include/asm/processor-flags.h |  15 +
>  arch/x86/include/asm/proto.h           |   5 +-
>  arch/x86/include/asm/special_insns.h   |   4 +-
>  arch/x86/include/asm/syscall.h         |   2 +-
>  arch/x86/include/asm/traps.h           |   6 +-
>  arch/x86/kernel/Makefile               |   3 +
>  arch/x86/kernel/cpu/mce/Makefile       |   3 +
>  arch/x86/kernel/nmi.c                  |   2 +-
>  arch/x86/kernel/traps.c                |  33 +-
>  arch/x86/xen/xen-asm.S                 |  20 ++
>  include/linux/compiler_types.h         |   8 +-
>  20 files changed, 677 insertions(+), 684 deletions(-)
>  create mode 100644 arch/x86/entry/entry64.c
> 
> -- 
> 2.19.1.6.gb485710b
> 

^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: [PATCH V6 00/49] x86/entry/64: Convert a bunch of ASM entry code into C code
  2021-11-26 10:11 [PATCH V6 00/49] x86/entry/64: Convert a bunch of ASM entry code into C code Lai Jiangshan
                   ` (49 preceding siblings ...)
  2021-11-27 17:46 ` [PATCH V6 00/49] x86/entry/64: Convert a bunch of ASM entry code into " Damian Tometzki
@ 2021-12-03  9:31 ` Lai Jiangshan
  2021-12-03  9:39   ` Borislav Petkov
  50 siblings, 1 reply; 58+ messages in thread
From: Lai Jiangshan @ 2021-12-03  9:31 UTC (permalink / raw)
  To: Lai Jiangshan, linux-kernel, x86, Borislav Petkov
  Cc: Thomas Gleixner, Ingo Molnar, Peter Zijlstra, Andy Lutomirski,
	H. Peter Anvin, Joerg Roedel



On 2021/11/26 18:11, Lai Jiangshan wrote:
> From: Lai Jiangshan <laijs@linux.alibaba.com>
> 
> Changed from V5:
> 	Fix the code order of FENCE_SWAPGS_KERNEL_ENTRY in patch1 and
> 	change the new corresponding C entry code to match the asm code.
> 
> 	Squash the patch of removing stack-protector from traps.c into
> 	a later patch that uses C entry code for #DB and #MCE
> 
> 	Kill .Lgs_change and use the new asm_load_gs_index_gs_change in
> 	_ASM_EXTABLE
> 
> 	s/ETNRY/ENTRY/g for DEFINE_IDTENTRY_IST_ENTRY macros
> ----


Ping.

Thanks
Lai

^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: [PATCH V6 00/49] x86/entry/64: Convert a bunch of ASM entry code into C code
  2021-12-03  9:31 ` Lai Jiangshan
@ 2021-12-03  9:39   ` Borislav Petkov
  2021-12-03 10:10     ` Lai Jiangshan
  0 siblings, 1 reply; 58+ messages in thread
From: Borislav Petkov @ 2021-12-03  9:39 UTC (permalink / raw)
  To: Lai Jiangshan
  Cc: Lai Jiangshan, linux-kernel, x86, Thomas Gleixner, Ingo Molnar,
	Peter Zijlstra, Andy Lutomirski, H. Peter Anvin, Joerg Roedel

On Fri, Dec 03, 2021 at 05:31:11PM +0800, Lai Jiangshan wrote:
> Ping.

Can you explain to me what's with all the pinging?

Does your patchset contain anything urgent that needs immediate review
and handling or is it something which is a nice idea but needs to be
reviewed very carefully because it is asm entry code which is always a
pain and careful review cannot be done when rushing people?

-- 
Regards/Gruss,
    Boris.

https://people.kernel.org/tglx/notes-about-netiquette

^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: [PATCH V6 00/49] x86/entry/64: Convert a bunch of ASM entry code into C code
  2021-12-03  9:39   ` Borislav Petkov
@ 2021-12-03 10:10     ` Lai Jiangshan
  2021-12-03 10:18       ` Borislav Petkov
  0 siblings, 1 reply; 58+ messages in thread
From: Lai Jiangshan @ 2021-12-03 10:10 UTC (permalink / raw)
  To: Borislav Petkov
  Cc: Lai Jiangshan, linux-kernel, x86, Thomas Gleixner, Ingo Molnar,
	Peter Zijlstra, Andy Lutomirski, H. Peter Anvin, Joerg Roedel



On 2021/12/3 17:39, Borislav Petkov wrote:
> On Fri, Dec 03, 2021 at 05:31:11PM +0800, Lai Jiangshan wrote:
>> Ping.
> 
> Can you explain to me what's with all the pinging?
> 
> Does your patchset contain anything urgent that needs immediate review
> and handling or is it something which is a nice idea but needs to be
> reviewed very carefully because it is asm entry code which is always a
> pain and careful review cannot be done when rushing people?
> 

Hello

It is not urgent, nor is it something that should be put in the cold cellar.
Please consider queuing the first three patches at least.

The thread has been cold for a week; I think a ping is more appropriate
than resending.

The asm entry code is always a pain, and this patchset is a start
toward reducing that asm code and pain, because some future changes
can hopefully be redirected from asm to C.
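
As a rough sketch of the kind of C helper involved (illustrative only,
based on the names used by patch 15 of the series; the actual code in
the patches may differ in detail), the C counterparts of the
FENCE_SWAPGS_USER_ENTRY / FENCE_SWAPGS_KERNEL_ENTRY asm macros look
something like:

	/*
	 * Conditionally emit an LFENCE to block speculation around SWAPGS.
	 * Needs <asm/alternative.h> and <asm/cpufeatures.h>; __always_inline
	 * keeps the helpers inside the caller's non-instrumentable entry code.
	 */
	static __always_inline void fence_swapgs_user_entry(void)
	{
		alternative("", "lfence", X86_FEATURE_FENCE_SWAPGS_USER);
	}

	static __always_inline void fence_swapgs_kernel_entry(void)
	{
		alternative("", "lfence", X86_FEATURE_FENCE_SWAPGS_KERNEL);
	}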

Thanks
Lai

^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: [PATCH V6 00/49] x86/entry/64: Convert a bunch of ASM entry code into C code
  2021-12-03 10:10     ` Lai Jiangshan
@ 2021-12-03 10:18       ` Borislav Petkov
  0 siblings, 0 replies; 58+ messages in thread
From: Borislav Petkov @ 2021-12-03 10:18 UTC (permalink / raw)
  To: Lai Jiangshan
  Cc: Lai Jiangshan, linux-kernel, x86, Thomas Gleixner, Ingo Molnar,
	Peter Zijlstra, Andy Lutomirski, H. Peter Anvin, Joerg Roedel

On Fri, Dec 03, 2021 at 06:10:47PM +0800, Lai Jiangshan wrote:
> It is not urgent, nor is it something that should be put in the cold cellar.
> Please consider queuing the first three patches at least.

So there are fixes in there, which should go now. That's what I thought
too when looking at those, and I was about to suggest sending them
separately, but that's fine - I can pick them out.

> The thread has been cold for a week; I think a ping is more appropriate
> than resending.

With such a huge patchset I don't think you need to ping or resend every
week, but only after people have looked at it at least somewhat. But I'm
sure you can imagine people are busy as hell, so looking at that takes
time and you'd need to be patient.

It might even be helpful if you could split it into more palatable
portions of maybe 10-ish patches each, if possible, and then send the
first portion, wait for review and only send the second portion after
the first has been applied, etc.

That would make life easier for everyone involved.

> The asm entry code is always a pain, and this patchset is a start
> toward reducing that asm code and pain, because some future changes
> can hopefully be redirected from asm to C.

Yes, I think we all agree on that.

Thx.

-- 
Regards/Gruss,
    Boris.

https://people.kernel.org/tglx/notes-about-netiquette

^ permalink raw reply	[flat|nested] 58+ messages in thread

* [tip: x86/urgent] x86/xen: Add xenpv_restore_regs_and_return_to_usermode()
  2021-11-26 10:11 ` [PATCH V6 03/49] x86/xen: Add xenpv_restore_regs_and_return_to_usermode() Lai Jiangshan
@ 2021-12-04 11:45   ` tip-bot2 for Lai Jiangshan
  0 siblings, 0 replies; 58+ messages in thread
From: tip-bot2 for Lai Jiangshan @ 2021-12-04 11:45 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: Lai Jiangshan, Borislav Petkov, Boris Ostrovsky, x86, linux-kernel

The following commit has been merged into the x86/urgent branch of tip:

Commit-ID:     5c8f6a2e316efebb3ba93d8c1af258155dcf5632
Gitweb:        https://git.kernel.org/tip/5c8f6a2e316efebb3ba93d8c1af258155dcf5632
Author:        Lai Jiangshan <laijs@linux.alibaba.com>
AuthorDate:    Fri, 26 Nov 2021 18:11:23 +08:00
Committer:     Borislav Petkov <bp@suse.de>
CommitterDate: Fri, 03 Dec 2021 19:21:15 +01:00

x86/xen: Add xenpv_restore_regs_and_return_to_usermode()

In the native case, PER_CPU_VAR(cpu_tss_rw + TSS_sp0) is the
trampoline stack. But XEN pv doesn't use a trampoline stack, so
PER_CPU_VAR(cpu_tss_rw + TSS_sp0) is also the kernel stack.

In that case, source and destination stacks are identical, which means
that reusing swapgs_restore_regs_and_return_to_usermode() in XEN pv
would cause %rsp to move up to the top of the kernel stack and leave the
IRET frame below %rsp.

This is dangerous because the frame can be corrupted if an #NMI or #MC
hits: either of these events occurring in the middle of the stack
pushing would clobber data on the (original) stack.

And, with XEN pv, having swapgs_restore_regs_and_return_to_usermode()
push the IRET frame onto the original address is useless and error-prone
for any future attempt to modify the code.

 [ bp: Massage commit message. ]

Fixes: 7f2590a110b8 ("x86/entry/64: Use a per-CPU trampoline stack for IDT entries")
Signed-off-by: Lai Jiangshan <laijs@linux.alibaba.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Reviewed-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Link: https://lkml.kernel.org/r/20211126101209.8613-4-jiangshanlai@gmail.com
---
 arch/x86/entry/entry_64.S |  4 ++++
 arch/x86/xen/xen-asm.S    | 20 ++++++++++++++++++++
 2 files changed, 24 insertions(+)

diff --git a/arch/x86/entry/entry_64.S b/arch/x86/entry/entry_64.S
index f9e1c06..97b1f84 100644
--- a/arch/x86/entry/entry_64.S
+++ b/arch/x86/entry/entry_64.S
@@ -574,6 +574,10 @@ SYM_INNER_LABEL(swapgs_restore_regs_and_return_to_usermode, SYM_L_GLOBAL)
 	ud2
 1:
 #endif
+#ifdef CONFIG_XEN_PV
+	ALTERNATIVE "", "jmp xenpv_restore_regs_and_return_to_usermode", X86_FEATURE_XENPV
+#endif
+
 	POP_REGS pop_rdi=0
 
 	/*
diff --git a/arch/x86/xen/xen-asm.S b/arch/x86/xen/xen-asm.S
index 220dd96..444d824 100644
--- a/arch/x86/xen/xen-asm.S
+++ b/arch/x86/xen/xen-asm.S
@@ -20,6 +20,7 @@
 
 #include <linux/init.h>
 #include <linux/linkage.h>
+#include <../entry/calling.h>
 
 .pushsection .noinstr.text, "ax"
 /*
@@ -193,6 +194,25 @@ SYM_CODE_START(xen_iret)
 SYM_CODE_END(xen_iret)
 
 /*
+ * XEN pv doesn't use trampoline stack, PER_CPU_VAR(cpu_tss_rw + TSS_sp0) is
+ * also the kernel stack.  Reusing swapgs_restore_regs_and_return_to_usermode()
+ * in XEN pv would cause %rsp to move up to the top of the kernel stack and
+ * leave the IRET frame below %rsp, which is dangerous to be corrupted if #NMI
+ * interrupts. And swapgs_restore_regs_and_return_to_usermode() pushing the IRET
+ * frame at the same address is useless.
+ */
+SYM_CODE_START(xenpv_restore_regs_and_return_to_usermode)
+	UNWIND_HINT_REGS
+	POP_REGS
+
+	/* stackleak_erase() can work safely on the kernel stack. */
+	STACKLEAK_ERASE_NOCLOBBER
+
+	addq	$8, %rsp	/* skip regs->orig_ax */
+	jmp xen_iret
+SYM_CODE_END(xenpv_restore_regs_and_return_to_usermode)
+
+/*
  * Xen handles syscall callbacks much like ordinary exceptions, which
  * means we have:
  * - kernel gs

^ permalink raw reply related	[flat|nested] 58+ messages in thread

* [tip: x86/urgent] x86/entry: Use the correct fence macro after swapgs in kernel CR3
  2021-11-26 10:11 ` [PATCH V6 02/49] x86/entry: Use the correct fence macro after swapgs in kernel CR3 Lai Jiangshan
@ 2021-12-04 11:45   ` tip-bot2 for Lai Jiangshan
  0 siblings, 0 replies; 58+ messages in thread
From: tip-bot2 for Lai Jiangshan @ 2021-12-04 11:45 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: Lai Jiangshan, Borislav Petkov, Peter Zijlstra (Intel),
	x86, linux-kernel

The following commit has been merged into the x86/urgent branch of tip:

Commit-ID:     1367afaa2ee90d1c956dfc224e199fcb3ff3f8cc
Gitweb:        https://git.kernel.org/tip/1367afaa2ee90d1c956dfc224e199fcb3ff3f8cc
Author:        Lai Jiangshan <laijs@linux.alibaba.com>
AuthorDate:    Fri, 26 Nov 2021 18:11:22 +08:00
Committer:     Borislav Petkov <bp@suse.de>
CommitterDate: Fri, 03 Dec 2021 19:13:53 +01:00

x86/entry: Use the correct fence macro after swapgs in kernel CR3

The commit

  c75890700455 ("x86/entry/64: Remove unneeded kernel CR3 switching")

removed a CR3 write in the faulting path of load_gs_index().

But the path's FENCE_SWAPGS_USER_ENTRY has no fence operation if PTI is
enabled, see spectre_v1_select_mitigation().

Rather, the path depended on the serializing CR3 write of
SWITCH_TO_KERNEL_CR3, and since that write got removed, add a
FENCE_SWAPGS_KERNEL_ENTRY call to make sure speculation is blocked.

 [ bp: Massage commit message and comment. ]

Fixes: c75890700455 ("x86/entry/64: Remove unneeded kernel CR3 switching")
Signed-off-by: Lai Jiangshan <laijs@linux.alibaba.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lkml.kernel.org/r/20211126101209.8613-3-jiangshanlai@gmail.com
---
 arch/x86/entry/entry_64.S | 15 ++++++++-------
 1 file changed, 8 insertions(+), 7 deletions(-)

diff --git a/arch/x86/entry/entry_64.S b/arch/x86/entry/entry_64.S
index f1a8b5b..f9e1c06 100644
--- a/arch/x86/entry/entry_64.S
+++ b/arch/x86/entry/entry_64.S
@@ -987,11 +987,6 @@ SYM_CODE_START_LOCAL(error_entry)
 	pushq	%r12
 	ret
 
-.Lerror_entry_done_lfence:
-	FENCE_SWAPGS_KERNEL_ENTRY
-.Lerror_entry_done:
-	ret
-
 	/*
 	 * There are two places in the kernel that can potentially fault with
 	 * usergs. Handle them here.  B stepping K8s sometimes report a
@@ -1014,8 +1009,14 @@ SYM_CODE_START_LOCAL(error_entry)
 	 * .Lgs_change's error handler with kernel gsbase.
 	 */
 	SWAPGS
-	FENCE_SWAPGS_USER_ENTRY
-	jmp .Lerror_entry_done
+
+	/*
+	 * Issue an LFENCE to prevent GS speculation, regardless of whether it is a
+	 * kernel or user gsbase.
+	 */
+.Lerror_entry_done_lfence:
+	FENCE_SWAPGS_KERNEL_ENTRY
+	ret
 
 .Lbstep_iret:
 	/* Fix truncated RIP */

^ permalink raw reply related	[flat|nested] 58+ messages in thread

* [tip: x86/urgent] x86/entry: Add a fence for kernel entry SWAPGS in paranoid_entry()
  2021-11-26 10:11 ` [PATCH V6 01/49] x86/entry: Add fence for kernel entry swapgs in paranoid_entry() Lai Jiangshan
@ 2021-12-04 11:45   ` tip-bot2 for Lai Jiangshan
  0 siblings, 0 replies; 58+ messages in thread
From: tip-bot2 for Lai Jiangshan @ 2021-12-04 11:45 UTC (permalink / raw)
  To: linux-tip-commits; +Cc: Lai Jiangshan, Borislav Petkov, x86, linux-kernel

The following commit has been merged into the x86/urgent branch of tip:

Commit-ID:     c07e45553da1808aa802e9f0ffa8108cfeaf7a17
Gitweb:        https://git.kernel.org/tip/c07e45553da1808aa802e9f0ffa8108cfeaf7a17
Author:        Lai Jiangshan <laijs@linux.alibaba.com>
AuthorDate:    Fri, 26 Nov 2021 18:11:21 +08:00
Committer:     Borislav Petkov <bp@suse.de>
CommitterDate: Fri, 03 Dec 2021 18:55:47 +01:00

x86/entry: Add a fence for kernel entry SWAPGS in paranoid_entry()

Commit

  18ec54fdd6d18 ("x86/speculation: Prepare entry code for Spectre v1 swapgs mitigations")

added FENCE_SWAPGS_{KERNEL|USER}_ENTRY for conditional SWAPGS. In
paranoid_entry(), it uses only FENCE_SWAPGS_KERNEL_ENTRY for both
branches. This is because the fence is required for both cases since the
CR3 write is conditional even when PTI is enabled.

But

  96b2371413e8f ("x86/entry/64: Switch CR3 before SWAPGS in paranoid entry")

changed the order of SWAPGS and the CR3 write. And it missed the needed
FENCE_SWAPGS_KERNEL_ENTRY for the user gsbase case.

Add it back by changing the branches so that FENCE_SWAPGS_KERNEL_ENTRY
can cover both branches.

  [ bp: Massage, fix typos, remove obsolete comment while at it. ]

Fixes: 96b2371413e8f ("x86/entry/64: Switch CR3 before SWAPGS in paranoid entry")
Signed-off-by: Lai Jiangshan <laijs@linux.alibaba.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Link: https://lkml.kernel.org/r/20211126101209.8613-2-jiangshanlai@gmail.com
---
 arch/x86/entry/entry_64.S | 16 +++++-----------
 1 file changed, 5 insertions(+), 11 deletions(-)

diff --git a/arch/x86/entry/entry_64.S b/arch/x86/entry/entry_64.S
index e38a4cf..f1a8b5b 100644
--- a/arch/x86/entry/entry_64.S
+++ b/arch/x86/entry/entry_64.S
@@ -890,6 +890,7 @@ SYM_CODE_START_LOCAL(paranoid_entry)
 .Lparanoid_entry_checkgs:
 	/* EBX = 1 -> kernel GSBASE active, no restore required */
 	movl	$1, %ebx
+
 	/*
 	 * The kernel-enforced convention is a negative GSBASE indicates
 	 * a kernel value. No SWAPGS needed on entry and exit.
@@ -897,21 +898,14 @@ SYM_CODE_START_LOCAL(paranoid_entry)
 	movl	$MSR_GS_BASE, %ecx
 	rdmsr
 	testl	%edx, %edx
-	jns	.Lparanoid_entry_swapgs
-	ret
+	js	.Lparanoid_kernel_gsbase
 
-.Lparanoid_entry_swapgs:
+	/* EBX = 0 -> SWAPGS required on exit */
+	xorl	%ebx, %ebx
 	swapgs
+.Lparanoid_kernel_gsbase:
 
-	/*
-	 * The above SAVE_AND_SWITCH_TO_KERNEL_CR3 macro doesn't do an
-	 * unconditional CR3 write, even in the PTI case.  So do an lfence
-	 * to prevent GS speculation, regardless of whether PTI is enabled.
-	 */
 	FENCE_SWAPGS_KERNEL_ENTRY
-
-	/* EBX = 0 -> SWAPGS required on exit */
-	xorl	%ebx, %ebx
 	ret
 SYM_CODE_END(paranoid_entry)
 

^ permalink raw reply related	[flat|nested] 58+ messages in thread

end of thread, other threads:[~2021-12-04 11:45 UTC | newest]

Thread overview: 58+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-11-26 10:11 [PATCH V6 00/49] x86/entry/64: Convert a bunch of ASM entry code into C code Lai Jiangshan
2021-11-26 10:11 ` [PATCH V6 01/49] x86/entry: Add fence for kernel entry swapgs in paranoid_entry() Lai Jiangshan
2021-12-04 11:45   ` [tip: x86/urgent] x86/entry: Add a fence for kernel entry SWAPGS " tip-bot2 for Lai Jiangshan
2021-11-26 10:11 ` [PATCH V6 02/49] x86/entry: Use the correct fence macro after swapgs in kernel CR3 Lai Jiangshan
2021-12-04 11:45   ` [tip: x86/urgent] " tip-bot2 for Lai Jiangshan
2021-11-26 10:11 ` [PATCH V6 03/49] x86/xen: Add xenpv_restore_regs_and_return_to_usermode() Lai Jiangshan
2021-12-04 11:45   ` [tip: x86/urgent] " tip-bot2 for Lai Jiangshan
2021-11-26 10:11 ` [PATCH V6 04/49] x86/entry: Use swapgs and native_iret directly in swapgs_restore_regs_and_return_to_usermode Lai Jiangshan
2021-11-26 10:11 ` [PATCH V6 05/49] compiler_types.h: Add __noinstr_section() for noinstr Lai Jiangshan
2021-11-26 10:11 ` [PATCH V6 06/49] x86/entry: Introduce __entry_text for entry code written in C Lai Jiangshan
2021-11-26 10:11 ` [PATCH V6 07/49] x86/entry: Move PTI_USER_* to arch/x86/include/asm/processor-flags.h Lai Jiangshan
2021-11-26 10:11 ` [PATCH V6 08/49] x86: Remove unused kernel_to_user_p4dp() and user_to_kernel_p4dp() Lai Jiangshan
2021-11-26 10:11 ` [PATCH V6 09/49] x86: Replace PTI_PGTABLE_SWITCH_BIT with PTI_USER_PGTABLE_BIT Lai Jiangshan
2021-11-26 10:11 ` [PATCH V6 10/49] x86: Mark __native_read_cr3() & native_write_cr3() as __always_inline Lai Jiangshan
2021-11-26 10:11 ` [PATCH V6 11/49] x86/traps: Move the declaration of native_irq_return_iret into proto.h Lai Jiangshan
2021-11-26 10:11 ` [PATCH V6 12/49] x86/entry: Add arch/x86/entry/entry64.c for C entry code Lai Jiangshan
2021-11-26 10:11 ` [PATCH V6 13/49] x86/entry: Expose the address of .Lgs_change to entry64.c Lai Jiangshan
2021-11-26 10:11 ` [PATCH V6 14/49] x86/entry: Add C version of SWITCH_TO_KERNEL_CR3 as switch_to_kernel_cr3() Lai Jiangshan
2021-11-26 10:11 ` [PATCH V6 15/49] x86/traps: Add fence_swapgs_{user,kernel}_entry() Lai Jiangshan
2021-11-26 10:11 ` [PATCH V6 16/49] x86/entry: Add C user_entry_swapgs_and_fence() Lai Jiangshan
2021-11-26 10:11 ` [PATCH V6 17/49] x86/traps: Move pt_regs only in fixup_bad_iret() Lai Jiangshan
2021-11-26 10:11 ` [PATCH V6 18/49] x86/entry: Switch the stack after error_entry() returns Lai Jiangshan
2021-11-26 10:11 ` [PATCH V6 19/49] x86/entry: move PUSH_AND_CLEAR_REGS out of error_entry Lai Jiangshan
2021-11-26 10:11 ` [PATCH V6 20/49] x86/entry: Move cld to the start of idtentry Lai Jiangshan
2021-11-26 10:11 ` [PATCH V6 21/49] x86/entry: Don't call error_entry for XENPV Lai Jiangshan
2021-11-26 10:11 ` [PATCH V6 22/49] x86/entry: Convert SWAPGS to swapgs in error_entry() Lai Jiangshan
2021-11-26 10:11 ` [PATCH V6 23/49] x86/entry: Implement the whole error_entry() as C code Lai Jiangshan
2021-11-26 10:11 ` [PATCH V6 24/49] x86/entry: Use idtentry macro for entry_INT80_compat Lai Jiangshan
2021-11-26 10:11 ` [PATCH V6 25/49] x86/entry: Convert SWAPGS to swapgs in entry_SYSENTER_compat() Lai Jiangshan
2021-11-26 10:11 ` [PATCH V6 26/49] x86: Remove the definition of SWAPGS Lai Jiangshan
2021-11-26 10:11 ` [PATCH V6 27/49] x86/entry: Make paranoid_exit() callable Lai Jiangshan
2021-11-26 10:11 ` [PATCH V6 28/49] x86/entry: Call paranoid_exit() in asm_exc_nmi() Lai Jiangshan
2021-11-26 10:11 ` [PATCH V6 29/49] x86/entry: move PUSH_AND_CLEAR_REGS out of paranoid_entry Lai Jiangshan
2021-11-26 10:11 ` [PATCH V6 30/49] x86/entry: Add the C version ist_switch_to_kernel_cr3() Lai Jiangshan
2021-11-26 10:11 ` [PATCH V6 31/49] x86/entry: Skip CR3 write when the saved CR3 is kernel CR3 in RESTORE_CR3 Lai Jiangshan
2021-11-26 10:11 ` [PATCH V6 32/49] x86/entry: Add the C version ist_restore_cr3() Lai Jiangshan
2021-11-26 10:11 ` [PATCH V6 33/49] x86/entry: Add the C version get_percpu_base() Lai Jiangshan
2021-11-26 10:11 ` [PATCH V6 34/49] x86/entry: Add the C version ist_switch_to_kernel_gsbase() Lai Jiangshan
2021-11-26 10:11 ` [PATCH V6 35/49] x86/entry: Implement the C version ist_paranoid_entry() Lai Jiangshan
2021-11-26 10:11 ` [PATCH V6 36/49] x86/entry: Implement the C version ist_paranoid_exit() Lai Jiangshan
2021-11-26 10:11 ` [PATCH V6 37/49] x86/entry: Add a C macro to define the function body for IST in .entry.text Lai Jiangshan
2021-11-26 10:11 ` [PATCH V6 38/49] x86/debug, mce: Use C entry code Lai Jiangshan
2021-11-26 10:11 ` [PATCH V6 39/49] x86/idtentry.h: Move the definitions *IDTENTRY_{MCE|DEBUG}* up Lai Jiangshan
2021-11-26 10:12 ` [PATCH V6 40/49] x86/nmi: Use DEFINE_IDTENTRY_NMI for nmi Lai Jiangshan
2021-11-26 10:12 ` [PATCH V6 41/49] x86/nmi: Use C entry code Lai Jiangshan
2021-11-26 10:12 ` [PATCH V6 42/49] x86/entry: Add a C macro to define the function body for IST in .entry.text with an error code Lai Jiangshan
2021-11-26 10:12 ` [PATCH V6 43/49] x86/doublefault: Use C entry code Lai Jiangshan
2021-11-26 10:12 ` [PATCH V6 44/49] x86/sev: Add and use ist_vc_switch_off_ist() Lai Jiangshan
2021-11-26 10:12 ` [PATCH V6 45/49] x86/sev: Use C entry code Lai Jiangshan
2021-11-26 10:12 ` [PATCH V6 46/49] x86/entry: Remove ASM function paranoid_entry() and paranoid_exit() Lai Jiangshan
2021-11-26 10:12 ` [PATCH V6 47/49] x86/entry: Remove the unused ASM macros Lai Jiangshan
2021-11-26 10:12 ` [PATCH V6 48/49] x86/entry: Remove save_ret from PUSH_AND_CLEAR_REGS Lai Jiangshan
2021-11-26 10:12 ` [PATCH V6 49/49] x86/syscall/64: Move the checking for sysret to C code Lai Jiangshan
2021-11-27 17:46 ` [PATCH V6 00/49] x86/entry/64: Convert a bunch of ASM entry code into " Damian Tometzki
2021-12-03  9:31 ` Lai Jiangshan
2021-12-03  9:39   ` Borislav Petkov
2021-12-03 10:10     ` Lai Jiangshan
2021-12-03 10:18       ` Borislav Petkov
