* [PATCH v2 0/9] Reclaim TIF_IA32 and TIF_X32
@ 2020-10-01 20:58 Gabriel Krisman Bertazi
  2020-10-01 20:58 ` [PATCH v2 1/9] x86: events: Avoid TIF_IA32 when checking 64bit mode Gabriel Krisman Bertazi
                   ` (8 more replies)
  0 siblings, 9 replies; 21+ messages in thread
From: Gabriel Krisman Bertazi @ 2020-10-01 20:58 UTC
  To: luto, tglx
  Cc: hch, hpa, bp, rric, peterz, mingo, x86, linux-kernel,
	dave.hansen, Gabriel Krisman Bertazi, kernel

TIF_IA32 and TIF_X32 are not strictly necessary and they are only set at
task creation time, which doesn't fit processes that transition between
64-bit and 32-bit modes.  In addition, we are running out of TI flags
for x86, and it is generally a good idea to reduce architecture-specific
TI flags before moving the generic ones to common code.
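
For illustration, the replacement pattern used throughout the series
looks roughly like this (a minimal sketch, not lifted verbatim from any
patch; both helpers already exist, handle_32bit() is a stand-in):

/* Old: bitness from a per-task flag frozen at execve() time */
if (test_thread_flag(TIF_IA32))
	handle_32bit();

/* New: bitness from the saved user register state (CS selector) */
if (!user_64bit_mode(task_pt_regs(current)))
	handle_32bit();

user_64bit_mode() reads the saved CS, so it tracks the mode the task is
actually running in rather than the personality recorded at load time.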

Many of the ideas for this patchset came from Andy Lutomirski (Thank
you!)

The only difference between v2 and v1 is the addition of the final 3
patches, which solve the last 3 cases of TIF_IA32 and TIF_X32 usage and
actually remove the TI flags.

This patchset was tested by exercising the code paths of each case
where the flags were used, with x32, ia32 and x64 applications.  In
addition, the x86 selftests showed no regressions.


Gabriel Krisman Bertazi (9):
  x86: events: Avoid TIF_IA32 when checking 64bit mode
  x86: Simplify compat syscall userspace allocation
  x86: oprofile: Avoid TIF_IA32 when checking 64bit mode
  x86: elf: Use e_machine to choose DLINFO in compat
  x86: elf: Use e_machine to select start_thread for x32
  x86: elf: Use e_machine to select setup_additional_pages for x32
  x86: Use current USER_CS to setup correct context on vmx entry
  x86: Convert mmu context ia32_compat into a proper flags field
  x86: Reclaim TIF_IA32 and TIF_X32

 arch/x86/entry/vdso/vma.c             | 21 ++++++++++-------
 arch/x86/entry/vsyscall/vsyscall_64.c |  2 +-
 arch/x86/events/core.c                |  2 +-
 arch/x86/events/intel/ds.c            |  2 +-
 arch/x86/events/intel/lbr.c           |  2 +-
 arch/x86/include/asm/compat.h         | 15 ++++++------
 arch/x86/include/asm/elf.h            | 24 ++++++++++++++-----
 arch/x86/include/asm/mmu.h            |  6 +++--
 arch/x86/include/asm/mmu_context.h    |  2 +-
 arch/x86/include/asm/thread_info.h    |  4 ----
 arch/x86/kernel/perf_regs.c           |  2 +-
 arch/x86/kernel/process_64.c          | 34 ++++++++++++++-------------
 arch/x86/kvm/vmx/vmx.c                |  2 +-
 arch/x86/oprofile/backtrace.c         |  2 +-
 14 files changed, 67 insertions(+), 53 deletions(-)

-- 
2.28.0



* [PATCH v2 1/9] x86: events: Avoid TIF_IA32 when checking 64bit mode
  2020-10-01 20:58 [PATCH v2 0/9] Reclaim TIF_IA32 and TIF_X32 Gabriel Krisman Bertazi
@ 2020-10-01 20:58 ` Gabriel Krisman Bertazi
  2020-10-01 20:58 ` [PATCH v2 2/9] x86: Simplify compat syscall userspace allocation Gabriel Krisman Bertazi
                   ` (7 subsequent siblings)
  8 siblings, 0 replies; 21+ messages in thread
From: Gabriel Krisman Bertazi @ 2020-10-01 20:58 UTC
  To: luto, tglx
  Cc: hch, hpa, bp, rric, peterz, mingo, x86, linux-kernel,
	dave.hansen, Gabriel Krisman Bertazi, kernel

In preparation to remove TIF_IA32, stop using it in perf events code.

Tested by running perf on 32-bit, 64-bit and x32 applications.

Suggested-by: Andy Lutomirski <luto@kernel.org>
Signed-off-by: Gabriel Krisman Bertazi <krisman@collabora.com>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
---
 arch/x86/events/core.c      | 2 +-
 arch/x86/events/intel/ds.c  | 2 +-
 arch/x86/events/intel/lbr.c | 2 +-
 arch/x86/kernel/perf_regs.c | 2 +-
 4 files changed, 4 insertions(+), 4 deletions(-)

diff --git a/arch/x86/events/core.c b/arch/x86/events/core.c
index 1cbf57dc2ac8..4fe82d9d959b 100644
--- a/arch/x86/events/core.c
+++ b/arch/x86/events/core.c
@@ -2499,7 +2499,7 @@ perf_callchain_user32(struct pt_regs *regs, struct perf_callchain_entry_ctx *ent
 	struct stack_frame_ia32 frame;
 	const struct stack_frame_ia32 __user *fp;
 
-	if (!test_thread_flag(TIF_IA32))
+	if (user_64bit_mode(regs))
 		return 0;
 
 	cs_base = get_segment_base(regs->cs);
diff --git a/arch/x86/events/intel/ds.c b/arch/x86/events/intel/ds.c
index 86848c57b55e..94bd0d3acd15 100644
--- a/arch/x86/events/intel/ds.c
+++ b/arch/x86/events/intel/ds.c
@@ -1261,7 +1261,7 @@ static int intel_pmu_pebs_fixup_ip(struct pt_regs *regs)
 		old_to = to;
 
 #ifdef CONFIG_X86_64
-		is_64bit = kernel_ip(to) || !test_thread_flag(TIF_IA32);
+		is_64bit = kernel_ip(to) || any_64bit_mode(regs);
 #endif
 		insn_init(&insn, kaddr, size, is_64bit);
 		insn_get_length(&insn);
diff --git a/arch/x86/events/intel/lbr.c b/arch/x86/events/intel/lbr.c
index 8961653c5dd2..1aadb253d296 100644
--- a/arch/x86/events/intel/lbr.c
+++ b/arch/x86/events/intel/lbr.c
@@ -1221,7 +1221,7 @@ static int branch_type(unsigned long from, unsigned long to, int abort)
 	 * on 64-bit systems running 32-bit apps
 	 */
 #ifdef CONFIG_X86_64
-	is64 = kernel_ip((unsigned long)addr) || !test_thread_flag(TIF_IA32);
+	is64 = kernel_ip((unsigned long)addr) || any_64bit_mode(current_pt_regs());
 #endif
 	insn_init(&insn, addr, bytes_read, is64);
 	insn_get_opcode(&insn);
diff --git a/arch/x86/kernel/perf_regs.c b/arch/x86/kernel/perf_regs.c
index bb7e1132290b..9332c49a64a8 100644
--- a/arch/x86/kernel/perf_regs.c
+++ b/arch/x86/kernel/perf_regs.c
@@ -123,7 +123,7 @@ int perf_reg_validate(u64 mask)
 
 u64 perf_reg_abi(struct task_struct *task)
 {
-	if (test_tsk_thread_flag(task, TIF_IA32))
+	if (!user_64bit_mode(task_pt_regs(task)))
 		return PERF_SAMPLE_REGS_ABI_32;
 	else
 		return PERF_SAMPLE_REGS_ABI_64;
-- 
2.28.0



* [PATCH v2 2/9] x86: Simplify compat syscall userspace allocation
  2020-10-01 20:58 [PATCH v2 0/9] Reclaim TIF_IA32 and TIF_X32 Gabriel Krisman Bertazi
  2020-10-01 20:58 ` [PATCH v2 1/9] x86: events: Avoid TIF_IA32 when checking 64bit mode Gabriel Krisman Bertazi
@ 2020-10-01 20:58 ` Gabriel Krisman Bertazi
  2020-10-01 20:58 ` [PATCH v2 3/9] x86: oprofile: Avoid TIF_IA32 when checking 64bit mode Gabriel Krisman Bertazi
                   ` (6 subsequent siblings)
  8 siblings, 0 replies; 21+ messages in thread
From: Gabriel Krisman Bertazi @ 2020-10-01 20:58 UTC
  To: luto, tglx
  Cc: hch, hpa, bp, rric, peterz, mingo, x86, linux-kernel,
	dave.hansen, Gabriel Krisman Bertazi, kernel

When allocating user memory space for a compat system call, don't
consider whether the originating code is IA32 or X32; just allocate
from a region that is safe for both, beyond the redzone.  This should
be safe for IA32, and has the benefit of avoiding TIF_IA32, which we
want to drop.
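
For context, a caller looks something like this (sketch only;
arch_compat_alloc_user_space() is the real helper, struct compat_foo
and kfoo are made up):

struct compat_foo __user *uf;

uf = arch_compat_alloc_user_space(sizeof(*uf));
if (copy_to_user(uf, &kfoo, sizeof(*uf)))
	return -EFAULT;

The allocation now always starts 128 bytes below the user stack
pointer, so an x32 caller's redzone is never clobbered, while for IA32,
which has no redzone, the gap is simply unused.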

Suggested-by: Andy Lutomirski <luto@kernel.org>
Cc: Christoph Hellwig <hch@lst.de>
Signed-off-by: Gabriel Krisman Bertazi <krisman@collabora.com>
---
 arch/x86/include/asm/compat.h | 15 +++++++--------
 1 file changed, 7 insertions(+), 8 deletions(-)

diff --git a/arch/x86/include/asm/compat.h b/arch/x86/include/asm/compat.h
index d4edf281fff4..a4b5126dff4e 100644
--- a/arch/x86/include/asm/compat.h
+++ b/arch/x86/include/asm/compat.h
@@ -179,14 +179,13 @@ typedef struct user_regs_struct compat_elf_gregset_t;
 
 static inline void __user *arch_compat_alloc_user_space(long len)
 {
-	compat_uptr_t sp;
-
-	if (test_thread_flag(TIF_IA32)) {
-		sp = task_pt_regs(current)->sp;
-	} else {
-		/* -128 for the x32 ABI redzone */
-		sp = task_pt_regs(current)->sp - 128;
-	}
+	compat_uptr_t sp = task_pt_regs(current)->sp;
+
+	/*
+	 * -128 for the x32 ABI redzone.  For IA32, it is not strictly
+	 * necessary, but not harmful.
+	 */
+	sp -= 128;
 
 	return (void __user *)round_down(sp - len, 16);
 }
-- 
2.28.0



* [PATCH v2 3/9] x86: oprofile: Avoid TIF_IA32 when checking 64bit mode
  2020-10-01 20:58 [PATCH v2 0/9] Reclaim TIF_IA32 and TIF_X32 Gabriel Krisman Bertazi
  2020-10-01 20:58 ` [PATCH v2 1/9] x86: events: Avoid TIF_IA32 when checking 64bit mode Gabriel Krisman Bertazi
  2020-10-01 20:58 ` [PATCH v2 2/9] x86: Simplify compat syscall userspace allocation Gabriel Krisman Bertazi
@ 2020-10-01 20:58 ` Gabriel Krisman Bertazi
  2020-10-01 20:58 ` [PATCH v2 4/9] x86: elf: Use e_machine to choose DLINFO in compat Gabriel Krisman Bertazi
                   ` (5 subsequent siblings)
  8 siblings, 0 replies; 21+ messages in thread
From: Gabriel Krisman Bertazi @ 2020-10-01 20:58 UTC
  To: luto, tglx
  Cc: hch, hpa, bp, rric, peterz, mingo, x86, linux-kernel,
	dave.hansen, Gabriel Krisman Bertazi, kernel

In preparation to remove TIF_IA32, stop using it in oprofile code.

Signed-off-by: Gabriel Krisman Bertazi <krisman@collabora.com>
---
 arch/x86/oprofile/backtrace.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/x86/oprofile/backtrace.c b/arch/x86/oprofile/backtrace.c
index a2488b6e27d6..1d8391fcca68 100644
--- a/arch/x86/oprofile/backtrace.c
+++ b/arch/x86/oprofile/backtrace.c
@@ -49,7 +49,7 @@ x86_backtrace_32(struct pt_regs * const regs, unsigned int depth)
 	struct stack_frame_ia32 *head;
 
 	/* User process is IA32 */
-	if (!current || !test_thread_flag(TIF_IA32))
+	if (!current || user_64bit_mode(regs))
 		return 0;
 
 	head = (struct stack_frame_ia32 *) regs->bp;
-- 
2.28.0



* [PATCH v2 4/9] x86: elf: Use e_machine to choose DLINFO in compat
  2020-10-01 20:58 [PATCH v2 0/9] Reclaim TIF_IA32 and TIF_X32 Gabriel Krisman Bertazi
                   ` (2 preceding siblings ...)
  2020-10-01 20:58 ` [PATCH v2 3/9] x86: oprofile: Avoid TIF_IA32 when checking 64bit mode Gabriel Krisman Bertazi
@ 2020-10-01 20:58 ` Gabriel Krisman Bertazi
  2020-10-01 21:46   ` Andy Lutomirski
  2020-10-01 20:58 ` [PATCH v2 5/9] x86: elf: Use e_machine to select start_thread for x32 Gabriel Krisman Bertazi
                   ` (4 subsequent siblings)
  8 siblings, 1 reply; 21+ messages in thread
From: Gabriel Krisman Bertazi @ 2020-10-01 20:58 UTC
  To: luto, tglx
  Cc: hch, hpa, bp, rric, peterz, mingo, x86, linux-kernel,
	dave.hansen, Gabriel Krisman Bertazi, kernel

Since TIF_X32 is going away, avoid using it to find the ELF type on
ARCH_DLINFO.

According to the SysV AMD64 ABI draft, an AMD64 ELF object using ILP32
must have ELFCLASS32 with (E_MACHINE == EM_X86_64), so use that ELF
field to differentiate an x32 object from an IA32 object when loading
ARCH_DLINFO in compat mode.
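
The same test is easy to reproduce from userspace; here is a small
standalone sketch (not part of the patch) that classifies the file
passed as argv[1]:

#include <elf.h>
#include <stdio.h>
#include <string.h>

int main(int argc, char **argv)
{
	Elf32_Ehdr eh;
	FILE *f;

	if (argc < 2 || !(f = fopen(argv[1], "rb")))
		return 1;
	if (fread(&eh, sizeof(eh), 1, f) != 1 ||
	    memcmp(eh.e_ident, ELFMAG, SELFMAG))
		return 1;

	/* ELFCLASS32 + EM_X86_64 is the x32 signature */
	if (eh.e_ident[EI_CLASS] == ELFCLASS32 && eh.e_machine == EM_X86_64)
		puts("x32 object");
	else if (eh.e_ident[EI_CLASS] == ELFCLASS32 && eh.e_machine == EM_386)
		puts("IA32 object");
	else
		puts("not a 32-bit x86 object");
	return 0;
}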

Signed-off-by: Gabriel Krisman Bertazi <krisman@collabora.com>
---
 arch/x86/include/asm/elf.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/x86/include/asm/elf.h b/arch/x86/include/asm/elf.h
index b9a5d488f1a5..9220efc65d78 100644
--- a/arch/x86/include/asm/elf.h
+++ b/arch/x86/include/asm/elf.h
@@ -361,7 +361,7 @@ do {									\
 #define AT_SYSINFO		32
 
 #define COMPAT_ARCH_DLINFO						\
-if (test_thread_flag(TIF_X32))						\
+if (exec->e_machine == EM_X86_64)					\
 	ARCH_DLINFO_X32;						\
 else									\
 	ARCH_DLINFO_IA32
-- 
2.28.0



* [PATCH v2 5/9] x86: elf: Use e_machine to select start_thread for x32
  2020-10-01 20:58 [PATCH v2 0/9] Reclaim TIF_IA32 and TIF_X32 Gabriel Krisman Bertazi
                   ` (3 preceding siblings ...)
  2020-10-01 20:58 ` [PATCH v2 4/9] x86: elf: Use e_machine to choose DLINFO in compat Gabriel Krisman Bertazi
@ 2020-10-01 20:58 ` Gabriel Krisman Bertazi
  2020-10-01 21:49   ` Andy Lutomirski
  2020-10-01 20:58 ` [PATCH v2 6/9] x86: elf: Use e_machine to select setup_additional_pages " Gabriel Krisman Bertazi
                   ` (3 subsequent siblings)
  8 siblings, 1 reply; 21+ messages in thread
From: Gabriel Krisman Bertazi @ 2020-10-01 20:58 UTC
  To: luto, tglx
  Cc: hch, hpa, bp, rric, peterz, mingo, x86, linux-kernel,
	dave.hansen, Gabriel Krisman Bertazi, kernel

Since TIF_X32 is going away, avoid using it to find the ELF type on
compat_start_thread.

According to the SysV AMD64 ABI draft, an AMD64 ELF object using ILP32
must have ELFCLASS32 with (E_MACHINE == EM_X86_64), so use that ELF
field to differentiate an x32 object from an IA32 object when executing
start_thread in compat mode.

Signed-off-by: Gabriel Krisman Bertazi <krisman@collabora.com>
---
 arch/x86/include/asm/elf.h   | 11 +++++++++--
 arch/x86/kernel/process_64.c | 11 +++++++----
 2 files changed, 16 insertions(+), 6 deletions(-)

diff --git a/arch/x86/include/asm/elf.h b/arch/x86/include/asm/elf.h
index 9220efc65d78..33c1c9be2e07 100644
--- a/arch/x86/include/asm/elf.h
+++ b/arch/x86/include/asm/elf.h
@@ -186,8 +186,15 @@ static inline void elf_common_init(struct thread_struct *t,
 #define	COMPAT_ELF_PLAT_INIT(regs, load_addr)		\
 	elf_common_init(&current->thread, regs, __USER_DS)
 
-void compat_start_thread(struct pt_regs *regs, u32 new_ip, u32 new_sp);
-#define compat_start_thread compat_start_thread
+void compat_start_thread_ia32(struct pt_regs *regs, u32 new_ip, u32 new_sp);
+void compat_start_thread_x32(struct pt_regs *regs, u32 new_ip, u32 new_sp);
+#define compat_start_thread(regs, new_ip, new_sp)			\
+do {									\
+	if (elf_ex->e_machine == EM_X86_64)				\
+		compat_start_thread_x32(regs, new_ip, new_sp);		\
+	else								\
+		compat_start_thread_ia32(regs, new_ip, new_sp);		\
+} while (0)
 
 void set_personality_ia32(bool);
 #define COMPAT_SET_PERSONALITY(ex)			\
diff --git a/arch/x86/kernel/process_64.c b/arch/x86/kernel/process_64.c
index 9afefe325acb..56e882c339e6 100644
--- a/arch/x86/kernel/process_64.c
+++ b/arch/x86/kernel/process_64.c
@@ -511,12 +511,15 @@ start_thread(struct pt_regs *regs, unsigned long new_ip, unsigned long new_sp)
 EXPORT_SYMBOL_GPL(start_thread);
 
 #ifdef CONFIG_COMPAT
-void compat_start_thread(struct pt_regs *regs, u32 new_ip, u32 new_sp)
+void compat_start_thread_ia32(struct pt_regs *regs, u32 new_ip, u32 new_sp)
 {
 	start_thread_common(regs, new_ip, new_sp,
-			    test_thread_flag(TIF_X32)
-			    ? __USER_CS : __USER32_CS,
-			    __USER_DS, __USER_DS);
+			    __USER32_CS, __USER_DS, __USER_DS);
+}
+void compat_start_thread_x32(struct pt_regs *regs, u32 new_ip, u32 new_sp)
+{
+	start_thread_common(regs, new_ip, new_sp,
+			    __USER_CS, __USER_DS, __USER_DS);
 }
 #endif
 
-- 
2.28.0



* [PATCH v2 6/9] x86: elf: Use e_machine to select setup_additional_pages for x32
  2020-10-01 20:58 [PATCH v2 0/9] Reclaim TIF_IA32 and TIF_X32 Gabriel Krisman Bertazi
                   ` (4 preceding siblings ...)
  2020-10-01 20:58 ` [PATCH v2 5/9] x86: elf: Use e_machine to select start_thread for x32 Gabriel Krisman Bertazi
@ 2020-10-01 20:58 ` Gabriel Krisman Bertazi
  2020-10-01 21:50   ` Andy Lutomirski
  2020-10-01 20:58 ` [PATCH v2 7/9] x86: Use current USER_CS to setup correct context on vmx entry Gabriel Krisman Bertazi
                   ` (2 subsequent siblings)
  8 siblings, 1 reply; 21+ messages in thread
From: Gabriel Krisman Bertazi @ 2020-10-01 20:58 UTC
  To: luto, tglx
  Cc: hch, hpa, bp, rric, peterz, mingo, x86, linux-kernel,
	dave.hansen, Gabriel Krisman Bertazi, kernel

Since TIF_X32 is going away, avoid using it to find the ELF type when
choosing which additional pages to set up.

According to the SysV AMD64 ABI draft, an AMD64 ELF object using ILP32
must have ELFCLASS32 with (E_MACHINE == EM_X86_64), so use that ELF
field to differentiate an x32 object from an IA32 object when setting
up the additional pages in compat mode.

Signed-off-by: Gabriel Krisman Bertazi <krisman@collabora.com>
---
 arch/x86/entry/vdso/vma.c  | 21 ++++++++++++---------
 arch/x86/include/asm/elf.h | 11 ++++++++---
 2 files changed, 20 insertions(+), 12 deletions(-)

diff --git a/arch/x86/entry/vdso/vma.c b/arch/x86/entry/vdso/vma.c
index 9185cb1d13b9..7a3cda8294a3 100644
--- a/arch/x86/entry/vdso/vma.c
+++ b/arch/x86/entry/vdso/vma.c
@@ -412,22 +412,25 @@ int arch_setup_additional_pages(struct linux_binprm *bprm, int uses_interp)
 }
 
 #ifdef CONFIG_COMPAT
-int compat_arch_setup_additional_pages(struct linux_binprm *bprm,
-				       int uses_interp)
+int compat_arch_setup_additional_pages_ia32(struct linux_binprm *bprm,
+					    int uses_interp)
 {
-#ifdef CONFIG_X86_X32_ABI
-	if (test_thread_flag(TIF_X32)) {
-		if (!vdso64_enabled)
-			return 0;
-		return map_vdso_randomized(&vdso_image_x32);
-	}
-#endif
 #ifdef CONFIG_IA32_EMULATION
 	return load_vdso32();
 #else
 	return 0;
 #endif
 }
+
+int compat_arch_setup_additional_pages_x32(struct linux_binprm *bprm,
+					   int uses_interp)
+{
+#ifdef CONFIG_X86_X32_ABI
+	if (vdso64_enabled)
+		return map_vdso_randomized(&vdso_image_x32);
+#endif
+	return 0;
+}
 #endif
 #else
 int arch_setup_additional_pages(struct linux_binprm *bprm, int uses_interp)
diff --git a/arch/x86/include/asm/elf.h b/arch/x86/include/asm/elf.h
index 33c1c9be2e07..4d91f5b1079f 100644
--- a/arch/x86/include/asm/elf.h
+++ b/arch/x86/include/asm/elf.h
@@ -388,9 +388,14 @@ struct linux_binprm;
 #define ARCH_HAS_SETUP_ADDITIONAL_PAGES 1
 extern int arch_setup_additional_pages(struct linux_binprm *bprm,
 				       int uses_interp);
-extern int compat_arch_setup_additional_pages(struct linux_binprm *bprm,
-					      int uses_interp);
-#define compat_arch_setup_additional_pages compat_arch_setup_additional_pages
+extern int compat_arch_setup_additional_pages_ia32(struct linux_binprm *bprm,
+						   int uses_interp);
+extern int compat_arch_setup_additional_pages_x32(struct linux_binprm *bprm,
+						  int uses_interp);
+
+#define compat_arch_setup_additional_pages				\
+	((elf_ex->e_machine == EM_X86_64) ?				\
+	 compat_arch_setup_additional_pages_x32 : compat_arch_setup_additional_pages_ia32)
 
 /* Do not change the values. See get_align_mask() */
 enum align_flags {
-- 
2.28.0



* [PATCH v2 7/9] x86: Use current USER_CS to setup correct context on vmx entry
  2020-10-01 20:58 [PATCH v2 0/9] Reclaim TIF_IA32 and TIF_X32 Gabriel Krisman Bertazi
                   ` (5 preceding siblings ...)
  2020-10-01 20:58 ` [PATCH v2 6/9] x86: elf: Use e_machine to select setup_additional_pages " Gabriel Krisman Bertazi
@ 2020-10-01 20:58 ` Gabriel Krisman Bertazi
  2020-10-01 21:52   ` Andy Lutomirski
  2020-10-01 20:58 ` [PATCH v2 8/9] x86: Convert mmu context ia32_compat into a proper flags field Gabriel Krisman Bertazi
  2020-10-01 20:58 ` [PATCH v2 9/9] x86: Reclaim TIF_IA32 and TIF_X32 Gabriel Krisman Bertazi
  8 siblings, 1 reply; 21+ messages in thread
From: Gabriel Krisman Bertazi @ 2020-10-01 20:58 UTC
  To: luto, tglx
  Cc: hch, hpa, bp, rric, peterz, mingo, x86, linux-kernel,
	dave.hansen, Gabriel Krisman Bertazi, kernel

vmx_prepare_switch_to_guest shouldn't use is_64bit_mm, which has a
very specific use in uprobes.  Use the user_64bit_mode helper instead.
This reduces the usage of is_64bit_mm, which is awkward since it relies
on the personality at load time; that is fine for uprobes, but doesn't
seem right here.
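
To make the distinction concrete (illustrative sketch; both predicates
exist in the kernel):

/* Personality recorded at execve() time, derived from mm->context */
bool mm_64bit   = is_64bit_mm(current->mm);

/* Mode the task is executing in right now, from the saved CS */
bool regs_64bit = user_64bit_mode(current_pt_regs());

A 64-bit process that switches into a 32-bit code segment keeps
mm_64bit == true while regs_64bit becomes false, and the latter is what
matters when saving host segment state here.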

I tested this by running VMs with 64-bit and 32-bit payloads from
64-bit and 32-bit programs.

Signed-off-by: Gabriel Krisman Bertazi <krisman@collabora.com>
---
 arch/x86/kvm/vmx/vmx.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
index 7b2a068f08c1..b5aafd9e5f5d 100644
--- a/arch/x86/kvm/vmx/vmx.c
+++ b/arch/x86/kvm/vmx/vmx.c
@@ -1172,7 +1172,7 @@ void vmx_prepare_switch_to_guest(struct kvm_vcpu *vcpu)
 	savesegment(es, host_state->es_sel);
 
 	gs_base = cpu_kernelmode_gs_base(cpu);
-	if (likely(is_64bit_mm(current->mm))) {
+	if (likely(user_64bit_mode(current_pt_regs()))) {
 		current_save_fsgs();
 		fs_sel = current->thread.fsindex;
 		gs_sel = current->thread.gsindex;
-- 
2.28.0



* [PATCH v2 8/9] x86: Convert mmu context ia32_compat into a proper flags field
  2020-10-01 20:58 [PATCH v2 0/9] Reclaim TIF_IA32 and TIF_X32 Gabriel Krisman Bertazi
                   ` (6 preceding siblings ...)
  2020-10-01 20:58 ` [PATCH v2 7/9] x86: Use current USER_CS to setup correct context on vmx entry Gabriel Krisman Bertazi
@ 2020-10-01 20:58 ` Gabriel Krisman Bertazi
  2020-10-01 22:09   ` Andy Lutomirski
  2020-10-01 20:58 ` [PATCH v2 9/9] x86: Reclaim TIF_IA32 and TIF_X32 Gabriel Krisman Bertazi
  8 siblings, 1 reply; 21+ messages in thread
From: Gabriel Krisman Bertazi @ 2020-10-01 20:58 UTC
  To: luto, tglx
  Cc: hch, hpa, bp, rric, peterz, mingo, x86, linux-kernel,
	dave.hansen, Gabriel Krisman Bertazi, kernel

The ia32_compat attribute is a weird thing.  It mirrors TIF_IA32 and
TIF_X32 and is used in only two very unrelated places: (1) to decide if
the vsyscall page is accessible, and (2) for uprobes to find whether the
patched instruction is 32 or 64 bit.  In preparation for removing the TI
flags, we want new values for ia32_compat, but given its odd semantics,
I'd rather make it a real flags field that configures these specific
behaviours.  So, set_personality_64bit can ask for the vsyscall page,
which is not available in x32/ia32, and set_personality_ia32 can
configure the uprobe code as needed.

uprobes cannot rely on other methods like user_64bit_mode() to decide
how to patch, so they need some specific flag like this.
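
To make the intended semantics explicit, the two consumers could be
read through helpers along these lines (hypothetical helpers for
illustration; the patch open-codes the bit tests):

static inline bool mm_has_vsyscall(struct mm_struct *mm)
{
	return mm->context.flags & MM_CONTEXT_GATE_PAGE;
}

static inline bool mm_uprobes_are_ia32(struct mm_struct *mm)
{
	return mm->context.flags & MM_CONTEXT_UPROBE_IA32;
}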

Signed-off-by: Gabriel Krisman Bertazi <krisman@collabora.com>
---
 arch/x86/entry/vsyscall/vsyscall_64.c |  2 +-
 arch/x86/include/asm/mmu.h            |  6 ++++--
 arch/x86/include/asm/mmu_context.h    |  2 +-
 arch/x86/kernel/process_64.c          | 17 +++++++++++------
 4 files changed, 17 insertions(+), 10 deletions(-)

diff --git a/arch/x86/entry/vsyscall/vsyscall_64.c b/arch/x86/entry/vsyscall/vsyscall_64.c
index 44c33103a955..20abc396dbe0 100644
--- a/arch/x86/entry/vsyscall/vsyscall_64.c
+++ b/arch/x86/entry/vsyscall/vsyscall_64.c
@@ -316,7 +316,7 @@ static struct vm_area_struct gate_vma __ro_after_init = {
 struct vm_area_struct *get_gate_vma(struct mm_struct *mm)
 {
 #ifdef CONFIG_COMPAT
-	if (!mm || mm->context.ia32_compat)
+	if (!mm || !(mm->context.flags & MM_CONTEXT_GATE_PAGE))
 		return NULL;
 #endif
 	if (vsyscall_mode == NONE)
diff --git a/arch/x86/include/asm/mmu.h b/arch/x86/include/asm/mmu.h
index 9257667d13c5..76ab742a0e39 100644
--- a/arch/x86/include/asm/mmu.h
+++ b/arch/x86/include/asm/mmu.h
@@ -7,6 +7,9 @@
 #include <linux/mutex.h>
 #include <linux/atomic.h>
 
+#define MM_CONTEXT_UPROBE_IA32	1 /* Uprobes on this MM assume 32-bit code */
+#define MM_CONTEXT_GATE_PAGE	2 /* Whether MM has gate page */
+
 /*
  * x86 has arch-specific MMU state beyond what lives in mm_struct.
  */
@@ -33,8 +36,7 @@ typedef struct {
 #endif
 
 #ifdef CONFIG_X86_64
-	/* True if mm supports a task running in 32 bit compatibility mode. */
-	unsigned short ia32_compat;
+	unsigned short flags;
 #endif
 
 	struct mutex lock;
diff --git a/arch/x86/include/asm/mmu_context.h b/arch/x86/include/asm/mmu_context.h
index d98016b83755..054a79157323 100644
--- a/arch/x86/include/asm/mmu_context.h
+++ b/arch/x86/include/asm/mmu_context.h
@@ -177,7 +177,7 @@ static inline void arch_exit_mmap(struct mm_struct *mm)
 static inline bool is_64bit_mm(struct mm_struct *mm)
 {
 	return	!IS_ENABLED(CONFIG_IA32_EMULATION) ||
-		!(mm->context.ia32_compat == TIF_IA32);
+		!(mm->context.flags & MM_CONTEXT_UPROBE_IA32);
 }
 #else
 static inline bool is_64bit_mm(struct mm_struct *mm)
diff --git a/arch/x86/kernel/process_64.c b/arch/x86/kernel/process_64.c
index 56e882c339e6..3226ceed409c 100644
--- a/arch/x86/kernel/process_64.c
+++ b/arch/x86/kernel/process_64.c
@@ -650,10 +650,8 @@ void set_personality_64bit(void)
 	/* Pretend that this comes from a 64bit execve */
 	task_pt_regs(current)->orig_ax = __NR_execve;
 	current_thread_info()->status &= ~TS_COMPAT;
-
-	/* Ensure the corresponding mm is not marked. */
 	if (current->mm)
-		current->mm->context.ia32_compat = 0;
+		current->mm->context.flags = MM_CONTEXT_GATE_PAGE;
 
 	/* TBD: overwrites user setup. Should have two bits.
 	   But 64bit processes have always behaved this way,
@@ -668,7 +666,8 @@ static void __set_personality_x32(void)
 	clear_thread_flag(TIF_IA32);
 	set_thread_flag(TIF_X32);
 	if (current->mm)
-		current->mm->context.ia32_compat = TIF_X32;
+		current->mm->context.flags = 0;
+
 	current->personality &= ~READ_IMPLIES_EXEC;
 	/*
 	 * in_32bit_syscall() uses the presence of the x32 syscall bit
@@ -688,8 +687,14 @@ static void __set_personality_ia32(void)
 #ifdef CONFIG_IA32_EMULATION
 	set_thread_flag(TIF_IA32);
 	clear_thread_flag(TIF_X32);
-	if (current->mm)
-		current->mm->context.ia32_compat = TIF_IA32;
+	if (current->mm) {
+		/*
+		 * uprobes applied to this MM need to know this and
+		 * cannot use user_64bit_mode() at that time.
+		 */
+		current->mm->context.flags = MM_CONTEXT_UPROBE_IA32;
+	}
+
 	current->personality |= force_personality32;
 	/* Prepare the first "return" to user space */
 	task_pt_regs(current)->orig_ax = __NR_ia32_execve;
-- 
2.28.0



* [PATCH v2 9/9] x86: Reclaim TIF_IA32 and TIF_X32
  2020-10-01 20:58 [PATCH v2 0/9] Reclaim TIF_IA32 and TIF_X32 Gabriel Krisman Bertazi
                   ` (7 preceding siblings ...)
  2020-10-01 20:58 ` [PATCH v2 8/9] x86: Convert mmu context ia32_compat into a proper flags field Gabriel Krisman Bertazi
@ 2020-10-01 20:58 ` Gabriel Krisman Bertazi
  8 siblings, 0 replies; 21+ messages in thread
From: Gabriel Krisman Bertazi @ 2020-10-01 20:58 UTC
  To: luto, tglx
  Cc: hch, hpa, bp, rric, peterz, mingo, x86, linux-kernel,
	dave.hansen, Gabriel Krisman Bertazi, kernel

Now that these flags are no longer used, reclaim those TI bits.

Signed-off-by: Gabriel Krisman Bertazi <krisman@collabora.com>
---
 arch/x86/include/asm/thread_info.h | 4 ----
 arch/x86/kernel/process_64.c       | 6 ------
 2 files changed, 10 deletions(-)

diff --git a/arch/x86/include/asm/thread_info.h b/arch/x86/include/asm/thread_info.h
index 267701ae3d86..6888aa39c4d6 100644
--- a/arch/x86/include/asm/thread_info.h
+++ b/arch/x86/include/asm/thread_info.h
@@ -91,7 +91,6 @@ struct thread_info {
 #define TIF_NEED_FPU_LOAD	14	/* load FPU on return to userspace */
 #define TIF_NOCPUID		15	/* CPUID is not accessible in userland */
 #define TIF_NOTSC		16	/* TSC is not accessible in userland */
-#define TIF_IA32		17	/* IA32 compatibility process */
 #define TIF_SLD			18	/* Restore split lock detection on context switch */
 #define TIF_MEMDIE		20	/* is terminating due to OOM killer */
 #define TIF_POLLING_NRFLAG	21	/* idle is polling for TIF_NEED_RESCHED */
@@ -101,7 +100,6 @@ struct thread_info {
 #define TIF_LAZY_MMU_UPDATES	27	/* task is updating the mmu lazily */
 #define TIF_SYSCALL_TRACEPOINT	28	/* syscall tracepoint instrumentation */
 #define TIF_ADDR32		29	/* 32-bit address space on 64 bits */
-#define TIF_X32			30	/* 32-bit native x86-64 binary */
 #define TIF_FSCHECK		31	/* Check FS is USER_DS on return */
 
 #define _TIF_SYSCALL_TRACE	(1 << TIF_SYSCALL_TRACE)
@@ -121,7 +119,6 @@ struct thread_info {
 #define _TIF_NEED_FPU_LOAD	(1 << TIF_NEED_FPU_LOAD)
 #define _TIF_NOCPUID		(1 << TIF_NOCPUID)
 #define _TIF_NOTSC		(1 << TIF_NOTSC)
-#define _TIF_IA32		(1 << TIF_IA32)
 #define _TIF_SLD		(1 << TIF_SLD)
 #define _TIF_POLLING_NRFLAG	(1 << TIF_POLLING_NRFLAG)
 #define _TIF_IO_BITMAP		(1 << TIF_IO_BITMAP)
@@ -130,7 +127,6 @@ struct thread_info {
 #define _TIF_LAZY_MMU_UPDATES	(1 << TIF_LAZY_MMU_UPDATES)
 #define _TIF_SYSCALL_TRACEPOINT	(1 << TIF_SYSCALL_TRACEPOINT)
 #define _TIF_ADDR32		(1 << TIF_ADDR32)
-#define _TIF_X32		(1 << TIF_X32)
 #define _TIF_FSCHECK		(1 << TIF_FSCHECK)
 
 /* flags to check in __switch_to() */
diff --git a/arch/x86/kernel/process_64.c b/arch/x86/kernel/process_64.c
index 3226ceed409c..b557312aa9cb 100644
--- a/arch/x86/kernel/process_64.c
+++ b/arch/x86/kernel/process_64.c
@@ -644,9 +644,7 @@ void set_personality_64bit(void)
 	/* inherit personality from parent */
 
 	/* Make sure to be in 64bit mode */
-	clear_thread_flag(TIF_IA32);
 	clear_thread_flag(TIF_ADDR32);
-	clear_thread_flag(TIF_X32);
 	/* Pretend that this comes from a 64bit execve */
 	task_pt_regs(current)->orig_ax = __NR_execve;
 	current_thread_info()->status &= ~TS_COMPAT;
@@ -663,8 +661,6 @@ void set_personality_64bit(void)
 static void __set_personality_x32(void)
 {
 #ifdef CONFIG_X86_X32
-	clear_thread_flag(TIF_IA32);
-	set_thread_flag(TIF_X32);
 	if (current->mm)
 		current->mm->context.flags = 0;
 
@@ -685,8 +681,6 @@ static void __set_personality_x32(void)
 static void __set_personality_ia32(void)
 {
 #ifdef CONFIG_IA32_EMULATION
-	set_thread_flag(TIF_IA32);
-	clear_thread_flag(TIF_X32);
 	if (current->mm) {
 		/*
 		 * uprobes applied to this MM need to know this and
-- 
2.28.0



* Re: [PATCH v2 4/9] x86: elf: Use e_machine to choose DLINFO in compat
  2020-10-01 20:58 ` [PATCH v2 4/9] x86: elf: Use e_machine to choose DLINFO in compat Gabriel Krisman Bertazi
@ 2020-10-01 21:46   ` Andy Lutomirski
  0 siblings, 0 replies; 21+ messages in thread
From: Andy Lutomirski @ 2020-10-01 21:46 UTC
  To: Gabriel Krisman Bertazi
  Cc: Andrew Lutomirski, Thomas Gleixner, Christoph Hellwig,
	H. Peter Anvin, Borislav Petkov, Robert Richter, Peter Zijlstra,
	Ingo Molnar, X86 ML, LKML, Dave Hansen, kernel

On Thu, Oct 1, 2020 at 1:59 PM Gabriel Krisman Bertazi
<krisman@collabora.com> wrote:
>
> Since TIF_X32 is going away, avoid using it to find the ELF type on
> ARCH_DLINFO.
>
> According to SysV AMD64 ABI Draft, an AMD64 ELF object using ILP32 must
> have ELFCLASS32 with (E_MACHINE == EM_X86_64), so use that ELF field to
> differentiate a x32 object from a IA32 object when loading ARCH_DLINFO
> in compat mode.
>

Reviewed-by: Andy Lutomirski <luto@kernel.org>

> Signed-off-by: Gabriel Krisman Bertazi <krisman@collabora.com>


* Re: [PATCH v2 5/9] x86: elf: Use e_machine to select start_thread for x32
  2020-10-01 20:58 ` [PATCH v2 5/9] x86: elf: Use e_machine to select start_thread for x32 Gabriel Krisman Bertazi
@ 2020-10-01 21:49   ` Andy Lutomirski
  0 siblings, 0 replies; 21+ messages in thread
From: Andy Lutomirski @ 2020-10-01 21:49 UTC
  To: Gabriel Krisman Bertazi
  Cc: Andrew Lutomirski, Thomas Gleixner, Christoph Hellwig,
	H. Peter Anvin, Borislav Petkov, Robert Richter, Peter Zijlstra,
	Ingo Molnar, X86 ML, LKML, Dave Hansen, kernel

On Thu, Oct 1, 2020 at 1:59 PM Gabriel Krisman Bertazi
<krisman@collabora.com> wrote:
>
> Since TIF_X32 is going away, avoid using it to find the ELF type on
> compat_start_thread
>
> According to SysV AMD64 ABI Draft, an AMD64 ELF object using ILP32 must
> have ELFCLASS32 with (E_MACHINE == EM_X86_64), so use that ELF field to
> differentiate a x32 object from a IA32 object when executing
> start_thread in compat mode.
>
> Signed-off-by: Gabriel Krisman Bertazi <krisman@collabora.com>
> ---
>  arch/x86/include/asm/elf.h   | 11 +++++++++--
>  arch/x86/kernel/process_64.c | 11 +++++++----
>  2 files changed, 16 insertions(+), 6 deletions(-)
>
> diff --git a/arch/x86/include/asm/elf.h b/arch/x86/include/asm/elf.h
> index 9220efc65d78..33c1c9be2e07 100644
> --- a/arch/x86/include/asm/elf.h
> +++ b/arch/x86/include/asm/elf.h
> @@ -186,8 +186,15 @@ static inline void elf_common_init(struct thread_struct *t,
>  #define        COMPAT_ELF_PLAT_INIT(regs, load_addr)           \
>         elf_common_init(&current->thread, regs, __USER_DS)
>
> -void compat_start_thread(struct pt_regs *regs, u32 new_ip, u32 new_sp);
> -#define compat_start_thread compat_start_thread
> +void compat_start_thread_ia32(struct pt_regs *regs, u32 new_ip, u32 new_sp);
> +void compat_start_thread_x32(struct pt_regs *regs, u32 new_ip, u32 new_sp);
> +#define compat_start_thread(regs, new_ip, new_sp)                      \
> +do {                                                                   \
> +       if (elf_ex->e_machine == EM_X86_64)                             \
> +               compat_start_thread_x32(regs, new_ip, new_sp);          \
> +       else                                                            \
> +               compat_start_thread_ia32(regs, new_ip, new_sp);         \
> +} while (0)

This is evil -- it looks like a real function, but it's not.  Can you
instead add a const struct elf32_hdr *elf_ex parameter to all the
compat_start_thread implementations?  There appear to be only four of
them in the whole kernel.  For patches like this, it should be fine to
do it all as one patch as long as you Cc all the arch maintainers.
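
Something like this (untested sketch):

void compat_start_thread(struct pt_regs *regs, u32 new_ip, u32 new_sp,
			 const struct elf32_hdr *elf_ex);

That keeps the EM_X86_64 check in a real function body instead of a
macro that silently captures elf_ex from its expansion site.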

--Andy


* Re: [PATCH v2 6/9] x86: elf: Use e_machine to select setup_additional_pages for x32
  2020-10-01 20:58 ` [PATCH v2 6/9] x86: elf: Use e_machine to select setup_additional_pages " Gabriel Krisman Bertazi
@ 2020-10-01 21:50   ` Andy Lutomirski
  0 siblings, 0 replies; 21+ messages in thread
From: Andy Lutomirski @ 2020-10-01 21:50 UTC
  To: Gabriel Krisman Bertazi
  Cc: Andrew Lutomirski, Thomas Gleixner, Christoph Hellwig,
	H. Peter Anvin, Borislav Petkov, Robert Richter, Peter Zijlstra,
	Ingo Molnar, X86 ML, LKML, Dave Hansen, kernel

On Thu, Oct 1, 2020 at 1:59 PM Gabriel Krisman Bertazi
<krisman@collabora.com> wrote:
>
> Since TIF_X32 is going away, avoid using it to find the ELF type when
> choosing which additional pages to set up.
>
> According to SysV AMD64 ABI Draft, an AMD64 ELF object using ILP32 must
> have ELFCLASS32 with (E_MACHINE == EM_X86_64), so use that ELF field to
> differentiate a x32 object from a IA32 object when executing
> start_thread in compat mode.
>
> Signed-off-by: Gabriel Krisman Bertazi <krisman@collabora.com>
> ---
>  arch/x86/entry/vdso/vma.c  | 21 ++++++++++++---------
>  arch/x86/include/asm/elf.h | 11 ++++++++---
>  2 files changed, 20 insertions(+), 12 deletions(-)
>
> diff --git a/arch/x86/entry/vdso/vma.c b/arch/x86/entry/vdso/vma.c
> index 9185cb1d13b9..7a3cda8294a3 100644
> --- a/arch/x86/entry/vdso/vma.c
> +++ b/arch/x86/entry/vdso/vma.c
> @@ -412,22 +412,25 @@ int arch_setup_additional_pages(struct linux_binprm *bprm, int uses_interp)
>  }
>
>  #ifdef CONFIG_COMPAT
> -int compat_arch_setup_additional_pages(struct linux_binprm *bprm,
> -                                      int uses_interp)
> +int compat_arch_setup_additional_pages_ia32(struct linux_binprm *bprm,
> +                                           int uses_interp)
>  {
> -#ifdef CONFIG_X86_X32_ABI
> -       if (test_thread_flag(TIF_X32)) {
> -               if (!vdso64_enabled)
> -                       return 0;
> -               return map_vdso_randomized(&vdso_image_x32);
> -       }
> -#endif
>  #ifdef CONFIG_IA32_EMULATION
>         return load_vdso32();
>  #else
>         return 0;
>  #endif
>  }
> +
> +int compat_arch_setup_additional_pages_x32(struct linux_binprm *bprm,
> +                                          int uses_interp)
> +{
> +#ifdef CONFIG_X86_X32_ABI
> +       if (vdso64_enabled)
> +               return map_vdso_randomized(&vdso_image_x32);
> +#endif
> +       return 0;
> +}
>  #endif
>  #else
>  int arch_setup_additional_pages(struct linux_binprm *bprm, int uses_interp)
> diff --git a/arch/x86/include/asm/elf.h b/arch/x86/include/asm/elf.h
> index 33c1c9be2e07..4d91f5b1079f 100644
> --- a/arch/x86/include/asm/elf.h
> +++ b/arch/x86/include/asm/elf.h
> @@ -388,9 +388,14 @@ struct linux_binprm;
>  #define ARCH_HAS_SETUP_ADDITIONAL_PAGES 1
>  extern int arch_setup_additional_pages(struct linux_binprm *bprm,
>                                        int uses_interp);
> -extern int compat_arch_setup_additional_pages(struct linux_binprm *bprm,
> -                                             int uses_interp);
> -#define compat_arch_setup_additional_pages compat_arch_setup_additional_pages
> +extern int compat_arch_setup_additional_pages_ia32(struct linux_binprm *bprm,
> +                                                  int uses_interp);
> +extern int compat_arch_setup_additional_pages_x32(struct linux_binprm *bprm,
> +                                                 int uses_interp);
> +
> +#define compat_arch_setup_additional_pages                             \
> +       ((elf_ex->e_machine == EM_X86_64) ?                             \
> +        compat_arch_setup_additional_pages_x32 : compat_arch_setup_additional_pages_ia32)
>

As in the previous patch, can you wire up the new argument for real, please?


* Re: [PATCH v2 7/9] x86: Use current USER_CS to setup correct context on vmx entry
  2020-10-01 20:58 ` [PATCH v2 7/9] x86: Use current USER_CS to setup correct context on vmx entry Gabriel Krisman Bertazi
@ 2020-10-01 21:52   ` Andy Lutomirski
  2020-10-02 22:40     ` Sean Christopherson
  0 siblings, 1 reply; 21+ messages in thread
From: Andy Lutomirski @ 2020-10-01 21:52 UTC
  To: Gabriel Krisman Bertazi, Paolo Bonzini
  Cc: Andrew Lutomirski, Thomas Gleixner, Christoph Hellwig,
	H. Peter Anvin, Borislav Petkov, Robert Richter, Peter Zijlstra,
	Ingo Molnar, X86 ML, LKML, Dave Hansen, kernel

On Thu, Oct 1, 2020 at 1:59 PM Gabriel Krisman Bertazi
<krisman@collabora.com> wrote:
>
> vmx_prepare_switch_to_guest shouldn't use is_64bit_mm, which has a
> very specific use in uprobes.  Use the user_64bit_mode helper instead.
> This reduces the usage of is_64bit_mm, which is awkward, since it relies
> on the personality at load time, which is fine for uprobes, but doesn't
> seem fine here.
>
> I tested this by running VMs with 64 and 32 bits payloads from 64/32
> programs.
>
> Signed-off-by: Gabriel Krisman Bertazi <krisman@collabora.com>
> ---
>  arch/x86/kvm/vmx/vmx.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
> index 7b2a068f08c1..b5aafd9e5f5d 100644
> --- a/arch/x86/kvm/vmx/vmx.c
> +++ b/arch/x86/kvm/vmx/vmx.c
> @@ -1172,7 +1172,7 @@ void vmx_prepare_switch_to_guest(struct kvm_vcpu *vcpu)
>         savesegment(es, host_state->es_sel);
>
>         gs_base = cpu_kernelmode_gs_base(cpu);
> -       if (likely(is_64bit_mm(current->mm))) {
> +       if (likely(user_64bit_mode(current_pt_regs()))) {
>                 current_save_fsgs();
>                 fs_sel = current->thread.fsindex;
>                 gs_sel = current->thread.gsindex;

I disagree with this one.  This whole code path is nonsense.  Can you
just remove the condition entirely and use the 64-bit path
unconditionally?


* Re: [PATCH v2 8/9] x86: Convert mmu context ia32_compat into a proper flags field
  2020-10-01 20:58 ` [PATCH v2 8/9] x86: Convert mmu context ia32_compat into a proper flags field Gabriel Krisman Bertazi
@ 2020-10-01 22:09   ` Andy Lutomirski
  0 siblings, 0 replies; 21+ messages in thread
From: Andy Lutomirski @ 2020-10-01 22:09 UTC (permalink / raw)
  To: Gabriel Krisman Bertazi
  Cc: Andrew Lutomirski, Thomas Gleixner, Christoph Hellwig,
	H. Peter Anvin, Borislav Petkov, Robert Richter, Peter Zijlstra,
	Ingo Molnar, X86 ML, LKML, Dave Hansen, kernel

On Thu, Oct 1, 2020 at 1:59 PM Gabriel Krisman Bertazi
<krisman@collabora.com> wrote:
>
> The ia32_compat attribute is a weird thing.  It mirrors TIF_IA32 and
> TIF_X32 and is used only in two very unrelated places: (1) to decide if
> the vsyscall page is accessible (2) for uprobes to find whether the
> patched instruction is 32 or 64 bit.  In preparation to remove the TI
> flags, we want new values for ia32_compat, but given its odd semantics,
> I'd rather make it a real flags field that configures these specific
> behaviours.  So, set_personality_x64 can ask for the vsyscall page,
> which is not available in x32/ia32 and set_personality_ia32 can
> configure the uprobe code as needed.
>
> uprobe cannot rely on other methods like user_64bit_mode() to decide how
> to patch, so it needs some specific flag like this.

I like this quite a bit, but can you rename MM_CONTEXT_GATE_PAGE to
MM_CONTEXT_HAS_VSYSCALL?

--Andy


* Re: [PATCH v2 7/9] x86: Use current USER_CS to setup correct context on vmx entry
  2020-10-01 21:52   ` Andy Lutomirski
@ 2020-10-02 22:40     ` Sean Christopherson
  2020-10-02 23:11       ` Gabriel Krisman Bertazi
  2020-10-03  0:15       ` Andy Lutomirski
  0 siblings, 2 replies; 21+ messages in thread
From: Sean Christopherson @ 2020-10-02 22:40 UTC
  To: Andy Lutomirski
  Cc: Gabriel Krisman Bertazi, Paolo Bonzini, Thomas Gleixner,
	Christoph Hellwig, H. Peter Anvin, Borislav Petkov,
	Robert Richter, Peter Zijlstra, Ingo Molnar, X86 ML, LKML,
	Dave Hansen, kernel

On Thu, Oct 01, 2020 at 02:52:32PM -0700, Andy Lutomirski wrote:
> On Thu, Oct 1, 2020 at 1:59 PM Gabriel Krisman Bertazi
> <krisman@collabora.com> wrote:
> >
> > vmx_prepare_switch_to_guest shouldn't use is_64bit_mm, which has a
> > very specific use in uprobes.  Use the user_64bit_mode helper instead.
> > This reduces the usage of is_64bit_mm, which is awkward, since it relies
> > on the personality at load time, which is fine for uprobes, but doesn't
> > seem fine here.
> >
> > I tested this by running VMs with 64 and 32 bits payloads from 64/32
> > programs.
> >
> > Signed-off-by: Gabriel Krisman Bertazi <krisman@collabora.com>
> > ---
> >  arch/x86/kvm/vmx/vmx.c | 2 +-
> >  1 file changed, 1 insertion(+), 1 deletion(-)
> >
> > diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
> > index 7b2a068f08c1..b5aafd9e5f5d 100644
> > --- a/arch/x86/kvm/vmx/vmx.c
> > +++ b/arch/x86/kvm/vmx/vmx.c
> > @@ -1172,7 +1172,7 @@ void vmx_prepare_switch_to_guest(struct kvm_vcpu *vcpu)
> >         savesegment(es, host_state->es_sel);
> >
> >         gs_base = cpu_kernelmode_gs_base(cpu);
> > -       if (likely(is_64bit_mm(current->mm))) {
> > +       if (likely(user_64bit_mode(current_pt_regs()))) {
> >                 current_save_fsgs();
> >                 fs_sel = current->thread.fsindex;
> >                 gs_sel = current->thread.gsindex;
> 
> I disagree with this one.  This whole code path is nonsense.  Can you
> just remove the condition entirely and use the 64-bit path
> unconditionally?

I finally came back to this one with fresh eyes.  I've read through the code
a bazillion times and typed up half a dozen responses.  I think, finally, I
understand what's broken.

I believe your original assertion that the bug was misdiagnosed is correct
(can't link because LKML wasn't in the loop).  I'm pretty sure your analysis
that KVM's handling of things works mostly by coincidence is also correct.

The coincidence is that "real" VMMs all use arch_prctl(), and
do_arch_prctl_64() ensures thread.{fs,gs}base are accurate.  save_base_legacy()
detects sel==0 and intentionally does nothing, knowing that the base is
already accurate.
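
For reference, the userspace side of that path is just a thin
arch_prctl() wrapper (minimal sketch):

#include <asm/prctl.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Routes through do_arch_prctl_64(), which also refreshes
 * current->thread.gsbase, so KVM's assumption holds. */
static long set_gs_base(unsigned long base)
{
	return syscall(SYS_arch_prctl, ARCH_SET_GS, base);
}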

Userspaces that don't use arch_prctl(), in the bug report case a 32-bit compat
test, may or may not have accurate thread.{fs,gs}base values.  This is
especially true if sel!=0, since save_base_legacy() explicitly zeros the base
in that case and relies on load_seg_legacy() to restore the segment on the
back end.

KVM on the other hand assumes thread.{fs,gs}base are always fresh.  When that
didn't hold true for userspace that didn't use arch_prctl(), the fix of
detecting a !64-bit mm just so happened to work because all 64-bit VMMs use
arch_prctl().

It's tempting to just open code this and use RD{FS,GS}BASE when possible,
i.e. avoid any guesswork.  Maybe with a module param that userspace can set
to tell KVM it doesn't do anything fancy with FS/GS base (will userspace still
use arch_prctl() even if FSGSBASE is available?).

	savesegment(fs, fs_sel);
	savesegment(gs, gs_sel);
	if (use_current_fsgs_base) {
		fs_base = current->thread.fsbase;
		vmx->msr_host_kernel_gs_base = current->thread.gsbase;
	} else if (static_cpu_has(X86_FEATURE_FSGSBASE)) {
		fs_base = rdfsbase();
		vmx->msr_host_kernel_gs_base = __rdgsbase_inactive();
	} else {
		fs_base = read_msr(MSR_FS_BASE);
		vmx->msr_host_kernel_gs_base = read_msr(MSR_KERNEL_GS_BASE);
	}

I'll revisit this on Monday and run a patch by Paolo.


* Re: [PATCH v2 7/9] x86: Use current USER_CS to setup correct context on vmx entry
  2020-10-02 22:40     ` Sean Christopherson
@ 2020-10-02 23:11       ` Gabriel Krisman Bertazi
  2020-10-03  0:15       ` Andy Lutomirski
  1 sibling, 0 replies; 21+ messages in thread
From: Gabriel Krisman Bertazi @ 2020-10-02 23:11 UTC
  To: Sean Christopherson
  Cc: Andy Lutomirski, Paolo Bonzini, Thomas Gleixner,
	Christoph Hellwig, H. Peter Anvin, Borislav Petkov,
	Robert Richter, Peter Zijlstra, Ingo Molnar, X86 ML, LKML,
	Dave Hansen, kernel

Sean Christopherson <sean.j.christopherson@intel.com> writes:

> On Thu, Oct 01, 2020 at 02:52:32PM -0700, Andy Lutomirski wrote:
>> On Thu, Oct 1, 2020 at 1:59 PM Gabriel Krisman Bertazi
>> <krisman@collabora.com> wrote:
>> >
>> > vmx_prepare_switch_to_guest shouldn't use is_64bit_mm, which has a
>> > very specific use in uprobes.  Use the user_64bit_mode helper instead.
>> > This reduces the usage of is_64bit_mm, which is awkward, since it relies
>> > on the personality at load time, which is fine for uprobes, but doesn't
>> > seem fine here.
>> >
>> > I tested this by running VMs with 64 and 32 bits payloads from 64/32
>> > programs.
>> >
>> > Signed-off-by: Gabriel Krisman Bertazi <krisman@collabora.com>
>> > ---
>> >  arch/x86/kvm/vmx/vmx.c | 2 +-
>> >  1 file changed, 1 insertion(+), 1 deletion(-)
>> >
>> > diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
>> > index 7b2a068f08c1..b5aafd9e5f5d 100644
>> > --- a/arch/x86/kvm/vmx/vmx.c
>> > +++ b/arch/x86/kvm/vmx/vmx.c
>> > @@ -1172,7 +1172,7 @@ void vmx_prepare_switch_to_guest(struct kvm_vcpu *vcpu)
>> >         savesegment(es, host_state->es_sel);
>> >
>> >         gs_base = cpu_kernelmode_gs_base(cpu);
>> > -       if (likely(is_64bit_mm(current->mm))) {
>> > +       if (likely(user_64bit_mode(current_pt_regs()))) {
>> >                 current_save_fsgs();
>> >                 fs_sel = current->thread.fsindex;
>> >                 gs_sel = current->thread.gsindex;
>> 
>> I disagree with this one.  This whole code path is nonsense.  Can you
>> just remove the condition entirely and use the 64-bit path
>> unconditionally?
>
> I finally came back to this one with fresh eyes.  I've read through the code
> a bajllion times and typed up half a dozen responses.  I think, finally, I
> understand what's broken.
>
> I believe your original assertion that the bug was misdiagnosed is correct
> (can't link because LKML wasn't in the loop).  I'm pretty sure your analysis
> that KVM's handling of things works mostly by coincidence is also correct.
>
> The coincidence is that "real" VMMs all use arch_prctl(), and
> do_arch_prctl_64() ensures thread.{fs,gs}base are accurate.  save_base_legacy()
> detects sel==0 and intentionally does nothing, knowing the the base is already
> accurate.
>
> Userspaces that don't use arch_prctl(), in the bug report case a 32-bit compat
> test, may or may not have accurate thread.{fs,gs}base values.  This is
> especially true if sel!=0 as save_base_legacy() explicitly zeros the base in
> this case, as load_seg_legacy() will restore the seg on the backend.
>
> KVM on the other hand assumes thread.{fs,gs}base are always fresh.  When that
> didn't hold true for userspace that didn't use arch_prctl(), the fix of
> detecting a !64-bit mm just so happened to work because all 64-bit VMMs use
> arch_prctl().
>
> It's tempting to just open code this and use RD{FS,GS}BASE when possible,
> i.e. avoid any guesswork.  Maybe with a module param that userspace can set
> to tell KVM it doesn't do anything fancy with FS/GS base (will userspace still
> use arch_prctl() even if FSGSABSE is available?).
>
>         savesegment(fs, fs_sel);
>         savesegment(gs, gs_sel);
> 	if (use_current_fsgs_base) {
> 		fs_base = current->thread.fsbase;
> 		vmx->msr_host_kernel_gs_base = current->thread.gsbase;
> 	} else if (static_cpu_has(X86_FEATURE_FSGSBASE)) {
>                 fs_base = rdfsbase()
>                 vmx->msr_host_kernel_gs_base = __rdgsbase_inactive();
>         } else {
>                 fs_base = read_msr(MSR_FS_BASE);
>                 vmx->msr_host_kernel_gs_base = read_msr(MSR_KERNEL_GS_BASE);
>         }
>
> I'll revisit this on Monday and run a patch by Paolo.

If this is the case, I will just drop the current patch from my series
and leave it to you.  Given that is_64bit_mm() is not going away, this
use case can be fixed separately.

-- 
Gabriel Krisman Bertazi


* Re: [PATCH v2 7/9] x86: Use current USER_CS to setup correct context on vmx entry
  2020-10-02 22:40     ` Sean Christopherson
  2020-10-02 23:11       ` Gabriel Krisman Bertazi
@ 2020-10-03  0:15       ` Andy Lutomirski
  2020-10-03 23:04         ` Andy Lutomirski
  1 sibling, 1 reply; 21+ messages in thread
From: Andy Lutomirski @ 2020-10-03  0:15 UTC
  To: Sean Christopherson
  Cc: Andy Lutomirski, Gabriel Krisman Bertazi, Paolo Bonzini,
	Thomas Gleixner, Christoph Hellwig, H. Peter Anvin,
	Borislav Petkov, Robert Richter, Peter Zijlstra, Ingo Molnar,
	X86 ML, LKML, Dave Hansen, kernel

On Fri, Oct 2, 2020 at 3:40 PM Sean Christopherson
<sean.j.christopherson@intel.com> wrote:
>
> On Thu, Oct 01, 2020 at 02:52:32PM -0700, Andy Lutomirski wrote:
> > On Thu, Oct 1, 2020 at 1:59 PM Gabriel Krisman Bertazi
> > <krisman@collabora.com> wrote:
> > >
> > > vmx_prepare_switch_to_guest shouldn't use is_64bit_mm, which has a
> > > very specific use in uprobes.  Use the user_64bit_mode helper instead.
> > > This reduces the usage of is_64bit_mm, which is awkward, since it relies
> > > on the personality at load time, which is fine for uprobes, but doesn't
> > > seem fine here.
> > >
> > > I tested this by running VMs with 64 and 32 bits payloads from 64/32
> > > programs.
> > >
> > > Signed-off-by: Gabriel Krisman Bertazi <krisman@collabora.com>
> > > ---
> > >  arch/x86/kvm/vmx/vmx.c | 2 +-
> > >  1 file changed, 1 insertion(+), 1 deletion(-)
> > >
> > > diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
> > > index 7b2a068f08c1..b5aafd9e5f5d 100644
> > > --- a/arch/x86/kvm/vmx/vmx.c
> > > +++ b/arch/x86/kvm/vmx/vmx.c
> > > @@ -1172,7 +1172,7 @@ void vmx_prepare_switch_to_guest(struct kvm_vcpu *vcpu)
> > >         savesegment(es, host_state->es_sel);
> > >
> > >         gs_base = cpu_kernelmode_gs_base(cpu);
> > > -       if (likely(is_64bit_mm(current->mm))) {
> > > +       if (likely(user_64bit_mode(current_pt_regs()))) {
> > >                 current_save_fsgs();
> > >                 fs_sel = current->thread.fsindex;
> > >                 gs_sel = current->thread.gsindex;
> >
> > I disagree with this one.  This whole code path is nonsense.  Can you
> > just remove the condition entirely and use the 64-bit path
> > unconditionally?
>
> I finally came back to this one with fresh eyes.  I've read through the code
> a bajllion times and typed up half a dozen responses.  I think, finally, I
> understand what's broken.
>
> I believe your original assertion that the bug was misdiagnosed is correct
> (can't link because LKML wasn't in the loop).  I'm pretty sure your analysis
> that KVM's handling of things works mostly by coincidence is also correct.
>
> The coincidence is that "real" VMMs all use arch_prctl(), and
> do_arch_prctl_64() ensures thread.{fs,gs}base are accurate.  save_base_legacy()
> detects sel==0 and intentionally does nothing, knowing the the base is already
> accurate.
>
> Userspaces that don't use arch_prctl(), in the bug report case a 32-bit compat
> test, may or may not have accurate thread.{fs,gs}base values.  This is
> especially true if sel!=0 as save_base_legacy() explicitly zeros the base in
> this case, as load_seg_legacy() will restore the seg on the backend.
>
> KVM on the other hand assumes thread.{fs,gs}base are always fresh.  When that
> didn't hold true for userspace that didn't use arch_prctl(), the fix of
> detecting a !64-bit mm just so happened to work because all 64-bit VMMs use
> arch_prctl().
>
> It's tempting to just open code this and use RD{FS,GS}BASE when possible,
> i.e. avoid any guesswork.  Maybe with a module param that userspace can set
> to tell KVM it doesn't do anything fancy with FS/GS base (will userspace still
> use arch_prctl() even if FSGSABSE is available?).
>
>         savesegment(fs, fs_sel);
>         savesegment(gs, gs_sel);
>         if (use_current_fsgs_base) {
>                 fs_base = current->thread.fsbase;
>                 vmx->msr_host_kernel_gs_base = current->thread.gsbase;

I don't like this.  The FSGSBASE case is fast, and the !FSGSBASE case
is mostly obsolete.  I see no great reason to have a module parameter
asking for incorrect behavior.  There have been too many bugs in this
area -- let's not add more please.

>         } else if (static_cpu_has(X86_FEATURE_FSGSBASE)) {
>                 fs_base = rdfsbase()
>                 vmx->msr_host_kernel_gs_base = __rdgsbase_inactive();
>         } else {
>                 fs_base = read_msr(MSR_FS_BASE);
>                 vmx->msr_host_kernel_gs_base = read_msr(MSR_KERNEL_GS_BASE);
>         }

I'm okay with this, but I think you should fix the corresponding bugs
on the restore side as well.  The current code is:

        if (host_state->ldt_sel || (host_state->gs_sel & 7)) {
                kvm_load_ldt(host_state->ldt_sel);
#ifdef CONFIG_X86_64
                load_gs_index(host_state->gs_sel);
#else
                loadsegment(gs, host_state->gs_sel);
#endif
        }
        if (host_state->fs_sel & 7)
                loadsegment(fs, host_state->fs_sel);

which is blatantly wrong in the case where fs_sel.TI == 1, gs_sel.TI
== 0, and ldt_sel != 0.  But it's also more subtly wrong -- this
corrupts all the segment attributes in the case where a segment points
to the GDT and the GDT attributes are non-default.  So I would suggest
making the code radically simpler and more correct:

if (host_state->ldt_sel)
  kvm_load_ldt(host_state->ldt_sel);  // see below
if (host_state->ds_sel)
  loadsegment(ds, host_state->ds_sel);
if (host_state->es_sel)
  loadsegment(es, host_state->es_sel);
if (host_state->fs_sel) {
  loadsegment(fs, host_state->fs_sel);
  x86_fsbase_write_cpu(host_state->fs_base);
}
if (host_state->gs_sel) {
  load_gs_index(host_state->gs_sel);
  x86_gsbase_write_cpu_inactive(host_state->msr_host_kernel_gs_base);
}

In the IMO unlikely event that any performance-critical KVM userspace
runs with these selectors != 0, you could also skip the load if they
are set to __USER_DS.

You can also simplify this crud:

        if (unlikely(fs_sel != host->fs_sel)) {
                if (!(fs_sel & 7))
                        vmcs_write16(HOST_FS_SELECTOR, fs_sel);
                else
                        vmcs_write16(HOST_FS_SELECTOR, 0);
                host->fs_sel = fs_sel;
        }

There is nothing special about segments with TI set according to the
SDM (AFAICT) nor is anything particularly special about them in
Linux's segment tables, so this code makes little sense.  It could
just be:

host->fs_sel = fs_sel;

and leave the VMCS field set to 0.  Or if you do the __USER_DS
optimization above, you could do:

if (unlikely(fs_sel != host->fs_sel)) {
  vmcs_write16(HOST_FS_SELECTOR, fs_sel == __USER_DS ? __USER_DS : 0);
  host->fs_sel = fs_sel;
}

I suspect that the only users who care about performance (or for whom
we should care about performance) are normal userspace, and normal
64-bit userspace has DS == ES == FS == GS == 0.


I would also be okay with making the KVM code match the context switch
code, but this may be distinctly nontrivial.


* Re: [PATCH v2 7/9] x86: Use current USER_CS to setup correct context on vmx entry
  2020-10-03  0:15       ` Andy Lutomirski
@ 2020-10-03 23:04         ` Andy Lutomirski
  2020-10-05 18:56           ` Andy Lutomirski
  2020-10-05 19:54           ` Sean Christopherson
  0 siblings, 2 replies; 21+ messages in thread
From: Andy Lutomirski @ 2020-10-03 23:04 UTC
  To: Andy Lutomirski
  Cc: Sean Christopherson, Gabriel Krisman Bertazi, Paolo Bonzini,
	Thomas Gleixner, Christoph Hellwig, H. Peter Anvin,
	Borislav Petkov, Robert Richter, Peter Zijlstra, Ingo Molnar,
	X86 ML, LKML, Dave Hansen, kernel

On Fri, Oct 2, 2020 at 5:15 PM Andy Lutomirski <luto@kernel.org> wrote:
>
> On Fri, Oct 2, 2020 at 3:40 PM Sean Christopherson
> <sean.j.christopherson@intel.com> wrote:
> >
> > On Thu, Oct 01, 2020 at 02:52:32PM -0700, Andy Lutomirski wrote:
> > > On Thu, Oct 1, 2020 at 1:59 PM Gabriel Krisman Bertazi
> > > <krisman@collabora.com> wrote:
> > > >
> > > > vmx_prepare_switch_to_guest shouldn't use is_64bit_mm, which has a
> > > > very specific use in uprobes.  Use the user_64bit_mode helper instead.
> > > > This reduces the usage of is_64bit_mm, which is awkward, since it relies
> > > > on the personality at load time, which is fine for uprobes, but doesn't
> > > > seem fine here.
> > > >
> > > > I tested this by running VMs with 64-bit and 32-bit payloads from
> > > > 64/32-bit programs.
> > > >
> > > > Signed-off-by: Gabriel Krisman Bertazi <krisman@collabora.com>
> > > > ---
> > > >  arch/x86/kvm/vmx/vmx.c | 2 +-
> > > >  1 file changed, 1 insertion(+), 1 deletion(-)
> > > >
> > > > diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
> > > > index 7b2a068f08c1..b5aafd9e5f5d 100644
> > > > --- a/arch/x86/kvm/vmx/vmx.c
> > > > +++ b/arch/x86/kvm/vmx/vmx.c
> > > > @@ -1172,7 +1172,7 @@ void vmx_prepare_switch_to_guest(struct kvm_vcpu *vcpu)
> > > >         savesegment(es, host_state->es_sel);
> > > >
> > > >         gs_base = cpu_kernelmode_gs_base(cpu);
> > > > -       if (likely(is_64bit_mm(current->mm))) {
> > > > +       if (likely(user_64bit_mode(current_pt_regs()))) {
> > > >                 current_save_fsgs();
> > > >                 fs_sel = current->thread.fsindex;
> > > >                 gs_sel = current->thread.gsindex;
> > >
> > > I disagree with this one.  This whole code path is nonsense.  Can you
> > > just remove the condition entirely and use the 64-bit path
> > > unconditionally?
> >
> > I finally came back to this one with fresh eyes.  I've read through the code
> > a bazillion times and typed up half a dozen responses.  I think, finally, I
> > understand what's broken.
> >
> > I believe your original assertion that the bug was misdiagnosed is correct
> > (can't link because LKML wasn't in the loop).  I'm pretty sure your analysis
> > that KVM's handling of things works mostly by coincidence is also correct.
> >
> > The coincidence is that "real" VMMs all use arch_prctl(), and
> > do_arch_prctl_64() ensures thread.{fs,gs}base are accurate.  save_base_legacy()
> > detects sel==0 and intentionally does nothing, knowing that the base is already
> > accurate.
> >
> > Userspaces that don't use arch_prctl() (in the bug report's case, a 32-bit
> > compat test) may or may not have accurate thread.{fs,gs}base values.  This
> > is especially true if sel!=0, since save_base_legacy() explicitly zeros the
> > base in that case, relying on load_seg_legacy() to restore the seg on the
> > back end.
> >
> > KVM on the other hand assumes thread.{fs,gs}base are always fresh.  When that
> > didn't hold true for userspace that didn't use arch_prctl(), the fix of
> > detecting a !64-bit mm just so happened to work because all 64-bit VMMs use
> > arch_prctl().
> >
> > It's tempting to just open code this and use RD{FS,GS}BASE when possible,
> > i.e. avoid any guesswork.  Maybe with a module param that userspace can set
> > to tell KVM it doesn't do anything fancy with FS/GS base (will userspace still
> > use arch_prctl() even if FSGSBASE is available?).
> >
> >         savesegment(fs, fs_sel);
> >         savesegment(gs, gs_sel);
> >         if (use_current_fsgs_base) {
> >                 fs_base = current->thread.fsbase;
> >                 vmx->msr_host_kernel_gs_base = current->thread.gsbase;
>
> I don't like this.  The FSGSBASE case is fast, and the !FSGSBASE case
> is mostly obsolete.  I see no great reason to have a module parameter
> asking for incorrect behavior.  There have been too many bugs in this
> area -- let's not add more please.
>
> >         } else if (static_cpu_has(X86_FEATURE_FSGSBASE)) {
> >                 fs_base = rdfsbase();
> >                 vmx->msr_host_kernel_gs_base = __rdgsbase_inactive();
> >         } else {
> >                 fs_base = read_msr(MSR_FS_BASE);
> >                 vmx->msr_host_kernel_gs_base = read_msr(MSR_KERNEL_GS_BASE);
> >         }
>
> I'm okay with this, but I think you should fix the corresponding bugs
> on the restore side as well.  The current code is:
>
>         if (host_state->ldt_sel || (host_state->gs_sel & 7)) {
>                 kvm_load_ldt(host_state->ldt_sel);
> #ifdef CONFIG_X86_64
>                 load_gs_index(host_state->gs_sel);
> #else
>                 loadsegment(gs, host_state->gs_sel);
> #endif
>         }
>         if (host_state->fs_sel & 7)
>                 loadsegment(fs, host_state->fs_sel);
>
> > which is blatantly wrong in the case where fs_sel.TI == 1, gs_sel.TI
> > == 0, and ldt_sel != 0.  But it's also more subtly wrong -- this
> corrupts all the segment attributes in the case where a segment points
> to the GDT and the GDT attributes are non-default.  So I would suggest
> making the code radically simpler and more correct:
>
> > if (host_state->ldt_sel)
> >   kvm_load_ldt(host_state->ldt_sel);  // see below
> if (host_state->ds_sel)
>   loadsegment(ds, host_state->ds_sel);
> if (host_state->es_sel)
>   loadsegment(es, host_state->es_sel);
> if (host_state->fs_sel) {
>   loadsegment(fs, host_state->fs_sel);
>   x86_fsbase_write_cpu(host_state->fs_base);
> }
> if (host_state->gs_sel) {
>   load_gs_index(host_state->gs_sel);
> >   x86_gsbase_write_cpu_inactive(vmx->msr_host_kernel_gs_base);
> }
>
> In the IMO unlikely event that any performance-critical KVM userspace
> runs with these selectors != 0, you could also skip the load if they
> are set to __USER_DS.
>
> You can also simplify this crud:
>
>         if (unlikely(fs_sel != host->fs_sel)) {
>                 if (!(fs_sel & 7))
>                         vmcs_write16(HOST_FS_SELECTOR, fs_sel);
>                 else
>                         vmcs_write16(HOST_FS_SELECTOR, 0);
>                 host->fs_sel = fs_sel;
>         }
>
> There is nothing special about segments with TI set according to the
> SDM (AFAICT) nor is anything particularly special about them in
> Linux's segment tables, so this code makes little sense.  It could
> just be:
>
> host->fs_sel = fs_sel;
>
> and leave the VMCS field set to 0.  Or if you do the __USER_DS
> optimization above, you could do:
>
> if (unlikely(fs_sel != host->fs_sel)) {
>   vmcs_write16(HOST_FS_SELECTOR, fs_sel == __USER_DS ? __USER_DS : 0);
>   host->fs_sel = fs_sel;
> }
>
> I suspect that the only users who care about performance (or for whom
> we should care about performance) are normal userspace, and normal
> 64-bit userspace has DS == ES == FS == GS == 0.
>
>
> I would also be okay with making the KVM code match the context switch
> code, but this may be distinctly nontrivial.

If you're okay waiting for a couple days, I'll just do this.  I have
it 2/3-done already, except I'm running into the utter catastrophe
that is 32-bit stackprotector, so I'm going to fix that first.  (Or
delete it if I get too sick of it.)

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH v2 7/9] x86: Use current USER_CS to setup correct context on vmx entry
  2020-10-03 23:04         ` Andy Lutomirski
@ 2020-10-05 18:56           ` Andy Lutomirski
  2020-10-05 19:54           ` Sean Christopherson
  1 sibling, 0 replies; 21+ messages in thread
From: Andy Lutomirski @ 2020-10-05 18:56 UTC (permalink / raw)
  To: Andy Lutomirski
  Cc: Sean Christopherson, Gabriel Krisman Bertazi, Paolo Bonzini,
	Thomas Gleixner, Christoph Hellwig, H. Peter Anvin,
	Borislav Petkov, Robert Richter, Peter Zijlstra, Ingo Molnar,
	X86 ML, LKML, Dave Hansen, kernel

On Sat, Oct 3, 2020 at 4:04 PM Andy Lutomirski <luto@kernel.org> wrote:
>
> On Fri, Oct 2, 2020 at 5:15 PM Andy Lutomirski <luto@kernel.org> wrote:
> >
> > On Fri, Oct 2, 2020 at 3:40 PM Sean Christopherson
> > <sean.j.christopherson@intel.com> wrote:
> > >
> > > On Thu, Oct 01, 2020 at 02:52:32PM -0700, Andy Lutomirski wrote:
> > > > On Thu, Oct 1, 2020 at 1:59 PM Gabriel Krisman Bertazi
> > > > <krisman@collabora.com> wrote:
> > > > >
> > > > > vmx_prepare_switch_to_guest shouldn't use is_64bit_mm, which has a
> > > > > very specific use in uprobes.  Use the user_64bit_mode helper instead.
> > > > > This reduces the usage of is_64bit_mm, which is awkward, since it relies
> > > > > on the personality at load time, which is fine for uprobes, but doesn't
> > > > > seem fine here.
> > > > >
> > > > > I tested this by running VMs with 64-bit and 32-bit payloads from
> > > > > 64/32-bit programs.
> > > > >
> > > > > Signed-off-by: Gabriel Krisman Bertazi <krisman@collabora.com>
> > > > > ---
> > > > >  arch/x86/kvm/vmx/vmx.c | 2 +-
> > > > >  1 file changed, 1 insertion(+), 1 deletion(-)
> > > > >
> > > > > diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
> > > > > index 7b2a068f08c1..b5aafd9e5f5d 100644
> > > > > --- a/arch/x86/kvm/vmx/vmx.c
> > > > > +++ b/arch/x86/kvm/vmx/vmx.c
> > > > > @@ -1172,7 +1172,7 @@ void vmx_prepare_switch_to_guest(struct kvm_vcpu *vcpu)
> > > > >         savesegment(es, host_state->es_sel);
> > > > >
> > > > >         gs_base = cpu_kernelmode_gs_base(cpu);
> > > > > -       if (likely(is_64bit_mm(current->mm))) {
> > > > > +       if (likely(user_64bit_mode(current_pt_regs()))) {
> > > > >                 current_save_fsgs();
> > > > >                 fs_sel = current->thread.fsindex;
> > > > >                 gs_sel = current->thread.gsindex;
> > > >
> > > > I disagree with this one.  This whole code path is nonsense.  Can you
> > > > just remove the condition entirely and use the 64-bit path
> > > > unconditionally?
> > >
> > > I finally came back to this one with fresh eyes.  I've read through the code
> > > a bazillion times and typed up half a dozen responses.  I think, finally, I
> > > understand what's broken.
> > >
> > > I believe your original assertion that the bug was misdiagnosed is correct
> > > (can't link because LKML wasn't in the loop).  I'm pretty sure your analysis
> > > that KVM's handling of things works mostly by coincidence is also correct.
> > >
> > > The coincidence is that "real" VMMs all use arch_prctl(), and
> > > do_arch_prctl_64() ensures thread.{fs,gs}base are accurate.  save_base_legacy()
> > > detects sel==0 and intentionally does nothing, knowing that the base is already
> > > accurate.
> > >
> > > Userspaces that don't use arch_prctl() (in the bug report's case, a 32-bit
> > > compat test) may or may not have accurate thread.{fs,gs}base values.  This
> > > is especially true if sel!=0, since save_base_legacy() explicitly zeros the
> > > base in that case, relying on load_seg_legacy() to restore the seg on the
> > > back end.
> > >
> > > KVM on the other hand assumes thread.{fs,gs}base are always fresh.  When that
> > > didn't hold true for userspace that didn't use arch_prctl(), the fix of
> > > detecting a !64-bit mm just so happened to work because all 64-bit VMMs use
> > > arch_prctl().
> > >
> > > It's tempting to just open code this and use RD{FS,GS}BASE when possible,
> > > i.e. avoid any guesswork.  Maybe with a module param that userspace can set
> > > to tell KVM it doesn't do anything fancy with FS/GS base (will userspace still
> > > use arch_prctl() even if FSGSBASE is available?).
> > >
> > >         savesegment(fs, fs_sel);
> > >         savesegment(gs, gs_sel);
> > >         if (use_current_fsgs_base) {
> > >                 fs_base = current->thread.fsbase;
> > >                 vmx->msr_host_kernel_gs_base = current->thread.gsbase;
> >
> > I don't like this.  The FSGSBASE case is fast, and the !FSGSBASE case
> > is mostly obsolete.  I see no great reason to have a module parameter
> > asking for incorrect behavior.  There have been too many bugs in this
> > area -- let's not add more please.
> >
> > >         } else if (static_cpu_has(X86_FEATURE_FSGSBASE)) {
> > >                 fs_base = rdfsbase();
> > >                 vmx->msr_host_kernel_gs_base = __rdgsbase_inactive();
> > >         } else {
> > >                 fs_base = read_msr(MSR_FS_BASE);
> > >                 vmx->msr_host_kernel_gs_base = read_msr(MSR_KERNEL_GS_BASE);
> > >         }
> >
> > I'm okay with this, but I think you should fix the corresponding bugs
> > on the restore side as well.  The current code is:
> >
> >         if (host_state->ldt_sel || (host_state->gs_sel & 7)) {
> >                 kvm_load_ldt(host_state->ldt_sel);
> > #ifdef CONFIG_X86_64
> >                 load_gs_index(host_state->gs_sel);
> > #else
> >                 loadsegment(gs, host_state->gs_sel);
> > #endif
> >         }
> >         if (host_state->fs_sel & 7)
> >                 loadsegment(fs, host_state->fs_sel);
> >
> > > which is blatantly wrong in the case where fs_sel.TI == 1, gs_sel.TI
> > > == 0, and ldt_sel != 0.  But it's also more subtly wrong -- this
> > corrupts all the segment attributes in the case where a segment points
> > to the GDT and the GDT attributes are non-default.  So I would suggest
> > making the code radically simpler and more correct:
> >
> > > if (host_state->ldt_sel)
> > >   kvm_load_ldt(host_state->ldt_sel);  // see below
> > if (host_state->ds_sel)
> >   loadsegment(ds, host_state->ds_sel);
> > if (host_state->es_sel)
> >   loadsegment(es, host_state->es_sel);
> > if (host_state->fs_sel) {
> >   loadsegment(fs, host_state->fs_sel);
> >   x86_fsbase_write_cpu(host_state->fs_base);
> > }
> > if (host_state->gs_sel) {
> >   load_gs_index(host_state->gs_sel);
> > >   x86_gsbase_write_cpu_inactive(vmx->msr_host_kernel_gs_base);
> > }
> >
> > In the IMO unlikely event that any performance-critical KVM userspace
> > runs with these selectors != 0, you could also skip the load if they
> > are set to __USER_DS.
> >
> > You can also simplify this crud:
> >
> >         if (unlikely(fs_sel != host->fs_sel)) {
> >                 if (!(fs_sel & 7))
> >                         vmcs_write16(HOST_FS_SELECTOR, fs_sel);
> >                 else
> >                         vmcs_write16(HOST_FS_SELECTOR, 0);
> >                 host->fs_sel = fs_sel;
> >         }
> >
> > There is nothing special about segments with TI set according to the
> > SDM (AFAICT) nor is anything particularly special about them in
> > Linux's segment tables, so this code makes little sense.  It could
> > just be:
> >
> > host->fs_sel = fs_sel;
> >
> > and leave the VMCS field set to 0.  Or if you do the __USER_DS
> > optimization above, you could do:
> >
> > if (unlikely(fs_sel != host->fs_sel)) {
> >   vmcs_write16(HOST_FS_SELECTOR, fs_sel == __USER_DS ? __USER_DS : 0);
> >   host->fs_sel = fs_sel;
> > }
> >
> > I suspect that the only users who care about performance (or for whom
> > we should care about performance) are normal userspace, and normal
> > 64-bit userspace has DS == ES == FS == GS == 0.
> >
> >
> > I would also be okay with making the KVM code match the context switch
> > code, but this may be distinctly nontrivial.
>
> If you're okay waiting for a couple days, I'll just do this.  I have
> it 2/3-done already, except I'm running into the utter catastrophe
> that is 32-bit stackprotector, so I'm going to fix that first.  (Or
> delete it if I get too sick of it.)

Rough draft.  This needs the rest of that branch applied to have any
chance of being correct.

I was contemplating whether we could get a meaningful speedup if we
declared that entering a VM would wipe segments, but I don't think
there's actually much speedup available.

https://git.kernel.org/pub/scm/linux/kernel/git/luto/linux.git/commit/?h=x86/unify_stack_canary&id=f5a2dd07cd222bb674bd947f5d9c698ab50ccb88

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH v2 7/9] x86: Use current USER_CS to setup correct context on vmx entry
  2020-10-03 23:04         ` Andy Lutomirski
  2020-10-05 18:56           ` Andy Lutomirski
@ 2020-10-05 19:54           ` Sean Christopherson
  1 sibling, 0 replies; 21+ messages in thread
From: Sean Christopherson @ 2020-10-05 19:54 UTC (permalink / raw)
  To: Andy Lutomirski
  Cc: Gabriel Krisman Bertazi, Paolo Bonzini, Thomas Gleixner,
	Christoph Hellwig, H. Peter Anvin, Borislav Petkov,
	Robert Richter, Peter Zijlstra, Ingo Molnar, X86 ML, LKML,
	Dave Hansen, kernel

On Sat, Oct 03, 2020 at 04:04:22PM -0700, Andy Lutomirski wrote:
> On Fri, Oct 2, 2020 at 5:15 PM Andy Lutomirski <luto@kernel.org> wrote:
> > But it's also more subtly wrong -- this corrupts all the segment attributes
> > in the case where a segment points to the GDT and the GDT attributes are
> > non-default.

Part of me wants to ask if it's even possible to get into such a scenario,
but a much larger part of me doesn't want to think about segmentation any
more :-)

> > I would also be okay with making the KVM code match the context switch
> > code, but this may be distinctly nontrivial.

Ya.
 
> If you're okay waiting for a couple days, I'll just do this.  I have
> it 2/3-done already, except I'm running into the utter catastrophe
> that is 32-bit stackprotector, so I'm going to fix that first.  (Or
> delete it if I get too sick of it.)

By all means.  I dragged my feet for several months, so I can certainly do
nothing for a few more days.

^ permalink raw reply	[flat|nested] 21+ messages in thread

end of thread, other threads:[~2020-10-05 20:45 UTC | newest]

Thread overview: 21+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-10-01 20:58 [PATCH v2 0/9] Reclaim TIF_IA32 and TIF_X32 Gabriel Krisman Bertazi
2020-10-01 20:58 ` [PATCH v2 1/9] x86: events: Avoid TIF_IA32 when checking 64bit mode Gabriel Krisman Bertazi
2020-10-01 20:58 ` [PATCH v2 2/9] x86: Simplify compat syscall userspace allocation Gabriel Krisman Bertazi
2020-10-01 20:58 ` [PATCH v2 3/9] x86: oprofile: Avoid TIF_IA32 when checking 64bit mode Gabriel Krisman Bertazi
2020-10-01 20:58 ` [PATCH v2 4/9] x86: elf: Use e_machine to choose DLINFO in compat Gabriel Krisman Bertazi
2020-10-01 21:46   ` Andy Lutomirski
2020-10-01 20:58 ` [PATCH v2 5/9] x86: elf: Use e_machine to select start_thread for x32 Gabriel Krisman Bertazi
2020-10-01 21:49   ` Andy Lutomirski
2020-10-01 20:58 ` [PATCH v2 6/9] x86: elf: Use e_machine to select setup_additional_pages " Gabriel Krisman Bertazi
2020-10-01 21:50   ` Andy Lutomirski
2020-10-01 20:58 ` [PATCH v2 7/9] x86: Use current USER_CS to setup correct context on vmx entry Gabriel Krisman Bertazi
2020-10-01 21:52   ` Andy Lutomirski
2020-10-02 22:40     ` Sean Christopherson
2020-10-02 23:11       ` Gabriel Krisman Bertazi
2020-10-03  0:15       ` Andy Lutomirski
2020-10-03 23:04         ` Andy Lutomirski
2020-10-05 18:56           ` Andy Lutomirski
2020-10-05 19:54           ` Sean Christopherson
2020-10-01 20:58 ` [PATCH v2 8/9] x86: Convert mmu context ia32_compat into a proper flags field Gabriel Krisman Bertazi
2020-10-01 22:09   ` Andy Lutomirski
2020-10-01 20:58 ` [PATCH v2 9/9] x86: Reclaim TIF_IA32 and TIF_X32 Gabriel Krisman Bertazi

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).