linux-kernel.vger.kernel.org archive mirror
* [PATCH 0/4] x86/module: Out-of-tree module decode and sanitize
@ 2020-04-07 11:02 Peter Zijlstra
  2020-04-07 11:02 ` [PATCH 1/4] module: Expose load_info to arch module loader code Peter Zijlstra
                   ` (4 more replies)
  0 siblings, 5 replies; 74+ messages in thread
From: Peter Zijlstra @ 2020-04-07 11:02 UTC (permalink / raw)
  To: tglx, linux-kernel
  Cc: hch, sean.j.christopherson, mingo, bp, hpa, x86, kenny, peterz,
	jeyu, rasmus.villemoes, pbonzini, fenghua.yu, xiaoyao.li,
	nadav.amit, thellstrom, tony.luck, rostedt, gregkh, jannh,
	keescook, David.Laight, dcovelli, mhiramat

Hi all,

Driven by the SLD vs VMX interaction, here are some patches that provide the
means to analyze the text of out-of-tree modules.

The first user of that refuses to load modules on VMX-SLD conflicts; a second
patch additionally refuses to load any module that tries to modify CRn/DRn.

I expect people will quickly come up with ever more elaborate tests to
subject out-of-tree modules to.


^ permalink raw reply	[flat|nested] 74+ messages in thread

* [PATCH 1/4] module: Expose load_info to arch module loader code
  2020-04-07 11:02 [PATCH 0/4] x86/module: Out-of-tree module decode and sanitize Peter Zijlstra
@ 2020-04-07 11:02 ` Peter Zijlstra
  2020-04-07 16:52   ` Kees Cook
  2020-04-07 11:02 ` [PATCH 2/4] module: Convert module_finalize() to load_info Peter Zijlstra
                   ` (3 subsequent siblings)
  4 siblings, 1 reply; 74+ messages in thread
From: Peter Zijlstra @ 2020-04-07 11:02 UTC (permalink / raw)
  To: tglx, linux-kernel
  Cc: hch, sean.j.christopherson, mingo, bp, hpa, x86, kenny, peterz,
	jeyu, rasmus.villemoes, pbonzini, fenghua.yu, xiaoyao.li,
	nadav.amit, thellstrom, tony.luck, rostedt, gregkh, jannh,
	keescook, David.Laight, dcovelli, mhiramat

From: Jessica Yu <jeyu@kernel.org>

The x86 module loader wants to check the value of a modinfo flag
(sld_safe) before proceeding to scan the module text for VMX
instructions. Unfortunately the arch module code currently does not have
access to load_info, but we can easily expose it via moduleloader.h,
which every arch's module code must already include.

Signed-off-by: Jessica Yu <jeyu@kernel.org>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lkml.kernel.org/r/20200406160420.14407-1-jeyu@kernel.org
---

 include/linux/moduleloader.h | 20 ++++++++++++++++++++
 kernel/module-internal.h     | 23 -----------------------
 kernel/module_signing.c      |  2 +-
 3 files changed, 21 insertions(+), 24 deletions(-)

Index: linux-2.6/include/linux/moduleloader.h
===================================================================
--- linux-2.6.orig/include/linux/moduleloader.h
+++ linux-2.6/include/linux/moduleloader.h
@@ -6,6 +6,26 @@
 #include <linux/module.h>
 #include <linux/elf.h>
 
+struct load_info {
+	const char *name;
+	/* pointer to module in temporary copy, freed at end of load_module() */
+	struct module *mod;
+	Elf_Ehdr *hdr;
+	unsigned long len;
+	Elf_Shdr *sechdrs;
+	char *secstrings, *strtab;
+	unsigned long symoffs, stroffs, init_typeoffs, core_typeoffs;
+	struct _ddebug *debug;
+	unsigned int num_debug;
+	bool sig_ok;
+#ifdef CONFIG_KALLSYMS
+	unsigned long mod_kallsyms_init_off;
+#endif
+	struct {
+		unsigned int sym, str, mod, vers, info, pcpu;
+	} index;
+};
+
 /* These may be implemented by architectures that need to hook into the
  * module loader code.  Architectures that don't need to do anything special
  * can just rely on the 'weak' default hooks defined in kernel/module.c.
Index: linux-2.6/kernel/module-internal.h
===================================================================
--- linux-2.6.orig/kernel/module-internal.h
+++ linux-2.6/kernel/module-internal.h
@@ -5,27 +5,4 @@
  * Written by David Howells (dhowells@redhat.com)
  */
 
-#include <linux/elf.h>
-#include <asm/module.h>
-
-struct load_info {
-	const char *name;
-	/* pointer to module in temporary copy, freed at end of load_module() */
-	struct module *mod;
-	Elf_Ehdr *hdr;
-	unsigned long len;
-	Elf_Shdr *sechdrs;
-	char *secstrings, *strtab;
-	unsigned long symoffs, stroffs, init_typeoffs, core_typeoffs;
-	struct _ddebug *debug;
-	unsigned int num_debug;
-	bool sig_ok;
-#ifdef CONFIG_KALLSYMS
-	unsigned long mod_kallsyms_init_off;
-#endif
-	struct {
-		unsigned int sym, str, mod, vers, info, pcpu;
-	} index;
-};
-
 extern int mod_verify_sig(const void *mod, struct load_info *info);
Index: linux-2.6/kernel/module_signing.c
===================================================================
--- linux-2.6.orig/kernel/module_signing.c
+++ linux-2.6/kernel/module_signing.c
@@ -8,11 +8,11 @@
 #include <linux/kernel.h>
 #include <linux/errno.h>
 #include <linux/module.h>
+#include <linux/moduleloader.h>
 #include <linux/module_signature.h>
 #include <linux/string.h>
 #include <linux/verification.h>
 #include <crypto/public_key.h>
-#include "module-internal.h"
 
 /*
  * Verify the signature on a module.




* [PATCH 2/4] module: Convert module_finalize() to load_info
  2020-04-07 11:02 [PATCH 0/4] x86/module: Out-of-tree module decode and sanitize Peter Zijlstra
  2020-04-07 11:02 ` [PATCH 1/4] module: Expose load_info to arch module loader code Peter Zijlstra
@ 2020-04-07 11:02 ` Peter Zijlstra
  2020-04-07 16:53   ` Kees Cook
  2020-04-07 11:02 ` [PATCH 3/4] x86,module: Detect VMX vs SLD conflicts Peter Zijlstra
                   ` (2 subsequent siblings)
  4 siblings, 1 reply; 74+ messages in thread
From: Peter Zijlstra @ 2020-04-07 11:02 UTC (permalink / raw)
  To: tglx, linux-kernel
  Cc: hch, sean.j.christopherson, mingo, bp, hpa, x86, kenny, peterz,
	jeyu, rasmus.villemoes, pbonzini, fenghua.yu, xiaoyao.li,
	nadav.amit, thellstrom, tony.luck, rostedt, gregkh, jannh,
	keescook, David.Laight, dcovelli, mhiramat

Provide load_info to module_finalize(), such that architectures might,
for example, use get_modinfo() in their implementation.
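
As an illustrative aside (not part of the patch): the .modinfo section an
arch implementation would consult here is just a blob of NUL-terminated
"tag=value" strings, and get_modinfo() is essentially the lookup below.
This is a hypothetical userspace sketch with our own name (find_modinfo),
not the kernel implementation:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Userspace sketch of the .modinfo lookup: the section is a sequence of
 * NUL-terminated "tag=value" entries; return a pointer to the value for
 * the given tag, or NULL if absent.  Stray NUL padding is skipped.
 */
static const char *find_modinfo(const char *blob, size_t len, const char *tag)
{
	size_t taglen = strlen(tag);
	size_t off = 0;

	while (off < len) {
		const char *s = blob + off;

		if (strncmp(s, tag, taglen) == 0 && s[taglen] == '=')
			return s + taglen + 1;	/* value follows the '=' */
		off += strlen(s) + 1;		/* next entry past the NUL */
	}
	return NULL;
}
```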

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
---
 arch/arc/kernel/module.c        |    4 ++--
 arch/arm/kernel/module.c        |    5 +++--
 arch/arm64/kernel/module.c      |    6 +++---
 arch/ia64/kernel/module.c       |    3 +--
 arch/m68k/kernel/module.c       |    4 +---
 arch/microblaze/kernel/module.c |    3 +--
 arch/mips/kernel/module.c       |    6 +++---
 arch/nds32/kernel/module.c      |    4 +---
 arch/nios2/kernel/module.c      |    3 +--
 arch/parisc/kernel/module.c     |    6 +++---
 arch/powerpc/kernel/module.c    |    5 +++--
 arch/s390/kernel/module.c       |    6 +++---
 arch/sh/kernel/module.c         |    6 +++---
 arch/sparc/kernel/module.c      |    7 ++++---
 arch/x86/kernel/module.c        |    6 +++---
 include/linux/moduleloader.h    |    4 +---
 kernel/module.c                 |    6 ++----
 17 files changed, 38 insertions(+), 46 deletions(-)

--- a/arch/arc/kernel/module.c
+++ b/arch/arc/kernel/module.c
@@ -129,10 +129,10 @@ int apply_relocate_add(Elf32_Shdr *sechd
  * This couldn't be done in module_frob_arch_sections() because
  * relocations had not been applied by then
  */
-int module_finalize(const Elf32_Ehdr *hdr, const Elf_Shdr *sechdrs,
-		    struct module *mod)
+int module_finalize(const struct load_info *info, struct module *mod)
 {
 #ifdef CONFIG_ARC_DW2_UNWIND
+	const Elf_Shdr *sechdrs = info->sechdrs;
 	void *unw;
 	int unwsec = mod->arch.unw_sec_idx;
 
--- a/arch/arm/kernel/module.c
+++ b/arch/arm/kernel/module.c
@@ -340,9 +340,10 @@ static const Elf_Shdr *find_mod_section(
 extern void fixup_pv_table(const void *, unsigned long);
 extern void fixup_smp(const void *, unsigned long);
 
-int module_finalize(const Elf32_Ehdr *hdr, const Elf_Shdr *sechdrs,
-		    struct module *mod)
+int module_finalize(const struct load_info *info, struct module *mod)
 {
+	const Elf32_Ehdr *hdr = info->hdr;
+	const Elf_Shdr *sechdrs = info->sechdrs;
 	const Elf_Shdr *s = NULL;
 #ifdef CONFIG_ARM_UNWIND
 	const char *secstrs = (void *)hdr + sechdrs[hdr->e_shstrndx].sh_offset;
--- a/arch/arm64/kernel/module.c
+++ b/arch/arm64/kernel/module.c
@@ -515,10 +515,10 @@ static int module_init_ftrace_plt(const
 	return 0;
 }
 
-int module_finalize(const Elf_Ehdr *hdr,
-		    const Elf_Shdr *sechdrs,
-		    struct module *me)
+int module_finalize(const struct load_info *info, struct module *me)
 {
+	const Elf_Ehdr *hdr = info->hdr;
+	const Elf_Shdr *sechdrs = info->sechdrs;
 	const Elf_Shdr *s;
 	s = find_section(hdr, sechdrs, ".altinstructions");
 	if (s)
--- a/arch/ia64/kernel/module.c
+++ b/arch/ia64/kernel/module.c
@@ -902,8 +902,7 @@ register_unwind_table (struct module *mo
 	}
 }
 
-int
-module_finalize (const Elf_Ehdr *hdr, const Elf_Shdr *sechdrs, struct module *mod)
+int module_finalize(const struct load_info *info, struct module *mod)
 {
 	DEBUGP("%s: init: entry=%p\n", __func__, mod->init);
 	if (mod->arch.unwind)
--- a/arch/m68k/kernel/module.c
+++ b/arch/m68k/kernel/module.c
@@ -99,9 +99,7 @@ int apply_relocate_add(Elf32_Shdr *sechd
 	return 0;
 }
 
-int module_finalize(const Elf_Ehdr *hdr,
-		    const Elf_Shdr *sechdrs,
-		    struct module *mod)
+int module_finalize(const struct load_info *info, struct module *mod)
 {
 	module_fixup(mod, mod->arch.fixup_start, mod->arch.fixup_end);
 	return 0;
--- a/arch/microblaze/kernel/module.c
+++ b/arch/microblaze/kernel/module.c
@@ -114,8 +114,7 @@ int apply_relocate_add(Elf32_Shdr *sechd
 	return 0;
 }
 
-int module_finalize(const Elf32_Ehdr *hdr, const Elf_Shdr *sechdrs,
-		struct module *module)
+int module_finalize(const struct load_info *info, struct module *module)
 {
 	flush_dcache();
 	return 0;
--- a/arch/mips/kernel/module.c
+++ b/arch/mips/kernel/module.c
@@ -427,10 +427,10 @@ const struct exception_table_entry *sear
 }
 
 /* Put in dbe list if necessary. */
-int module_finalize(const Elf_Ehdr *hdr,
-		    const Elf_Shdr *sechdrs,
-		    struct module *me)
+int module_finalize(const struct load_info *info, struct module *me)
 {
+	const Elf_Ehdr *hdr = info->hdr;
+	const Elf_Shdr *sechdrs = info->sechdrs;
 	const Elf_Shdr *s;
 	char *secstrings = (void *)hdr + sechdrs[hdr->e_shstrndx].sh_offset;
 
--- a/arch/nds32/kernel/module.c
+++ b/arch/nds32/kernel/module.c
@@ -266,9 +266,7 @@ apply_relocate_add(Elf32_Shdr * sechdrs,
 	return 0;
 }
 
-int
-module_finalize(const Elf32_Ehdr * hdr, const Elf_Shdr * sechdrs,
-		struct module *module)
+int module_finalize(const struct load_info *info, struct module *module)
 {
 	return 0;
 }
--- a/arch/nios2/kernel/module.c
+++ b/arch/nios2/kernel/module.c
@@ -130,8 +130,7 @@ int apply_relocate_add(Elf32_Shdr *sechd
 	return 0;
 }
 
-int module_finalize(const Elf_Ehdr *hdr, const Elf_Shdr *sechdrs,
-			struct module *me)
+int module_finalize(const struct load_info *info, struct module *me)
 {
 	flush_cache_all();
 	return 0;
--- a/arch/parisc/kernel/module.c
+++ b/arch/parisc/kernel/module.c
@@ -854,10 +854,10 @@ deregister_unwind_table(struct module *m
 		unwind_table_remove(me->arch.unwind);
 }
 
-int module_finalize(const Elf_Ehdr *hdr,
-		    const Elf_Shdr *sechdrs,
-		    struct module *me)
+int module_finalize(const struct load_info *info, struct module *me)
 {
+	const Elf_Ehdr *hdr = info->hdr;
+	const Elf_Shdr *sechdrs = info->sechdrs;
 	int i;
 	unsigned long nsyms;
 	const char *strtab = NULL;
--- a/arch/powerpc/kernel/module.c
+++ b/arch/powerpc/kernel/module.c
@@ -31,9 +31,10 @@ static const Elf_Shdr *find_section(cons
 	return NULL;
 }
 
-int module_finalize(const Elf_Ehdr *hdr,
-		const Elf_Shdr *sechdrs, struct module *me)
+int module_finalize(const struct load_info *info, struct module *me)
 {
+	const Elf_Ehdr *hdr = info->hdr;
+	const Elf_Shdr *sechdrs = info->sechdrs;
 	const Elf_Shdr *sect;
 	int rc;
 
--- a/arch/s390/kernel/module.c
+++ b/arch/s390/kernel/module.c
@@ -437,10 +437,10 @@ int apply_relocate_add(Elf_Shdr *sechdrs
 	return 0;
 }
 
-int module_finalize(const Elf_Ehdr *hdr,
-		    const Elf_Shdr *sechdrs,
-		    struct module *me)
+int module_finalize(const struct load_info *info, struct module *me)
 {
+	const Elf_Ehdr *hdr = info->hdr;
+	const Elf_Shdr *sechdrs = info->sechdrs;
 	const Elf_Shdr *s;
 	char *secstrings, *secname;
 	void *aseg;
--- a/arch/sh/kernel/module.c
+++ b/arch/sh/kernel/module.c
@@ -96,10 +96,10 @@ int apply_relocate_add(Elf32_Shdr *sechd
 	return 0;
 }
 
-int module_finalize(const Elf_Ehdr *hdr,
-		    const Elf_Shdr *sechdrs,
-		    struct module *me)
+int module_finalize(const struct load_info *info, struct module *me)
 {
+	const Elf_Ehdr *hdr = info->hdr;
+	const Elf_Shdr *sechdrs = info->sechdrs;
 	int ret = 0;
 
 	ret |= module_dwarf_finalize(hdr, sechdrs, me);
--- a/arch/sparc/kernel/module.c
+++ b/arch/sparc/kernel/module.c
@@ -204,10 +204,11 @@ static void do_patch_sections(const Elf_
 	}
 }
 
-int module_finalize(const Elf_Ehdr *hdr,
-		    const Elf_Shdr *sechdrs,
-		    struct module *me)
+int module_finalize(const struct load_info *info, struct module *me)
 {
+	const Elf_Ehdr *hdr = info->hdr;
+	const Elf_Shdr *sechdrs = info->sechdrs;
+
 	/* make jump label nops */
 	jump_label_apply_nops(me);
 
--- a/arch/x86/kernel/module.c
+++ b/arch/x86/kernel/module.c
@@ -217,10 +217,10 @@ int apply_relocate_add(Elf64_Shdr *sechd
 }
 #endif
 
-int module_finalize(const Elf_Ehdr *hdr,
-		    const Elf_Shdr *sechdrs,
-		    struct module *me)
+int module_finalize(const struct load_info *info, struct module *me)
 {
+	const Elf_Ehdr *hdr = info->hdr;
+	const Elf_Shdr *sechdrs = info->sechdrs;
 	const Elf_Shdr *s, *text = NULL, *alt = NULL, *locks = NULL,
 		*para = NULL, *orc = NULL, *orc_ip = NULL;
 	char *secstrings = (void *)hdr + sechdrs[hdr->e_shstrndx].sh_offset;
--- a/include/linux/moduleloader.h
+++ b/include/linux/moduleloader.h
@@ -101,9 +101,7 @@ static inline int apply_relocate_add(Elf
 #endif
 
 /* Any final processing of module before access.  Return -error or 0. */
-int module_finalize(const Elf_Ehdr *hdr,
-		    const Elf_Shdr *sechdrs,
-		    struct module *mod);
+int module_finalize(const struct load_info *info, struct module *mod);
 
 /* Any cleanup needed when module leaves. */
 void module_arch_cleanup(struct module *mod);
--- a/kernel/module.c
+++ b/kernel/module.c
@@ -3425,9 +3425,7 @@ static void module_deallocate(struct mod
 	module_memfree(mod->core_layout.base);
 }
 
-int __weak module_finalize(const Elf_Ehdr *hdr,
-			   const Elf_Shdr *sechdrs,
-			   struct module *me)
+int __weak module_finalize(const struct load_info *info, struct module *me)
 {
 	return 0;
 }
@@ -3445,7 +3443,7 @@ static int post_relocation(struct module
 	add_kallsyms(mod, info);
 
 	/* Arch-specific module finalizing. */
-	return module_finalize(info->hdr, info->sechdrs, mod);
+	return module_finalize(info, mod);
 }
 
 /* Is this module of this name done loading?  No locks held. */




* [PATCH 3/4] x86,module: Detect VMX vs SLD conflicts
  2020-04-07 11:02 [PATCH 0/4] x86/module: Out-of-tree module decode and sanitize Peter Zijlstra
  2020-04-07 11:02 ` [PATCH 1/4] module: Expose load_info to arch module loader code Peter Zijlstra
  2020-04-07 11:02 ` [PATCH 2/4] module: Convert module_finalize() to load_info Peter Zijlstra
@ 2020-04-07 11:02 ` Peter Zijlstra
  2020-04-07 14:35   ` Greg KH
                     ` (5 more replies)
  2020-04-07 11:02 ` [PATCH 4/4] x86,module: Detect CRn and DRn manipulation Peter Zijlstra
  2020-04-07 17:23 ` [PATCH 0/4] x86/module: Out-of-tree module decode and sanitize Andrew Cooper
  4 siblings, 6 replies; 74+ messages in thread
From: Peter Zijlstra @ 2020-04-07 11:02 UTC (permalink / raw)
  To: tglx, linux-kernel
  Cc: hch, sean.j.christopherson, mingo, bp, hpa, x86, kenny, peterz,
	jeyu, rasmus.villemoes, pbonzini, fenghua.yu, xiaoyao.li,
	nadav.amit, thellstrom, tony.luck, rostedt, gregkh, jannh,
	keescook, David.Laight, dcovelli, mhiramat

It turns out that with Split-Lock-Detect enabled (default) any VMX
hypervisor needs at least a little modification in order to not blindly
inject the #AC into the guest without the guest being ready for it.

Since there is no telling which module implements a hypervisor, scan
all out-of-tree modules' text and look for VMX instructions and refuse
to load it when SLD is enabled (default) and the module isn't marked
'sld_safe'.

Hypervisors, which have been modified and are known to work correctly,
can add:

  MODULE_INFO(sld_safe, "Y");

to explicitly tell the module loader they're good.
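
For the curious, the byte patterns the scanner looks for can be sketched in
plain C. This is illustrative only and simplified (the prefix checks for the
0f c7 group are omitted): the real patch walks proper instruction boundaries
with the kernel's x86 decoder, so a raw byte match like this would misfire
on data that merely resembles an opcode.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical userspace sketch of the VMX opcode patterns the patch
 * matches; p must point at least 3 readable bytes. */
static bool bytes_look_like_vmx(const unsigned char *p)
{
	if (p[0] != 0x0f)
		return false;

	switch (p[1]) {
	case 0x01:	/* 0f 01 c1..c4: VMCALL, VMLAUNCH, VMRESUME, VMXOFF */
		return p[2] >= 0xc1 && p[2] <= 0xc4;

	case 0x78:	/* VMREAD */
	case 0x79:	/* VMWRITE */
		return true;

	case 0xc7: {	/* VMPTRLD/VMPTRST/VMCLEAR/VMXON live in /6 and /7 */
		unsigned char mod = (p[2] >> 6) & 0x3;
		unsigned char reg = (p[2] >> 3) & 0x7;

		/* register form (mod == 3) is a different group member */
		return mod != 0x3 && (reg == 6 || reg == 7);
	}
	}
	return false;
}
```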

Reported-by: "Kenneth R. Crudup" <kenny@panix.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lkml.kernel.org/r/20200402124205.242674296@linutronix.de
---
 arch/x86/include/asm/cpu.h   |    5 ++
 arch/x86/kernel/cpu/intel.c  |    5 ++
 arch/x86/kernel/module.c     |   87 ++++++++++++++++++++++++++++++++++++++++++-
 include/linux/moduleloader.h |    2 
 kernel/module.c              |    3 -
 5 files changed, 99 insertions(+), 3 deletions(-)

--- a/arch/x86/include/asm/cpu.h
+++ b/arch/x86/include/asm/cpu.h
@@ -44,6 +44,7 @@ unsigned int x86_stepping(unsigned int s
 extern void __init cpu_set_core_cap_bits(struct cpuinfo_x86 *c);
 extern void switch_to_sld(unsigned long tifn);
 extern bool handle_user_split_lock(struct pt_regs *regs, long error_code);
+extern bool split_lock_enabled(void);
 #else
 static inline void __init cpu_set_core_cap_bits(struct cpuinfo_x86 *c) {}
 static inline void switch_to_sld(unsigned long tifn) {}
@@ -51,5 +52,9 @@ static inline bool handle_user_split_loc
 {
 	return false;
 }
+static inline bool split_lock_enabled(void)
+{
+	return false;
+}
 #endif
 #endif /* _ASM_X86_CPU_H */
--- a/arch/x86/kernel/cpu/intel.c
+++ b/arch/x86/kernel/cpu/intel.c
@@ -1061,6 +1061,11 @@ static void sld_update_msr(bool on)
 	wrmsrl(MSR_TEST_CTRL, test_ctrl_val);
 }
 
+bool split_lock_enabled(void)
+{
+	return sld_state != sld_off;
+}
+
 static void split_lock_init(void)
 {
 	split_lock_verify_msr(sld_state != sld_off);
--- a/arch/x86/kernel/module.c
+++ b/arch/x86/kernel/module.c
@@ -24,6 +24,8 @@
 #include <asm/pgtable.h>
 #include <asm/setup.h>
 #include <asm/unwind.h>
+#include <asm/cpu.h>
+#include <asm/insn.h>
 
 #if 0
 #define DEBUGP(fmt, ...)				\
@@ -217,6 +219,78 @@ int apply_relocate_add(Elf64_Shdr *sechd
 }
 #endif
 
+static bool insn_is_vmx(struct insn *insn)
+{
+	u8 modrm = insn->modrm.bytes[0];
+	u8 modrm_mod = X86_MODRM_MOD(modrm);
+	u8 modrm_reg = X86_MODRM_REG(modrm);
+
+	u8 prefix = insn->prefixes.bytes[0];
+
+	if (insn->opcode.bytes[0] != 0x0f)
+		return false;
+
+	switch (insn->opcode.bytes[1]) {
+	case 0x01:
+		switch (insn->opcode.bytes[2]) {
+		case 0xc1: /* VMCALL */
+		case 0xc2: /* VMLAUNCH */
+		case 0xc3: /* VMRESUME */
+		case 0xc4: /* VMXOFF */
+			return true;
+
+		default:
+			break;
+		}
+		break;
+
+	case 0x78: /* VMREAD */
+	case 0x79: /* VMWRITE */
+		return true;
+
+	case 0xc7:
+		/* VMPTRLD, VMPTRST, VMCLEAR, VMXON */
+		if (modrm_mod == 0x03)
+			break;
+
+		if ((modrm_reg == 6 && (!prefix || prefix == 0x66 || prefix == 0xf3)) ||
+		    (modrm_reg == 7 && (!prefix || prefix == 0xf3)))
+			return true;
+
+		break;
+
+	default:
+		break;
+	}
+
+	return false;
+}
+
+static int decode_module(struct module *mod, void *text, void *text_end, bool sld_safe)
+{
+	bool allow_vmx = sld_safe || !split_lock_enabled();
+	struct insn insn;
+
+	while (text < text_end) {
+		kernel_insn_init(&insn, text, text_end - text);
+		insn_get_length(&insn);
+
+		if (WARN_ON_ONCE(!insn_complete(&insn))) {
+			pr_err("Module text malformed: %s\n", mod->name);
+			return -ENOEXEC;
+		}
+
+		if (!allow_vmx && insn_is_vmx(&insn)) {
+			pr_err("Module has VMX instructions and is not marked 'sld_safe', boot with: 'split_lock_detect=off': %s\n", mod->name);
+			return -ENOEXEC;
+		}
+
+		text += insn.length;
+	}
+
+	return 0;
+}
+
 int module_finalize(const struct load_info *info, struct module *me)
 {
 	const Elf_Ehdr *hdr = info->hdr;
@@ -253,6 +327,16 @@ int module_finalize(const struct load_in
 					    tseg, tseg + text->sh_size);
 	}
 
+	if (text && !get_modinfo(info, "intree")) {
+		bool sld_safe = get_modinfo(info, "sld_safe");
+		void *tseg = (void *)text->sh_addr;
+		int ret;
+
+		ret = decode_module(me, tseg, tseg + text->sh_size, sld_safe);
+		if (ret)
+			return ret;
+	}
+
 	if (para) {
 		void *pseg = (void *)para->sh_addr;
 		apply_paravirt(pseg, pseg + para->sh_size);
@@ -261,9 +345,10 @@ int module_finalize(const struct load_in
 	/* make jump label nops */
 	jump_label_apply_nops(me);
 
-	if (orc && orc_ip)
+	if (orc && orc_ip) {
 		unwind_module_init(me, (void *)orc_ip->sh_addr, orc_ip->sh_size,
 				   (void *)orc->sh_addr, orc->sh_size);
+	}
 
 	return 0;
 }
--- a/include/linux/moduleloader.h
+++ b/include/linux/moduleloader.h
@@ -26,6 +26,8 @@ struct load_info {
 	} index;
 };
 
+extern char *get_modinfo(const struct load_info *info, const char *tag);
+
 /* These may be implemented by architectures that need to hook into the
  * module loader code.  Architectures that don't need to do anything special
  * can just rely on the 'weak' default hooks defined in kernel/module.c.
--- a/kernel/module.c
+++ b/kernel/module.c
@@ -1393,7 +1393,6 @@ static inline int same_magic(const char
 }
 #endif /* CONFIG_MODVERSIONS */
 
-static char *get_modinfo(const struct load_info *info, const char *tag);
 static char *get_next_modinfo(const struct load_info *info, const char *tag,
 			      char *prev);
 
@@ -2521,7 +2520,7 @@ static char *get_next_modinfo(const stru
 	return NULL;
 }
 
-static char *get_modinfo(const struct load_info *info, const char *tag)
+char *get_modinfo(const struct load_info *info, const char *tag)
 {
 	return get_next_modinfo(info, tag, NULL);
 }




* [PATCH 4/4] x86,module: Detect CRn and DRn manipulation
  2020-04-07 11:02 [PATCH 0/4] x86/module: Out-of-tree module decode and sanitize Peter Zijlstra
                   ` (2 preceding siblings ...)
  2020-04-07 11:02 ` [PATCH 3/4] x86,module: Detect VMX vs SLD conflicts Peter Zijlstra
@ 2020-04-07 11:02 ` Peter Zijlstra
  2020-04-07 17:01   ` Kees Cook
                     ` (3 more replies)
  2020-04-07 17:23 ` [PATCH 0/4] x86/module: Out-of-tree module decode and sanitize Andrew Cooper
  4 siblings, 4 replies; 74+ messages in thread
From: Peter Zijlstra @ 2020-04-07 11:02 UTC (permalink / raw)
  To: tglx, linux-kernel
  Cc: hch, sean.j.christopherson, mingo, bp, hpa, x86, kenny, peterz,
	jeyu, rasmus.villemoes, pbonzini, fenghua.yu, xiaoyao.li,
	nadav.amit, thellstrom, tony.luck, rostedt, gregkh, jannh,
	keescook, David.Laight, dcovelli, mhiramat

Since we now have infrastructure to analyze module text, disallow
modules that write to CRn and DRn registers.
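
(Purely illustrative aside: a MOV to a control or debug register encodes as
the two opcode bytes 0f 22 and 0f 23 respectively, which is all the new
checks look at; the read forms, 0f 20 and 0f 21, are not flagged. A
standalone sketch with our own names, not the kernel code:)

```c
#include <assert.h>
#include <stdbool.h>

/* MOV CRn, reg: write to a control register (0f 22 /r). */
static bool is_mov_to_crn(const unsigned char *p)
{
	return p[0] == 0x0f && p[1] == 0x22;
}

/* MOV DRn, reg: write to a debug register (0f 23 /r). */
static bool is_mov_to_drn(const unsigned char *p)
{
	return p[0] == 0x0f && p[1] == 0x23;
}
```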

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
---
 arch/x86/kernel/module.c |   21 +++++++++++++++++++++
 1 file changed, 21 insertions(+)

--- a/arch/x86/kernel/module.c
+++ b/arch/x86/kernel/module.c
@@ -266,6 +266,22 @@ static bool insn_is_vmx(struct insn *ins
 	return false;
 }
 
+static bool insn_is_mov_CRn(struct insn *insn)
+{
+	if (insn->opcode.bytes[0] == 0x0f && insn->opcode.bytes[1] == 0x22)
+		return true;
+
+	return false;
+}
+
+static bool insn_is_mov_DRn(struct insn *insn)
+{
+	if (insn->opcode.bytes[0] == 0x0f && insn->opcode.bytes[1] == 0x23)
+		return true;
+
+	return false;
+}
+
 static int decode_module(struct module *mod, void *text, void *text_end, bool sld_safe)
 {
 	bool allow_vmx = sld_safe || !split_lock_enabled();
@@ -285,6 +301,11 @@ static int decode_module(struct module *
 			return -ENOEXEC;
 		}
 
+		if (insn_is_mov_CRn(&insn) || insn_is_mov_DRn(&insn)) {
+			pr_err("Module writes to CRn or DRn, please use the proper accessors: %s\n", mod->name);
+			return -ENOEXEC;
+		}
+
 		text += insn.length;
 	}
 




* Re: [PATCH 3/4] x86,module: Detect VMX vs SLD conflicts
  2020-04-07 11:02 ` [PATCH 3/4] x86,module: Detect VMX vs SLD conflicts Peter Zijlstra
@ 2020-04-07 14:35   ` Greg KH
  2020-04-07 14:44     ` Paolo Bonzini
                       ` (2 more replies)
  2020-04-07 16:51   ` Masami Hiramatsu
                     ` (4 subsequent siblings)
  5 siblings, 3 replies; 74+ messages in thread
From: Greg KH @ 2020-04-07 14:35 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: tglx, linux-kernel, hch, sean.j.christopherson, mingo, bp, hpa,
	x86, kenny, jeyu, rasmus.villemoes, pbonzini, fenghua.yu,
	xiaoyao.li, nadav.amit, thellstrom, tony.luck, rostedt, jannh,
	keescook, David.Laight, dcovelli, mhiramat

On Tue, Apr 07, 2020 at 01:02:39PM +0200, Peter Zijlstra wrote:
> It turns out that with Split-Lock-Detect enabled (default) any VMX
> hypervisor needs at least a little modification in order to not blindly
> inject the #AC into the guest without the guest being ready for it.
> 
> Since there is no telling which module implements a hypervisor, scan
> all out-of-tree modules' text and look for VMX instructions and refuse
> to load it when SLD is enabled (default) and the module isn't marked
> 'sld_safe'.
> 
> Hypervisors, which have been modified and are known to work correctly,
> can add:
> 
>   MODULE_INFO(sld_safe, "Y");
> 
> to explicitly tell the module loader they're good.

What's to keep any out-of-tree module from adding this same module info
"flag" and just lie about it?  Isn't that what you are trying to catch
here, or is it a case of, "if you lie, your code will break" as well?

thanks,

greg k-h


* Re: [PATCH 3/4] x86,module: Detect VMX vs SLD conflicts
  2020-04-07 14:35   ` Greg KH
@ 2020-04-07 14:44     ` Paolo Bonzini
  2020-04-07 14:55       ` Greg KH
  2020-04-07 14:49     ` Steven Rostedt
  2020-04-07 15:24     ` Peter Zijlstra
  2 siblings, 1 reply; 74+ messages in thread
From: Paolo Bonzini @ 2020-04-07 14:44 UTC (permalink / raw)
  To: Greg KH, Peter Zijlstra
  Cc: tglx, linux-kernel, hch, sean.j.christopherson, mingo, bp, hpa,
	x86, kenny, jeyu, rasmus.villemoes, fenghua.yu, xiaoyao.li,
	nadav.amit, thellstrom, tony.luck, rostedt, jannh, keescook,
	David.Laight, dcovelli, mhiramat

On 07/04/20 16:35, Greg KH wrote:
> On Tue, Apr 07, 2020 at 01:02:39PM +0200, Peter Zijlstra wrote:
>> It turns out that with Split-Lock-Detect enabled (default) any VMX
>> hypervisor needs at least a little modification in order to not blindly
>> inject the #AC into the guest without the guest being ready for it.
>>
>> Since there is no telling which module implements a hypervisor, scan
>> all out-of-tree modules' text and look for VMX instructions and refuse
>> to load it when SLD is enabled (default) and the module isn't marked
>> 'sld_safe'.
>>
>> Hypervisors, which have been modified and are known to work correctly,
>> can add:
>>
>>   MODULE_INFO(sld_safe, "Y");
>>
>> to explicitly tell the module loader they're good.
> 
> What's to keep any out-of-tree module from adding this same module info
> "flag" and just lie about it?  Isn't that what you are trying to catch
> here, or is it a case of, "if you lie, your code will break" as well?

It's the latter.  Basically it's doing _the users_ of out-of-tree
modules a favor by avoiding crashes of their virtual machines;
developers need to fix them anyway.

Paolo



* Re: [PATCH 3/4] x86,module: Detect VMX vs SLD conflicts
  2020-04-07 14:35   ` Greg KH
  2020-04-07 14:44     ` Paolo Bonzini
@ 2020-04-07 14:49     ` Steven Rostedt
  2020-04-07 15:24     ` Peter Zijlstra
  2 siblings, 0 replies; 74+ messages in thread
From: Steven Rostedt @ 2020-04-07 14:49 UTC (permalink / raw)
  To: Greg KH
  Cc: Peter Zijlstra, tglx, linux-kernel, hch, sean.j.christopherson,
	mingo, bp, hpa, x86, kenny, jeyu, rasmus.villemoes, pbonzini,
	fenghua.yu, xiaoyao.li, nadav.amit, thellstrom, tony.luck, jannh,
	keescook, David.Laight, dcovelli, mhiramat

On Tue, 7 Apr 2020 16:35:43 +0200
Greg KH <gregkh@linuxfoundation.org> wrote:

> > Hypervisors, which have been modified and are known to work correctly,
> > can add:
> > 
> >   MODULE_INFO(sld_safe, "Y");
> > 
> > to explicitly tell the module loader they're good.  
> 
> What's to keep any out-of-tree module from adding this same module info
> "flag" and just lie about it?  Isn't that what you are trying to catch
> here, or is it a case of, "if you lie, your code will break" as well?

Keeping with the analogy to module kABI breakage, that would basically be
the same as an out-of-tree module fixing the API but not using it properly.
It will break.

All this does is make sure VM modules that haven't been updated to handle
split-lock detection won't be loaded if split-lock detection is enabled.
Saying you can handle SLD while not handling it is just broken code, and we
can't really protect against that.

-- Steve



* Re: [PATCH 3/4] x86,module: Detect VMX vs SLD conflicts
  2020-04-07 14:44     ` Paolo Bonzini
@ 2020-04-07 14:55       ` Greg KH
  0 siblings, 0 replies; 74+ messages in thread
From: Greg KH @ 2020-04-07 14:55 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: Peter Zijlstra, tglx, linux-kernel, hch, sean.j.christopherson,
	mingo, bp, hpa, x86, kenny, jeyu, rasmus.villemoes, fenghua.yu,
	xiaoyao.li, nadav.amit, thellstrom, tony.luck, rostedt, jannh,
	keescook, David.Laight, dcovelli, mhiramat

On Tue, Apr 07, 2020 at 04:44:57PM +0200, Paolo Bonzini wrote:
> On 07/04/20 16:35, Greg KH wrote:
> > On Tue, Apr 07, 2020 at 01:02:39PM +0200, Peter Zijlstra wrote:
> >> It turns out that with Split-Lock-Detect enabled (default) any VMX
> >> hypervisor needs at least a little modification in order to not blindly
> >> inject the #AC into the guest without the guest being ready for it.
> >>
> >> Since there is no telling which module implements a hypervisor, scan
> >> all out-of-tree modules' text and look for VMX instructions and refuse
> >> to load it when SLD is enabled (default) and the module isn't marked
> >> 'sld_safe'.
> >>
> >> Hypervisors, which have been modified and are known to work correctly,
> >> can add:
> >>
> >>   MODULE_INFO(sld_safe, "Y");
> >>
> >> to explicitly tell the module loader they're good.
> > 
> > What's to keep any out-of-tree module from adding this same module info
> > "flag" and just lie about it?  Isn't that what you are trying to catch
> > here, or is it a case of, "if you lie, your code will break" as well?
> 
> It's the latter.  Basically it's doing _the users_ of out-of-tree
> modules a favor by avoiding crashes of their virtual machines;
> developers need to fix them anyway.

Ok, seems kind of a heavy hammer, but oh well...

thanks for the explanation.

greg k-h


* Re: [PATCH 3/4] x86,module: Detect VMX vs SLD conflicts
  2020-04-07 14:35   ` Greg KH
  2020-04-07 14:44     ` Paolo Bonzini
  2020-04-07 14:49     ` Steven Rostedt
@ 2020-04-07 15:24     ` Peter Zijlstra
  2020-04-07 15:28       ` Paolo Bonzini
  2020-04-07 15:44       ` Greg KH
  2 siblings, 2 replies; 74+ messages in thread
From: Peter Zijlstra @ 2020-04-07 15:24 UTC (permalink / raw)
  To: Greg KH
  Cc: tglx, linux-kernel, hch, sean.j.christopherson, mingo, bp, hpa,
	x86, kenny, jeyu, rasmus.villemoes, pbonzini, fenghua.yu,
	xiaoyao.li, nadav.amit, thellstrom, tony.luck, rostedt, jannh,
	keescook, David.Laight, dcovelli, mhiramat

On Tue, Apr 07, 2020 at 04:35:43PM +0200, Greg KH wrote:
> On Tue, Apr 07, 2020 at 01:02:39PM +0200, Peter Zijlstra wrote:
> > It turns out that with Split-Lock-Detect enabled (default) any VMX
> > hypervisor needs at least a little modification in order to not blindly
> > inject the #AC into the guest without the guest being ready for it.
> > 
> > Since there is no telling which module implements a hypervisor, scan
> > all out-of-tree modules' text and look for VMX instructions and refuse
> > to load the module when SLD is enabled (default) and it isn't marked
> > 'sld_safe'.
> > 
> > Hypervisors, which have been modified and are known to work correctly,
> > can add:
> > 
> >   MODULE_INFO(sld_safe, "Y");
> > 
> > to explicitly tell the module loader they're good.
> 
> What's to keep any out-of-tree module from adding this same module info
> "flag" and just lie about it?  Isn't that what you are trying to catch
> here, or is it a case of, "if you lie, your code will break" as well?

If they lie they get to keep both pieces.

The thing I worry about is them lying about "intree", is there anything
that avoids that?

^ permalink raw reply	[flat|nested] 74+ messages in thread

* Re: [PATCH 3/4] x86,module: Detect VMX vs SLD conflicts
  2020-04-07 15:24     ` Peter Zijlstra
@ 2020-04-07 15:28       ` Paolo Bonzini
  2020-04-07 15:44       ` Greg KH
  1 sibling, 0 replies; 74+ messages in thread
From: Paolo Bonzini @ 2020-04-07 15:28 UTC (permalink / raw)
  To: Peter Zijlstra, Greg KH
  Cc: tglx, linux-kernel, hch, sean.j.christopherson, mingo, bp, hpa,
	x86, kenny, jeyu, rasmus.villemoes, fenghua.yu, xiaoyao.li,
	nadav.amit, thellstrom, tony.luck, rostedt, jannh, keescook,
	David.Laight, dcovelli, mhiramat

On 07/04/20 17:24, Peter Zijlstra wrote:
> The thing I worry about is them lying about "intree", is there anything
> that avoids that?

In QEMU we generate a random-ish number on every build (actually an SHA
of the version and a bunch of other things) and include that in the
modules as the "in tree" marker.

Paolo
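
[Editor's note: translated into kernel-module terms, the QEMU scheme Paolo
describes could look roughly like the sketch below. The names build_token
and BUILD_TOKEN are illustrative, not QEMU's or the kernel's actual build
machinery:]

```c
/* Sketch: the build system computes a per-build token (e.g. a SHA over
 * the version string and other build inputs) and injects it via
 * -DBUILD_TOKEN="...". Modules built with the tree embed the token;
 * the loader can then compare it against the token baked into the
 * running kernel, making the "in tree" marker hard to forge by hand.
 */
#ifndef BUILD_TOKEN
#define BUILD_TOKEN "0000000000000000"	/* replaced at build time */
#endif

MODULE_INFO(build_token, BUILD_TOKEN);
```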


^ permalink raw reply	[flat|nested] 74+ messages in thread

* Re: [PATCH 3/4] x86,module: Detect VMX vs SLD conflicts
  2020-04-07 15:24     ` Peter Zijlstra
  2020-04-07 15:28       ` Paolo Bonzini
@ 2020-04-07 15:44       ` Greg KH
  1 sibling, 0 replies; 74+ messages in thread
From: Greg KH @ 2020-04-07 15:44 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: tglx, linux-kernel, hch, sean.j.christopherson, mingo, bp, hpa,
	x86, kenny, jeyu, rasmus.villemoes, pbonzini, fenghua.yu,
	xiaoyao.li, nadav.amit, thellstrom, tony.luck, rostedt, jannh,
	keescook, David.Laight, dcovelli, mhiramat

On Tue, Apr 07, 2020 at 05:24:12PM +0200, Peter Zijlstra wrote:
> On Tue, Apr 07, 2020 at 04:35:43PM +0200, Greg KH wrote:
> > On Tue, Apr 07, 2020 at 01:02:39PM +0200, Peter Zijlstra wrote:
> > > It turns out that with Split-Lock-Detect enabled (default) any VMX
> > > hypervisor needs at least a little modification in order to not blindly
> > > inject the #AC into the guest without the guest being ready for it.
> > > 
> > > Since there is no telling which module implements a hypervisor, scan
> > > all out-of-tree modules' text and look for VMX instructions and refuse
> > > to load the module when SLD is enabled (default) and it isn't marked
> > > 'sld_safe'.
> > > 
> > > Hypervisors, which have been modified and are known to work correctly,
> > > can add:
> > > 
> > >   MODULE_INFO(sld_safe, "Y");
> > > 
> > > to explicitly tell the module loader they're good.
> > 
> > What's to keep any out-of-tree module from adding this same module info
> > "flag" and just lie about it?  Isn't that what you are trying to catch
> > here, or is it a case of, "if you lie, your code will break" as well?
> 
> If they lie they get to keep both pieces.
> 
> The thing I worry about is them lying about "intree", is there anything
> that avoids that?

Yeah, the build system should be enforcing that.  I haven't looked in a
while at whether that can really be "faked", but no distro would do that,
so it shouldn't be an issue except for custom systems.

greg k-h

^ permalink raw reply	[flat|nested] 74+ messages in thread

* Re: [PATCH 3/4] x86,module: Detect VMX vs SLD conflicts
  2020-04-07 11:02 ` [PATCH 3/4] x86,module: Detect VMX vs SLD conflicts Peter Zijlstra
  2020-04-07 14:35   ` Greg KH
@ 2020-04-07 16:51   ` Masami Hiramatsu
  2020-04-07 17:16     ` Andrew Cooper
  2020-04-08  7:25     ` Masami Hiramatsu
  2020-04-07 18:26   ` kbuild test robot
                     ` (3 subsequent siblings)
  5 siblings, 2 replies; 74+ messages in thread
From: Masami Hiramatsu @ 2020-04-07 16:51 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: tglx, linux-kernel, hch, sean.j.christopherson, mingo, bp, hpa,
	x86, kenny, jeyu, rasmus.villemoes, pbonzini, fenghua.yu,
	xiaoyao.li, nadav.amit, thellstrom, tony.luck, rostedt, gregkh,
	jannh, keescook, David.Laight, dcovelli, mhiramat

Hi Peter,

On Tue, 07 Apr 2020 13:02:39 +0200
Peter Zijlstra <peterz@infradead.org> wrote:

> +static bool insn_is_vmx(struct insn *insn)
> +{
> +	u8 modrm = insn->modrm.bytes[0];
> +	u8 modrm_mod = X86_MODRM_MOD(modrm);
> +	u8 modrm_reg = X86_MODRM_REG(modrm);
> +
> +	u8 prefix = insn->prefixes.bytes[0];
> +
> +	if (insn->opcode.bytes[0] != 0x0f)
> +		return false;
> +
> +	switch (insn->opcode.bytes[1]) {
> +	case 0x01:
> +		switch (insn->opcode.bytes[2]) {
> +		case 0xc1: /* VMCALL */
> +		case 0xc2: /* VMLAUNCH */
> +		case 0xc3: /* VMRESUME */
> +		case 0xc4: /* VMXOFF */
> +			return true;
> +
> +		default:
> +			break;
> +		}
> +		break;
> +
> +	case 0x78: /* VMREAD */
> +	case 0x79: /* VMWRITE */
> +		return true;
> +
> +	case 0xc7:
> +		/* VMPTRLD, VMPTRST, VMCLEAR, VMXON */
> +		if (modrm_mod == 0x03)
> +			break;
> +
> +		if ((modrm_reg == 6 && (!prefix || prefix == 0x66 || prefix == 0xf3)) ||
> +		    (modrm_reg == 7 && (!prefix || prefix == 0xf3)))
> +			return true;
> +
> +		break;
> +
> +	default:
> +		break;
> +	}
> +
> +	return false;
> +}

OK, so here is what you need ;)

From 36f4f6aec623b0190fde95c8630a6a1d8c23ffc9 Mon Sep 17 00:00:00 2001
From: Masami Hiramatsu <mhiramat@kernel.org>
Date: Wed, 8 Apr 2020 01:04:41 +0900
Subject: [PATCH] x86: insn: Add insn_is_vmx()

Add insn_is_vmx() to identify whether the given instruction
is a VMX instruction. This simply identifies those instructions
by mnemonic pattern.

Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org>
---
 arch/x86/include/asm/inat.h                | 1 +
 arch/x86/include/asm/insn.h                | 7 +++++++
 arch/x86/tools/gen-insn-attr-x86.awk       | 6 ++++++
 tools/arch/x86/include/asm/inat.h          | 1 +
 tools/arch/x86/include/asm/insn.h          | 7 +++++++
 tools/arch/x86/tools/gen-insn-attr-x86.awk | 6 ++++++
 6 files changed, 28 insertions(+)

diff --git a/arch/x86/include/asm/inat.h b/arch/x86/include/asm/inat.h
index ffce45178c08..599876801ae8 100644
--- a/arch/x86/include/asm/inat.h
+++ b/arch/x86/include/asm/inat.h
@@ -79,6 +79,7 @@
 #define INAT_EVEXONLY	(1 << (INAT_FLAG_OFFS + 7))
 #define INAT_FPU	(1 << (INAT_FLAG_OFFS + 8))
 #define INAT_FPUIFVEX	(1 << (INAT_FLAG_OFFS + 9))
+#define INAT_VMX	(1 << (INAT_FLAG_OFFS + 10))
 /* Attribute making macros for attribute tables */
 #define INAT_MAKE_PREFIX(pfx)	(pfx << INAT_PFX_OFFS)
 #define INAT_MAKE_ESCAPE(esc)	(esc << INAT_ESC_OFFS)
diff --git a/arch/x86/include/asm/insn.h b/arch/x86/include/asm/insn.h
index 1752c54d2103..57e81013836d 100644
--- a/arch/x86/include/asm/insn.h
+++ b/arch/x86/include/asm/insn.h
@@ -141,6 +141,13 @@ static inline int insn_is_fpu(struct insn *insn)
 	return 0;
 }
 
+static inline int insn_is_vmx(struct insn *insn)
+{
+	if (!insn->opcode.got)
+		insn_get_opcode(insn);
+	return (insn->attr & INAT_VMX) && !insn_is_fpu(insn);
+}
+
 static inline int insn_has_emulate_prefix(struct insn *insn)
 {
 	return !!insn->emulate_prefix_size;
diff --git a/arch/x86/tools/gen-insn-attr-x86.awk b/arch/x86/tools/gen-insn-attr-x86.awk
index d74d9e605723..ade80796453c 100644
--- a/arch/x86/tools/gen-insn-attr-x86.awk
+++ b/arch/x86/tools/gen-insn-attr-x86.awk
@@ -70,6 +70,8 @@ BEGIN {
 	mmx_expr = "^(emms|fxsave|fxrstor|ldmxcsr|stmxcsr)" # MMX/SSE nmemonics lacking operands
 	fpu_expr = "^x87"
 
+	vmx_expr = "^VM.*" # All mnemonics starting with "VM" are VMX instructions
+
 	lprefix1_expr = "\\((66|!F3)\\)"
 	lprefix2_expr = "\\(F3\\)"
 	lprefix3_expr = "\\((F2|!F3|66&F2)\\)"
@@ -328,6 +330,10 @@ function convert_operands(count,opnd,       i,j,imm,mod,mmx)
 		if (match(ext, force64_expr))
 			flags = add_flags(flags, "INAT_FORCE64")
 
+		# check VMX related opcode
+		if (match(opcode, vmx_expr))
+			flags = add_flags(flags, "INAT_VMX")
+
 		# check REX prefix
 		if (match(opcode, rex_expr))
 			flags = add_flags(flags, "INAT_MAKE_PREFIX(INAT_PFX_REX)")
diff --git a/tools/arch/x86/include/asm/inat.h b/tools/arch/x86/include/asm/inat.h
index 2e6a05290efd..af393952916c 100644
--- a/tools/arch/x86/include/asm/inat.h
+++ b/tools/arch/x86/include/asm/inat.h
@@ -79,6 +79,7 @@
 #define INAT_EVEXONLY	(1 << (INAT_FLAG_OFFS + 7))
 #define INAT_FPU	(1 << (INAT_FLAG_OFFS + 8))
 #define INAT_FPUIFVEX	(1 << (INAT_FLAG_OFFS + 9))
+#define INAT_VMX	(1 << (INAT_FLAG_OFFS + 10))
 /* Attribute making macros for attribute tables */
 #define INAT_MAKE_PREFIX(pfx)	(pfx << INAT_PFX_OFFS)
 #define INAT_MAKE_ESCAPE(esc)	(esc << INAT_ESC_OFFS)
diff --git a/tools/arch/x86/include/asm/insn.h b/tools/arch/x86/include/asm/insn.h
index d9f6bd9059c1..d18ce4683d8e 100644
--- a/tools/arch/x86/include/asm/insn.h
+++ b/tools/arch/x86/include/asm/insn.h
@@ -141,6 +141,13 @@ static inline int insn_is_fpu(struct insn *insn)
 	return 0;
 }
 
+static inline int insn_is_vmx(struct insn *insn)
+{
+	if (!insn->opcode.got)
+		insn_get_opcode(insn);
+	return (insn->attr & INAT_VMX) && !insn_is_fpu(insn);
+}
+
 static inline int insn_has_emulate_prefix(struct insn *insn)
 {
 	return !!insn->emulate_prefix_size;
diff --git a/tools/arch/x86/tools/gen-insn-attr-x86.awk b/tools/arch/x86/tools/gen-insn-attr-x86.awk
index d74d9e605723..ade80796453c 100644
--- a/tools/arch/x86/tools/gen-insn-attr-x86.awk
+++ b/tools/arch/x86/tools/gen-insn-attr-x86.awk
@@ -70,6 +70,8 @@ BEGIN {
 	mmx_expr = "^(emms|fxsave|fxrstor|ldmxcsr|stmxcsr)" # MMX/SSE nmemonics lacking operands
 	fpu_expr = "^x87"
 
+	vmx_expr = "^VM.*" # All mnemonics starting with "VM" are VMX instructions
+
 	lprefix1_expr = "\\((66|!F3)\\)"
 	lprefix2_expr = "\\(F3\\)"
 	lprefix3_expr = "\\((F2|!F3|66&F2)\\)"
@@ -328,6 +330,10 @@ function convert_operands(count,opnd,       i,j,imm,mod,mmx)
 		if (match(ext, force64_expr))
 			flags = add_flags(flags, "INAT_FORCE64")
 
+		# check VMX related opcode
+		if (match(opcode, vmx_expr))
+			flags = add_flags(flags, "INAT_VMX")
+
 		# check REX prefix
 		if (match(opcode, rex_expr))
 			flags = add_flags(flags, "INAT_MAKE_PREFIX(INAT_PFX_REX)")
-- 
2.20.1



-- 
Masami Hiramatsu <mhiramat@kernel.org>

^ permalink raw reply related	[flat|nested] 74+ messages in thread

* Re: [PATCH 1/4] module: Expose load_info to arch module loader code
  2020-04-07 11:02 ` [PATCH 1/4] module: Expose load_info to arch module loader code Peter Zijlstra
@ 2020-04-07 16:52   ` Kees Cook
  0 siblings, 0 replies; 74+ messages in thread
From: Kees Cook @ 2020-04-07 16:52 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: tglx, linux-kernel, hch, sean.j.christopherson, mingo, bp, hpa,
	x86, kenny, jeyu, rasmus.villemoes, pbonzini, fenghua.yu,
	xiaoyao.li, nadav.amit, thellstrom, tony.luck, rostedt, gregkh,
	jannh, David.Laight, dcovelli, mhiramat

On Tue, Apr 07, 2020 at 01:02:37PM +0200, Peter Zijlstra wrote:
> From: Jessica Yu <jeyu@kernel.org>
> 
> The x86 module loader wants to check the value of a modinfo flag
> (sld_safe), before proceeding to scan the module text for VMX
> instructions. Unfortunately the arch module code currently does not have
> access to load_info, but we can easily expose that via moduleloader.h,
> which every arch module code must already include.
> 
> Signed-off-by: Jessica Yu <jeyu@kernel.org>

Reviewed-by: Kees Cook <keescook@chromium.org>

-Kees

> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
> Link: https://lkml.kernel.org/r/20200406160420.14407-1-jeyu@kernel.org
> ---
> 
>  include/linux/moduleloader.h | 20 ++++++++++++++++++++
>  kernel/module-internal.h     | 23 -----------------------
>  kernel/module_signing.c      |  2 +-
>  3 files changed, 21 insertions(+), 24 deletions(-)
> 
> Index: linux-2.6/include/linux/moduleloader.h
> ===================================================================
> --- linux-2.6.orig/include/linux/moduleloader.h
> +++ linux-2.6/include/linux/moduleloader.h
> @@ -6,6 +6,26 @@
>  #include <linux/module.h>
>  #include <linux/elf.h>
>  
> +struct load_info {
> +	const char *name;
> +	/* pointer to module in temporary copy, freed at end of load_module() */
> +	struct module *mod;
> +	Elf_Ehdr *hdr;
> +	unsigned long len;
> +	Elf_Shdr *sechdrs;
> +	char *secstrings, *strtab;
> +	unsigned long symoffs, stroffs, init_typeoffs, core_typeoffs;
> +	struct _ddebug *debug;
> +	unsigned int num_debug;
> +	bool sig_ok;
> +#ifdef CONFIG_KALLSYMS
> +	unsigned long mod_kallsyms_init_off;
> +#endif
> +	struct {
> +		unsigned int sym, str, mod, vers, info, pcpu;
> +	} index;
> +};
> +
>  /* These may be implemented by architectures that need to hook into the
>   * module loader code.  Architectures that don't need to do anything special
>   * can just rely on the 'weak' default hooks defined in kernel/module.c.
> Index: linux-2.6/kernel/module-internal.h
> ===================================================================
> --- linux-2.6.orig/kernel/module-internal.h
> +++ linux-2.6/kernel/module-internal.h
> @@ -5,27 +5,4 @@
>   * Written by David Howells (dhowells@redhat.com)
>   */
>  
> -#include <linux/elf.h>
> -#include <asm/module.h>
> -
> -struct load_info {
> -	const char *name;
> -	/* pointer to module in temporary copy, freed at end of load_module() */
> -	struct module *mod;
> -	Elf_Ehdr *hdr;
> -	unsigned long len;
> -	Elf_Shdr *sechdrs;
> -	char *secstrings, *strtab;
> -	unsigned long symoffs, stroffs, init_typeoffs, core_typeoffs;
> -	struct _ddebug *debug;
> -	unsigned int num_debug;
> -	bool sig_ok;
> -#ifdef CONFIG_KALLSYMS
> -	unsigned long mod_kallsyms_init_off;
> -#endif
> -	struct {
> -		unsigned int sym, str, mod, vers, info, pcpu;
> -	} index;
> -};
> -
>  extern int mod_verify_sig(const void *mod, struct load_info *info);
> Index: linux-2.6/kernel/module_signing.c
> ===================================================================
> --- linux-2.6.orig/kernel/module_signing.c
> +++ linux-2.6/kernel/module_signing.c
> @@ -8,11 +8,11 @@
>  #include <linux/kernel.h>
>  #include <linux/errno.h>
>  #include <linux/module.h>
> +#include <linux/moduleloader.h>
>  #include <linux/module_signature.h>
>  #include <linux/string.h>
>  #include <linux/verification.h>
>  #include <crypto/public_key.h>
> -#include "module-internal.h"
>  
>  /*
>   * Verify the signature on a module.
> 
> 

-- 
Kees Cook

^ permalink raw reply	[flat|nested] 74+ messages in thread

* Re: [PATCH 2/4] module: Convert module_finalize() to load_info
  2020-04-07 11:02 ` [PATCH 2/4] module: Convert module_finalize() to load_info Peter Zijlstra
@ 2020-04-07 16:53   ` Kees Cook
  0 siblings, 0 replies; 74+ messages in thread
From: Kees Cook @ 2020-04-07 16:53 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: tglx, linux-kernel, hch, sean.j.christopherson, mingo, bp, hpa,
	x86, kenny, jeyu, rasmus.villemoes, pbonzini, fenghua.yu,
	xiaoyao.li, nadav.amit, thellstrom, tony.luck, rostedt, gregkh,
	jannh, David.Laight, dcovelli, mhiramat

On Tue, Apr 07, 2020 at 01:02:38PM +0200, Peter Zijlstra wrote:
> Provide load_info to module_finalize(), such that architectures might,
> for example, use get_modinfo() in their implementation.
> 
> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>

Reviewed-by: Kees Cook <keescook@chromium.org>

-Kees

> ---
>  arch/arc/kernel/module.c        |    4 ++--
>  arch/arm/kernel/module.c        |    5 +++--
>  arch/arm64/kernel/module.c      |    6 +++---
>  arch/ia64/kernel/module.c       |    3 +--
>  arch/m68k/kernel/module.c       |    4 +---
>  arch/microblaze/kernel/module.c |    3 +--
>  arch/mips/kernel/module.c       |    6 +++---
>  arch/nds32/kernel/module.c      |    4 +---
>  arch/nios2/kernel/module.c      |    3 +--
>  arch/parisc/kernel/module.c     |    6 +++---
>  arch/powerpc/kernel/module.c    |    5 +++--
>  arch/s390/kernel/module.c       |    6 +++---
>  arch/sh/kernel/module.c         |    6 +++---
>  arch/sparc/kernel/module.c      |    7 ++++---
>  arch/x86/kernel/module.c        |    6 +++---
>  include/linux/moduleloader.h    |    4 +---
>  kernel/module.c                 |    6 ++----
>  17 files changed, 38 insertions(+), 46 deletions(-)
> 
> --- a/arch/arc/kernel/module.c
> +++ b/arch/arc/kernel/module.c
> @@ -129,10 +129,10 @@ int apply_relocate_add(Elf32_Shdr *sechd
>   * This couldn't be done in module_frob_arch_sections() because
>   * relocations had not been applied by then
>   */
> -int module_finalize(const Elf32_Ehdr *hdr, const Elf_Shdr *sechdrs,
> -		    struct module *mod)
> +int module_finalize(const struct load_info *info, struct module *mod)
>  {
>  #ifdef CONFIG_ARC_DW2_UNWIND
> +	const Elf_Shdr *sechdrs = info->sechdrs;
>  	void *unw;
>  	int unwsec = mod->arch.unw_sec_idx;
>  
> --- a/arch/arm/kernel/module.c
> +++ b/arch/arm/kernel/module.c
> @@ -340,9 +340,10 @@ static const Elf_Shdr *find_mod_section(
>  extern void fixup_pv_table(const void *, unsigned long);
>  extern void fixup_smp(const void *, unsigned long);
>  
> -int module_finalize(const Elf32_Ehdr *hdr, const Elf_Shdr *sechdrs,
> -		    struct module *mod)
> +int module_finalize(const struct load_info *info, struct module *mod)
>  {
> +	const Elf32_Ehdr *hdr = info->hdr;
> +	const Elf_Shdr *sechdrs = info->sechdrs;
>  	const Elf_Shdr *s = NULL;
>  #ifdef CONFIG_ARM_UNWIND
>  	const char *secstrs = (void *)hdr + sechdrs[hdr->e_shstrndx].sh_offset;
> --- a/arch/arm64/kernel/module.c
> +++ b/arch/arm64/kernel/module.c
> @@ -515,10 +515,10 @@ static int module_init_ftrace_plt(const
>  	return 0;
>  }
>  
> -int module_finalize(const Elf_Ehdr *hdr,
> -		    const Elf_Shdr *sechdrs,
> -		    struct module *me)
> +int module_finalize(const struct load_info *info, struct module *me)
>  {
> +	const Elf_Ehdr *hdr = info->hdr;
> +	const Elf_Shdr *sechdrs = info->sechdrs;
>  	const Elf_Shdr *s;
>  	s = find_section(hdr, sechdrs, ".altinstructions");
>  	if (s)
> --- a/arch/ia64/kernel/module.c
> +++ b/arch/ia64/kernel/module.c
> @@ -902,8 +902,7 @@ register_unwind_table (struct module *mo
>  	}
>  }
>  
> -int
> -module_finalize (const Elf_Ehdr *hdr, const Elf_Shdr *sechdrs, struct module *mod)
> +int module_finalize(const struct load_info *info, struct module *mod)
>  {
>  	DEBUGP("%s: init: entry=%p\n", __func__, mod->init);
>  	if (mod->arch.unwind)
> --- a/arch/m68k/kernel/module.c
> +++ b/arch/m68k/kernel/module.c
> @@ -99,9 +99,7 @@ int apply_relocate_add(Elf32_Shdr *sechd
>  	return 0;
>  }
>  
> -int module_finalize(const Elf_Ehdr *hdr,
> -		    const Elf_Shdr *sechdrs,
> -		    struct module *mod)
> +int module_finalize(const struct load_info *info, struct module *mod)
>  {
>  	module_fixup(mod, mod->arch.fixup_start, mod->arch.fixup_end);
>  	return 0;
> --- a/arch/microblaze/kernel/module.c
> +++ b/arch/microblaze/kernel/module.c
> @@ -114,8 +114,7 @@ int apply_relocate_add(Elf32_Shdr *sechd
>  	return 0;
>  }
>  
> -int module_finalize(const Elf32_Ehdr *hdr, const Elf_Shdr *sechdrs,
> -		struct module *module)
> +int module_finalize(const struct load_info *info, struct module *module)
>  {
>  	flush_dcache();
>  	return 0;
> --- a/arch/mips/kernel/module.c
> +++ b/arch/mips/kernel/module.c
> @@ -427,10 +427,10 @@ const struct exception_table_entry *sear
>  }
>  
>  /* Put in dbe list if necessary. */
> -int module_finalize(const Elf_Ehdr *hdr,
> -		    const Elf_Shdr *sechdrs,
> -		    struct module *me)
> +int module_finalize(const struct load_info *info, struct module *me)
>  {
> +	const Elf_Ehdr *hdr = info->hdr;
> +	const Elf_Shdr *sechdrs = info->sechdrs;
>  	const Elf_Shdr *s;
>  	char *secstrings = (void *)hdr + sechdrs[hdr->e_shstrndx].sh_offset;
>  
> --- a/arch/nds32/kernel/module.c
> +++ b/arch/nds32/kernel/module.c
> @@ -266,9 +266,7 @@ apply_relocate_add(Elf32_Shdr * sechdrs,
>  	return 0;
>  }
>  
> -int
> -module_finalize(const Elf32_Ehdr * hdr, const Elf_Shdr * sechdrs,
> -		struct module *module)
> +int module_finalize(const struct load_info *info, struct module *module)
>  {
>  	return 0;
>  }
> --- a/arch/nios2/kernel/module.c
> +++ b/arch/nios2/kernel/module.c
> @@ -130,8 +130,7 @@ int apply_relocate_add(Elf32_Shdr *sechd
>  	return 0;
>  }
>  
> -int module_finalize(const Elf_Ehdr *hdr, const Elf_Shdr *sechdrs,
> -			struct module *me)
> +int module_finalize(const struct load_info *info, struct module *me)
>  {
>  	flush_cache_all();
>  	return 0;
> --- a/arch/parisc/kernel/module.c
> +++ b/arch/parisc/kernel/module.c
> @@ -854,10 +854,10 @@ deregister_unwind_table(struct module *m
>  		unwind_table_remove(me->arch.unwind);
>  }
>  
> -int module_finalize(const Elf_Ehdr *hdr,
> -		    const Elf_Shdr *sechdrs,
> -		    struct module *me)
> +int module_finalize(const struct load_info *info, struct module *me)
>  {
> +	const Elf_Ehdr *hdr = info->hdr;
> +	const Elf_Shdr *sechdrs = info->sechdrs;
>  	int i;
>  	unsigned long nsyms;
>  	const char *strtab = NULL;
> --- a/arch/powerpc/kernel/module.c
> +++ b/arch/powerpc/kernel/module.c
> @@ -31,9 +31,10 @@ static const Elf_Shdr *find_section(cons
>  	return NULL;
>  }
>  
> -int module_finalize(const Elf_Ehdr *hdr,
> -		const Elf_Shdr *sechdrs, struct module *me)
> +int module_finalize(const struct load_info *info, struct module *me)
>  {
> +	const Elf_Ehdr *hdr = info->hdr;
> +	const Elf_Shdr *sechdrs = info->sechdrs;
>  	const Elf_Shdr *sect;
>  	int rc;
>  
> --- a/arch/s390/kernel/module.c
> +++ b/arch/s390/kernel/module.c
> @@ -437,10 +437,10 @@ int apply_relocate_add(Elf_Shdr *sechdrs
>  	return 0;
>  }
>  
> -int module_finalize(const Elf_Ehdr *hdr,
> -		    const Elf_Shdr *sechdrs,
> -		    struct module *me)
> +int module_finalize(const struct load_info *info, struct module *me)
>  {
> +	const Elf_Ehdr *hdr = info->hdr;
> +	const Elf_Shdr *sechdrs = info->sechdrs;
>  	const Elf_Shdr *s;
>  	char *secstrings, *secname;
>  	void *aseg;
> --- a/arch/sh/kernel/module.c
> +++ b/arch/sh/kernel/module.c
> @@ -96,10 +96,10 @@ int apply_relocate_add(Elf32_Shdr *sechd
>  	return 0;
>  }
>  
> -int module_finalize(const Elf_Ehdr *hdr,
> -		    const Elf_Shdr *sechdrs,
> -		    struct module *me)
> +int module_finalize(const struct load_info *info, struct module *me)
>  {
> +	const Elf_Ehdr *hdr = info->hdr;
> +	const Elf_Shdr *sechdrs = info->sechdrs;
>  	int ret = 0;
>  
>  	ret |= module_dwarf_finalize(hdr, sechdrs, me);
> --- a/arch/sparc/kernel/module.c
> +++ b/arch/sparc/kernel/module.c
> @@ -204,10 +204,11 @@ static void do_patch_sections(const Elf_
>  	}
>  }
>  
> -int module_finalize(const Elf_Ehdr *hdr,
> -		    const Elf_Shdr *sechdrs,
> -		    struct module *me)
> +int module_finalize(const struct load_info *info, struct module *me)
>  {
> +	const Elf_Ehdr *hdr = info->hdr;
> +	const Elf_Shdr *sechdrs = info->sechdrs;
> +
>  	/* make jump label nops */
>  	jump_label_apply_nops(me);
>  
> --- a/arch/x86/kernel/module.c
> +++ b/arch/x86/kernel/module.c
> @@ -217,10 +217,10 @@ int apply_relocate_add(Elf64_Shdr *sechd
>  }
>  #endif
>  
> -int module_finalize(const Elf_Ehdr *hdr,
> -		    const Elf_Shdr *sechdrs,
> -		    struct module *me)
> +int module_finalize(const struct load_info *info, struct module *me)
>  {
> +	const Elf_Ehdr *hdr = info->hdr;
> +	const Elf_Shdr *sechdrs = info->sechdrs;
>  	const Elf_Shdr *s, *text = NULL, *alt = NULL, *locks = NULL,
>  		*para = NULL, *orc = NULL, *orc_ip = NULL;
>  	char *secstrings = (void *)hdr + sechdrs[hdr->e_shstrndx].sh_offset;
> --- a/include/linux/moduleloader.h
> +++ b/include/linux/moduleloader.h
> @@ -101,9 +101,7 @@ static inline int apply_relocate_add(Elf
>  #endif
>  
>  /* Any final processing of module before access.  Return -error or 0. */
> -int module_finalize(const Elf_Ehdr *hdr,
> -		    const Elf_Shdr *sechdrs,
> -		    struct module *mod);
> +int module_finalize(const struct load_info *info, struct module *mod);
>  
>  /* Any cleanup needed when module leaves. */
>  void module_arch_cleanup(struct module *mod);
> --- a/kernel/module.c
> +++ b/kernel/module.c
> @@ -3425,9 +3425,7 @@ static void module_deallocate(struct mod
>  	module_memfree(mod->core_layout.base);
>  }
>  
> -int __weak module_finalize(const Elf_Ehdr *hdr,
> -			   const Elf_Shdr *sechdrs,
> -			   struct module *me)
> +int __weak module_finalize(const struct load_info *info, struct module *me)
>  {
>  	return 0;
>  }
> @@ -3445,7 +3443,7 @@ static int post_relocation(struct module
>  	add_kallsyms(mod, info);
>  
>  	/* Arch-specific module finalizing. */
> -	return module_finalize(info->hdr, info->sechdrs, mod);
> +	return module_finalize(info, mod);
>  }
>  
>  /* Is this module of this name done loading?  No locks held. */
> 
> 

-- 
Kees Cook

^ permalink raw reply	[flat|nested] 74+ messages in thread

* Re: [PATCH 4/4] x86,module: Detect CRn and DRn manipulation
  2020-04-07 11:02 ` [PATCH 4/4] x86,module: Detect CRn and DRn manipulation Peter Zijlstra
@ 2020-04-07 17:01   ` Kees Cook
  2020-04-07 18:13     ` Peter Zijlstra
  2020-04-07 18:55   ` Nadav Amit
                     ` (2 subsequent siblings)
  3 siblings, 1 reply; 74+ messages in thread
From: Kees Cook @ 2020-04-07 17:01 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: tglx, linux-kernel, hch, sean.j.christopherson, mingo, bp, hpa,
	x86, kenny, jeyu, rasmus.villemoes, pbonzini, fenghua.yu,
	xiaoyao.li, nadav.amit, thellstrom, tony.luck, rostedt, gregkh,
	jannh, David.Laight, dcovelli, mhiramat

On Tue, Apr 07, 2020 at 01:02:40PM +0200, Peter Zijlstra wrote:
> Since we now have infrastructure to analyze module text, disallow
> modules that write to CRn and DRn registers.
> 
> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
> ---
>  arch/x86/kernel/module.c |   21 +++++++++++++++++++++
>  1 file changed, 21 insertions(+)
> 
> --- a/arch/x86/kernel/module.c
> +++ b/arch/x86/kernel/module.c
> @@ -266,6 +266,22 @@ static bool insn_is_vmx(struct insn *ins
>  	return false;
>  }
>  
> +static bool insn_is_mov_CRn(struct insn *insn)
> +{
> +	if (insn->opcode.bytes[0] == 0x0f && insn->opcode.bytes[1] == 0x22)
> +		return true;

I always cringe at numeric literals. Would it be overkill to add defines
for these (and the others that have comments next to them in 3/4)? It
makes stuff easier to grep, etc. (e.g. we have register names in
arch/x86/include/asm/asm.h, do we need instruction names somewhere else?
I assume objtool has a bunch of this too...)

-Kees

> +
> +	return false;
> +}
> +
> +static bool insn_is_mov_DRn(struct insn *insn)
> +{
> +	if (insn->opcode.bytes[0] == 0x0f && insn->opcode.bytes[1] == 0x23)
> +		return true;
> +
> +	return false;
> +}
> +
>  static int decode_module(struct module *mod, void *text, void *text_end, bool sld_safe)
>  {
>  	bool allow_vmx = sld_safe || !split_lock_enabled();
> @@ -285,6 +301,11 @@ static int decode_module(struct module *
>  			return -ENOEXEC;
>  		}
>  
> +		if (insn_is_mov_CRn(&insn) || insn_is_mov_DRn(&insn)) {
> +			pr_err("Module writes to CRn or DRn, please use the proper accessors: %s\n", mod->name);
> +			return -ENOEXEC;
> +		}
> +
>  		text += insn.length;
>  	}
>  
> 
> 

-- 
Kees Cook

^ permalink raw reply	[flat|nested] 74+ messages in thread

* Re: [PATCH 3/4] x86,module: Detect VMX vs SLD conflicts
  2020-04-07 16:51   ` Masami Hiramatsu
@ 2020-04-07 17:16     ` Andrew Cooper
  2020-04-07 23:59       ` Masami Hiramatsu
  2020-04-08  7:25     ` Masami Hiramatsu
  1 sibling, 1 reply; 74+ messages in thread
From: Andrew Cooper @ 2020-04-07 17:16 UTC (permalink / raw)
  To: Masami Hiramatsu, Peter Zijlstra
  Cc: tglx, linux-kernel, hch, sean.j.christopherson, mingo, bp, hpa,
	x86, kenny, jeyu, rasmus.villemoes, pbonzini, fenghua.yu,
	xiaoyao.li, nadav.amit, thellstrom, tony.luck, rostedt, gregkh,
	jannh, keescook, David.Laight, dcovelli

On 07/04/2020 17:51, Masami Hiramatsu wrote:
> diff --git a/tools/arch/x86/tools/gen-insn-attr-x86.awk b/tools/arch/x86/tools/gen-insn-attr-x86.awk
> index d74d9e605723..ade80796453c 100644
> --- a/tools/arch/x86/tools/gen-insn-attr-x86.awk
> +++ b/tools/arch/x86/tools/gen-insn-attr-x86.awk
> @@ -70,6 +70,8 @@ BEGIN {
>  	mmx_expr = "^(emms|fxsave|fxrstor|ldmxcsr|stmxcsr)" # MMX/SSE nmemonics lacking operands
>  	fpu_expr = "^x87"
>  
> +	vmx_expr = "^VM.*" # All mnemonic start with "VM" are VMX instructions

Not really.

VMMCALL, VMLOAD, VMSAVE and VMRUN are SVM instructions.

VMASKMOV is a AVX instruction.

~Andrew

^ permalink raw reply	[flat|nested] 74+ messages in thread

* Re: [PATCH 0/4] x86/module: Out-of-tree module decode and sanitize
  2020-04-07 11:02 [PATCH 0/4] x86/module: Out-of-tree module decode and sanitize Peter Zijlstra
                   ` (3 preceding siblings ...)
  2020-04-07 11:02 ` [PATCH 4/4] x86,module: Detect CRn and DRn manipulation Peter Zijlstra
@ 2020-04-07 17:23 ` Andrew Cooper
  2020-04-07 19:41   ` Peter Zijlstra
  4 siblings, 1 reply; 74+ messages in thread
From: Andrew Cooper @ 2020-04-07 17:23 UTC (permalink / raw)
  To: Peter Zijlstra, tglx, linux-kernel
  Cc: hch, sean.j.christopherson, mingo, bp, hpa, x86, kenny, jeyu,
	rasmus.villemoes, pbonzini, fenghua.yu, xiaoyao.li, nadav.amit,
	thellstrom, tony.luck, rostedt, gregkh, jannh, keescook,
	David.Laight, dcovelli, mhiramat

On 07/04/2020 12:02, Peter Zijlstra wrote:
> Hi all,
>
> Driven by the SLD vs VMX interaction, here are some patches that provide means
> to analyze the text of out-of-tree modules.
>
> The first user of that is refusing to load modules on VMX-SLD conflicts, but it
> also has a second patch that refulses to load any module that tries to modify
> CRn/DRn.
>
> I'm thinking people will quickly come up with more and more elaborate tests to
> which to subject out-of-tree modules.

Anything playing with LGDT & friends?  Shouldn't be substantially more
elaborate than CR/DR to check for.

~Andrew
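
[Editor's note: a stand-alone sketch of the check Andrew suggests. LGDT and
LIDT live in the 0f 01 opcode group as /2 and /3 and take memory operands
only; ModRM.mod == 3 selects the register forms of 0f 01, which encode
unrelated instructions (VMCALL, VMLAUNCH, ...). This is a hypothetical
user-space rendition over raw bytes, not the kernel's struct insn API:]

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper in the style of insn_is_mov_CRn() above:
 * LGDT is 0f 01 /2 and LIDT is 0f 01 /3, memory forms only. */
static bool insn_is_lgdt_lidt(const uint8_t *opcode, uint8_t modrm)
{
	uint8_t mod = (modrm >> 6) & 0x3;
	uint8_t reg = (modrm >> 3) & 0x7;

	if (opcode[0] != 0x0f || opcode[1] != 0x01)
		return false;

	if (mod == 0x3)		/* register form: VMCALL etc., not LGDT/LIDT */
		return false;

	return reg == 2 || reg == 3;
}
```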

^ permalink raw reply	[flat|nested] 74+ messages in thread

* Re: [PATCH 4/4] x86,module: Detect CRn and DRn manipulation
  2020-04-07 17:01   ` Kees Cook
@ 2020-04-07 18:13     ` Peter Zijlstra
  2020-04-07 18:49       ` Kees Cook
  0 siblings, 1 reply; 74+ messages in thread
From: Peter Zijlstra @ 2020-04-07 18:13 UTC (permalink / raw)
  To: Kees Cook
  Cc: tglx, linux-kernel, hch, sean.j.christopherson, mingo, bp, hpa,
	x86, kenny, jeyu, rasmus.villemoes, pbonzini, fenghua.yu,
	xiaoyao.li, nadav.amit, thellstrom, tony.luck, rostedt, gregkh,
	jannh, David.Laight, dcovelli, mhiramat

On Tue, Apr 07, 2020 at 10:01:04AM -0700, Kees Cook wrote:
> On Tue, Apr 07, 2020 at 01:02:40PM +0200, Peter Zijlstra wrote:
> > Since we now have infrastructure to analyze module text, disallow
> > modules that write to CRn and DRn registers.
> > 
> > Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
> > ---
> >  arch/x86/kernel/module.c |   21 +++++++++++++++++++++
> >  1 file changed, 21 insertions(+)
> > 
> > --- a/arch/x86/kernel/module.c
> > +++ b/arch/x86/kernel/module.c
> > @@ -266,6 +266,22 @@ static bool insn_is_vmx(struct insn *ins
> >  	return false;
> >  }
> >  
> > +static bool insn_is_mov_CRn(struct insn *insn)
> > +{
> > +	if (insn->opcode.bytes[0] == 0x0f && insn->opcode.bytes[1] == 0x22)
> > +		return true;
> 
> I always cringe at numeric literals. Would it be overkill to add defines
> for these (and the others that have comments next to them in 3/4)? It
> makes stuff easier to grep, etc. (e.g. we have register names in
> arch/x86/include/asm/asm.h, do we need instruction names somewhere else?
> I assume objtool has a bunch of this too...)

objtool does not, have a peek at tools/objtool/arch/x86/decode.c

I'm not sure what the best way is here, the x86 opcode map is a
disaster. Even the mnemonic doesn't help us here, as that's just MOV :/

^ permalink raw reply	[flat|nested] 74+ messages in thread

* Re: [PATCH 3/4] x86,module: Detect VMX vs SLD conflicts
  2020-04-07 11:02 ` [PATCH 3/4] x86,module: Detect VMX vs SLD conflicts Peter Zijlstra
  2020-04-07 14:35   ` Greg KH
  2020-04-07 16:51   ` Masami Hiramatsu
@ 2020-04-07 18:26   ` kbuild test robot
  2020-04-07 21:25   ` David Laight
                     ` (2 subsequent siblings)
  5 siblings, 0 replies; 74+ messages in thread
From: kbuild test robot @ 2020-04-07 18:26 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: kbuild-all, tglx, linux-kernel, hch, sean.j.christopherson,
	mingo, bp, hpa, x86, kenny, peterz, jeyu, rasmus.villemoes,
	pbonzini, fenghua.yu, xiaoyao.li, nadav.amit, thellstrom,
	tony.luck, rostedt, gregkh, jannh, keescook, David.Laight,
	dcovelli, mhiramat

[-- Attachment #1: Type: text/plain, Size: 1496 bytes --]

Hi Peter,

I love your patch! Yet something to improve:

[auto build test ERROR on tip/auto-latest]
[also build test ERROR on linus/master next-20200407]
[cannot apply to jeyu/modules-next tip/x86/core v5.6]
[if your patch is applied to the wrong git tree, please drop us a note to help
improve the system. BTW, we also suggest to use '--base' option to specify the
base tree in git format-patch, please see https://stackoverflow.com/a/37406982]

url:    https://github.com/0day-ci/linux/commits/Peter-Zijlstra/x86-module-Out-of-tree-module-decode-and-sanitize/20200408-011239
base:   https://git.kernel.org/pub/scm/linux/kernel/git/tip/tip.git ee8bac724cc7767dcf9480afb656318994f22c3d
config: um-x86_64_defconfig (attached as .config)
compiler: gcc-7 (Ubuntu 7.5.0-6ubuntu2) 7.5.0
reproduce:
        # save the attached .config to linux build tree
        make ARCH=um SUBARCH=x86_64

If you fix the issue, kindly add following tag as appropriate
Reported-by: kbuild test robot <lkp@intel.com>

All errors (new ones prefixed by >>):

   /usr/bin/ld: arch/x86/um/../kernel/module.o: in function `decode_module':
>> module.c:(.text+0x26): undefined reference to `split_lock_enabled'
>> /usr/bin/ld: module.c:(.text+0x39): undefined reference to `insn_init'
>> /usr/bin/ld: module.c:(.text+0x6b): undefined reference to `insn_get_length'
   collect2: error: ld returned 1 exit status

---
0-DAY CI Kernel Test Service, Intel Corporation
https://lists.01.org/hyperkitty/list/kbuild-all@lists.01.org

[-- Attachment #2: .config.gz --]
[-- Type: application/gzip, Size: 8501 bytes --]

^ permalink raw reply	[flat|nested] 74+ messages in thread

* Re: [PATCH 4/4] x86,module: Detect CRn and DRn manipulation
  2020-04-07 18:13     ` Peter Zijlstra
@ 2020-04-07 18:49       ` Kees Cook
  0 siblings, 0 replies; 74+ messages in thread
From: Kees Cook @ 2020-04-07 18:49 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: tglx, linux-kernel, hch, sean.j.christopherson, mingo, bp, hpa,
	x86, kenny, jeyu, rasmus.villemoes, pbonzini, fenghua.yu,
	xiaoyao.li, nadav.amit, thellstrom, tony.luck, rostedt, gregkh,
	jannh, David.Laight, dcovelli, mhiramat

On Tue, Apr 07, 2020 at 08:13:25PM +0200, Peter Zijlstra wrote:
> On Tue, Apr 07, 2020 at 10:01:04AM -0700, Kees Cook wrote:
> > On Tue, Apr 07, 2020 at 01:02:40PM +0200, Peter Zijlstra wrote:
> > > Since we now have infrastructure to analyze module text, disallow
> > > modules that write to CRn and DRn registers.
> > > 
> > > Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
> > > ---
> > >  arch/x86/kernel/module.c |   21 +++++++++++++++++++++
> > >  1 file changed, 21 insertions(+)
> > > 
> > > --- a/arch/x86/kernel/module.c
> > > +++ b/arch/x86/kernel/module.c
> > > @@ -266,6 +266,22 @@ static bool insn_is_vmx(struct insn *ins
> > >  	return false;
> > >  }
> > >  
> > > +static bool insn_is_mov_CRn(struct insn *insn)
> > > +{
> > > +	if (insn->opcode.bytes[0] == 0x0f && insn->opcode.bytes[1] == 0x22)
> > > +		return true;
> > 
> > I always cringe at numeric literals. Would it be overkill to add defines
> > for these (and the others that have comments next to them in 3/4)? It
> > makes stuff easier to grep, etc. (e.g. we have register names in
> > arch/x86/include/asm/asm.h, do we need instruction names somewhere else?
> > I assume objtool has a bunch of this too...)
> 
> objtool does not, have a peek at tools/objtool/arch/x86/decode.c

Eek.

> I'm not sure what the best way is here, the x86 opcode map is a
> disaster. Even the mnemonic doesn't help us here, as that's just MOV :/

Yeah, I'm not sure either. I guess leave this as-is.

-- 
Kees Cook

^ permalink raw reply	[flat|nested] 74+ messages in thread

* Re: [PATCH 4/4] x86,module: Detect CRn and DRn manipulation
  2020-04-07 11:02 ` [PATCH 4/4] x86,module: Detect CRn and DRn manipulation Peter Zijlstra
  2020-04-07 17:01   ` Kees Cook
@ 2020-04-07 18:55   ` Nadav Amit
  2020-04-07 19:38     ` Peter Zijlstra
  2020-04-07 21:48   ` Steven Rostedt
  2020-04-08 13:27   ` Steven Rostedt
  3 siblings, 1 reply; 74+ messages in thread
From: Nadav Amit @ 2020-04-07 18:55 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Thomas Gleixner, LKML, hch, Sean Christopherson, mingo, bp, hpa,
	x86, Kenneth R. Crudup, Jessica Yu, Rasmus Villemoes,
	Paolo Bonzini, Fenghua Yu, Xiaoyao Li, Thomas Hellstrom,
	Tony Luck, Steven Rostedt, Greg Kroah-Hartman, jannh, keescook,
	David.Laight, Doug Covelli, mhiramat

> On Apr 7, 2020, at 4:02 AM, Peter Zijlstra <peterz@infradead.org> wrote:
> 
> Since we now have infrastructure to analyze module text, disallow
> modules that write to CRn and DRn registers.

Assuming the kernel is built without CONFIG_PARAVIRT, what is the right way
for out-of-tree modules to write to CRs? Let’s say CR2?


^ permalink raw reply	[flat|nested] 74+ messages in thread

* Re: [PATCH 4/4] x86,module: Detect CRn and DRn manipulation
  2020-04-07 18:55   ` Nadav Amit
@ 2020-04-07 19:38     ` Peter Zijlstra
  2020-04-07 20:27       ` Nadav Amit
  0 siblings, 1 reply; 74+ messages in thread
From: Peter Zijlstra @ 2020-04-07 19:38 UTC (permalink / raw)
  To: Nadav Amit
  Cc: Thomas Gleixner, LKML, hch, Sean Christopherson, mingo, bp, hpa,
	x86, Kenneth R. Crudup, Jessica Yu, Rasmus Villemoes,
	Paolo Bonzini, Fenghua Yu, Xiaoyao Li, Thomas Hellstrom,
	Tony Luck, Steven Rostedt, Greg Kroah-Hartman, jannh, keescook,
	David.Laight, Doug Covelli, mhiramat

On Tue, Apr 07, 2020 at 11:55:21AM -0700, Nadav Amit wrote:
> > On Apr 7, 2020, at 4:02 AM, Peter Zijlstra <peterz@infradead.org> wrote:
> > 
> > Since we now have infrastructure to analyze module text, disallow
> > modules that write to CRn and DRn registers.
> 
> Assuming the kernel is built without CONFIG_PARAVIRT, what is the right way
> for out-of-tree modules to write to CRs? Let’s say CR2?

For most of them there is no real justification for ever writing to
them. CR2 I suppose we can have an exception for, given a sane rationale
for why you'd need to rewrite the fault address.

^ permalink raw reply	[flat|nested] 74+ messages in thread

* Re: [PATCH 0/4] x86/module: Out-of-tree module decode and sanitize
  2020-04-07 17:23 ` [PATCH 0/4] x86/module: Out-of-tree module decode and sanitize Andrew Cooper
@ 2020-04-07 19:41   ` Peter Zijlstra
  2020-04-07 20:11     ` Andrew Cooper
  2020-04-07 20:21     ` Andrew Cooper
  0 siblings, 2 replies; 74+ messages in thread
From: Peter Zijlstra @ 2020-04-07 19:41 UTC (permalink / raw)
  To: Andrew Cooper
  Cc: tglx, linux-kernel, hch, sean.j.christopherson, mingo, bp, hpa,
	x86, kenny, jeyu, rasmus.villemoes, pbonzini, fenghua.yu,
	xiaoyao.li, nadav.amit, thellstrom, tony.luck, rostedt, gregkh,
	jannh, keescook, David.Laight, dcovelli, mhiramat

On Tue, Apr 07, 2020 at 06:23:27PM +0100, Andrew Cooper wrote:
> On 07/04/2020 12:02, Peter Zijlstra wrote:
> > Hi all,
> >
> > Driven by the SLD vs VMX interaction, here are some patches that provide means
> > to analyze the text of out-of-tree modules.
> >
> > The first user of that is refusing to load modules on VMX-SLD conflicts, but it
> > also has a second patch that refuses to load any module that tries to modify
> > CRn/DRn.
> >
> > I'm thinking people will quickly come up with more and more elaborate tests to
> > which to subject out-of-tree modules.
> 
> Anything playing with LGDT & friends?  Shouldn't be substantially more
> elaborate than CR/DR to check for.

More friends? (I wasn't sure on the Sxxx instructions, they appear
harmless, but what do I know..)

I was also eyeing LSL LTR LSS, none of which I figured a module has any
business of using. Are there more?

--- a/arch/x86/kernel/module.c
+++ b/arch/x86/kernel/module.c
@@ -282,6 +282,50 @@ static bool insn_is_mov_DRn(struct insn
 	return false;
 }
 
+static bool insn_is_LxDT(struct insn *insn)
+{
+	u8 modrm = insn->modrm.bytes[0];
+	u8 modrm_mod = X86_MODRM_MOD(modrm);
+	u8 modrm_reg = X86_MODRM_REG(modrm);
+
+	if (insn->opcode.bytes[0] != 0x0f)
+		return false;
+
+	switch (insn->opcode.bytes[1]) {
+	case 0x00:
+		if (modrm_mod != 0x03)
+			break;
+
+		switch (modrm_reg) {
+		case 0x0: /* SLDT */
+		case 0x2: /* LLDT */
+			return true;
+
+		default:
+			break;
+		}
+		break;
+
+	case 0x01:
+		switch (modrm_reg) {
+		case 0x0: /* SGDT */
+		case 0x1: /* SIDT */
+		case 0x2: /* LGDT */
+		case 0x3: /* LIDT */
+			return true;
+
+		default:
+			break;
+		}
+		break;
+
+	default:
+		break;
+	}
+
+	return false;
+}
+
 static int decode_module(struct module *mod, void *text, void *text_end, bool sld_safe)
 {
 	bool allow_vmx = sld_safe || !split_lock_enabled();
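
The Grp6/Grp7 dispatch above hinges on the ModRM byte: for the 0f 00 and 0f 01 opcodes the reg field (bits 5:3) selects the actual instruction, while mod (bits 7:6) distinguishes register from memory operands. A standalone sketch of the field split, mirroring the X86_MODRM_*() macros from the kernel's instruction decoder headers (a sketch for illustration, not the kernel code itself):

```c
#include <assert.h>
#include <stdint.h>

#define MODRM_MOD(m)	(((m) & 0xc0) >> 6)	/* 3 = register operand */
#define MODRM_REG(m)	(((m) & 0x38) >> 3)	/* opcode extension for Grp6/7 */
#define MODRM_RM(m)	((m) & 0x07)
```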


^ permalink raw reply	[flat|nested] 74+ messages in thread

* Re: [PATCH 0/4] x86/module: Out-of-tree module decode and sanitize
  2020-04-07 19:41   ` Peter Zijlstra
@ 2020-04-07 20:11     ` Andrew Cooper
  2020-04-07 20:45       ` Peter Zijlstra
  2020-04-07 20:21     ` Andrew Cooper
  1 sibling, 1 reply; 74+ messages in thread
From: Andrew Cooper @ 2020-04-07 20:11 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: tglx, linux-kernel, hch, sean.j.christopherson, mingo, bp, hpa,
	x86, kenny, jeyu, rasmus.villemoes, pbonzini, fenghua.yu,
	xiaoyao.li, nadav.amit, thellstrom, tony.luck, rostedt, gregkh,
	jannh, keescook, David.Laight, dcovelli, mhiramat

On 07/04/2020 20:41, Peter Zijlstra wrote:
> On Tue, Apr 07, 2020 at 06:23:27PM +0100, Andrew Cooper wrote:
>> On 07/04/2020 12:02, Peter Zijlstra wrote:
>>> Hi all,
>>>
>>> Driven by the SLD vs VMX interaction, here are some patches that provide means
>>> to analyze the text of out-of-tree modules.
>>>
>>> The first user of that is refusing to load modules on VMX-SLD conflicts, but it
>>> also has a second patch that refuses to load any module that tries to modify
>>> CRn/DRn.
>>>
>>> I'm thinking people will quickly come up with more and more elaborate tests to
>>> which to subject out-of-tree modules.
>> Anything playing with LGDT & friends?  Shouldn't be substantially more
>> elaborate than CR/DR to check for.
> More friends? (I wasn't sure on the Sxxx instructions, they appear
> harmless, but what do I know..)
>
> I was also eyeing LSL LTR LSS, none of which I figured a module has any
> business of using. Are there more?

Sorry - should have been more clear.  By friends, I meant LGDT, LIDT,
LLDT and LTR which are the 4 system table loading instructions.  LLDT
and LTR depend on being able to write into the GDT, but still have no
business being used.

Also, LMSW if you care about it, but its utility is somewhere close to 0
these days, so probably not worth the cycles searching for.

The Sxxx instructions have no business being used, but are also harmless
and similarly, probably not worth spending cycles searching for.

L{D,E,F,S}S are functional equivalents to "MOV val1, %sreg; mov val2,
%reg"  so harmless (also mode specific as to whether they are useable).


Other things to consider, while we're on a roll:

WRMSR and RDMSR:  There is a lot of damage which can be done with these,
and at least forcing people to use the regular hooks will get proper
paravirt support and/or exception support.  That said, this might cause
large carnage to out-of-tree modules which are a device driver for
random platform things.

POPF: Don't really want someone being able to set IOPL3.  However, this
might quite easily show up as a false positive depending on how the
irqsafe infrastructure gets inlined.

SYSRET/SYSEXIT/IRET: Don't want a module returning to userspace behind
the kernel's back.  IRET may be a false positive for serialising
purposes, as may be a write to CR2 for that matter.

Looking over the list of other privileged instructions, CLTS,
{,WB,WBNO}INVD, INVLPG and HLT might be candidates for "clearly doing
something which shouldn't be done".  Not on the list is INVPCID which
falls into the same category.

Come to think about it, it might be easier to gauge on CPL0 instructions
and whitelist the ok ones, such as VMX and SVM for out-of-tree hypervisors.

~Andrew

^ permalink raw reply	[flat|nested] 74+ messages in thread

* Re: [PATCH 0/4] x86/module: Out-of-tree module decode and sanitize
  2020-04-07 19:41   ` Peter Zijlstra
  2020-04-07 20:11     ` Andrew Cooper
@ 2020-04-07 20:21     ` Andrew Cooper
  2020-04-07 20:48       ` Peter Zijlstra
  1 sibling, 1 reply; 74+ messages in thread
From: Andrew Cooper @ 2020-04-07 20:21 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: tglx, linux-kernel, hch, sean.j.christopherson, mingo, bp, hpa,
	x86, kenny, jeyu, rasmus.villemoes, pbonzini, fenghua.yu,
	xiaoyao.li, nadav.amit, thellstrom, tony.luck, rostedt, gregkh,
	jannh, keescook, David.Laight, dcovelli, mhiramat

On 07/04/2020 20:41, Peter Zijlstra wrote:
> On Tue, Apr 07, 2020 at 06:23:27PM +0100, Andrew Cooper wrote:
>> On 07/04/2020 12:02, Peter Zijlstra wrote:
>>> Hi all,
>>>
>>> Driven by the SLD vs VMX interaction, here are some patches that provide means
>>> to analyze the text of out-of-tree modules.
>>>
>>> The first user of that is refusing to load modules on VMX-SLD conflicts, but it
>>> also has a second patch that refuses to load any module that tries to modify
>>> CRn/DRn.
>>>
>>> I'm thinking people will quickly come up with more and more elaborate tests to
>>> which to subject out-of-tree modules.
>> Anything playing with LGDT & friends?  Shouldn't be substantially more
>> elaborate than CR/DR to check for.
> More friends? (I wasn't sure on the Sxxx instructions, they appear
> harmless, but what do I know..)
>
> I was also eyeing LSL LTR LSS, none of which I figured a module has any
> business of using. Are there more?
>
> --- a/arch/x86/kernel/module.c
> +++ b/arch/x86/kernel/module.c
> @@ -282,6 +282,50 @@ static bool insn_is_mov_DRn(struct insn
>  	return false;
>  }
>  
> +static bool insn_is_LxDT(struct insn *insn)
> +{
> +	u8 modrm = insn->modrm.bytes[0];
> +	u8 modrm_mod = X86_MODRM_MOD(modrm);
> +	u8 modrm_reg = X86_MODRM_REG(modrm);
> +
> +	if (insn->opcode.bytes[0] != 0x0f)
> +		return false;
> +
> +	switch (insn->opcode.bytes[1]) {
> +	case 0x00:
> +		if (modrm_mod != 0x03)
> +			break;
> +

Apologies - missed this before.  LLDT and LTR can be encoded with a
memory operand, so you need to drop the modrm_mod check to spot all
instances.

~Andrew

^ permalink raw reply	[flat|nested] 74+ messages in thread

* Re: [PATCH 4/4] x86,module: Detect CRn and DRn manipulation
  2020-04-07 19:38     ` Peter Zijlstra
@ 2020-04-07 20:27       ` Nadav Amit
  2020-04-07 20:50         ` Peter Zijlstra
  0 siblings, 1 reply; 74+ messages in thread
From: Nadav Amit @ 2020-04-07 20:27 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Thomas Gleixner, LKML, hch, Sean Christopherson, mingo, bp, hpa,
	x86, Kenneth R. Crudup, Jessica Yu, Rasmus Villemoes,
	Paolo Bonzini, Fenghua Yu, Xiaoyao Li, Thomas Hellstrom,
	Tony Luck, Steven Rostedt, Greg Kroah-Hartman, jannh, keescook,
	David.Laight, Doug Covelli, mhiramat

> On Apr 7, 2020, at 12:38 PM, Peter Zijlstra <peterz@infradead.org> wrote:
> 
> On Tue, Apr 07, 2020 at 11:55:21AM -0700, Nadav Amit wrote:
>>> On Apr 7, 2020, at 4:02 AM, Peter Zijlstra <peterz@infradead.org> wrote:
>>> 
>>> Since we now have infrastructure to analyze module text, disallow
>>> modules that write to CRn and DRn registers.
>> 
>> Assuming the kernel is built without CONFIG_PARAVIRT, what is the right way
>> for out-of-tree modules to write to CRs? Let’s say CR2?
> 
> Most of them there is no real justification for ever writing to. CR2 I
> suppose we can have an exception for given a sane rationale for why
> you'd need to rewrite the fault address.

For the same reason that KVM writes to CR2 - to restore CR2 before entering
a guest, since CR2 is not architecturally loaded from the VMCS. I suspect there
are additional use-cases which are not covered by the kernel interfaces.


^ permalink raw reply	[flat|nested] 74+ messages in thread

* Re: [PATCH 0/4] x86/module: Out-of-tree module decode and sanitize
  2020-04-07 20:11     ` Andrew Cooper
@ 2020-04-07 20:45       ` Peter Zijlstra
  2020-04-07 21:21         ` Andrew Cooper
  0 siblings, 1 reply; 74+ messages in thread
From: Peter Zijlstra @ 2020-04-07 20:45 UTC (permalink / raw)
  To: Andrew Cooper
  Cc: tglx, linux-kernel, hch, sean.j.christopherson, mingo, bp, hpa,
	x86, kenny, jeyu, rasmus.villemoes, pbonzini, fenghua.yu,
	xiaoyao.li, nadav.amit, thellstrom, tony.luck, rostedt, gregkh,
	jannh, keescook, David.Laight, dcovelli, mhiramat

On Tue, Apr 07, 2020 at 09:11:17PM +0100, Andrew Cooper wrote:

> Sorry - should have been more clear.  By friends, I meant LGDT, LIDT,
> LLDT and LTR which are the 4 system table loading instructions.  LLDT
> and LTR depend on being able to write into the GDT, but still have no
> business being used.
> 
> Also, LMSW if you care about it, but its utility is somewhere close to 0
> these days, so probably not worth the cycles searching for.
> 
> The Sxxx instructions have no business being used, but are also harmless
> and similarly, probably not worth spending cycles searching for.
> 
> L{D,E,F,S}S are functional equivalents to "MOV val1, %sreg; mov val2,
> %reg"  so harmless (also mode specific as to whether they are useable).

OK, LxDT + LTR it is.

> Other things to consider, while we're on a roll:
> 
> WRMSR and RDMSR:  There is a lot of damage which can be done with these,
> and at least forcing people to use the regular hooks will get proper
> paravirt support and/or exception support.  That said, this might cause
> large carnage to out-of-tree modules which are a device driver for
> random platform things.

Yeah, I had already considered that, didn't want to touch that just yet.

> POPF: Don't really want someone being able to set IOPL3.  However, this
> might quite easily show up as a false positive depending on how the
> irqsafe infrastructure gets inlined.

local_irq_restore() will be a POPF :/

> SYSRET/SYSEXIT/IRET: Don't want a module returning to userspace behind
> the kernel's back. 

Good thinking, let me add this.

> IRET may be a false positive for serialising
> purposes, as may be a write to CR2 for that matter.

We can out-of-line and export sync_core() for that.

> Looking over the list of other privileged instructions, CLTS,
> {,WB,WBNO}INVD, INVLPG and HLT might be candidates for "clearly doing
> something which shouldn't be done".  Not on the list is INVPCID which
> falls into the same category.
> 
> Come to think about it, it might be easier to gauge on CPL0 instructions
> and whitelist the ok ones, such as VMX and SVM for out-of-tree hypervisors.

Fair enough, I'll go over those tomorrow.

For now I ended up with:

---
--- a/arch/x86/kernel/module.c
+++ b/arch/x86/kernel/module.c
@@ -282,6 +282,68 @@ static bool insn_is_mov_DRn(struct insn
 	return false;
 }

+static bool insn_is_GDT_modifier(struct insn *insn)
+{
+	u8 modrm = insn->modrm.bytes[0];
+	u8 modrm_mod = X86_MODRM_MOD(modrm);
+	u8 modrm_reg = X86_MODRM_REG(modrm);
+
+	if (insn->opcode.bytes[0] != 0x0f)
+		return false;
+
+	switch (insn->opcode.bytes[1]) {
+	case 0x00: /* Grp6 */
+		switch (modrm_reg) {
+		/* case 0x0: SLDT */
+		case 0x2: /* LLDT */
+		case 0x3: /* LTR */
+			return true;
+
+		default:
+			break;
+		}
+		break;
+
+	case 0x01: /* Grp7 */
+		if (modrm_mod == 0x03)
+			break;
+
+		switch (modrm_reg) {
+		/* case 0x0: SGDT */
+		/* case 0x1: SIDT */
+		case 0x2: /* LGDT */
+		case 0x3: /* LIDT */
+			return true;
+
+		default:
+			break;
+		}
+		break;
+
+	default:
+		break;
+	}
+
+	return false;
+}
+
+static bool insn_is_xRET(struct insn *insn)
+{
+	u8 op1 = insn->opcode.bytes[0];
+	u8 op2 = insn->opcode.bytes[1];
+
+	if (op1 == 0xcf) /* IRET */
+		return true;
+
+	if (op1 != 0x0f)
+		return false;
+
+	if (op2 == 0x07 || op2 == 0x35) /* SYSRET, SYSEXIT */
+		return true;
+
+	return false;
+}
+
 static int decode_module(struct module *mod, void *text, void *text_end, bool sld_safe)
 {
 	bool allow_vmx = sld_safe || !split_lock_enabled();
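
The xRET check matches one single-byte opcode and two two-byte ones: IRET is 0xcf, SYSRET is 0f 07 and SYSEXIT is 0f 35. The same logic restated over raw opcode bytes as a self-contained sketch (an assumption for illustration; the patch itself operates on struct insn):

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

static bool is_xret(uint8_t op0, uint8_t op1)
{
	if (op0 == 0xcf)	/* IRET */
		return true;
	if (op0 != 0x0f)
		return false;
	return op1 == 0x07 || op1 == 0x35;	/* SYSRET, SYSEXIT */
}
```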


^ permalink raw reply	[flat|nested] 74+ messages in thread

* Re: [PATCH 0/4] x86/module: Out-of-tree module decode and sanitize
  2020-04-07 20:21     ` Andrew Cooper
@ 2020-04-07 20:48       ` Peter Zijlstra
  0 siblings, 0 replies; 74+ messages in thread
From: Peter Zijlstra @ 2020-04-07 20:48 UTC (permalink / raw)
  To: Andrew Cooper
  Cc: tglx, linux-kernel, hch, sean.j.christopherson, mingo, bp, hpa,
	x86, kenny, jeyu, rasmus.villemoes, pbonzini, fenghua.yu,
	xiaoyao.li, nadav.amit, thellstrom, tony.luck, rostedt, gregkh,
	jannh, keescook, David.Laight, dcovelli, mhiramat

On Tue, Apr 07, 2020 at 09:21:31PM +0100, Andrew Cooper wrote:
> > +	switch (insn->opcode.bytes[1]) {
> > +	case 0x00:
> > +		if (modrm_mod != 0x03)
> > +			break;
> > +
> 
> Apologies - missed this before.  LLDT and LTR can be encoded with a
> memory operand, so you need to drop the modrm_mod check to spot all
> instances.

I spotted the same, already fixed. Sorry for the mistake, reading opcode
tables is a pain at the best of times :/

^ permalink raw reply	[flat|nested] 74+ messages in thread

* Re: [PATCH 4/4] x86,module: Detect CRn and DRn manipulation
  2020-04-07 20:27       ` Nadav Amit
@ 2020-04-07 20:50         ` Peter Zijlstra
  2020-04-07 21:22           ` Nadav Amit
  0 siblings, 1 reply; 74+ messages in thread
From: Peter Zijlstra @ 2020-04-07 20:50 UTC (permalink / raw)
  To: Nadav Amit
  Cc: Thomas Gleixner, LKML, hch, Sean Christopherson, mingo, bp, hpa,
	x86, Kenneth R. Crudup, Jessica Yu, Rasmus Villemoes,
	Paolo Bonzini, Fenghua Yu, Xiaoyao Li, Thomas Hellstrom,
	Tony Luck, Steven Rostedt, Greg Kroah-Hartman, jannh, keescook,
	David.Laight, Doug Covelli, mhiramat

On Tue, Apr 07, 2020 at 01:27:45PM -0700, Nadav Amit wrote:
> > On Apr 7, 2020, at 12:38 PM, Peter Zijlstra <peterz@infradead.org> wrote:
> > 
> > On Tue, Apr 07, 2020 at 11:55:21AM -0700, Nadav Amit wrote:
> >>> On Apr 7, 2020, at 4:02 AM, Peter Zijlstra <peterz@infradead.org> wrote:
> >>> 
> >>> Since we now have infrastructure to analyze module text, disallow
> >>> modules that write to CRn and DRn registers.
> >> 
> >> Assuming the kernel is built without CONFIG_PARAVIRT, what is the right way
> >> for out-of-tree modules to write to CRs? Let’s say CR2?
> > 
> > Most of them there is no real justification for ever writing to. CR2 I
> > suppose we can have an exception for given a sane rationale for why
> > you'd need to rewrite the fault address.
> 
> For the same reason that KVM writes to CR2 - to restore CR2 before entering
> a guest, since CR2 is not architecturally loaded from the VMCS. I suspect there
> are additional use-cases which are not covered by the kernel interfaces.

So I'm not much of a virt guy (clearly), and *groan*, that's horrible.
I'll go make an exception for CR2.

^ permalink raw reply	[flat|nested] 74+ messages in thread

* Re: [PATCH 0/4] x86/module: Out-of-tree module decode and sanitize
  2020-04-07 20:45       ` Peter Zijlstra
@ 2020-04-07 21:21         ` Andrew Cooper
  0 siblings, 0 replies; 74+ messages in thread
From: Andrew Cooper @ 2020-04-07 21:21 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: tglx, linux-kernel, hch, sean.j.christopherson, mingo, bp, hpa,
	x86, kenny, jeyu, rasmus.villemoes, pbonzini, fenghua.yu,
	xiaoyao.li, nadav.amit, thellstrom, tony.luck, rostedt, gregkh,
	jannh, keescook, David.Laight, dcovelli, mhiramat

On 07/04/2020 21:45, Peter Zijlstra wrote:
>> POPF: Don't really want someone being able to set IOPL3.  However, this
>> might quite easily show up as a false positive depending on how the
>> irqsafe infrastructure gets inlined.
> local_irq_restore() will be a POPF :/

Ok.  Something to consider in an orthogonal direction.  A while ago, I
put this into Xen as a security fix:

iret_exit_to_guest:
        andl  $~(X86_EFLAGS_IOPL|X86_EFLAGS_NT|X86_EFLAGS_VM),24(%rsp)
        orl   $X86_EFLAGS_IF,24(%rsp)
        addq  $8,%rsp
.Lft0:  iretq

which unconditionally fixes up the unsafe flags even if something
manages to slip through (e.g. local_irq_restore() against stack
rubble).  It turns out that it has saved us several CVEs in the
intervening time.

Is this the kind of things the hardening folk would be interested in?

> --- a/arch/x86/kernel/module.c
> +++ b/arch/x86/kernel/module.c
> @@ -282,6 +282,68 @@ static bool insn_is_mov_DRn(struct insn
>  	return false;
>  }
>
> +static bool insn_is_GDT_modifier(struct insn *insn)
> +{
> +	u8 modrm = insn->modrm.bytes[0];
> +	u8 modrm_mod = X86_MODRM_MOD(modrm);
> +	u8 modrm_reg = X86_MODRM_REG(modrm);
> +
> +	if (insn->opcode.bytes[0] != 0x0f)
> +		return false;
> +
> +	switch (insn->opcode.bytes[1]) {
> +	case 0x00: /* Grp6 */
> +		switch (modrm_reg) {
> +		/* case 0x0: SLDT */
> +		case 0x2: /* LLDT */
> +		case 0x3: /* LTR */
> +			return true;

Come to think of it, if you include the Sxxx variants, a sufficiently
clever compiler can collapse this entire switch statement into a single
"and $~3, modrm_reg" instruction, rather than being forced to use "and
$~1, modrm_reg; cmp $2, ...".

Probably on the extreme end of micro-optimising however.

~Andrew

^ permalink raw reply	[flat|nested] 74+ messages in thread

* Re: [PATCH 4/4] x86,module: Detect CRn and DRn manipulation
  2020-04-07 20:50         ` Peter Zijlstra
@ 2020-04-07 21:22           ` Nadav Amit
  2020-04-07 21:27             ` Peter Zijlstra
  2020-04-07 23:15             ` Andrew Cooper
  0 siblings, 2 replies; 74+ messages in thread
From: Nadav Amit @ 2020-04-07 21:22 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Thomas Gleixner, LKML, hch, Sean Christopherson, mingo, bp, hpa,
	x86, Kenneth R. Crudup, Jessica Yu, Rasmus Villemoes,
	Paolo Bonzini, Fenghua Yu, Xiaoyao Li, Thomas Hellstrom,
	Tony Luck, Steven Rostedt, Greg Kroah-Hartman, jannh, keescook,
	David.Laight, Doug Covelli, mhiramat

> On Apr 7, 2020, at 1:50 PM, Peter Zijlstra <peterz@infradead.org> wrote:
> 
> On Tue, Apr 07, 2020 at 01:27:45PM -0700, Nadav Amit wrote:
>>> On Apr 7, 2020, at 12:38 PM, Peter Zijlstra <peterz@infradead.org> wrote:
>>> 
>>> On Tue, Apr 07, 2020 at 11:55:21AM -0700, Nadav Amit wrote:
>>>>> On Apr 7, 2020, at 4:02 AM, Peter Zijlstra <peterz@infradead.org> wrote:
>>>>> 
>>>>> Since we now have infrastructure to analyze module text, disallow
>>>>> modules that write to CRn and DRn registers.
>>>> 
>>>> Assuming the kernel is built without CONFIG_PARAVIRT, what is the right way
>>>> for out-of-tree modules to write to CRs? Let’s say CR2?
>>> 
>>> Most of them there is no real justification for ever writing to. CR2 I
>>> suppose we can have an exception for given a sane rationale for why
>>> you'd need to rewrite the fault address.
>> 
>> For the same reason that KVM writes to CR2 - to restore CR2 before entering
>> a guest, since CR2 is not architecturally loaded from the VMCS. I suspect there
>> are additional use-cases which are not covered by the kernel interfaces.
> 
> So I'm not much of a virt guy (clearly), and *groan*, that's horrible.
> I'll go make an exception for CR2.

Clearly you are not a virt guy if you think that this is the horrible part
in x86 virtualization ;-)

Anyhow, I do not think it is the only use-case which is not covered by your
patches (even considering CRs/DRs alone). For example, there is no kernel
function to turn on CR4.VMXE, which is required to run hypervisors on x86.

I think a thorough analysis of existing software is needed to figure out
which use-cases are valid, and to exclude them during module scanning or to
provide alternative kernel interfaces to enable them. This may require a
transition phase in which module scanning would only issue warnings and
would not prevent the module from being loaded.


^ permalink raw reply	[flat|nested] 74+ messages in thread

* RE: [PATCH 3/4] x86,module: Detect VMX vs SLD conflicts
  2020-04-07 11:02 ` [PATCH 3/4] x86,module: Detect VMX vs SLD conflicts Peter Zijlstra
                     ` (2 preceding siblings ...)
  2020-04-07 18:26   ` kbuild test robot
@ 2020-04-07 21:25   ` David Laight
  2020-04-07 23:15     ` Kees Cook
  2020-04-08  2:10   ` Xiaoyao Li
  2020-04-08  8:09   ` Masami Hiramatsu
  5 siblings, 1 reply; 74+ messages in thread
From: David Laight @ 2020-04-07 21:25 UTC (permalink / raw)
  To: 'Peter Zijlstra', tglx, linux-kernel
  Cc: hch, sean.j.christopherson, mingo, bp, hpa, x86, kenny, jeyu,
	rasmus.villemoes, pbonzini, fenghua.yu, xiaoyao.li, nadav.amit,
	thellstrom, tony.luck, rostedt, gregkh, jannh, keescook,
	dcovelli, mhiramat

From: Peter Zijlstra
> Sent: 07 April 2020 12:03
> 
> It turns out that with Split-Lock-Detect enabled (default) any VMX
> hypervisor needs at least a little modification in order to not blindly
> inject the #AC into the guest without the guest being ready for it.
> 
> Since there is no telling which module implements a hypervisor, scan
> all out-of-tree modules' text and look for VMX instructions and refuse
> to load it when SLD is enabled (default) and the module isn't marked
> 'sld_safe'.
...
> +	while (text < text_end) {
> +		kernel_insn_init(&insn, text, text_end - text);
> +		insn_get_length(&insn);
> +
> +		if (WARN_ON_ONCE(!insn_complete(&insn))) {
> +			pr_err("Module text malformed: %s\n", mod->name);
> +			return -ENOEXEC;
> +		}
> +
> +		if (!allow_vmx && insn_is_vmx(&insn)) {
> +		pr_err("Module has VMX instructions and is not marked 'sld_safe', boot with: 'split_lock_detect=off': %s\n", mod->name);
> +			return -ENOEXEC;
> +		}
> +
> +		text += insn.length;
> +	}

There is a slight flaw in the above.
A malicious module can hide the required instruction by jumping into the
middle of a long instruction.

Even checking that branch targets land on instruction boundaries isn't
enough; an indirect jump could be used.

	David
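
David's point can be made concrete: a privileged opcode can hide inside the immediate of a longer instruction, so a linear-sweep decoder never sees it. In the sketch below, "mov %rax, %cr0" (0f 22 c0) sits inside the 8-byte immediate of a "mov $imm64, %rax"; decoding from offset 0 yields one harmless 10-byte MOV, while execution entering at offset 2 would hit the MOV-to-CR0. The byte values are real encodings; the scenario is illustrative:

```c
#include <assert.h>
#include <stdint.h>

static const uint8_t code[] = {
	0x48, 0xb8,			/* mov $imm64, %rax ...       */
	0x0f, 0x22, 0xc0,		/* imm begins with 0f 22 c0:  */
					/*   mov %rax, %cr0           */
	0x00, 0x00, 0x00, 0x00, 0x00,	/* remainder of the immediate */
};
```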

-
Registered Address Lakeside, Bramley Road, Mount Farm, Milton Keynes, MK1 1PT, UK
Registration No: 1397386 (Wales)

^ permalink raw reply	[flat|nested] 74+ messages in thread

* Re: [PATCH 4/4] x86,module: Detect CRn and DRn manipulation
  2020-04-07 21:22           ` Nadav Amit
@ 2020-04-07 21:27             ` Peter Zijlstra
  2020-04-07 22:12               ` Paolo Bonzini
  2020-04-08  5:18               ` Christoph Hellwig
  2020-04-07 23:15             ` Andrew Cooper
  1 sibling, 2 replies; 74+ messages in thread
From: Peter Zijlstra @ 2020-04-07 21:27 UTC (permalink / raw)
  To: Nadav Amit
  Cc: Thomas Gleixner, LKML, hch, Sean Christopherson, mingo, bp, hpa,
	x86, Kenneth R. Crudup, Jessica Yu, Rasmus Villemoes,
	Paolo Bonzini, Fenghua Yu, Xiaoyao Li, Thomas Hellstrom,
	Tony Luck, Steven Rostedt, Greg Kroah-Hartman, jannh, keescook,
	David.Laight, Doug Covelli, mhiramat

On Tue, Apr 07, 2020 at 02:22:11PM -0700, Nadav Amit wrote:
> Anyhow, I do not think it is the only use-case which is not covered by your
> patches (even considering CRs/DRs alone). For example, there is no kernel
> function to turn on CR4.VMXE, which is required to run hypervisors on x86.

That needs an exported function; there is no way we'll allow random
writes to CR4, there's too much dodgy stuff in there.

^ permalink raw reply	[flat|nested] 74+ messages in thread

* Re: [PATCH 4/4] x86,module: Detect CRn and DRn manipulation
  2020-04-07 11:02 ` [PATCH 4/4] x86,module: Detect CRn and DRn manipulation Peter Zijlstra
  2020-04-07 17:01   ` Kees Cook
  2020-04-07 18:55   ` Nadav Amit
@ 2020-04-07 21:48   ` Steven Rostedt
  2020-04-08  5:58     ` Jan Kiszka
  2020-04-08 13:27   ` Steven Rostedt
  3 siblings, 1 reply; 74+ messages in thread
From: Steven Rostedt @ 2020-04-07 21:48 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: tglx, linux-kernel, hch, sean.j.christopherson, mingo, bp, hpa,
	x86, kenny, jeyu, rasmus.villemoes, pbonzini, fenghua.yu,
	xiaoyao.li, nadav.amit, thellstrom, tony.luck, gregkh, jannh,
	keescook, David.Laight, dcovelli, mhiramat, Jan Kiszka,
	Wolfgang Mauerer

On Tue, 07 Apr 2020 13:02:40 +0200
Peter Zijlstra <peterz@infradead.org> wrote:

> Since we now have infrastructure to analyze module text, disallow
> modules that write to CRn and DRn registers.
> 
> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
> ---
>  arch/x86/kernel/module.c |   21 +++++++++++++++++++++
>  1 file changed, 21 insertions(+)
> 
> --- a/arch/x86/kernel/module.c
> +++ b/arch/x86/kernel/module.c
> @@ -266,6 +266,22 @@ static bool insn_is_vmx(struct insn *ins
>  	return false;
>  }
>  
> +static bool insn_is_mov_CRn(struct insn *insn)
> +{
> +	if (insn->opcode.bytes[0] == 0x0f && insn->opcode.bytes[1] == 0x22)
> +		return true;
> +
> +	return false;
> +}
> +
> +static bool insn_is_mov_DRn(struct insn *insn)
> +{
> +	if (insn->opcode.bytes[0] == 0x0f && insn->opcode.bytes[1] == 0x23)
> +		return true;
> +
> +	return false;
> +}
> +
>  static int decode_module(struct module *mod, void *text, void *text_end, bool sld_safe)
>  {
>  	bool allow_vmx = sld_safe || !split_lock_enabled();
> @@ -285,6 +301,11 @@ static int decode_module(struct module *
>  			return -ENOEXEC;
>  		}
>  
> +		if (insn_is_mov_CRn(&insn) || insn_is_mov_DRn(&insn)) {
> +			pr_err("Module writes to CRn or DRn, please use the proper accessors: %s\n", mod->name);
> +			return -ENOEXEC;
> +		}

Hmm, won't this break jailhouse?

-- Steve

> +
>  		text += insn.length;
>  	}
>  
> 



* Re: [PATCH 4/4] x86,module: Detect CRn and DRn manipulation
  2020-04-07 21:27             ` Peter Zijlstra
@ 2020-04-07 22:12               ` Paolo Bonzini
  2020-04-07 23:51                 ` Nadav Amit
  2020-04-08  8:45                 ` Peter Zijlstra
  2020-04-08  5:18               ` Christoph Hellwig
  1 sibling, 2 replies; 74+ messages in thread
From: Paolo Bonzini @ 2020-04-07 22:12 UTC (permalink / raw)
  To: Peter Zijlstra, Nadav Amit
  Cc: Thomas Gleixner, LKML, hch, Sean Christopherson, mingo, bp, hpa,
	x86, Kenneth R. Crudup, Jessica Yu, Rasmus Villemoes, Fenghua Yu,
	Xiaoyao Li, Thomas Hellstrom, Tony Luck, Steven Rostedt,
	Greg Kroah-Hartman, jannh, keescook, David.Laight, Doug Covelli,
	mhiramat

On 07/04/20 23:27, Peter Zijlstra wrote:
> On Tue, Apr 07, 2020 at 02:22:11PM -0700, Nadav Amit wrote:
>> Anyhow, I do not think it is the only use-case which is not covered by your
>> patches (even considering CRs/DRs alone). For example, there is no kernel
>> function to turn on CR4.VMXE, which is required to run hypervisors on x86.
> That needs an exported function; there is no way we'll allow random
> writes to CR4, there's too much dodgy stuff in there.

native_write_cr4 and pv_ops (through which you can do write_cr4) are
both exported, and so is cpu_tlbstate which is used by __cr4_set_bits
and friends.  Am I missing something glaringly obvious?

Paolo



* Re: [PATCH 4/4] x86,module: Detect CRn and DRn manipulation
  2020-04-07 21:22           ` Nadav Amit
  2020-04-07 21:27             ` Peter Zijlstra
@ 2020-04-07 23:15             ` Andrew Cooper
  2020-04-08  0:22               ` Paolo Bonzini
  1 sibling, 1 reply; 74+ messages in thread
From: Andrew Cooper @ 2020-04-07 23:15 UTC (permalink / raw)
  To: Nadav Amit, Peter Zijlstra
  Cc: Thomas Gleixner, LKML, hch, Sean Christopherson, mingo, bp, hpa,
	x86, Kenneth R. Crudup, Jessica Yu, Rasmus Villemoes,
	Paolo Bonzini, Fenghua Yu, Xiaoyao Li, Thomas Hellstrom,
	Tony Luck, Steven Rostedt, Greg Kroah-Hartman, jannh, keescook,
	David.Laight, Doug Covelli, mhiramat

On 07/04/2020 22:22, Nadav Amit wrote:
>> On Apr 7, 2020, at 1:50 PM, Peter Zijlstra <peterz@infradead.org> wrote:
>>
>> On Tue, Apr 07, 2020 at 01:27:45PM -0700, Nadav Amit wrote:
>>>> On Apr 7, 2020, at 12:38 PM, Peter Zijlstra <peterz@infradead.org> wrote:
>>>>
>>>> On Tue, Apr 07, 2020 at 11:55:21AM -0700, Nadav Amit wrote:
>>>>>> On Apr 7, 2020, at 4:02 AM, Peter Zijlstra <peterz@infradead.org> wrote:
>>>>>>
>>>>>> Since we now have infrastructure to analyze module text, disallow
>>>>>> modules that write to CRn and DRn registers.
>>>>> Assuming the kernel is built without CONFIG_PARAVIRT, what is the right way
>>>>> for out-of-tree modules to write to CRs? Let’s say CR2?
>>>> Most of them there is no real justification for ever writing to. CR2 I
>>>> suppose we can have an exception for given a sane rationale for why
>>>> you'd need to rewrite the fault address.
>>> For the same reason that KVM writes to CR2 - to restore CR2 before entering
>>> a guest, since CR2 is not architecturally loaded from the VMCS. I suspect there
>>> are additional use-cases which are not covered by the kernel interfaces.
>> So I'm not much of a virt guy (clearly), and *groan*, that's horrible.
>> I'll go make an exception for CR2.
> Clearly you are not a virt guy if you think that this is the horrible part
> in x86 virtualization ;-)

Very definitely seconded :)

> Anyhow, I do not think it is the only use-case which is not covered by your
> patches (even considering CRs/DRs alone). For example, there is no kernel
> function to turn on CR4.VMXE, which is required to run hypervisors on x86.

How about taking this opportunity to see if there is a way to improve on
the status quo for co-existing hypervisor modules?

ISTR there are a large number of hoops to jump through if you're not
sure if anything else in the system is using VMX, going as far as
needing to do a full VMXON/VMXOFF cycle for each context.

Enabling CR4.VMXE might want to be done proactively by the kernel. 
Amongst other things, it gives protection against stray INIT IPIs in the
system, although the interaction with SMX (tboot / trenchboot) wants
considering carefully.

~Andrew


* Re: [PATCH 3/4] x86,module: Detect VMX vs SLD conflicts
  2020-04-07 21:25   ` David Laight
@ 2020-04-07 23:15     ` Kees Cook
  0 siblings, 0 replies; 74+ messages in thread
From: Kees Cook @ 2020-04-07 23:15 UTC (permalink / raw)
  To: David Laight
  Cc: 'Peter Zijlstra',
	tglx, linux-kernel, hch, sean.j.christopherson, mingo, bp, hpa,
	x86, kenny, jeyu, rasmus.villemoes, pbonzini, fenghua.yu,
	xiaoyao.li, nadav.amit, thellstrom, tony.luck, rostedt, gregkh,
	jannh, dcovelli, mhiramat

On Tue, Apr 07, 2020 at 09:25:56PM +0000, David Laight wrote:
> From: Peter Zijlstra
> > Sent: 07 April 2020 12:03
> > 
> > It turns out that with Split-Lock-Detect enabled (default) any VMX
> > hypervisor needs at least a little modification in order to not blindly
> > inject the #AC into the guest without the guest being ready for it.
> > 
> > Since there is no telling which module implements a hypervisor, scan
> > all out-of-tree modules' text and look for VMX instructions and refuse
> > to load it when SLD is enabled (default) and the module isn't marked
> > 'sld_safe'.
> ...
> > +	while (text < text_end) {
> > +		kernel_insn_init(&insn, text, text_end - text);
> > +		insn_get_length(&insn);
> > +
> > +		if (WARN_ON_ONCE(!insn_complete(&insn))) {
> > +			pr_err("Module text malformed: %s\n", mod->name);
> > +			return -ENOEXEC;
> > +		}
> > +
> > +		if (!allow_vmx && insn_is_vmx(&insn)) {
> > +			pr_err("Module has VMX instructions and is not marked 'sld_safe', boot with: 'split_lock_detect=off': %s\n", mod->name);
> > +			return -ENOEXEC;
> > +		}
> > +
> > +		text += insn.length;
> > +	}
> 
> There is a slight flaw in the above.
> A malicious module can hide the offending instruction by jumping into the
> middle of a long instruction.
> 
> Even checking that branch targets land on instruction boundaries isn't
> enough; an indirect jump could be used.

If I understand the goals here, it's to provide feedback for good actors
doing things that they don't realize aren't safe. Trying to stop a
malicious module from doing malicious things is basically impossible:
it can just load a data blob and self-modify, etc. :)

Though, Peter, this does get me thinking: if this is meant to be helpful
for module authors tripping over things they shouldn't be touching,
perhaps every test needs to include explicit recommendations? It's
_kind_ of doing this already. Maybe the above should be:

pr_err("%s: contains VMX instructions but is not marked 'sld_safe'. Please see <url> or boot with: 'split_lock_detect=off' to ignore.\n", mod->name);

?

-- 
Kees Cook


* Re: [PATCH 4/4] x86,module: Detect CRn and DRn manipulation
  2020-04-07 22:12               ` Paolo Bonzini
@ 2020-04-07 23:51                 ` Nadav Amit
  2020-04-08  8:45                 ` Peter Zijlstra
  1 sibling, 0 replies; 74+ messages in thread
From: Nadav Amit @ 2020-04-07 23:51 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: Peter Zijlstra, Thomas Gleixner, LKML, hch, Sean Christopherson,
	mingo, bp, hpa, x86, Kenneth R. Crudup, Jessica Yu,
	Rasmus Villemoes, Fenghua Yu, Xiaoyao Li, Thomas Hellstrom,
	Tony Luck, Steven Rostedt, Greg Kroah-Hartman, jannh, keescook,
	David.Laight, Doug Covelli, mhiramat

> On Apr 7, 2020, at 3:12 PM, Paolo Bonzini <pbonzini@redhat.com> wrote:
> 
> On 07/04/20 23:27, Peter Zijlstra wrote:
>> On Tue, Apr 07, 2020 at 02:22:11PM -0700, Nadav Amit wrote:
>>> Anyhow, I do not think it is the only use-case which is not covered by your
>>> patches (even considering CRs/DRs alone). For example, there is no kernel
>>> function to turn on CR4.VMXE, which is required to run hypervisors on x86.
>> That needs an exported function; there is no way we'll allow random
>> writes to CR4, there's too much dodgy stuff in there.
> 
> native_write_cr4 and pv_ops (through which you can do write_cr4) are
> both exported, and so is cpu_tlbstate which is used by __cr4_set_bits
> and friends.  Am I missing something glaringly obvious?

No, I was the one who missed the obvious thing.

Having said that, I still think there are additional cases that need to be
handled. For instance, I see that is_erratum_383() in KVM (AMD) flushes the
local TLB by writing the previous value back to CR3. I am not familiar with the
erratum. Maybe I am missing something again, but I do not see an appropriate
exported alternative in the kernel.



* Re: [PATCH 3/4] x86,module: Detect VMX vs SLD conflicts
  2020-04-07 17:16     ` Andrew Cooper
@ 2020-04-07 23:59       ` Masami Hiramatsu
  0 siblings, 0 replies; 74+ messages in thread
From: Masami Hiramatsu @ 2020-04-07 23:59 UTC (permalink / raw)
  To: Andrew Cooper
  Cc: Peter Zijlstra, tglx, linux-kernel, hch, sean.j.christopherson,
	mingo, bp, hpa, x86, kenny, jeyu, rasmus.villemoes, pbonzini,
	fenghua.yu, xiaoyao.li, nadav.amit, thellstrom, tony.luck,
	rostedt, gregkh, jannh, keescook, David.Laight, dcovelli

Hi Andrew,

On Tue, 7 Apr 2020 18:16:58 +0100
Andrew Cooper <andrew.cooper3@citrix.com> wrote:

> On 07/04/2020 17:51, Masami Hiramatsu wrote:
> > diff --git a/tools/arch/x86/tools/gen-insn-attr-x86.awk b/tools/arch/x86/tools/gen-insn-attr-x86.awk
> > index d74d9e605723..ade80796453c 100644
> > --- a/tools/arch/x86/tools/gen-insn-attr-x86.awk
> > +++ b/tools/arch/x86/tools/gen-insn-attr-x86.awk
> > @@ -70,6 +70,8 @@ BEGIN {
> >  	mmx_expr = "^(emms|fxsave|fxrstor|ldmxcsr|stmxcsr)" # MMX/SSE nmemonics lacking operands
> >  	fpu_expr = "^x87"
> >  
> > +	vmx_expr = "^VM.*" # All mnemonics starting with "VM" are VMX instructions
> 
> Not really.
> 
> VMMCALL, VMLOAD, VMSAVE and VMRUN are SVM instructions.

Here 'VMX' would also include the SVM instructions. Would we need to distinguish them in this context?
(Or INAT_VIRT might be politically correct :) )

> VMASKMOV is a AVX instruction.

Good point. That instruction is written in lowercase "vmaskmov" in x86-opcode-map.txt.
(Maybe it is better to note it in x86-opcode-map.txt)
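
Andrew's over-matching point can be sketched in user space (hedged; the table
contents are illustrative, not the kernel's actual INAT data): a bare "^VM"
mnemonic pattern also catches the SVM instructions, while the AVX vmaskmov*
forms evade it only because the opcode map spells them in lowercase.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/*
 * Illustrative allow-list of VMX mnemonics. A "^VM" pattern would also
 * match SVM instructions (VMMCALL, VMLOAD, VMSAVE, VMRUN); an explicit
 * list (or a dedicated INAT attribute) avoids that.
 */
static const char * const vmx_mnemonics[] = {
	"VMCALL", "VMLAUNCH", "VMRESUME", "VMXOFF", "VMXON", "VMFUNC",
	"VMPTRLD", "VMPTRST", "VMCLEAR", "VMREAD", "VMWRITE",
};

static bool mnemonic_is_vmx(const char *m)
{
	for (size_t i = 0; i < sizeof(vmx_mnemonics) / sizeof(vmx_mnemonics[0]); i++)
		if (strcmp(m, vmx_mnemonics[i]) == 0)
			return true;
	return false;
}

/* Roughly what the awk pattern "^VM.*" matches. */
static bool mnemonic_matches_vm_prefix(const char *m)
{
	return strncmp(m, "VM", 2) == 0;
}
```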

Thank you,

-- 
Masami Hiramatsu <mhiramat@kernel.org>


* Re: [PATCH 4/4] x86,module: Detect CRn and DRn manipulation
  2020-04-07 23:15             ` Andrew Cooper
@ 2020-04-08  0:22               ` Paolo Bonzini
  2020-04-08  8:37                 ` Peter Zijlstra
  2020-04-08  9:52                 ` Andrew Cooper
  0 siblings, 2 replies; 74+ messages in thread
From: Paolo Bonzini @ 2020-04-08  0:22 UTC (permalink / raw)
  To: Andrew Cooper, Nadav Amit, Peter Zijlstra
  Cc: Thomas Gleixner, LKML, hch, Sean Christopherson, mingo, bp, hpa,
	x86, Kenneth R. Crudup, Jessica Yu, Rasmus Villemoes, Fenghua Yu,
	Xiaoyao Li, Thomas Hellstrom, Tony Luck, Steven Rostedt,
	Greg Kroah-Hartman, jannh, keescook, David.Laight, Doug Covelli,
	mhiramat

On 08/04/20 01:15, Andrew Cooper wrote:
>> Anyhow, I do not think it is the only use-case which is not covered by your
>> patches (even considering CRs/DRs alone). For example, there is no kernel
>> function to turn on CR4.VMXE, which is required to run hypervisors on x86.
> How about taking this opportunity to see if there is a way to improve on
> the status quo for co-existing hypervisor modules?

Almost serious question: why?  I can understand VMware, but why can't at
least VirtualBox use KVM on Linux?  I am not sure if they are still
running device emulation in ring zero, but if so do you really want to
do that these days?

Paolo



* Re: [PATCH 3/4] x86,module: Detect VMX vs SLD conflicts
  2020-04-07 11:02 ` [PATCH 3/4] x86,module: Detect VMX vs SLD conflicts Peter Zijlstra
                     ` (3 preceding siblings ...)
  2020-04-07 21:25   ` David Laight
@ 2020-04-08  2:10   ` Xiaoyao Li
  2020-04-08  8:09   ` Masami Hiramatsu
  5 siblings, 0 replies; 74+ messages in thread
From: Xiaoyao Li @ 2020-04-08  2:10 UTC (permalink / raw)
  To: Peter Zijlstra, tglx, linux-kernel
  Cc: hch, sean.j.christopherson, mingo, bp, hpa, x86, kenny, jeyu,
	rasmus.villemoes, pbonzini, fenghua.yu, nadav.amit, thellstrom,
	tony.luck, rostedt, gregkh, jannh, keescook, David.Laight,
	dcovelli, mhiramat

On 4/7/2020 7:02 PM, Peter Zijlstra wrote:
> It turns out that with Split-Lock-Detect enabled (default) any VMX
> hypervisor needs at least a little modification in order to not blindly
> inject the #AC into the guest without the guest being ready for it.
> 
> Since there is no telling which module implements a hypervisor, scan
> all out-of-tree modules' text and look for VMX instructions and refuse
> to load it when SLD is enabled (default) and the module isn't marked
> 'sld_safe'.
> 
> Hypervisors, which have been modified and are known to work correctly,
> can add:
> 
>    MODULE_INFO(sld_safe, "Y");
> 
> to explicitly tell the module loader they're good.
> 

I'm thinking that this helps nothing other than telling any possible 
hypervisor "we have a new feature, SLD, enabled in the kernel, but you 
seem not to be aware of it. To avoid breaking you and your VMs, you are 
not allowed to be loaded. Please declare sld_safe as your assurance to 
get approval".

It's actually a bug/issue in the hypervisor, and it does no harm to the 
(host) kernel. We can just leave it to the hypervisor developers to fix 
the bug in their hypervisor.

If we go the way of this patch, then whenever someone reports a 
similar bug due to a new feature introduced and enabled in the future, 
do we add a new xxx_safe module info?



* Re: [PATCH 4/4] x86,module: Detect CRn and DRn manipulation
  2020-04-07 21:27             ` Peter Zijlstra
  2020-04-07 22:12               ` Paolo Bonzini
@ 2020-04-08  5:18               ` Christoph Hellwig
  1 sibling, 0 replies; 74+ messages in thread
From: Christoph Hellwig @ 2020-04-08  5:18 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Nadav Amit, Thomas Gleixner, LKML, hch, Sean Christopherson,
	mingo, bp, hpa, x86, Kenneth R. Crudup, Jessica Yu,
	Rasmus Villemoes, Paolo Bonzini, Fenghua Yu, Xiaoyao Li,
	Thomas Hellstrom, Tony Luck, Steven Rostedt, Greg Kroah-Hartman,
	jannh, keescook, David.Laight, Doug Covelli, mhiramat

On Tue, Apr 07, 2020 at 11:27:54PM +0200, Peter Zijlstra wrote:
> On Tue, Apr 07, 2020 at 02:22:11PM -0700, Nadav Amit wrote:
> > Anyhow, I do not think it is the only use-case which is not covered by your
> > patches (even considering CRs/DRs alone). For example, there is no kernel
> > function to turn on CR4.VMXE, which is required to run hypervisors on x86.
> 
> That needs an exported function; there is no way we'll allow random
> writes to CR4, there's too much dodgy stuff in there.

And this clearly shows why trying to cater to anyone doing hardware
virt out of tree is just a disaster, and why we need to stick to our
traditional line that out-of-tree modules don't matter: if you care
about your module, bring it upstream.  Especially as we have a perfectly
fine upstream module for just about every variant of hardware
virtualization that can be extended for the needs of other hypervisors.


* Re: [PATCH 4/4] x86,module: Detect CRn and DRn manipulation
  2020-04-07 21:48   ` Steven Rostedt
@ 2020-04-08  5:58     ` Jan Kiszka
  2020-04-08  8:03       ` Paolo Bonzini
  2020-04-08  8:51       ` Peter Zijlstra
  0 siblings, 2 replies; 74+ messages in thread
From: Jan Kiszka @ 2020-04-08  5:58 UTC (permalink / raw)
  To: Steven Rostedt, Peter Zijlstra
  Cc: tglx, linux-kernel, hch, sean.j.christopherson, mingo, bp, hpa,
	x86, kenny, jeyu, rasmus.villemoes, pbonzini, fenghua.yu,
	xiaoyao.li, nadav.amit, thellstrom, tony.luck, gregkh, jannh,
	keescook, David.Laight, dcovelli, mhiramat, Wolfgang Mauerer

On 07.04.20 23:48, Steven Rostedt wrote:
> On Tue, 07 Apr 2020 13:02:40 +0200
> Peter Zijlstra <peterz@infradead.org> wrote:
> 
>> Since we now have infrastructure to analyze module text, disallow
>> modules that write to CRn and DRn registers.
>>
>> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
>> ---
>>   arch/x86/kernel/module.c |   21 +++++++++++++++++++++
>>   1 file changed, 21 insertions(+)
>>
>> --- a/arch/x86/kernel/module.c
>> +++ b/arch/x86/kernel/module.c
>> @@ -266,6 +266,22 @@ static bool insn_is_vmx(struct insn *ins
>>   	return false;
>>   }
>>   
>> +static bool insn_is_mov_CRn(struct insn *insn)
>> +{
>> +	if (insn->opcode.bytes[0] == 0x0f && insn->opcode.bytes[1] == 0x22)
>> +		return true;
>> +
>> +	return false;
>> +}
>> +
>> +static bool insn_is_mov_DRn(struct insn *insn)
>> +{
>> +	if (insn->opcode.bytes[0] == 0x0f && insn->opcode.bytes[1] == 0x23)
>> +		return true;
>> +
>> +	return false;
>> +}
>> +
>>   static int decode_module(struct module *mod, void *text, void *text_end, bool sld_safe)
>>   {
>>   	bool allow_vmx = sld_safe || !split_lock_enabled();
>> @@ -285,6 +301,11 @@ static int decode_module(struct module *
>>   			return -ENOEXEC;
>>   		}
>>   
>> +		if (insn_is_mov_CRn(&insn) || insn_is_mov_DRn(&insn)) {
>> +			pr_err("Module writes to CRn or DRn, please use the proper accessors: %s\n", mod->name);
>> +			return -ENOEXEC;
>> +		}
> 
> Hmm, won't this break jailhouse?

Yes, possibly. We load the hypervisor binary via request_firmware into 
executable memory and then jump into it. So most of the "suspicious" 
code is there - except two cr4_init_shadow() calls to propagate the 
non-transparent update of VMXE into that shadow. We could hide that CR4 
flag, but that could mislead root Linux to try to use VMX while in jail.

Jan

-- 
Siemens AG, Corporate Technology, CT RDA IOT SES-DE
Corporate Competence Center Embedded Linux


* Re: [PATCH 3/4] x86,module: Detect VMX vs SLD conflicts
  2020-04-07 16:51   ` Masami Hiramatsu
  2020-04-07 17:16     ` Andrew Cooper
@ 2020-04-08  7:25     ` Masami Hiramatsu
  1 sibling, 0 replies; 74+ messages in thread
From: Masami Hiramatsu @ 2020-04-08  7:25 UTC (permalink / raw)
  To: Masami Hiramatsu
  Cc: Peter Zijlstra, tglx, linux-kernel, hch, sean.j.christopherson,
	mingo, bp, hpa, x86, kenny, jeyu, rasmus.villemoes, pbonzini,
	fenghua.yu, xiaoyao.li, nadav.amit, thellstrom, tony.luck,
	rostedt, gregkh, jannh, keescook, David.Laight, dcovelli

On Wed, 8 Apr 2020 01:51:24 +0900
Masami Hiramatsu <mhiramat@kernel.org> wrote:

> Hi Peter,
> 
> On Tue, 07 Apr 2020 13:02:39 +0200
> Peter Zijlstra <peterz@infradead.org> wrote:
> 
> > +static bool insn_is_vmx(struct insn *insn)
> > +{
> > +	u8 modrm = insn->modrm.bytes[0];
> > +	u8 modrm_mod = X86_MODRM_MOD(modrm);
> > +	u8 modrm_reg = X86_MODRM_REG(modrm);
> > +
> > +	u8 prefix = insn->prefixes.bytes[0];
> > +
> > +	if (insn->opcode.bytes[0] != 0x0f)
> > +		return false;
> > +
> > +	switch (insn->opcode.bytes[1]) {
> > +	case 0x01:
> > +		switch (insn->opcode.bytes[2]) {
> > +		case 0xc1: /* VMCALL */
> > +		case 0xc2: /* VMLAUNCH */
> > +		case 0xc3: /* VMRESUME */
> > +		case 0xc4: /* VMXOFF */
> > +			return true;
> > +
> > +		default:
> > +			break;
> > +		}
> > +		break;
> > +
> > +	case 0x78: /* VMREAD */
> > +	case 0x79: /* VMWRITE */
> > +		return true;
> > +
> > +	case 0xc7:
> > +		/* VMPTRLD, VMPTRST, VMCLEAR, VMXON */
> > +		if (modrm_mod == 0x03)
> > +			break;
> > +
> > +		if ((modrm_reg == 6 && (!prefix || prefix == 0x66 || prefix == 0xf3)) ||
> > +		    (modrm_reg == 7 && (!prefix || prefix == 0xf3)))
> > +			return true;
> > +
> > +		break;
> > +
> > +	default:
> > +		break;
> > +	}
> > +
> > +	return false;
> > +}
> 
> OK, so here is what you need ;)
> 
> From 36f4f6aec623b0190fde95c8630a6a1d8c23ffc9 Mon Sep 17 00:00:00 2001
> From: Masami Hiramatsu <mhiramat@kernel.org>
> Date: Wed, 8 Apr 2020 01:04:41 +0900
> Subject: [PATCH] x86: insn: Add insn_is_vmx()
> 
> Add insn_is_vmx() to identify the given instruction is
> for VMX or not. This is simply identifying those instructions
> by mnemonic pattern.
> 

Hmm, I found that this is still not enough... since the inat_tables
mix the instruction attributes in the same entry of the group tables.
They distinguish opcodes by last-prefix variants, but not by the
MOD & R/M bits, since they are designed only for decoding instructions.

Thank you,

-- 
Masami Hiramatsu <mhiramat@kernel.org>


* Re: [PATCH 4/4] x86,module: Detect CRn and DRn manipulation
  2020-04-08  5:58     ` Jan Kiszka
@ 2020-04-08  8:03       ` Paolo Bonzini
  2020-04-08  8:58         ` Jan Kiszka
  2020-04-08  8:51       ` Peter Zijlstra
  1 sibling, 1 reply; 74+ messages in thread
From: Paolo Bonzini @ 2020-04-08  8:03 UTC (permalink / raw)
  To: Jan Kiszka, Steven Rostedt, Peter Zijlstra
  Cc: tglx, linux-kernel, hch, sean.j.christopherson, mingo, bp, hpa,
	x86, kenny, jeyu, rasmus.villemoes, fenghua.yu, xiaoyao.li,
	nadav.amit, thellstrom, tony.luck, gregkh, jannh, keescook,
	David.Laight, dcovelli, mhiramat, Wolfgang Mauerer

On 08/04/20 07:58, Jan Kiszka wrote:
>>>
>>>   +        if (insn_is_mov_CRn(&insn) || insn_is_mov_DRn(&insn)) {
>>> +            pr_err("Module writes to CRn or DRn, please use the
>>> proper accessors: %s\n", mod->name);
>>> +            return -ENOEXEC;
>>> +        }
>>
>>> Hmm, won't this break jailhouse?
> 
> Yes, possibly. We load the hypervisor binary via request_firmware into
> executable memory and then jump into it. So most of the "suspicious"
> code is there - except two cr4_init_shadow() calls to propagate the
> non-transparent update of VMXE into that shadow. We could hide that CR4
> flag, but that could mislead root Linux to try to use VMX while in jail.

Why not contribute the Jailhouse loader into Linux?

Paolo



* Re: [PATCH 3/4] x86,module: Detect VMX vs SLD conflicts
  2020-04-07 11:02 ` [PATCH 3/4] x86,module: Detect VMX vs SLD conflicts Peter Zijlstra
                     ` (4 preceding siblings ...)
  2020-04-08  2:10   ` Xiaoyao Li
@ 2020-04-08  8:09   ` Masami Hiramatsu
  2020-04-08  9:56     ` Peter Zijlstra
  5 siblings, 1 reply; 74+ messages in thread
From: Masami Hiramatsu @ 2020-04-08  8:09 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: tglx, linux-kernel, hch, sean.j.christopherson, mingo, bp, hpa,
	x86, kenny, jeyu, rasmus.villemoes, pbonzini, fenghua.yu,
	xiaoyao.li, nadav.amit, thellstrom, tony.luck, rostedt, gregkh,
	jannh, keescook, David.Laight, dcovelli, mhiramat

On Tue, 07 Apr 2020 13:02:39 +0200
Peter Zijlstra <peterz@infradead.org> wrote:

> +static bool insn_is_vmx(struct insn *insn)
> +{
> +	u8 modrm = insn->modrm.bytes[0];
> +	u8 modrm_mod = X86_MODRM_MOD(modrm);
> +	u8 modrm_reg = X86_MODRM_REG(modrm);
> +
> +	u8 prefix = insn->prefixes.bytes[0];

This should be the last prefix,

	u8 prefix = insn->prefixes.bytes[3];

(The last prefix is always copied to bytes[3].)

> +
> +	if (insn->opcode.bytes[0] != 0x0f)
> +		return false;
> +
> +	switch (insn->opcode.bytes[1]) {
> +	case 0x01:
> +		switch (insn->opcode.bytes[2]) {

Sorry, VMCALL etc. are in Grp7 (0f 01); the third opcode byte is embedded
in the ModRM byte instead of the opcode. Thus it should be,

		switch (insn->modrm.value) {

> +		case 0xc1: /* VMCALL */
> +		case 0xc2: /* VMLAUNCH */
> +		case 0xc3: /* VMRESUME */
> +		case 0xc4: /* VMXOFF */

		case 0xd4:	/* VMFUNC */

> +			return true;
> +
> +		default:
> +			break;
> +		}
> +		break;
> +
> +	case 0x78: /* VMREAD */
> +	case 0x79: /* VMWRITE */

		return !insn_is_evex(insn);

With an EVEX prefix, these become vcvt* instructions.

> +	case 0xc7:
> +		/* VMPTRLD, VMPTRST, VMCLEAR, VMXON */
> +		if (modrm_mod == 0x03)
> +			break;
> +
> +		if ((modrm_reg == 6 && (!prefix || prefix == 0x66 || prefix == 0xf3)) ||
> +		    (modrm_reg == 7 && (!prefix || prefix == 0xf3)))
> +			return true;
> +
> +		break;
> +
> +	default:
> +		break;
> +	}
> +
> +	return false;
> +}
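
Folding these corrections together, a hedged user-space sketch of the check
might look like the following. The struct below only mimics the fields of the
kernel's 'struct insn' that the check needs (with an 'evex' flag standing in
for insn_is_evex()); it is not the real decoder.

```c
#include <stdbool.h>
#include <stdint.h>

/* Minimal stand-in for the fields of the kernel's 'struct insn' used here. */
struct mini_insn {
	uint8_t opcode[2];	/* opcode escape bytes after any prefixes */
	uint8_t modrm;		/* ModRM byte (0 if none)		  */
	uint8_t last_prefix;	/* last legacy prefix, prefixes.bytes[3]  */
	bool evex;		/* EVEX-encoded instruction		  */
};

static bool insn_is_vmx(const struct mini_insn *insn)
{
	uint8_t modrm_mod = (insn->modrm >> 6) & 0x3;
	uint8_t modrm_reg = (insn->modrm >> 3) & 0x7;
	uint8_t prefix = insn->last_prefix;

	if (insn->opcode[0] != 0x0f)
		return false;

	switch (insn->opcode[1]) {
	case 0x01:
		/* Grp7: the sub-opcode is the whole ModRM byte. */
		switch (insn->modrm) {
		case 0xc1: /* VMCALL */
		case 0xc2: /* VMLAUNCH */
		case 0xc3: /* VMRESUME */
		case 0xc4: /* VMXOFF */
		case 0xd4: /* VMFUNC */
			return true;
		}
		break;

	case 0x78: /* VMREAD */
	case 0x79: /* VMWRITE */
		/* With an EVEX prefix these encode vcvt* instructions. */
		return !insn->evex;

	case 0xc7:
		/* VMPTRLD, VMPTRST, VMCLEAR, VMXON: memory forms only. */
		if (modrm_mod == 0x3)
			break;
		if ((modrm_reg == 6 && (!prefix || prefix == 0x66 || prefix == 0xf3)) ||
		    (modrm_reg == 7 && (!prefix || prefix == 0xf3)))
			return true;
		break;
	}

	return false;
}
```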


Thank you,



-- 
Masami Hiramatsu <mhiramat@kernel.org>


* Re: [PATCH 4/4] x86,module: Detect CRn and DRn manipulation
  2020-04-08  0:22               ` Paolo Bonzini
@ 2020-04-08  8:37                 ` Peter Zijlstra
  2020-04-08  9:52                 ` Andrew Cooper
  1 sibling, 0 replies; 74+ messages in thread
From: Peter Zijlstra @ 2020-04-08  8:37 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: Andrew Cooper, Nadav Amit, Thomas Gleixner, LKML, hch,
	Sean Christopherson, mingo, bp, hpa, x86, Kenneth R. Crudup,
	Jessica Yu, Rasmus Villemoes, Fenghua Yu, Xiaoyao Li,
	Thomas Hellstrom, Tony Luck, Steven Rostedt, Greg Kroah-Hartman,
	jannh, keescook, David.Laight, Doug Covelli, mhiramat

On Wed, Apr 08, 2020 at 02:22:45AM +0200, Paolo Bonzini wrote:
> On 08/04/20 01:15, Andrew Cooper wrote:
> >> Anyhow, I do not think it is the only use-case which is not covered by your
> >> patches (even considering CRs/DRs alone). For example, there is no kernel
> >> function to turn on CR4.VMXE, which is required to run hypervisors on x86.
> > How about taking this opportunity to see if there is a way to improve on
> > the status quo for co-existing hypervisor modules?
> 
> Almost serious question: why?  I can understand VMware, but why can't at
> least VirtualBox use KVM on Linux?  I am not sure if they are still
> running device emulation in ring zero, but if so do you really want to
> do that these days?

Having had the 'joy' of looking at the virtual-puke^Wbox code recently,
nobody with half a hair of sense on their head will want to ever touch
that thing again.

It's a security nightmare, and that's the best part of it.


* Re: [PATCH 4/4] x86,module: Detect CRn and DRn manipulation
  2020-04-07 22:12               ` Paolo Bonzini
  2020-04-07 23:51                 ` Nadav Amit
@ 2020-04-08  8:45                 ` Peter Zijlstra
  1 sibling, 0 replies; 74+ messages in thread
From: Peter Zijlstra @ 2020-04-08  8:45 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: Nadav Amit, Thomas Gleixner, LKML, hch, Sean Christopherson,
	mingo, bp, hpa, x86, Kenneth R. Crudup, Jessica Yu,
	Rasmus Villemoes, Fenghua Yu, Xiaoyao Li, Thomas Hellstrom,
	Tony Luck, Steven Rostedt, Greg Kroah-Hartman, jannh, keescook,
	David.Laight, Doug Covelli, mhiramat

On Wed, Apr 08, 2020 at 12:12:14AM +0200, Paolo Bonzini wrote:
> On 07/04/20 23:27, Peter Zijlstra wrote:
> > On Tue, Apr 07, 2020 at 02:22:11PM -0700, Nadav Amit wrote:
> >> Anyhow, I do not think it is the only use-case which is not covered by your
> >> patches (even considering CRs/DRs alone). For example, there is no kernel
> >> function to turn on CR4.VMXE, which is required to run hypervisors on x86.
> > That needs an exported function; there is no way we'll allow random
> > writes to CR4, there's too much dodgy stuff in there.
> 
> native_write_cr4 and pv_ops (through which you can do write_cr4) are
> both exported, and so is cpu_tlbstate which is used by __cr4_set_bits
> and friends.  Am I missing something glaringly obvious?

cpu_tlbstate is going away, but yes, native_write_cr4() is the right
interface to use, or rather the cr4_{set,clear,toggle}_bits() things
are.

That gives us control over which CR4 bits are available, and a possible
means of arbitrating that VMX bit.


* Re: [PATCH 4/4] x86,module: Detect CRn and DRn manipulation
  2020-04-08  5:58     ` Jan Kiszka
  2020-04-08  8:03       ` Paolo Bonzini
@ 2020-04-08  8:51       ` Peter Zijlstra
  2020-04-08  8:59         ` Jan Kiszka
  2020-04-08  9:13         ` Peter Zijlstra
  1 sibling, 2 replies; 74+ messages in thread
From: Peter Zijlstra @ 2020-04-08  8:51 UTC (permalink / raw)
  To: Jan Kiszka
  Cc: Steven Rostedt, tglx, linux-kernel, hch, sean.j.christopherson,
	mingo, bp, hpa, x86, kenny, jeyu, rasmus.villemoes, pbonzini,
	fenghua.yu, xiaoyao.li, nadav.amit, thellstrom, tony.luck,
	gregkh, jannh, keescook, David.Laight, dcovelli, mhiramat,
	Wolfgang Mauerer

On Wed, Apr 08, 2020 at 07:58:53AM +0200, Jan Kiszka wrote:
> On 07.04.20 23:48, Steven Rostedt wrote:

> > Hmm, won't this break jailhouse?

Breaking it isn't a problem, it's out of tree and it should be fixable.

> Yes, possibly. We load the hypervisor binary via request_firmware into
> executable memory and then jump into it. So most of the "suspicious" code is

W.T.H. does the firmware loader have the ability to give executable
memory? We need to kill that too. /me goes find.


* Re: [PATCH 4/4] x86,module: Detect CRn and DRn manipulation
  2020-04-08  8:03       ` Paolo Bonzini
@ 2020-04-08  8:58         ` Jan Kiszka
  2020-04-08  9:04           ` Paolo Bonzini
  0 siblings, 1 reply; 74+ messages in thread
From: Jan Kiszka @ 2020-04-08  8:58 UTC (permalink / raw)
  To: Paolo Bonzini, Steven Rostedt, Peter Zijlstra
  Cc: tglx, linux-kernel, hch, sean.j.christopherson, mingo, bp, hpa,
	x86, kenny, jeyu, rasmus.villemoes, fenghua.yu, xiaoyao.li,
	nadav.amit, thellstrom, tony.luck, gregkh, jannh, keescook,
	David.Laight, dcovelli, mhiramat, Wolfgang Mauerer

On 08.04.20 10:03, Paolo Bonzini wrote:
> On 08/04/20 07:58, Jan Kiszka wrote:
>>>>
>>>>    +        if (insn_is_mov_CRn(&insn) || insn_is_mov_DRn(&insn)) {
>>>> +            pr_err("Module writes to CRn or DRn, please use the
>>>> proper accessors: %s\n", mod->name);
>>>> +            return -ENOEXEC;
>>>> +        }
>>>
>>> Hmm, wont this break jailhouse?
>>
>> Yes, possibly. We load the hypervisor binary via request_firmware into
>> executable memory and then jump into it. So most of the "suspicious"
>> code is there - except two cr4_init_shadow() calls to propagate the
>> non-transparent update of VMXE into that shadow. We could hide that CR4
>> flag, but that could mislead root Linux to try to use VMX while in jail.
> 
> Why not contribute the Jailhouse loader into Linux?
> 

Definitely planned. But right now it would add the burden of managing 
the interface between loader and hypervisor carefully. Currently it is 
internal to Jailhouse and maintained in lock-step, without any backward 
compatibility.

Jan

-- 
Siemens AG, Corporate Technology, CT RDA IOT SES-DE
Corporate Competence Center Embedded Linux


* Re: [PATCH 4/4] x86,module: Detect CRn and DRn manipulation
  2020-04-08  8:51       ` Peter Zijlstra
@ 2020-04-08  8:59         ` Jan Kiszka
  2020-04-08  9:25           ` David Laight
  2020-04-08  9:13         ` Peter Zijlstra
  1 sibling, 1 reply; 74+ messages in thread
From: Jan Kiszka @ 2020-04-08  8:59 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Steven Rostedt, tglx, linux-kernel, hch, sean.j.christopherson,
	mingo, bp, hpa, x86, kenny, jeyu, rasmus.villemoes, pbonzini,
	fenghua.yu, xiaoyao.li, nadav.amit, thellstrom, tony.luck,
	gregkh, jannh, keescook, David.Laight, dcovelli, mhiramat,
	Wolfgang Mauerer

On 08.04.20 10:51, Peter Zijlstra wrote:
> On Wed, Apr 08, 2020 at 07:58:53AM +0200, Jan Kiszka wrote:
>> On 07.04.20 23:48, Steven Rostedt wrote:
> 
>>> Hmm, wont this break jailhouse?
> 
> Breaking it isn't a problem, it's out of tree and it should be fixable.
> 
>> Yes, possibly. We load the hypervisor binary via request_firmware into
>> executable memory and then jump into it. So most of the "suspicious" code is
> 
> W.T.H. does the firmware loader have the ability to give executable
> memory? We need to kill that too. /me goes find.
> 

At the risk of cutting our own branch off: That's not the firmware 
loader, it's ioremap with PAGE_KERNEL_EXEC.

Jan

-- 
Siemens AG, Corporate Technology, CT RDA IOT SES-DE
Corporate Competence Center Embedded Linux


* Re: [PATCH 4/4] x86,module: Detect CRn and DRn manipulation
  2020-04-08  8:58         ` Jan Kiszka
@ 2020-04-08  9:04           ` Paolo Bonzini
  2020-04-08 10:45             ` Jan Kiszka
  0 siblings, 1 reply; 74+ messages in thread
From: Paolo Bonzini @ 2020-04-08  9:04 UTC (permalink / raw)
  To: Jan Kiszka, Steven Rostedt, Peter Zijlstra
  Cc: tglx, linux-kernel, hch, sean.j.christopherson, mingo, bp, hpa,
	x86, kenny, jeyu, rasmus.villemoes, fenghua.yu, xiaoyao.li,
	nadav.amit, thellstrom, tony.luck, gregkh, jannh, keescook,
	David.Laight, dcovelli, mhiramat, Wolfgang Mauerer

On 08/04/20 10:58, Jan Kiszka wrote:
>> Why not contribute the Jailhouse loader into Linux?
> 
> Definitely planned. But right now it would add the burden of managing
> the interface between loader and hypervisor carefully. Currently it is
> internal to Jailhouse and maintained in lock-step, without any backward
> compatibility.

How often does that change?

Paolo



* Re: [PATCH 4/4] x86,module: Detect CRn and DRn manipulation
  2020-04-08  8:51       ` Peter Zijlstra
  2020-04-08  8:59         ` Jan Kiszka
@ 2020-04-08  9:13         ` Peter Zijlstra
  2020-04-08 10:50           ` Jan Kiszka
  1 sibling, 1 reply; 74+ messages in thread
From: Peter Zijlstra @ 2020-04-08  9:13 UTC (permalink / raw)
  To: Jan Kiszka
  Cc: Steven Rostedt, tglx, linux-kernel, hch, sean.j.christopherson,
	mingo, bp, hpa, x86, kenny, jeyu, rasmus.villemoes, pbonzini,
	fenghua.yu, xiaoyao.li, nadav.amit, thellstrom, tony.luck,
	gregkh, jannh, keescook, David.Laight, dcovelli, mhiramat,
	Wolfgang Mauerer

On Wed, Apr 08, 2020 at 10:51:38AM +0200, Peter Zijlstra wrote:
> On Wed, Apr 08, 2020 at 07:58:53AM +0200, Jan Kiszka wrote:
> > On 07.04.20 23:48, Steven Rostedt wrote:
> 
> > > Hmm, wont this break jailhouse?
> 
> Breaking it isn't a problem, it's out of tree and it should be fixable.
> 
> > Yes, possibly. We load the hypervisor binary via request_firmware into
> > executable memory and then jump into it. So most of the "suspicious" code is
> 
> W.T.H. does the firmware loader have the ability to give executable
> memory? We need to kill that too. /me goes find.

AFAICT the firmware loader only provides PAGE_KERNEL_RO, so how do you
get it executable?

I'm thinking the patches Christoph has lined up will take care of this.


* RE: [PATCH 4/4] x86,module: Detect CRn and DRn manipulation
  2020-04-08  8:59         ` Jan Kiszka
@ 2020-04-08  9:25           ` David Laight
  2020-04-08 11:13             ` Jan Kiszka
  0 siblings, 1 reply; 74+ messages in thread
From: David Laight @ 2020-04-08  9:25 UTC (permalink / raw)
  To: 'Jan Kiszka', Peter Zijlstra
  Cc: Steven Rostedt, tglx, linux-kernel, hch, sean.j.christopherson,
	mingo, bp, hpa, x86, kenny, jeyu, rasmus.villemoes, pbonzini,
	fenghua.yu, xiaoyao.li, nadav.amit, thellstrom, tony.luck,
	gregkh, jannh, keescook, dcovelli, mhiramat, Wolfgang Mauerer

From: Jan Kiszka
> Sent: 08 April 2020 09:59
...
> At the risk of cutting our own branch off: That's not the firmware
> loader, it's ioremap with PAGE_KERNEL_EXEC.

You could link the 'blob' into the .text part of a normal
kernel module and then load that.

	David

-
Registered Address Lakeside, Bramley Road, Mount Farm, Milton Keynes, MK1 1PT, UK
Registration No: 1397386 (Wales)


* Re: [PATCH 4/4] x86,module: Detect CRn and DRn manipulation
  2020-04-08  0:22               ` Paolo Bonzini
  2020-04-08  8:37                 ` Peter Zijlstra
@ 2020-04-08  9:52                 ` Andrew Cooper
  1 sibling, 0 replies; 74+ messages in thread
From: Andrew Cooper @ 2020-04-08  9:52 UTC (permalink / raw)
  To: Paolo Bonzini, Nadav Amit, Peter Zijlstra
  Cc: Thomas Gleixner, LKML, hch, Sean Christopherson, mingo, bp, hpa,
	x86, Kenneth R. Crudup, Jessica Yu, Rasmus Villemoes, Fenghua Yu,
	Xiaoyao Li, Thomas Hellstrom, Tony Luck, Steven Rostedt,
	Greg Kroah-Hartman, jannh, keescook, David.Laight, Doug Covelli,
	mhiramat

On 08/04/2020 01:22, Paolo Bonzini wrote:
> On 08/04/20 01:15, Andrew Cooper wrote:
>>> Anyhow, I do not think it is the only use-case which is not covered by your
>>> patches (even considering CRs/DRs alone). For example, there is no kernel
>>> function to turn on CR4.VMXE, which is required to run hypervisors on x86.
>> How about taking this opportunity to see if there is a way to improve on
>> the status quo for co-existing hypervisor modules?
> Almost serious question: why?  I can understand VMware, but why can't at
> least VirtualBox use KVM on Linux?  I am not sure if they are still
> running device emulation in ring zero, but if so do you really want to
> do that these days?

I see a lot of good reasons not to use the VirtualBox out-of-tree module
specifically, but there are plenty of other out-of-tree hypervisors,
including Jailhouse and Bareflank which come to mind.

I'm not suggesting bending over backwards for them, but at the point
you're already breaking all of them anyway, it seems silly not to try
and address some of the other robustness issues.

~Andrew


* Re: [PATCH 3/4] x86,module: Detect VMX vs SLD conflicts
  2020-04-08  8:09   ` Masami Hiramatsu
@ 2020-04-08  9:56     ` Peter Zijlstra
  2020-04-08 10:15       ` Andrew Cooper
  2020-04-10 11:25       ` Masami Hiramatsu
  0 siblings, 2 replies; 74+ messages in thread
From: Peter Zijlstra @ 2020-04-08  9:56 UTC (permalink / raw)
  To: Masami Hiramatsu
  Cc: tglx, linux-kernel, hch, sean.j.christopherson, mingo, bp, hpa,
	x86, kenny, jeyu, rasmus.villemoes, pbonzini, fenghua.yu,
	xiaoyao.li, nadav.amit, thellstrom, tony.luck, rostedt, gregkh,
	jannh, keescook, David.Laight, dcovelli

On Wed, Apr 08, 2020 at 05:09:34PM +0900, Masami Hiramatsu wrote:
> On Tue, 07 Apr 2020 13:02:39 +0200
> Peter Zijlstra <peterz@infradead.org> wrote:
> 
> > +static bool insn_is_vmx(struct insn *insn)
> > +{
> > +	u8 modrm = insn->modrm.bytes[0];
> > +	u8 modrm_mod = X86_MODRM_MOD(modrm);
> > +	u8 modrm_reg = X86_MODRM_REG(modrm);
> > +
> > +	u8 prefix = insn->prefixes.bytes[0];
> 
> This should be the last prefix,
> 
> 	u8 prefix = insn->prefixes.bytes[3];
> 
> (The last prefix always copied on the bytes[3])

And that is 0 on no-prefix, right?

> > +
> > +	if (insn->opcode.bytes[0] != 0x0f)
> > +		return false;
> > +
> > +	switch (insn->opcode.bytes[1]) {
> > +	case 0x01:
> > +		switch (insn->opcode.bytes[2]) {
> 
> Sorry, VMCALL etc. is in Grp7 (0f 01), the 3rd code is embedded
> in modrm instead of opcode. Thus it should be,
> 
> 		switch (insn->modrm.value) {

Indeed, I was hoping (I really should've checked) that that byte was
duplicated in opcodes.

Also, since I already have modrm = insn->modrm.bytes[0], I should
probably use that anyway.

> > +		case 0xc1: /* VMCALL */
> > +		case 0xc2: /* VMLAUNCH */
> > +		case 0xc3: /* VMRESUME */
> > +		case 0xc4: /* VMXOFF */
> 
> 		case 0xd4:	/* VMFUNC */

As per Andrew, VMCALL and VMFUNC are SMV, and I really only need VMX in
this case. Including SMV is probably harmless, but I'm thinking a
smaller function is better.

> > +			return true;
> > +
> > +		default:
> > +			break;
> > +		}
> > +		break;
> > +
> > +	case 0x78: /* VMREAD */
> > +	case 0x79: /* VMWRITE */
> 
> 		return !insn_is_evex(insn);
> 
> With EVEX prefix, these becomes vcvt* instructions.

Thanks!


* Re: [PATCH 3/4] x86,module: Detect VMX vs SLD conflicts
  2020-04-08  9:56     ` Peter Zijlstra
@ 2020-04-08 10:15       ` Andrew Cooper
  2020-04-10 11:25       ` Masami Hiramatsu
  1 sibling, 0 replies; 74+ messages in thread
From: Andrew Cooper @ 2020-04-08 10:15 UTC (permalink / raw)
  To: Peter Zijlstra, Masami Hiramatsu
  Cc: tglx, linux-kernel, hch, sean.j.christopherson, mingo, bp, hpa,
	x86, kenny, jeyu, rasmus.villemoes, pbonzini, fenghua.yu,
	xiaoyao.li, nadav.amit, thellstrom, tony.luck, rostedt, gregkh,
	jannh, keescook, David.Laight, dcovelli

On 08/04/2020 10:56, Peter Zijlstra wrote:
>>> +		case 0xc1: /* VMCALL */
>>> +		case 0xc2: /* VMLAUNCH */
>>> +		case 0xc3: /* VMRESUME */
>>> +		case 0xc4: /* VMXOFF */
>> 		case 0xd4:	/* VMFUNC */
> As per Andrew, VMCALL and VMFUNC are SMV, and I really only need VMX in
> this case. Including SMV is probably harmless, but I'm thinking a
> smaller function is better.

VMCALL and VMFUNC are both VMX instructions.  VMMCALL (count the M's -
yes it is a different instruction) is SVM, and I forgot VMGEXIT from the
list yesterday which is also SVM.

However, searching for them is probably not helpful.  They are all
guest-only instructions and you wouldn't expect to see them in
hypervisor code.

~Andrew


* Re: [PATCH 4/4] x86,module: Detect CRn and DRn manipulation
  2020-04-08  9:04           ` Paolo Bonzini
@ 2020-04-08 10:45             ` Jan Kiszka
  0 siblings, 0 replies; 74+ messages in thread
From: Jan Kiszka @ 2020-04-08 10:45 UTC (permalink / raw)
  To: Paolo Bonzini, Steven Rostedt, Peter Zijlstra
  Cc: tglx, linux-kernel, hch, sean.j.christopherson, mingo, bp, hpa,
	x86, kenny, jeyu, rasmus.villemoes, fenghua.yu, xiaoyao.li,
	nadav.amit, thellstrom, tony.luck, gregkh, jannh, keescook,
	David.Laight, dcovelli, mhiramat, Wolfgang Mauerer

On 08.04.20 11:04, Paolo Bonzini wrote:
> On 08/04/20 10:58, Jan Kiszka wrote:
>>> Why not contribute the Jailhouse loader into Linux?
>>
>> Definitely planned. But right now it would add the burden of managing
>> the interface between loader and hypervisor carefully. Currently it is
>> internal to Jailhouse and maintained in lock-step, without any backward
>> compatibility.
> 
> How often does that change?

Not often, but right now we are at a critical point, starting to explore 
booting Jailhouse before Linux [1]. That may actually help to settle the 
interface and move things forward.

Another to-do is overcoming the need to map the hypervisor at 
a fixed location into the kernel address space. Not needed on arm64, 
still required on 32-bit ARM (well...) and x86 (more important). I would 
dislike pushing such legacy to upstream.

Jan

[1] 
https://www.mail-archive.com/jailhouse-dev@googlegroups.com/msg08389.html

-- 
Siemens AG, Corporate Technology, CT RDA IOT SES-DE
Corporate Competence Center Embedded Linux


* Re: [PATCH 4/4] x86,module: Detect CRn and DRn manipulation
  2020-04-08  9:13         ` Peter Zijlstra
@ 2020-04-08 10:50           ` Jan Kiszka
  0 siblings, 0 replies; 74+ messages in thread
From: Jan Kiszka @ 2020-04-08 10:50 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Steven Rostedt, tglx, linux-kernel, hch, sean.j.christopherson,
	mingo, bp, hpa, x86, kenny, jeyu, rasmus.villemoes, pbonzini,
	fenghua.yu, xiaoyao.li, nadav.amit, thellstrom, tony.luck,
	gregkh, jannh, keescook, David.Laight, dcovelli, mhiramat,
	Wolfgang Mauerer

On 08.04.20 11:13, Peter Zijlstra wrote:
> On Wed, Apr 08, 2020 at 10:51:38AM +0200, Peter Zijlstra wrote:
>> On Wed, Apr 08, 2020 at 07:58:53AM +0200, Jan Kiszka wrote:
>>> On 07.04.20 23:48, Steven Rostedt wrote:
>>
>>>> Hmm, wont this break jailhouse?
>>
>> Breaking it isn't a problem, it's out of tree and it should be fixable.
>>
>>> Yes, possibly. We load the hypervisor binary via request_firmware into
>>> executable memory and then jump into it. So most of the "suspicious" code is
>>
>> W.T.H. does the firmware loader have the ability to give executable
>> memory? We need to kill that too. /me goes find.
> 
> AFAICT the firmware loader only provides PAGE_KERNEL_RO, so how do you
> get it executable?

memcpy(ioremapped_exec_region, firmware_image)

We only use the loader for getting the blob, not for running it. It has 
to be put at a location that Linux will lose control over anyway.

> 
> I'm thinking the patches Christoph has lined up will take care of this.
> 

It would make sense from a certain POV...

Jan

-- 
Siemens AG, Corporate Technology, CT RDA IOT SES-DE
Corporate Competence Center Embedded Linux


* Re: [PATCH 4/4] x86,module: Detect CRn and DRn manipulation
  2020-04-08  9:25           ` David Laight
@ 2020-04-08 11:13             ` Jan Kiszka
  2020-04-08 11:17               ` David Laight
  0 siblings, 1 reply; 74+ messages in thread
From: Jan Kiszka @ 2020-04-08 11:13 UTC (permalink / raw)
  To: David Laight, Peter Zijlstra
  Cc: Steven Rostedt, tglx, linux-kernel, hch, sean.j.christopherson,
	mingo, bp, hpa, x86, kenny, jeyu, rasmus.villemoes, pbonzini,
	fenghua.yu, xiaoyao.li, nadav.amit, thellstrom, tony.luck,
	gregkh, jannh, keescook, dcovelli, mhiramat, Wolfgang Mauerer

On 08.04.20 11:25, David Laight wrote:
> From: Jan Kiszka
>> Sent: 08 April 2020 09:59
> ...
>> At the risk of cutting our own branch off: That's not the firmware
>> loader, it's ioremap with PAGE_KERNEL_EXEC.
> 
> You could link the 'blob' into the .text part of a normal
> kernel module and then load that.

Sure, also possible. Was just more convenient so far to replace the 
hypervisor binary without having to recompile the driver module.

Jan

-- 
Siemens AG, Corporate Technology, CT RDA IOT SES-DE
Corporate Competence Center Embedded Linux


* RE: [PATCH 4/4] x86,module: Detect CRn and DRn manipulation
  2020-04-08 11:13             ` Jan Kiszka
@ 2020-04-08 11:17               ` David Laight
  0 siblings, 0 replies; 74+ messages in thread
From: David Laight @ 2020-04-08 11:17 UTC (permalink / raw)
  To: 'Jan Kiszka', Peter Zijlstra
  Cc: Steven Rostedt, tglx, linux-kernel, hch, sean.j.christopherson,
	mingo, bp, hpa, x86, kenny, jeyu, rasmus.villemoes, pbonzini,
	fenghua.yu, xiaoyao.li, nadav.amit, thellstrom, tony.luck,
	gregkh, jannh, keescook, dcovelli, mhiramat, Wolfgang Mauerer

From: Jan Kiszka 
> Sent: 08 April 2020 12:13

> 
> On 08.04.20 11:25, David Laight wrote:
> > From: Jan Kiszka
> >> Sent: 08 April 2020 09:59
> > ...
> >> At the risk of cutting our own branch off: That's not the firmware
> >> loader, it's ioremap with PAGE_KERNEL_EXEC.
> >
> > You could link the 'blob' into the .text part of a normal
> > kernel module and then load that.
> 
> Sure, also possible. Was just more convenient so far to replace the
> hypervisor binary without having to recompile the driver module.

I was thinking of a separate module that contained nothing else.
If it depended on the driver, its 'init' function could call back
into the driver code to pass an entrypoint array (etc).

You can easily do a version check at that point as well.

	David

-
Registered Address Lakeside, Bramley Road, Mount Farm, Milton Keynes, MK1 1PT, UK
Registration No: 1397386 (Wales)


* Re: [PATCH 4/4] x86,module: Detect CRn and DRn manipulation
  2020-04-07 11:02 ` [PATCH 4/4] x86,module: Detect CRn and DRn manipulation Peter Zijlstra
                     ` (2 preceding siblings ...)
  2020-04-07 21:48   ` Steven Rostedt
@ 2020-04-08 13:27   ` Steven Rostedt
  2020-04-08 15:44     ` Peter Zijlstra
  3 siblings, 1 reply; 74+ messages in thread
From: Steven Rostedt @ 2020-04-08 13:27 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: tglx, linux-kernel, hch, sean.j.christopherson, mingo, bp, hpa,
	x86, kenny, jeyu, rasmus.villemoes, pbonzini, fenghua.yu,
	xiaoyao.li, nadav.amit, thellstrom, tony.luck, gregkh, jannh,
	keescook, David.Laight, dcovelli, mhiramat

On Tue, 07 Apr 2020 13:02:40 +0200
Peter Zijlstra <peterz@infradead.org> wrote:

> Since we now have infrastructure to analyze module text, disallow
> modules that write to CRn and DRn registers.
> 
> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
> ---
>  arch/x86/kernel/module.c |   21 +++++++++++++++++++++
>  1 file changed, 21 insertions(+)
> 
> --- a/arch/x86/kernel/module.c
> +++ b/arch/x86/kernel/module.c
> @@ -266,6 +266,22 @@ static bool insn_is_vmx(struct insn *ins
>  	return false;
>  }
>  
> +static bool insn_is_mov_CRn(struct insn *insn)
> +{
> +	if (insn->opcode.bytes[0] == 0x0f && insn->opcode.bytes[1] == 0x22)
> +		return true;
> +
> +	return false;
> +}
> +
> +static bool insn_is_mov_DRn(struct insn *insn)
> +{
> +	if (insn->opcode.bytes[0] == 0x0f && insn->opcode.bytes[1] == 0x23)
> +		return true;
> +
> +	return false;
> +}
> +
>  static int decode_module(struct module *mod, void *text, void *text_end, bool sld_safe)
>  {
>  	bool allow_vmx = sld_safe || !split_lock_enabled();
> @@ -285,6 +301,11 @@ static int decode_module(struct module *
>  			return -ENOEXEC;
>  		}
>  
> +		if (insn_is_mov_CRn(&insn) || insn_is_mov_DRn(&insn)) {
> +			pr_err("Module writes to CRn or DRn, please use the proper accessors: %s\n", mod->name);
> +			return -ENOEXEC;
> +		}
> +

Something like this should be done for all modules, not just out of tree
modules.

-- Steve


* Re: [PATCH 4/4] x86,module: Detect CRn and DRn manipulation
  2020-04-08 13:27   ` Steven Rostedt
@ 2020-04-08 15:44     ` Peter Zijlstra
  2020-04-08 15:46       ` Christoph Hellwig
  2020-04-08 15:54       ` Jessica Yu
  0 siblings, 2 replies; 74+ messages in thread
From: Peter Zijlstra @ 2020-04-08 15:44 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: tglx, linux-kernel, hch, sean.j.christopherson, mingo, bp, hpa,
	x86, kenny, jeyu, rasmus.villemoes, pbonzini, fenghua.yu,
	xiaoyao.li, nadav.amit, thellstrom, tony.luck, gregkh, jannh,
	keescook, David.Laight, dcovelli, mhiramat

On Wed, Apr 08, 2020 at 09:27:26AM -0400, Steven Rostedt wrote:
> On Tue, 07 Apr 2020 13:02:40 +0200
> Peter Zijlstra <peterz@infradead.org> wrote:

> > +		if (insn_is_mov_CRn(&insn) || insn_is_mov_DRn(&insn)) {
> > +			pr_err("Module writes to CRn or DRn, please use the proper accessors: %s\n", mod->name);
> > +			return -ENOEXEC;
> > +		}
> > +
> 
> Something like this should be done for all modules, not just out of tree
> modules.

I'm all for it; but people were worried scanning all modules was too
expensive (I don't really believe it is, module loading just can't be a
hot-path). Also, in-tree modules are audited a lot more than out of tree
magic voodoo crap.


* Re: [PATCH 4/4] x86,module: Detect CRn and DRn manipulation
  2020-04-08 15:44     ` Peter Zijlstra
@ 2020-04-08 15:46       ` Christoph Hellwig
  2020-04-08 16:02         ` Sean Christopherson
  2020-04-08 16:15         ` Paolo Bonzini
  2020-04-08 15:54       ` Jessica Yu
  1 sibling, 2 replies; 74+ messages in thread
From: Christoph Hellwig @ 2020-04-08 15:46 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Steven Rostedt, tglx, linux-kernel, hch, sean.j.christopherson,
	mingo, bp, hpa, x86, kenny, jeyu, rasmus.villemoes, pbonzini,
	fenghua.yu, xiaoyao.li, nadav.amit, thellstrom, tony.luck,
	gregkh, jannh, keescook, David.Laight, dcovelli, mhiramat

On Wed, Apr 08, 2020 at 05:44:19PM +0200, Peter Zijlstra wrote:
> On Wed, Apr 08, 2020 at 09:27:26AM -0400, Steven Rostedt wrote:
> > On Tue, 07 Apr 2020 13:02:40 +0200
> > Peter Zijlstra <peterz@infradead.org> wrote:
> 
> > > +		if (insn_is_mov_CRn(&insn) || insn_is_mov_DRn(&insn)) {
> > > +			pr_err("Module writes to CRn or DRn, please use the proper accessors: %s\n", mod->name);
> > > +			return -ENOEXEC;
> > > +		}
> > > +
> > 
> > Something like this should be done for all modules, not just out of tree
> > modules.
> 
> I'm all for it; but people were worried scanning all modules was too
> expensive (I don't really believe it is, module loading just can't be a
> hot-path). Also, in-tree modules are audited a lot more than out of tree
> magic voodoo crap.

Scanning all modules seems safer.  While we're at it - can we move the
kvm bits using VMX to always be in the core kernel and just forbid
modules from using those instructions entirely?


* Re: [PATCH 4/4] x86,module: Detect CRn and DRn manipulation
  2020-04-08 15:44     ` Peter Zijlstra
  2020-04-08 15:46       ` Christoph Hellwig
@ 2020-04-08 15:54       ` Jessica Yu
  1 sibling, 0 replies; 74+ messages in thread
From: Jessica Yu @ 2020-04-08 15:54 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Steven Rostedt, tglx, linux-kernel, hch, sean.j.christopherson,
	mingo, bp, hpa, x86, kenny, rasmus.villemoes, pbonzini,
	fenghua.yu, xiaoyao.li, nadav.amit, thellstrom, tony.luck,
	gregkh, jannh, keescook, David.Laight, dcovelli, mhiramat

+++ Peter Zijlstra [08/04/20 17:44 +0200]:
>On Wed, Apr 08, 2020 at 09:27:26AM -0400, Steven Rostedt wrote:
>> On Tue, 07 Apr 2020 13:02:40 +0200
>> Peter Zijlstra <peterz@infradead.org> wrote:
>
>> > +		if (insn_is_mov_CRn(&insn) || insn_is_mov_DRn(&insn)) {
>> > +			pr_err("Module writes to CRn or DRn, please use the proper accessors: %s\n", mod->name);
>> > +			return -ENOEXEC;
>> > +		}
>> > +
>>
>> Something like this should be done for all modules, not just out of tree
>> modules.
>
>I'm all for it; but people were worried scanning all modules was too
>expensive (I don't really believe it is, module loading just can't be a
>hot-path). Also, in-tree modules are audited a lot more than out of tree
>magic voodoo crap.

The intention of the original patches was to do the text scan to catch
a handful of out-of-tree hypervisor modules - but now that
decode_module() is being generalized to more cases, I don't mind
scanning all modules.

Thanks,

Jessica


* Re: [PATCH 4/4] x86,module: Detect CRn and DRn manipulation
  2020-04-08 15:46       ` Christoph Hellwig
@ 2020-04-08 16:02         ` Sean Christopherson
  2020-04-08 16:15         ` Paolo Bonzini
  1 sibling, 0 replies; 74+ messages in thread
From: Sean Christopherson @ 2020-04-08 16:02 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Peter Zijlstra, Steven Rostedt, tglx, linux-kernel, mingo, bp,
	hpa, x86, kenny, jeyu, rasmus.villemoes, pbonzini, fenghua.yu,
	xiaoyao.li, nadav.amit, thellstrom, tony.luck, gregkh, jannh,
	keescook, David.Laight, dcovelli, mhiramat

On Wed, Apr 08, 2020 at 08:46:02AM -0700, Christoph Hellwig wrote:
> On Wed, Apr 08, 2020 at 05:44:19PM +0200, Peter Zijlstra wrote:
> > On Wed, Apr 08, 2020 at 09:27:26AM -0400, Steven Rostedt wrote:
> > > On Tue, 07 Apr 2020 13:02:40 +0200
> > > Peter Zijlstra <peterz@infradead.org> wrote:
> > 
> > > > +		if (insn_is_mov_CRn(&insn) || insn_is_mov_DRn(&insn)) {
> > > > +			pr_err("Module writes to CRn or DRn, please use the proper accessors: %s\n", mod->name);
> > > > +			return -ENOEXEC;
> > > > +		}
> > > > +
> > > 
> > > Something like this should be done for all modules, not just out of tree
> > > modules.
> > 
> > I'm all for it; but people were worried scanning all modules was too
> > expensive (I don't really believe it is, module loading just can't be a
> > hot-path). Also, in-tree modules are audited a lot more than out of tree
> > magic voodoo crap.
> 
> Scanning all modules seems safer.  While we're at it - can be move the
> kvm bits using VMX to be always in the core kernel and just forbid
> modules from using those instructions entirely?

Practically speaking, no.  Turning VMX on and off (literally VMXON/VMXOFF)
could be moved to helpers in the kernel, but KVM relies on inlining all
post-VMXON instructions (except for VMLAUNCH/VMRESUME) for performance.
VMLAUNCH/VMRESUME have their own caveats, moving them out of KVM would be
messy, to say the least.


* Re: [PATCH 4/4] x86,module: Detect CRn and DRn manipulation
  2020-04-08 15:46       ` Christoph Hellwig
  2020-04-08 16:02         ` Sean Christopherson
@ 2020-04-08 16:15         ` Paolo Bonzini
  2020-04-09  8:56           ` Peter Zijlstra
  1 sibling, 1 reply; 74+ messages in thread
From: Paolo Bonzini @ 2020-04-08 16:15 UTC (permalink / raw)
  To: Christoph Hellwig, Peter Zijlstra
  Cc: Steven Rostedt, tglx, linux-kernel, sean.j.christopherson, mingo,
	bp, hpa, x86, kenny, jeyu, rasmus.villemoes, fenghua.yu,
	xiaoyao.li, nadav.amit, thellstrom, tony.luck, gregkh, jannh,
	keescook, David.Laight, dcovelli, mhiramat

On 08/04/20 17:46, Christoph Hellwig wrote:
> On Wed, Apr 08, 2020 at 05:44:19PM +0200, Peter Zijlstra wrote:
>> On Wed, Apr 08, 2020 at 09:27:26AM -0400, Steven Rostedt wrote:
>>> On Tue, 07 Apr 2020 13:02:40 +0200
>>> Peter Zijlstra <peterz@infradead.org> wrote:
>>
>>>> +		if (insn_is_mov_CRn(&insn) || insn_is_mov_DRn(&insn)) {
>>>> +			pr_err("Module writes to CRn or DRn, please use the proper accessors: %s\n", mod->name);
>>>> +			return -ENOEXEC;
>>>> +		}
>>>> +
>>>
>>> Something like this should be done for all modules, not just out of tree
>>> modules.
>>
>> I'm all for it; but people were worried scanning all modules was too
>> expensive (I don't really believe it is, module loading just can't be a
>> hot-path). Also, in-tree modules are audited a lot more than out of tree
>> magic voodoo crap.
> 
> Scanning all modules seems safer.  While we're at it - can be move the
> kvm bits using VMX to be always in the core kernel and just forbid
> modules from using those instructions entirely?

I suppose we could use PVOPS-style patching for the more
performance-critical cases, but VMREAD/VMWRITE does not seem like a
particularly bad thing to allow modules and VMLAUNCH/VMRESUME have very
peculiar calling conventions around them.

However, I wouldn't mind it if VMCLEAR/VMPTRLD and the associated kdump
cleanup code were moved to core kernel code.

Paolo



* Re: [PATCH 4/4] x86,module: Detect CRn and DRn manipulation
  2020-04-08 16:15         ` Paolo Bonzini
@ 2020-04-09  8:56           ` Peter Zijlstra
  2020-04-09 10:13             ` Nadav Amit
  0 siblings, 1 reply; 74+ messages in thread
From: Peter Zijlstra @ 2020-04-09  8:56 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: Christoph Hellwig, Steven Rostedt, tglx, linux-kernel,
	sean.j.christopherson, mingo, bp, hpa, x86, kenny, jeyu,
	rasmus.villemoes, fenghua.yu, xiaoyao.li, nadav.amit, thellstrom,
	tony.luck, gregkh, jannh, keescook, David.Laight, dcovelli,
	mhiramat

On Wed, Apr 08, 2020 at 06:15:48PM +0200, Paolo Bonzini wrote:
> On 08/04/20 17:46, Christoph Hellwig wrote:

> > Scanning all modules seems safer.  While we're at it - can be move the
> > kvm bits using VMX to be always in the core kernel and just forbid
> > modules from using those instructions entirely?
> 
> I suppose we could use PVOPS-style patching for the more
> performance-critical cases, but VMREAD/VMWRITE does not seem like a
> particularly bad thing to allow modules and VMLAUNCH/VMRESUME have very
> peculiar calling conventions around them.
> 
> However, I wouldn't mind it if VMCLEAR/VMPTRLD and the associated kdump
> cleanup code were moved to core kernel code.

Speaking with my virt ignorance hat on, how impossible is it to provide
generic/useful VMLAUNCH/VMRESUME wrappers?

Because a lot of what happens around VMEXIT/VMENTER is very much like
the userspace entry crud, as per that series from Thomas that fixes all
that. And surely we don't need various broken copies of that in all the
out-of-tree hypervisors.

Also, I suppose if you have this, we no longer need to exempt CR2.



* Re: [PATCH 4/4] x86,module: Detect CRn and DRn manipulation
  2020-04-09  8:56           ` Peter Zijlstra
@ 2020-04-09 10:13             ` Nadav Amit
  2020-04-09 21:13               ` Thomas Gleixner
  0 siblings, 1 reply; 74+ messages in thread
From: Nadav Amit @ 2020-04-09 10:13 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Paolo Bonzini, Christoph Hellwig, Steven Rostedt,
	Thomas Gleixner, LKML, Sean Christopherson, mingo, bp, hpa, x86,
	kenny, jeyu, rasmus.villemoes, fenghua.yu, xiaoyao.li,
	thellstrom, tony.luck, gregkh, jannh, keescook, David.Laight,
	dcovelli, mhiramat

> On Apr 9, 2020, at 1:56 AM, Peter Zijlstra <peterz@infradead.org> wrote:
> 
> On Wed, Apr 08, 2020 at 06:15:48PM +0200, Paolo Bonzini wrote:
>> On 08/04/20 17:46, Christoph Hellwig wrote:
> 
>>> Scanning all modules seems safer.  While we're at it - can be move the
>>> kvm bits using VMX to be always in the core kernel and just forbid
>>> modules from using those instructions entirely?
>> 
>> I suppose we could use PVOPS-style patching for the more
>> performance-critical cases, but VMREAD/VMWRITE does not seem like a
>> particularly bad thing to allow modules and VMLAUNCH/VMRESUME have very
>> peculiar calling conventions around them.
>> 
>> However, I wouldn't mind it if VMCLEAR/VMPTRLD and the associated kdump
>> cleanup code were moved to core kernel code.
> 
> Speaking with my virt ignorance hat on, how impossible is it to provide
> generic/useful VMLAUNCH/VMRESUME wrappers?
> 
> Because a lot of what happens around VMEXIT/VMENTER is very much like
> the userspace entry crud, as per that series from Thomas that fixes all
> that. And surely we don't need various broken copies of that in all the
> out-of-tree hypervisors.
> 
> Also, I suppose if you have this, we no longer need to excempt CR2.

It depends on what you mean by “VMLAUNCH/VMRESUME”. If you only consider the
instructions themselves, as Sean did in vmx_vmenter() and vmx_vmexit(),
there is no problem. Even if you consider saving the general purpose
registers as done in __vmx_vcpu_run() - that’s relatively easy.

I think that anything that is greater than that, for instance
vmx_vcpu_run(), would require more thought and effort. KVM data-structures,
specifically kvm_vcpu and friends, would need to be broken into general and
KVM specific structures. I am really not sure that the end result would be
much better than using KVM user-space interfaces.

I can ask someone from the VMware hypervisor developers to provide the VMware
point of view, but it would help to know when you plan to make such a
change, and whether there would be some transition stage. Adapting a
hypervisor to use different low-level interfaces would require quite some
development & testing effort.


^ permalink raw reply	[flat|nested] 74+ messages in thread

* Re: [PATCH 4/4] x86,module: Detect CRn and DRn manipulation
  2020-04-09 10:13             ` Nadav Amit
@ 2020-04-09 21:13               ` Thomas Gleixner
  2020-04-09 22:18                 ` Steven Rostedt
  0 siblings, 1 reply; 74+ messages in thread
From: Thomas Gleixner @ 2020-04-09 21:13 UTC (permalink / raw)
  To: Nadav Amit, Peter Zijlstra
  Cc: Paolo Bonzini, Christoph Hellwig, Steven Rostedt, LKML,
	Sean Christopherson, mingo, bp, hpa, x86, kenny, jeyu,
	rasmus.villemoes, fenghua.yu, xiaoyao.li, thellstrom, tony.luck,
	gregkh, jannh, keescook, David.Laight, dcovelli, mhiramat

Nadav Amit <nadav.amit@gmail.com> writes:
>> On Apr 9, 2020, at 1:56 AM, Peter Zijlstra <peterz@infradead.org> wrote:
>> Speaking with my virt ignorance hat on, how impossible is it to provide
>> generic/useful VMLAUNCH/VMRESUME wrappers?
>> 
>> Because a lot of what happens around VMEXIT/VMENTER is very much like
>> the userspace entry crud, as per that series from Thomas that fixes all
>> that. And surely we don't need various broken copies of that in all the
>> out-of-tree hypervisors.
>> 
>> Also, I suppose if you have this, we no longer need to exempt CR2.
>
> It depends on what you mean by “VMLAUNCH/VMRESUME”. If you only consider the
> instructions themselves, as Sean did in vmx_vmenter() and vmx_vmexit(),
> there is no problem. Even if you consider saving the general purpose
> registers as done in __vmx_vcpu_run() - that’s relatively easy.

__vmx_vcpu_run() is roughly the scope, but that won't work.

Looking at the vmmon source:

Task_Switch()

    1) Mask all APIC LVTs which have NMI delivery mode enabled, e.g. PERF

    2) Disable interrupts

    3) Disable PEBS

    4) Disable PT

    5) Load a magic IDT

       According to comments these are stubs to catch any exception which
       happens while switching over.

    6) Write CR0 and CR4 directly which is "safe" as the IDT is
       redirected to the monitor stubs.

    7) VMXON()

    8) Invoke monitor on some magic page which switches CR3 and GDT and
       clears CR4.PCIDE (at least that's what the comments claim)

       The monitor code is loaded from a binary only blob and that does
       the actual vmlaunch/vmresume ...

       And as this runs with a completely different CR3, sharing that
       code is impossible.

    When returning, the above is undone in reverse order and any caught
    exceptions / interrupts are replayed via "int $NR".

So it's pretty much the same mess as with vbox, just different and
binary. Oh well...

The "good" news is that it's not involved in any of the context tracking
stuff, so RCU won't ever be affected when a vmware vCPU runs. It's not
pretty, but TBH I don't care.

Thanks,

        tglx




* Re: [PATCH 4/4] x86,module: Detect CRn and DRn manipulation
  2020-04-09 21:13               ` Thomas Gleixner
@ 2020-04-09 22:18                 ` Steven Rostedt
  2020-04-10  5:37                   ` Nadav Amit
  0 siblings, 1 reply; 74+ messages in thread
From: Steven Rostedt @ 2020-04-09 22:18 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: Nadav Amit, Peter Zijlstra, Paolo Bonzini, Christoph Hellwig,
	LKML, Sean Christopherson, mingo, bp, hpa, x86, kenny, jeyu,
	rasmus.villemoes, fenghua.yu, xiaoyao.li, thellstrom, tony.luck,
	gregkh, jannh, keescook, David.Laight, dcovelli, mhiramat

On Thu, 09 Apr 2020 23:13:22 +0200
Thomas Gleixner <tglx@linutronix.de> wrote:

> Nadav Amit <nadav.amit@gmail.com> writes:
> >> On Apr 9, 2020, at 1:56 AM, Peter Zijlstra <peterz@infradead.org> wrote:
> >> Speaking with my virt ignorance hat on, how impossible is it to provide
> >> generic/useful VMLAUNCH/VMRESUME wrappers?
> >> 
> >> Because a lot of what happens around VMEXIT/VMENTER is very much like
> >> the userspace entry crud, as per that series from Thomas that fixes all
> >> that. And surely we don't need various broken copies of that in all the
> >> out-of-tree hypervisors.
> >> 
> >> Also, I suppose if you have this, we no longer need to exempt CR2.  
> >
> > It depends on what you mean by “VMLAUNCH/VMRESUME”. If you only consider the
> > instructions themselves, as Sean did in vmx_vmenter() and vmx_vmexit(),
> > there is no problem. Even if you consider saving the general purpose
> > registers as done in __vmx_vcpu_run() - that’s relatively easy.  
> 
> __vmx_vcpu_run() is roughly the scope, but that won't work.
> 
> Looking at the vmmon source:
> 
> Task_Switch()
> 
>     1) Mask all APIC LVTs which have NMI delivery mode enabled, e.g. PERF
> 
>     2) Disable interrupts
> 
>     3) Disable PEBS
> 
>     4) Disable PT
> 
>     5) Load a magic IDT
> 
>        According to comments these are stubs to catch any exception which
>        happens while switching over.
> 
>     6) Write CR0 and CR4 directly which is "safe" as the the IDT is
>        redirected to the monitor stubs.
> 
>     7) VMXON()
> 
>     8) Invoke monitor on some magic page which switches CR3 and GDT and
>        clears CR4.PCIDE (at least thats what the comments claim)
> 
>        The monitor code is loaded from a binary only blob and that does
>        the actual vmlaunch/vmresume ...

From what I understand (I've never looked at the code), this binary blob
is the same for Windows and Apple. It's basically its own operating system
that does all the work, and vmmon is the way to switch to and from it. When
this blob gets an interrupt that it doesn't know about, it assumes it
belongs to the operating system it's sharing the machine with and exits back
to it, whether that's Linux, Windows or OSX.

It's not too unlike what jailhouse does with its hypervisor, to take over
the machine and place the running Linux into its own "cell", except that it
will switch full control of the machine back to Linux.

-- Steve


> 
>        And as this runs with a completely different CR3 sharing that
>        code is impossible.
> 
>     When returning the above is undone in reverse order and any catched
>     exceptions / interrupts are replayed via "int $NR".
> 
> So it's pretty much the same mess as with vbox just different and
> binary. Oh well...
> 
> The "good" news is that it's not involved in any of the context tracking
> stuff so RCU wont ever be affected when a vmware vCPU runs. It's not
> pretty, but TBH I don't care.
> 
> Thanks,
> 
>         tglx
> 



* Re: [PATCH 4/4] x86,module: Detect CRn and DRn manipulation
  2020-04-09 22:18                 ` Steven Rostedt
@ 2020-04-10  5:37                   ` Nadav Amit
  0 siblings, 0 replies; 74+ messages in thread
From: Nadav Amit @ 2020-04-10  5:37 UTC (permalink / raw)
  To: Steven Rostedt, Thomas Gleixner
  Cc: Peter Zijlstra, Paolo Bonzini, Christoph Hellwig, LKML,
	Sean Christopherson, mingo, bp, hpa, x86, Kenneth R. Crudup,
	Jessica Yu, Rasmus Villemoes, Fenghua Yu, Xiaoyao Li,
	Thomas Hellstrom, Tony Luck, Greg Kroah-Hartman, jannh, keescook,
	David.Laight, Doug Covelli, mhiramat

> On Apr 9, 2020, at 3:18 PM, Steven Rostedt <rostedt@goodmis.org> wrote:
> 
> On Thu, 09 Apr 2020 23:13:22 +0200
> Thomas Gleixner <tglx@linutronix.de> wrote:
>> 
>> __vmx_vcpu_run() is roughly the scope, but that won't work.
>> 
>> Looking at the vmmon source:
>> 
>> 

[ Snip ]

>> 
>>    8) Invoke monitor on some magic page which switches CR3 and GDT and
>>       clears CR4.PCIDE (at least that's what the comments claim)
>> 
>>       The monitor code is loaded from a binary only blob and that does
>>       the actual vmlaunch/vmresume ...
> 
> From what I understand (I've never looked at the code), this binary blob
> is the same for Windows and Apple. It's basically its own operating system
> that does all the work, and vmmon is the way to switch to and from it. When
> this blob gets an interrupt that it doesn't know about, it assumes it
> belongs to the operating system it's sharing the machine with and exits back
> to it, whether that's Linux, Windows or OSX.
> 
> It's not too unlike what jailhouse does with its hypervisor, to take over
> the machine and place the running Linux into its own "cell", except that it
> will switch full control of the machine back to Linux.
> 
> -- Steve
> 
> 
>>       And as this runs with a completely different CR3, sharing that
>>       code is impossible.


To complement Steven’s response: we at VMware have been fully aware of this
issue since the patches were sent, and the hypervisor developers are
considering how to address it. Indeed, vmmon does something similar to
VirtualBox and Jailhouse. The planned restriction on the creation of
executable mappings by modules would require changes in all of these
hypervisors.

Actually, IIUC, even Microsoft Hyper-V might be broken by the proposed
changes that prevent the creation of executable mappings. It may appear as
if there is no issue, since the __vmalloc() that hyperv_init() uses in the
kernel to set up the hypercall pages will be changed into vmalloc_exec() [1].
But, AFAICT Microsoft can also set up the hypercall page in a module as part
of its Linux Integration Services. This code uses
__vmalloc(…PAGE_KERNEL_RX), and would therefore be broken. [2]

It seems to me that the proposed restrictions on which instructions a module
is allowed to run are completely new, have significant implications, and
therefore would hopefully not be rushed in. Alternatively, a transition
period in which the checks only trigger warnings would be beneficial.

I am not a developer of the VMware hypervisor, but my understanding is that
developers have every intention to fully comply with the new restrictions on
memory mappings and instructions that a module is allowed to run. They can
consider sharing VM-entry/exit code with KVM even if it requires code changes
for VMware. They just ask for sufficient time to make the required
adaptations. It would help to have a reasonable timeline for when each
of the proposed changes (__vmalloc(), MOV-CR/DR, other instructions)
is expected to be merged.

[1] https://lore.kernel.org/lkml/20200408115926.1467567-2-hch@lst.de/
[2] https://github.com/LIS/lis-next/blob/master/hv-rhel7.x/hv/arch/x86/hyperv/hv_init.c#L187



* Re: [PATCH 3/4] x86,module: Detect VMX vs SLD conflicts
  2020-04-08  9:56     ` Peter Zijlstra
  2020-04-08 10:15       ` Andrew Cooper
@ 2020-04-10 11:25       ` Masami Hiramatsu
  1 sibling, 0 replies; 74+ messages in thread
From: Masami Hiramatsu @ 2020-04-10 11:25 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: tglx, linux-kernel, hch, sean.j.christopherson, mingo, bp, hpa,
	x86, kenny, jeyu, rasmus.villemoes, pbonzini, fenghua.yu,
	xiaoyao.li, nadav.amit, thellstrom, tony.luck, rostedt, gregkh,
	jannh, keescook, David.Laight, dcovelli

On Wed, 8 Apr 2020 11:56:04 +0200
Peter Zijlstra <peterz@infradead.org> wrote:

> On Wed, Apr 08, 2020 at 05:09:34PM +0900, Masami Hiramatsu wrote:
> > On Tue, 07 Apr 2020 13:02:39 +0200
> > Peter Zijlstra <peterz@infradead.org> wrote:
> > 
> > > +static bool insn_is_vmx(struct insn *insn)
> > > +{
> > > +	u8 modrm = insn->modrm.bytes[0];
> > > +	u8 modrm_mod = X86_MODRM_MOD(modrm);
> > > +	u8 modrm_reg = X86_MODRM_REG(modrm);
> > > +
> > > +	u8 prefix = insn->prefixes.bytes[0];
> > 
> > This should be the last prefix,
> > 
> > 	u8 prefix = insn->prefixes.bytes[3];
> > 
> > (The last prefix is always copied to bytes[3].)
> 
> And that is 0 on no-prefix, right?

Yes, it should be.

> > > +
> > > +	if (insn->opcode.bytes[0] != 0x0f)
> > > +		return false;
> > > +
> > > +	switch (insn->opcode.bytes[1]) {
> > > +	case 0x01:
> > > +		switch (insn->opcode.bytes[2]) {
> > 
> > Sorry, VMCALL etc. are in Grp7 (0f 01); the 3rd opcode byte is embedded
> > in modrm instead of opcode. Thus it should be,
> > 
> > 		switch (insn->modrm.value) {
> 
> Indeed, I was hoping (I really should've checked) that that byte was
> duplicated in opcodes.
> 
> Also, since I already have modrm = insn->modrm.bytes[0], I should
> probably use that anyway.

Yeah, and please use modrm.value instead of bytes[0].
(maybe bytes[0] will be OK since x86 is little-endian)

> > > +		case 0xc1: /* VMCALL */
> > > +		case 0xc2: /* VMLAUNCH */
> > > +		case 0xc3: /* VMRESUME */
> > > +		case 0xc4: /* VMXOFF */
> > 
> > 		case 0xd4:	/* VMFUNC */
> 
> As per Andrew, VMCALL and VMFUNC are SMV, and I really only need VMX in
> this case. Including SMV is probably harmless, but I'm thinking a
> smaller function is better.

I got it.

> 
> > > +			return true;
> > > +
> > > +		default:
> > > +			break;
> > > +		}
> > > +		break;
> > > +
> > > +	case 0x78: /* VMREAD */
> > > +	case 0x79: /* VMWRITE */
> > 
> > 		return !insn_is_evex(insn);
> > 
> > With EVEX prefix, these becomes vcvt* instructions.
> 
> Thanks!


Thank you,

-- 
Masami Hiramatsu <mhiramat@kernel.org>
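
[Editor's note: putting the decoding corrections from this exchange together —
the last legacy prefix lives in prefixes.bytes[3], the Grp7 (0f 01) VMX opcodes
are distinguished by the full ModRM byte, and 0f 78/79 are VMREAD/VMWRITE only
when not EVEX-encoded — a minimal standalone sketch might look like the
following. The `mini_insn` structure and its field names are hypothetical
simplifications of the kernel's struct insn, not the real decoder API, and
VMCALL/VMFUNC are deliberately left out per the discussion above.]

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, simplified stand-in for the kernel's struct insn;
 * only the fields this check needs. In the real decoder the last
 * legacy prefix is copied into prefixes.bytes[3] (0 if none). */
struct mini_insn {
	uint8_t prefixes[4];	/* prefixes[3] = last prefix; unused below */
	uint8_t opcode[3];
	uint8_t modrm;		/* full ModRM byte */
	bool	evex;		/* an EVEX prefix was decoded */
};

/*
 * Sketch of the corrected VMX check: VMLAUNCH/VMRESUME/VMXOFF sit in
 * Grp7 (0f 01), where the "third opcode byte" is really the ModRM byte.
 * 0f 78/79 are VMREAD/VMWRITE only without an EVEX prefix (with EVEX
 * they become vcvt* instructions).
 */
static bool insn_is_vmx(const struct mini_insn *insn)
{
	if (insn->opcode[0] != 0x0f)
		return false;

	switch (insn->opcode[1]) {
	case 0x01:			/* Grp7 */
		switch (insn->modrm) {
		case 0xc2:		/* VMLAUNCH */
		case 0xc3:		/* VMRESUME */
		case 0xc4:		/* VMXOFF */
			return true;
		default:
			return false;
		}

	case 0x78:			/* VMREAD, unless EVEX */
	case 0x79:			/* VMWRITE, unless EVEX */
		return !insn->evex;
	}
	return false;
}
```

[Comparing `modrm` as a single byte mirrors the bytes[0] shorthand from the
patch; as noted in the exchange, the kernel code itself should compare
insn->modrm.value.]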


Thread overview: 74+ messages
2020-04-07 11:02 [PATCH 0/4] x86/module: Out-of-tree module decode and sanitize Peter Zijlstra
2020-04-07 11:02 ` [PATCH 1/4] module: Expose load_info to arch module loader code Peter Zijlstra
2020-04-07 16:52   ` Kees Cook
2020-04-07 11:02 ` [PATCH 2/4] module: Convert module_finalize() to load_info Peter Zijlstra
2020-04-07 16:53   ` Kees Cook
2020-04-07 11:02 ` [PATCH 3/4] x86,module: Detect VMX vs SLD conflicts Peter Zijlstra
2020-04-07 14:35   ` Greg KH
2020-04-07 14:44     ` Paolo Bonzini
2020-04-07 14:55       ` Greg KH
2020-04-07 14:49     ` Steven Rostedt
2020-04-07 15:24     ` Peter Zijlstra
2020-04-07 15:28       ` Paolo Bonzini
2020-04-07 15:44       ` Greg KH
2020-04-07 16:51   ` Masami Hiramatsu
2020-04-07 17:16     ` Andrew Cooper
2020-04-07 23:59       ` Masami Hiramatsu
2020-04-08  7:25     ` Masami Hiramatsu
2020-04-07 18:26   ` kbuild test robot
2020-04-07 21:25   ` David Laight
2020-04-07 23:15     ` Kees Cook
2020-04-08  2:10   ` Xiaoyao Li
2020-04-08  8:09   ` Masami Hiramatsu
2020-04-08  9:56     ` Peter Zijlstra
2020-04-08 10:15       ` Andrew Cooper
2020-04-10 11:25       ` Masami Hiramatsu
2020-04-07 11:02 ` [PATCH 4/4] x86,module: Detect CRn and DRn manipulation Peter Zijlstra
2020-04-07 17:01   ` Kees Cook
2020-04-07 18:13     ` Peter Zijlstra
2020-04-07 18:49       ` Kees Cook
2020-04-07 18:55   ` Nadav Amit
2020-04-07 19:38     ` Peter Zijlstra
2020-04-07 20:27       ` Nadav Amit
2020-04-07 20:50         ` Peter Zijlstra
2020-04-07 21:22           ` Nadav Amit
2020-04-07 21:27             ` Peter Zijlstra
2020-04-07 22:12               ` Paolo Bonzini
2020-04-07 23:51                 ` Nadav Amit
2020-04-08  8:45                 ` Peter Zijlstra
2020-04-08  5:18               ` Christoph Hellwig
2020-04-07 23:15             ` Andrew Cooper
2020-04-08  0:22               ` Paolo Bonzini
2020-04-08  8:37                 ` Peter Zijlstra
2020-04-08  9:52                 ` Andrew Cooper
2020-04-07 21:48   ` Steven Rostedt
2020-04-08  5:58     ` Jan Kiszka
2020-04-08  8:03       ` Paolo Bonzini
2020-04-08  8:58         ` Jan Kiszka
2020-04-08  9:04           ` Paolo Bonzini
2020-04-08 10:45             ` Jan Kiszka
2020-04-08  8:51       ` Peter Zijlstra
2020-04-08  8:59         ` Jan Kiszka
2020-04-08  9:25           ` David Laight
2020-04-08 11:13             ` Jan Kiszka
2020-04-08 11:17               ` David Laight
2020-04-08  9:13         ` Peter Zijlstra
2020-04-08 10:50           ` Jan Kiszka
2020-04-08 13:27   ` Steven Rostedt
2020-04-08 15:44     ` Peter Zijlstra
2020-04-08 15:46       ` Christoph Hellwig
2020-04-08 16:02         ` Sean Christopherson
2020-04-08 16:15         ` Paolo Bonzini
2020-04-09  8:56           ` Peter Zijlstra
2020-04-09 10:13             ` Nadav Amit
2020-04-09 21:13               ` Thomas Gleixner
2020-04-09 22:18                 ` Steven Rostedt
2020-04-10  5:37                   ` Nadav Amit
2020-04-08 15:54       ` Jessica Yu
2020-04-07 17:23 ` [PATCH 0/4] x86/module: Out-of-tree module decode and sanitize Andrew Cooper
2020-04-07 19:41   ` Peter Zijlstra
2020-04-07 20:11     ` Andrew Cooper
2020-04-07 20:45       ` Peter Zijlstra
2020-04-07 21:21         ` Andrew Cooper
2020-04-07 20:21     ` Andrew Cooper
2020-04-07 20:48       ` Peter Zijlstra
