All of lore.kernel.org
 help / color / mirror / Atom feed
* [patch 0/2] x86: Prevent Split-Lock-Detection wreckage on VMX hypervisors
@ 2020-04-02 12:32 Thomas Gleixner
  2020-04-02 12:32 ` [patch 1/2] x86,module: Detect VMX modules and disable Split-Lock-Detect Thomas Gleixner
                   ` (2 more replies)
  0 siblings, 3 replies; 78+ messages in thread
From: Thomas Gleixner @ 2020-04-02 12:32 UTC (permalink / raw)
  To: LKML
  Cc: x86, Kenneth R. Crudup, Peter Zijlstra (Intel),
	Paolo Bonzini, Jessica Yu, Fenghua Yu, Xiaoyao Li, Nadav Amit,
	Thomas Hellstrom, Sean Christopherson, Tony Luck, Steven Rostedt

Kenneth reported that a VMWare guest crashes in the VMWare BIOS due to a
Split-Lock induced #AC which is injected by the VMWare hypervisor into the
guest.

While this is a good thing in principle, it's not really practical.
That means that Split-Lock-Detection has to be disabled when any
unprepared VMX hypervisor is loaded.

As hypervisor modules are not really identifiable, the only safe solution
we came up with is to scan the module text at load time for a VMLAUNCH
instruction. If VMLAUNCH is found then Split-Lock-Detection is disabled on
the host to prevent the above. If the hypervisor has at least minimal
handling code, the module can tell the kernel by adding MOD_INFO(sld_safe,
"Y") which disables the text scan.

For KVM it's simple enough to handle it at least at the basic level by
checking guest CR0.AM and EFLAGS.AC state and a trivial host side
handler which depending on the SLD mode handles it gracefully or tells
the VMX handler to deliver the #AC to user space which then can crash
and burn itself.

As Peter and myself don't have access to a SLD enabled machine, the
KVM/VMX part is untested. The module scan part works.

Alternatively we can obviously revert SLD, but that does not make the
problem vs. out of tree hypervisors go away magically. So we can just
get over it now.

Thanks,

	tglx




^ permalink raw reply	[flat|nested] 78+ messages in thread

* [patch 1/2] x86,module: Detect VMX modules and disable Split-Lock-Detect
  2020-04-02 12:32 [patch 0/2] x86: Prevent Split-Lock-Detection wreckage on VMX hypervisors Thomas Gleixner
@ 2020-04-02 12:32 ` Thomas Gleixner
  2020-04-02 15:23   ` [patch v2 " Peter Zijlstra
                     ` (5 more replies)
  2020-04-02 12:33 ` [patch 2/2] x86/kvm/vmx: Prevent split lock detection induced #AC wreckage Thomas Gleixner
  2020-04-02 13:43 ` [patch 0/2] x86: Prevent Split-Lock-Detection wreckage on VMX hypervisors Kenneth R. Crudup
  2 siblings, 6 replies; 78+ messages in thread
From: Thomas Gleixner @ 2020-04-02 12:32 UTC (permalink / raw)
  To: LKML
  Cc: x86, Kenneth R. Crudup, Peter Zijlstra (Intel),
	Paolo Bonzini, Jessica Yu, Fenghua Yu, Xiaoyao Li, Nadav Amit,
	Thomas Hellstrom, Sean Christopherson, Tony Luck, Steven Rostedt

From: Peter Zijlstra <peterz@infradead.org>

It turns out that with Split-Lock-Detect enabled (default) any VMX
hypervisor needs at least a little modification in order to not blindly
inject the #AC into the guest without the guest being ready for it.

Since there is no telling which module implements a hypervisor, scan the
module text and look for the VMLAUNCH instruction. If found, the module is
assumed to be a hypervisor of some sort and SLD is disabled.

Hypervisors, which have been modified and are known to work correctly,
can add:

  MODULE_INFO(sld_safe, "Y");

to explicitly tell the module loader they're good.

NOTE: it is unfortunate that struct load_info is not available to the
      arch module code, this means CONFIG_CPU_SUP_INTEL gunk is needed
      in generic code.

NOTE: while we can 'trivially' fix KVM, we're still stuck with stuff
      like VMware and VirtualBox doing their own thing.

Reported-by: "Kenneth R. Crudup" <kenny@panix.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Jessica Yu <jeyu@kernel.org>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: Xiaoyao Li <xiaoyao.li@intel.com>
Cc: Nadav Amit <namit@vmware.com>
Cc: Thomas Hellstrom <thellstrom@vmware.com>
Cc: Sean Christopherson <sean.j.christopherson@intel.com>
Cc: Tony Luck <tony.luck@intel.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
---
 arch/x86/include/asm/cpu.h  |    2 ++
 arch/x86/kernel/cpu/intel.c |   38 +++++++++++++++++++++++++++++++++++++-
 arch/x86/kernel/module.c    |    6 ++++++
 include/linux/module.h      |    4 ++++
 kernel/module.c             |    5 +++++
 5 files changed, 54 insertions(+), 1 deletion(-)

--- a/arch/x86/include/asm/cpu.h
+++ b/arch/x86/include/asm/cpu.h
@@ -44,6 +44,7 @@ unsigned int x86_stepping(unsigned int s
 extern void __init cpu_set_core_cap_bits(struct cpuinfo_x86 *c);
 extern void switch_to_sld(unsigned long tifn);
 extern bool handle_user_split_lock(struct pt_regs *regs, long error_code);
+extern void split_lock_validate_module_text(struct module *me, void *text, void *text_end);
 #else
 static inline void __init cpu_set_core_cap_bits(struct cpuinfo_x86 *c) {}
 static inline void switch_to_sld(unsigned long tifn) {}
@@ -51,5 +52,6 @@ static inline bool handle_user_split_loc
 {
 	return false;
 }
+static inline void split_lock_validate_module_text(struct module *me, void *text, void *text_end) {}
 #endif
 #endif /* _ASM_X86_CPU_H */
--- a/arch/x86/kernel/cpu/intel.c
+++ b/arch/x86/kernel/cpu/intel.c
@@ -9,6 +9,7 @@
 #include <linux/thread_info.h>
 #include <linux/init.h>
 #include <linux/uaccess.h>
+#include <linux/module.h>
 
 #include <asm/cpufeature.h>
 #include <asm/pgtable.h>
@@ -21,6 +22,7 @@
 #include <asm/elf.h>
 #include <asm/cpu_device_id.h>
 #include <asm/cmdline.h>
+#include <asm/insn.h>
 
 #ifdef CONFIG_X86_64
 #include <linux/topology.h>
@@ -1055,12 +1057,46 @@ static void sld_update_msr(bool on)
 {
 	u64 test_ctrl_val = msr_test_ctrl_cache;
 
-	if (on)
+	if (on && (sld_state != sld_off))
 		test_ctrl_val |= MSR_TEST_CTRL_SPLIT_LOCK_DETECT;
 
 	wrmsrl(MSR_TEST_CTRL, test_ctrl_val);
 }
 
+static void sld_remote_kill(void *arg)
+{
+	sld_update_msr(false);
+}
+
+void split_lock_validate_module_text(struct module *me, void *text, void *text_end)
+{
+	u8 vmlaunch[] = { 0x0f, 0x01, 0xc2 };
+	struct insn insn;
+
+	if (sld_state == sld_off)
+		return;
+
+	while (text < text_end) {
+		kernel_insn_init(&insn, text, text_end - text);
+		insn_get_length(&insn);
+
+		if (WARN_ON_ONCE(!insn_complete(&insn)))
+			break;
+
+		if (insn.length == sizeof(vmlaunch) && !memcmp(text, vmlaunch, sizeof(vmlaunch)))
+			goto bad_module;
+
+		text += insn.length;
+	}
+
+	return;
+
+bad_module:
+	pr_warn("disabled due to VMLAUNCH in module: %s\n", me->name);
+	sld_state = sld_off;
+	on_each_cpu(sld_remote_kill, NULL, 1);
+}
+
 static void split_lock_init(void)
 {
 	split_lock_verify_msr(sld_state != sld_off);
--- a/arch/x86/kernel/module.c
+++ b/arch/x86/kernel/module.c
@@ -24,6 +24,7 @@
 #include <asm/pgtable.h>
 #include <asm/setup.h>
 #include <asm/unwind.h>
+#include <asm/cpu.h>
 
 #if 0
 #define DEBUGP(fmt, ...)				\
@@ -253,6 +254,11 @@ int module_finalize(const Elf_Ehdr *hdr,
 					    tseg, tseg + text->sh_size);
 	}
 
+	if (text && !me->sld_safe) {
+		void *tseg = (void *)text->sh_addr;
+		split_lock_validate_module_text(me, tseg, tseg + text->sh_size);
+	}
+
 	if (para) {
 		void *pseg = (void *)para->sh_addr;
 		apply_paravirt(pseg, pseg + para->sh_size);
--- a/include/linux/module.h
+++ b/include/linux/module.h
@@ -407,6 +407,10 @@ struct module {
 	bool sig_ok;
 #endif
 
+#ifdef CONFIG_CPU_SUP_INTEL
+	bool sld_safe;
+#endif
+
 	bool async_probe_requested;
 
 	/* symbols that will be GPL-only in the near future. */
--- a/kernel/module.c
+++ b/kernel/module.c
@@ -3096,6 +3096,11 @@ static int check_modinfo(struct module *
 			"is unknown, you have been warned.\n", mod->name);
 	}
 
+#ifdef CONFIG_CPU_SUP_INTEL
+	if (get_modinfo(info, "sld_safe"))
+		mod->sld_safe = true;
+#endif
+
 	err = check_modinfo_livepatch(mod, info);
 	if (err)
 		return err;


^ permalink raw reply	[flat|nested] 78+ messages in thread

* [patch 2/2] x86/kvm/vmx: Prevent split lock detection induced #AC wreckage
  2020-04-02 12:32 [patch 0/2] x86: Prevent Split-Lock-Detection wreckage on VMX hypervisors Thomas Gleixner
  2020-04-02 12:32 ` [patch 1/2] x86,module: Detect VMX modules and disable Split-Lock-Detect Thomas Gleixner
@ 2020-04-02 12:33 ` Thomas Gleixner
  2020-04-02 15:30   ` Sean Christopherson
  2020-04-02 15:55   ` [PATCH 0/3] x86: KVM: VMX: Add basic split-lock #AC handling Sean Christopherson
  2020-04-02 13:43 ` [patch 0/2] x86: Prevent Split-Lock-Detection wreckage on VMX hypervisors Kenneth R. Crudup
  2 siblings, 2 replies; 78+ messages in thread
From: Thomas Gleixner @ 2020-04-02 12:33 UTC (permalink / raw)
  To: LKML
  Cc: x86, Kenneth R. Crudup, Paolo Bonzini, Fenghua Yu, Xiaoyao Li,
	Nadav Amit, Thomas Hellstrom, Sean Christopherson, Tony Luck,
	Peter Zijlstra (Intel),
	Jessica Yu, Steven Rostedt

Without at least minimal handling for split lock detection induced #AC, VMX
will just run into the same problem as the VMWare hypervisor, which was
reported by Kenneth.

It will inject the #AC blindly into the guest whether the guest is prepared
or not.

Add the minimal required handling for it:

  - Check guest state whether CR0.AM is enabled and EFLAGS.AC is set.  If
    so, then the #AC originated from CPL3 and the guest has is prepared to
    handle it. In this case it does not matter whether the #AC is due to a
    split lock or a regular unaligned check.

 - Invoke a minimal split lock detection handler. If the host SLD mode is
   sld_warn, then handle it in the same way as user space handling works:
   Emit a warning, disable SLD and mark the current task with TIF_SLD.
   With that resume the guest without injecting #AC.

   If the host mode is sld_fatal or sld_off, emit a warning and deliver
   the exception to user space which can crash and burn itself.

Mark the module with MOD_INFO(sld_safe, "Y") so the module loader does not
force SLD off.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: "Kenneth R. Crudup" <kenny@panix.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: Xiaoyao Li <xiaoyao.li@intel.com>
Cc: Nadav Amit <namit@vmware.com>
Cc: Thomas Hellstrom <thellstrom@vmware.com>
Cc: Sean Christopherson <sean.j.christopherson@intel.com>
Cc: Tony Luck <tony.luck@intel.com>
---
 arch/x86/include/asm/cpu.h  |    1 +
 arch/x86/kernel/cpu/intel.c |   28 +++++++++++++++++++++++-----
 arch/x86/kvm/vmx/vmx.c      |   40 +++++++++++++++++++++++++++++++++++++---
 3 files changed, 61 insertions(+), 8 deletions(-)

--- a/arch/x86/include/asm/cpu.h
+++ b/arch/x86/include/asm/cpu.h
@@ -44,6 +44,7 @@ unsigned int x86_stepping(unsigned int s
 extern void __init cpu_set_core_cap_bits(struct cpuinfo_x86 *c);
 extern void switch_to_sld(unsigned long tifn);
 extern bool handle_user_split_lock(struct pt_regs *regs, long error_code);
+extern bool handle_guest_split_lock(unsigned long ip);
 extern void split_lock_validate_module_text(struct module *me, void *text, void *text_end);
 #else
 static inline void __init cpu_set_core_cap_bits(struct cpuinfo_x86 *c) {}
--- a/arch/x86/kernel/cpu/intel.c
+++ b/arch/x86/kernel/cpu/intel.c
@@ -1102,13 +1102,10 @@ static void split_lock_init(void)
 	split_lock_verify_msr(sld_state != sld_off);
 }
 
-bool handle_user_split_lock(struct pt_regs *regs, long error_code)
+static void split_lock_warn(unsigned long ip)
 {
-	if ((regs->flags & X86_EFLAGS_AC) || sld_state == sld_fatal)
-		return false;
-
 	pr_warn_ratelimited("#AC: %s/%d took a split_lock trap at address: 0x%lx\n",
-			    current->comm, current->pid, regs->ip);
+			    current->comm, current->pid, ip);
 
 	/*
 	 * Disable the split lock detection for this task so it can make
@@ -1117,6 +1114,27 @@ bool handle_user_split_lock(struct pt_re
 	 */
 	sld_update_msr(false);
 	set_tsk_thread_flag(current, TIF_SLD);
+}
+
+bool handle_guest_split_lock(unsigned long ip)
+{
+	if (sld_state == sld_warn) {
+		split_lock_warn(ip);
+		return true;
+	}
+
+	pr_warn_once("#AC: %s/%d %s split_lock trap at address: 0x%lx\n",
+		     current->comm, current->pid,
+		     sld_state == sld_fatal ? "fatal" : "bogus", ip);
+	return false;
+}
+EXPORT_SYMBOL_GPL(handle_guest_split_lock);
+
+bool handle_user_split_lock(struct pt_regs *regs, long error_code)
+{
+	if ((regs->flags & X86_EFLAGS_AC) || sld_state == sld_fatal)
+		return false;
+	split_lock_warn(regs->ip);
 	return true;
 }
 
--- a/arch/x86/kvm/vmx/vmx.c
+++ b/arch/x86/kvm/vmx/vmx.c
@@ -65,6 +65,7 @@
 
 MODULE_AUTHOR("Qumranet");
 MODULE_LICENSE("GPL");
+MODULE_INFO(sld_safe, "Y");
 
 #ifdef MODULE
 static const struct x86_cpu_id vmx_cpu_id[] = {
@@ -4623,6 +4624,22 @@ static int handle_machine_check(struct k
 	return 1;
 }
 
+static bool guest_handles_ac(struct kvm_vcpu *vcpu)
+{
+	/*
+	 * If guest has alignment checking enabled in CR0 and activated in
+	 * eflags, then the #AC originated from CPL3 and the guest is able
+	 * to handle it. It does not matter whether this is a regular or
+	 * a split lock operation induced #AC.
+	 */
+	if (vcpu->arch.cr0 & X86_CR0_AM &&
+	    vmx_get_rflags(vcpu) & X86_EFLAGS_AC)
+		return true;
+
+	/* Add guest SLD handling checks here once it's supported */
+	return false;
+}
+
 static int handle_exception_nmi(struct kvm_vcpu *vcpu)
 {
 	struct vcpu_vmx *vmx = to_vmx(vcpu);
@@ -4688,9 +4705,6 @@ static int handle_exception_nmi(struct k
 		return handle_rmode_exception(vcpu, ex_no, error_code);
 
 	switch (ex_no) {
-	case AC_VECTOR:
-		kvm_queue_exception_e(vcpu, AC_VECTOR, error_code);
-		return 1;
 	case DB_VECTOR:
 		dr6 = vmcs_readl(EXIT_QUALIFICATION);
 		if (!(vcpu->guest_debug &
@@ -4719,6 +4733,26 @@ static int handle_exception_nmi(struct k
 		kvm_run->debug.arch.pc = vmcs_readl(GUEST_CS_BASE) + rip;
 		kvm_run->debug.arch.exception = ex_no;
 		break;
+	case AC_VECTOR:
+		if (guest_handles_ac(vcpu)) {
+			kvm_queue_exception_e(vcpu, AC_VECTOR, error_code);
+			return 1;
+		}
+		/*
+		 * Handle #AC caused by split lock detection. If the host
+		 * mode is sld_warn, then it warns, marks current with
+		 * TIF_SLD and disables split lock detection. So the guest
+		 * can just continue.
+		 *
+		 * If the host mode is fatal, the handling code warned. Let
+		 * qemu kill itself.
+		 *
+		 * If the host mode is off, then this #AC is bonkers and
+		 * something is badly wrong. Let it fail as well.
+		 */
+		if (handle_guest_split_lock(kvm_rip_read(vcpu)))
+			return 1;
+		/* fall through */
 	default:
 		kvm_run->exit_reason = KVM_EXIT_EXCEPTION;
 		kvm_run->ex.exception = ex_no;


^ permalink raw reply	[flat|nested] 78+ messages in thread

* Re: [patch 0/2] x86: Prevent Split-Lock-Detection wreckage on VMX hypervisors
  2020-04-02 12:32 [patch 0/2] x86: Prevent Split-Lock-Detection wreckage on VMX hypervisors Thomas Gleixner
  2020-04-02 12:32 ` [patch 1/2] x86,module: Detect VMX modules and disable Split-Lock-Detect Thomas Gleixner
  2020-04-02 12:33 ` [patch 2/2] x86/kvm/vmx: Prevent split lock detection induced #AC wreckage Thomas Gleixner
@ 2020-04-02 13:43 ` Kenneth R. Crudup
  2020-04-02 14:32   ` Peter Zijlstra
  2020-04-02 14:37   ` Thomas Gleixner
  2 siblings, 2 replies; 78+ messages in thread
From: Kenneth R. Crudup @ 2020-04-02 13:43 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: LKML, x86, Peter Zijlstra (Intel),
	Paolo Bonzini, Jessica Yu, Fenghua Yu, Xiaoyao Li, Nadav Amit,
	Thomas Hellstrom, Sean Christopherson, Tony Luck, Steven Rostedt


On Thu, 2 Apr 2020, Thomas Gleixner wrote:

> As Peter and myself don't have access to a SLD enabled machine, the
> KVM/VMX part is untested. The module scan part works.

I just applied both of these patches to my (Linus' tip) tree, and unfortunately
VMWare still hangs if split_lock_detect= is set to anything but "off".

Was there anything else I'd needed to do?

	-Kenny

-- 
Kenneth R. Crudup  Sr. SW Engineer, Scott County Consulting, Silicon Valley

^ permalink raw reply	[flat|nested] 78+ messages in thread

* Re: [patch 0/2] x86: Prevent Split-Lock-Detection wreckage on VMX hypervisors
  2020-04-02 13:43 ` [patch 0/2] x86: Prevent Split-Lock-Detection wreckage on VMX hypervisors Kenneth R. Crudup
@ 2020-04-02 14:32   ` Peter Zijlstra
  2020-04-02 14:41     ` Kenneth R. Crudup
  2020-04-02 14:37   ` Thomas Gleixner
  1 sibling, 1 reply; 78+ messages in thread
From: Peter Zijlstra @ 2020-04-02 14:32 UTC (permalink / raw)
  To: Kenneth R. Crudup
  Cc: Thomas Gleixner, LKML, x86, Paolo Bonzini, Jessica Yu,
	Fenghua Yu, Xiaoyao Li, Nadav Amit, Thomas Hellstrom,
	Sean Christopherson, Tony Luck, Steven Rostedt

On Thu, Apr 02, 2020 at 06:43:19AM -0700, Kenneth R. Crudup wrote:
> 
> On Thu, 2 Apr 2020, Thomas Gleixner wrote:
> 
> > As Peter and myself don't have access to a SLD enabled machine, the
> > KVM/VMX part is untested. The module scan part works.
> 
> I just applied both of these patches to my (Linus' tip) tree, and unfortunately
> VMWare still hangs if split_lock_detect= is set to anything but "off".
> 
> Was there anything else I'd needed to do?

Hmm, you're not seeing this:

  +       pr_warn("disabled due to VMLAUNCH in module: %s\n", me->name);

fire when you load the vmware kernel module? Could you slip me a copy of
that thing by private mail?

^ permalink raw reply	[flat|nested] 78+ messages in thread

* Re: [patch 0/2] x86: Prevent Split-Lock-Detection wreckage on VMX hypervisors
  2020-04-02 13:43 ` [patch 0/2] x86: Prevent Split-Lock-Detection wreckage on VMX hypervisors Kenneth R. Crudup
  2020-04-02 14:32   ` Peter Zijlstra
@ 2020-04-02 14:37   ` Thomas Gleixner
  2020-04-02 14:47     ` Nadav Amit
  1 sibling, 1 reply; 78+ messages in thread
From: Thomas Gleixner @ 2020-04-02 14:37 UTC (permalink / raw)
  To: Kenneth R. Crudup
  Cc: LKML, x86, Peter Zijlstra (Intel),
	Paolo Bonzini, Jessica Yu, Fenghua Yu, Xiaoyao Li, Nadav Amit,
	Thomas Hellstrom, Sean Christopherson, Tony Luck, Steven Rostedt

"Kenneth R. Crudup" <kenny@panix.com> writes:

> On Thu, 2 Apr 2020, Thomas Gleixner wrote:
>
>> As Peter and myself don't have access to a SLD enabled machine, the
>> KVM/VMX part is untested. The module scan part works.
>
> I just applied both of these patches to my (Linus' tip) tree, and unfortunately
> VMWare still hangs if split_lock_detect= is set to anything but "off".
>
> Was there anything else I'd needed to do?

Hmm. Not really. Does dmesg show the warning when the VMWare module loads?
Something like:

  x86/split lock detection: disabled due to VMLAUNCH in module: ....

Thanks,

        tglx



^ permalink raw reply	[flat|nested] 78+ messages in thread

* Re: [patch 0/2] x86: Prevent Split-Lock-Detection wreckage on VMX hypervisors
  2020-04-02 14:32   ` Peter Zijlstra
@ 2020-04-02 14:41     ` Kenneth R. Crudup
  2020-04-02 14:46       ` Peter Zijlstra
  0 siblings, 1 reply; 78+ messages in thread
From: Kenneth R. Crudup @ 2020-04-02 14:41 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Thomas Gleixner, LKML, x86, Paolo Bonzini, Jessica Yu,
	Fenghua Yu, Xiaoyao Li, Nadav Amit, Thomas Hellstrom,
	Sean Christopherson, Tony Luck, Steven Rostedt


On Thu, 2 Apr 2020, Peter Zijlstra wrote:

> Hmm, you're not seeing this:
>   +       pr_warn("disabled due to VMLAUNCH in module: %s\n", me->name);
> fire when you load the vmware kernel module?

I just looked back at the syslog's copy of the kernel messages at the time
I'd tried it, and no, I don't see that message.

> Could you slip me a copy of that thing by private mail?

OK, gimme a couple of days though, I've gotta get a little work done.
(Also, what "thing" exactly did you want?)

	-Kenny

-- 
Kenneth R. Crudup  Sr. SW Engineer, Scott County Consulting, Silicon Valley

^ permalink raw reply	[flat|nested] 78+ messages in thread

* Re: [patch 0/2] x86: Prevent Split-Lock-Detection wreckage on VMX hypervisors
  2020-04-02 14:41     ` Kenneth R. Crudup
@ 2020-04-02 14:46       ` Peter Zijlstra
  2020-04-02 14:53         ` Kenneth R. Crudup
  0 siblings, 1 reply; 78+ messages in thread
From: Peter Zijlstra @ 2020-04-02 14:46 UTC (permalink / raw)
  To: Kenneth R. Crudup
  Cc: Thomas Gleixner, LKML, x86, Paolo Bonzini, Jessica Yu,
	Fenghua Yu, Xiaoyao Li, Nadav Amit, Thomas Hellstrom,
	Sean Christopherson, Tony Luck, Steven Rostedt

On Thu, Apr 02, 2020 at 07:41:41AM -0700, Kenneth R. Crudup wrote:
> 
> On Thu, 2 Apr 2020, Peter Zijlstra wrote:
> 
> > Hmm, you're not seeing this:
> >   +       pr_warn("disabled due to VMLAUNCH in module: %s\n", me->name);
> > fire when you load the vmware kernel module?
> 
> I just looked back at the syslog's copy of the kernel messages at the time
> I'd tried it, and no, I don't see that message.

Dang!

> > Could you slip me a copy of that thing by private mail?
> 
> OK, gimme a couple of days though, I've gotta get a little work done.
> (Also, what "thing" exactly did you want?)

All the .ko files that come with vmware. I want to dig through them to
see why the VMLAUNCH detection thing isn't working.

^ permalink raw reply	[flat|nested] 78+ messages in thread

* Re: [patch 0/2] x86: Prevent Split-Lock-Detection wreckage on VMX hypervisors
  2020-04-02 14:37   ` Thomas Gleixner
@ 2020-04-02 14:47     ` Nadav Amit
  2020-04-02 15:11       ` Peter Zijlstra
  0 siblings, 1 reply; 78+ messages in thread
From: Nadav Amit @ 2020-04-02 14:47 UTC (permalink / raw)
  To: Thomas Gleixner, Peter Zijlstra (Intel)
  Cc: Kenneth R. Crudup, LKML, x86, Paolo Bonzini, Jessica Yu,
	Fenghua Yu, Xiaoyao Li, Thomas Hellstrom, Sean Christopherson,
	Tony Luck, Steven Rostedt

> On Apr 2, 2020, at 7:37 AM, Thomas Gleixner <tglx@linutronix.de> wrote:
> 
> "Kenneth R. Crudup" <kenny@panix.com> writes:
> 
>> On Thu, 2 Apr 2020, Thomas Gleixner wrote:
>> 
>>> As Peter and myself don't have access to a SLD enabled machine, the
>>> KVM/VMX part is untested. The module scan part works.
>> 
>> I just applied both of these patches to my (Linus' tip) tree, and unfortunately
>> VMWare still hangs if split_lock_detect= is set to anything but "off".
>> 
>> Was there anything else I'd needed to do?
> 
> Hmm. Not really. Does dmesg show the warning when the VMWare module loads?
> Something like:
> 
>  x86/split lock detection: disabled due to VMLAUNCH in module: ….

I ran an objdump on VMware workstation and indeed I do not see a
VMLAUNCH/VMRESUME. I do see however VMXON which should also be good for
detecting hypervisors. I will try to understand why VMLAUNCH is not there.


^ permalink raw reply	[flat|nested] 78+ messages in thread

* Re: [patch 0/2] x86: Prevent Split-Lock-Detection wreckage on VMX hypervisors
  2020-04-02 14:46       ` Peter Zijlstra
@ 2020-04-02 14:53         ` Kenneth R. Crudup
  0 siblings, 0 replies; 78+ messages in thread
From: Kenneth R. Crudup @ 2020-04-02 14:53 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Thomas Gleixner, LKML, x86, Paolo Bonzini, Jessica Yu,
	Fenghua Yu, Xiaoyao Li, Nadav Amit, Thomas Hellstrom,
	Sean Christopherson, Tony Luck, Steven Rostedt


> > (Also, what "thing" exactly did you want?)

On Thu, 2 Apr 2020, Peter Zijlstra wrote:

> All the .ko files that come with vmware.

Ah, gotcha. One thing you/VMWare may want to look at is it turns out that
"vmw_vmci", part of the kernel tree (CONFIG_VMWARE_VMCI) has to be compiled
into the kernel as a module so udev can properly create the Misc device node
(I'd tried making it a built-in and messing with udev rules and the other,
compiled-in-later VMWare module loading system didn't seem to like that).

Maybe some sort of mitigation for this can be done there, putting it back in-tree?

But anyway, I'll send you a .tar.bz2 in a little bit.

	-Kenny

-- 
Kenneth R. Crudup  Sr. SW Engineer, Scott County Consulting, Silicon Valley

^ permalink raw reply	[flat|nested] 78+ messages in thread

* Re: [patch 0/2] x86: Prevent Split-Lock-Detection wreckage on VMX hypervisors
  2020-04-02 14:47     ` Nadav Amit
@ 2020-04-02 15:11       ` Peter Zijlstra
  0 siblings, 0 replies; 78+ messages in thread
From: Peter Zijlstra @ 2020-04-02 15:11 UTC (permalink / raw)
  To: Nadav Amit
  Cc: Thomas Gleixner, Kenneth R. Crudup, LKML, x86, Paolo Bonzini,
	Jessica Yu, Fenghua Yu, Xiaoyao Li, Thomas Hellstrom,
	Sean Christopherson, Tony Luck, Steven Rostedt

On Thu, Apr 02, 2020 at 02:47:33PM +0000, Nadav Amit wrote:
> > On Apr 2, 2020, at 7:37 AM, Thomas Gleixner <tglx@linutronix.de> wrote:
> > 
> > "Kenneth R. Crudup" <kenny@panix.com> writes:
> > 
> >> On Thu, 2 Apr 2020, Thomas Gleixner wrote:
> >> 
> >>> As Peter and myself don't have access to a SLD enabled machine, the
> >>> KVM/VMX part is untested. The module scan part works.
> >> 
> >> I just applied both of these patches to my (Linus' tip) tree, and unfortunately
> >> VMWare still hangs if split_lock_detect= is set to anything but "off".
> >> 
> >> Was there anything else I'd needed to do?
> > 
> > Hmm. Not really. Does dmesg show the warning when the VMWare module loads?
> > Something like:
> > 
> >  x86/split lock detection: disabled due to VMLAUNCH in module: ….
> 
> I ran an objdump on VMware workstation and indeed I do not see a
> VMLAUNCH/VMRESUME. I do see however VMXON which should also be good for
> detecting hypervisors. I will try to understand why VMLAUNCH is not there.

Let me send a version with VMXON detection added in as well.

^ permalink raw reply	[flat|nested] 78+ messages in thread

* [patch v2 1/2] x86,module: Detect VMX modules and disable Split-Lock-Detect
  2020-04-02 12:32 ` [patch 1/2] x86,module: Detect VMX modules and disable Split-Lock-Detect Thomas Gleixner
@ 2020-04-02 15:23   ` Peter Zijlstra
  2020-04-02 16:20     ` Xiaoyao Li
  2020-04-03  8:09     ` David Laight
  2020-04-02 23:42   ` [patch " Rasmus Villemoes
                     ` (4 subsequent siblings)
  5 siblings, 2 replies; 78+ messages in thread
From: Peter Zijlstra @ 2020-04-02 15:23 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: LKML, x86, Kenneth R. Crudup, Paolo Bonzini, Jessica Yu,
	Fenghua Yu, Xiaoyao Li, Nadav Amit, Thomas Hellstrom,
	Sean Christopherson, Tony Luck, Steven Rostedt


I picked VMXOFF (which also appears in vmmon.ko) instead of VMXON
because that latter takes an argument is therefore more difficult to
decode.

---
Subject: x86,module: Detect VMX modules and disable Split-Lock-Detect
From: Peter Zijlstra <peterz@infradead.org>
Date: Thu, 02 Apr 2020 14:32:59 +0200

It turns out that with Split-Lock-Detect enabled (default) any VMX
hypervisor needs at least a little modification in order to not blindly
inject the #AC into the guest without the guest being ready for it.

Since there is no telling which module implements a hypervisor, scan the
module text and look for the VMLAUNCH/VMXOFF instructions. If found, the
module is assumed to be a hypervisor of some sort and SLD is disabled.

Hypervisors, which have been modified and are known to work correctly,
can add:

  MODULE_INFO(sld_safe, "Y");

to explicitly tell the module loader they're good.

NOTE: it is unfortunate that struct load_info is not available to the
      arch module code, this means CONFIG_CPU_SUP_INTEL gunk is needed
      in generic code.

NOTE: while we can 'trivially' fix KVM, we're still stuck with stuff
      like VMware and VirtualBox doing their own thing.

Reported-by: "Kenneth R. Crudup" <kenny@panix.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
---
 arch/x86/include/asm/cpu.h  |    2 ++
 arch/x86/kernel/cpu/intel.c |   41 ++++++++++++++++++++++++++++++++++++++++-
 arch/x86/kernel/module.c    |    6 ++++++
 include/linux/module.h      |    4 ++++
 kernel/module.c             |    5 +++++
 5 files changed, 57 insertions(+), 1 deletion(-)

--- a/arch/x86/include/asm/cpu.h
+++ b/arch/x86/include/asm/cpu.h
@@ -44,6 +44,7 @@ unsigned int x86_stepping(unsigned int s
 extern void __init cpu_set_core_cap_bits(struct cpuinfo_x86 *c);
 extern void switch_to_sld(unsigned long tifn);
 extern bool handle_user_split_lock(struct pt_regs *regs, long error_code);
+extern void split_lock_validate_module_text(struct module *me, void *text, void *text_end);
 #else
 static inline void __init cpu_set_core_cap_bits(struct cpuinfo_x86 *c) {}
 static inline void switch_to_sld(unsigned long tifn) {}
@@ -51,5 +52,6 @@ static inline bool handle_user_split_loc
 {
 	return false;
 }
+static inline void split_lock_validate_module_text(struct module *me, void *text, void *text_end) {}
 #endif
 #endif /* _ASM_X86_CPU_H */
--- a/arch/x86/kernel/cpu/intel.c
+++ b/arch/x86/kernel/cpu/intel.c
@@ -9,6 +9,7 @@
 #include <linux/thread_info.h>
 #include <linux/init.h>
 #include <linux/uaccess.h>
+#include <linux/module.h>
 
 #include <asm/cpufeature.h>
 #include <asm/pgtable.h>
@@ -21,6 +22,7 @@
 #include <asm/elf.h>
 #include <asm/cpu_device_id.h>
 #include <asm/cmdline.h>
+#include <asm/insn.h>
 
 #ifdef CONFIG_X86_64
 #include <linux/topology.h>
@@ -1055,12 +1057,49 @@ static void sld_update_msr(bool on)
 {
 	u64 test_ctrl_val = msr_test_ctrl_cache;
 
-	if (on)
+	if (on && (sld_state != sld_off))
 		test_ctrl_val |= MSR_TEST_CTRL_SPLIT_LOCK_DETECT;
 
 	wrmsrl(MSR_TEST_CTRL, test_ctrl_val);
 }
 
+static void sld_remote_kill(void *arg)
+{
+	sld_update_msr(false);
+}
+
+void split_lock_validate_module_text(struct module *me, void *text, void *text_end)
+{
+	u8 vmxoff[] = { 0x0f, 0x01, 0xc4 };
+	u8 vmlaunch[] = { 0x0f, 0x01, 0xc2 };
+	struct insn insn;
+
+	if (sld_state == sld_off)
+		return;
+
+	while (text < text_end) {
+		kernel_insn_init(&insn, text, text_end - text);
+		insn_get_length(&insn);
+
+		if (WARN_ON_ONCE(!insn_complete(&insn)))
+			break;
+
+		if (insn.length == 3 &&
+		    (!memcmp(text, vmlaunch, sizeof(vmlaunch)) ||
+		     !memcmp(text, vmxoff, sizeof(vmxoff))))
+				goto bad_module;
+
+		text += insn.length;
+	}
+
+	return;
+
+bad_module:
+	pr_warn("disabled due to VMX in module: %s\n", me->name);
+	sld_state = sld_off;
+	on_each_cpu(sld_remote_kill, NULL, 1);
+}
+
 static void split_lock_init(void)
 {
 	split_lock_verify_msr(sld_state != sld_off);
--- a/arch/x86/kernel/module.c
+++ b/arch/x86/kernel/module.c
@@ -24,6 +24,7 @@
 #include <asm/pgtable.h>
 #include <asm/setup.h>
 #include <asm/unwind.h>
+#include <asm/cpu.h>
 
 #if 0
 #define DEBUGP(fmt, ...)				\
@@ -253,6 +254,11 @@ int module_finalize(const Elf_Ehdr *hdr,
 					    tseg, tseg + text->sh_size);
 	}
 
+	if (text && !me->sld_safe) {
+		void *tseg = (void *)text->sh_addr;
+		split_lock_validate_module_text(me, tseg, tseg + text->sh_size);
+	}
+
 	if (para) {
 		void *pseg = (void *)para->sh_addr;
 		apply_paravirt(pseg, pseg + para->sh_size);
--- a/include/linux/module.h
+++ b/include/linux/module.h
@@ -407,6 +407,10 @@ struct module {
 	bool sig_ok;
 #endif
 
+#ifdef CONFIG_CPU_SUP_INTEL
+	bool sld_safe;
+#endif
+
 	bool async_probe_requested;
 
 	/* symbols that will be GPL-only in the near future. */
--- a/kernel/module.c
+++ b/kernel/module.c
@@ -3096,6 +3096,11 @@ static int check_modinfo(struct module *
 			"is unknown, you have been warned.\n", mod->name);
 	}
 
+#ifdef CONFIG_CPU_SUP_INTEL
+	if (get_modinfo(info, "sld_safe"))
+		mod->sld_safe = true;
+#endif
+
 	err = check_modinfo_livepatch(mod, info);
 	if (err)
 		return err;

^ permalink raw reply	[flat|nested] 78+ messages in thread

* Re: [patch 2/2] x86/kvm/vmx: Prevent split lock detection induced #AC wreckage
  2020-04-02 12:33 ` [patch 2/2] x86/kvm/vmx: Prevent split lock detection induced #AC wreckage Thomas Gleixner
@ 2020-04-02 15:30   ` Sean Christopherson
  2020-04-02 15:44     ` Nadav Amit
  2020-04-02 16:56     ` Thomas Gleixner
  2020-04-02 15:55   ` [PATCH 0/3] x86: KVM: VMX: Add basic split-lock #AC handling Sean Christopherson
  1 sibling, 2 replies; 78+ messages in thread
From: Sean Christopherson @ 2020-04-02 15:30 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: LKML, x86, Kenneth R. Crudup, Paolo Bonzini, Fenghua Yu,
	Xiaoyao Li, Nadav Amit, Thomas Hellstrom, Tony Luck,
	Peter Zijlstra (Intel),
	Jessica Yu, Steven Rostedt

On Thu, Apr 02, 2020 at 02:33:00PM +0200, Thomas Gleixner wrote:
> Without at least minimal handling for split lock detection induced #AC, VMX
> will just run into the same problem as the VMWare hypervisor, which was
> reported by Kenneth.
> 
> It will inject the #AC blindly into the guest whether the guest is prepared
> or not.
> 
> Add the minimal required handling for it:
> 
>   - Check guest state whether CR0.AM is enabled and EFLAGS.AC is set.  If
>     so, then the #AC originated from CPL3 and the guest has is prepared to
>     handle it. In this case it does not matter whether the #AC is due to a
>     split lock or a regular unaligned check.
> 
>  - Invoke a minimal split lock detection handler. If the host SLD mode is
>    sld_warn, then handle it in the same way as user space handling works:
>    Emit a warning, disable SLD and mark the current task with TIF_SLD.
>    With that resume the guest without injecting #AC.
> 
>    If the host mode is sld_fatal or sld_off, emit a warning and deliver
>    the exception to user space which can crash and burn itself.
> 
> Mark the module with MOD_INFO(sld_safe, "Y") so the module loader does not
> force SLD off.

Some comments below.  But, any objection to taking Xiaoyao's patches that
do effectively the same things, minus the MOD_INFO()?  I'll repost them in
reply to this thread.
 
> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
> Cc: "Kenneth R. Crudup" <kenny@panix.com>
> Cc: Paolo Bonzini <pbonzini@redhat.com>
> Cc: Fenghua Yu <fenghua.yu@intel.com>
> Cc: Xiaoyao Li <xiaoyao.li@intel.com>
> Cc: Nadav Amit <namit@vmware.com>
> Cc: Thomas Hellstrom <thellstrom@vmware.com>
> Cc: Sean Christopherson <sean.j.christopherson@intel.com>
> Cc: Tony Luck <tony.luck@intel.com>
> ---
>  arch/x86/include/asm/cpu.h  |    1 +
>  arch/x86/kernel/cpu/intel.c |   28 +++++++++++++++++++++++-----
>  arch/x86/kvm/vmx/vmx.c      |   40 +++++++++++++++++++++++++++++++++++++---
>  3 files changed, 61 insertions(+), 8 deletions(-)
> 
> --- a/arch/x86/include/asm/cpu.h
> +++ b/arch/x86/include/asm/cpu.h
> @@ -44,6 +44,7 @@ unsigned int x86_stepping(unsigned int s
>  extern void __init cpu_set_core_cap_bits(struct cpuinfo_x86 *c);
>  extern void switch_to_sld(unsigned long tifn);
>  extern bool handle_user_split_lock(struct pt_regs *regs, long error_code);
> +extern bool handle_guest_split_lock(unsigned long ip);
>  extern void split_lock_validate_module_text(struct module *me, void *text, void *text_end);
>  #else
>  static inline void __init cpu_set_core_cap_bits(struct cpuinfo_x86 *c) {}
> --- a/arch/x86/kernel/cpu/intel.c
> +++ b/arch/x86/kernel/cpu/intel.c
> @@ -1102,13 +1102,10 @@ static void split_lock_init(void)
>  	split_lock_verify_msr(sld_state != sld_off);
>  }
>  
> -bool handle_user_split_lock(struct pt_regs *regs, long error_code)
> +static void split_lock_warn(unsigned long ip)
>  {
> -	if ((regs->flags & X86_EFLAGS_AC) || sld_state == sld_fatal)
> -		return false;
> -
>  	pr_warn_ratelimited("#AC: %s/%d took a split_lock trap at address: 0x%lx\n",
> -			    current->comm, current->pid, regs->ip);
> +			    current->comm, current->pid, ip);
>  
>  	/*
>  	 * Disable the split lock detection for this task so it can make
> @@ -1117,6 +1114,27 @@ bool handle_user_split_lock(struct pt_re
>  	 */
>  	sld_update_msr(false);
>  	set_tsk_thread_flag(current, TIF_SLD);
> +}
> +
> +bool handle_guest_split_lock(unsigned long ip)
> +{
> +	if (sld_state == sld_warn) {
> +		split_lock_warn(ip);
> +		return true;
> +	}
> +
> +	pr_warn_once("#AC: %s/%d %s split_lock trap at address: 0x%lx\n",
> +		     current->comm, current->pid,
> +		     sld_state == sld_fatal ? "fatal" : "bogus", ip);
> +	return false;
> +}
> +EXPORT_SYMBOL_GPL(handle_guest_split_lock);
> +
> +bool handle_user_split_lock(struct pt_regs *regs, long error_code)
> +{
> +	if ((regs->flags & X86_EFLAGS_AC) || sld_state == sld_fatal)
> +		return false;
> +	split_lock_warn(regs->ip);
>  	return true;
>  }
>  
> --- a/arch/x86/kvm/vmx/vmx.c
> +++ b/arch/x86/kvm/vmx/vmx.c
> @@ -65,6 +65,7 @@
>  
>  MODULE_AUTHOR("Qumranet");
>  MODULE_LICENSE("GPL");
> +MODULE_INFO(sld_safe, "Y");
>  
>  #ifdef MODULE
>  static const struct x86_cpu_id vmx_cpu_id[] = {
> @@ -4623,6 +4624,22 @@ static int handle_machine_check(struct k
>  	return 1;
>  }
>  
> +static bool guest_handles_ac(struct kvm_vcpu *vcpu)
> +{
> +	/*
> +	 * If guest has alignment checking enabled in CR0 and activated in
> +	 * eflags, then the #AC originated from CPL3 and the guest is able
> +	 * to handle it. It does not matter whether this is a regular or
> +	 * a split lock operation induced #AC.
> +	 */
> +	if (vcpu->arch.cr0 & X86_CR0_AM &&

Technically not required since KVM doesn't let the gets toggle CR0.AM at
will, but going through kvm_read_cr0{_bits}() is preferred.

> +	    vmx_get_rflags(vcpu) & X86_EFLAGS_AC)

I don't think this is correct.  A guest could trigger a split-lock #AC at
CPL0 with EFLAGS.AC=1 and CR0.AM=1, and then panic because it didn't expect
#AC at CPL0.

> +		return true;
> +
> +	/* Add guest SLD handling checks here once it's supported */
> +	return false;
> +}
> +
>  static int handle_exception_nmi(struct kvm_vcpu *vcpu)
>  {
>  	struct vcpu_vmx *vmx = to_vmx(vcpu);
> @@ -4688,9 +4705,6 @@ static int handle_exception_nmi(struct k
>  		return handle_rmode_exception(vcpu, ex_no, error_code);
>  
>  	switch (ex_no) {
> -	case AC_VECTOR:
> -		kvm_queue_exception_e(vcpu, AC_VECTOR, error_code);
> -		return 1;
>  	case DB_VECTOR:
>  		dr6 = vmcs_readl(EXIT_QUALIFICATION);
>  		if (!(vcpu->guest_debug &
> @@ -4719,6 +4733,26 @@ static int handle_exception_nmi(struct k
>  		kvm_run->debug.arch.pc = vmcs_readl(GUEST_CS_BASE) + rip;
>  		kvm_run->debug.arch.exception = ex_no;
>  		break;
> +	case AC_VECTOR:
> +		if (guest_handles_ac(vcpu)) {
> +			kvm_queue_exception_e(vcpu, AC_VECTOR, error_code);
> +			return 1;
> +		}
> +		/*
> +		 * Handle #AC caused by split lock detection. If the host
> +		 * mode is sld_warn, then it warns, marks current with
> +		 * TIF_SLD and disables split lock detection. So the guest
> +		 * can just continue.
> +		 *
> +		 * If the host mode is fatal, the handling code warned. Let
> +		 * qemu kill itself.
> +		 *
> +		 * If the host mode is off, then this #AC is bonkers and
> +		 * something is badly wrong. Let it fail as well.
> +		 */
> +		if (handle_guest_split_lock(kvm_rip_read(vcpu)))
> +			return 1;
> +		/* fall through */
>  	default:
>  		kvm_run->exit_reason = KVM_EXIT_EXCEPTION;
>  		kvm_run->ex.exception = ex_no;
> 

^ permalink raw reply	[flat|nested] 78+ messages in thread

* Re: [patch 2/2] x86/kvm/vmx: Prevent split lock detection induced #AC wreckage
  2020-04-02 15:30   ` Sean Christopherson
@ 2020-04-02 15:44     ` Nadav Amit
  2020-04-02 16:04       ` Sean Christopherson
  2020-04-02 16:56     ` Thomas Gleixner
  1 sibling, 1 reply; 78+ messages in thread
From: Nadav Amit @ 2020-04-02 15:44 UTC (permalink / raw)
  To: Sean Christopherson
  Cc: Thomas Gleixner, LKML, x86, Kenneth R. Crudup, Paolo Bonzini,
	Fenghua Yu, Xiaoyao Li, Thomas Hellstrom, Tony Luck,
	Peter Zijlstra (Intel),
	Jessica Yu, Steven Rostedt

> On Apr 2, 2020, at 8:30 AM, Sean Christopherson <sean.j.christopherson@intel.com> wrote:
> 
> On Thu, Apr 02, 2020 at 02:33:00PM +0200, Thomas Gleixner wrote:
>> Without at least minimal handling for split lock detection induced #AC, VMX
>> will just run into the same problem as the VMWare hypervisor, which was
>> reported by Kenneth.
>> 
>> It will inject the #AC blindly into the guest whether the guest is prepared
>> or not.
>> 
>> Add the minimal required handling for it:
>> 
>>  - Check guest state whether CR0.AM is enabled and EFLAGS.AC is set.  If
>>    so, then the #AC originated from CPL3 and the guest has is prepared to
>>    handle it. In this case it does not matter whether the #AC is due to a
>>    split lock or a regular unaligned check.
>> 
>> - Invoke a minimal split lock detection handler. If the host SLD mode is
>>   sld_warn, then handle it in the same way as user space handling works:
>>   Emit a warning, disable SLD and mark the current task with TIF_SLD.
>>   With that resume the guest without injecting #AC.
>> 
>>   If the host mode is sld_fatal or sld_off, emit a warning and deliver
>>   the exception to user space which can crash and burn itself.
>> 
>> Mark the module with MOD_INFO(sld_safe, "Y") so the module loader does not
>> force SLD off.
> 
> Some comments below.  But, any objection to taking Xiaoyao's patches that
> do effectively the same things, minus the MOD_INFO()?  I'll repost them in
> reply to this thread.

IIUC they also deal with emulated split-lock accesses in the host, during
instruction emulation [1]. This seems also to be required, although I am not
sure the approach that he took once emulation encounters a split-lock is
robust.

[1] https://lore.kernel.org/lkml/20200324151859.31068-5-xiaoyao.li@intel.com/

^ permalink raw reply	[flat|nested] 78+ messages in thread

* [PATCH 0/3] x86: KVM: VMX: Add basic split-lock #AC handling
  2020-04-02 12:33 ` [patch 2/2] x86/kvm/vmx: Prevent split lock detection induced #AC wreckage Thomas Gleixner
  2020-04-02 15:30   ` Sean Christopherson
@ 2020-04-02 15:55   ` Sean Christopherson
  2020-04-02 15:55     ` [PATCH 1/3] KVM: x86: Emulate split-lock access as a write in emulator Sean Christopherson
                       ` (3 more replies)
  1 sibling, 4 replies; 78+ messages in thread
From: Sean Christopherson @ 2020-04-02 15:55 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: x86, Kenneth R . Crudup, Paolo Bonzini, Fenghua Yu, Xiaoyao Li,
	Nadav Amit, Thomas Hellstrom, Sean Christopherson, Tony Luck,
	Peter Zijlstra, Jessica Yu, Steven Rostedt, Vitaly Kuznetsov,
	Wanpeng Li, Jim Mattson, kvm, linux-kernel

First three patches from Xiaoyao's series to add split-lock #AC support
in KVM.

Xiaoyao Li (3):
  KVM: x86: Emulate split-lock access as a write in emulator
  x86/split_lock: Refactor and export handle_user_split_lock() for KVM
  KVM: VMX: Extend VMX's #AC interceptor to handle split lock #AC in
    guest

 arch/x86/include/asm/cpu.h  |  4 ++--
 arch/x86/kernel/cpu/intel.c |  7 ++++---
 arch/x86/kernel/traps.c     |  2 +-
 arch/x86/kvm/vmx/vmx.c      | 30 +++++++++++++++++++++++++++---
 arch/x86/kvm/x86.c          | 12 +++++++++++-
 5 files changed, 45 insertions(+), 10 deletions(-)

-- 
2.24.1


^ permalink raw reply	[flat|nested] 78+ messages in thread

* [PATCH 1/3] KVM: x86: Emulate split-lock access as a write in emulator
  2020-04-02 15:55   ` [PATCH 0/3] x86: KVM: VMX: Add basic split-lock #AC handling Sean Christopherson
@ 2020-04-02 15:55     ` Sean Christopherson
  2020-04-02 15:55     ` [PATCH 2/3] x86/split_lock: Refactor and export handle_user_split_lock() for KVM Sean Christopherson
                       ` (2 subsequent siblings)
  3 siblings, 0 replies; 78+ messages in thread
From: Sean Christopherson @ 2020-04-02 15:55 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: x86, Kenneth R . Crudup, Paolo Bonzini, Fenghua Yu, Xiaoyao Li,
	Nadav Amit, Thomas Hellstrom, Sean Christopherson, Tony Luck,
	Peter Zijlstra, Jessica Yu, Steven Rostedt, Vitaly Kuznetsov,
	Wanpeng Li, Jim Mattson, kvm, linux-kernel

From: Xiaoyao Li <xiaoyao.li@intel.com>

Emulate split-lock accesses as writes if split lock detection is on to
avoid #AC during emulation, which will result in a panic().  This should
never occur for a well behaved guest, but a malicious guest can
manipulate the TLB to trigger emulation of a locked instruction[1].

More discussion can be found [2][3].

[1] https://lkml.kernel.org/r/8c5b11c9-58df-38e7-a514-dc12d687b198@redhat.com
[2] https://lkml.kernel.org/r/20200131200134.GD18946@linux.intel.com
[3] https://lkml.kernel.org/r/20200227001117.GX9940@linux.intel.com

Suggested-by: Sean Christopherson <sean.j.christopherson@intel.com>
Signed-off-by: Xiaoyao Li <xiaoyao.li@intel.com>
Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
---
 arch/x86/kvm/x86.c | 12 +++++++++++-
 1 file changed, 11 insertions(+), 1 deletion(-)

diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index bf8564d73fc3..37ce0fc9a62d 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -5875,6 +5875,7 @@ static int emulator_cmpxchg_emulated(struct x86_emulate_ctxt *ctxt,
 {
 	struct kvm_host_map map;
 	struct kvm_vcpu *vcpu = emul_to_vcpu(ctxt);
+	u64 page_line_mask;
 	gpa_t gpa;
 	char *kaddr;
 	bool exchanged;
@@ -5889,7 +5890,16 @@ static int emulator_cmpxchg_emulated(struct x86_emulate_ctxt *ctxt,
 	    (gpa & PAGE_MASK) == APIC_DEFAULT_PHYS_BASE)
 		goto emul_write;
 
-	if (((gpa + bytes - 1) & PAGE_MASK) != (gpa & PAGE_MASK))
+	/*
+	 * Emulate the atomic as a straight write to avoid #AC if SLD is
+	 * enabled in the host and the access splits a cache line.
+	 */
+	if (boot_cpu_has(X86_FEATURE_SPLIT_LOCK_DETECT))
+		page_line_mask = ~(cache_line_size() - 1);
+	else
+		page_line_mask = PAGE_MASK;
+
+	if (((gpa + bytes - 1) & page_line_mask) != (gpa & page_line_mask))
 		goto emul_write;
 
 	if (kvm_vcpu_map(vcpu, gpa_to_gfn(gpa), &map))
-- 
2.24.1


^ permalink raw reply related	[flat|nested] 78+ messages in thread

* [PATCH 2/3] x86/split_lock: Refactor and export handle_user_split_lock() for KVM
  2020-04-02 15:55   ` [PATCH 0/3] x86: KVM: VMX: Add basic split-lock #AC handling Sean Christopherson
  2020-04-02 15:55     ` [PATCH 1/3] KVM: x86: Emulate split-lock access as a write in emulator Sean Christopherson
@ 2020-04-02 15:55     ` Sean Christopherson
  2020-04-02 17:01       ` Thomas Gleixner
  2020-04-02 15:55     ` [PATCH 3/3] KVM: VMX: Extend VMX's #AC interceptor to handle split lock #AC in guest Sean Christopherson
  2020-04-10 10:23     ` [PATCH 0/3] x86: KVM: VMX: Add basic split-lock #AC handling Paolo Bonzini
  3 siblings, 1 reply; 78+ messages in thread
From: Sean Christopherson @ 2020-04-02 15:55 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: x86, Kenneth R . Crudup, Paolo Bonzini, Fenghua Yu, Xiaoyao Li,
	Nadav Amit, Thomas Hellstrom, Sean Christopherson, Tony Luck,
	Peter Zijlstra, Jessica Yu, Steven Rostedt, Vitaly Kuznetsov,
	Wanpeng Li, Jim Mattson, kvm, linux-kernel

From: Xiaoyao Li <xiaoyao.li@intel.com>

In the future, KVM will use handle_user_split_lock() to handle #AC
caused by split lock in guest. Due to the fact that KVM doesn't have
a @regs context and will pre-check EFLAGS.AC, move the EFLAGS.AC check
to do_alignment_check().

Suggested-by: Sean Christopherson <sean.j.christopherson@intel.com>
Signed-off-by: Xiaoyao Li <xiaoyao.li@intel.com>
Reviewed-by: Tony Luck <tony.luck@intel.com>
Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
---
 arch/x86/include/asm/cpu.h  | 4 ++--
 arch/x86/kernel/cpu/intel.c | 7 ++++---
 arch/x86/kernel/traps.c     | 2 +-
 3 files changed, 7 insertions(+), 6 deletions(-)

diff --git a/arch/x86/include/asm/cpu.h b/arch/x86/include/asm/cpu.h
index ff6f3ca649b3..ff567afa6ee1 100644
--- a/arch/x86/include/asm/cpu.h
+++ b/arch/x86/include/asm/cpu.h
@@ -43,11 +43,11 @@ unsigned int x86_stepping(unsigned int sig);
 #ifdef CONFIG_CPU_SUP_INTEL
 extern void __init cpu_set_core_cap_bits(struct cpuinfo_x86 *c);
 extern void switch_to_sld(unsigned long tifn);
-extern bool handle_user_split_lock(struct pt_regs *regs, long error_code);
+extern bool handle_user_split_lock(unsigned long ip);
 #else
 static inline void __init cpu_set_core_cap_bits(struct cpuinfo_x86 *c) {}
 static inline void switch_to_sld(unsigned long tifn) {}
-static inline bool handle_user_split_lock(struct pt_regs *regs, long error_code)
+static inline bool handle_user_split_lock(unsigned long ip)
 {
 	return false;
 }
diff --git a/arch/x86/kernel/cpu/intel.c b/arch/x86/kernel/cpu/intel.c
index 9a26e972cdea..7688f51aabdb 100644
--- a/arch/x86/kernel/cpu/intel.c
+++ b/arch/x86/kernel/cpu/intel.c
@@ -1066,13 +1066,13 @@ static void split_lock_init(void)
 	split_lock_verify_msr(sld_state != sld_off);
 }
 
-bool handle_user_split_lock(struct pt_regs *regs, long error_code)
+bool handle_user_split_lock(unsigned long ip)
 {
-	if ((regs->flags & X86_EFLAGS_AC) || sld_state == sld_fatal)
+	if (sld_state == sld_fatal)
 		return false;
 
 	pr_warn_ratelimited("#AC: %s/%d took a split_lock trap at address: 0x%lx\n",
-			    current->comm, current->pid, regs->ip);
+			    current->comm, current->pid, ip);
 
 	/*
 	 * Disable the split lock detection for this task so it can make
@@ -1083,6 +1083,7 @@ bool handle_user_split_lock(struct pt_regs *regs, long error_code)
 	set_tsk_thread_flag(current, TIF_SLD);
 	return true;
 }
+EXPORT_SYMBOL_GPL(handle_user_split_lock);
 
 /*
  * This function is called only when switching between tasks with
diff --git a/arch/x86/kernel/traps.c b/arch/x86/kernel/traps.c
index d54cffdc7cac..55902c5bf0aa 100644
--- a/arch/x86/kernel/traps.c
+++ b/arch/x86/kernel/traps.c
@@ -304,7 +304,7 @@ dotraplinkage void do_alignment_check(struct pt_regs *regs, long error_code)
 
 	local_irq_enable();
 
-	if (handle_user_split_lock(regs, error_code))
+	if (!(regs->flags & X86_EFLAGS_AC) && handle_user_split_lock(regs->ip))
 		return;
 
 	do_trap(X86_TRAP_AC, SIGBUS, "alignment check", regs,
-- 
2.24.1


^ permalink raw reply related	[flat|nested] 78+ messages in thread

* [PATCH 3/3] KVM: VMX: Extend VMX's #AC interceptor to handle split lock #AC in guest
  2020-04-02 15:55   ` [PATCH 0/3] x86: KVM: VMX: Add basic split-lock #AC handling Sean Christopherson
  2020-04-02 15:55     ` [PATCH 1/3] KVM: x86: Emulate split-lock access as a write in emulator Sean Christopherson
  2020-04-02 15:55     ` [PATCH 2/3] x86/split_lock: Refactor and export handle_user_split_lock() for KVM Sean Christopherson
@ 2020-04-02 15:55     ` Sean Christopherson
  2020-04-02 17:19       ` Thomas Gleixner
  2020-04-10 10:23     ` [PATCH 0/3] x86: KVM: VMX: Add basic split-lock #AC handling Paolo Bonzini
  3 siblings, 1 reply; 78+ messages in thread
From: Sean Christopherson @ 2020-04-02 15:55 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: x86, Kenneth R . Crudup, Paolo Bonzini, Fenghua Yu, Xiaoyao Li,
	Nadav Amit, Thomas Hellstrom, Sean Christopherson, Tony Luck,
	Peter Zijlstra, Jessica Yu, Steven Rostedt, Vitaly Kuznetsov,
	Wanpeng Li, Jim Mattson, kvm, linux-kernel

From: Xiaoyao Li <xiaoyao.li@intel.com>

Two types #AC can be generated in Intel CPUs:
 1. legacy alignment check #AC
 2. split lock #AC

Reflect #AC back into the guest if the guest has legacy alignment checks
enabled or if SLD is disabled.  If SLD is enabled, treat the guest like
a host userspace application by calling handle_user_split_lock().  If
the #AC is handled (SLD disabled and TIF_SLD set), then simply resume
the guest.  If the #AC isn't handled, i.e. host is sld_fatal, then
forward the #AC to the userspace VMM, similar to sending SIGBUS.

Suggested-by: Sean Christopherson <sean.j.christopherson@intel.com>
Signed-off-by: Xiaoyao Li <xiaoyao.li@intel.com>
Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
---
 arch/x86/kvm/vmx/vmx.c | 30 +++++++++++++++++++++++++++---
 1 file changed, 27 insertions(+), 3 deletions(-)

diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
index 458e684dfbdc..a96cfda0a5b9 100644
--- a/arch/x86/kvm/vmx/vmx.c
+++ b/arch/x86/kvm/vmx/vmx.c
@@ -4623,6 +4623,12 @@ static int handle_machine_check(struct kvm_vcpu *vcpu)
 	return 1;
 }
 
+static inline bool guest_cpu_alignment_check_enabled(struct kvm_vcpu *vcpu)
+{
+	return vmx_get_cpl(vcpu) == 3 && kvm_read_cr0_bits(vcpu, X86_CR0_AM) &&
+	       (kvm_get_rflags(vcpu) & X86_EFLAGS_AC);
+}
+
 static int handle_exception_nmi(struct kvm_vcpu *vcpu)
 {
 	struct vcpu_vmx *vmx = to_vmx(vcpu);
@@ -4688,9 +4694,6 @@ static int handle_exception_nmi(struct kvm_vcpu *vcpu)
 		return handle_rmode_exception(vcpu, ex_no, error_code);
 
 	switch (ex_no) {
-	case AC_VECTOR:
-		kvm_queue_exception_e(vcpu, AC_VECTOR, error_code);
-		return 1;
 	case DB_VECTOR:
 		dr6 = vmcs_readl(EXIT_QUALIFICATION);
 		if (!(vcpu->guest_debug &
@@ -4719,6 +4722,27 @@ static int handle_exception_nmi(struct kvm_vcpu *vcpu)
 		kvm_run->debug.arch.pc = vmcs_readl(GUEST_CS_BASE) + rip;
 		kvm_run->debug.arch.exception = ex_no;
 		break;
+	case AC_VECTOR:
+		/*
+		 * Reflect #AC to the guest if it's expecting the #AC, i.e. has
+		 * legacy alignment check enabled.  Pre-check host split lock
+		 * turned on to avoid the VMREADs needed to check legacy #AC,
+		 * i.e. reflect the #AC if the only possible source is legacy
+		 * alignment checks.
+		 */
+		if (!boot_cpu_has(X86_FEATURE_SPLIT_LOCK_DETECT) ||
+		    guest_cpu_alignment_check_enabled(vcpu)) {
+			kvm_queue_exception_e(vcpu, AC_VECTOR, error_code);
+			return 1;
+		}
+
+		/*
+		 * Forward the #AC to userspace if kernel policy does not allow
+		 * temporarily disabling split lock detection.
+		 */
+		if (handle_user_split_lock(kvm_rip_read(vcpu)))
+			return 1;
+		fallthrough;
 	default:
 		kvm_run->exit_reason = KVM_EXIT_EXCEPTION;
 		kvm_run->ex.exception = ex_no;
-- 
2.24.1


^ permalink raw reply related	[flat|nested] 78+ messages in thread

* Re: [patch 2/2] x86/kvm/vmx: Prevent split lock detection induced #AC wreckage
  2020-04-02 15:44     ` Nadav Amit
@ 2020-04-02 16:04       ` Sean Christopherson
  0 siblings, 0 replies; 78+ messages in thread
From: Sean Christopherson @ 2020-04-02 16:04 UTC (permalink / raw)
  To: Nadav Amit
  Cc: Thomas Gleixner, LKML, x86, Kenneth R. Crudup, Paolo Bonzini,
	Fenghua Yu, Xiaoyao Li, Thomas Hellstrom, Tony Luck,
	Peter Zijlstra (Intel),
	Jessica Yu, Steven Rostedt

On Thu, Apr 02, 2020 at 03:44:00PM +0000, Nadav Amit wrote:
> > On Apr 2, 2020, at 8:30 AM, Sean Christopherson <sean.j.christopherson@intel.com> wrote:
> > 
> > On Thu, Apr 02, 2020 at 02:33:00PM +0200, Thomas Gleixner wrote:
> >> Without at least minimal handling for split lock detection induced #AC, VMX
> >> will just run into the same problem as the VMWare hypervisor, which was
> >> reported by Kenneth.
> >> 
> >> It will inject the #AC blindly into the guest whether the guest is prepared
> >> or not.
> >> 
> >> Add the minimal required handling for it:
> >> 
> >>  - Check guest state whether CR0.AM is enabled and EFLAGS.AC is set.  If
> >>    so, then the #AC originated from CPL3 and the guest has is prepared to
> >>    handle it. In this case it does not matter whether the #AC is due to a
> >>    split lock or a regular unaligned check.
> >> 
> >> - Invoke a minimal split lock detection handler. If the host SLD mode is
> >>   sld_warn, then handle it in the same way as user space handling works:
> >>   Emit a warning, disable SLD and mark the current task with TIF_SLD.
> >>   With that resume the guest without injecting #AC.
> >> 
> >>   If the host mode is sld_fatal or sld_off, emit a warning and deliver
> >>   the exception to user space which can crash and burn itself.
> >> 
> >> Mark the module with MOD_INFO(sld_safe, "Y") so the module loader does not
> >> force SLD off.
> > 
> > Some comments below.  But, any objection to taking Xiaoyao's patches that
> > do effectively the same things, minus the MOD_INFO()?  I'll repost them in
> > reply to this thread.
> 
> IIUC they also deal with emulated split-lock accesses in the host, during
> instruction emulation [1]. This seems also to be required, although I am not
> sure the approach that he took once emulation encounters a split-lock is
> robust.

Yep.  It's "robust" in the sense that KVM won't panic the host.  It's not
robust from the perspective that it could possibly hose the guest.  But, no
sane, well-behaved guest should reach that particular emulator path on a
split-lock enabled system.

> [1] https://lore.kernel.org/lkml/20200324151859.31068-5-xiaoyao.li@intel.com/

^ permalink raw reply	[flat|nested] 78+ messages in thread

* Re: [patch v2 1/2] x86,module: Detect VMX modules and disable Split-Lock-Detect
  2020-04-02 15:23   ` [patch v2 " Peter Zijlstra
@ 2020-04-02 16:20     ` Xiaoyao Li
  2020-04-02 16:25       ` Peter Zijlstra
  2020-04-03  8:09     ` David Laight
  1 sibling, 1 reply; 78+ messages in thread
From: Xiaoyao Li @ 2020-04-02 16:20 UTC (permalink / raw)
  To: Peter Zijlstra, Thomas Gleixner
  Cc: LKML, x86, Kenneth R. Crudup, Paolo Bonzini, Jessica Yu,
	Fenghua Yu, Nadav Amit, Thomas Hellstrom, Sean Christopherson,
	Tony Luck, Steven Rostedt

On 4/2/2020 11:23 PM, Peter Zijlstra wrote:
> 
> I picked VMXOFF (which also appears in vmmon.ko) instead of VMXON
> because that latter takes an argument is therefore more difficult to
> decode.
> 
> ---
> Subject: x86,module: Detect VMX modules and disable Split-Lock-Detect
> From: Peter Zijlstra <peterz@infradead.org>
> Date: Thu, 02 Apr 2020 14:32:59 +0200
> 
> It turns out that with Split-Lock-Detect enabled (default) any VMX
> hypervisor needs at least a little modification in order to not blindly
> inject the #AC into the guest without the guest being ready for it.
> 
> Since there is no telling which module implements a hypervisor, scan the
> module text and look for the VMLAUNCH/VMXOFF instructions. If found, the
> module is assumed to be a hypervisor of some sort and SLD is disabled.
> 
> Hypervisors, which have been modified and are known to work correctly,
> can add:
> 
>    MODULE_INFO(sld_safe, "Y");
> 
> to explicitly tell the module loader they're good.
> 
> NOTE: it is unfortunate that struct load_info is not available to the
>        arch module code, this means CONFIG_CPU_SUP_INTEL gunk is needed
>        in generic code.
> 
> NOTE: while we can 'trivially' fix KVM, we're still stuck with stuff
>        like VMware and VirtualBox doing their own thing.
> 
> Reported-by: "Kenneth R. Crudup" <kenny@panix.com>
> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
> ---
>   arch/x86/include/asm/cpu.h  |    2 ++
>   arch/x86/kernel/cpu/intel.c |   41 ++++++++++++++++++++++++++++++++++++++++-
>   arch/x86/kernel/module.c    |    6 ++++++
>   include/linux/module.h      |    4 ++++
>   kernel/module.c             |    5 +++++
>   5 files changed, 57 insertions(+), 1 deletion(-)
> 
> --- a/arch/x86/include/asm/cpu.h
> +++ b/arch/x86/include/asm/cpu.h
> @@ -44,6 +44,7 @@ unsigned int x86_stepping(unsigned int s
>   extern void __init cpu_set_core_cap_bits(struct cpuinfo_x86 *c);
>   extern void switch_to_sld(unsigned long tifn);
>   extern bool handle_user_split_lock(struct pt_regs *regs, long error_code);
> +extern void split_lock_validate_module_text(struct module *me, void *text, void *text_end);
>   #else
>   static inline void __init cpu_set_core_cap_bits(struct cpuinfo_x86 *c) {}
>   static inline void switch_to_sld(unsigned long tifn) {}
> @@ -51,5 +52,6 @@ static inline bool handle_user_split_loc
>   {
>   	return false;
>   }
> +static inline void split_lock_validate_module_text(struct module *me, void *text, void *text_end) {}
>   #endif
>   #endif /* _ASM_X86_CPU_H */
> --- a/arch/x86/kernel/cpu/intel.c
> +++ b/arch/x86/kernel/cpu/intel.c
> @@ -9,6 +9,7 @@
>   #include <linux/thread_info.h>
>   #include <linux/init.h>
>   #include <linux/uaccess.h>
> +#include <linux/module.h>
>   
>   #include <asm/cpufeature.h>
>   #include <asm/pgtable.h>
> @@ -21,6 +22,7 @@
>   #include <asm/elf.h>
>   #include <asm/cpu_device_id.h>
>   #include <asm/cmdline.h>
> +#include <asm/insn.h>
>   
>   #ifdef CONFIG_X86_64
>   #include <linux/topology.h>
> @@ -1055,12 +1057,49 @@ static void sld_update_msr(bool on)
>   {
>   	u64 test_ctrl_val = msr_test_ctrl_cache;
>   
> -	if (on)
> +	if (on && (sld_state != sld_off))
>   		test_ctrl_val |= MSR_TEST_CTRL_SPLIT_LOCK_DETECT;
>   
>   	wrmsrl(MSR_TEST_CTRL, test_ctrl_val);
>   }
>   
> +static void sld_remote_kill(void *arg)
> +{
> +	sld_update_msr(false);
> +}
> +
> +void split_lock_validate_module_text(struct module *me, void *text, void *text_end)
> +{
> +	u8 vmxoff[] = { 0x0f, 0x01, 0xc4 };
> +	u8 vmlaunch[] = { 0x0f, 0x01, 0xc2 };
> +	struct insn insn;
> +
> +	if (sld_state == sld_off)
> +		return;
> +
> +	while (text < text_end) {
> +		kernel_insn_init(&insn, text, text_end - text);
> +		insn_get_length(&insn);
> +
> +		if (WARN_ON_ONCE(!insn_complete(&insn)))
> +			break;
> +
> +		if (insn.length == 3 &&
> +		    (!memcmp(text, vmlaunch, sizeof(vmlaunch)) ||
> +		     !memcmp(text, vmxoff, sizeof(vmxoff))))
> +				goto bad_module;
> +
> +		text += insn.length;
> +	}
> +
> +	return;
> +
> +bad_module:
> +	pr_warn("disabled due to VMX in module: %s\n", me->name);
> +	sld_state = sld_off;

shouldn't we remove the __ro_after_init of sld_state?

And, shouldn't we clear X86_FEATURE_SPLIT_LOCK_DETECT flag?

> +	on_each_cpu(sld_remote_kill, NULL, 1);
> +}
> +
>   static void split_lock_init(void)
>   {
>   	split_lock_verify_msr(sld_state != sld_off);
> --- a/arch/x86/kernel/module.c
> +++ b/arch/x86/kernel/module.c
> @@ -24,6 +24,7 @@
>   #include <asm/pgtable.h>
>   #include <asm/setup.h>
>   #include <asm/unwind.h>
> +#include <asm/cpu.h>
>   
>   #if 0
>   #define DEBUGP(fmt, ...)				\
> @@ -253,6 +254,11 @@ int module_finalize(const Elf_Ehdr *hdr,
>   					    tseg, tseg + text->sh_size);
>   	}
>   
> +	if (text && !me->sld_safe) {
> +		void *tseg = (void *)text->sh_addr;
> +		split_lock_validate_module_text(me, tseg, tseg + text->sh_size);
> +	}
> +
>   	if (para) {
>   		void *pseg = (void *)para->sh_addr;
>   		apply_paravirt(pseg, pseg + para->sh_size);
> --- a/include/linux/module.h
> +++ b/include/linux/module.h
> @@ -407,6 +407,10 @@ struct module {
>   	bool sig_ok;
>   #endif
>   
> +#ifdef CONFIG_CPU_SUP_INTEL
> +	bool sld_safe;
> +#endif
> +
>   	bool async_probe_requested;
>   
>   	/* symbols that will be GPL-only in the near future. */
> --- a/kernel/module.c
> +++ b/kernel/module.c
> @@ -3096,6 +3096,11 @@ static int check_modinfo(struct module *
>   			"is unknown, you have been warned.\n", mod->name);
>   	}
>   
> +#ifdef CONFIG_CPU_SUP_INTEL
> +	if (get_modinfo(info, "sld_safe"))
> +		mod->sld_safe = true;
> +#endif
> +
>   	err = check_modinfo_livepatch(mod, info);
>   	if (err)
>   		return err;
> 


^ permalink raw reply	[flat|nested] 78+ messages in thread

* Re: [patch v2 1/2] x86,module: Detect VMX modules and disable Split-Lock-Detect
  2020-04-02 16:20     ` Xiaoyao Li
@ 2020-04-02 16:25       ` Peter Zijlstra
  2020-04-02 16:39         ` Nadav Amit
  2020-04-02 16:41         ` Xiaoyao Li
  0 siblings, 2 replies; 78+ messages in thread
From: Peter Zijlstra @ 2020-04-02 16:25 UTC (permalink / raw)
  To: Xiaoyao Li
  Cc: Thomas Gleixner, LKML, x86, Kenneth R. Crudup, Paolo Bonzini,
	Jessica Yu, Fenghua Yu, Nadav Amit, Thomas Hellstrom,
	Sean Christopherson, Tony Luck, Steven Rostedt


Learn to trim your replies already!

On Fri, Apr 03, 2020 at 12:20:08AM +0800, Xiaoyao Li wrote:
> On 4/2/2020 11:23 PM, Peter Zijlstra wrote:

> > +bad_module:
> > +	pr_warn("disabled due to VMX in module: %s\n", me->name);
> > +	sld_state = sld_off;
> 
> shouldn't we remove the __ro_after_init of sld_state?

Oh, that's probably a good idea. I can't actually test this due to no
hardware.

> And, shouldn't we clear X86_FEATURE_SPLIT_LOCK_DETECT flag?

Don't think you can do that this late. Also, the hardware has the MSR
and it works, it's just that we should not.

^ permalink raw reply	[flat|nested] 78+ messages in thread

* Re: [patch v2 1/2] x86,module: Detect VMX modules and disable Split-Lock-Detect
  2020-04-02 16:25       ` Peter Zijlstra
@ 2020-04-02 16:39         ` Nadav Amit
  2020-04-02 16:41         ` Xiaoyao Li
  1 sibling, 0 replies; 78+ messages in thread
From: Nadav Amit @ 2020-04-02 16:39 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Xiaoyao Li, Thomas Gleixner, LKML, x86, Kenneth R. Crudup,
	Paolo Bonzini, Jessica Yu, Fenghua Yu, Thomas Hellstrom,
	Sean Christopherson, Tony Luck, Steven Rostedt

> On Apr 2, 2020, at 9:25 AM, Peter Zijlstra <peterz@infradead.org> wrote:
> 
> 
> Learn to trim your replies already!
> 
> On Fri, Apr 03, 2020 at 12:20:08AM +0800, Xiaoyao Li wrote:
>> On 4/2/2020 11:23 PM, Peter Zijlstra wrote:
> 
>>> +bad_module:
>>> +	pr_warn("disabled due to VMX in module: %s\n", me->name);
>>> +	sld_state = sld_off;
>> 
>> shouldn't we remove the __ro_after_init of sld_state?
> 
> Oh, that's probably a good idea. I can't actually test this due to no
> hardware.

Just wondering, since I lack hardware as well: can the performance counter
LOCK_CYCLES.SPLIT_LOCK_UC_LOCK_DURATION be used to detect split-locks
similarly to SLD (although it would be after the split-lock event)?


^ permalink raw reply	[flat|nested] 78+ messages in thread

* Re: [patch v2 1/2] x86,module: Detect VMX modules and disable Split-Lock-Detect
  2020-04-02 16:25       ` Peter Zijlstra
  2020-04-02 16:39         ` Nadav Amit
@ 2020-04-02 16:41         ` Xiaoyao Li
  2020-04-02 17:34           ` Thomas Gleixner
  1 sibling, 1 reply; 78+ messages in thread
From: Xiaoyao Li @ 2020-04-02 16:41 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Thomas Gleixner, LKML, x86, Kenneth R. Crudup, Paolo Bonzini,
	Jessica Yu, Fenghua Yu, Nadav Amit, Thomas Hellstrom,
	Sean Christopherson, Tony Luck, Steven Rostedt

On 4/3/2020 12:25 AM, Peter Zijlstra wrote:
> 
> Learn to trim your replies already!

Sorry.

> On Fri, Apr 03, 2020 at 12:20:08AM +0800, Xiaoyao Li wrote:
>> On 4/2/2020 11:23 PM, Peter Zijlstra wrote:
> 
>>> +bad_module:
>>> +	pr_warn("disabled due to VMX in module: %s\n", me->name);
>>> +	sld_state = sld_off;
>>
>> shouldn't we remove the __ro_after_init of sld_state?
> 
> Oh, that's probably a good idea. I can't actually test this due to no
> hardware.
> 
>> And, shouldn't we clear X86_FEATURE_SPLIT_LOCK_DETECT flag?
> 
> Don't think you can do that this late. Also, the hardware has the MSR
> and it works, it's just that we should not.
> 

Actually, I agree to keep this flag.

But, during the previous patch review, tglx wants to make

	sld_off = no X86_FEATURE_SPLIT_LOCK_DETECT

I'm not sure whether he still insists on it now.

I really want to decouple sld_off and X86_FEATURE_SPLIT_LOCK_DETECT.
So if X86_FEATURE_SPLIT_LOCK_DETECT is set, we can virtualize and expose 
it to guest even when host is sld_off.

^ permalink raw reply	[flat|nested] 78+ messages in thread

* Re: [patch 2/2] x86/kvm/vmx: Prevent split lock detection induced #AC wreckage
  2020-04-02 15:30   ` Sean Christopherson
  2020-04-02 15:44     ` Nadav Amit
@ 2020-04-02 16:56     ` Thomas Gleixner
  1 sibling, 0 replies; 78+ messages in thread
From: Thomas Gleixner @ 2020-04-02 16:56 UTC (permalink / raw)
  To: Sean Christopherson
  Cc: LKML, x86, Kenneth R. Crudup, Paolo Bonzini, Fenghua Yu,
	Xiaoyao Li, Nadav Amit, Thomas Hellstrom, Tony Luck,
	Peter Zijlstra (Intel),
	Jessica Yu, Steven Rostedt

Sean Christopherson <sean.j.christopherson@intel.com> writes:
> On Thu, Apr 02, 2020 at 02:33:00PM +0200, Thomas Gleixner wrote:
>> Mark the module with MOD_INFO(sld_safe, "Y") so the module loader does not
>> force SLD off.
>
> Some comments below.  But, any objection to taking Xiaoyao's patches that
> do effectively the same things, minus the MOD_INFO()?  I'll repost them in
> reply to this thread.

If they are sane, I don't have a problem. But TBH, I really couldn't be
bothered to actually scan my mails whether there surfaced something sane
by now. Writing that up was just faster :)

I'll have look.

>> +static bool guest_handles_ac(struct kvm_vcpu *vcpu)
>> +{
>> +	/*
>> +	 * If guest has alignment checking enabled in CR0 and activated in
>> +	 * eflags, then the #AC originated from CPL3 and the guest is able
>> +	 * to handle it. It does not matter whether this is a regular or
>> +	 * a split lock operation induced #AC.
>> +	 */
>> +	if (vcpu->arch.cr0 & X86_CR0_AM &&
>
> Technically not required since KVM doesn't let the gets toggle CR0.AM at
> will, but going through kvm_read_cr0{_bits}() is preferred.

You're the expert here.

>> +	    vmx_get_rflags(vcpu) & X86_EFLAGS_AC)
>
> I don't think this is correct.  A guest could trigger a split-lock #AC at
> CPL0 with EFLAGS.AC=1 and CR0.AM=1, and then panic because it didn't expect
> #AC at CPL0.

Indeed.

Thanks,

        tglx

^ permalink raw reply	[flat|nested] 78+ messages in thread

* Re: [PATCH 2/3] x86/split_lock: Refactor and export handle_user_split_lock() for KVM
  2020-04-02 15:55     ` [PATCH 2/3] x86/split_lock: Refactor and export handle_user_split_lock() for KVM Sean Christopherson
@ 2020-04-02 17:01       ` Thomas Gleixner
  2020-04-02 17:19         ` Sean Christopherson
  0 siblings, 1 reply; 78+ messages in thread
From: Thomas Gleixner @ 2020-04-02 17:01 UTC (permalink / raw)
  To: Sean Christopherson
  Cc: x86, Kenneth R . Crudup, Paolo Bonzini, Fenghua Yu, Xiaoyao Li,
	Nadav Amit, Thomas Hellstrom, Sean Christopherson, Tony Luck,
	Peter Zijlstra, Jessica Yu, Steven Rostedt, Vitaly Kuznetsov,
	Wanpeng Li, Jim Mattson, kvm, linux-kernel

Sean Christopherson <sean.j.christopherson@intel.com> writes:
> From: Xiaoyao Li <xiaoyao.li@intel.com>
>
> In the future, KVM will use handle_user_split_lock() to handle #AC
> caused by split lock in guest. Due to the fact that KVM doesn't have
> a @regs context and will pre-check EFLAGS.AC, move the EFLAGS.AC check
> to do_alignment_check().
>
> Suggested-by: Sean Christopherson <sean.j.christopherson@intel.com>
> Signed-off-by: Xiaoyao Li <xiaoyao.li@intel.com>
> Reviewed-by: Tony Luck <tony.luck@intel.com>
> Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
> ---
>  arch/x86/include/asm/cpu.h  | 4 ++--
>  arch/x86/kernel/cpu/intel.c | 7 ++++---
>  arch/x86/kernel/traps.c     | 2 +-
>  3 files changed, 7 insertions(+), 6 deletions(-)
>
> diff --git a/arch/x86/include/asm/cpu.h b/arch/x86/include/asm/cpu.h
> index ff6f3ca649b3..ff567afa6ee1 100644
> --- a/arch/x86/include/asm/cpu.h
> +++ b/arch/x86/include/asm/cpu.h
> @@ -43,11 +43,11 @@ unsigned int x86_stepping(unsigned int sig);
>  #ifdef CONFIG_CPU_SUP_INTEL
>  extern void __init cpu_set_core_cap_bits(struct cpuinfo_x86 *c);
>  extern void switch_to_sld(unsigned long tifn);
> -extern bool handle_user_split_lock(struct pt_regs *regs, long error_code);
> +extern bool handle_user_split_lock(unsigned long ip);
>  #else
>  static inline void __init cpu_set_core_cap_bits(struct cpuinfo_x86 *c) {}
>  static inline void switch_to_sld(unsigned long tifn) {}
> -static inline bool handle_user_split_lock(struct pt_regs *regs, long error_code)
> +static inline bool handle_user_split_lock(unsigned long ip)

This is necessary because VMX can be compiled without CPU_SUP_INTEL?

>  {
>  	return false;
>  }
> diff --git a/arch/x86/kernel/cpu/intel.c b/arch/x86/kernel/cpu/intel.c
> index 9a26e972cdea..7688f51aabdb 100644
> --- a/arch/x86/kernel/cpu/intel.c
> +++ b/arch/x86/kernel/cpu/intel.c
> @@ -1066,13 +1066,13 @@ static void split_lock_init(void)
>  	split_lock_verify_msr(sld_state != sld_off);
>  }
>  
> -bool handle_user_split_lock(struct pt_regs *regs, long error_code)
> +bool handle_user_split_lock(unsigned long ip)
>  {
> -	if ((regs->flags & X86_EFLAGS_AC) || sld_state == sld_fatal)
> +	if (sld_state == sld_fatal)
>  		return false;
>  
>  	pr_warn_ratelimited("#AC: %s/%d took a split_lock trap at address: 0x%lx\n",
> -			    current->comm, current->pid, regs->ip);
> +			    current->comm, current->pid, ip);

So this returns true even in the case that sld_state == off.

Should never happen, but I rather have an extra check and be both
verbose and correct. See the variant I did.

Thanks,

        tglx



^ permalink raw reply	[flat|nested] 78+ messages in thread

* Re: [PATCH 3/3] KVM: VMX: Extend VMX's #AC interceptor to handle split lock #AC in guest
  2020-04-02 15:55     ` [PATCH 3/3] KVM: VMX: Extend VMX's #AC interceptor to handle split lock #AC in guest Sean Christopherson
@ 2020-04-02 17:19       ` Thomas Gleixner
  2020-04-02 17:40         ` Sean Christopherson
  0 siblings, 1 reply; 78+ messages in thread
From: Thomas Gleixner @ 2020-04-02 17:19 UTC (permalink / raw)
  To: Sean Christopherson
  Cc: x86, Kenneth R . Crudup, Paolo Bonzini, Fenghua Yu, Xiaoyao Li,
	Nadav Amit, Thomas Hellstrom, Sean Christopherson, Tony Luck,
	Peter Zijlstra, Jessica Yu, Steven Rostedt, Vitaly Kuznetsov,
	Wanpeng Li, Jim Mattson, kvm, linux-kernel

Sean Christopherson <sean.j.christopherson@intel.com> writes:
> @@ -4623,6 +4623,12 @@ static int handle_machine_check(struct kvm_vcpu *vcpu)
>  	return 1;
>  }
>  
> +static inline bool guest_cpu_alignment_check_enabled(struct kvm_vcpu *vcpu)

I used a different function name intentionally so the check for 'guest
want's split lock #AC' can go there as well once it's sorted.

> +{
> +	return vmx_get_cpl(vcpu) == 3 && kvm_read_cr0_bits(vcpu, X86_CR0_AM) &&
> +	       (kvm_get_rflags(vcpu) & X86_EFLAGS_AC);
> +}
> +
>  static int handle_exception_nmi(struct kvm_vcpu *vcpu)
>  {
>  	struct vcpu_vmx *vmx = to_vmx(vcpu);
> @@ -4688,9 +4694,6 @@ static int handle_exception_nmi(struct kvm_vcpu *vcpu)
>  		return handle_rmode_exception(vcpu, ex_no, error_code);
>  
>  	switch (ex_no) {
> -	case AC_VECTOR:
> -		kvm_queue_exception_e(vcpu, AC_VECTOR, error_code);
> -		return 1;
>  	case DB_VECTOR:
>  		dr6 = vmcs_readl(EXIT_QUALIFICATION);
>  		if (!(vcpu->guest_debug &
> @@ -4719,6 +4722,27 @@ static int handle_exception_nmi(struct kvm_vcpu *vcpu)
>  		kvm_run->debug.arch.pc = vmcs_readl(GUEST_CS_BASE) + rip;
>  		kvm_run->debug.arch.exception = ex_no;
>  		break;
> +	case AC_VECTOR:
> +		/*
> +		 * Reflect #AC to the guest if it's expecting the #AC, i.e. has
> +		 * legacy alignment check enabled.  Pre-check host split lock
> +		 * turned on to avoid the VMREADs needed to check legacy #AC,
> +		 * i.e. reflect the #AC if the only possible source is legacy
> +		 * alignment checks.
> +		 */
> +		if (!boot_cpu_has(X86_FEATURE_SPLIT_LOCK_DETECT) ||

I think the right thing to do here is to make this really independent of
that feature, i.e. inject the exception if

 (CPL==3 && CR0.AM && EFLAGS.AC) || (FUTURE && (GUEST_TEST_CTRL & SLD))

iow. when its really clear that the guest asked for it. If there is an
actual #AC with SLD disabled and !(CPL==3 && CR0.AM && EFLAGS.AC) then
something is badly wrong and the thing should just die. That's why I
separated handle_guest_split_lock() and tell about that case.

Thanks,

        tglx



^ permalink raw reply	[flat|nested] 78+ messages in thread

* Re: [PATCH 2/3] x86/split_lock: Refactor and export handle_user_split_lock() for KVM
  2020-04-02 17:01       ` Thomas Gleixner
@ 2020-04-02 17:19         ` Sean Christopherson
  2020-04-02 19:06           ` Thomas Gleixner
  0 siblings, 1 reply; 78+ messages in thread
From: Sean Christopherson @ 2020-04-02 17:19 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: x86, Kenneth R . Crudup, Paolo Bonzini, Fenghua Yu, Xiaoyao Li,
	Nadav Amit, Thomas Hellstrom, Tony Luck, Peter Zijlstra,
	Jessica Yu, Steven Rostedt, Vitaly Kuznetsov, Wanpeng Li,
	Jim Mattson, kvm, linux-kernel

On Thu, Apr 02, 2020 at 07:01:56PM +0200, Thomas Gleixner wrote:
> Sean Christopherson <sean.j.christopherson@intel.com> writes:
> > From: Xiaoyao Li <xiaoyao.li@intel.com>
> >
> > In the future, KVM will use handle_user_split_lock() to handle #AC
> > caused by split lock in guest. Due to the fact that KVM doesn't have
> > a @regs context and will pre-check EFLAGS.AC, move the EFLAGS.AC check
> > to do_alignment_check().
> >
> > Suggested-by: Sean Christopherson <sean.j.christopherson@intel.com>
> > Signed-off-by: Xiaoyao Li <xiaoyao.li@intel.com>
> > Reviewed-by: Tony Luck <tony.luck@intel.com>
> > Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
> > ---
> >  arch/x86/include/asm/cpu.h  | 4 ++--
> >  arch/x86/kernel/cpu/intel.c | 7 ++++---
> >  arch/x86/kernel/traps.c     | 2 +-
> >  3 files changed, 7 insertions(+), 6 deletions(-)
> >
> > diff --git a/arch/x86/include/asm/cpu.h b/arch/x86/include/asm/cpu.h
> > index ff6f3ca649b3..ff567afa6ee1 100644
> > --- a/arch/x86/include/asm/cpu.h
> > +++ b/arch/x86/include/asm/cpu.h
> > @@ -43,11 +43,11 @@ unsigned int x86_stepping(unsigned int sig);
> >  #ifdef CONFIG_CPU_SUP_INTEL
> >  extern void __init cpu_set_core_cap_bits(struct cpuinfo_x86 *c);
> >  extern void switch_to_sld(unsigned long tifn);
> > -extern bool handle_user_split_lock(struct pt_regs *regs, long error_code);
> > +extern bool handle_user_split_lock(unsigned long ip);
> >  #else
> >  static inline void __init cpu_set_core_cap_bits(struct cpuinfo_x86 *c) {}
> >  static inline void switch_to_sld(unsigned long tifn) {}
> > -static inline bool handle_user_split_lock(struct pt_regs *regs, long error_code)
> > +static inline bool handle_user_split_lock(unsigned long ip)
> 
> This is necessary because VMX can be compiled without CPU_SUP_INTEL?

Ya, it came about when cleaning up the IA32_FEATURE_CONTROL MSR handling
to consolidate duplicate code.

config KVM_INTEL
        tristate "KVM for Intel (and compatible) processors support"
        depends on KVM && IA32_FEAT_CTL

config IA32_FEAT_CTL
        def_bool y
        depends on CPU_SUP_INTEL || CPU_SUP_CENTAUR || CPU_SUP_ZHAOXIN

^ permalink raw reply	[flat|nested] 78+ messages in thread

* Re: [patch v2 1/2] x86,module: Detect VMX modules and disable Split-Lock-Detect
  2020-04-02 16:41         ` Xiaoyao Li
@ 2020-04-02 17:34           ` Thomas Gleixner
  2020-04-02 17:51             ` Sean Christopherson
  0 siblings, 1 reply; 78+ messages in thread
From: Thomas Gleixner @ 2020-04-02 17:34 UTC (permalink / raw)
  To: Xiaoyao Li, Peter Zijlstra
  Cc: LKML, x86, Kenneth R. Crudup, Paolo Bonzini, Jessica Yu,
	Fenghua Yu, Nadav Amit, Thomas Hellstrom, Sean Christopherson,
	Tony Luck, Steven Rostedt

Xiaoyao Li <xiaoyao.li@intel.com> writes:
> On 4/3/2020 12:25 AM, Peter Zijlstra wrote:
>> On Fri, Apr 03, 2020 at 12:20:08AM +0800, Xiaoyao Li wrote:
>>> And, shouldn't we clear X86_FEATURE_SPLIT_LOCK_DETECT flag?
>> 
>> Don't think you can do that this late. Also, the hardware has the MSR
>> and it works, it's just that we should not.
>> 
>
> Actually, I agree to keep this flag.
>
> But, during the previous patch review, tglx wants to make
>
> 	sld_off = no X86_FEATURE_SPLIT_LOCK_DETECT
>
> I'm not sure whether he still insists on it now.

Obviously I cant.

> I really want to decouple sld_off and X86_FEATURE_SPLIT_LOCK_DETECT.
> So if X86_FEATURE_SPLIT_LOCK_DETECT is set, we can virtualize and expose 
> it to guest even when host is sld_off.

Can we first have a sane solution for the problem at hand?

Aside of that I'm still against the attempt of proliferating crap,
i.e. disabling it because the host is triggering it and then exposing it
to guests. The above does not change my mind in any way. This proposal
is still wrong.

Thanks,

        tglx



^ permalink raw reply	[flat|nested] 78+ messages in thread

* Re: [PATCH 3/3] KVM: VMX: Extend VMX's #AC interceptor to handle split lock #AC in guest
  2020-04-02 17:19       ` Thomas Gleixner
@ 2020-04-02 17:40         ` Sean Christopherson
  2020-04-02 20:07           ` Thomas Gleixner
  0 siblings, 1 reply; 78+ messages in thread
From: Sean Christopherson @ 2020-04-02 17:40 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: x86, Kenneth R . Crudup, Paolo Bonzini, Fenghua Yu, Xiaoyao Li,
	Nadav Amit, Thomas Hellstrom, Tony Luck, Peter Zijlstra,
	Jessica Yu, Steven Rostedt, Vitaly Kuznetsov, Wanpeng Li,
	Jim Mattson, kvm, linux-kernel

On Thu, Apr 02, 2020 at 07:19:44PM +0200, Thomas Gleixner wrote:
> Sean Christopherson <sean.j.christopherson@intel.com> writes:
> > @@ -4623,6 +4623,12 @@ static int handle_machine_check(struct kvm_vcpu *vcpu)
> >  	return 1;
> >  }
> >  
> > +static inline bool guest_cpu_alignment_check_enabled(struct kvm_vcpu *vcpu)
> 
> I used a different function name intentionally so the check for 'guest
> want's split lock #AC' can go there as well once it's sorted.

Heh, IIRC, I advised Xiaoyao to do the opposite so that the injection logic
in the #AC case statement was more or less complete without having to dive
into the helper, e.g. the resulting code looks like this once split-lock is
exposed to the guest:

	if (!boot_cpu_has(X86_FEATURE_SPLIT_LOCK_DETECT) ||
	    guest_cpu_alignment_check_enabled(vcpu) ||
	    guest_cpu_sld_on(vmx)) {
		kvm_queue_exception_e(vcpu, AC_VECTOR, error_code);
		return 1;
	}

> > +{
> > +	return vmx_get_cpl(vcpu) == 3 && kvm_read_cr0_bits(vcpu, X86_CR0_AM) &&
> > +	       (kvm_get_rflags(vcpu) & X86_EFLAGS_AC);
> > +}
> > +
> >  static int handle_exception_nmi(struct kvm_vcpu *vcpu)
> >  {
> >  	struct vcpu_vmx *vmx = to_vmx(vcpu);
> > @@ -4688,9 +4694,6 @@ static int handle_exception_nmi(struct kvm_vcpu *vcpu)
> >  		return handle_rmode_exception(vcpu, ex_no, error_code);
> >  
> >  	switch (ex_no) {
> > -	case AC_VECTOR:
> > -		kvm_queue_exception_e(vcpu, AC_VECTOR, error_code);
> > -		return 1;
> >  	case DB_VECTOR:
> >  		dr6 = vmcs_readl(EXIT_QUALIFICATION);
> >  		if (!(vcpu->guest_debug &
> > @@ -4719,6 +4722,27 @@ static int handle_exception_nmi(struct kvm_vcpu *vcpu)
> >  		kvm_run->debug.arch.pc = vmcs_readl(GUEST_CS_BASE) + rip;
> >  		kvm_run->debug.arch.exception = ex_no;
> >  		break;
> > +	case AC_VECTOR:
> > +		/*
> > +		 * Reflect #AC to the guest if it's expecting the #AC, i.e. has
> > +		 * legacy alignment check enabled.  Pre-check host split lock
> > +		 * turned on to avoid the VMREADs needed to check legacy #AC,
> > +		 * i.e. reflect the #AC if the only possible source is legacy
> > +		 * alignment checks.
> > +		 */
> > +		if (!boot_cpu_has(X86_FEATURE_SPLIT_LOCK_DETECT) ||
> 
> I think the right thing to do here is to make this really independent of
> that feature, i.e. inject the exception if
> 
>  (CPL==3 && CR0.AM && EFLAGS.AC) || (FUTURE && (GUEST_TEST_CTRL & SLD))
> 
> iow. when its really clear that the guest asked for it. If there is an
> actual #AC with SLD disabled and !(CPL==3 && CR0.AM && EFLAGS.AC) then
> something is badly wrong and the thing should just die. That's why I
> separated handle_guest_split_lock() and tell about that case.

That puts KVM in a weird spot if/when intercepting #AC is no longer
necessary, e.g. "if" future CPUs happen to gain a feature that traps into
the hypervisor (KVM) if a potential near-infinite ucode loop is detected.

The only reason KVM intercepts #AC (before split-lock) is to prevent a
malicious guest from executing a DoS attack on the host by putting the #AC
handler in ring 3.  Current CPUs will get stuck in ucode vectoring #AC
faults more or less indefinitely, e.g. long enough to trigger watchdogs in
the host.

Injecting #AC if and only if KVM is 100% certain the guest wants the #AC
would lead to divergent behavior if KVM chose to not intercept #AC, e.g.
some theoretical unknown #AC source would conditionally result in exits to
userspace depending on whether or not KVM wanted to intercept #AC for
other reasons.

That's why we went with the approach of reflecting #AC unless KVM detected
that the #AC was host-induced.

^ permalink raw reply	[flat|nested] 78+ messages in thread

* Re: [patch v2 1/2] x86,module: Detect VMX modules and disable Split-Lock-Detect
  2020-04-02 17:34           ` Thomas Gleixner
@ 2020-04-02 17:51             ` Sean Christopherson
  2020-04-02 18:51               ` Peter Zijlstra
  0 siblings, 1 reply; 78+ messages in thread
From: Sean Christopherson @ 2020-04-02 17:51 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: Xiaoyao Li, Peter Zijlstra, LKML, x86, Kenneth R. Crudup,
	Paolo Bonzini, Jessica Yu, Fenghua Yu, Nadav Amit,
	Thomas Hellstrom, Tony Luck, Steven Rostedt

On Thu, Apr 02, 2020 at 07:34:35PM +0200, Thomas Gleixner wrote:
> Aside of that I'm still against the attempt of proliferating crap,
> i.e. disabling it because the host is triggering it and then exposing it
> to guests. The above does not change my mind in any way. This proposal
> is still wrong.

Eh, I still think the "off in host, on in guest" is a legit scenario for
debug/development/testing, but I agree that the added complexity doesn't
justify the minimal benefits versus sld_warn.

^ permalink raw reply	[flat|nested] 78+ messages in thread

* Re: [patch v2 1/2] x86,module: Detect VMX modules and disable Split-Lock-Detect
  2020-04-02 17:51             ` Sean Christopherson
@ 2020-04-02 18:51               ` Peter Zijlstra
  2020-04-02 20:23                 ` Sean Christopherson
  0 siblings, 1 reply; 78+ messages in thread
From: Peter Zijlstra @ 2020-04-02 18:51 UTC (permalink / raw)
  To: Sean Christopherson
  Cc: Thomas Gleixner, Xiaoyao Li, LKML, x86, Kenneth R. Crudup,
	Paolo Bonzini, Jessica Yu, Fenghua Yu, Nadav Amit,
	Thomas Hellstrom, Tony Luck, Steven Rostedt

On Thu, Apr 02, 2020 at 10:51:28AM -0700, Sean Christopherson wrote:
> On Thu, Apr 02, 2020 at 07:34:35PM +0200, Thomas Gleixner wrote:
> > Aside of that I'm still against the attempt of proliferating crap,
> > i.e. disabling it because the host is triggering it and then exposing it
> > to guests. The above does not change my mind in any way. This proposal
> > is still wrong.
> 
> Eh, I still think the "off in host, on in guest" is a legit scenario for
> debug/development/testing, but I agree that the added complexity doesn't
> justify the minimal benefits versus sld_warn.

Off in host on in guest seems utterly insane to me. Why do you care
about that?

That's like building a bridge with rotten timber.

^ permalink raw reply	[flat|nested] 78+ messages in thread

* Re: [PATCH 2/3] x86/split_lock: Refactor and export handle_user_split_lock() for KVM
  2020-04-02 17:19         ` Sean Christopherson
@ 2020-04-02 19:06           ` Thomas Gleixner
  2020-04-10  4:39             ` Xiaoyao Li
  0 siblings, 1 reply; 78+ messages in thread
From: Thomas Gleixner @ 2020-04-02 19:06 UTC (permalink / raw)
  To: Sean Christopherson
  Cc: x86, Kenneth R . Crudup, Paolo Bonzini, Fenghua Yu, Xiaoyao Li,
	Nadav Amit, Thomas Hellstrom, Tony Luck, Peter Zijlstra,
	Jessica Yu, Steven Rostedt, Vitaly Kuznetsov, Wanpeng Li,
	Jim Mattson, kvm, linux-kernel

Sean Christopherson <sean.j.christopherson@intel.com> writes:
> On Thu, Apr 02, 2020 at 07:01:56PM +0200, Thomas Gleixner wrote:
>> >  static inline void __init cpu_set_core_cap_bits(struct cpuinfo_x86 *c) {}
>> >  static inline void switch_to_sld(unsigned long tifn) {}
>> > -static inline bool handle_user_split_lock(struct pt_regs *regs, long error_code)
>> > +static inline bool handle_user_split_lock(unsigned long ip)
>> 
>> This is necessary because VMX can be compiled without CPU_SUP_INTEL?
>
> Ya, it came about when cleaning up the IA32_FEATURE_CONTROL MSR handling
> to consolidate duplicate code.
>
> config KVM_INTEL
>         tristate "KVM for Intel (and compatible) processors support"
>         depends on KVM && IA32_FEAT_CTL
>
> config IA32_FEAT_CTL
>         def_bool y
>         depends on CPU_SUP_INTEL || CPU_SUP_CENTAUR || CPU_SUP_ZHAOXIN

Ah, indeed. So something like the below would make sense. Hmm?

Of course that can be mangled into Xiaoyao's patches, I'm not worried
about my patch count :)

Aside of that I really wish Intel HW folks had indicated the source of
the #AC via the error code. It can only be 0 or 1 for the regular #AC so
there would have been 31 bits to chose from.

Thanks,

        tglx

8<----------------
--- a/arch/x86/include/asm/cpu.h
+++ b/arch/x86/include/asm/cpu.h
@@ -43,14 +43,14 @@ unsigned int x86_stepping(unsigned int s
 #ifdef CONFIG_CPU_SUP_INTEL
 extern void __init cpu_set_core_cap_bits(struct cpuinfo_x86 *c);
 extern void switch_to_sld(unsigned long tifn);
-extern bool handle_user_split_lock(struct pt_regs *regs, long error_code);
+extern int handle_ac_split_lock(unsigned long ip);
 extern void split_lock_validate_module_text(struct module *me, void *text, void *text_end);
 #else
 static inline void __init cpu_set_core_cap_bits(struct cpuinfo_x86 *c) {}
 static inline void switch_to_sld(unsigned long tifn) {}
-static inline bool handle_user_split_lock(struct pt_regs *regs, long error_code)
+static int handle_ac_split_lock(unsigned long ip)
 {
-	return false;
+	return -ENOSYS;
 }
 static inline void split_lock_validate_module_text(struct module *me, void *text, void *text_end) {}
 #endif

--- a/arch/x86/kernel/cpu/intel.c
+++ b/arch/x86/kernel/cpu/intel.c
@@ -1102,13 +1102,20 @@ static void split_lock_init(void)
 	split_lock_verify_msr(sld_state != sld_off);
 }
 
-bool handle_user_split_lock(struct pt_regs *regs, long error_code)
+int handle_ac_split_lock(unsigned long ip)
 {
-	if ((regs->flags & X86_EFLAGS_AC) || sld_state == sld_fatal)
-		return false;
+	switch (sld_state) {
+	case sld_warn:
+		break;
+	case sld_off:
+		pr_warn_once("#AC: Spurious trap at address: 0x%lx\n", ip);
+		return -ENOSYS;
+	case sld_fatal:
+		return -EFAULT;
+	}
 
 	pr_warn_ratelimited("#AC: %s/%d took a split_lock trap at address: 0x%lx\n",
-			    current->comm, current->pid, regs->ip);
+			    current->comm, current->pid, ip);
 
 	/*
 	 * Disable the split lock detection for this task so it can make
@@ -1117,8 +1124,9 @@ bool handle_user_split_lock(struct pt_re
 	 */
 	sld_update_msr(false);
 	set_tsk_thread_flag(current, TIF_SLD);
-	return true;
+	return 0;
 }
+EXPORT_SYMBOL_GPL(handle_ac_split_lock);
 
 /*
  * This function is called only when switching between tasks with
--- a/arch/x86/kernel/traps.c
+++ b/arch/x86/kernel/traps.c
@@ -304,7 +304,7 @@ dotraplinkage void do_alignment_check(st
 
 	local_irq_enable();
 
-	if (handle_user_split_lock(regs, error_code))
+	if (!(regs->flags & X86_EFLAGS_AC) && !handle_ac_split_lock(regs->ip))
 		return;
 
 	do_trap(X86_TRAP_AC, SIGBUS, "alignment check", regs,
--- a/arch/x86/kvm/vmx/vmx.c
+++ b/arch/x86/kvm/vmx/vmx.c
@@ -65,6 +65,7 @@
 
 MODULE_AUTHOR("Qumranet");
 MODULE_LICENSE("GPL");
+MODULE_INFO(sld_safe, "Y");
 
 #ifdef MODULE
 static const struct x86_cpu_id vmx_cpu_id[] = {
@@ -4623,6 +4624,22 @@ static int handle_machine_check(struct k
 	return 1;
 }
 
+static bool guest_handles_ac(struct kvm_vcpu *vcpu)
+{
+	/*
+	 * If guest has alignment checking enabled in CR0 and activated in
+	 * eflags, then the #AC originated from CPL3 and the guest is able
+	 * to handle it. It does not matter whether this is a regular or
+	 * a split lock operation induced #AC.
+	 */
+	if (vmx_get_cpl(vcpu) == 3 && kvm_read_cr0_bits(vcpu, X86_CR0_AM) &&
+	    kvm_get_rflags(vcpu) & X86_EFLAGS_AC)
+		return true;
+
+	/* Add guest SLD handling checks here once it's supported */
+	return false;
+}
+
 static int handle_exception_nmi(struct kvm_vcpu *vcpu)
 {
 	struct vcpu_vmx *vmx = to_vmx(vcpu);
@@ -4630,6 +4647,7 @@ static int handle_exception_nmi(struct k
 	u32 intr_info, ex_no, error_code;
 	unsigned long cr2, rip, dr6;
 	u32 vect_info;
+	int err;
 
 	vect_info = vmx->idt_vectoring_info;
 	intr_info = vmx->exit_intr_info;
@@ -4688,9 +4706,6 @@ static int handle_exception_nmi(struct k
 		return handle_rmode_exception(vcpu, ex_no, error_code);
 
 	switch (ex_no) {
-	case AC_VECTOR:
-		kvm_queue_exception_e(vcpu, AC_VECTOR, error_code);
-		return 1;
 	case DB_VECTOR:
 		dr6 = vmcs_readl(EXIT_QUALIFICATION);
 		if (!(vcpu->guest_debug &
@@ -4719,6 +4734,29 @@ static int handle_exception_nmi(struct k
 		kvm_run->debug.arch.pc = vmcs_readl(GUEST_CS_BASE) + rip;
 		kvm_run->debug.arch.exception = ex_no;
 		break;
+	case AC_VECTOR:
+		if (guest_handles_ac(vcpu)) {
+			kvm_queue_exception_e(vcpu, AC_VECTOR, error_code);
+			return 1;
+		}
+		/*
+		 * Handle #AC caused by split lock detection. If the host
+		 * mode is sld_warn, then it warns, marks current with
+		 * TIF_SLD and disables split lock detection. So the guest
+		 * can just continue.
+		 *
+		 * If the host mode is fatal, the handling code warned. Let
+		 * qemu kill itself.
+		 *
+		 * If the host mode is off, then this #AC is bonkers and
+		 * something is badly wrong. Let it fail as well.
+		 */
+		err = handle_ac_split_lock(kvm_rip_read(vcpu));
+		if (!err)
+			return 1;
+		/* Propagate the error type to user space */
+		error_code = err == -EFAULT ? 0x100 : 0x200;
+		fallthrough;
 	default:
 		kvm_run->exit_reason = KVM_EXIT_EXCEPTION;
 		kvm_run->ex.exception = ex_no;

^ permalink raw reply	[flat|nested] 78+ messages in thread

* Re: [PATCH 3/3] KVM: VMX: Extend VMX's #AC interceptor to handle split lock #AC in guest
  2020-04-02 17:40         ` Sean Christopherson
@ 2020-04-02 20:07           ` Thomas Gleixner
  2020-04-02 20:36             ` Andy Lutomirski
                               ` (2 more replies)
  0 siblings, 3 replies; 78+ messages in thread
From: Thomas Gleixner @ 2020-04-02 20:07 UTC (permalink / raw)
  To: Sean Christopherson
  Cc: x86, Kenneth R . Crudup, Paolo Bonzini, Fenghua Yu, Xiaoyao Li,
	Nadav Amit, Thomas Hellstrom, Tony Luck, Peter Zijlstra,
	Jessica Yu, Steven Rostedt, Vitaly Kuznetsov, Wanpeng Li,
	Jim Mattson, kvm, linux-kernel

Sean,

Sean Christopherson <sean.j.christopherson@intel.com> writes:
> On Thu, Apr 02, 2020 at 07:19:44PM +0200, Thomas Gleixner wrote:
>> Sean Christopherson <sean.j.christopherson@intel.com> writes:
>> > +	case AC_VECTOR:
>> > +		/*
>> > +		 * Reflect #AC to the guest if it's expecting the #AC, i.e. has
>> > +		 * legacy alignment check enabled.  Pre-check host split lock
>> > +		 * turned on to avoid the VMREADs needed to check legacy #AC,
>> > +		 * i.e. reflect the #AC if the only possible source is legacy
>> > +		 * alignment checks.
>> > +		 */
>> > +		if (!boot_cpu_has(X86_FEATURE_SPLIT_LOCK_DETECT) ||
>> 
>> I think the right thing to do here is to make this really independent of
>> that feature, i.e. inject the exception if
>> 
>>  (CPL==3 && CR0.AM && EFLAGS.AC) || (FUTURE && (GUEST_TEST_CTRL & SLD))
>> 
>> iow. when its really clear that the guest asked for it. If there is an
>> actual #AC with SLD disabled and !(CPL==3 && CR0.AM && EFLAGS.AC) then
>> something is badly wrong and the thing should just die. That's why I
>> separated handle_guest_split_lock() and tell about that case.
>
> That puts KVM in a weird spot if/when intercepting #AC is no longer
> necessary, e.g. "if" future CPUs happen to gain a feature that traps into
> the hypervisor (KVM) if a potential near-infinite ucode loop is detected.
>
> The only reason KVM intercepts #AC (before split-lock) is to prevent a
> malicious guest from executing a DoS attack on the host by putting the #AC
> handler in ring 3.  Current CPUs will get stuck in ucode vectoring #AC
> faults more or less indefinitely, e.g. long enough to trigger watchdogs in
> the host.

Which is thankfully well documented in the VMX code and the
corresponding chapter in the SDM. 

> Injecting #AC if and only if KVM is 100% certain the guest wants the #AC
> would lead to divergent behavior if KVM chose to not intercept #AC, e.g.

AFAICT, #AC is not really something which is performance relevant, but I
might obviously be uninformed on that.

Assumed it is not, then there is neither a hard requirement nor a real
incentive to give up on intercepting #AC even when future CPUs have a
fix for the above wreckage.

> some theoretical unknown #AC source would conditionally result in exits to
> userspace depending on whether or not KVM wanted to intercept #AC for
> other reasons.

I'd rather like to know when there is an unknown #AC source instead of
letting the guest silently swallow it.

TBH, the more I learn about this, the more I tend to just give up on
this whole split lock stuff in its current form and wait until HW folks
provide something which is actually usable:

   - Per thread
   - Properly distinguishable from a regular #AC via error code

OTOH, that means I won't be able to use it before retirement. Oh well.

Thanks,

        tglx

^ permalink raw reply	[flat|nested] 78+ messages in thread

* Re: [patch v2 1/2] x86,module: Detect VMX modules and disable Split-Lock-Detect
  2020-04-02 18:51               ` Peter Zijlstra
@ 2020-04-02 20:23                 ` Sean Christopherson
  2020-04-02 21:04                   ` Thomas Gleixner
  0 siblings, 1 reply; 78+ messages in thread
From: Sean Christopherson @ 2020-04-02 20:23 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Thomas Gleixner, Xiaoyao Li, LKML, x86, Kenneth R. Crudup,
	Paolo Bonzini, Jessica Yu, Fenghua Yu, Nadav Amit,
	Thomas Hellstrom, Tony Luck, Steven Rostedt

On Thu, Apr 02, 2020 at 08:51:48PM +0200, Peter Zijlstra wrote:
> On Thu, Apr 02, 2020 at 10:51:28AM -0700, Sean Christopherson wrote:
> > On Thu, Apr 02, 2020 at 07:34:35PM +0200, Thomas Gleixner wrote:
> > > Aside of that I'm still against the attempt of proliferating crap,
> > > i.e. disabling it because the host is triggering it and then exposing it
> > > to guests. The above does not change my mind in any way. This proposal
> > > is still wrong.
> > 
> > Eh, I still think the "off in host, on in guest" is a legit scenario for
> > debug/development/testing, but I agree that the added complexity doesn't
> > justify the minimal benefits versus sld_warn.
> 
> Off in host on in guest seems utterly insane to me. Why do you care
> about that?

For development/debug/testing.  Ignoring the core-scope stupidity of split
lock, the _functional_ behavior of the host kernel and guest kernel are
completely separate.  The host can generate split locks all it wants, but
other than performance, its bad behavior has no impact on the guest.

For example, all of the debug that was done to eliminate split locks in the
kernel could have been done in a KVM guest, even though the host kernel
would not have yet been split-lock free.

It's somewhat of a moot point now that the kernel is split-lock free.  But,
if I encountered a split lock panic on my system, the first thing I would
do (after rebooting) would be to fire up a VM to try and reproduce and
debug the issue.

Oftentimes it's significantly easier to "enable" a feature in KVM, i.e.
expose a feature to the guest, than it is to actually enable it in the
kernel.  Enabling KVM first doesn't work if there are hard dependencies on
kernel enabling, e.g. most things that have an XSAVE component, but for a
lot of features it's a viable strategy to enable KVM first, and then do all
testing and debug inside a KVM guest.

^ permalink raw reply	[flat|nested] 78+ messages in thread

* Re: [PATCH 3/3] KVM: VMX: Extend VMX's #AC interceptor to handle split lock #AC in guest
  2020-04-02 20:07           ` Thomas Gleixner
@ 2020-04-02 20:36             ` Andy Lutomirski
  2020-04-02 20:48             ` Peter Zijlstra
  2020-04-02 20:51             ` Sean Christopherson
  2 siblings, 0 replies; 78+ messages in thread
From: Andy Lutomirski @ 2020-04-02 20:36 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: Sean Christopherson, x86, Kenneth R . Crudup, Paolo Bonzini,
	Fenghua Yu, Xiaoyao Li, Nadav Amit, Thomas Hellstrom, Tony Luck,
	Peter Zijlstra, Jessica Yu, Steven Rostedt, Vitaly Kuznetsov,
	Wanpeng Li, Jim Mattson, kvm, linux-kernel



> On Apr 2, 2020, at 1:07 PM, Thomas Gleixner <tglx@linutronix.de> wrote:
> 

> 
> 
> TBH, the more I learn about this, the more I tend to just give up on
> this whole split lock stuff in its current form and wait until HW folks
> provide something which is actually usable:
> 
>   - Per thread
>   - Properly distinguishable from a regular #AC via error code

Why the latter?  I would argue that #AC from CPL3 with EFLAGS.AC set is almost by construction not a split lock. In particular, if you meet these conditions, how exactly can you do a split lock without simultaneously triggering an alignment check?  (Maybe CMPXCHG16B?

> 
> OTOH, that means I won't be able to use it before retirement. Oh well.
> 
> Thanks,
> 
>        tglx

^ permalink raw reply	[flat|nested] 78+ messages in thread

* Re: [PATCH 3/3] KVM: VMX: Extend VMX's #AC interceptor to handle split lock #AC in guest
  2020-04-02 20:07           ` Thomas Gleixner
  2020-04-02 20:36             ` Andy Lutomirski
@ 2020-04-02 20:48             ` Peter Zijlstra
  2020-04-02 20:51             ` Sean Christopherson
  2 siblings, 0 replies; 78+ messages in thread
From: Peter Zijlstra @ 2020-04-02 20:48 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: Sean Christopherson, x86, Kenneth R . Crudup, Paolo Bonzini,
	Fenghua Yu, Xiaoyao Li, Nadav Amit, Thomas Hellstrom, Tony Luck,
	Jessica Yu, Steven Rostedt, Vitaly Kuznetsov, Wanpeng Li,
	Jim Mattson, kvm, linux-kernel

On Thu, Apr 02, 2020 at 10:07:07PM +0200, Thomas Gleixner wrote:

If we're doing wish lists, I have one more ;-)

> TBH, the more I learn about this, the more I tend to just give up on
> this whole split lock stuff in its current form and wait until HW folks
> provide something which is actually usable:
> 
>    - Per thread
>    - Properly distinguishable from a regular #AC via error code

   - is an actual alignment check.

That is, disallow all unaligned LOCK prefixed ops, not just those that
happen to straddle a cacheline.


^ permalink raw reply	[flat|nested] 78+ messages in thread

* Re: [PATCH 3/3] KVM: VMX: Extend VMX's #AC interceptor to handle split lock #AC in guest
  2020-04-02 20:07           ` Thomas Gleixner
  2020-04-02 20:36             ` Andy Lutomirski
  2020-04-02 20:48             ` Peter Zijlstra
@ 2020-04-02 20:51             ` Sean Christopherson
  2020-04-02 22:27               ` Thomas Gleixner
  2 siblings, 1 reply; 78+ messages in thread
From: Sean Christopherson @ 2020-04-02 20:51 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: x86, Kenneth R . Crudup, Paolo Bonzini, Fenghua Yu, Xiaoyao Li,
	Nadav Amit, Thomas Hellstrom, Tony Luck, Peter Zijlstra,
	Jessica Yu, Steven Rostedt, Vitaly Kuznetsov, Wanpeng Li,
	Jim Mattson, kvm, linux-kernel

On Thu, Apr 02, 2020 at 10:07:07PM +0200, Thomas Gleixner wrote:
> Sean Christopherson <sean.j.christopherson@intel.com> writes:
> > On Thu, Apr 02, 2020 at 07:19:44PM +0200, Thomas Gleixner wrote:
> >> Sean Christopherson <sean.j.christopherson@intel.com> writes:
> > That puts KVM in a weird spot if/when intercepting #AC is no longer
> > necessary, e.g. "if" future CPUs happen to gain a feature that traps into
> > the hypervisor (KVM) if a potential near-infinite ucode loop is detected.
> >
> > The only reason KVM intercepts #AC (before split-lock) is to prevent a
> > malicious guest from executing a DoS attack on the host by putting the #AC
> > handler in ring 3.  Current CPUs will get stuck in ucode vectoring #AC
> > faults more or less indefinitely, e.g. long enough to trigger watchdogs in
> > the host.
> 
> Which is thankfully well documented in the VMX code and the
> corresponding chapter in the SDM. 
> 
> > Injecting #AC if and only if KVM is 100% certain the guest wants the #AC
> > would lead to divergent behavior if KVM chose to not intercept #AC, e.g.
> 
> AFAICT, #AC is not really something which is performance relevant, but I
> might obviously be uninformed on that.
> 
> Assumed it is not, then there is neither a hard requirement nor a real
> incentive to give up on intercepting #AC even when future CPUs have a
> fix for the above wreckage.

Agreed that there's no hard requirement, but general speaking, the less KVM
needs to poke into the guest the better.

> > some theoretical unknown #AC source would conditionally result in exits to
> > userspace depending on whether or not KVM wanted to intercept #AC for
> > other reasons.
> 
> I'd rather like to know when there is an unknown #AC source instead of
> letting the guest silently swallow it.

Trying to prevent the guest from squashing a spurious fault is a fools
errand.   For example, with nested virtualization, the correct behavior
from an architectural perspective is to forward exceptions from L2 (the
nested VM) to L1 (the direct VM) that L1 wants to intercept.  E.g. if L1
wants to intercept #AC faults that happen in L2, then KVM reflects all #AC
faults into L1 as VM-Exits without ever reaching this code.

Nested virt aside, detecting spurious #AC and a few other exceptions is
mostly feasible, but for many exceptions it's flat out impossible.

Anyways, this particular case isn't a sticking point, i.e. I'd be ok with
exiting to userspace on a spurious #AC, I just don't see the value in doing
so.  Returning KVM_EXIT_EXCEPTION doesn't necessarily equate to throwing up
a red flag, e.g. from a kernel perspective you'd still be relying on the
userspace VMM to report the error in a sane manner.  I think at one point
Xiaoyao had a WARN_ON for a spurious #AC, but it was removed because the
odds of a false positive due to some funky corner case seemed higher than
detecting a CPU bug.

> TBH, the more I learn about this, the more I tend to just give up on
> this whole split lock stuff in its current form and wait until HW folks
> provide something which is actually usable:
> 
>    - Per thread
>    - Properly distinguishable from a regular #AC via error code
> 
> OTOH, that means I won't be able to use it before retirement. Oh well.
> 
> Thanks,
> 
>         tglx

^ permalink raw reply	[flat|nested] 78+ messages in thread

* Re: [patch v2 1/2] x86,module: Detect VMX modules and disable Split-Lock-Detect
  2020-04-02 20:23                 ` Sean Christopherson
@ 2020-04-02 21:04                   ` Thomas Gleixner
  2020-04-02 21:16                     ` Sean Christopherson
  0 siblings, 1 reply; 78+ messages in thread
From: Thomas Gleixner @ 2020-04-02 21:04 UTC (permalink / raw)
  To: Sean Christopherson, Peter Zijlstra
  Cc: Xiaoyao Li, LKML, x86, Kenneth R. Crudup, Paolo Bonzini,
	Jessica Yu, Fenghua Yu, Nadav Amit, Thomas Hellstrom, Tony Luck,
	Steven Rostedt

Sean Christopherson <sean.j.christopherson@intel.com> writes:
> On Thu, Apr 02, 2020 at 08:51:48PM +0200, Peter Zijlstra wrote:
>> On Thu, Apr 02, 2020 at 10:51:28AM -0700, Sean Christopherson wrote:
>> > On Thu, Apr 02, 2020 at 07:34:35PM +0200, Thomas Gleixner wrote:
>> > > Aside of that I'm still against the attempt of proliferating crap,
>> > > i.e. disabling it because the host is triggering it and then exposing it
>> > > to guests. The above does not change my mind in any way. This proposal
>> > > is still wrong.
>> > 
>> > Eh, I still think the "off in host, on in guest" is a legit scenario for
>> > debug/development/testing, but I agree that the added complexity doesn't
>> > justify the minimal benefits versus sld_warn.
>> 
>> Off in host on in guest seems utterly insane to me. Why do you care
>> about that?
>
> For development/debug/testing.  Ignoring the core-scope stupidity of split
> lock, the _functional_ behavior of the host kernel and guest kernel are
> completely separate.  The host can generate split locks all it wants, but
> other than performance, its bad behavior has no impact on the guest.
>
> For example, all of the debug that was done to eliminate split locks in the
> kernel could have been done in a KVM guest, even though the host kernel
> would not have yet been split-lock free.
>
> It's somewhat of a moot point now that the kernel is split-lock free.  But,
> if I encountered a split lock panic on my system, the first thing I would
> do (after rebooting) would be to fire up a VM to try and reproduce and
> debug the issue.
>
> Oftentimes it's significantly easier to "enable" a feature in KVM, i.e.
> expose a feature to the guest, than it is to actually enable it in the
> kernel.  Enabling KVM first doesn't work if there are hard dependencies on
> kernel enabling, e.g. most things that have an XSAVE component, but for a
> lot of features it's a viable strategy to enable KVM first, and then do all
> testing and debug inside a KVM guest.

I can see that aspect, but there were pretty clear messages in one of
the other threads:

 "It's not about whether or not host is clean. It's for the cases that
  users just don't want it enabled on host, to not break the
  applications or drivers that do have split lock issue."

 "My thought is for CSPs that they might not turn on SLD on their
  product environment. Any split lock in kernel or drivers may break
  their service for tenants."

which I back then called out as proliferating crap and ensuring that
this stuff never gets fixed.

I still call it out as exactly that and you know as well as I do that
this is the reality.

For people like you who actually want to debug stuff in a guest, the
extra 10 lines of hack on top of the other 1000 lines of hacks you
already have are not really something which justifies to give hardware
and OS/application vendors the easy way out to avoid fixing their broken
crap.

Thanks,

        tglx

^ permalink raw reply	[flat|nested] 78+ messages in thread

* Re: [patch v2 1/2] x86,module: Detect VMX modules and disable Split-Lock-Detect
  2020-04-02 21:04                   ` Thomas Gleixner
@ 2020-04-02 21:16                     ` Sean Christopherson
  0 siblings, 0 replies; 78+ messages in thread
From: Sean Christopherson @ 2020-04-02 21:16 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: Peter Zijlstra, Xiaoyao Li, LKML, x86, Kenneth R. Crudup,
	Paolo Bonzini, Jessica Yu, Fenghua Yu, Nadav Amit,
	Thomas Hellstrom, Tony Luck, Steven Rostedt

On Thu, Apr 02, 2020 at 11:04:05PM +0200, Thomas Gleixner wrote:
> Sean Christopherson <sean.j.christopherson@intel.com> writes:
> > On Thu, Apr 02, 2020 at 08:51:48PM +0200, Peter Zijlstra wrote:
> >> On Thu, Apr 02, 2020 at 10:51:28AM -0700, Sean Christopherson wrote:
> >> > On Thu, Apr 02, 2020 at 07:34:35PM +0200, Thomas Gleixner wrote:
> >> > > Aside of that I'm still against the attempt of proliferating crap,
> >> > > i.e. disabling it because the host is triggering it and then exposing it
> >> > > to guests. The above does not change my mind in any way. This proposal
> >> > > is still wrong.
> >> > 
> >> > Eh, I still think the "off in host, on in guest" is a legit scenario for
> >> > debug/development/testing, but I agree that the added complexity doesn't
> >> > justify the minimal benefits versus sld_warn.
> >> 
> >> Off in host on in guest seems utterly insane to me. Why do you care
> >> about that?
> >
> > For development/debug/testing.  Ignoring the core-scope stupidity of split
> > lock, the _functional_ behavior of the host kernel and guest kernel are
> > completely separate.  The host can generate split locks all it wants, but
> > other than performance, its bad behavior has no impact on the guest.
> >
> > For example, all of the debug that was done to eliminate split locks in the
> > kernel could have been done in a KVM guest, even though the host kernel
> > would not have yet been split-lock free.
> >
> > It's somewhat of a moot point now that the kernel is split-lock free.  But,
> > if I encountered a split lock panic on my system, the first thing I would
> > do (after rebooting) would be to fire up a VM to try and reproduce and
> > debug the issue.
> >
> > Oftentimes it's significantly easier to "enable" a feature in KVM, i.e.
> > expose a feature to the guest, than it is to actually enable it in the
> > kernel.  Enabling KVM first doesn't work if there are hard dependencies on
> > kernel enabling, e.g. most things that have an XSAVE component, but for a
> > lot of features it's a viable strategy to enable KVM first, and then do all
> > testing and debug inside a KVM guest.
> 
> I can see that aspect, but there were pretty clear messages in one of
> the other threads:
> 
>  "It's not about whether or not host is clean. It's for the cases that
>   users just don't want it enabled on host, to not break the
>   applications or drivers that do have split lock issue."
> 
>  "My thought is for CSPs that they might not turn on SLD on their
>   product environment. Any split lock in kernel or drivers may break
>   their service for tenants."
> 
> which I back then called out as proliferating crap and ensuring that
> this stuff never gets fixed.

Or more likely, gets fixed, just not in upstream :-)

> I still call it out as exactly that and you know as well as I do that
> this is the reality.
>
> For people like you who actually want to debug stuff in a guest, the
> extra 10 lines of hack on top of the other 1000 lines of hacks you
> already have are not really something which justifies to give hardware
> and OS/application vendors the easy way out to avoid fixing their broken
> crap.

Ya, I see where you're coming from.  As above, I agree that having a "KVM
only" mode does more harm than good.

^ permalink raw reply	[flat|nested] 78+ messages in thread

* Re: [PATCH 3/3] KVM: VMX: Extend VMX's #AC interceptor to handle split lock #AC in guest
  2020-04-02 20:51             ` Sean Christopherson
@ 2020-04-02 22:27               ` Thomas Gleixner
  2020-04-02 22:40                 ` Nadav Amit
  0 siblings, 1 reply; 78+ messages in thread
From: Thomas Gleixner @ 2020-04-02 22:27 UTC (permalink / raw)
  To: Sean Christopherson
  Cc: x86, Kenneth R . Crudup, Paolo Bonzini, Fenghua Yu, Xiaoyao Li,
	Nadav Amit, Thomas Hellstrom, Tony Luck, Peter Zijlstra,
	Jessica Yu, Steven Rostedt, Vitaly Kuznetsov, Wanpeng Li,
	Jim Mattson, kvm, linux-kernel

Sean,

Sean Christopherson <sean.j.christopherson@intel.com> writes:
> On Thu, Apr 02, 2020 at 10:07:07PM +0200, Thomas Gleixner wrote:
>> Sean Christopherson <sean.j.christopherson@intel.com> writes:
>> AFAICT, #AC is not really something which is performance relevant, but I
>> might obviously be uninformed on that.
>> 
>> Assumed it is not, then there is neither a hard requirement nor a real
>> incentive to give up on intercepting #AC even when future CPUs have a
>> fix for the above wreckage.
>
> Agreed that there's no hard requirement, but general speaking, the less KVM
> needs to poke into the guest the better.

Fair enough.

>> > some theoretical unknown #AC source would conditionally result in exits to
>> > userspace depending on whether or not KVM wanted to intercept #AC for
>> > other reasons.
>> 
>> I'd rather like to know when there is an unknown #AC source instead of
>> letting the guest silently swallow it.
>
> Trying to prevent the guest from squashing a spurious fault is a fools
> errand.   For example, with nested virtualization, the correct behavior
> from an architectural perspective is to forward exceptions from L2 (the
> nested VM) to L1 (the direct VM) that L1 wants to intercept.  E.g. if L1
> wants to intercept #AC faults that happen in L2, then KVM reflects all #AC
> faults into L1 as VM-Exits without ever reaching this code.

Which means L1 should have some handling for that case at least those L1
hypervisors which we can fix. If we want to go there.

> Anyways, this particular case isn't a sticking point, i.e. I'd be ok with
> exiting to userspace on a spurious #AC, I just don't see the value in doing
> so.  Returning KVM_EXIT_EXCEPTION doesn't necessarily equate to throwing up
> a red flag, e.g. from a kernel perspective you'd still be relying on the
> userspace VMM to report the error in a sane manner.  I think at one point
> Xiaoyao had a WARN_ON for a spurious #AC, but it was removed because the
> odds of a false positive due to some funky corner case seemed higher than
> detecting a CPU bug.

Agreed. Relying on the user space side to crash and burn the guest is
probably wishful thinking. So the right thing might be to just kill it
right at the spot.

But coming back to the above discussion:

    if (!cpu_has(SLD) || guest_wants_regular_ac()) {
    	kvm_queue_exception_e();
        return 1;
    }

vs.

    if (guest_wants_regular_ac()) {
    	kvm_queue_exception_e();
        return 1;
    }

The only situation where this makes a difference is when the CPU does
not support SLD. If it does the thing became ambiguous today.

With my previous attempt to bring sanity into this by not setting the
feature flag when SLD is disabled at the command line, the check is
consistent.

But the detection of unaware hypervisors with the module scan brings us
into a situation where we have sld_state == sld_off and the feature flag
being set because we can't undo it anymore.

So now you load VMWare which disables SLD, but the feature bit stays and
then when you unload it and load VMX afterwards then you have exactly
the same situation as with the feature check removed. Consistency gone.

So with that we might want to go back to the sld_state check instead of
the cpu feature check, but that does not make it more consistent:

  As I just verified, it's possible to load the vmware module parallel
  to the KVM/VMX one.

So either we deal with it in some way or just decide that SLD and HV
modules which do not have the MOD_INFO(sld_safe) magic cannot be loaded
when SLD is enabled on the host. I'm fine with the latter :)

What a mess.

Thanks,

        tglx


^ permalink raw reply	[flat|nested] 78+ messages in thread

* Re: [PATCH 3/3] KVM: VMX: Extend VMX's #AC interceptor to handle split lock #AC in guest
  2020-04-02 22:27               ` Thomas Gleixner
@ 2020-04-02 22:40                 ` Nadav Amit
  2020-04-02 23:03                   ` Thomas Gleixner
  2020-04-02 23:08                   ` Steven Rostedt
  0 siblings, 2 replies; 78+ messages in thread
From: Nadav Amit @ 2020-04-02 22:40 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: Sean Christopherson, x86, Kenneth R . Crudup, Paolo Bonzini,
	Fenghua Yu, Xiaoyao Li, Thomas Hellstrom, Tony Luck,
	Peter Zijlstra, Jessica Yu, Steven Rostedt, Vitaly Kuznetsov,
	Wanpeng Li, Jim Mattson, kvm, LKML, Doug Covelli

> On Apr 2, 2020, at 3:27 PM, Thomas Gleixner <tglx@linutronix.de> wrote:
> 
>  As I just verified, it's possible to load the vmware module parallel
>  to the KVM/VMX one.
> 
> So either we deal with it in some way or just decide that SLD and HV
> modules which do not have the MOD_INFO(sld_safe) magic cannot be loaded
> when SLD is enabled on the host. I'm fine with the latter :)
> 
> What a mess.

[ +Doug ]

Just to communicate the information that was given to me: we do intend to
fix the SLD issue in VMware and if needed to release a minor version that
addresses it. Having said that, there are other hypervisors, such as
virtualbox or jailhouse, which would have a similar issue.


^ permalink raw reply	[flat|nested] 78+ messages in thread

* Re: [PATCH 3/3] KVM: VMX: Extend VMX's #AC interceptor to handle split lock #AC in guest
  2020-04-02 22:40                 ` Nadav Amit
@ 2020-04-02 23:03                   ` Thomas Gleixner
  2020-04-02 23:08                   ` Steven Rostedt
  1 sibling, 0 replies; 78+ messages in thread
From: Thomas Gleixner @ 2020-04-02 23:03 UTC (permalink / raw)
  To: Nadav Amit
  Cc: Sean Christopherson, x86, Kenneth R . Crudup, Paolo Bonzini,
	Fenghua Yu, Xiaoyao Li, Thomas Hellstrom, Tony Luck,
	Peter Zijlstra, Jessica Yu, Steven Rostedt, Vitaly Kuznetsov,
	Wanpeng Li, Jim Mattson, kvm, LKML, Doug Covelli

Nadav Amit <namit@vmware.com> writes:
> Just to communicate the information that was given to me: we do intend to
> fix the SLD issue in VMware and if needed to release a minor version that
> addresses it. Having said that, there are other hypervisors, such as
> virtualbox or jailhouse, which would have a similar issue.

I'm well aware of that and even if VMWare fixes this, this still will
trip up users which fail to install updates for one reason or the other
and leave them puzzled. Maybe we just should not care at all.

Despite that I might have mentioned it before: What a mess ...

Thanks,

        tglx

^ permalink raw reply	[flat|nested] 78+ messages in thread

* Re: [PATCH 3/3] KVM: VMX: Extend VMX's #AC interceptor to handle split lock #AC in guest
  2020-04-02 22:40                 ` Nadav Amit
  2020-04-02 23:03                   ` Thomas Gleixner
@ 2020-04-02 23:08                   ` Steven Rostedt
  2020-04-02 23:16                     ` Kenneth R. Crudup
  1 sibling, 1 reply; 78+ messages in thread
From: Steven Rostedt @ 2020-04-02 23:08 UTC (permalink / raw)
  To: Nadav Amit
  Cc: Thomas Gleixner, Sean Christopherson, x86, Kenneth R . Crudup,
	Paolo Bonzini, Fenghua Yu, Xiaoyao Li, Thomas Hellstrom,
	Tony Luck, Peter Zijlstra, Jessica Yu, Vitaly Kuznetsov,
	Wanpeng Li, Jim Mattson, kvm, LKML, Doug Covelli

On Thu, 2 Apr 2020 22:40:03 +0000
Nadav Amit <namit@vmware.com> wrote:

> > On Apr 2, 2020, at 3:27 PM, Thomas Gleixner <tglx@linutronix.de> wrote:
> > 
> >  As I just verified, it's possible to load the vmware module parallel
> >  to the KVM/VMX one.
> > 
> > So either we deal with it in some way or just decide that SLD and HV
> > modules which do not have the MOD_INFO(sld_safe) magic cannot be loaded
> > when SLD is enabled on the host. I'm fine with the latter :)
> > 
> > What a mess.  
> 
> [ +Doug ]
> 
> Just to communicate the information that was given to me: we do intend to
> fix the SLD issue in VMware and if needed to release a minor version that
> addresses it. Having said that, there are other hypervisors, such as
> virtualbox or jailhouse, which would have a similar issue.

If we go the approach of not letting VM modules load if it doesn't have the
sld_safe flag set, how is this different than a VM module not loading due
to kabi breakage?

If we prevent it from loading (and keeping from having to go into this
inconsistent state that Thomas described), it would encourage people to get
the latest modules, and the maintainers of said modules motivation to
update them.

-- Steve

^ permalink raw reply	[flat|nested] 78+ messages in thread

* Re: [PATCH 3/3] KVM: VMX: Extend VMX's #AC interceptor to handle split lock #AC in guest
  2020-04-02 23:08                   ` Steven Rostedt
@ 2020-04-02 23:16                     ` Kenneth R. Crudup
  2020-04-02 23:18                       ` Jim Mattson
  0 siblings, 1 reply; 78+ messages in thread
From: Kenneth R. Crudup @ 2020-04-02 23:16 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: Nadav Amit, Thomas Gleixner, Sean Christopherson, x86,
	Paolo Bonzini, Fenghua Yu, Xiaoyao Li, Thomas Hellstrom,
	Tony Luck, Peter Zijlstra, Jessica Yu, Vitaly Kuznetsov,
	Wanpeng Li, Jim Mattson, kvm, LKML, Doug Covelli


On Thu, 2 Apr 2020, Steven Rostedt wrote:

> If we go the approach of not letting VM modules load if it doesn't have the
> sld_safe flag set, how is this different than a VM module not loading due
> to kabi breakage?

Why not a compromise: if such a module is attempted to be loaded, print up
a message saying something akin to "turn the parameter 'split_lock_detect'
off" as we reject loading it- and if we see that we've booted with it off
just splat a WARN_ON() if someone tries to load such modules?

	-Kenny

-- 
Kenneth R. Crudup  Sr. SW Engineer, Scott County Consulting, Silicon Valley

^ permalink raw reply	[flat|nested] 78+ messages in thread

* Re: [PATCH 3/3] KVM: VMX: Extend VMX's #AC interceptor to handle split lock #AC in guest
  2020-04-02 23:16                     ` Kenneth R. Crudup
@ 2020-04-02 23:18                       ` Jim Mattson
  2020-04-03 12:16                         ` Thomas Gleixner
  0 siblings, 1 reply; 78+ messages in thread
From: Jim Mattson @ 2020-04-02 23:18 UTC (permalink / raw)
  To: Kenneth R. Crudup
  Cc: Steven Rostedt, Nadav Amit, Thomas Gleixner, Sean Christopherson,
	x86, Paolo Bonzini, Fenghua Yu, Xiaoyao Li, Thomas Hellstrom,
	Tony Luck, Peter Zijlstra, Jessica Yu, Vitaly Kuznetsov,
	Wanpeng Li, kvm, LKML, Doug Covelli

On Thu, Apr 2, 2020 at 4:16 PM Kenneth R. Crudup <kenny@panix.com> wrote:
>
>
> On Thu, 2 Apr 2020, Steven Rostedt wrote:
>
> > If we go the approach of not letting VM modules load if it doesn't have the
> > sld_safe flag set, how is this different than a VM module not loading due
> > to kabi breakage?
>
> Why not a compromise: if such a module is attempted to be loaded, print up
> a message saying something akin to "turn the parameter 'split_lock_detect'
> off" as we reject loading it- and if we see that we've booted with it off
> just splat a WARN_ON() if someone tries to load such modules?

What modules are we talking about? I thought we were discussing L1
hypervisors, which are just binary blobs. The only modules at the L0
level are kvm and kvm_intel.

^ permalink raw reply	[flat|nested] 78+ messages in thread

* Re: [patch 1/2] x86,module: Detect VMX modules and disable Split-Lock-Detect
  2020-04-02 12:32 ` [patch 1/2] x86,module: Detect VMX modules and disable Split-Lock-Detect Thomas Gleixner
  2020-04-02 15:23   ` [patch v2 " Peter Zijlstra
@ 2020-04-02 23:42   ` Rasmus Villemoes
  2020-04-03 14:35     ` Jessica Yu
  2020-04-03 11:29     ` [patch 1/2] x86, module: " kbuild test robot
                     ` (3 subsequent siblings)
  5 siblings, 1 reply; 78+ messages in thread
From: Rasmus Villemoes @ 2020-04-02 23:42 UTC (permalink / raw)
  To: Thomas Gleixner, LKML
  Cc: x86, Kenneth R. Crudup, Peter Zijlstra (Intel),
	Paolo Bonzini, Jessica Yu, Fenghua Yu, Xiaoyao Li, Nadav Amit,
	Thomas Hellstrom, Sean Christopherson, Tony Luck, Steven Rostedt

On 02/04/2020 14.32, Thomas Gleixner wrote:
> From: Peter Zijlstra <peterz@infradead.org>
> 
> It turns out that with Split-Lock-Detect enabled (default) any VMX
> hypervisor needs at least a little modification in order to not blindly
> inject the #AC into the guest without the guest being ready for it.
> 
> Since there is no telling which module implements a hypervisor, scan the
> module text and look for the VMLAUNCH instruction. If found, the module is
> assumed to be a hypervisor of some sort and SLD is disabled.

How long does that scan take/add to module load time? Would it make
sense to exempt in-tree modules?

Rasmus

^ permalink raw reply	[flat|nested] 78+ messages in thread

* RE: [patch v2 1/2] x86,module: Detect VMX modules and disable Split-Lock-Detect
  2020-04-02 15:23   ` [patch v2 " Peter Zijlstra
  2020-04-02 16:20     ` Xiaoyao Li
@ 2020-04-03  8:09     ` David Laight
  2020-04-03 14:33       ` Peter Zijlstra
  1 sibling, 1 reply; 78+ messages in thread
From: David Laight @ 2020-04-03  8:09 UTC (permalink / raw)
  To: 'Peter Zijlstra', Thomas Gleixner
  Cc: LKML, x86, Kenneth R. Crudup, Paolo Bonzini, Jessica Yu,
	Fenghua Yu, Xiaoyao Li, Nadav Amit, Thomas Hellstrom,
	Sean Christopherson, Tony Luck, Steven Rostedt

From: Peter Zijlstra
> Sent: 02 April 2020 16:24
> 
> I picked VMXOFF (which also appears in vmmon.ko) instead of VMXON
> because that latter takes an argument is therefore more difficult to
> decode.
...
> +	while (text < text_end) {
> +		kernel_insn_init(&insn, text, text_end - text);
> +		insn_get_length(&insn);
> +
> +		if (WARN_ON_ONCE(!insn_complete(&insn)))
> +			break;
> +
> +		if (insn.length == 3 &&
> +		    (!memcmp(text, vmlaunch, sizeof(vmlaunch)) ||
> +		     !memcmp(text, vmxoff, sizeof(vmxoff))))
> +				goto bad_module;
> +
> +		text += insn.length;
> +	}

How long is that going to take on a module with (say) 400k of text?

	David

-
Registered Address Lakeside, Bramley Road, Mount Farm, Milton Keynes, MK1 1PT, UK
Registration No: 1397386 (Wales)


^ permalink raw reply	[flat|nested] 78+ messages in thread

* Re: [patch 1/2] x86,module: Detect VMX modules and disable Split-Lock-Detect
  2020-04-02 12:32 ` [patch 1/2] x86,module: Detect VMX modules and disable Split-Lock-Detect Thomas Gleixner
@ 2020-04-03 11:29     ` kbuild test robot
  2020-04-02 23:42   ` [patch " Rasmus Villemoes
                       ` (4 subsequent siblings)
  5 siblings, 0 replies; 78+ messages in thread
From: kbuild test robot @ 2020-04-03 11:29 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: kbuild-all, LKML, x86@kernel.org, Kenneth R. Crudup ,
	Peter Zijlstra (Intel),
	Paolo Bonzini, Jessica Yu, Fenghua Yu, Xiaoyao Li, Nadav Amit,
	Thomas Hellstrom, Sean Christopherson, Tony Luck, Steven Rostedt

[-- Attachment #1: Type: text/plain, Size: 1965 bytes --]

Hi Thomas,

I love your patch! Yet something to improve:

[auto build test ERROR on tip/auto-latest]
[also build test ERROR on next-20200403]
[cannot apply to jeyu/modules-next tip/x86/core v5.6]
[if your patch is applied to the wrong git tree, please drop us a note to help
improve the system. BTW, we also suggest to use '--base' option to specify the
base tree in git format-patch, please see https://stackoverflow.com/a/37406982]

url:    https://github.com/0day-ci/linux/commits/Thomas-Gleixner/x86-Prevent-Split-Lock-Detection-wreckage-on-VMX-hypervisors/20200403-171430
base:   https://git.kernel.org/pub/scm/linux/kernel/git/tip/tip.git ee8bac724cc7767dcf9480afb656318994f22c3d
config: um-x86_64_defconfig (attached as .config)
compiler: gcc-7 (Ubuntu 7.5.0-6ubuntu2) 7.5.0
reproduce:
        # save the attached .config to linux build tree
        make ARCH=um SUBARCH=x86_64

If you fix the issue, kindly add following tag
Reported-by: kbuild test robot <lkp@intel.com>

All error/warnings (new ones prefixed by >>):

   /usr/bin/ld: arch/x86/um/../kernel/module.o: in function `module_finalize':
>> module.c:(.text+0x315): undefined reference to `split_lock_validate_module_text'
>> collect2: error: ld returned 1 exit status
--
   In file included from arch/x86/um/../kernel/module.c:27:0:
>> arch/x86/include/asm/cpu.h:38:31: warning: 'struct cpuinfo_x86' declared inside parameter list will not be visible outside of this definition or declaration
    int mwait_usable(const struct cpuinfo_x86 *);
                                  ^~~~~~~~~~~
   arch/x86/include/asm/cpu.h:44:49: warning: 'struct cpuinfo_x86' declared inside parameter list will not be visible outside of this definition or declaration
    extern void __init cpu_set_core_cap_bits(struct cpuinfo_x86 *c);
                                                    ^~~~~~~~~~~

---
0-DAY CI Kernel Test Service, Intel Corporation
https://lists.01.org/hyperkitty/list/kbuild-all@lists.01.org

[-- Attachment #2: .config.gz --]
[-- Type: application/gzip, Size: 8505 bytes --]

^ permalink raw reply	[flat|nested] 78+ messages in thread

* Re: [patch 1/2] x86, module: Detect VMX modules and disable Split-Lock-Detect
@ 2020-04-03 11:29     ` kbuild test robot
  0 siblings, 0 replies; 78+ messages in thread
From: kbuild test robot @ 2020-04-03 11:29 UTC (permalink / raw)
  To: kbuild-all

[-- Attachment #1: Type: text/plain, Size: 2006 bytes --]

Hi Thomas,

I love your patch! Yet something to improve:

[auto build test ERROR on tip/auto-latest]
[also build test ERROR on next-20200403]
[cannot apply to jeyu/modules-next tip/x86/core v5.6]
[if your patch is applied to the wrong git tree, please drop us a note to help
improve the system. BTW, we also suggest to use '--base' option to specify the
base tree in git format-patch, please see https://stackoverflow.com/a/37406982]

url:    https://github.com/0day-ci/linux/commits/Thomas-Gleixner/x86-Prevent-Split-Lock-Detection-wreckage-on-VMX-hypervisors/20200403-171430
base:   https://git.kernel.org/pub/scm/linux/kernel/git/tip/tip.git ee8bac724cc7767dcf9480afb656318994f22c3d
config: um-x86_64_defconfig (attached as .config)
compiler: gcc-7 (Ubuntu 7.5.0-6ubuntu2) 7.5.0
reproduce:
        # save the attached .config to linux build tree
        make ARCH=um SUBARCH=x86_64

If you fix the issue, kindly add following tag
Reported-by: kbuild test robot <lkp@intel.com>

All error/warnings (new ones prefixed by >>):

   /usr/bin/ld: arch/x86/um/../kernel/module.o: in function `module_finalize':
>> module.c:(.text+0x315): undefined reference to `split_lock_validate_module_text'
>> collect2: error: ld returned 1 exit status
--
   In file included from arch/x86/um/../kernel/module.c:27:0:
>> arch/x86/include/asm/cpu.h:38:31: warning: 'struct cpuinfo_x86' declared inside parameter list will not be visible outside of this definition or declaration
    int mwait_usable(const struct cpuinfo_x86 *);
                                  ^~~~~~~~~~~
   arch/x86/include/asm/cpu.h:44:49: warning: 'struct cpuinfo_x86' declared inside parameter list will not be visible outside of this definition or declaration
    extern void __init cpu_set_core_cap_bits(struct cpuinfo_x86 *c);
                                                    ^~~~~~~~~~~

---
0-DAY CI Kernel Test Service, Intel Corporation
https://lists.01.org/hyperkitty/list/kbuild-all(a)lists.01.org

[-- Attachment #2: config.gz --]
[-- Type: application/gzip, Size: 8505 bytes --]

^ permalink raw reply	[flat|nested] 78+ messages in thread

* Re: [PATCH 3/3] KVM: VMX: Extend VMX's #AC interceptor to handle split lock #AC in guest
  2020-04-02 23:18                       ` Jim Mattson
@ 2020-04-03 12:16                         ` Thomas Gleixner
  0 siblings, 0 replies; 78+ messages in thread
From: Thomas Gleixner @ 2020-04-03 12:16 UTC (permalink / raw)
  To: Jim Mattson, Kenneth R. Crudup
  Cc: Steven Rostedt, Nadav Amit, Sean Christopherson, x86,
	Paolo Bonzini, Fenghua Yu, Xiaoyao Li, Thomas Hellstrom,
	Tony Luck, Peter Zijlstra, Jessica Yu, Vitaly Kuznetsov,
	Wanpeng Li, kvm, LKML, Doug Covelli

Jim,

Jim Mattson <jmattson@google.com> writes:
> On Thu, Apr 2, 2020 at 4:16 PM Kenneth R. Crudup <kenny@panix.com> wrote:
>> On Thu, 2 Apr 2020, Steven Rostedt wrote:
>>
>> > If we go the approach of not letting VM modules load if it doesn't have the
>> > sld_safe flag set, how is this different than a VM module not loading due
>> > to kabi breakage?
>>
>> Why not a compromise: if such a module is attempted to be loaded, print up
>> a message saying something akin to "turn the parameter 'split_lock_detect'
>> off" as we reject loading it- and if we see that we've booted with it off
>> just splat a WARN_ON() if someone tries to load such modules?
>
> What modules are we talking about? I thought we were discussing L1
> hypervisors, which are just binary blobs. The only modules at the L0
> level are kvm and kvm_intel.

Maybe in your world, but VmWare (which got this started), VirtualBox,
Jailhouse and who knows what else _are_ L0 hypervisors. Otherwise we
wouldn't have that conversation at all.

Thanks,

        tglx



^ permalink raw reply	[flat|nested] 78+ messages in thread

* Re: [patch v2 1/2] x86,module: Detect VMX modules and disable Split-Lock-Detect
  2020-04-03  8:09     ` David Laight
@ 2020-04-03 14:33       ` Peter Zijlstra
  0 siblings, 0 replies; 78+ messages in thread
From: Peter Zijlstra @ 2020-04-03 14:33 UTC (permalink / raw)
  To: David Laight
  Cc: Thomas Gleixner, LKML, x86, Kenneth R. Crudup, Paolo Bonzini,
	Jessica Yu, Fenghua Yu, Xiaoyao Li, Nadav Amit, Thomas Hellstrom,
	Sean Christopherson, Tony Luck, Steven Rostedt

On Fri, Apr 03, 2020 at 08:09:03AM +0000, David Laight wrote:
> From: Peter Zijlstra
> > Sent: 02 April 2020 16:24
> > 
> > I picked VMXOFF (which also appears in vmmon.ko) instead of VMXON
> > because that latter takes an argument is therefore more difficult to
> > decode.
> ...
> > +	while (text < text_end) {
> > +		kernel_insn_init(&insn, text, text_end - text);
> > +		insn_get_length(&insn);
> > +
> > +		if (WARN_ON_ONCE(!insn_complete(&insn)))
> > +			break;
> > +
> > +		if (insn.length == 3 &&
> > +		    (!memcmp(text, vmlaunch, sizeof(vmlaunch)) ||
> > +		     !memcmp(text, vmxoff, sizeof(vmxoff))))
> > +				goto bad_module;
> > +
> > +		text += insn.length;
> > +	}
> 
> How long is that going to take on a module with (say) 400k of text?

It's module load, why would you care? I suspect it's really fast, but
even if it wasn't I couldn't be arsed.

^ permalink raw reply	[flat|nested] 78+ messages in thread

* Re: [patch 1/2] x86,module: Detect VMX modules and disable Split-Lock-Detect
  2020-04-02 23:42   ` [patch " Rasmus Villemoes
@ 2020-04-03 14:35     ` Jessica Yu
  2020-04-03 15:21       ` Peter Zijlstra
  0 siblings, 1 reply; 78+ messages in thread
From: Jessica Yu @ 2020-04-03 14:35 UTC (permalink / raw)
  To: Rasmus Villemoes
  Cc: Thomas Gleixner, LKML, x86, Kenneth R. Crudup,
	Peter Zijlstra (Intel),
	Paolo Bonzini, Fenghua Yu, Xiaoyao Li, Nadav Amit,
	Thomas Hellstrom, Sean Christopherson, Tony Luck, Steven Rostedt

+++ Rasmus Villemoes [03/04/20 01:42 +0200]:
>On 02/04/2020 14.32, Thomas Gleixner wrote:
>> From: Peter Zijlstra <peterz@infradead.org>
>>
>> It turns out that with Split-Lock-Detect enabled (default) any VMX
>> hypervisor needs at least a little modification in order to not blindly
>> inject the #AC into the guest without the guest being ready for it.
>>
>> Since there is no telling which module implements a hypervisor, scan the
>> module text and look for the VMLAUNCH instruction. If found, the module is
>> assumed to be a hypervisor of some sort and SLD is disabled.
>
>How long does that scan take/add to module load time? Would it make
>sense to exempt in-tree modules?
>
>Rasmus

I second Rasmus's question. It seems rather unfortunate that we have
to do this text scan for every module load on x86, when it doesn't
apply to the majority of them, and only to a handful of out-of-tree
hypervisor modules (assuming kvm is taken care of already).

I wonder if it would make sense then to limit the text scans to just
out-of-tree modules (i.e., missing the intree modinfo flag)?

Jessica

^ permalink raw reply	[flat|nested] 78+ messages in thread

* Re: [patch 1/2] x86,module: Detect VMX modules and disable Split-Lock-Detect
  2020-04-02 12:32 ` [patch 1/2] x86,module: Detect VMX modules and disable Split-Lock-Detect Thomas Gleixner
@ 2020-04-03 14:43     ` kbuild test robot
  2020-04-02 23:42   ` [patch " Rasmus Villemoes
                       ` (4 subsequent siblings)
  5 siblings, 0 replies; 78+ messages in thread
From: kbuild test robot @ 2020-04-03 14:43 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: kbuild-all, LKML, x86@kernel.org, Kenneth R. Crudup ,
	Peter Zijlstra (Intel),
	Paolo Bonzini, Jessica Yu, Fenghua Yu, Xiaoyao Li, Nadav Amit,
	Thomas Hellstrom, Sean Christopherson, Tony Luck, Steven Rostedt

[-- Attachment #1: Type: text/plain, Size: 3326 bytes --]

Hi Thomas,

I love your patch! Yet something to improve:

[auto build test ERROR on tip/auto-latest]
[also build test ERROR on next-20200403]
[cannot apply to jeyu/modules-next tip/x86/core v5.6]
[if your patch is applied to the wrong git tree, please drop us a note to help
improve the system. BTW, we also suggest to use '--base' option to specify the
base tree in git format-patch, please see https://stackoverflow.com/a/37406982]

url:    https://github.com/0day-ci/linux/commits/Thomas-Gleixner/x86-Prevent-Split-Lock-Detection-wreckage-on-VMX-hypervisors/20200403-171430
base:   https://git.kernel.org/pub/scm/linux/kernel/git/tip/tip.git ee8bac724cc7767dcf9480afb656318994f22c3d
config: x86_64-randconfig-s1-20200403 (attached as .config)
compiler: gcc-7 (Ubuntu 7.5.0-6ubuntu2) 7.5.0
reproduce:
        # save the attached .config to linux build tree
        make ARCH=x86_64 

If you fix the issue, kindly add following tag
Reported-by: kbuild test robot <lkp@intel.com>

All errors (new ones prefixed by >>):

   arch/x86/kernel/module.c: In function 'module_finalize':
>> arch/x86/kernel/module.c:257:17: error: 'struct module' has no member named 'sld_safe'
     if (text && !me->sld_safe) {
                    ^~

vim +257 arch/x86/kernel/module.c

   220	
   221	int module_finalize(const Elf_Ehdr *hdr,
   222			    const Elf_Shdr *sechdrs,
   223			    struct module *me)
   224	{
   225		const Elf_Shdr *s, *text = NULL, *alt = NULL, *locks = NULL,
   226			*para = NULL, *orc = NULL, *orc_ip = NULL;
   227		char *secstrings = (void *)hdr + sechdrs[hdr->e_shstrndx].sh_offset;
   228	
   229		for (s = sechdrs; s < sechdrs + hdr->e_shnum; s++) {
   230			if (!strcmp(".text", secstrings + s->sh_name))
   231				text = s;
   232			if (!strcmp(".altinstructions", secstrings + s->sh_name))
   233				alt = s;
   234			if (!strcmp(".smp_locks", secstrings + s->sh_name))
   235				locks = s;
   236			if (!strcmp(".parainstructions", secstrings + s->sh_name))
   237				para = s;
   238			if (!strcmp(".orc_unwind", secstrings + s->sh_name))
   239				orc = s;
   240			if (!strcmp(".orc_unwind_ip", secstrings + s->sh_name))
   241				orc_ip = s;
   242		}
   243	
   244		if (alt) {
   245			/* patch .altinstructions */
   246			void *aseg = (void *)alt->sh_addr;
   247			apply_alternatives(aseg, aseg + alt->sh_size);
   248		}
   249		if (locks && text) {
   250			void *lseg = (void *)locks->sh_addr;
   251			void *tseg = (void *)text->sh_addr;
   252			alternatives_smp_module_add(me, me->name,
   253						    lseg, lseg + locks->sh_size,
   254						    tseg, tseg + text->sh_size);
   255		}
   256	
 > 257		if (text && !me->sld_safe) {
   258			void *tseg = (void *)text->sh_addr;
   259			split_lock_validate_module_text(me, tseg, tseg + text->sh_size);
   260		}
   261	
   262		if (para) {
   263			void *pseg = (void *)para->sh_addr;
   264			apply_paravirt(pseg, pseg + para->sh_size);
   265		}
   266	
   267		/* make jump label nops */
   268		jump_label_apply_nops(me);
   269	
   270		if (orc && orc_ip)
   271			unwind_module_init(me, (void *)orc_ip->sh_addr, orc_ip->sh_size,
   272					   (void *)orc->sh_addr, orc->sh_size);
   273	
   274		return 0;
   275	}
   276	

---
0-DAY CI Kernel Test Service, Intel Corporation
https://lists.01.org/hyperkitty/list/kbuild-all@lists.01.org

[-- Attachment #2: .config.gz --]
[-- Type: application/gzip, Size: 36422 bytes --]

^ permalink raw reply	[flat|nested] 78+ messages in thread

* Re: [patch 1/2] x86, module: Detect VMX modules and disable Split-Lock-Detect
@ 2020-04-03 14:43     ` kbuild test robot
  0 siblings, 0 replies; 78+ messages in thread
From: kbuild test robot @ 2020-04-03 14:43 UTC (permalink / raw)
  To: kbuild-all

[-- Attachment #1: Type: text/plain, Size: 3420 bytes --]

Hi Thomas,

I love your patch! Yet something to improve:

[auto build test ERROR on tip/auto-latest]
[also build test ERROR on next-20200403]
[cannot apply to jeyu/modules-next tip/x86/core v5.6]
[if your patch is applied to the wrong git tree, please drop us a note to help
improve the system. BTW, we also suggest to use '--base' option to specify the
base tree in git format-patch, please see https://stackoverflow.com/a/37406982]

url:    https://github.com/0day-ci/linux/commits/Thomas-Gleixner/x86-Prevent-Split-Lock-Detection-wreckage-on-VMX-hypervisors/20200403-171430
base:   https://git.kernel.org/pub/scm/linux/kernel/git/tip/tip.git ee8bac724cc7767dcf9480afb656318994f22c3d
config: x86_64-randconfig-s1-20200403 (attached as .config)
compiler: gcc-7 (Ubuntu 7.5.0-6ubuntu2) 7.5.0
reproduce:
        # save the attached .config to linux build tree
        make ARCH=x86_64 

If you fix the issue, kindly add following tag
Reported-by: kbuild test robot <lkp@intel.com>

All errors (new ones prefixed by >>):

   arch/x86/kernel/module.c: In function 'module_finalize':
>> arch/x86/kernel/module.c:257:17: error: 'struct module' has no member named 'sld_safe'
     if (text && !me->sld_safe) {
                    ^~

vim +257 arch/x86/kernel/module.c

   220	
   221	int module_finalize(const Elf_Ehdr *hdr,
   222			    const Elf_Shdr *sechdrs,
   223			    struct module *me)
   224	{
   225		const Elf_Shdr *s, *text = NULL, *alt = NULL, *locks = NULL,
   226			*para = NULL, *orc = NULL, *orc_ip = NULL;
   227		char *secstrings = (void *)hdr + sechdrs[hdr->e_shstrndx].sh_offset;
   228	
   229		for (s = sechdrs; s < sechdrs + hdr->e_shnum; s++) {
   230			if (!strcmp(".text", secstrings + s->sh_name))
   231				text = s;
   232			if (!strcmp(".altinstructions", secstrings + s->sh_name))
   233				alt = s;
   234			if (!strcmp(".smp_locks", secstrings + s->sh_name))
   235				locks = s;
   236			if (!strcmp(".parainstructions", secstrings + s->sh_name))
   237				para = s;
   238			if (!strcmp(".orc_unwind", secstrings + s->sh_name))
   239				orc = s;
   240			if (!strcmp(".orc_unwind_ip", secstrings + s->sh_name))
   241				orc_ip = s;
   242		}
   243	
   244		if (alt) {
   245			/* patch .altinstructions */
   246			void *aseg = (void *)alt->sh_addr;
   247			apply_alternatives(aseg, aseg + alt->sh_size);
   248		}
   249		if (locks && text) {
   250			void *lseg = (void *)locks->sh_addr;
   251			void *tseg = (void *)text->sh_addr;
   252			alternatives_smp_module_add(me, me->name,
   253						    lseg, lseg + locks->sh_size,
   254						    tseg, tseg + text->sh_size);
   255		}
   256	
 > 257		if (text && !me->sld_safe) {
   258			void *tseg = (void *)text->sh_addr;
   259			split_lock_validate_module_text(me, tseg, tseg + text->sh_size);
   260		}
   261	
   262		if (para) {
   263			void *pseg = (void *)para->sh_addr;
   264			apply_paravirt(pseg, pseg + para->sh_size);
   265		}
   266	
   267		/* make jump label nops */
   268		jump_label_apply_nops(me);
   269	
   270		if (orc && orc_ip)
   271			unwind_module_init(me, (void *)orc_ip->sh_addr, orc_ip->sh_size,
   272					   (void *)orc->sh_addr, orc->sh_size);
   273	
   274		return 0;
   275	}
   276	

---
0-DAY CI Kernel Test Service, Intel Corporation
https://lists.01.org/hyperkitty/list/kbuild-all(a)lists.01.org

[-- Attachment #2: config.gz --]
[-- Type: application/gzip, Size: 36422 bytes --]

^ permalink raw reply	[flat|nested] 78+ messages in thread

* Re: [patch 1/2] x86,module: Detect VMX modules and disable Split-Lock-Detect
  2020-04-03 14:35     ` Jessica Yu
@ 2020-04-03 15:21       ` Peter Zijlstra
  2020-04-03 16:01         ` Sean Christopherson
  2020-04-03 18:53         ` Thomas Gleixner
  0 siblings, 2 replies; 78+ messages in thread
From: Peter Zijlstra @ 2020-04-03 15:21 UTC (permalink / raw)
  To: Jessica Yu
  Cc: Rasmus Villemoes, Thomas Gleixner, LKML, x86, Kenneth R. Crudup,
	Paolo Bonzini, Fenghua Yu, Xiaoyao Li, Nadav Amit,
	Thomas Hellstrom, Sean Christopherson, Tony Luck, Steven Rostedt,
	Greg Kroah-Hartman, jannh, keescook

On Fri, Apr 03, 2020 at 04:35:00PM +0200, Jessica Yu wrote:
> +++ Rasmus Villemoes [03/04/20 01:42 +0200]:
> > On 02/04/2020 14.32, Thomas Gleixner wrote:
> > > From: Peter Zijlstra <peterz@infradead.org>
> > > 
> > > It turns out that with Split-Lock-Detect enabled (default) any VMX
> > > hypervisor needs at least a little modification in order to not blindly
> > > inject the #AC into the guest without the guest being ready for it.
> > > 
> > > Since there is no telling which module implements a hypervisor, scan the
> > > module text and look for the VMLAUNCH instruction. If found, the module is
> > > assumed to be a hypervisor of some sort and SLD is disabled.
> > 
> > How long does that scan take/add to module load time? Would it make
> > sense to exempt in-tree modules?
> > 
> > Rasmus
> 
> I second Rasmus's question. It seems rather unfortunate that we have
> to do this text scan for every module load on x86, when it doesn't
> apply to the majority of them, and only to a handful of out-of-tree
> hypervisor modules (assuming kvm is taken care of already).
> 
> I wonder if it would make sense then to limit the text scans to just
> out-of-tree modules (i.e., missing the intree modinfo flag)?

It would; didn't know there was one.

^ permalink raw reply	[flat|nested] 78+ messages in thread

* Re: [patch 1/2] x86,module: Detect VMX modules and disable Split-Lock-Detect
  2020-04-03 15:21       ` Peter Zijlstra
@ 2020-04-03 16:01         ` Sean Christopherson
  2020-04-03 16:12           ` Peter Zijlstra
  2020-04-03 18:53         ` Thomas Gleixner
  1 sibling, 1 reply; 78+ messages in thread
From: Sean Christopherson @ 2020-04-03 16:01 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Jessica Yu, Rasmus Villemoes, Thomas Gleixner, LKML, x86,
	Kenneth R. Crudup, Paolo Bonzini, Fenghua Yu, Xiaoyao Li,
	Nadav Amit, Thomas Hellstrom, Tony Luck, Steven Rostedt,
	Greg Kroah-Hartman, jannh, keescook

On Fri, Apr 03, 2020 at 05:21:58PM +0200, Peter Zijlstra wrote:
> On Fri, Apr 03, 2020 at 04:35:00PM +0200, Jessica Yu wrote:
> > +++ Rasmus Villemoes [03/04/20 01:42 +0200]:
> > > On 02/04/2020 14.32, Thomas Gleixner wrote:
> > > > From: Peter Zijlstra <peterz@infradead.org>
> > > > 
> > > > It turns out that with Split-Lock-Detect enabled (default) any VMX
> > > > hypervisor needs at least a little modification in order to not blindly
> > > > inject the #AC into the guest without the guest being ready for it.
> > > > 
> > > > Since there is no telling which module implements a hypervisor, scan the
> > > > module text and look for the VMLAUNCH instruction. If found, the module is
> > > > assumed to be a hypervisor of some sort and SLD is disabled.
> > > 
> > > How long does that scan take/add to module load time? Would it make
> > > sense to exempt in-tree modules?
> > > 
> > > Rasmus
> > 
> > I second Rasmus's question. It seems rather unfortunate that we have
> > to do this text scan for every module load on x86, when it doesn't
> > apply to the majority of them, and only to a handful of out-of-tree
> > hypervisor modules (assuming kvm is taken care of already).
> > 
> > I wonder if it would make sense then to limit the text scans to just
> > out-of-tree modules (i.e., missing the intree modinfo flag)?
> 
> It would; didn't know there was one.

Rather than scanning modules at all, what about hooking native_write_cr4()
to kill SLD if CR4.VMXE is toggled on and the caller didn't increment a
"sld safe" counter?

Partially tested patch incoming...

^ permalink raw reply	[flat|nested] 78+ messages in thread

* Re: [patch 1/2] x86,module: Detect VMX modules and disable Split-Lock-Detect
  2020-04-03 16:01         ` Sean Christopherson
@ 2020-04-03 16:12           ` Peter Zijlstra
  2020-04-03 16:16             ` David Laight
  2020-04-03 16:25             ` Sean Christopherson
  0 siblings, 2 replies; 78+ messages in thread
From: Peter Zijlstra @ 2020-04-03 16:12 UTC (permalink / raw)
  To: Sean Christopherson
  Cc: Jessica Yu, Rasmus Villemoes, Thomas Gleixner, LKML, x86,
	Kenneth R. Crudup, Paolo Bonzini, Fenghua Yu, Xiaoyao Li,
	Nadav Amit, Thomas Hellstrom, Tony Luck, Steven Rostedt,
	Greg Kroah-Hartman, jannh, keescook

On Fri, Apr 03, 2020 at 09:01:56AM -0700, Sean Christopherson wrote:
> On Fri, Apr 03, 2020 at 05:21:58PM +0200, Peter Zijlstra wrote:
> > On Fri, Apr 03, 2020 at 04:35:00PM +0200, Jessica Yu wrote:

> > > I wonder if it would make sense then to limit the text scans to just
> > > out-of-tree modules (i.e., missing the intree modinfo flag)?
> > 
> > It would; didn't know there was one.
> 
> Rather than scanning modules at all, what about hooking native_write_cr4()
> to kill SLD if CR4.VMXE is toggled on and the caller didn't increment a
> "sld safe" counter?

And then you're hoping that the module uses that and not:

  asm volatile ("mov %0, cr4" :: "r" (val));

I think I feel safer with the scanning to be fair. Also with the intree
hint on, we can extend the scanning for out-of-tree modules for more
dodgy crap we really don't want modules to do, like for example the
above.

^ permalink raw reply	[flat|nested] 78+ messages in thread

* RE: [patch 1/2] x86,module: Detect VMX modules and disable Split-Lock-Detect
  2020-04-03 16:12           ` Peter Zijlstra
@ 2020-04-03 16:16             ` David Laight
  2020-04-03 16:39               ` Peter Zijlstra
  2020-04-03 16:25             ` Sean Christopherson
  1 sibling, 1 reply; 78+ messages in thread
From: David Laight @ 2020-04-03 16:16 UTC (permalink / raw)
  To: 'Peter Zijlstra', Sean Christopherson
  Cc: Jessica Yu, Rasmus Villemoes, Thomas Gleixner, LKML, x86,
	Kenneth R. Crudup, Paolo Bonzini, Fenghua Yu, Xiaoyao Li,
	Nadav Amit, Thomas Hellstrom, Tony Luck, Steven Rostedt,
	Greg Kroah-Hartman, jannh, keescook

From: Peter Zijlstra
> Sent: 03 April 2020 17:12
> On Fri, Apr 03, 2020 at 09:01:56AM -0700, Sean Christopherson wrote:
> > On Fri, Apr 03, 2020 at 05:21:58PM +0200, Peter Zijlstra wrote:
> > > On Fri, Apr 03, 2020 at 04:35:00PM +0200, Jessica Yu wrote:
> 
> > > > I wonder if it would make sense then to limit the text scans to just
> > > > out-of-tree modules (i.e., missing the intree modinfo flag)?
> > >
> > > It would; didn't know there was one.
> >
> > Rather than scanning modules at all, what about hooking native_write_cr4()
> > to kill SLD if CR4.VMXE is toggled on and the caller didn't increment a
> > "sld safe" counter?
> 
> And then you're hoping that the module uses that and not:
> 
>   asm volatile ("mov %0, cr4" :: "r" (val));
> 
> I think I feel safer with the scanning to be fair. Also with the intree
> hint on, we can extend the scanning for out-of-tree modules for more
> dodgy crap we really don't want modules to do, like for example the
> above.

Could you do the scanning in the last phase of the module build
that has to be done against the target kernel headers and with the
target kernel build infrastructure?

	David

-
Registered Address Lakeside, Bramley Road, Mount Farm, Milton Keynes, MK1 1PT, UK
Registration No: 1397386 (Wales)


^ permalink raw reply	[flat|nested] 78+ messages in thread

* Re: [patch 1/2] x86,module: Detect VMX modules and disable Split-Lock-Detect
  2020-04-03 16:12           ` Peter Zijlstra
  2020-04-03 16:16             ` David Laight
@ 2020-04-03 16:25             ` Sean Christopherson
  2020-04-03 16:40               ` Peter Zijlstra
  1 sibling, 1 reply; 78+ messages in thread
From: Sean Christopherson @ 2020-04-03 16:25 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Jessica Yu, Rasmus Villemoes, Thomas Gleixner, LKML, x86,
	Kenneth R. Crudup, Paolo Bonzini, Fenghua Yu, Xiaoyao Li,
	Nadav Amit, Thomas Hellstrom, Tony Luck, Steven Rostedt,
	Greg Kroah-Hartman, jannh, keescook

On Fri, Apr 03, 2020 at 06:12:05PM +0200, Peter Zijlstra wrote:
> On Fri, Apr 03, 2020 at 09:01:56AM -0700, Sean Christopherson wrote:
> > On Fri, Apr 03, 2020 at 05:21:58PM +0200, Peter Zijlstra wrote:
> > > On Fri, Apr 03, 2020 at 04:35:00PM +0200, Jessica Yu wrote:
> 
> > > > I wonder if it would make sense then to limit the text scans to just
> > > > out-of-tree modules (i.e., missing the intree modinfo flag)?
> > > 
> > > It would; didn't know there was one.
> > 
> > Rather than scanning modules at all, what about hooking native_write_cr4()
> > to kill SLD if CR4.VMXE is toggled on and the caller didn't increment a
> > "sld safe" counter?
> 
> And then you're hoping that the module uses that and not:
> 
>   asm volatile ("mov %0, cr4" :: "r" (val));
> 
> I think I feel safer with the scanning to be fair. Also with the intree
> hint on, we can extend the scanning for out-of-tree modules for more
> dodgy crap we really don't want modules to do, like for example the
> above.

Ya, that's the big uknown.  But wouldn't they'd already be broken in the
sense that they'd corrupt the CR4 shadow?  E.g. setting VMXE without
updating cpu_tlbstate.cr4 would result in future in-kernel writes to CR4
attempting to clear CR4.VMXE post-VMXON, which would #GP.

^ permalink raw reply	[flat|nested] 78+ messages in thread

* Re: [patch 1/2] x86,module: Detect VMX modules and disable Split-Lock-Detect
  2020-04-02 12:32 ` [patch 1/2] x86,module: Detect VMX modules and disable Split-Lock-Detect Thomas Gleixner
                     ` (3 preceding siblings ...)
  2020-04-03 14:43     ` [patch 1/2] x86, module: " kbuild test robot
@ 2020-04-03 16:36   ` Sean Christopherson
  2020-04-03 16:41     ` Peter Zijlstra
  2020-04-06 12:23   ` Christoph Hellwig
  5 siblings, 1 reply; 78+ messages in thread
From: Sean Christopherson @ 2020-04-03 16:36 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: LKML, x86, Kenneth R. Crudup, Peter Zijlstra (Intel),
	Paolo Bonzini, Jessica Yu, Fenghua Yu, Xiaoyao Li, Nadav Amit,
	Thomas Hellstrom, Tony Luck, Steven Rostedt

On Thu, Apr 02, 2020 at 02:32:59PM +0200, Thomas Gleixner wrote:
> --- a/arch/x86/kernel/module.c
> +++ b/arch/x86/kernel/module.c
> @@ -24,6 +24,7 @@
>  #include <asm/pgtable.h>
>  #include <asm/setup.h>
>  #include <asm/unwind.h>
> +#include <asm/cpu.h>
>  
>  #if 0
>  #define DEBUGP(fmt, ...)				\
> @@ -253,6 +254,11 @@ int module_finalize(const Elf_Ehdr *hdr,
>  					    tseg, tseg + text->sh_size);
>  	}
>  
> +	if (text && !me->sld_safe) {

As also reported by the test bot, sld_safe only exist if CPU_SUP_INTEL=y.

This can also be conditioned on boot_cpu_has(X86_FEATURE_VMX), or the
static variant.  If CPU_SUP_INTEL=y, X86_FEATURE_VMX will be set if and
only if VMX is fully enabled, i.e. supported by the CPU and enabled in
MSR_IA32_FEATURE_CONTROl.

> +		void *tseg = (void *)text->sh_addr;
> +		split_lock_validate_module_text(me, tseg, tseg + text->sh_size);
> +	}
> +
>  	if (para) {
>  		void *pseg = (void *)para->sh_addr;
>  		apply_paravirt(pseg, pseg + para->sh_size);
> --- a/include/linux/module.h
> +++ b/include/linux/module.h
> @@ -407,6 +407,10 @@ struct module {
>  	bool sig_ok;
>  #endif
>  
> +#ifdef CONFIG_CPU_SUP_INTEL
> +	bool sld_safe;
> +#endif
> +
>  	bool async_probe_requested;
>  
>  	/* symbols that will be GPL-only in the near future. */
> --- a/kernel/module.c
> +++ b/kernel/module.c
> @@ -3096,6 +3096,11 @@ static int check_modinfo(struct module *
>  			"is unknown, you have been warned.\n", mod->name);
>  	}
>  
> +#ifdef CONFIG_CPU_SUP_INTEL
> +	if (get_modinfo(info, "sld_safe"))
> +		mod->sld_safe = true;
> +#endif
> +
>  	err = check_modinfo_livepatch(mod, info);
>  	if (err)
>  		return err;
> 

^ permalink raw reply	[flat|nested] 78+ messages in thread

* Re: [patch 1/2] x86,module: Detect VMX modules and disable Split-Lock-Detect
  2020-04-03 16:16             ` David Laight
@ 2020-04-03 16:39               ` Peter Zijlstra
  0 siblings, 0 replies; 78+ messages in thread
From: Peter Zijlstra @ 2020-04-03 16:39 UTC (permalink / raw)
  To: David Laight
  Cc: Sean Christopherson, Jessica Yu, Rasmus Villemoes,
	Thomas Gleixner, LKML, x86, Kenneth R. Crudup, Paolo Bonzini,
	Fenghua Yu, Xiaoyao Li, Nadav Amit, Thomas Hellstrom, Tony Luck,
	Steven Rostedt, Greg Kroah-Hartman, jannh, keescook

On Fri, Apr 03, 2020 at 04:16:38PM +0000, David Laight wrote:
> From: Peter Zijlstra
> > Sent: 03 April 2020 17:12
> > On Fri, Apr 03, 2020 at 09:01:56AM -0700, Sean Christopherson wrote:
> > > On Fri, Apr 03, 2020 at 05:21:58PM +0200, Peter Zijlstra wrote:
> > > > On Fri, Apr 03, 2020 at 04:35:00PM +0200, Jessica Yu wrote:
> > 
> > > > > I wonder if it would make sense then to limit the text scans to just
> > > > > out-of-tree modules (i.e., missing the intree modinfo flag)?
> > > >
> > > > It would; didn't know there was one.
> > >
> > > Rather than scanning modules at all, what about hooking native_write_cr4()
> > > to kill SLD if CR4.VMXE is toggled on and the caller didn't increment a
> > > "sld safe" counter?
> > 
> > And then you're hoping that the module uses that and not:
> > 
> >   asm volatile ("mov %0, cr4" :: "r" (val));
> > 
> > I think I feel safer with the scanning to be fair. Also with the intree
> > hint on, we can extend the scanning for out-of-tree modules for more
> > dodgy crap we really don't want modules to do, like for example the
> > above.
> 
> Could you do the scanning in the last phase of the module build
> that has to be done against the target kernel headers and with the
> target kernel build infrastructure?

You think the binary module people care to respect our module build
warnings?

^ permalink raw reply	[flat|nested] 78+ messages in thread

* Re: [patch 1/2] x86,module: Detect VMX modules and disable Split-Lock-Detect
  2020-04-03 16:25             ` Sean Christopherson
@ 2020-04-03 16:40               ` Peter Zijlstra
  2020-04-03 16:48                 ` Nadav Amit
  0 siblings, 1 reply; 78+ messages in thread
From: Peter Zijlstra @ 2020-04-03 16:40 UTC (permalink / raw)
  To: Sean Christopherson
  Cc: Jessica Yu, Rasmus Villemoes, Thomas Gleixner, LKML, x86,
	Kenneth R. Crudup, Paolo Bonzini, Fenghua Yu, Xiaoyao Li,
	Nadav Amit, Thomas Hellstrom, Tony Luck, Steven Rostedt,
	Greg Kroah-Hartman, jannh, keescook

On Fri, Apr 03, 2020 at 09:25:55AM -0700, Sean Christopherson wrote:
> On Fri, Apr 03, 2020 at 06:12:05PM +0200, Peter Zijlstra wrote:
> > On Fri, Apr 03, 2020 at 09:01:56AM -0700, Sean Christopherson wrote:
> > > On Fri, Apr 03, 2020 at 05:21:58PM +0200, Peter Zijlstra wrote:
> > > > On Fri, Apr 03, 2020 at 04:35:00PM +0200, Jessica Yu wrote:
> > 
> > > > > I wonder if it would make sense then to limit the text scans to just
> > > > > out-of-tree modules (i.e., missing the intree modinfo flag)?
> > > > 
> > > > It would; didn't know there was one.
> > > 
> > > Rather than scanning modules at all, what about hooking native_write_cr4()
> > > to kill SLD if CR4.VMXE is toggled on and the caller didn't increment a
> > > "sld safe" counter?
> > 
> > And then you're hoping that the module uses that and not:
> > 
> >   asm volatile ("mov %0, cr4" :: "r" (val));
> > 
> > I think I feel safer with the scanning to be fair. Also with the intree
> > hint on, we can extend the scanning for out-of-tree modules for more
> > dodgy crap we really don't want modules to do, like for example the
> > above.
> 
> Ya, that's the big uknown.  But wouldn't they'd already be broken in the
> sense that they'd corrupt the CR4 shadow?  E.g. setting VMXE without
> updating cpu_tlbstate.cr4 would result in future in-kernel writes to CR4
> attempting to clear CR4.VMXE post-VMXON, which would #GP.

Sadly the CR4 shadow is exported, so they can actually fix that up :/

^ permalink raw reply	[flat|nested] 78+ messages in thread

* Re: [patch 1/2] x86,module: Detect VMX modules and disable Split-Lock-Detect
  2020-04-03 16:36   ` [patch 1/2] x86,module: " Sean Christopherson
@ 2020-04-03 16:41     ` Peter Zijlstra
  2020-04-03 18:35       ` Jessica Yu
  0 siblings, 1 reply; 78+ messages in thread
From: Peter Zijlstra @ 2020-04-03 16:41 UTC (permalink / raw)
  To: Sean Christopherson
  Cc: Thomas Gleixner, LKML, x86, Kenneth R. Crudup, Paolo Bonzini,
	Jessica Yu, Fenghua Yu, Xiaoyao Li, Nadav Amit, Thomas Hellstrom,
	Tony Luck, Steven Rostedt

On Fri, Apr 03, 2020 at 09:36:05AM -0700, Sean Christopherson wrote:
> On Thu, Apr 02, 2020 at 02:32:59PM +0200, Thomas Gleixner wrote:
> > --- a/arch/x86/kernel/module.c
> > +++ b/arch/x86/kernel/module.c
> > @@ -24,6 +24,7 @@
> >  #include <asm/pgtable.h>
> >  #include <asm/setup.h>
> >  #include <asm/unwind.h>
> > +#include <asm/cpu.h>
> >  
> >  #if 0
> >  #define DEBUGP(fmt, ...)				\
> > @@ -253,6 +254,11 @@ int module_finalize(const Elf_Ehdr *hdr,
> >  					    tseg, tseg + text->sh_size);
> >  	}
> >  
> > +	if (text && !me->sld_safe) {
> 
> As also reported by the test bot, sld_safe only exist if CPU_SUP_INTEL=y.
> 
> This can also be conditioned on boot_cpu_has(X86_FEATURE_VMX), or the
> static variant.  If CPU_SUP_INTEL=y, X86_FEATURE_VMX will be set if and
> only if VMX is fully enabled, i.e. supported by the CPU and enabled in
> MSR_IA32_FEATURE_CONTROl.
> 
> > +		void *tseg = (void *)text->sh_addr;
> > +		split_lock_validate_module_text(me, tseg, tseg + text->sh_size);
> > +	}

Ideally we push it all into arch code, but load_info isn't exposed to
arch module code :/.

^ permalink raw reply	[flat|nested] 78+ messages in thread

* Re: [patch 1/2] x86,module: Detect VMX modules and disable Split-Lock-Detect
  2020-04-03 16:40               ` Peter Zijlstra
@ 2020-04-03 16:48                 ` Nadav Amit
  2020-04-03 17:21                   ` Sean Christopherson
  0 siblings, 1 reply; 78+ messages in thread
From: Nadav Amit @ 2020-04-03 16:48 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Sean Christopherson, Jessica Yu, Rasmus Villemoes,
	Thomas Gleixner, LKML, x86, Kenneth R. Crudup, Paolo Bonzini,
	Fenghua Yu, Xiaoyao Li, Thomas Hellstrom, Tony Luck,
	Steven Rostedt, Greg Kroah-Hartman, jannh, keescook

> On Apr 3, 2020, at 9:40 AM, Peter Zijlstra <peterz@infradead.org> wrote:
> 
> On Fri, Apr 03, 2020 at 09:25:55AM -0700, Sean Christopherson wrote:
>> On Fri, Apr 03, 2020 at 06:12:05PM +0200, Peter Zijlstra wrote:
>>> On Fri, Apr 03, 2020 at 09:01:56AM -0700, Sean Christopherson wrote:
>>>> On Fri, Apr 03, 2020 at 05:21:58PM +0200, Peter Zijlstra wrote:
>>>>> On Fri, Apr 03, 2020 at 04:35:00PM +0200, Jessica Yu wrote:
>>> 
>>>>>> I wonder if it would make sense then to limit the text scans to just
>>>>>> out-of-tree modules (i.e., missing the intree modinfo flag)?
>>>>> 
>>>>> It would; didn't know there was one.
>>>> 
>>>> Rather than scanning modules at all, what about hooking native_write_cr4()
>>>> to kill SLD if CR4.VMXE is toggled on and the caller didn't increment a
>>>> "sld safe" counter?
>>> 
>>> And then you're hoping that the module uses that and not:
>>> 
>>>  asm volatile ("mov %0, cr4" :: "r" (val));
>>> 
>>> I think I feel safer with the scanning to be fair. Also with the intree
>>> hint on, we can extend the scanning for out-of-tree modules for more
>>> dodgy crap we really don't want modules to do, like for example the
>>> above.
>> 
>> Ya, that's the big uknown.  But wouldn't they'd already be broken in the
>> sense that they'd corrupt the CR4 shadow?  E.g. setting VMXE without
>> updating cpu_tlbstate.cr4 would result in future in-kernel writes to CR4
>> attempting to clear CR4.VMXE post-VMXON, which would #GP.
> 
> Sadly the CR4 shadow is exported, so they can actually fix that up :/

I do not think that Sean’s idea would work for VMware.


^ permalink raw reply	[flat|nested] 78+ messages in thread

* Re: [patch 1/2] x86,module: Detect VMX modules and disable Split-Lock-Detect
  2020-04-03 16:48                 ` Nadav Amit
@ 2020-04-03 17:21                   ` Sean Christopherson
  0 siblings, 0 replies; 78+ messages in thread
From: Sean Christopherson @ 2020-04-03 17:21 UTC (permalink / raw)
  To: Nadav Amit
  Cc: Peter Zijlstra, Jessica Yu, Rasmus Villemoes, Thomas Gleixner,
	LKML, x86, Kenneth R. Crudup, Paolo Bonzini, Fenghua Yu,
	Xiaoyao Li, Thomas Hellstrom, Tony Luck, Steven Rostedt,
	Greg Kroah-Hartman, jannh, keescook

On Fri, Apr 03, 2020 at 04:48:35PM +0000, Nadav Amit wrote:
> > On Apr 3, 2020, at 9:40 AM, Peter Zijlstra <peterz@infradead.org> wrote:
> > 
> > On Fri, Apr 03, 2020 at 09:25:55AM -0700, Sean Christopherson wrote:
> >> On Fri, Apr 03, 2020 at 06:12:05PM +0200, Peter Zijlstra wrote:
> >>> On Fri, Apr 03, 2020 at 09:01:56AM -0700, Sean Christopherson wrote:
> >>>> On Fri, Apr 03, 2020 at 05:21:58PM +0200, Peter Zijlstra wrote:
> >>>>> On Fri, Apr 03, 2020 at 04:35:00PM +0200, Jessica Yu wrote:
> >>> 
> >>>>>> I wonder if it would make sense then to limit the text scans to just
> >>>>>> out-of-tree modules (i.e., missing the intree modinfo flag)?
> >>>>> 
> >>>>> It would; didn't know there was one.
> >>>> 
> >>>> Rather than scanning modules at all, what about hooking native_write_cr4()
> >>>> to kill SLD if CR4.VMXE is toggled on and the caller didn't increment a
> >>>> "sld safe" counter?
> >>> 
> >>> And then you're hoping that the module uses that and not:
> >>> 
> >>>  asm volatile ("mov %0, cr4" :: "r" (val));
> >>> 
> >>> I think I feel safer with the scanning to be fair. Also with the intree
> >>> hint on, we can extend the scanning for out-of-tree modules for more
> >>> dodgy crap we really don't want modules to do, like for example the
> >>> above.
> >> 
> >> Ya, that's the big uknown.  But wouldn't they'd already be broken in the
> >> sense that they'd corrupt the CR4 shadow?  E.g. setting VMXE without
> >> updating cpu_tlbstate.cr4 would result in future in-kernel writes to CR4
> >> attempting to clear CR4.VMXE post-VMXON, which would #GP.
> > 
> > Sadly the CR4 shadow is exported, so they can actually fix that up :/
> 
> I do not think that Sean’s idea would work for VMware.

Well phooey.

^ permalink raw reply	[flat|nested] 78+ messages in thread

* Re: [patch 1/2] x86,module: Detect VMX modules and disable Split-Lock-Detect
  2020-04-03 16:41     ` Peter Zijlstra
@ 2020-04-03 18:35       ` Jessica Yu
  0 siblings, 0 replies; 78+ messages in thread
From: Jessica Yu @ 2020-04-03 18:35 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Sean Christopherson, Thomas Gleixner, LKML, x86,
	Kenneth R. Crudup, Paolo Bonzini, Fenghua Yu, Xiaoyao Li,
	Nadav Amit, Thomas Hellstrom, Tony Luck, Steven Rostedt

+++ Peter Zijlstra [03/04/20 18:41 +0200]:
>On Fri, Apr 03, 2020 at 09:36:05AM -0700, Sean Christopherson wrote:
>> On Thu, Apr 02, 2020 at 02:32:59PM +0200, Thomas Gleixner wrote:
>> > --- a/arch/x86/kernel/module.c
>> > +++ b/arch/x86/kernel/module.c
>> > @@ -24,6 +24,7 @@
>> >  #include <asm/pgtable.h>
>> >  #include <asm/setup.h>
>> >  #include <asm/unwind.h>
>> > +#include <asm/cpu.h>
>> >
>> >  #if 0
>> >  #define DEBUGP(fmt, ...)				\
>> > @@ -253,6 +254,11 @@ int module_finalize(const Elf_Ehdr *hdr,
>> >  					    tseg, tseg + text->sh_size);
>> >  	}
>> >
>> > +	if (text && !me->sld_safe) {
>>
>> As also reported by the test bot, sld_safe only exist if CPU_SUP_INTEL=y.
>>
>> This can also be conditioned on boot_cpu_has(X86_FEATURE_VMX), or the
>> static variant.  If CPU_SUP_INTEL=y, X86_FEATURE_VMX will be set if and
>> only if VMX is fully enabled, i.e. supported by the CPU and enabled in
>> MSR_IA32_FEATURE_CONTROl.
>>
>> > +		void *tseg = (void *)text->sh_addr;
>> > +		split_lock_validate_module_text(me, tseg, tseg + text->sh_size);
>> > +	}
>
>Ideally we push it all into arch code, but load_info isn't exposed to
>arch module code :/.

Hm, I can look into exposing load_info to arch module code and will
post a patch on Monday.

Jessica

^ permalink raw reply	[flat|nested] 78+ messages in thread

* Re: [patch 1/2] x86,module: Detect VMX modules and disable Split-Lock-Detect
  2020-04-03 15:21       ` Peter Zijlstra
  2020-04-03 16:01         ` Sean Christopherson
@ 2020-04-03 18:53         ` Thomas Gleixner
  2020-04-03 20:58           ` Andy Lutomirski
  1 sibling, 1 reply; 78+ messages in thread
From: Thomas Gleixner @ 2020-04-03 18:53 UTC (permalink / raw)
  To: Peter Zijlstra, Jessica Yu
  Cc: Rasmus Villemoes, LKML, x86, Kenneth R. Crudup, Paolo Bonzini,
	Fenghua Yu, Xiaoyao Li, Nadav Amit, Thomas Hellstrom,
	Sean Christopherson, Tony Luck, Steven Rostedt,
	Greg Kroah-Hartman, jannh, keescook, vbox-dev

Peter Zijlstra <peterz@infradead.org> writes:
> On Fri, Apr 03, 2020 at 04:35:00PM +0200, Jessica Yu wrote:
>> +++ Rasmus Villemoes [03/04/20 01:42 +0200]:
>> > On 02/04/2020 14.32, Thomas Gleixner wrote:
>> > > From: Peter Zijlstra <peterz@infradead.org>
>> > > 
>> > > It turns out that with Split-Lock-Detect enabled (default) any VMX
>> > > hypervisor needs at least a little modification in order to not blindly
>> > > inject the #AC into the guest without the guest being ready for it.
>> > > 
>> > > Since there is no telling which module implements a hypervisor, scan the
>> > > module text and look for the VMLAUNCH instruction. If found, the module is
>> > > assumed to be a hypervisor of some sort and SLD is disabled.
>> > 
>> > How long does that scan take/add to module load time? Would it make
>> > sense to exempt in-tree modules?
>> > 
>> > Rasmus
>> 
>> I second Rasmus's question. It seems rather unfortunate that we have
>> to do this text scan for every module load on x86, when it doesn't
>> apply to the majority of them, and only to a handful of out-of-tree
>> hypervisor modules (assuming kvm is taken care of already).
>> 
>> I wonder if it would make sense then to limit the text scans to just
>> out-of-tree modules (i.e., missing the intree modinfo flag)?
>
> It would; didn't know there was one.

But that still would not make it complete.

I was staring at virtualbox today after Jann pointed out that this
sucker does complete backwards things.

  The kernel driver does not contain any VM* instructions at all.

The actual hypervisor code is built as a separate binary and somehow
loaded into the kernel with their own magic fixup of relocations and
function linking. This "design" probably comes from the original
virtualbox implementation which circumvented GPL that way.

TBH, I don't care if we wreckage virtualbox simply because that thing is
already a complete and utter trainwreck violating taste and common sense
in any possible way. Just for illustration:

  - It installs preempt notifiers and the first thing in the callback
    function is to issue 'stac()'!

  - There is quite some other horrible code in there which fiddles in
    the guts of the kernel just because it can.

  - Conditionals in release code which check stuff like
    VBOX_WITH_TEXT_MODMEM_HACK, VBOX_WITH_EFLAGS_AC_SET_IN_VBOXDRV,
    VBOX_WITH_NON_PROD_HACK_FOR_PERF_STACKS along with the most absurd
    hacks ever.

If you feel the need to look yourself, please use your eyecancer
protection gear.

Can someone at Oracle please make sure, that this monstrosity gets shred
in pieces?

Enough vented, but that still does not solve the SLD problem in any
sensible way.

Thanks,

        tglx

^ permalink raw reply	[flat|nested] 78+ messages in thread

* Re: [patch 1/2] x86,module: Detect VMX modules and disable Split-Lock-Detect
  2020-04-03 18:53         ` Thomas Gleixner
@ 2020-04-03 20:58           ` Andy Lutomirski
  2020-04-03 21:49             ` Thomas Gleixner
  0 siblings, 1 reply; 78+ messages in thread
From: Andy Lutomirski @ 2020-04-03 20:58 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: Peter Zijlstra, Jessica Yu, Rasmus Villemoes, LKML, x86,
	Kenneth R. Crudup, Paolo Bonzini, Fenghua Yu, Xiaoyao Li,
	Nadav Amit, Thomas Hellstrom, Sean Christopherson, Tony Luck,
	Steven Rostedt, Greg Kroah-Hartman, jannh, keescook, vbox-dev


> On Apr 3, 2020, at 11:54 AM, Thomas Gleixner <tglx@linutronix.de> wrote:
> 
> Peter Zijlstra <peterz@infradead.org> writes:
>>> On Fri, Apr 03, 2020 at 04:35:00PM +0200, Jessica Yu wrote:
>>> +++ Rasmus Villemoes [03/04/20 01:42 +0200]:
>>>> On 02/04/2020 14.32, Thomas Gleixner wrote:
>>>>> From: Peter Zijlstra <peterz@infradead.org>
>>>>> 
>>>>> It turns out that with Split-Lock-Detect enabled (default) any VMX
>>>>> hypervisor needs at least a little modification in order to not blindly
>>>>> inject the #AC into the guest without the guest being ready for it.
>>>>> 
>>>>> Since there is no telling which module implements a hypervisor, scan the
>>>>> module text and look for the VMLAUNCH instruction. If found, the module is
>>>>> assumed to be a hypervisor of some sort and SLD is disabled.
>>>> 
>>>> How long does that scan take/add to module load time? Would it make
>>>> sense to exempt in-tree modules?
>>>> 
>>>> Rasmus
>>> 
>>> I second Rasmus's question. It seems rather unfortunate that we have
>>> to do this text scan for every module load on x86, when it doesn't
>>> apply to the majority of them, and only to a handful of out-of-tree
>>> hypervisor modules (assuming kvm is taken care of already).
>>> 
>>> I wonder if it would make sense then to limit the text scans to just
>>> out-of-tree modules (i.e., missing the intree modinfo flag)?
>> 
>> It would; didn't know there was one.
> 
> But that still would not make it complete.
> 
> I was staring at virtualbox today after Jann pointed out that this
> sucker does complete backwards things.
> 
>  The kernel driver does not contain any VM* instructions at all.
> 
> The actual hypervisor code is built as a separate binary and somehow
> loaded into the kernel with their own magic fixup of relocations and
> function linking. This "design" probably comes from the original
> virtualbox implementation which circumvented GPL that way.
> 
> TBH, I don't care if we wreckage virtualbox simply because that thing is
> already a complete and utter trainwreck violating taste and common sense
> in any possible way. Just for illustration:
> 
>  - It installs preempt notifiers and the first thing in the callback
>    function is to issue 'stac()'!
> 
>  - There is quite some other horrible code in there which fiddles in
>    the guts of the kernel just because it can.
> 
>  - Conditionals in release code which check stuff like
>    VBOX_WITH_TEXT_MODMEM_HACK, VBOX_WITH_EFLAGS_AC_SET_IN_VBOXDRV,
>    VBOX_WITH_NON_PROD_HACK_FOR_PERF_STACKS along with the most absurd
>    hacks ever.
> 
> If you feel the need to look yourself, please use your eyecancer
> protection gear.
> 
> Can someone at Oracle please make sure, that this monstrosity gets shred
> in pieces?
> 
> Enough vented, but that still does not solve the SLD problem in any
> sensible way.

Could we unexport set_memory_x perhaps?  And maybe try to make virtualbox break in as many ways as possible?

> 
> Thanks,
> 
>        tglx

^ permalink raw reply	[flat|nested] 78+ messages in thread

* Re: [patch 1/2] x86,module: Detect VMX modules and disable Split-Lock-Detect
  2020-04-03 20:58           ` Andy Lutomirski
@ 2020-04-03 21:49             ` Thomas Gleixner
  0 siblings, 0 replies; 78+ messages in thread
From: Thomas Gleixner @ 2020-04-03 21:49 UTC (permalink / raw)
  To: Andy Lutomirski
  Cc: Peter Zijlstra, Jessica Yu, Rasmus Villemoes, LKML, x86,
	Kenneth R. Crudup, Paolo Bonzini, Fenghua Yu, Xiaoyao Li,
	Nadav Amit, Thomas Hellstrom, Sean Christopherson, Tony Luck,
	Steven Rostedt, Greg Kroah-Hartman, jannh, keescook, vbox-dev

Andy Lutomirski <luto@amacapital.net> writes:
>> On Apr 3, 2020, at 11:54 AM, Thomas Gleixner <tglx@linutronix.de> wrote:
>> Peter Zijlstra <peterz@infradead.org> writes:
>> Enough vented, but that still does not solve the SLD problem in any
>> sensible way.
>
> Could we unexport set_memory_x perhaps?  And maybe try to make
> virtualbox break in as many ways as possible?

set_memory_x() is not exported anymore, but they found new ways of
circumvention. set_memory_x() is only used on really old kernels.

Thanks,

        tglx

^ permalink raw reply	[flat|nested] 78+ messages in thread

* Re: [patch 1/2] x86,module: Detect VMX modules and disable Split-Lock-Detect
  2020-04-02 12:32 ` [patch 1/2] x86,module: Detect VMX modules and disable Split-Lock-Detect Thomas Gleixner
                     ` (4 preceding siblings ...)
  2020-04-03 16:36   ` [patch 1/2] x86,module: " Sean Christopherson
@ 2020-04-06 12:23   ` Christoph Hellwig
  2020-04-06 14:40     ` Peter Zijlstra
  5 siblings, 1 reply; 78+ messages in thread
From: Christoph Hellwig @ 2020-04-06 12:23 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: LKML, x86, Kenneth R. Crudup, Peter Zijlstra (Intel),
	Paolo Bonzini, Jessica Yu, Fenghua Yu, Xiaoyao Li, Nadav Amit,
	Thomas Hellstrom, Sean Christopherson, Tony Luck, Steven Rostedt

On Thu, Apr 02, 2020 at 02:32:59PM +0200, Thomas Gleixner wrote:
> From: Peter Zijlstra <peterz@infradead.org>
> 
> It turns out that with Split-Lock-Detect enabled (default) any VMX
> hypervisor needs at least a little modification in order to not blindly
> inject the #AC into the guest without the guest being ready for it.
> 
> Since there is no telling which module implements a hypervisor, scan the
> module text and look for the VMLAUNCH instruction. If found, the module is
> assumed to be a hypervisor of some sort and SLD is disabled.
> 
> Hypervisors, which have been modified and are known to work correctly,
> can add:
> 
>   MODULE_INFO(sld_safe, "Y");
> 
> to explicitly tell the module loader they're good.
> 
> NOTE: it is unfortunate that struct load_info is not available to the
>       arch module code, this means CONFIG_CPU_SUP_INTEL gunk is needed
>       in generic code.
> 
> NOTE: while we can 'trivially' fix KVM, we're still stuck with stuff
>       like VMware and VirtualBox doing their own thing.

This is just crazy.  We have never cared about any out tree module, why
would we care here where it creates a real complexity.  Just fix KVM
and ignore anything else.

^ permalink raw reply	[flat|nested] 78+ messages in thread

* Re: [patch 1/2] x86,module: Detect VMX modules and disable Split-Lock-Detect
  2020-04-06 12:23   ` Christoph Hellwig
@ 2020-04-06 14:40     ` Peter Zijlstra
  2020-04-06 15:18       ` Christoph Hellwig
  0 siblings, 1 reply; 78+ messages in thread
From: Peter Zijlstra @ 2020-04-06 14:40 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Thomas Gleixner, LKML, x86, Kenneth R. Crudup, Paolo Bonzini,
	Jessica Yu, Fenghua Yu, Xiaoyao Li, Nadav Amit, Thomas Hellstrom,
	Sean Christopherson, Tony Luck, Steven Rostedt

On Mon, Apr 06, 2020 at 05:23:43AM -0700, Christoph Hellwig wrote:
> On Thu, Apr 02, 2020 at 02:32:59PM +0200, Thomas Gleixner wrote:
> > From: Peter Zijlstra <peterz@infradead.org>
> > 
> > It turns out that with Split-Lock-Detect enabled (default) any VMX
> > hypervisor needs at least a little modification in order to not blindly
> > inject the #AC into the guest without the guest being ready for it.
> > 
> > Since there is no telling which module implements a hypervisor, scan the
> > module text and look for the VMLAUNCH instruction. If found, the module is
> > assumed to be a hypervisor of some sort and SLD is disabled.
> > 
> > Hypervisors, which have been modified and are known to work correctly,
> > can add:
> > 
> >   MODULE_INFO(sld_safe, "Y");
> > 
> > to explicitly tell the module loader they're good.
> > 
> > NOTE: it is unfortunate that struct load_info is not available to the
> >       arch module code, this means CONFIG_CPU_SUP_INTEL gunk is needed
> >       in generic code.
> > 
> > NOTE: while we can 'trivially' fix KVM, we're still stuck with stuff
> >       like VMware and VirtualBox doing their own thing.
> 
> This is just crazy.  We have never cared about any out tree module, why
> would we care here where it creates a real complexity.  Just fix KVM
> and ignore anything else.

It is absolutely bonkers, but at the same time we can extend this
infrastructure to scan for dubious code patterns we don't want to
support. Like for instance direct manipulation of CR4.

Look at is as another layer to enforce sanity on binary only modules.

Do we want to go that way, and do you know of other code patterns you'd
want to fail module loading for?

^ permalink raw reply	[flat|nested] 78+ messages in thread

* Re: [patch 1/2] x86,module: Detect VMX modules and disable Split-Lock-Detect
  2020-04-06 14:40     ` Peter Zijlstra
@ 2020-04-06 15:18       ` Christoph Hellwig
  2020-04-06 15:22         ` Peter Zijlstra
  0 siblings, 1 reply; 78+ messages in thread
From: Christoph Hellwig @ 2020-04-06 15:18 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Christoph Hellwig, Thomas Gleixner, LKML, x86, Kenneth R. Crudup,
	Paolo Bonzini, Jessica Yu, Fenghua Yu, Xiaoyao Li, Nadav Amit,
	Thomas Hellstrom, Sean Christopherson, Tony Luck, Steven Rostedt

On Mon, Apr 06, 2020 at 04:40:20PM +0200, Peter Zijlstra wrote:
> It is absolutely bonkers, but at the same time we can extend this
> infrastructure to scan for dubious code patterns we don't want to
> support. Like for instance direct manipulation of CR4.

But that is not what this code does - it disables split lock detection.
If it failed to load the module the whole thing would make a little
more sense.

^ permalink raw reply	[flat|nested] 78+ messages in thread

* Re: [patch 1/2] x86,module: Detect VMX modules and disable Split-Lock-Detect
  2020-04-06 15:18       ` Christoph Hellwig
@ 2020-04-06 15:22         ` Peter Zijlstra
  2020-04-06 18:27           ` Steven Rostedt
  0 siblings, 1 reply; 78+ messages in thread
From: Peter Zijlstra @ 2020-04-06 15:22 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Thomas Gleixner, LKML, x86, Kenneth R. Crudup, Paolo Bonzini,
	Jessica Yu, Fenghua Yu, Xiaoyao Li, Nadav Amit, Thomas Hellstrom,
	Sean Christopherson, Tony Luck, Steven Rostedt

On Mon, Apr 06, 2020 at 08:18:47AM -0700, Christoph Hellwig wrote:
> On Mon, Apr 06, 2020 at 04:40:20PM +0200, Peter Zijlstra wrote:
> > It is absolutely bonkers, but at the same time we can extend this
> > infrastructure to scan for dubious code patterns we don't want to
> > support. Like for instance direct manipulation of CR4.
> 
> But that is not what this code does - it disables split lock detection.
> If it failed to load the module the whole thing would make a little
> more sense.

If this lives, it'll be to just to fail module loading. IIRC the same
was suggested elsewhere in the thread.

^ permalink raw reply	[flat|nested] 78+ messages in thread

* Re: [patch 1/2] x86,module: Detect VMX modules and disable Split-Lock-Detect
  2020-04-06 15:22         ` Peter Zijlstra
@ 2020-04-06 18:27           ` Steven Rostedt
  0 siblings, 0 replies; 78+ messages in thread
From: Steven Rostedt @ 2020-04-06 18:27 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Christoph Hellwig, Thomas Gleixner, LKML, x86, Kenneth R. Crudup,
	Paolo Bonzini, Jessica Yu, Fenghua Yu, Xiaoyao Li, Nadav Amit,
	Thomas Hellstrom, Sean Christopherson, Tony Luck

On Mon, 6 Apr 2020 17:22:31 +0200
Peter Zijlstra <peterz@infradead.org> wrote:

> On Mon, Apr 06, 2020 at 08:18:47AM -0700, Christoph Hellwig wrote:
> > On Mon, Apr 06, 2020 at 04:40:20PM +0200, Peter Zijlstra wrote:  
> > > It is absolutely bonkers, but at the same time we can extend this
> > > infrastructure to scan for dubious code patterns we don't want to
> > > support. Like for instance direct manipulation of CR4.  
> > 
> > But that is not what this code does - it disables split lock detection.
> > If it failed to load the module the whole thing would make a little
> > more sense.  
> 
> If this lives, it'll be to just to fail module loading. IIRC the same
> was suggested elsewhere in the thread.

I believe I may have been the one to suggest it. It's no different than
breaking kabi if you ask me. If a module can't deal with a new feature,
than it should not be able to load. And let whoever owns that module fix it
for their users.

-- Steve

^ permalink raw reply	[flat|nested] 78+ messages in thread

* Re: [PATCH 2/3] x86/split_lock: Refactor and export handle_user_split_lock() for KVM
  2020-04-02 19:06           ` Thomas Gleixner
@ 2020-04-10  4:39             ` Xiaoyao Li
  2020-04-10 10:21               ` Paolo Bonzini
  0 siblings, 1 reply; 78+ messages in thread
From: Xiaoyao Li @ 2020-04-10  4:39 UTC (permalink / raw)
  To: Thomas Gleixner, Sean Christopherson, Paolo Bonzini
  Cc: x86, Kenneth R . Crudup, Fenghua Yu, Nadav Amit,
	Thomas Hellstrom, Tony Luck, Peter Zijlstra, Jessica Yu,
	Steven Rostedt, Vitaly Kuznetsov, Wanpeng Li, Jim Mattson, kvm,
	linux-kernel

On 4/3/2020 3:06 AM, Thomas Gleixner wrote:
> Sean Christopherson <sean.j.christopherson@intel.com> writes:
>> On Thu, Apr 02, 2020 at 07:01:56PM +0200, Thomas Gleixner wrote:
>>>>   static inline void __init cpu_set_core_cap_bits(struct cpuinfo_x86 *c) {}
>>>>   static inline void switch_to_sld(unsigned long tifn) {}
>>>> -static inline bool handle_user_split_lock(struct pt_regs *regs, long error_code)
>>>> +static inline bool handle_user_split_lock(unsigned long ip)
>>>
>>> This is necessary because VMX can be compiled without CPU_SUP_INTEL?
>>
>> Ya, it came about when cleaning up the IA32_FEATURE_CONTROL MSR handling
>> to consolidate duplicate code.
>>
>> config KVM_INTEL
>>          tristate "KVM for Intel (and compatible) processors support"
>>          depends on KVM && IA32_FEAT_CTL
>>
>> config IA32_FEAT_CTL
>>          def_bool y
>>          depends on CPU_SUP_INTEL || CPU_SUP_CENTAUR || CPU_SUP_ZHAOXIN
> 
> Ah, indeed. So something like the below would make sense. Hmm?
> 
> Of course that can be mangled into Xiaoyao's patches, I'm not worried
> about my patch count :)
> 

I don't mind using yours in my next version.

Hi Paolo,

Are you OK with the kvm part below?

If no objection, I can spin the next version using tglx's.

> 
> 8<----------------
> --- a/arch/x86/include/asm/cpu.h
> +++ b/arch/x86/include/asm/cpu.h
> @@ -43,14 +43,14 @@ unsigned int x86_stepping(unsigned int s
>   #ifdef CONFIG_CPU_SUP_INTEL
>   extern void __init cpu_set_core_cap_bits(struct cpuinfo_x86 *c);
>   extern void switch_to_sld(unsigned long tifn);
> -extern bool handle_user_split_lock(struct pt_regs *regs, long error_code);
> +extern int handle_ac_split_lock(unsigned long ip);
>   extern void split_lock_validate_module_text(struct module *me, void *text, void *text_end);
>   #else
>   static inline void __init cpu_set_core_cap_bits(struct cpuinfo_x86 *c) {}
>   static inline void switch_to_sld(unsigned long tifn) {}
> -static inline bool handle_user_split_lock(struct pt_regs *regs, long error_code)
> +static int handle_ac_split_lock(unsigned long ip)
>   {
> -	return false;
> +	return -ENOSYS;
>   }
>   static inline void split_lock_validate_module_text(struct module *me, void *text, void *text_end) {}
>   #endif
> 
> --- a/arch/x86/kernel/cpu/intel.c
> +++ b/arch/x86/kernel/cpu/intel.c
> @@ -1102,13 +1102,20 @@ static void split_lock_init(void)
>   	split_lock_verify_msr(sld_state != sld_off);
>   }
>   
> -bool handle_user_split_lock(struct pt_regs *regs, long error_code)
> +int handle_ac_split_lock(unsigned long ip)
>   {
> -	if ((regs->flags & X86_EFLAGS_AC) || sld_state == sld_fatal)
> -		return false;
> +	switch (sld_state) {
> +	case sld_warn:
> +		break;
> +	case sld_off:
> +		pr_warn_once("#AC: Spurious trap at address: 0x%lx\n", ip);
> +		return -ENOSYS;
> +	case sld_fatal:
> +		return -EFAULT;
> +	}
>   
>   	pr_warn_ratelimited("#AC: %s/%d took a split_lock trap at address: 0x%lx\n",
> -			    current->comm, current->pid, regs->ip);
> +			    current->comm, current->pid, ip);
>   
>   	/*
>   	 * Disable the split lock detection for this task so it can make
> @@ -1117,8 +1124,9 @@ bool handle_user_split_lock(struct pt_re
>   	 */
>   	sld_update_msr(false);
>   	set_tsk_thread_flag(current, TIF_SLD);
> -	return true;
> +	return 0;
>   }
> +EXPORT_SYMBOL_GPL(handle_ac_split_lock);
>   
>   /*
>    * This function is called only when switching between tasks with
> --- a/arch/x86/kernel/traps.c
> +++ b/arch/x86/kernel/traps.c
> @@ -304,7 +304,7 @@ dotraplinkage void do_alignment_check(st
>   
>   	local_irq_enable();
>   
> -	if (handle_user_split_lock(regs, error_code))
> +	if (!(regs->flags & X86_EFLAGS_AC) && !handle_ac_split_lock(regs->ip))
>   		return;
>   
>   	do_trap(X86_TRAP_AC, SIGBUS, "alignment check", regs,
> --- a/arch/x86/kvm/vmx/vmx.c
> +++ b/arch/x86/kvm/vmx/vmx.c
> @@ -65,6 +65,7 @@
>   
>   MODULE_AUTHOR("Qumranet");
>   MODULE_LICENSE("GPL");
> +MODULE_INFO(sld_safe, "Y");
>   
>   #ifdef MODULE
>   static const struct x86_cpu_id vmx_cpu_id[] = {
> @@ -4623,6 +4624,22 @@ static int handle_machine_check(struct k
>   	return 1;
>   }
>   
> +static bool guest_handles_ac(struct kvm_vcpu *vcpu)
> +{
> +	/*
> +	 * If guest has alignment checking enabled in CR0 and activated in
> +	 * eflags, then the #AC originated from CPL3 and the guest is able
> +	 * to handle it. It does not matter whether this is a regular or
> +	 * a split lock operation induced #AC.
> +	 */
> +	if (vmx_get_cpl(vcpu) == 3 && kvm_read_cr0_bits(vcpu, X86_CR0_AM) &&
> +	    kvm_get_rflags(vcpu) & X86_EFLAGS_AC)
> +		return true;
> +
> +	/* Add guest SLD handling checks here once it's supported */
> +	return false;
> +}
> +
>   static int handle_exception_nmi(struct kvm_vcpu *vcpu)
>   {
>   	struct vcpu_vmx *vmx = to_vmx(vcpu);
> @@ -4630,6 +4647,7 @@ static int handle_exception_nmi(struct k
>   	u32 intr_info, ex_no, error_code;
>   	unsigned long cr2, rip, dr6;
>   	u32 vect_info;
> +	int err;
>   
>   	vect_info = vmx->idt_vectoring_info;
>   	intr_info = vmx->exit_intr_info;
> @@ -4688,9 +4706,6 @@ static int handle_exception_nmi(struct k
>   		return handle_rmode_exception(vcpu, ex_no, error_code);
>   
>   	switch (ex_no) {
> -	case AC_VECTOR:
> -		kvm_queue_exception_e(vcpu, AC_VECTOR, error_code);
> -		return 1;
>   	case DB_VECTOR:
>   		dr6 = vmcs_readl(EXIT_QUALIFICATION);
>   		if (!(vcpu->guest_debug &
> @@ -4719,6 +4734,29 @@ static int handle_exception_nmi(struct k
>   		kvm_run->debug.arch.pc = vmcs_readl(GUEST_CS_BASE) + rip;
>   		kvm_run->debug.arch.exception = ex_no;
>   		break;
> +	case AC_VECTOR:
> +		if (guest_handles_ac(vcpu)) {
> +			kvm_queue_exception_e(vcpu, AC_VECTOR, error_code);
> +			return 1;
> +		}
> +		/*
> +		 * Handle #AC caused by split lock detection. If the host
> +		 * mode is sld_warn, then it warns, marks current with
> +		 * TIF_SLD and disables split lock detection. So the guest
> +		 * can just continue.
> +		 *
> +		 * If the host mode is fatal, the handling code warned. Let
> +		 * qemu kill itself.
> +		 *
> +		 * If the host mode is off, then this #AC is bonkers and
> +		 * something is badly wrong. Let it fail as well.
> +		 */
> +		err = handle_ac_split_lock(kvm_rip_read(vcpu));
> +		if (!err)
> +			return 1;
> +		/* Propagate the error type to user space */
> +		error_code = err == -EFAULT ? 0x100 : 0x200;
> +		fallthrough;
>   	default:
>   		kvm_run->exit_reason = KVM_EXIT_EXCEPTION;
>   		kvm_run->ex.exception = ex_no;
> 


^ permalink raw reply	[flat|nested] 78+ messages in thread

* Re: [PATCH 2/3] x86/split_lock: Refactor and export handle_user_split_lock() for KVM
  2020-04-10  4:39             ` Xiaoyao Li
@ 2020-04-10 10:21               ` Paolo Bonzini
  0 siblings, 0 replies; 78+ messages in thread
From: Paolo Bonzini @ 2020-04-10 10:21 UTC (permalink / raw)
  To: Xiaoyao Li, Thomas Gleixner, Sean Christopherson
  Cc: x86, Kenneth R . Crudup, Fenghua Yu, Nadav Amit,
	Thomas Hellstrom, Tony Luck, Peter Zijlstra, Jessica Yu,
	Steven Rostedt, Vitaly Kuznetsov, Wanpeng Li, Jim Mattson, kvm,
	linux-kernel

On 10/04/20 06:39, Xiaoyao Li wrote:
> 
>   +static bool guest_handles_ac(struct kvm_vcpu *vcpu)
> +{
> +    /*
> +     * If guest has alignment checking enabled in CR0 and activated in
> +     * eflags, then the #AC originated from CPL3 and the guest is able
> +     * to handle it. It does not matter whether this is a regular or
> +     * a split lock operation induced #AC.
> +     */
> +    if (vmx_get_cpl(vcpu) == 3 && kvm_read_cr0_bits(vcpu, X86_CR0_AM) &&
> +        kvm_get_rflags(vcpu) & X86_EFLAGS_AC)
> +        return true;
> +
> +    /* Add guest SLD handling checks here once it's supported */
> +    return false;
> +}
> +
>   static int handle_exception_nmi(struct kvm_vcpu *vcpu)
>   {
>       struct vcpu_vmx *vmx = to_vmx(vcpu);
> @@ -4630,6 +4647,7 @@ static int handle_exception_nmi(struct k
>       u32 intr_info, ex_no, error_code;
>       unsigned long cr2, rip, dr6;
>       u32 vect_info;
> +    int err;
>         vect_info = vmx->idt_vectoring_info;
>       intr_info = vmx->exit_intr_info;
> @@ -4688,9 +4706,6 @@ static int handle_exception_nmi(struct k
>           return handle_rmode_exception(vcpu, ex_no, error_code);
>         switch (ex_no) {
> -    case AC_VECTOR:
> -        kvm_queue_exception_e(vcpu, AC_VECTOR, error_code);
> -        return 1;
>       case DB_VECTOR:
>           dr6 = vmcs_readl(EXIT_QUALIFICATION);
>           if (!(vcpu->guest_debug &
> @@ -4719,6 +4734,29 @@ static int handle_exception_nmi(struct k
>           kvm_run->debug.arch.pc = vmcs_readl(GUEST_CS_BASE) + rip;
>           kvm_run->debug.arch.exception = ex_no;
>           break;
> +    case AC_VECTOR:
> +        if (guest_handles_ac(vcpu)) {
> +            kvm_queue_exception_e(vcpu, AC_VECTOR, error_code);
> +            return 1;
> +        }
> +        /*
> +         * Handle #AC caused by split lock detection. If the host
> +         * mode is sld_warn, then it warns, marks current with
> +         * TIF_SLD and disables split lock detection. So the guest
> +         * can just continue.
> +         *
> +         * If the host mode is fatal, the handling code warned. Let
> +         * qemu kill itself.
> +         *
> +         * If the host mode is off, then this #AC is bonkers and
> +         * something is badly wrong. Let it fail as well.
> +         */
> +        err = handle_ac_split_lock(kvm_rip_read(vcpu));
> +        if (!err)
> +            return 1;
> +        /* Propagate the error type to user space */
> +        error_code = err == -EFAULT ? 0x100 : 0x200;
> +        fallthrough;


Acked-by: Paolo Bonzini <pbonzini@redhat.com>


^ permalink raw reply	[flat|nested] 78+ messages in thread

* Re: [PATCH 0/3] x86: KVM: VMX: Add basic split-lock #AC handling
  2020-04-02 15:55   ` [PATCH 0/3] x86: KVM: VMX: Add basic split-lock #AC handling Sean Christopherson
                       ` (2 preceding siblings ...)
  2020-04-02 15:55     ` [PATCH 3/3] KVM: VMX: Extend VMX's #AC interceptor to handle split lock #AC in guest Sean Christopherson
@ 2020-04-10 10:23     ` Paolo Bonzini
  2020-04-10 11:14       ` Thomas Gleixner
  3 siblings, 1 reply; 78+ messages in thread
From: Paolo Bonzini @ 2020-04-10 10:23 UTC (permalink / raw)
  To: Sean Christopherson, Thomas Gleixner
  Cc: x86, Kenneth R . Crudup, Fenghua Yu, Xiaoyao Li, Nadav Amit,
	Thomas Hellstrom, Tony Luck, Peter Zijlstra, Jessica Yu,
	Steven Rostedt, Vitaly Kuznetsov, Wanpeng Li, Jim Mattson, kvm,
	linux-kernel

On 02/04/20 17:55, Sean Christopherson wrote:
> First three patches from Xiaoyao's series to add split-lock #AC support
> in KVM.
> 
> Xiaoyao Li (3):
>   KVM: x86: Emulate split-lock access as a write in emulator
>   x86/split_lock: Refactor and export handle_user_split_lock() for KVM
>   KVM: VMX: Extend VMX's #AC interceptor to handle split lock #AC in
>     guest

Sorry I was out of the loop on this (I'm working part time and it's a
mess).  Sean, can you send the patches as a top-level message?  I'll
queue them and get them to Linus over the weekend.

Paolo


^ permalink raw reply	[flat|nested] 78+ messages in thread

* Re: [PATCH 0/3] x86: KVM: VMX: Add basic split-lock #AC handling
  2020-04-10 10:23     ` [PATCH 0/3] x86: KVM: VMX: Add basic split-lock #AC handling Paolo Bonzini
@ 2020-04-10 11:14       ` Thomas Gleixner
  0 siblings, 0 replies; 78+ messages in thread
From: Thomas Gleixner @ 2020-04-10 11:14 UTC (permalink / raw)
  To: Paolo Bonzini, Sean Christopherson
  Cc: x86, Kenneth R . Crudup, Fenghua Yu, Xiaoyao Li, Nadav Amit,
	Thomas Hellstrom, Tony Luck, Peter Zijlstra, Jessica Yu,
	Steven Rostedt, Vitaly Kuznetsov, Wanpeng Li, Jim Mattson, kvm,
	linux-kernel

Paolo Bonzini <pbonzini@redhat.com> writes:
> On 02/04/20 17:55, Sean Christopherson wrote:
>> First three patches from Xiaoyao's series to add split-lock #AC support
>> in KVM.
>> 
>> Xiaoyao Li (3):
>>   KVM: x86: Emulate split-lock access as a write in emulator
>>   x86/split_lock: Refactor and export handle_user_split_lock() for KVM
>>   KVM: VMX: Extend VMX's #AC interceptor to handle split lock #AC in
>>     guest
>
> Sorry I was out of the loop on this (I'm working part time and it's a
> mess).  Sean, can you send the patches as a top-level message?  I'll
> queue them and get them to Linus over the weekend.

I have a reworked version. I'll post it after lunch.

Thanks,

        tglx

^ permalink raw reply	[flat|nested] 78+ messages in thread

end of thread, other threads:[~2020-04-10 11:15 UTC | newest]

Thread overview: 78+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-04-02 12:32 [patch 0/2] x86: Prevent Split-Lock-Detection wreckage on VMX hypervisors Thomas Gleixner
2020-04-02 12:32 ` [patch 1/2] x86,module: Detect VMX modules and disable Split-Lock-Detect Thomas Gleixner
2020-04-02 15:23   ` [patch v2 " Peter Zijlstra
2020-04-02 16:20     ` Xiaoyao Li
2020-04-02 16:25       ` Peter Zijlstra
2020-04-02 16:39         ` Nadav Amit
2020-04-02 16:41         ` Xiaoyao Li
2020-04-02 17:34           ` Thomas Gleixner
2020-04-02 17:51             ` Sean Christopherson
2020-04-02 18:51               ` Peter Zijlstra
2020-04-02 20:23                 ` Sean Christopherson
2020-04-02 21:04                   ` Thomas Gleixner
2020-04-02 21:16                     ` Sean Christopherson
2020-04-03  8:09     ` David Laight
2020-04-03 14:33       ` Peter Zijlstra
2020-04-02 23:42   ` [patch " Rasmus Villemoes
2020-04-03 14:35     ` Jessica Yu
2020-04-03 15:21       ` Peter Zijlstra
2020-04-03 16:01         ` Sean Christopherson
2020-04-03 16:12           ` Peter Zijlstra
2020-04-03 16:16             ` David Laight
2020-04-03 16:39               ` Peter Zijlstra
2020-04-03 16:25             ` Sean Christopherson
2020-04-03 16:40               ` Peter Zijlstra
2020-04-03 16:48                 ` Nadav Amit
2020-04-03 17:21                   ` Sean Christopherson
2020-04-03 18:53         ` Thomas Gleixner
2020-04-03 20:58           ` Andy Lutomirski
2020-04-03 21:49             ` Thomas Gleixner
2020-04-03 11:29   ` kbuild test robot
2020-04-03 11:29     ` [patch 1/2] x86, module: " kbuild test robot
2020-04-03 14:43   ` [patch 1/2] x86,module: " kbuild test robot
2020-04-03 14:43     ` [patch 1/2] x86, module: " kbuild test robot
2020-04-03 16:36   ` [patch 1/2] x86,module: " Sean Christopherson
2020-04-03 16:41     ` Peter Zijlstra
2020-04-03 18:35       ` Jessica Yu
2020-04-06 12:23   ` Christoph Hellwig
2020-04-06 14:40     ` Peter Zijlstra
2020-04-06 15:18       ` Christoph Hellwig
2020-04-06 15:22         ` Peter Zijlstra
2020-04-06 18:27           ` Steven Rostedt
2020-04-02 12:33 ` [patch 2/2] x86/kvm/vmx: Prevent split lock detection induced #AC wreckage Thomas Gleixner
2020-04-02 15:30   ` Sean Christopherson
2020-04-02 15:44     ` Nadav Amit
2020-04-02 16:04       ` Sean Christopherson
2020-04-02 16:56     ` Thomas Gleixner
2020-04-02 15:55   ` [PATCH 0/3] x86: KVM: VMX: Add basic split-lock #AC handling Sean Christopherson
2020-04-02 15:55     ` [PATCH 1/3] KVM: x86: Emulate split-lock access as a write in emulator Sean Christopherson
2020-04-02 15:55     ` [PATCH 2/3] x86/split_lock: Refactor and export handle_user_split_lock() for KVM Sean Christopherson
2020-04-02 17:01       ` Thomas Gleixner
2020-04-02 17:19         ` Sean Christopherson
2020-04-02 19:06           ` Thomas Gleixner
2020-04-10  4:39             ` Xiaoyao Li
2020-04-10 10:21               ` Paolo Bonzini
2020-04-02 15:55     ` [PATCH 3/3] KVM: VMX: Extend VMX's #AC interceptor to handle split lock #AC in guest Sean Christopherson
2020-04-02 17:19       ` Thomas Gleixner
2020-04-02 17:40         ` Sean Christopherson
2020-04-02 20:07           ` Thomas Gleixner
2020-04-02 20:36             ` Andy Lutomirski
2020-04-02 20:48             ` Peter Zijlstra
2020-04-02 20:51             ` Sean Christopherson
2020-04-02 22:27               ` Thomas Gleixner
2020-04-02 22:40                 ` Nadav Amit
2020-04-02 23:03                   ` Thomas Gleixner
2020-04-02 23:08                   ` Steven Rostedt
2020-04-02 23:16                     ` Kenneth R. Crudup
2020-04-02 23:18                       ` Jim Mattson
2020-04-03 12:16                         ` Thomas Gleixner
2020-04-10 10:23     ` [PATCH 0/3] x86: KVM: VMX: Add basic split-lock #AC handling Paolo Bonzini
2020-04-10 11:14       ` Thomas Gleixner
2020-04-02 13:43 ` [patch 0/2] x86: Prevent Split-Lock-Detection wreckage on VMX hypervisors Kenneth R. Crudup
2020-04-02 14:32   ` Peter Zijlstra
2020-04-02 14:41     ` Kenneth R. Crudup
2020-04-02 14:46       ` Peter Zijlstra
2020-04-02 14:53         ` Kenneth R. Crudup
2020-04-02 14:37   ` Thomas Gleixner
2020-04-02 14:47     ` Nadav Amit
2020-04-02 15:11       ` Peter Zijlstra

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.