From: Yang Zhong <yang.zhong@intel.com>
To: x86@kernel.org, kvm@vger.kernel.org,
	linux-kernel@vger.kernel.org, tglx@linutronix.de,
	mingo@redhat.com, bp@alien8.de, dave.hansen@linux.intel.com,
	pbonzini@redhat.com
Cc: seanjc@google.com, jun.nakajima@intel.com, kevin.tian@intel.com,
	jing2.liu@linux.intel.com, jing2.liu@intel.com,
	yang.zhong@intel.com
Subject: [PATCH 13/19] kvm: x86: Disable WRMSR interception for IA32_XFD on demand
Date: Tue,  7 Dec 2021 19:03:53 -0500	[thread overview]
Message-ID: <20211208000359.2853257-14-yang.zhong@intel.com> (raw)
In-Reply-To: <20211208000359.2853257-1-yang.zhong@intel.com>

From: Jing Liu <jing2.liu@intel.com>

Always intercepting IA32_XFD causes non-negligible overhead when
this register is updated frequently in the guest.

Disable WRMSR interception for IA32_XFD after fpstate reallocation
is completed. There are three options for when to disable the
interception:

  1) When emulating the 1st WRMSR which requires reallocation,
     disable interception before exiting to userspace, with the
     assumption that the userspace VMM should not bounce back to
     the kernel if reallocation fails. However, it's not good to
     design the kernel based on application behavior. If, due to a
     bug, the vCPU thread comes back to the kernel after reallocation
     fails, XFD passthrough may lead to host memory corruption
     when doing XSAVES for a guest fpstate which is smaller than
     what the guest XFD value allows.

  2) Disable interception when coming back from the userspace VMM
     (for the 1st WRMSR which triggers reallocation). Re-check
     whether the fpstate size can serve the new guest XFD value and
     disable interception only when the check succeeds. This requires
     KVM to store the guest XFD value somewhere and then compare it
     to guest_fpu::user_xfeatures in the completion handler.

  3) Disable interception at the 2nd WRMSR which enables dynamic
     XSTATE features. If guest_fpu::user_xfeatures already includes
     the bits for the dynamic features set in the guest XFD value,
     disable interception.

Option 3) is implemented in this patch, with a flow like below:

    (G) WRMSR(IA32_XFD) which enables AMX for the FIRST time
    --trap to host--
    (HK) Emulate WRMSR and find fpstate size too small
    (HK) Reallocate fpstate
    --exit to userspace--
    (HU) do nothing
    --back to kernel via kvm_run--
    (HK) complete WRMSR emulation
    --enter guest--
    (G) do something
    (G) WRMSR(IA32_XFD) which disables AMX
    --trap to host--
    (HK) Emulate WRMSR and disable AMX in IA32_XFD
    --enter guest--
    (G) do something
    (G) WRMSR(IA32_XFD) which enables AMX for the SECOND time
    --trap to host--
    (HK) Emulate WRMSR and find fpstate size sufficient for AMX
    (HK) Disable WRMSR interception for IA32_XFD
    --enter guest--
    (G) WRMSR(IA32_XFD)
    (G) WRMSR(IA32_XFD)
    (G) WRMSR(IA32_XFD)
    ...
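
The key check behind this flow can be summarized as below. This is
only an illustrative sketch, not the hunk added to x86.c further
down: the helper name kvm_xfd_needs_realloc() is hypothetical, and
the use of vcpu->arch.xcr0, XFEATURE_MASK_USER_DYNAMIC and
realloc_request is assumed from earlier patches in this series.

/*
 * Illustrative sketch only: decide between "exit to userspace for
 * fpstate reallocation" and "make IA32_XFD pass-through".  The real
 * logic lives in kvm_check_guest_realloc_fpstate() (patch 09/19)
 * plus the small hunk added by this patch.
 */
static bool kvm_xfd_needs_realloc(struct kvm_vcpu *vcpu, u64 new_xfd)
{
	/* Dynamic features the guest enables (XFD bit clear == enabled) */
	u64 enabling = vcpu->arch.xcr0 & XFEATURE_MASK_USER_DYNAMIC & ~new_xfd;

	if (enabling & ~vcpu->arch.guest_fpu.user_xfeatures) {
		/* 1st enabling WRMSR: fpstate too small, reallocate first */
		vcpu->arch.guest_fpu.realloc_request = enabling;
		return true;
	}

	/*
	 * 2nd (or later) enabling WRMSR: guest_fpu::user_xfeatures already
	 * covers all enabled dynamic features, so interception of IA32_XFD
	 * is no longer needed.
	 */
	if (kvm_x86_ops.set_xfd_passthrough)
		static_call(kvm_x86_set_xfd_passthrough)(vcpu);

	return false;
}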

After disabling WRMSR interception, the guest updates IA32_XFD
directly, so the register becomes out-of-sync with the host-side
software state (guest_fpstate::xfd and the per-cpu xfd cache). This
requires KVM to call xfd_sync_state() to bring the software state
back in sync with the IA32_XFD register after VM-exit (before
preemption happens or before exiting to userspace).
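
Conceptually, xfd_sync_state() (added by patch 12/19) just re-reads
the MSR and refreshes the software copies. The sketch below is an
assumption of what that looks like, using the per-cpu xfd cache from
fpu/xstate.c; it is not the exact code from that patch.

/*
 * Assumed sketch of xfd_sync_state(): with IA32_XFD pass-through the
 * guest may have changed the MSR behind the kernel's back, so re-read
 * it and refresh the software copies before anything consults them
 * (context switch, preemption, return to userspace).
 */
void xfd_sync_state(void)
{
	if (fpu_state_size_dynamic()) {
		u64 xfd;

		rdmsrl(MSR_IA32_XFD, xfd);
		/* current fpstate is the guest fpstate after VM-exit */
		current->thread.fpu.fpstate->xfd = xfd;
		__this_cpu_write(xfd_state, xfd);
	}
}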

p.s. We have confirmed that the SDM is being revised to state that
when IA32_XFD[18] is set, the AMX register state is not guaranteed
to be preserved. This clarification avoids extra complexity for a
creative guest which sets IA32_XFD[18]=1 before saving its active
AMX state to its own storage.

Signed-off-by: Jing Liu <jing2.liu@intel.com>
Signed-off-by: Kevin Tian <kevin.tian@intel.com>
Signed-off-by: Yang Zhong <yang.zhong@intel.com>
---
 arch/x86/include/asm/kvm-x86-ops.h |  1 +
 arch/x86/include/asm/kvm_host.h    |  2 ++
 arch/x86/kvm/vmx/vmx.c             | 10 ++++++++++
 arch/x86/kvm/vmx/vmx.h             |  2 +-
 arch/x86/kvm/x86.c                 |  7 +++++++
 5 files changed, 21 insertions(+), 1 deletion(-)

diff --git a/arch/x86/include/asm/kvm-x86-ops.h b/arch/x86/include/asm/kvm-x86-ops.h
index cefe1d81e2e8..60c27f9990e9 100644
--- a/arch/x86/include/asm/kvm-x86-ops.h
+++ b/arch/x86/include/asm/kvm-x86-ops.h
@@ -30,6 +30,7 @@ KVM_X86_OP(update_exception_bitmap)
 KVM_X86_OP(get_msr)
 KVM_X86_OP(set_msr)
 KVM_X86_OP(get_segment_base)
+KVM_X86_OP_NULL(set_xfd_passthrough)
 KVM_X86_OP(get_segment)
 KVM_X86_OP(get_cpl)
 KVM_X86_OP(set_segment)
diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index 6ac61f85e07b..7c97cc1fea89 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -640,6 +640,7 @@ struct kvm_vcpu_arch {
 	u64 smi_count;
 	bool tpr_access_reporting;
 	bool xsaves_enabled;
+	bool xfd_out_of_sync;
 	u64 ia32_xss;
 	u64 microcode_version;
 	u64 arch_capabilities;
@@ -1328,6 +1329,7 @@ struct kvm_x86_ops {
 	void (*update_exception_bitmap)(struct kvm_vcpu *vcpu);
 	int (*get_msr)(struct kvm_vcpu *vcpu, struct msr_data *msr);
 	int (*set_msr)(struct kvm_vcpu *vcpu, struct msr_data *msr);
+	void (*set_xfd_passthrough)(struct kvm_vcpu *vcpu);
 	u64 (*get_segment_base)(struct kvm_vcpu *vcpu, int seg);
 	void (*get_segment)(struct kvm_vcpu *vcpu,
 			    struct kvm_segment *var, int seg);
diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
index 971d60980d5b..6198b13c4846 100644
--- a/arch/x86/kvm/vmx/vmx.c
+++ b/arch/x86/kvm/vmx/vmx.c
@@ -160,6 +160,7 @@ static u32 vmx_possible_passthrough_msrs[MAX_POSSIBLE_PASSTHROUGH_MSRS] = {
 	MSR_FS_BASE,
 	MSR_GS_BASE,
 	MSR_KERNEL_GS_BASE,
+	MSR_IA32_XFD,
 #endif
 	MSR_IA32_SYSENTER_CS,
 	MSR_IA32_SYSENTER_ESP,
@@ -1924,6 +1925,14 @@ static u64 vcpu_supported_debugctl(struct kvm_vcpu *vcpu)
 	return debugctl;
 }
 
+#ifdef CONFIG_X86_64
+static void vmx_set_xfd_passthrough(struct kvm_vcpu *vcpu)
+{
+	vmx_disable_intercept_for_msr(vcpu, MSR_IA32_XFD, MSR_TYPE_RW);
+	vcpu->arch.xfd_out_of_sync = true;
+}
+#endif
+
 /*
  * Writes msr value into the appropriate "register".
  * Returns 0 on success, non-0 otherwise.
@@ -7657,6 +7666,7 @@ static struct kvm_x86_ops vmx_x86_ops __initdata = {
 #ifdef CONFIG_X86_64
 	.set_hv_timer = vmx_set_hv_timer,
 	.cancel_hv_timer = vmx_cancel_hv_timer,
+	.set_xfd_passthrough = vmx_set_xfd_passthrough,
 #endif
 
 	.setup_mce = vmx_setup_mce,
diff --git a/arch/x86/kvm/vmx/vmx.h b/arch/x86/kvm/vmx/vmx.h
index 4df2ac24ffc1..bf9d3051cd6c 100644
--- a/arch/x86/kvm/vmx/vmx.h
+++ b/arch/x86/kvm/vmx/vmx.h
@@ -340,7 +340,7 @@ struct vcpu_vmx {
 	struct lbr_desc lbr_desc;
 
 	/* Save desired MSR intercept (read: pass-through) state */
-#define MAX_POSSIBLE_PASSTHROUGH_MSRS	13
+#define MAX_POSSIBLE_PASSTHROUGH_MSRS	14
 	struct {
 		DECLARE_BITMAP(read, MAX_POSSIBLE_PASSTHROUGH_MSRS);
 		DECLARE_BITMAP(write, MAX_POSSIBLE_PASSTHROUGH_MSRS);
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index b195f4fa888f..d127b229dd29 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -974,6 +974,10 @@ bool kvm_check_guest_realloc_fpstate(struct kvm_vcpu *vcpu, u64 xfd)
 			vcpu->arch.guest_fpu.realloc_request = request;
 			return true;
 		}
+
+		/* Disable WRMSR interception if possible */
+		if (kvm_x86_ops.set_xfd_passthrough)
+			static_call(kvm_x86_set_xfd_passthrough)(vcpu);
 	}
 
 	return false;
@@ -10002,6 +10006,9 @@ static int vcpu_enter_guest(struct kvm_vcpu *vcpu)
 	if (hw_breakpoint_active())
 		hw_breakpoint_restore();
 
+	if (vcpu->arch.xfd_out_of_sync)
+		xfd_sync_state();
+
 	vcpu->arch.last_vmentry_cpu = vcpu->cpu;
 	vcpu->arch.last_guest_tsc = kvm_read_l1_tsc(vcpu, rdtsc());
 

