From: Yang Zhong <yang.zhong@intel.com>
To: x86@kernel.org, kvm@vger.kernel.org,
linux-kernel@vger.kernel.org, tglx@linutronix.de,
mingo@redhat.com, bp@alien8.de, dave.hansen@linux.intel.com,
pbonzini@redhat.com
Cc: seanjc@google.com, jun.nakajima@intel.com, kevin.tian@intel.com,
jing2.liu@linux.intel.com, jing2.liu@intel.com,
yang.zhong@intel.com
Subject: [PATCH 15/19] kvm: x86: Save and restore guest XFD_ERR properly
Date: Tue, 7 Dec 2021 19:03:55 -0500
Message-ID: <20211208000359.2853257-16-yang.zhong@intel.com>
In-Reply-To: <20211208000359.2853257-1-yang.zhong@intel.com>
KVM needs to save the guest XFD_ERR value before this register
might be accessed by the host and restore it before entering the
guest.
This implementation saves guest XFD_ERR in two transition points:
- When the vCPU thread exits to the userspace VMM;
- When the vCPU thread is preempted;
XFD_ERR is cleared to ZERO right after saving the previous guest
value. Otherwise a stale guest value may cause the host #NM
handler to misinterpret a non-XFD-related #NM as XFD-related.
There is no need to save the host XFD_ERR value because the only
place where XFD_ERR is consumed outside of KVM is the #NM handler
(which cannot be preempted by a vCPU thread). XFD_ERR should
always be observed as ZERO outside of the #NM handler, thus
clearing XFD_ERR meets the host expectation here.
The saved guest value is restored to XFD_ERR right before entering
the guest (with preemption disabled).
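The save/clear/restore sequence above can be sketched as follows. This is a user-space simulation rather than the kernel implementation: `msr_xfd_err` stands in for the real IA32_XFD_ERR MSR, which kernel code would access with rdmsrl()/wrmsrl(); only the helper names match the ones used in the diff below.

```c
#include <assert.h>
#include <stdint.h>

static uint64_t msr_xfd_err;       /* stand-in for the IA32_XFD_ERR MSR */

struct fpu_guest {
	uint64_t xfd_err;          /* saved guest XFD_ERR value */
};

/*
 * On exit-to-userspace or preemption: latch the guest value, then
 * clear the MSR so a stale guest value cannot confuse the host #NM
 * handler (which expects XFD_ERR to read as zero).
 */
static void fpu_save_guest_xfd_err(struct fpu_guest *g)
{
	g->xfd_err = msr_xfd_err;  /* rdmsrl(MSR_IA32_XFD_ERR, ...) */
	msr_xfd_err = 0;           /* wrmsrl(MSR_IA32_XFD_ERR, 0)  */
}

/*
 * Right before VM-entry, with preemption disabled: put the saved
 * guest value back so the guest #NM handler sees what it expects.
 */
static void fpu_restore_guest_xfd_err(struct fpu_guest *g)
{
	msr_xfd_err = g->xfd_err;  /* wrmsrl(MSR_IA32_XFD_ERR, ...) */
}
```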
The current implementation still has two open questions on which
we would like to hear suggestions:
1) Will #NM be triggered in the host kernel?
The code is currently written assuming the above is true, and it's
the only reason for saving guest XFD_ERR at preemption time.
Otherwise the save would only be required when the CPU returns to
ring-3 (either from the vCPU thread itself or from other threads),
by leveraging the "user-return notifier" machinery as suggested
by Paolo.
2) When to enable XFD_ERR save/restore?
There are four options on the table:
a) As long as guest cpuid has xfd enabled
XFD_ERR save/restore is enabled in every VM-exit (if preemption
or ret-to-userspace happens)
b) When the guest sets IA32_XFD to 1 for the first time
Indicate that guest OS supports XFD features. Because guest OS
usually initializes IA32_XFD at boot time, XFD_ERR save/restore
is enabled for almost every VM-exit (if preemption or ret-to-
userspace happens).
No save/restore for legacy guest OS which doesn't support XFD
features at all (thus won't touch IA32_XFD).
c) When the guest sets IA32_XFD to 0 for the first time
Lazily enabling XFD_ERR save/restore until XFD features are
used inside the guest. However, this option doesn't work because
XFD_ERR is set when #NM is raised. A VM-exit could happen
between the CPU raising #NM and the guest #NM handler reading
XFD_ERR (before setting XFD to 0). The very first XFD_ERR might be
already clobbered by the host due to no save/restore in that
small window.
d) When the 1st guest #NM with non-zero XFD_ERR occurs
Lazily enabling XFD_ERR save/restore until XFD features are
used inside guest. This requires intercepting guest #NM until
non-zero XFD_ERR occurs. If a guest with XFD in cpuid never
launches an AMX application, it implies that #NM is always
trapped, thus adding a constant overhead which may be even
higher than doing RDMSR in the preemption path as in a) and b):
#preempts < #VMEXITS (no #NM trap) < #VMEXITS (#NM trap)
The number of preemptions and ret-to-userspaces should be a
small portion of total #VMEXITs in a healthy virtualization
environment. Our gut-feeling is that adding at most one MSR
read and one MSR write to the preempt/user-ret paths is possibly
more efficient than increasing #VMEXITs due to trapping #NM.
Based on the above analysis, we plan to go with option b),
although this version currently implements a). But we would like
to hear other suggestions before making this change.
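If we do move to option b), the gating could look roughly like the following user-space sketch. The XFD_ERR_GUEST_DISABLED sentinel comes from this series, but its value here and the guest_wrmsr_xfd() hook name are illustrative assumptions, not the actual KVM code.

```c
#include <assert.h>
#include <stdint.h>

/* Sentinel meaning "guest has not enabled XFD yet, skip save/restore";
 * the concrete value is an assumption for this sketch. */
#define XFD_ERR_GUEST_DISABLED	(~0ULL)

struct fpu_guest {
	uint64_t xfd_err;
};

/*
 * Illustrative WRMSR(IA32_XFD) emulation hook: the first guest write
 * to IA32_XFD flips the sentinel to 0, enabling XFD_ERR save/restore
 * from that point on (option b). Legacy guests that never touch
 * IA32_XFD keep the sentinel and pay no save/restore cost.
 */
static void guest_wrmsr_xfd(struct fpu_guest *g, uint64_t val)
{
	if (g->xfd_err == XFD_ERR_GUEST_DISABLED)
		g->xfd_err = 0;
	(void)val;	/* ... would also program IA32_XFD with val ... */
}

/* Preempt/user-ret paths would check this before touching the MSR. */
static int xfd_err_save_enabled(const struct fpu_guest *g)
{
	return g->xfd_err != XFD_ERR_GUEST_DISABLED;
}
```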
Signed-off-by: Jing Liu <jing2.liu@intel.com>
Signed-off-by: Kevin Tian <kevin.tian@intel.com>
Signed-off-by: Yang Zhong <yang.zhong@intel.com>
---
arch/x86/kernel/fpu/core.c | 2 ++
arch/x86/kvm/cpuid.c | 5 +++++
arch/x86/kvm/vmx/vmx.c | 2 ++
arch/x86/kvm/vmx/vmx.h | 2 +-
arch/x86/kvm/x86.c | 5 +++++
5 files changed, 15 insertions(+), 1 deletion(-)
diff --git a/arch/x86/kernel/fpu/core.c b/arch/x86/kernel/fpu/core.c
index 5089f2e7dc22..9811dc98d550 100644
--- a/arch/x86/kernel/fpu/core.c
+++ b/arch/x86/kernel/fpu/core.c
@@ -238,6 +238,7 @@ bool fpu_alloc_guest_fpstate(struct fpu_guest *gfpu)
fpstate->is_guest = true;
gfpu->fpstate = fpstate;
+ gfpu->xfd_err = XFD_ERR_GUEST_DISABLED;
gfpu->user_xfeatures = fpu_user_cfg.default_features;
gfpu->user_perm = fpu_user_cfg.default_features;
fpu_init_guest_permissions(gfpu);
@@ -297,6 +298,7 @@ int fpu_swap_kvm_fpstate(struct fpu_guest *guest_fpu, bool enter_guest)
fpu->fpstate = guest_fps;
guest_fps->in_use = true;
} else {
+ fpu_save_guest_xfd_err(guest_fpu);
guest_fps->in_use = false;
fpu->fpstate = fpu->__task_fpstate;
fpu->__task_fpstate = NULL;
diff --git a/arch/x86/kvm/cpuid.c b/arch/x86/kvm/cpuid.c
index f3c61205bbf4..ea51b986ee67 100644
--- a/arch/x86/kvm/cpuid.c
+++ b/arch/x86/kvm/cpuid.c
@@ -219,6 +219,11 @@ static void kvm_vcpu_after_set_cpuid(struct kvm_vcpu *vcpu)
kvm_apic_set_version(vcpu);
}
+ /* Enable saving guest XFD_ERR */
+ best = kvm_find_cpuid_entry(vcpu, 7, 0);
+ if (best && cpuid_entry_has(best, X86_FEATURE_AMX_TILE))
+ vcpu->arch.guest_fpu.xfd_err = 0;
+
best = kvm_find_cpuid_entry(vcpu, 0xD, 0);
if (!best)
vcpu->arch.guest_supported_xcr0 = 0;
diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
index 6198b13c4846..0db8bdf273e2 100644
--- a/arch/x86/kvm/vmx/vmx.c
+++ b/arch/x86/kvm/vmx/vmx.c
@@ -161,6 +161,7 @@ static u32 vmx_possible_passthrough_msrs[MAX_POSSIBLE_PASSTHROUGH_MSRS] = {
MSR_GS_BASE,
MSR_KERNEL_GS_BASE,
MSR_IA32_XFD,
+ MSR_IA32_XFD_ERR,
#endif
MSR_IA32_SYSENTER_CS,
MSR_IA32_SYSENTER_ESP,
@@ -7153,6 +7154,7 @@ static void update_intel_pt_cfg(struct kvm_vcpu *vcpu)
static void vmx_update_intercept_xfd(struct kvm_vcpu *vcpu)
{
vmx_set_intercept_for_msr(vcpu, MSR_IA32_XFD, MSR_TYPE_R, false);
+ vmx_set_intercept_for_msr(vcpu, MSR_IA32_XFD_ERR, MSR_TYPE_RW, false);
}
static void vmx_vcpu_after_set_cpuid(struct kvm_vcpu *vcpu)
diff --git a/arch/x86/kvm/vmx/vmx.h b/arch/x86/kvm/vmx/vmx.h
index bf9d3051cd6c..0a00242a91e7 100644
--- a/arch/x86/kvm/vmx/vmx.h
+++ b/arch/x86/kvm/vmx/vmx.h
@@ -340,7 +340,7 @@ struct vcpu_vmx {
struct lbr_desc lbr_desc;
/* Save desired MSR intercept (read: pass-through) state */
-#define MAX_POSSIBLE_PASSTHROUGH_MSRS 14
+#define MAX_POSSIBLE_PASSTHROUGH_MSRS 15
struct {
DECLARE_BITMAP(read, MAX_POSSIBLE_PASSTHROUGH_MSRS);
DECLARE_BITMAP(write, MAX_POSSIBLE_PASSTHROUGH_MSRS);
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index d127b229dd29..8b033c9241d6 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -4550,6 +4550,9 @@ void kvm_arch_vcpu_put(struct kvm_vcpu *vcpu)
kvm_steal_time_set_preempted(vcpu);
srcu_read_unlock(&vcpu->kvm->srcu, idx);
+ if (vcpu->preempted)
+ fpu_save_guest_xfd_err(&vcpu->arch.guest_fpu);
+
static_call(kvm_x86_vcpu_put)(vcpu);
vcpu->arch.last_host_tsc = rdtsc();
}
@@ -9951,6 +9954,8 @@ static int vcpu_enter_guest(struct kvm_vcpu *vcpu)
if (test_thread_flag(TIF_NEED_FPU_LOAD))
switch_fpu_return();
+ fpu_restore_guest_xfd_err(&vcpu->arch.guest_fpu);
+
if (unlikely(vcpu->arch.switch_db_regs)) {
set_debugreg(0, 7);
set_debugreg(vcpu->arch.eff_db[0], 0);