linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Yang Zhong <yang.zhong@intel.com>
To: x86@kernel.org, kvm@vger.kernel.org,
	linux-kernel@vger.kernel.org, tglx@linutronix.de,
	mingo@redhat.com, bp@alien8.de, dave.hansen@linux.intel.com,
	pbonzini@redhat.com
Cc: seanjc@google.com, jun.nakajima@intel.com, kevin.tian@intel.com,
	jing2.liu@linux.intel.com, jing2.liu@intel.com,
	yang.zhong@intel.com
Subject: [PATCH 15/19] kvm: x86: Save and restore guest XFD_ERR properly
Date: Tue,  7 Dec 2021 19:03:55 -0500	[thread overview]
Message-ID: <20211208000359.2853257-16-yang.zhong@intel.com> (raw)
In-Reply-To: <20211208000359.2853257-1-yang.zhong@intel.com>

KVM needs to save the guest XFD_ERR value before this register
might be accessed by the host and restore it before entering the
guest.

This implementation saves guest XFD_ERR in two transition points:

  - When the vCPU thread exits to the userspace VMM;
  - When the vCPU thread is preempted;

XFD_ERR is cleared to ZERO right after saving the previous guest
value. Otherwise a stale guest value may confuse the host #NM
handler to misinterpret a non-XFD-related #NM as XFD related.

There is no need to save the host XFD_ERR value because the only
place where XFD_ERR is consumed outside of KVM is in #NM handler
(which can not be preempted by a vCPU thread). XFD_ERR should
always be observed as ZER0 outside of #NM hanlder, thus clearing
XFD_ERR meets the host expectation here.

The saved guest value is restored to XFD_ERR right before entering
the guest (with preemption disabled).

Current implementation still has two opens which we would like
to hear suggestions:

  1) Will #NM be triggered in host kernel?

  Now the code is written assuming above is true, and it's the only
  reason for saving guest XFD_ERR at preemption time. Otherwise the
  save is only required when the CPU enters ring-3 (either from the
  vCPU itself or other threads), by leveraging the "user-return
  notifier" machinery as suggested by Paolo.

  2) When to enable XFD_ERR save/restore?

  There are four options on the table:

    a) As long as guest cpuid has xfd enabled

       XFD_ERR save/restore is enabled in every VM-exit (if preemption
       or ret-to-userspace happens)

    b) When the guest sets IA32_XFD to 1 for the first time

       Indicate that guest OS supports XFD features. Because guest OS
       usually initializes IA32_XFD at boot time, XFD_ERR save/restore
       is enabled for almost every VM-exit (if preemption or ret-to-
       userspace happens).

       No save/restore for legacy guest OS which doesn't support XFD
       features at all (thus won't touch IA32_XFD).

    c) When the guest sets IA32_XFD to 0 for the first time

       Lazily enabling XFD_ERR save/restore until XFD features are
       used inside guest. However, this option doesn't work because
       XFD_ERR is set when #NM is raised. An VM-exit could happen
       between CPU raising #NM and guest #NM handler reading XFD_ERR
       (before setting XFD to 0). The very first XFD_ERR might be
       already clobbered by the host due to no save/restore in that
       small window.

    d) When the 1st guest #NM with non-zero XFD_ERR occurs

       Lazily enabling XFD_ERR save/restore until XFD features are
       used inside guest. This requires intercepting guest #NM until
       non-zero XFD_ERR occurs. If a guest with XFD in cpuid never
       launches an AMX application, it implies that #NM is always
       trapped thus adding a constant overhead which may be even
       higher than doing RDMSR in preemption path in a) and b):

         #preempts < #VMEXITS (no #NM trap) < #VMEXITS (#NM trap)

       The number of preemptions and ret-to-userspaces should be a
       small portion of total #VMEXITs in a healthy virtualization
       environment. Our gut-feeling is that adding at most one MSR
       read and one MSR write to the preempt/user-ret paths is possibly
       more efficient than increasing #VMEXITs due to trapping #NM.

For above analysis we plan to go option b), although this version
currently implements a). But we would like to hear other suggestions
before making this change.

Signed-off-by: Jing Liu <jing2.liu@intel.com>
Signed-off-by: Kevin Tian <kevin.tian@intel.com>
Signed-off-by: Yang Zhong <yang.zhong@intel.com>
---
 arch/x86/kernel/fpu/core.c | 2 ++
 arch/x86/kvm/cpuid.c       | 5 +++++
 arch/x86/kvm/vmx/vmx.c     | 2 ++
 arch/x86/kvm/vmx/vmx.h     | 2 +-
 arch/x86/kvm/x86.c         | 5 +++++
 5 files changed, 15 insertions(+), 1 deletion(-)

diff --git a/arch/x86/kernel/fpu/core.c b/arch/x86/kernel/fpu/core.c
index 5089f2e7dc22..9811dc98d550 100644
--- a/arch/x86/kernel/fpu/core.c
+++ b/arch/x86/kernel/fpu/core.c
@@ -238,6 +238,7 @@ bool fpu_alloc_guest_fpstate(struct fpu_guest *gfpu)
 	fpstate->is_guest	= true;
 
 	gfpu->fpstate		= fpstate;
+	gfpu->xfd_err           = XFD_ERR_GUEST_DISABLED;
 	gfpu->user_xfeatures	= fpu_user_cfg.default_features;
 	gfpu->user_perm		= fpu_user_cfg.default_features;
 	fpu_init_guest_permissions(gfpu);
@@ -297,6 +298,7 @@ int fpu_swap_kvm_fpstate(struct fpu_guest *guest_fpu, bool enter_guest)
 		fpu->fpstate = guest_fps;
 		guest_fps->in_use = true;
 	} else {
+		fpu_save_guest_xfd_err(guest_fpu);
 		guest_fps->in_use = false;
 		fpu->fpstate = fpu->__task_fpstate;
 		fpu->__task_fpstate = NULL;
diff --git a/arch/x86/kvm/cpuid.c b/arch/x86/kvm/cpuid.c
index f3c61205bbf4..ea51b986ee67 100644
--- a/arch/x86/kvm/cpuid.c
+++ b/arch/x86/kvm/cpuid.c
@@ -219,6 +219,11 @@ static void kvm_vcpu_after_set_cpuid(struct kvm_vcpu *vcpu)
 		kvm_apic_set_version(vcpu);
 	}
 
+	/* Enable saving guest XFD_ERR */
+	best = kvm_find_cpuid_entry(vcpu, 7, 0);
+	if (best && cpuid_entry_has(best, X86_FEATURE_AMX_TILE))
+		vcpu->arch.guest_fpu.xfd_err = 0;
+
 	best = kvm_find_cpuid_entry(vcpu, 0xD, 0);
 	if (!best)
 		vcpu->arch.guest_supported_xcr0 = 0;
diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
index 6198b13c4846..0db8bdf273e2 100644
--- a/arch/x86/kvm/vmx/vmx.c
+++ b/arch/x86/kvm/vmx/vmx.c
@@ -161,6 +161,7 @@ static u32 vmx_possible_passthrough_msrs[MAX_POSSIBLE_PASSTHROUGH_MSRS] = {
 	MSR_GS_BASE,
 	MSR_KERNEL_GS_BASE,
 	MSR_IA32_XFD,
+	MSR_IA32_XFD_ERR,
 #endif
 	MSR_IA32_SYSENTER_CS,
 	MSR_IA32_SYSENTER_ESP,
@@ -7153,6 +7154,7 @@ static void update_intel_pt_cfg(struct kvm_vcpu *vcpu)
 static void vmx_update_intercept_xfd(struct kvm_vcpu *vcpu)
 {
 	vmx_set_intercept_for_msr(vcpu, MSR_IA32_XFD, MSR_TYPE_R, false);
+	vmx_set_intercept_for_msr(vcpu, MSR_IA32_XFD_ERR, MSR_TYPE_RW, false);
 }
 
 static void vmx_vcpu_after_set_cpuid(struct kvm_vcpu *vcpu)
diff --git a/arch/x86/kvm/vmx/vmx.h b/arch/x86/kvm/vmx/vmx.h
index bf9d3051cd6c..0a00242a91e7 100644
--- a/arch/x86/kvm/vmx/vmx.h
+++ b/arch/x86/kvm/vmx/vmx.h
@@ -340,7 +340,7 @@ struct vcpu_vmx {
 	struct lbr_desc lbr_desc;
 
 	/* Save desired MSR intercept (read: pass-through) state */
-#define MAX_POSSIBLE_PASSTHROUGH_MSRS	14
+#define MAX_POSSIBLE_PASSTHROUGH_MSRS	15
 	struct {
 		DECLARE_BITMAP(read, MAX_POSSIBLE_PASSTHROUGH_MSRS);
 		DECLARE_BITMAP(write, MAX_POSSIBLE_PASSTHROUGH_MSRS);
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index d127b229dd29..8b033c9241d6 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -4550,6 +4550,9 @@ void kvm_arch_vcpu_put(struct kvm_vcpu *vcpu)
 		kvm_steal_time_set_preempted(vcpu);
 	srcu_read_unlock(&vcpu->kvm->srcu, idx);
 
+	if (vcpu->preempted)
+		fpu_save_guest_xfd_err(&vcpu->arch.guest_fpu);
+
 	static_call(kvm_x86_vcpu_put)(vcpu);
 	vcpu->arch.last_host_tsc = rdtsc();
 }
@@ -9951,6 +9954,8 @@ static int vcpu_enter_guest(struct kvm_vcpu *vcpu)
 	if (test_thread_flag(TIF_NEED_FPU_LOAD))
 		switch_fpu_return();
 
+	fpu_restore_guest_xfd_err(&vcpu->arch.guest_fpu);
+
 	if (unlikely(vcpu->arch.switch_db_regs)) {
 		set_debugreg(0, 7);
 		set_debugreg(vcpu->arch.eff_db[0], 0);

  parent reply	other threads:[~2021-12-07 15:10 UTC|newest]

Thread overview: 80+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2021-12-08  0:03 [PATCH 00/19] AMX Support in KVM Yang Zhong
2021-12-08  0:03 ` [PATCH 01/19] x86/fpu: Extend prctl() with guest permissions Yang Zhong
2021-12-14  0:16   ` Thomas Gleixner
2021-12-08  0:03 ` [PATCH 02/19] x86/fpu: Prepare KVM for dynamically enabled states Yang Zhong
2021-12-13  9:12   ` Paolo Bonzini
2021-12-13 12:00     ` Thomas Gleixner
2021-12-13 12:45       ` Paolo Bonzini
2021-12-13 19:50         ` Thomas Gleixner
2021-12-08  0:03 ` [PATCH 03/19] kvm: x86: Fix xstate_required_size() to follow XSTATE alignment rule Yang Zhong
2021-12-08  0:03 ` [PATCH 04/19] kvm: x86: Check guest xstate permissions when KVM_SET_CPUID2 Yang Zhong
2021-12-08  0:03 ` [PATCH 05/19] x86/fpu: Move xfd initialization out of __fpstate_reset() to the callers Yang Zhong
2021-12-10 22:33   ` Thomas Gleixner
2021-12-08  0:03 ` [PATCH 06/19] x86/fpu: Add reallocation mechanims for KVM Yang Zhong
2021-12-08  0:03 ` [PATCH 07/19] kvm: x86: Propagate fpstate reallocation error to userspace Yang Zhong
2021-12-10 15:44   ` Paolo Bonzini
2021-12-08  0:03 ` [PATCH 08/19] x86/fpu: Move xfd_update_state() to xstate.c and export symbol Yang Zhong
2021-12-10 22:44   ` Thomas Gleixner
2021-12-08  0:03 ` [PATCH 09/19] kvm: x86: Prepare reallocation check Yang Zhong
2021-12-13  9:16   ` Paolo Bonzini
2021-12-14  7:06     ` Tian, Kevin
2021-12-14 10:16       ` Paolo Bonzini
2021-12-14 14:41         ` Liu, Jing2
2021-12-15  7:09           ` Tian, Kevin
2021-12-08  0:03 ` [PATCH 10/19] kvm: x86: Emulate WRMSR of guest IA32_XFD Yang Zhong
2021-12-10 16:02   ` Paolo Bonzini
2021-12-13  7:51     ` Liu, Jing2
2021-12-13  9:01       ` Paolo Bonzini
2021-12-14 10:26     ` Yang Zhong
2021-12-14 11:24       ` Paolo Bonzini
2021-12-10 23:09   ` Thomas Gleixner
2021-12-13 15:06   ` Paolo Bonzini
2021-12-13 19:45     ` Thomas Gleixner
2021-12-13 21:23       ` Thomas Gleixner
2021-12-14  7:16         ` Tian, Kevin
2021-12-08  0:03 ` [PATCH 11/19] kvm: x86: Check fpstate reallocation in XSETBV emulation Yang Zhong
2021-12-08  0:03 ` [PATCH 12/19] x86/fpu: Prepare KVM for bringing XFD state back in-sync Yang Zhong
2021-12-10 23:11   ` Thomas Gleixner
2021-12-08  0:03 ` [PATCH 13/19] kvm: x86: Disable WRMSR interception for IA32_XFD on demand Yang Zhong
2021-12-08  7:23   ` Liu, Jing2
2021-12-08  0:03 ` [PATCH 14/19] x86/fpu: Prepare for KVM XFD_ERR handling Yang Zhong
2021-12-10 16:16   ` Paolo Bonzini
2021-12-10 23:20   ` Thomas Gleixner
2021-12-08  0:03 ` Yang Zhong [this message]
2021-12-10 16:23   ` [PATCH 15/19] kvm: x86: Save and restore guest XFD_ERR properly Paolo Bonzini
2021-12-10 22:01   ` Paolo Bonzini
2021-12-12 13:10     ` Yang Zhong
2021-12-11  0:10   ` Thomas Gleixner
2021-12-11  1:31     ` Paolo Bonzini
2021-12-11  3:23       ` Tian, Kevin
2021-12-11 13:10       ` Thomas Gleixner
2021-12-11  3:07     ` Tian, Kevin
2021-12-11 13:29       ` Thomas Gleixner
2021-12-12  1:50         ` Tian, Kevin
2021-12-12  9:10           ` Paolo Bonzini
2021-12-08  0:03 ` [PATCH 16/19] kvm: x86: Introduce KVM_{G|S}ET_XSAVE2 ioctl Yang Zhong
2021-12-10 16:25   ` Paolo Bonzini
2021-12-10 16:30   ` Paolo Bonzini
2021-12-10 22:13     ` Paolo Bonzini
2021-12-13  8:23       ` Wang, Wei W
2021-12-13  9:24         ` Paolo Bonzini
2021-12-14  6:06           ` Wang, Wei W
2021-12-14  6:18             ` Paolo Bonzini
2021-12-15  2:39               ` Wang, Wei W
2021-12-15 13:42                 ` Paolo Bonzini
2021-12-16  8:25                   ` Wang, Wei W
2021-12-16 10:28                     ` Paolo Bonzini
2021-12-20 17:54       ` State Component 18 and Palette 1 (Re: [PATCH 16/19] kvm: x86: Introduce KVM_{G|S}ET_XSAVE2 ioctl) Nakajima, Jun
2021-12-22 14:44         ` Paolo Bonzini
2021-12-22 23:47           ` Nakajima, Jun
2021-12-22 14:52         ` Dave Hansen
2021-12-22 23:51           ` Nakajima, Jun
2021-12-13 10:10     ` [PATCH 16/19] kvm: x86: Introduce KVM_{G|S}ET_XSAVE2 ioctl Thomas Gleixner
2021-12-13 10:43       ` Paolo Bonzini
2021-12-13 12:40         ` Thomas Gleixner
2021-12-08  0:03 ` [PATCH 17/19] docs: virt: api.rst: Document the new KVM_{G, S}ET_XSAVE2 ioctls Yang Zhong
2021-12-08  0:03 ` [PATCH 18/19] kvm: x86: AMX XCR0 support for guest Yang Zhong
2021-12-10 16:30   ` Paolo Bonzini
2021-12-08  0:03 ` [PATCH 19/19] kvm: x86: Add AMX CPUIDs support Yang Zhong
2021-12-10 21:52   ` Paolo Bonzini
2021-12-11 21:20 ` [PATCH 00/19] AMX Support in KVM Thomas Gleixner

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20211208000359.2853257-16-yang.zhong@intel.com \
    --to=yang.zhong@intel.com \
    --cc=bp@alien8.de \
    --cc=dave.hansen@linux.intel.com \
    --cc=jing2.liu@intel.com \
    --cc=jing2.liu@linux.intel.com \
    --cc=jun.nakajima@intel.com \
    --cc=kevin.tian@intel.com \
    --cc=kvm@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=mingo@redhat.com \
    --cc=pbonzini@redhat.com \
    --cc=seanjc@google.com \
    --cc=tglx@linutronix.de \
    --cc=x86@kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).