From mboxrd@z Thu Jan 1 00:00:00 1970 From: Marcelo Tosatti Subject: Re: [PATCH] Enabling IA32_TSC_ADJUST for guest VM Date: Mon, 8 Oct 2012 14:30:55 -0300 Message-ID: <20121008173055.GC12735@amt.cnet> References: <96EC5A4F3149B74492D2D9B9B1602C2728B61A0D@ORSMSX108.amr.corp.intel.com> Mime-Version: 1.0 Content-Type: text/plain; charset=utf-8 Content-Transfer-Encoding: QUOTED-PRINTABLE Cc: "kvm@vger.kernel.org" , Avi Kivity , "Zhang, Xiantao" To: "Auld, Will" Return-path: Received: from mx1.redhat.com ([209.132.183.28]:46903 "EHLO mx1.redhat.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1752690Ab2JHRd5 convert rfc822-to-8bit (ORCPT ); Mon, 8 Oct 2012 13:33:57 -0400 Content-Disposition: inline In-Reply-To: <96EC5A4F3149B74492D2D9B9B1602C2728B61A0D@ORSMSX108.amr.corp.intel.com> Sender: kvm-owner@vger.kernel.org List-ID: On Wed, Sep 19, 2012 at 05:44:46PM +0000, Auld, Will wrote: > >From 9982bb73460b05c1328068aae047b14b2294e2da Mon Sep 17 00:00:00 20= 01 > From: Will Auld > Date: Wed, 12 Sep 2012 18:10:56 -0700 > Subject: [PATCH] Enabling IA32_TSC_ADJUST for guest VM >=20 > CPUID.7.0.EBX[1]=3D1 indicates IA32_TSC_ADJUST MSR 0x3b is supported >=20 > Basic design is to emulate the MSR by allowing reads and writes to a = guest vcpu specific location to store the value of the emulated MSR whi= le adding the value to the vmcs tsc_offset. In this way the IA32_TSC_AD= JUST value will be included in all reads to the TSC MSR whether through= rdmsr or rdtsc. This is of course as long as the "use TSC counter offs= etting" VM-execution control is enabled as well as the IA32_TSC_ADJUST = control. >=20 > However, because hardware will only return the TSC + IA32_TSC_ADJUST = + vmsc tsc_offset for a guest process when it does and rdtsc (with the = correct settings) the value of our virtualized IA32_TSC_ADJUST must be = stored in one of these three locations. The argument against storing it= in the actual MSR is performance. This is likely to be seldom used whi= le the save/restore is required on every transition. IA32_TSC_ADJUST wa= s created as a way to solve some issues with writing TSC itself so that= is not an option either. The remaining option, defined above as our so= lution has the problem of returning incorrect vmcs tsc_offset values (u= nless we intercept and fix, not done here) as mentioned above. However,= more problematic is that storing the data in vmcs tsc_offset will have= a different semantic effect on the system than does using the actual M= SR. This is illustrated in the following example: The hypervisor set th= e IA32_TSC_ADJUST, then the guest sets it and a guest process perfor! > ms a rdtsc. In this case the guest process will get TSC + IA32_TSC_A= DJUST_hyperviser + vmsc tsc_offset including IA32_TSC_ADJUST_guest. Whi= le the total system semantics changed the semantics as seen by the gues= t do not and hence this will not cause a problem. > --- > arch/x86/include/asm/cpufeature.h | 1 + > arch/x86/include/asm/kvm_host.h | 2 ++ > arch/x86/include/asm/msr-index.h | 1 + > arch/x86/kvm/cpuid.c | 4 ++-- > arch/x86/kvm/vmx.c | 12 ++++++++++++ > arch/x86/kvm/x86.c | 1 + > 6 files changed, 19 insertions(+), 2 deletions(-) >=20 > diff --git a/arch/x86/include/asm/cpufeature.h b/arch/x86/include/asm= /cpufeature.h > index 6b7ee5f..e574d81 100644 > --- a/arch/x86/include/asm/cpufeature.h > +++ b/arch/x86/include/asm/cpufeature.h > @@ -199,6 +199,7 @@ > =20 > /* Intel-defined CPU features, CPUID level 0x00000007:0 (ebx), word = 9 */ > #define X86_FEATURE_FSGSBASE (9*32+ 0) /* {RD/WR}{FS/GS}BASE instruc= tions*/ > +#define X86_FEATURE_TSC_ADJUST (9*32+ 1) /* TSC adjustment MSR 0x3b= */ > #define X86_FEATURE_BMI1 (9*32+ 3) /* 1st group bit manipulation ext= ensions */ > #define X86_FEATURE_HLE (9*32+ 4) /* Hardware Lock Elision */ > #define X86_FEATURE_AVX2 (9*32+ 5) /* AVX2 instructions */ > diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/k= vm_host.h > index 09155d6..8a001a4 100644 > --- a/arch/x86/include/asm/kvm_host.h > +++ b/arch/x86/include/asm/kvm_host.h > @@ -442,6 +442,8 @@ struct kvm_vcpu_arch { > u32 virtual_tsc_mult; > u32 virtual_tsc_khz; > =20 > + s64 tsc_adjust; > + > atomic_t nmi_queued; /* unprocessed asynchronous NMIs */ > unsigned nmi_pending; /* NMI queued after currently running handler= */ > bool nmi_injected; /* Trying to inject an NMI this entry */ > diff --git a/arch/x86/include/asm/msr-index.h b/arch/x86/include/asm/= msr-index.h > index 957ec87..8e82e29 100644 > --- a/arch/x86/include/asm/msr-index.h > +++ b/arch/x86/include/asm/msr-index.h > @@ -231,6 +231,7 @@ > #define MSR_IA32_EBL_CR_POWERON 0x0000002a > #define MSR_EBC_FREQUENCY_ID 0x0000002c > #define MSR_IA32_FEATURE_CONTROL 0x0000003a > +#define MSR_TSC_ADJUST 0x0000003b > =20 > #define FEATURE_CONTROL_LOCKED (1<<0) > #define FEATURE_CONTROL_VMXON_ENABLED_INSIDE_SMX (1<<1) > diff --git a/arch/x86/kvm/cpuid.c b/arch/x86/kvm/cpuid.c > index 0595f13..8f5943e 100644 > --- a/arch/x86/kvm/cpuid.c > +++ b/arch/x86/kvm/cpuid.c > @@ -248,8 +248,8 @@ static int do_cpuid_ent(struct kvm_cpuid_entry2 *= entry, u32 function, > =20 > /* cpuid 7.0.ebx */ > const u32 kvm_supported_word9_x86_features =3D > - F(FSGSBASE) | F(BMI1) | F(HLE) | F(AVX2) | F(SMEP) | > - F(BMI2) | F(ERMS) | f_invpcid | F(RTM); > + F(FSGSBASE) | F(TSC_ADJUST) | F(BMI1) | F(HLE) | > + F(AVX2) | F(SMEP) | F(BMI2) | F(ERMS) | f_invpcid | F(RTM); > =20 > /* all calls to cpuid_count() should be made on the same cpu */ > get_cpu(); > diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c > index c00f03d..35d11b3 100644 > --- a/arch/x86/kvm/vmx.c > +++ b/arch/x86/kvm/vmx.c > @@ -2173,6 +2173,9 @@ static int vmx_get_msr(struct kvm_vcpu *vcpu, u= 32 msr_index, u64 *pdata) > case MSR_IA32_SYSENTER_ESP: > data =3D vmcs_readl(GUEST_SYSENTER_ESP); > break; > + case MSR_TSC_ADJUST: > + data =3D (u64)vcpu->arch.tsc_adjust; > + break; > case MSR_TSC_AUX: > if (!to_vmx(vcpu)->rdtscp_enabled) > return 1; =46rom Intel's manual: =E2=80=A2 If an execution of WRMSR to the IA32_TIME_STAMP_COUNTER MSR a= dds (or subtracts) value X from the TSC, the logical processor also adds (or subtracts) value X from the IA32_TSC_ADJUST MSR. This is not handled in the patch.=20 To support migration, it will be necessary to differentiate between guest initiated and userspace-model initiated msr write. That is,=20 only guest initiated TSC writes should affect the value of=20 IA32_TSC_ADJUST MSR. Avi, any better idea? Will, please write a test case, see http://git.kernel.org/?p=3Dvirt/kvm/kvm-unit-tests.git;a=3Dsummary > @@ -2241,6 +2244,13 @@ static int vmx_set_msr(struct kvm_vcpu *vcpu, = u32 msr_index, u64 data) > } > ret =3D kvm_set_msr_common(vcpu, msr_index, data); > break; > + case MSR_TSC_ADJUST: > +#define DUMMY 1 > + vmx_adjust_tsc_offset(vcpu, > + (s64)(data-vcpu->arch.tsc_adjust), > + (bool)DUMMY); > + vcpu->arch.tsc_adjust =3D (s64)data; > + break; > case MSR_TSC_AUX: > if (!vmx->rdtscp_enabled) > return 1; > @@ -3931,6 +3941,8 @@ static int vmx_vcpu_reset(struct kvm_vcpu *vcpu= ) > =20 > vcpu->arch.regs_avail =3D ~((1 << VCPU_REGS_RIP) | (1 << VCPU_REGS_= RSP)); > =20 > + vcpu->arch.tsc_adjust =3D 0x0; > + > vmx->rmode.vm86_active =3D 0; > =20 > vmx->soft_vnmi_blocked =3D 0; > diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c > index 42bce48..6c50f6c 100644 > --- a/arch/x86/kvm/x86.c > +++ b/arch/x86/kvm/x86.c > @@ -824,6 +824,7 @@ static u32 msrs_to_save[] =3D { > static unsigned num_msrs_to_save; > =20 > static u32 emulated_msrs[] =3D { > + MSR_TSC_ADJUST, > MSR_IA32_TSCDEADLINE, > MSR_IA32_MISC_ENABLE, > MSR_IA32_MCG_STATUS, > --=20 > 1.7.1 >=20 > -- > To unsubscribe from this list: send the line "unsubscribe kvm" in > the body of a message to majordomo@vger.kernel.org > More majordomo info at http://vger.kernel.org/majordomo-info.html