All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH] Enabling IA32_TSC_ADJUST for guest VM
@ 2012-09-19 17:44 Auld, Will
  2012-09-20 11:13 ` Avi Kivity
                   ` (3 more replies)
  0 siblings, 4 replies; 23+ messages in thread
From: Auld, Will @ 2012-09-19 17:44 UTC (permalink / raw)
  To: kvm; +Cc: Avi Kivity, Auld, Will, Zhang, Xiantao

>From 9982bb73460b05c1328068aae047b14b2294e2da Mon Sep 17 00:00:00 2001
From: Will Auld <will.auld@intel.com>
Date: Wed, 12 Sep 2012 18:10:56 -0700
Subject: [PATCH] Enabling IA32_TSC_ADJUST for guest VM

CPUID.7.0.EBX[1]=1 indicates IA32_TSC_ADJUST MSR 0x3b is supported

Basic design is to emulate the MSR by allowing reads and writes to a guest vcpu specific location to store the value of the emulated MSR while adding the value to the vmcs tsc_offset. In this way the IA32_TSC_ADJUST value will be included in all reads to the TSC MSR whether through rdmsr or rdtsc. This is of course as long as the "use TSC counter offsetting" VM-execution control is enabled as well as the IA32_TSC_ADJUST control.

However, because hardware will only return the TSC + IA32_TSC_ADJUST + vmsc tsc_offset for a guest process when it does and rdtsc (with the correct settings) the value of our virtualized IA32_TSC_ADJUST must be stored in one of these three locations. The argument against storing it in the actual MSR is performance. This is likely to be seldom used while the save/restore is required on every transition. IA32_TSC_ADJUST was created as a way to solve some issues with writing TSC itself so that is not an option either. The remaining option, defined above as our solution has the problem of returning incorrect vmcs tsc_offset values (unless we intercept and fix, not done here) as mentioned above. However, more problematic is that storing the data in vmcs tsc_offset will have a different semantic
  effect on the system than does using the actual MSR. This is illustrated in the following example: The hypervisor set the IA32_TSC_ADJUST, then the guest sets it and a guest process performs a rdtsc. In this case the guest process will get TSC + IA32_TSC_ADJUST_hyperviser + vmsc tsc_offset including IA32_TSC_ADJUST_guest. While the total system semantics changed the semantics as seen by the guest do not and hence this will not cause a problem.
---
 arch/x86/include/asm/cpufeature.h |    1 +
 arch/x86/include/asm/kvm_host.h   |    2 ++
 arch/x86/include/asm/msr-index.h  |    1 +
 arch/x86/kvm/cpuid.c              |    4 ++--
 arch/x86/kvm/vmx.c                |   12 ++++++++++++
 arch/x86/kvm/x86.c                |    1 +
 6 files changed, 19 insertions(+), 2 deletions(-)

diff --git a/arch/x86/include/asm/cpufeature.h b/arch/x86/include/asm/cpufeature.h
index 6b7ee5f..e574d81 100644
--- a/arch/x86/include/asm/cpufeature.h
+++ b/arch/x86/include/asm/cpufeature.h
@@ -199,6 +199,7 @@
 
 /* Intel-defined CPU features, CPUID level 0x00000007:0 (ebx), word 9 */
 #define X86_FEATURE_FSGSBASE	(9*32+ 0) /* {RD/WR}{FS/GS}BASE instructions*/
+#define X86_FEATURE_TSC_ADJUST  (9*32+ 1) /* TSC adjustment MSR 0x3b */
 #define X86_FEATURE_BMI1	(9*32+ 3) /* 1st group bit manipulation extensions */
 #define X86_FEATURE_HLE		(9*32+ 4) /* Hardware Lock Elision */
 #define X86_FEATURE_AVX2	(9*32+ 5) /* AVX2 instructions */
diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index 09155d6..8a001a4 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -442,6 +442,8 @@ struct kvm_vcpu_arch {
 	u32 virtual_tsc_mult;
 	u32 virtual_tsc_khz;
 
+	s64 tsc_adjust;
+
 	atomic_t nmi_queued;  /* unprocessed asynchronous NMIs */
 	unsigned nmi_pending; /* NMI queued after currently running handler */
 	bool nmi_injected;    /* Trying to inject an NMI this entry */
diff --git a/arch/x86/include/asm/msr-index.h b/arch/x86/include/asm/msr-index.h
index 957ec87..8e82e29 100644
--- a/arch/x86/include/asm/msr-index.h
+++ b/arch/x86/include/asm/msr-index.h
@@ -231,6 +231,7 @@
 #define MSR_IA32_EBL_CR_POWERON		0x0000002a
 #define MSR_EBC_FREQUENCY_ID		0x0000002c
 #define MSR_IA32_FEATURE_CONTROL        0x0000003a
+#define MSR_TSC_ADJUST				0x0000003b
 
 #define FEATURE_CONTROL_LOCKED				(1<<0)
 #define FEATURE_CONTROL_VMXON_ENABLED_INSIDE_SMX	(1<<1)
diff --git a/arch/x86/kvm/cpuid.c b/arch/x86/kvm/cpuid.c
index 0595f13..8f5943e 100644
--- a/arch/x86/kvm/cpuid.c
+++ b/arch/x86/kvm/cpuid.c
@@ -248,8 +248,8 @@ static int do_cpuid_ent(struct kvm_cpuid_entry2 *entry, u32 function,
 
 	/* cpuid 7.0.ebx */
 	const u32 kvm_supported_word9_x86_features =
-		F(FSGSBASE) | F(BMI1) | F(HLE) | F(AVX2) | F(SMEP) |
-		F(BMI2) | F(ERMS) | f_invpcid | F(RTM);
+		F(FSGSBASE) | F(TSC_ADJUST) | F(BMI1) | F(HLE) |
+		F(AVX2) | F(SMEP) | F(BMI2) | F(ERMS) | f_invpcid | F(RTM);
 
 	/* all calls to cpuid_count() should be made on the same cpu */
 	get_cpu();
diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index c00f03d..35d11b3 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -2173,6 +2173,9 @@ static int vmx_get_msr(struct kvm_vcpu *vcpu, u32 msr_index, u64 *pdata)
 	case MSR_IA32_SYSENTER_ESP:
 		data = vmcs_readl(GUEST_SYSENTER_ESP);
 		break;
+	case MSR_TSC_ADJUST:
+		data = (u64)vcpu->arch.tsc_adjust;
+		break;
 	case MSR_TSC_AUX:
 		if (!to_vmx(vcpu)->rdtscp_enabled)
 			return 1;
@@ -2241,6 +2244,13 @@ static int vmx_set_msr(struct kvm_vcpu *vcpu, u32 msr_index, u64 data)
 		}
 		ret = kvm_set_msr_common(vcpu, msr_index, data);
 		break;
+	case MSR_TSC_ADJUST:
+#define DUMMY 1
+		vmx_adjust_tsc_offset(vcpu,
+				(s64)(data-vcpu->arch.tsc_adjust),
+				(bool)DUMMY);
+		vcpu->arch.tsc_adjust = (s64)data;
+		break;
 	case MSR_TSC_AUX:
 		if (!vmx->rdtscp_enabled)
 			return 1;
@@ -3931,6 +3941,8 @@ static int vmx_vcpu_reset(struct kvm_vcpu *vcpu)
 
 	vcpu->arch.regs_avail = ~((1 << VCPU_REGS_RIP) | (1 << VCPU_REGS_RSP));
 
+	vcpu->arch.tsc_adjust = 0x0;
+
 	vmx->rmode.vm86_active = 0;
 
 	vmx->soft_vnmi_blocked = 0;
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 42bce48..6c50f6c 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -824,6 +824,7 @@ static u32 msrs_to_save[] = {
 static unsigned num_msrs_to_save;
 
 static u32 emulated_msrs[] = {
+	MSR_TSC_ADJUST,
 	MSR_IA32_TSCDEADLINE,
 	MSR_IA32_MISC_ENABLE,
 	MSR_IA32_MCG_STATUS,
-- 
1.7.1


^ permalink raw reply related	[flat|nested] 23+ messages in thread

* Re: [PATCH] Enabling IA32_TSC_ADJUST for guest VM
  2012-09-19 17:44 [PATCH] Enabling IA32_TSC_ADJUST for guest VM Auld, Will
@ 2012-09-20 11:13 ` Avi Kivity
  2012-09-20 11:15 ` Avi Kivity
                   ` (2 subsequent siblings)
  3 siblings, 0 replies; 23+ messages in thread
From: Avi Kivity @ 2012-09-20 11:13 UTC (permalink / raw)
  To: Auld, Will; +Cc: kvm, Zhang, Xiantao

On 09/19/2012 08:44 PM, Auld, Will wrote:
> From 9982bb73460b05c1328068aae047b14b2294e2da Mon Sep 17 00:00:00 2001
> From: Will Auld <will.auld@intel.com>
> Date: Wed, 12 Sep 2012 18:10:56 -0700
> Subject: [PATCH] Enabling IA32_TSC_ADJUST for guest VM
> 
> CPUID.7.0.EBX[1]=1 indicates IA32_TSC_ADJUST MSR 0x3b is supported
> 
> Basic design is to emulate the MSR by allowing reads and writes to a guest vcpu specific location to store the value of the emulated MSR while adding the value to the vmcs tsc_offset. In this way the IA32_TSC_ADJUST value will be included in all reads to the TSC MSR whether through rdmsr or rdtsc. This is of course as long as the "use TSC counter offsetting" VM-execution control is enabled as well as the IA32_TSC_ADJUST control.
> 
> However, because hardware will only return the TSC + IA32_TSC_ADJUST + vmsc tsc_offset for a guest process when it does and rdtsc (with the correct settings) the value of our virtualized IA32_TSC_ADJUST must be stored in one of these three locations. The argument against storing it in the actual MSR is performance. This is likely to be seldom used while the save/restore is required on every transition. IA32_TSC_ADJUST was created as a way to solve some issues with writing TSC itself so that is not an option either. The remaining option, defined above as our solution has the problem of returning incorrect vmcs tsc_offset values (unless we intercept and fix, not done here) as mentioned above. However, more problematic is that storing the data in vmcs tsc_offset will have a different semant
 ic effect on the system than does using the actual MSR. This is illustrated in the following example: The hypervisor set the IA32_TSC_ADJUST, then the guest sets it and a guest process perf!
 or!
>  ms a rdtsc. In this case the guest process will get TSC + IA32_TSC_ADJUST_hyperviser + vmsc tsc_offset including IA32_TSC_ADJUST_guest. While the total system semantics changed the semantics as seen by the guest do not and hence this will not cause a problem.
> +++ b/arch/x86/kvm/cpuid.c
> @@ -248,8 +248,8 @@ static int do_cpuid_ent(struct kvm_cpuid_entry2 *entry, u32 function,
>  
>  	/* cpuid 7.0.ebx */
>  	const u32 kvm_supported_word9_x86_features =
> -		F(FSGSBASE) | F(BMI1) | F(HLE) | F(AVX2) | F(SMEP) |
> -		F(BMI2) | F(ERMS) | f_invpcid | F(RTM);
> +		F(FSGSBASE) | F(TSC_ADJUST) | F(BMI1) | F(HLE) |
> +		F(AVX2) | F(SMEP) | F(BMI2) | F(ERMS) | f_invpcid | F(RTM);
>  

You're exposing this feature unconditionally, but part of the
implementation is in vmx.c.  This means that if an AMD processor arrives
that implements the feature, we will expose the feature even though we
lack some of the implementation.

So we need to mask the feature here based on a callback from kvm_x86_ops.

>  	/* all calls to cpuid_count() should be made on the same cpu */
>  	get_cpu();
> diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
> index c00f03d..35d11b3 100644
> --- a/arch/x86/kvm/vmx.c
> +++ b/arch/x86/kvm/vmx.c
> @@ -2173,6 +2173,9 @@ static int vmx_get_msr(struct kvm_vcpu *vcpu, u32 msr_index, u64 *pdata)
>  	case MSR_IA32_SYSENTER_ESP:
>  		data = vmcs_readl(GUEST_SYSENTER_ESP);
>  		break;
> +	case MSR_TSC_ADJUST:
> +		data = (u64)vcpu->arch.tsc_adjust;
> +		break;

Can be moved to common code.

>  	case MSR_TSC_AUX:
>  		if (!to_vmx(vcpu)->rdtscp_enabled)
>  			return 1;
> @@ -2241,6 +2244,13 @@ static int vmx_set_msr(struct kvm_vcpu *vcpu, u32 msr_index, u64 data)
>  		}
>  		ret = kvm_set_msr_common(vcpu, msr_index, data);
>  		break;
> +	case MSR_TSC_ADJUST:
> +#define DUMMY 1

What is this?

> +		vmx_adjust_tsc_offset(vcpu,
> +				(s64)(data-vcpu->arch.tsc_adjust),

Cast unneeded; space between operands please.

> +				(bool)DUMMY);
> +		vcpu->arch.tsc_adjust = (s64)data;

Cast is unneeded.

> +		break;
>  	case MSR_TSC_AUX:
>  		if (!vmx->rdtscp_enabled)
>  			return 1;
> @@ -3931,6 +3941,8 @@ static int vmx_vcpu_reset(struct kvm_vcpu *vcpu)
>  
>  	vcpu->arch.regs_avail = ~((1 << VCPU_REGS_RIP) | (1 << VCPU_REGS_RSP));
>  
> +	vcpu->arch.tsc_adjust = 0x0;
> +

Can be moved to common code.

>  	vmx->rmode.vm86_active = 0;
>  

-- 
error compiling committee.c: too many arguments to function

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH] Enabling IA32_TSC_ADJUST for guest VM
  2012-09-19 17:44 [PATCH] Enabling IA32_TSC_ADJUST for guest VM Auld, Will
  2012-09-20 11:13 ` Avi Kivity
@ 2012-09-20 11:15 ` Avi Kivity
  2012-09-26 21:34 ` Marcelo Tosatti
  2012-10-08 17:30 ` Marcelo Tosatti
  3 siblings, 0 replies; 23+ messages in thread
From: Avi Kivity @ 2012-09-20 11:15 UTC (permalink / raw)
  To: Auld, Will; +Cc: kvm, Zhang, Xiantao

On 09/19/2012 08:44 PM, Auld, Will wrote:
> @@ -2241,6 +2244,13 @@ static int vmx_set_msr(struct kvm_vcpu *vcpu, u32 msr_index, u64 data)
>  		}
>  		ret = kvm_set_msr_common(vcpu, msr_index, data);
>  		break;
> +	case MSR_TSC_ADJUST:
> +#define DUMMY 1
> +		vmx_adjust_tsc_offset(vcpu,
> +				(s64)(data-vcpu->arch.tsc_adjust),
> +				(bool)DUMMY);
> +		vcpu->arch.tsc_adjust = (s64)data;
> +		break;
>  	case MSR_TSC_AUX:
>  		if (!vmx->rdtscp_enabled)
>  			return 1;

Writes to MSR_IA32_TSC also need to adjust MSR_IA32_TSC_ADJUST.


-- 
error compiling committee.c: too many arguments to function

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH] Enabling IA32_TSC_ADJUST for guest VM
  2012-09-19 17:44 [PATCH] Enabling IA32_TSC_ADJUST for guest VM Auld, Will
  2012-09-20 11:13 ` Avi Kivity
  2012-09-20 11:15 ` Avi Kivity
@ 2012-09-26 21:34 ` Marcelo Tosatti
  2012-09-26 22:58   ` Auld, Will
  2012-10-08 17:30 ` Marcelo Tosatti
  3 siblings, 1 reply; 23+ messages in thread
From: Marcelo Tosatti @ 2012-09-26 21:34 UTC (permalink / raw)
  To: Auld, Will; +Cc: kvm, Avi Kivity, Zhang, Xiantao

On Wed, Sep 19, 2012 at 05:44:46PM +0000, Auld, Will wrote:
> >From 9982bb73460b05c1328068aae047b14b2294e2da Mon Sep 17 00:00:00 2001
> From: Will Auld <will.auld@intel.com>
> Date: Wed, 12 Sep 2012 18:10:56 -0700
> Subject: [PATCH] Enabling IA32_TSC_ADJUST for guest VM
> 
> CPUID.7.0.EBX[1]=1 indicates IA32_TSC_ADJUST MSR 0x3b is supported
> 
> Basic design is to emulate the MSR by allowing reads and writes to a guest vcpu specific location to store the value of the emulated MSR while adding the value to the vmcs tsc_offset. In this way the IA32_TSC_ADJUST value will be included in all reads to the TSC MSR whether through rdmsr or rdtsc. This is of course as long as the "use TSC counter offsetting" VM-execution control is enabled as well as the IA32_TSC_ADJUST control.
> 
> However, because hardware will only return the TSC + IA32_TSC_ADJUST + vmsc tsc_offset for a guest process when it does and rdtsc (with the correct settings) the value of our virtualized IA32_TSC_ADJUST must be stored in one of these three locations.

The purpose of the IA32_TSC_ADJUST control is to make it easier 
for the operating system (host) to decrease the delta between cores
to an acceptable value, so that applications can make use of direct
RDTSC, correct?

Why is it necessary for the guests to make use of such interface,
if the hypervisor could provide proper TSC?

(not against exposing it to the guests, just thinking out loud).

That is, if the purpose of the IA32_TSC_ADJUST is to provide
proper synchronized TSC across cores, and newer guests which
should already make use of paravirt clock interface, what 
is the point of exposing the feature?

>  The argument against storing it in the actual MSR is performance.
>  This is likely to be seldom used while the save/restore is required
>  on every transition. IA32_TSC_ADJUST was created as a way to solve
>  some issues with writing TSC itself so that is not an option
>  either. The remaining option, defined above as our solution has
>  the problem of returning incorrect vmcs tsc_offset values (unless
>  we intercept and fix, not done here) as mentioned above. However,
>  more problematic is that storing the data in vmcs tsc_offset will
>  have a different semantic effect on the system than does using
>  the actual MSR. This is illustrated in the following example: The
>  hypervisor set the IA32_TSC_ADJUST, then the guest sets it and a
>  guest process perfor! ms a rdtsc. In this case the guest process will
>  get TSC + IA32_TSC_ADJUST_hyperviser + vmsc tsc_offset including
>  IA32_TSC_ADJUST_guest. While the total system semantics changed the
>  semantics as seen by the guest do not and hence this will not cause a
>  problem.
> ---

>  arch/x86/include/asm/cpufeature.h |    1 +
>  arch/x86/include/asm/kvm_host.h   |    2 ++
>  arch/x86/include/asm/msr-index.h  |    1 +
>  arch/x86/kvm/cpuid.c              |    4 ++--
>  arch/x86/kvm/vmx.c                |   12 ++++++++++++
>  arch/x86/kvm/x86.c                |    1 +
>  6 files changed, 19 insertions(+), 2 deletions(-)
> 
> diff --git a/arch/x86/include/asm/cpufeature.h b/arch/x86/include/asm/cpufeature.h
> index 6b7ee5f..e574d81 100644
> --- a/arch/x86/include/asm/cpufeature.h
> +++ b/arch/x86/include/asm/cpufeature.h
> @@ -199,6 +199,7 @@
>  
>  /* Intel-defined CPU features, CPUID level 0x00000007:0 (ebx), word 9 */
>  #define X86_FEATURE_FSGSBASE	(9*32+ 0) /* {RD/WR}{FS/GS}BASE instructions*/
> +#define X86_FEATURE_TSC_ADJUST  (9*32+ 1) /* TSC adjustment MSR 0x3b */
>  #define X86_FEATURE_BMI1	(9*32+ 3) /* 1st group bit manipulation extensions */
>  #define X86_FEATURE_HLE		(9*32+ 4) /* Hardware Lock Elision */
>  #define X86_FEATURE_AVX2	(9*32+ 5) /* AVX2 instructions */
> diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
> index 09155d6..8a001a4 100644
> --- a/arch/x86/include/asm/kvm_host.h
> +++ b/arch/x86/include/asm/kvm_host.h
> @@ -442,6 +442,8 @@ struct kvm_vcpu_arch {
>  	u32 virtual_tsc_mult;
>  	u32 virtual_tsc_khz;
>  
> +	s64 tsc_adjust;
> +
>  	atomic_t nmi_queued;  /* unprocessed asynchronous NMIs */
>  	unsigned nmi_pending; /* NMI queued after currently running handler */
>  	bool nmi_injected;    /* Trying to inject an NMI this entry */
> diff --git a/arch/x86/include/asm/msr-index.h b/arch/x86/include/asm/msr-index.h
> index 957ec87..8e82e29 100644
> --- a/arch/x86/include/asm/msr-index.h
> +++ b/arch/x86/include/asm/msr-index.h
> @@ -231,6 +231,7 @@
>  #define MSR_IA32_EBL_CR_POWERON		0x0000002a
>  #define MSR_EBC_FREQUENCY_ID		0x0000002c
>  #define MSR_IA32_FEATURE_CONTROL        0x0000003a
> +#define MSR_TSC_ADJUST				0x0000003b
>  
>  #define FEATURE_CONTROL_LOCKED				(1<<0)
>  #define FEATURE_CONTROL_VMXON_ENABLED_INSIDE_SMX	(1<<1)
> diff --git a/arch/x86/kvm/cpuid.c b/arch/x86/kvm/cpuid.c
> index 0595f13..8f5943e 100644
> --- a/arch/x86/kvm/cpuid.c
> +++ b/arch/x86/kvm/cpuid.c
> @@ -248,8 +248,8 @@ static int do_cpuid_ent(struct kvm_cpuid_entry2 *entry, u32 function,
>  
>  	/* cpuid 7.0.ebx */
>  	const u32 kvm_supported_word9_x86_features =
> -		F(FSGSBASE) | F(BMI1) | F(HLE) | F(AVX2) | F(SMEP) |
> -		F(BMI2) | F(ERMS) | f_invpcid | F(RTM);
> +		F(FSGSBASE) | F(TSC_ADJUST) | F(BMI1) | F(HLE) |
> +		F(AVX2) | F(SMEP) | F(BMI2) | F(ERMS) | f_invpcid | F(RTM);
>  
>  	/* all calls to cpuid_count() should be made on the same cpu */
>  	get_cpu();
> diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
> index c00f03d..35d11b3 100644
> --- a/arch/x86/kvm/vmx.c
> +++ b/arch/x86/kvm/vmx.c
> @@ -2173,6 +2173,9 @@ static int vmx_get_msr(struct kvm_vcpu *vcpu, u32 msr_index, u64 *pdata)
>  	case MSR_IA32_SYSENTER_ESP:
>  		data = vmcs_readl(GUEST_SYSENTER_ESP);
>  		break;
> +	case MSR_TSC_ADJUST:
> +		data = (u64)vcpu->arch.tsc_adjust;
> +		break;
>  	case MSR_TSC_AUX:
>  		if (!to_vmx(vcpu)->rdtscp_enabled)
>  			return 1;
> @@ -2241,6 +2244,13 @@ static int vmx_set_msr(struct kvm_vcpu *vcpu, u32 msr_index, u64 data)
>  		}
>  		ret = kvm_set_msr_common(vcpu, msr_index, data);
>  		break;
> +	case MSR_TSC_ADJUST:
> +#define DUMMY 1
> +		vmx_adjust_tsc_offset(vcpu,
> +				(s64)(data-vcpu->arch.tsc_adjust),
> +				(bool)DUMMY);
> +		vcpu->arch.tsc_adjust = (s64)data;
> +		break;
>  	case MSR_TSC_AUX:
>  		if (!vmx->rdtscp_enabled)
>  			return 1;
> @@ -3931,6 +3941,8 @@ static int vmx_vcpu_reset(struct kvm_vcpu *vcpu)
>  
>  	vcpu->arch.regs_avail = ~((1 << VCPU_REGS_RIP) | (1 << VCPU_REGS_RSP));
>  
> +	vcpu->arch.tsc_adjust = 0x0;
> +
>  	vmx->rmode.vm86_active = 0;
>  
>  	vmx->soft_vnmi_blocked = 0;
> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> index 42bce48..6c50f6c 100644
> --- a/arch/x86/kvm/x86.c
> +++ b/arch/x86/kvm/x86.c
> @@ -824,6 +824,7 @@ static u32 msrs_to_save[] = {
>  static unsigned num_msrs_to_save;
>  
>  static u32 emulated_msrs[] = {
> +	MSR_TSC_ADJUST,
>  	MSR_IA32_TSCDEADLINE,
>  	MSR_IA32_MISC_ENABLE,
>  	MSR_IA32_MCG_STATUS,
> -- 
> 1.7.1
> 
> --
> To unsubscribe from this list: send the line "unsubscribe kvm" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 23+ messages in thread

* RE: [PATCH] Enabling IA32_TSC_ADJUST for guest VM
  2012-09-26 21:34 ` Marcelo Tosatti
@ 2012-09-26 22:58   ` Auld, Will
  2012-09-27  0:29     ` Marcelo Tosatti
  0 siblings, 1 reply; 23+ messages in thread
From: Auld, Will @ 2012-09-26 22:58 UTC (permalink / raw)
  To: Marcelo Tosatti; +Cc: kvm, Avi Kivity, Zhang, Xiantao, Auld, Will, Liu, Jinsong

Avi, Still working on your suggestions.

Marcelo,

The purpose is to be able to run guests that implement this change and not require they revert to the older method of adjusting the TSC. I am making no assumption about whether the guest checks to see if the times are good enough or just runs an algorithm every time but in any case this would allow the simpler, cleaner and less expensive algorithm to run if it exists. 

Thanks,

Will

>The purpose of the IA32_TSC_ADJUST control is to make it easier for the operating system >(host) to decrease the delta between cores to an acceptable value, so that applications >can make use of direct RDTSC, correct?
>
>Why is it necessary for the guests to make use of such interface, if the hypervisor >could provide proper TSC?
>
>(not against exposing it to the guests, just thinking out loud).
>
>That is, if the purpose of the IA32_TSC_ADJUST is to provide proper synchronized TSC >across cores, and newer guests which should already make use of paravirt clock >interface, what is the point of exposing the feature?

-----Original Message-----
From: Marcelo Tosatti [mailto:mtosatti@redhat.com] 
Sent: Wednesday, September 26, 2012 2:35 PM
To: Auld, Will
Cc: kvm@vger.kernel.org; Avi Kivity; Zhang, Xiantao
Subject: Re: [PATCH] Enabling IA32_TSC_ADJUST for guest VM

On Wed, Sep 19, 2012 at 05:44:46PM +0000, Auld, Will wrote:
> >From 9982bb73460b05c1328068aae047b14b2294e2da Mon Sep 17 00:00:00 
> >2001
> From: Will Auld <will.auld@intel.com>
> Date: Wed, 12 Sep 2012 18:10:56 -0700
> Subject: [PATCH] Enabling IA32_TSC_ADJUST for guest VM
> 
> CPUID.7.0.EBX[1]=1 indicates IA32_TSC_ADJUST MSR 0x3b is supported
> 
> Basic design is to emulate the MSR by allowing reads and writes to a guest vcpu specific location to store the value of the emulated MSR while adding the value to the vmcs tsc_offset. In this way the IA32_TSC_ADJUST value will be included in all reads to the TSC MSR whether through rdmsr or rdtsc. This is of course as long as the "use TSC counter offsetting" VM-execution control is enabled as well as the IA32_TSC_ADJUST control.
> 
> However, because hardware will only return the TSC + IA32_TSC_ADJUST + vmsc tsc_offset for a guest process when it does and rdtsc (with the correct settings) the value of our virtualized IA32_TSC_ADJUST must be stored in one of these three locations.

The purpose of the IA32_TSC_ADJUST control is to make it easier for the operating system (host) to decrease the delta between cores to an acceptable value, so that applications can make use of direct RDTSC, correct?

Why is it necessary for the guests to make use of such interface, if the hypervisor could provide proper TSC?

(not against exposing it to the guests, just thinking out loud).

That is, if the purpose of the IA32_TSC_ADJUST is to provide proper synchronized TSC across cores, and newer guests which should already make use of paravirt clock interface, what is the point of exposing the feature?

>  The argument against storing it in the actual MSR is performance.
>  This is likely to be seldom used while the save/restore is required  
> on every transition. IA32_TSC_ADJUST was created as a way to solve  
> some issues with writing TSC itself so that is not an option  either. 
> The remaining option, defined above as our solution has  the problem 
> of returning incorrect vmcs tsc_offset values (unless  we intercept 
> and fix, not done here) as mentioned above. However,  more problematic 
> is that storing the data in vmcs tsc_offset will  have a different 
> semantic effect on the system than does using  the actual MSR. This is 
> illustrated in the following example: The  hypervisor set the 
> IA32_TSC_ADJUST, then the guest sets it and a  guest process perfor! 
> ms a rdtsc. In this case the guest process will  get TSC + 
> IA32_TSC_ADJUST_hyperviser + vmsc tsc_offset including  
> IA32_TSC_ADJUST_guest. While the total system semantics changed the  
> semantics as seen by the guest do not and hence this will not cause a  
> problem.
> ---

>  arch/x86/include/asm/cpufeature.h |    1 +
>  arch/x86/include/asm/kvm_host.h   |    2 ++
>  arch/x86/include/asm/msr-index.h  |    1 +
>  arch/x86/kvm/cpuid.c              |    4 ++--
>  arch/x86/kvm/vmx.c                |   12 ++++++++++++
>  arch/x86/kvm/x86.c                |    1 +
>  6 files changed, 19 insertions(+), 2 deletions(-)
> 
> diff --git a/arch/x86/include/asm/cpufeature.h 
> b/arch/x86/include/asm/cpufeature.h
> index 6b7ee5f..e574d81 100644
> --- a/arch/x86/include/asm/cpufeature.h
> +++ b/arch/x86/include/asm/cpufeature.h
> @@ -199,6 +199,7 @@
>  
>  /* Intel-defined CPU features, CPUID level 0x00000007:0 (ebx), word 9 */
>  #define X86_FEATURE_FSGSBASE	(9*32+ 0) /* {RD/WR}{FS/GS}BASE instructions*/
> +#define X86_FEATURE_TSC_ADJUST  (9*32+ 1) /* TSC adjustment MSR 0x3b 
> +*/
>  #define X86_FEATURE_BMI1	(9*32+ 3) /* 1st group bit manipulation extensions */
>  #define X86_FEATURE_HLE		(9*32+ 4) /* Hardware Lock Elision */
>  #define X86_FEATURE_AVX2	(9*32+ 5) /* AVX2 instructions */
> diff --git a/arch/x86/include/asm/kvm_host.h 
> b/arch/x86/include/asm/kvm_host.h index 09155d6..8a001a4 100644
> --- a/arch/x86/include/asm/kvm_host.h
> +++ b/arch/x86/include/asm/kvm_host.h
> @@ -442,6 +442,8 @@ struct kvm_vcpu_arch {
>  	u32 virtual_tsc_mult;
>  	u32 virtual_tsc_khz;
>  
> +	s64 tsc_adjust;
> +
>  	atomic_t nmi_queued;  /* unprocessed asynchronous NMIs */
>  	unsigned nmi_pending; /* NMI queued after currently running handler */
>  	bool nmi_injected;    /* Trying to inject an NMI this entry */
> diff --git a/arch/x86/include/asm/msr-index.h 
> b/arch/x86/include/asm/msr-index.h
> index 957ec87..8e82e29 100644
> --- a/arch/x86/include/asm/msr-index.h
> +++ b/arch/x86/include/asm/msr-index.h
> @@ -231,6 +231,7 @@
>  #define MSR_IA32_EBL_CR_POWERON		0x0000002a
>  #define MSR_EBC_FREQUENCY_ID		0x0000002c
>  #define MSR_IA32_FEATURE_CONTROL        0x0000003a
> +#define MSR_TSC_ADJUST				0x0000003b
>  
>  #define FEATURE_CONTROL_LOCKED				(1<<0)
>  #define FEATURE_CONTROL_VMXON_ENABLED_INSIDE_SMX	(1<<1)
> diff --git a/arch/x86/kvm/cpuid.c b/arch/x86/kvm/cpuid.c index 
> 0595f13..8f5943e 100644
> --- a/arch/x86/kvm/cpuid.c
> +++ b/arch/x86/kvm/cpuid.c
> @@ -248,8 +248,8 @@ static int do_cpuid_ent(struct kvm_cpuid_entry2 
> *entry, u32 function,
>  
>  	/* cpuid 7.0.ebx */
>  	const u32 kvm_supported_word9_x86_features =
> -		F(FSGSBASE) | F(BMI1) | F(HLE) | F(AVX2) | F(SMEP) |
> -		F(BMI2) | F(ERMS) | f_invpcid | F(RTM);
> +		F(FSGSBASE) | F(TSC_ADJUST) | F(BMI1) | F(HLE) |
> +		F(AVX2) | F(SMEP) | F(BMI2) | F(ERMS) | f_invpcid | F(RTM);
>  
>  	/* all calls to cpuid_count() should be made on the same cpu */
>  	get_cpu();
> diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c index 
> c00f03d..35d11b3 100644
> --- a/arch/x86/kvm/vmx.c
> +++ b/arch/x86/kvm/vmx.c
> @@ -2173,6 +2173,9 @@ static int vmx_get_msr(struct kvm_vcpu *vcpu, u32 msr_index, u64 *pdata)
>  	case MSR_IA32_SYSENTER_ESP:
>  		data = vmcs_readl(GUEST_SYSENTER_ESP);
>  		break;
> +	case MSR_TSC_ADJUST:
> +		data = (u64)vcpu->arch.tsc_adjust;
> +		break;
>  	case MSR_TSC_AUX:
>  		if (!to_vmx(vcpu)->rdtscp_enabled)
>  			return 1;
> @@ -2241,6 +2244,13 @@ static int vmx_set_msr(struct kvm_vcpu *vcpu, u32 msr_index, u64 data)
>  		}
>  		ret = kvm_set_msr_common(vcpu, msr_index, data);
>  		break;
> +	case MSR_TSC_ADJUST:
> +#define DUMMY 1
> +		vmx_adjust_tsc_offset(vcpu,
> +				(s64)(data-vcpu->arch.tsc_adjust),
> +				(bool)DUMMY);
> +		vcpu->arch.tsc_adjust = (s64)data;
> +		break;
>  	case MSR_TSC_AUX:
>  		if (!vmx->rdtscp_enabled)
>  			return 1;
> @@ -3931,6 +3941,8 @@ static int vmx_vcpu_reset(struct kvm_vcpu *vcpu)
>  
>  	vcpu->arch.regs_avail = ~((1 << VCPU_REGS_RIP) | (1 << 
> VCPU_REGS_RSP));
>  
> +	vcpu->arch.tsc_adjust = 0x0;
> +
>  	vmx->rmode.vm86_active = 0;
>  
>  	vmx->soft_vnmi_blocked = 0;
> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c index 
> 42bce48..6c50f6c 100644
> --- a/arch/x86/kvm/x86.c
> +++ b/arch/x86/kvm/x86.c
> @@ -824,6 +824,7 @@ static u32 msrs_to_save[] = {  static unsigned 
> num_msrs_to_save;
>  
>  static u32 emulated_msrs[] = {
> +	MSR_TSC_ADJUST,
>  	MSR_IA32_TSCDEADLINE,
>  	MSR_IA32_MISC_ENABLE,
>  	MSR_IA32_MCG_STATUS,
> --
> 1.7.1
> 
> --
> To unsubscribe from this list: send the line "unsubscribe kvm" in the 
> body of a message to majordomo@vger.kernel.org More majordomo info at  
> http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH] Enabling IA32_TSC_ADJUST for guest VM
  2012-09-26 22:58   ` Auld, Will
@ 2012-09-27  0:29     ` Marcelo Tosatti
  2012-09-27  0:30       ` Marcelo Tosatti
  2012-09-27  0:50       ` Auld, Will
  0 siblings, 2 replies; 23+ messages in thread
From: Marcelo Tosatti @ 2012-09-27  0:29 UTC (permalink / raw)
  To: Auld, Will; +Cc: kvm, Avi Kivity, Zhang, Xiantao, Liu, Jinsong

On Wed, Sep 26, 2012 at 10:58:46PM +0000, Auld, Will wrote:
> Avi, Still working on your suggestions.
> 
> Marcelo,
> 
> The purpose is to be able to run guests that implement this change and not require they revert to the older method of adjusting the TSC. I am making no assumption about whether the guest checks to see if the times are good enough or just runs an algorithm every time but in any case this would allow the simpler, cleaner and less expensive algorithm to run if it exists. 

Will, you can choose to not expose the feature. Correct?

Because this conflicts with the model that has been envisioned and
developed by Zachary... for that model to continue to be functional
you'll have to make sure the TSC emulation is adjusted accordingly to
consider IA32_TSC_ADJUST (for example, when trapping TSC).

>From that point of view, the patch below is incomplete.

... or KVM can choose to never expose the feature via CPUID and handle
TSC consistency itself (i understand your perspective of getting a task
complete, but unfortunately from my POV its not so simple).

> Thanks,
> 
> Will
> 
> >The purpose of the IA32_TSC_ADJUST control is to make it easier for the operating system >(host) to decrease the delta between cores to an acceptable value, so that applications >can make use of direct RDTSC, correct?
> >
> >Why is it necessary for the guests to make use of such interface, if the hypervisor >could provide proper TSC?
> >
> >(not against exposing it to the guests, just thinking out loud).
> >
> >That is, if the purpose of the IA32_TSC_ADJUST is to provide proper synchronized TSC >across cores, and newer guests which should already make use of paravirt clock >interface, what is the point of exposing the feature?
> 
> -----Original Message-----
> From: Marcelo Tosatti [mailto:mtosatti@redhat.com] 
> Sent: Wednesday, September 26, 2012 2:35 PM
> To: Auld, Will
> Cc: kvm@vger.kernel.org; Avi Kivity; Zhang, Xiantao
> Subject: Re: [PATCH] Enabling IA32_TSC_ADJUST for guest VM
> 
> On Wed, Sep 19, 2012 at 05:44:46PM +0000, Auld, Will wrote:
> > >From 9982bb73460b05c1328068aae047b14b2294e2da Mon Sep 17 00:00:00 
> > >2001
> > From: Will Auld <will.auld@intel.com>
> > Date: Wed, 12 Sep 2012 18:10:56 -0700
> > Subject: [PATCH] Enabling IA32_TSC_ADJUST for guest VM
> > 
> > CPUID.7.0.EBX[1]=1 indicates IA32_TSC_ADJUST MSR 0x3b is supported
> > 
> > Basic design is to emulate the MSR by allowing reads and writes to a guest vcpu specific location to store the value of the emulated MSR while adding the value to the vmcs tsc_offset. In this way the IA32_TSC_ADJUST value will be included in all reads to the TSC MSR whether through rdmsr or rdtsc. This is of course as long as the "use TSC counter offsetting" VM-execution control is enabled as well as the IA32_TSC_ADJUST control.
> > 
> > However, because hardware will only return the TSC + IA32_TSC_ADJUST + vmsc tsc_offset for a guest process when it does and rdtsc (with the correct settings) the value of our virtualized IA32_TSC_ADJUST must be stored in one of these three locations.
> 
> The purpose of the IA32_TSC_ADJUST control is to make it easier for the operating system (host) to decrease the delta between cores to an acceptable value, so that applications can make use of direct RDTSC, correct?
> 
> Why is it necessary for the guests to make use of such interface, if the hypervisor could provide proper TSC?
> 
> (not against exposing it to the guests, just thinking out loud).
> 
> That is, if the purpose of the IA32_TSC_ADJUST is to provide proper synchronized TSC across cores, and newer guests which should already make use of paravirt clock interface, what is the point of exposing the feature?
> 
> >  The argument against storing it in the actual MSR is performance.
> >  This is likely to be seldom used while the save/restore is required  
> > on every transition. IA32_TSC_ADJUST was created as a way to solve  
> > some issues with writing TSC itself so that is not an option  either. 
> > The remaining option, defined above as our solution has  the problem 
> > of returning incorrect vmcs tsc_offset values (unless  we intercept 
> > and fix, not done here) as mentioned above. However,  more problematic 
> > is that storing the data in vmcs tsc_offset will  have a different 
> > semantic effect on the system than does using  the actual MSR. This is 
> > illustrated in the following example: The  hypervisor set the 
> > IA32_TSC_ADJUST, then the guest sets it and a  guest process perfor! 
> > ms a rdtsc. In this case the guest process will  get TSC + 
> > IA32_TSC_ADJUST_hyperviser + vmsc tsc_offset including  
> > IA32_TSC_ADJUST_guest. While the total system semantics changed the  
> > semantics as seen by the guest do not and hence this will not cause a  
> > problem.
> > ---
> 
> >  arch/x86/include/asm/cpufeature.h |    1 +
> >  arch/x86/include/asm/kvm_host.h   |    2 ++
> >  arch/x86/include/asm/msr-index.h  |    1 +
> >  arch/x86/kvm/cpuid.c              |    4 ++--
> >  arch/x86/kvm/vmx.c                |   12 ++++++++++++
> >  arch/x86/kvm/x86.c                |    1 +
> >  6 files changed, 19 insertions(+), 2 deletions(-)
> > 
> > diff --git a/arch/x86/include/asm/cpufeature.h 
> > b/arch/x86/include/asm/cpufeature.h
> > index 6b7ee5f..e574d81 100644
> > --- a/arch/x86/include/asm/cpufeature.h
> > +++ b/arch/x86/include/asm/cpufeature.h
> > @@ -199,6 +199,7 @@
> >  
> >  /* Intel-defined CPU features, CPUID level 0x00000007:0 (ebx), word 9 */
> >  #define X86_FEATURE_FSGSBASE	(9*32+ 0) /* {RD/WR}{FS/GS}BASE instructions*/
> > +#define X86_FEATURE_TSC_ADJUST  (9*32+ 1) /* TSC adjustment MSR 0x3b 
> > +*/
> >  #define X86_FEATURE_BMI1	(9*32+ 3) /* 1st group bit manipulation extensions */
> >  #define X86_FEATURE_HLE		(9*32+ 4) /* Hardware Lock Elision */
> >  #define X86_FEATURE_AVX2	(9*32+ 5) /* AVX2 instructions */
> > diff --git a/arch/x86/include/asm/kvm_host.h 
> > b/arch/x86/include/asm/kvm_host.h index 09155d6..8a001a4 100644
> > --- a/arch/x86/include/asm/kvm_host.h
> > +++ b/arch/x86/include/asm/kvm_host.h
> > @@ -442,6 +442,8 @@ struct kvm_vcpu_arch {
> >  	u32 virtual_tsc_mult;
> >  	u32 virtual_tsc_khz;
> >  
> > +	s64 tsc_adjust;
> > +
> >  	atomic_t nmi_queued;  /* unprocessed asynchronous NMIs */
> >  	unsigned nmi_pending; /* NMI queued after currently running handler */
> >  	bool nmi_injected;    /* Trying to inject an NMI this entry */
> > diff --git a/arch/x86/include/asm/msr-index.h 
> > b/arch/x86/include/asm/msr-index.h
> > index 957ec87..8e82e29 100644
> > --- a/arch/x86/include/asm/msr-index.h
> > +++ b/arch/x86/include/asm/msr-index.h
> > @@ -231,6 +231,7 @@
> >  #define MSR_IA32_EBL_CR_POWERON		0x0000002a
> >  #define MSR_EBC_FREQUENCY_ID		0x0000002c
> >  #define MSR_IA32_FEATURE_CONTROL        0x0000003a
> > +#define MSR_TSC_ADJUST				0x0000003b
> >  
> >  #define FEATURE_CONTROL_LOCKED				(1<<0)
> >  #define FEATURE_CONTROL_VMXON_ENABLED_INSIDE_SMX	(1<<1)
> > diff --git a/arch/x86/kvm/cpuid.c b/arch/x86/kvm/cpuid.c index 
> > 0595f13..8f5943e 100644
> > --- a/arch/x86/kvm/cpuid.c
> > +++ b/arch/x86/kvm/cpuid.c
> > @@ -248,8 +248,8 @@ static int do_cpuid_ent(struct kvm_cpuid_entry2 
> > *entry, u32 function,
> >  
> >  	/* cpuid 7.0.ebx */
> >  	const u32 kvm_supported_word9_x86_features =
> > -		F(FSGSBASE) | F(BMI1) | F(HLE) | F(AVX2) | F(SMEP) |
> > -		F(BMI2) | F(ERMS) | f_invpcid | F(RTM);
> > +		F(FSGSBASE) | F(TSC_ADJUST) | F(BMI1) | F(HLE) |
> > +		F(AVX2) | F(SMEP) | F(BMI2) | F(ERMS) | f_invpcid | F(RTM);
> >  
> >  	/* all calls to cpuid_count() should be made on the same cpu */
> >  	get_cpu();
> > diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c index 
> > c00f03d..35d11b3 100644
> > --- a/arch/x86/kvm/vmx.c
> > +++ b/arch/x86/kvm/vmx.c
> > @@ -2173,6 +2173,9 @@ static int vmx_get_msr(struct kvm_vcpu *vcpu, u32 msr_index, u64 *pdata)
> >  	case MSR_IA32_SYSENTER_ESP:
> >  		data = vmcs_readl(GUEST_SYSENTER_ESP);
> >  		break;
> > +	case MSR_TSC_ADJUST:
> > +		data = (u64)vcpu->arch.tsc_adjust;
> > +		break;
> >  	case MSR_TSC_AUX:
> >  		if (!to_vmx(vcpu)->rdtscp_enabled)
> >  			return 1;
> > @@ -2241,6 +2244,13 @@ static int vmx_set_msr(struct kvm_vcpu *vcpu, u32 msr_index, u64 data)
> >  		}
> >  		ret = kvm_set_msr_common(vcpu, msr_index, data);
> >  		break;
> > +	case MSR_TSC_ADJUST:
> > +#define DUMMY 1
> > +		vmx_adjust_tsc_offset(vcpu,
> > +				(s64)(data-vcpu->arch.tsc_adjust),
> > +				(bool)DUMMY);
> > +		vcpu->arch.tsc_adjust = (s64)data;
> > +		break;
> >  	case MSR_TSC_AUX:
> >  		if (!vmx->rdtscp_enabled)
> >  			return 1;
> > @@ -3931,6 +3941,8 @@ static int vmx_vcpu_reset(struct kvm_vcpu *vcpu)
> >  
> >  	vcpu->arch.regs_avail = ~((1 << VCPU_REGS_RIP) | (1 << 
> > VCPU_REGS_RSP));
> >  
> > +	vcpu->arch.tsc_adjust = 0x0;
> > +
> >  	vmx->rmode.vm86_active = 0;
> >  
> >  	vmx->soft_vnmi_blocked = 0;
> > diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c index 
> > 42bce48..6c50f6c 100644
> > --- a/arch/x86/kvm/x86.c
> > +++ b/arch/x86/kvm/x86.c
> > @@ -824,6 +824,7 @@ static u32 msrs_to_save[] = {  static unsigned 
> > num_msrs_to_save;
> >  
> >  static u32 emulated_msrs[] = {
> > +	MSR_TSC_ADJUST,
> >  	MSR_IA32_TSCDEADLINE,
> >  	MSR_IA32_MISC_ENABLE,
> >  	MSR_IA32_MCG_STATUS,
> > --
> > 1.7.1
> > 
> > --
> > To unsubscribe from this list: send the line "unsubscribe kvm" in the 
> > body of a message to majordomo@vger.kernel.org More majordomo info at  
> > http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH] Enabling IA32_TSC_ADJUST for guest VM
  2012-09-27  0:29     ` Marcelo Tosatti
@ 2012-09-27  0:30       ` Marcelo Tosatti
  2012-09-27  0:50       ` Auld, Will
  1 sibling, 0 replies; 23+ messages in thread
From: Marcelo Tosatti @ 2012-09-27  0:30 UTC (permalink / raw)
  To: Auld, Will; +Cc: kvm, Avi Kivity, Zhang, Xiantao, Liu, Jinsong

On Wed, Sep 26, 2012 at 09:29:29PM -0300, Marcelo Tosatti wrote:
> On Wed, Sep 26, 2012 at 10:58:46PM +0000, Auld, Will wrote:
> > Avi, Still working on your suggestions.
> > 
> > Marcelo,
> > 
> > The purpose is to be able to run guests that implement this change and not require they revert to the older method of adjusting the TSC. I am making no assumption about whether the guest checks to see if the times are good enough or just runs an algorithm every time but in any case this would allow the simpler, cleaner and less expensive algorithm to run if it exists. 
> 
> Will, you can choose to not expose the feature. Correct?
> 
> Because this conflicts with the model that has been envisioned and
> developed by Zachary... for that model to continue to be functional
> you'll have to make sure the TSC emulation is adjusted accordingly to
> consider IA32_TSC_ADJUST (for example, when trapping TSC).
> 
> From that point of view, the patch below is incomplete.
> 
> ... or KVM can choose to never expose the feature via CPUID and handle
> TSC consistency itself (i understand your perspective of getting a task
> complete, but unfortunately from my POV its not so simple).

BTW, do we have patches for the Linux host to make use of this?

Is there anyone from Intel working on them?

> 
> > Thanks,
> > 
> > Will
> > 
> > >The purpose of the IA32_TSC_ADJUST control is to make it easier for the operating system >(host) to decrease the delta between cores to an acceptable value, so that applications >can make use of direct RDTSC, correct?
> > >
> > >Why is it necessary for the guests to make use of such interface, if the hypervisor >could provide proper TSC?
> > >
> > >(not against exposing it to the guests, just thinking out loud).
> > >
> > >That is, if the purpose of the IA32_TSC_ADJUST is to provide proper synchronized TSC >across cores, and newer guests which should already make use of paravirt clock >interface, what is the point of exposing the feature?
> > 
> > -----Original Message-----
> > From: Marcelo Tosatti [mailto:mtosatti@redhat.com] 
> > Sent: Wednesday, September 26, 2012 2:35 PM
> > To: Auld, Will
> > Cc: kvm@vger.kernel.org; Avi Kivity; Zhang, Xiantao
> > Subject: Re: [PATCH] Enabling IA32_TSC_ADJUST for guest VM
> > 
> > On Wed, Sep 19, 2012 at 05:44:46PM +0000, Auld, Will wrote:
> > > >From 9982bb73460b05c1328068aae047b14b2294e2da Mon Sep 17 00:00:00 
> > > >2001
> > > From: Will Auld <will.auld@intel.com>
> > > Date: Wed, 12 Sep 2012 18:10:56 -0700
> > > Subject: [PATCH] Enabling IA32_TSC_ADJUST for guest VM
> > > 
> > > CPUID.7.0.EBX[1]=1 indicates IA32_TSC_ADJUST MSR 0x3b is supported
> > > 
> > > Basic design is to emulate the MSR by allowing reads and writes to a guest vcpu specific location to store the value of the emulated MSR while adding the value to the vmcs tsc_offset. In this way the IA32_TSC_ADJUST value will be included in all reads to the TSC MSR whether through rdmsr or rdtsc. This is of course as long as the "use TSC counter offsetting" VM-execution control is enabled as well as the IA32_TSC_ADJUST control.
> > > 
> > > However, because hardware will only return the TSC + IA32_TSC_ADJUST + vmsc tsc_offset for a guest process when it does and rdtsc (with the correct settings) the value of our virtualized IA32_TSC_ADJUST must be stored in one of these three locations.
> > 
> > The purpose of the IA32_TSC_ADJUST control is to make it easier for the operating system (host) to decrease the delta between cores to an acceptable value, so that applications can make use of direct RDTSC, correct?
> > 
> > Why is it necessary for the guests to make use of such interface, if the hypervisor could provide proper TSC?
> > 
> > (not against exposing it to the guests, just thinking out loud).
> > 
> > That is, if the purpose of the IA32_TSC_ADJUST is to provide proper synchronized TSC across cores, and newer guests which should already make use of paravirt clock interface, what is the point of exposing the feature?
> > 
> > >  The argument against storing it in the actual MSR is performance.
> > >  This is likely to be seldom used while the save/restore is required  
> > > on every transition. IA32_TSC_ADJUST was created as a way to solve  
> > > some issues with writing TSC itself so that is not an option  either. 
> > > The remaining option, defined above as our solution has  the problem 
> > > of returning incorrect vmcs tsc_offset values (unless  we intercept 
> > > and fix, not done here) as mentioned above. However,  more problematic 
> > > is that storing the data in vmcs tsc_offset will  have a different 
> > > semantic effect on the system than does using  the actual MSR. This is 
> > > illustrated in the following example: The  hypervisor set the 
> > > IA32_TSC_ADJUST, then the guest sets it and a  guest process perfor! 
> > > ms a rdtsc. In this case the guest process will  get TSC + 
> > > IA32_TSC_ADJUST_hyperviser + vmsc tsc_offset including  
> > > IA32_TSC_ADJUST_guest. While the total system semantics changed the  
> > > semantics as seen by the guest do not and hence this will not cause a  
> > > problem.
> > > ---
> > 
> > >  arch/x86/include/asm/cpufeature.h |    1 +
> > >  arch/x86/include/asm/kvm_host.h   |    2 ++
> > >  arch/x86/include/asm/msr-index.h  |    1 +
> > >  arch/x86/kvm/cpuid.c              |    4 ++--
> > >  arch/x86/kvm/vmx.c                |   12 ++++++++++++
> > >  arch/x86/kvm/x86.c                |    1 +
> > >  6 files changed, 19 insertions(+), 2 deletions(-)
> > > 
> > > diff --git a/arch/x86/include/asm/cpufeature.h 
> > > b/arch/x86/include/asm/cpufeature.h
> > > index 6b7ee5f..e574d81 100644
> > > --- a/arch/x86/include/asm/cpufeature.h
> > > +++ b/arch/x86/include/asm/cpufeature.h
> > > @@ -199,6 +199,7 @@
> > >  
> > >  /* Intel-defined CPU features, CPUID level 0x00000007:0 (ebx), word 9 */
> > >  #define X86_FEATURE_FSGSBASE	(9*32+ 0) /* {RD/WR}{FS/GS}BASE instructions*/
> > > +#define X86_FEATURE_TSC_ADJUST  (9*32+ 1) /* TSC adjustment MSR 0x3b 
> > > +*/
> > >  #define X86_FEATURE_BMI1	(9*32+ 3) /* 1st group bit manipulation extensions */
> > >  #define X86_FEATURE_HLE		(9*32+ 4) /* Hardware Lock Elision */
> > >  #define X86_FEATURE_AVX2	(9*32+ 5) /* AVX2 instructions */
> > > diff --git a/arch/x86/include/asm/kvm_host.h 
> > > b/arch/x86/include/asm/kvm_host.h index 09155d6..8a001a4 100644
> > > --- a/arch/x86/include/asm/kvm_host.h
> > > +++ b/arch/x86/include/asm/kvm_host.h
> > > @@ -442,6 +442,8 @@ struct kvm_vcpu_arch {
> > >  	u32 virtual_tsc_mult;
> > >  	u32 virtual_tsc_khz;
> > >  
> > > +	s64 tsc_adjust;
> > > +
> > >  	atomic_t nmi_queued;  /* unprocessed asynchronous NMIs */
> > >  	unsigned nmi_pending; /* NMI queued after currently running handler */
> > >  	bool nmi_injected;    /* Trying to inject an NMI this entry */
> > > diff --git a/arch/x86/include/asm/msr-index.h 
> > > b/arch/x86/include/asm/msr-index.h
> > > index 957ec87..8e82e29 100644
> > > --- a/arch/x86/include/asm/msr-index.h
> > > +++ b/arch/x86/include/asm/msr-index.h
> > > @@ -231,6 +231,7 @@
> > >  #define MSR_IA32_EBL_CR_POWERON		0x0000002a
> > >  #define MSR_EBC_FREQUENCY_ID		0x0000002c
> > >  #define MSR_IA32_FEATURE_CONTROL        0x0000003a
> > > +#define MSR_TSC_ADJUST				0x0000003b
> > >  
> > >  #define FEATURE_CONTROL_LOCKED				(1<<0)
> > >  #define FEATURE_CONTROL_VMXON_ENABLED_INSIDE_SMX	(1<<1)
> > > diff --git a/arch/x86/kvm/cpuid.c b/arch/x86/kvm/cpuid.c index 
> > > 0595f13..8f5943e 100644
> > > --- a/arch/x86/kvm/cpuid.c
> > > +++ b/arch/x86/kvm/cpuid.c
> > > @@ -248,8 +248,8 @@ static int do_cpuid_ent(struct kvm_cpuid_entry2 
> > > *entry, u32 function,
> > >  
> > >  	/* cpuid 7.0.ebx */
> > >  	const u32 kvm_supported_word9_x86_features =
> > > -		F(FSGSBASE) | F(BMI1) | F(HLE) | F(AVX2) | F(SMEP) |
> > > -		F(BMI2) | F(ERMS) | f_invpcid | F(RTM);
> > > +		F(FSGSBASE) | F(TSC_ADJUST) | F(BMI1) | F(HLE) |
> > > +		F(AVX2) | F(SMEP) | F(BMI2) | F(ERMS) | f_invpcid | F(RTM);
> > >  
> > >  	/* all calls to cpuid_count() should be made on the same cpu */
> > >  	get_cpu();
> > > diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c index 
> > > c00f03d..35d11b3 100644
> > > --- a/arch/x86/kvm/vmx.c
> > > +++ b/arch/x86/kvm/vmx.c
> > > @@ -2173,6 +2173,9 @@ static int vmx_get_msr(struct kvm_vcpu *vcpu, u32 msr_index, u64 *pdata)
> > >  	case MSR_IA32_SYSENTER_ESP:
> > >  		data = vmcs_readl(GUEST_SYSENTER_ESP);
> > >  		break;
> > > +	case MSR_TSC_ADJUST:
> > > +		data = (u64)vcpu->arch.tsc_adjust;
> > > +		break;
> > >  	case MSR_TSC_AUX:
> > >  		if (!to_vmx(vcpu)->rdtscp_enabled)
> > >  			return 1;
> > > @@ -2241,6 +2244,13 @@ static int vmx_set_msr(struct kvm_vcpu *vcpu, u32 msr_index, u64 data)
> > >  		}
> > >  		ret = kvm_set_msr_common(vcpu, msr_index, data);
> > >  		break;
> > > +	case MSR_TSC_ADJUST:
> > > +#define DUMMY 1
> > > +		vmx_adjust_tsc_offset(vcpu,
> > > +				(s64)(data-vcpu->arch.tsc_adjust),
> > > +				(bool)DUMMY);
> > > +		vcpu->arch.tsc_adjust = (s64)data;
> > > +		break;
> > >  	case MSR_TSC_AUX:
> > >  		if (!vmx->rdtscp_enabled)
> > >  			return 1;
> > > @@ -3931,6 +3941,8 @@ static int vmx_vcpu_reset(struct kvm_vcpu *vcpu)
> > >  
> > >  	vcpu->arch.regs_avail = ~((1 << VCPU_REGS_RIP) | (1 << 
> > > VCPU_REGS_RSP));
> > >  
> > > +	vcpu->arch.tsc_adjust = 0x0;
> > > +
> > >  	vmx->rmode.vm86_active = 0;
> > >  
> > >  	vmx->soft_vnmi_blocked = 0;
> > > diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c index 
> > > 42bce48..6c50f6c 100644
> > > --- a/arch/x86/kvm/x86.c
> > > +++ b/arch/x86/kvm/x86.c
> > > @@ -824,6 +824,7 @@ static u32 msrs_to_save[] = {  static unsigned 
> > > num_msrs_to_save;
> > >  
> > >  static u32 emulated_msrs[] = {
> > > +	MSR_TSC_ADJUST,
> > >  	MSR_IA32_TSCDEADLINE,
> > >  	MSR_IA32_MISC_ENABLE,
> > >  	MSR_IA32_MCG_STATUS,
> > > --
> > > 1.7.1
> > > 
> > > --
> > > To unsubscribe from this list: send the line "unsubscribe kvm" in the 
> > > body of a message to majordomo@vger.kernel.org More majordomo info at  
> > > http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 23+ messages in thread

* RE: [PATCH] Enabling IA32_TSC_ADJUST for guest VM
  2012-09-27  0:29     ` Marcelo Tosatti
  2012-09-27  0:30       ` Marcelo Tosatti
@ 2012-09-27  0:50       ` Auld, Will
  2012-09-27 11:31         ` Marcelo Tosatti
  1 sibling, 1 reply; 23+ messages in thread
From: Auld, Will @ 2012-09-27  0:50 UTC (permalink / raw)
  To: Marcelo Tosatti; +Cc: kvm, Avi Kivity, Zhang, Xiantao, Liu, Jinsong, Auld, Will

Marcelo,

I think I am missing something. There should be no needed changes to current algorithms that exist today. Does it seem that I have broken Zachary's implementation somehow?

Thanks,

Will

-----Original Message-----
From: Marcelo Tosatti [mailto:mtosatti@redhat.com] 
Sent: Wednesday, September 26, 2012 5:29 PM
To: Auld, Will
Cc: kvm@vger.kernel.org; Avi Kivity; Zhang, Xiantao; Liu, Jinsong
Subject: Re: [PATCH] Enabling IA32_TSC_ADJUST for guest VM

On Wed, Sep 26, 2012 at 10:58:46PM +0000, Auld, Will wrote:
> Avi, Still working on your suggestions.
> 
> Marcelo,
> 
> The purpose is to be able to run guests that implement this change and not require they revert to the older method of adjusting the TSC. I am making no assumption about whether the guest checks to see if the times are good enough or just runs an algorithm every time but in any case this would allow the simpler, cleaner and less expensive algorithm to run if it exists. 

Will, you can choose to not expose the feature. Correct?

Because this conflicts with the model that has been envisioned and developed by Zachary... for that model to continue to be functional you'll have to make sure the TSC emulation is adjusted accordingly to consider IA32_TSC_ADJUST (for example, when trapping TSC).

>From that point of view, the patch below is incomplete.

... or KVM can choose to never expose the feature via CPUID and handle TSC consistency itself (i understand your perspective of getting a task complete, but unfortunately from my POV its not so simple).

> Thanks,
> 
> Will
> 
> >The purpose of the IA32_TSC_ADJUST control is to make it easier for the operating system >(host) to decrease the delta between cores to an acceptable value, so that applications >can make use of direct RDTSC, correct?
> >
> >Why is it necessary for the guests to make use of such interface, if the hypervisor >could provide proper TSC?
> >
> >(not against exposing it to the guests, just thinking out loud).
> >
> >That is, if the purpose of the IA32_TSC_ADJUST is to provide proper synchronized TSC >across cores, and newer guests which should already make use of paravirt clock >interface, what is the point of exposing the feature?
> 
> -----Original Message-----
> From: Marcelo Tosatti [mailto:mtosatti@redhat.com]
> Sent: Wednesday, September 26, 2012 2:35 PM
> To: Auld, Will
> Cc: kvm@vger.kernel.org; Avi Kivity; Zhang, Xiantao
> Subject: Re: [PATCH] Enabling IA32_TSC_ADJUST for guest VM
> 
> On Wed, Sep 19, 2012 at 05:44:46PM +0000, Auld, Will wrote:
> > >From 9982bb73460b05c1328068aae047b14b2294e2da Mon Sep 17 00:00:00
> > >2001
> > From: Will Auld <will.auld@intel.com>
> > Date: Wed, 12 Sep 2012 18:10:56 -0700
> > Subject: [PATCH] Enabling IA32_TSC_ADJUST for guest VM
> > 
> > CPUID.7.0.EBX[1]=1 indicates IA32_TSC_ADJUST MSR 0x3b is supported
> > 
> > Basic design is to emulate the MSR by allowing reads and writes to a guest vcpu specific location to store the value of the emulated MSR while adding the value to the vmcs tsc_offset. In this way the IA32_TSC_ADJUST value will be included in all reads to the TSC MSR whether through rdmsr or rdtsc. This is of course as long as the "use TSC counter offsetting" VM-execution control is enabled as well as the IA32_TSC_ADJUST control.
> > 
> > However, because hardware will only return the TSC + IA32_TSC_ADJUST + vmsc tsc_offset for a guest process when it does and rdtsc (with the correct settings) the value of our virtualized IA32_TSC_ADJUST must be stored in one of these three locations.
> 
> The purpose of the IA32_TSC_ADJUST control is to make it easier for the operating system (host) to decrease the delta between cores to an acceptable value, so that applications can make use of direct RDTSC, correct?
> 
> Why is it necessary for the guests to make use of such interface, if the hypervisor could provide proper TSC?
> 
> (not against exposing it to the guests, just thinking out loud).
> 
> That is, if the purpose of the IA32_TSC_ADJUST is to provide proper synchronized TSC across cores, and newer guests which should already make use of paravirt clock interface, what is the point of exposing the feature?
> 
> >  The argument against storing it in the actual MSR is performance.
> >  This is likely to be seldom used while the save/restore is required 
> > on every transition. IA32_TSC_ADJUST was created as a way to solve 
> > some issues with writing TSC itself so that is not an option  either.
> > The remaining option, defined above as our solution has  the problem 
> > of returning incorrect vmcs tsc_offset values (unless  we intercept 
> > and fix, not done here) as mentioned above. However,  more 
> > problematic is that storing the data in vmcs tsc_offset will  have a 
> > different semantic effect on the system than does using  the actual 
> > MSR. This is illustrated in the following example: The  hypervisor 
> > set the IA32_TSC_ADJUST, then the guest sets it and a  guest process perfor!
> > ms a rdtsc. In this case the guest process will  get TSC + 
> > IA32_TSC_ADJUST_hyperviser + vmsc tsc_offset including 
> > IA32_TSC_ADJUST_guest. While the total system semantics changed the 
> > semantics as seen by the guest do not and hence this will not cause 
> > a problem.
> > ---
> 
> >  arch/x86/include/asm/cpufeature.h |    1 +
> >  arch/x86/include/asm/kvm_host.h   |    2 ++
> >  arch/x86/include/asm/msr-index.h  |    1 +
> >  arch/x86/kvm/cpuid.c              |    4 ++--
> >  arch/x86/kvm/vmx.c                |   12 ++++++++++++
> >  arch/x86/kvm/x86.c                |    1 +
> >  6 files changed, 19 insertions(+), 2 deletions(-)
> > 
> > diff --git a/arch/x86/include/asm/cpufeature.h
> > b/arch/x86/include/asm/cpufeature.h
> > index 6b7ee5f..e574d81 100644
> > --- a/arch/x86/include/asm/cpufeature.h
> > +++ b/arch/x86/include/asm/cpufeature.h
> > @@ -199,6 +199,7 @@
> >  
> >  /* Intel-defined CPU features, CPUID level 0x00000007:0 (ebx), word 9 */
> >  #define X86_FEATURE_FSGSBASE	(9*32+ 0) /* {RD/WR}{FS/GS}BASE instructions*/
> > +#define X86_FEATURE_TSC_ADJUST  (9*32+ 1) /* TSC adjustment MSR 
> > +0x3b */
> >  #define X86_FEATURE_BMI1	(9*32+ 3) /* 1st group bit manipulation extensions */
> >  #define X86_FEATURE_HLE		(9*32+ 4) /* Hardware Lock Elision */
> >  #define X86_FEATURE_AVX2	(9*32+ 5) /* AVX2 instructions */
> > diff --git a/arch/x86/include/asm/kvm_host.h 
> > b/arch/x86/include/asm/kvm_host.h index 09155d6..8a001a4 100644
> > --- a/arch/x86/include/asm/kvm_host.h
> > +++ b/arch/x86/include/asm/kvm_host.h
> > @@ -442,6 +442,8 @@ struct kvm_vcpu_arch {
> >  	u32 virtual_tsc_mult;
> >  	u32 virtual_tsc_khz;
> >  
> > +	s64 tsc_adjust;
> > +
> >  	atomic_t nmi_queued;  /* unprocessed asynchronous NMIs */
> >  	unsigned nmi_pending; /* NMI queued after currently running handler */
> >  	bool nmi_injected;    /* Trying to inject an NMI this entry */
> > diff --git a/arch/x86/include/asm/msr-index.h
> > b/arch/x86/include/asm/msr-index.h
> > index 957ec87..8e82e29 100644
> > --- a/arch/x86/include/asm/msr-index.h
> > +++ b/arch/x86/include/asm/msr-index.h
> > @@ -231,6 +231,7 @@
> >  #define MSR_IA32_EBL_CR_POWERON		0x0000002a
> >  #define MSR_EBC_FREQUENCY_ID		0x0000002c
> >  #define MSR_IA32_FEATURE_CONTROL        0x0000003a
> > +#define MSR_TSC_ADJUST				0x0000003b
> >  
> >  #define FEATURE_CONTROL_LOCKED				(1<<0)
> >  #define FEATURE_CONTROL_VMXON_ENABLED_INSIDE_SMX	(1<<1)
> > diff --git a/arch/x86/kvm/cpuid.c b/arch/x86/kvm/cpuid.c index 
> > 0595f13..8f5943e 100644
> > --- a/arch/x86/kvm/cpuid.c
> > +++ b/arch/x86/kvm/cpuid.c
> > @@ -248,8 +248,8 @@ static int do_cpuid_ent(struct kvm_cpuid_entry2 
> > *entry, u32 function,
> >  
> >  	/* cpuid 7.0.ebx */
> >  	const u32 kvm_supported_word9_x86_features =
> > -		F(FSGSBASE) | F(BMI1) | F(HLE) | F(AVX2) | F(SMEP) |
> > -		F(BMI2) | F(ERMS) | f_invpcid | F(RTM);
> > +		F(FSGSBASE) | F(TSC_ADJUST) | F(BMI1) | F(HLE) |
> > +		F(AVX2) | F(SMEP) | F(BMI2) | F(ERMS) | f_invpcid | F(RTM);
> >  
> >  	/* all calls to cpuid_count() should be made on the same cpu */
> >  	get_cpu();
> > diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c index
> > c00f03d..35d11b3 100644
> > --- a/arch/x86/kvm/vmx.c
> > +++ b/arch/x86/kvm/vmx.c
> > @@ -2173,6 +2173,9 @@ static int vmx_get_msr(struct kvm_vcpu *vcpu, u32 msr_index, u64 *pdata)
> >  	case MSR_IA32_SYSENTER_ESP:
> >  		data = vmcs_readl(GUEST_SYSENTER_ESP);
> >  		break;
> > +	case MSR_TSC_ADJUST:
> > +		data = (u64)vcpu->arch.tsc_adjust;
> > +		break;
> >  	case MSR_TSC_AUX:
> >  		if (!to_vmx(vcpu)->rdtscp_enabled)
> >  			return 1;
> > @@ -2241,6 +2244,13 @@ static int vmx_set_msr(struct kvm_vcpu *vcpu, u32 msr_index, u64 data)
> >  		}
> >  		ret = kvm_set_msr_common(vcpu, msr_index, data);
> >  		break;
> > +	case MSR_TSC_ADJUST:
> > +#define DUMMY 1
> > +		vmx_adjust_tsc_offset(vcpu,
> > +				(s64)(data-vcpu->arch.tsc_adjust),
> > +				(bool)DUMMY);
> > +		vcpu->arch.tsc_adjust = (s64)data;
> > +		break;
> >  	case MSR_TSC_AUX:
> >  		if (!vmx->rdtscp_enabled)
> >  			return 1;
> > @@ -3931,6 +3941,8 @@ static int vmx_vcpu_reset(struct kvm_vcpu 
> > *vcpu)
> >  
> >  	vcpu->arch.regs_avail = ~((1 << VCPU_REGS_RIP) | (1 << 
> > VCPU_REGS_RSP));
> >  
> > +	vcpu->arch.tsc_adjust = 0x0;
> > +
> >  	vmx->rmode.vm86_active = 0;
> >  
> >  	vmx->soft_vnmi_blocked = 0;
> > diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c index 
> > 42bce48..6c50f6c 100644
> > --- a/arch/x86/kvm/x86.c
> > +++ b/arch/x86/kvm/x86.c
> > @@ -824,6 +824,7 @@ static u32 msrs_to_save[] = {  static unsigned 
> > num_msrs_to_save;
> >  
> >  static u32 emulated_msrs[] = {
> > +	MSR_TSC_ADJUST,
> >  	MSR_IA32_TSCDEADLINE,
> >  	MSR_IA32_MISC_ENABLE,
> >  	MSR_IA32_MCG_STATUS,
> > --
> > 1.7.1
> > 
> > --
> > To unsubscribe from this list: send the line "unsubscribe kvm" in 
> > the body of a message to majordomo@vger.kernel.org More majordomo 
> > info at http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH] Enabling IA32_TSC_ADJUST for guest VM
  2012-09-27  0:50       ` Auld, Will
@ 2012-09-27 11:31         ` Marcelo Tosatti
  2012-09-27 11:48           ` Marcelo Tosatti
  0 siblings, 1 reply; 23+ messages in thread
From: Marcelo Tosatti @ 2012-09-27 11:31 UTC (permalink / raw)
  To: Auld, Will; +Cc: kvm, Avi Kivity, Zhang, Xiantao, Liu, Jinsong

On Thu, Sep 27, 2012 at 12:50:16AM +0000, Auld, Will wrote:
> Marcelo,
> 
> I think I am missing something. There should be no needed changes to current algorithms that exist today. Does it seem that I have broken Zachary's implementation somehow?

Yes. compute_guest_tsc() function must take ia32_tsc_adjust into
account. guest_read_tsc (and the SVM equivalent) also.


> 
> Thanks,
> 
> Will
> 
> -----Original Message-----
> From: Marcelo Tosatti [mailto:mtosatti@redhat.com] 
> Sent: Wednesday, September 26, 2012 5:29 PM
> To: Auld, Will
> Cc: kvm@vger.kernel.org; Avi Kivity; Zhang, Xiantao; Liu, Jinsong
> Subject: Re: [PATCH] Enabling IA32_TSC_ADJUST for guest VM
> 
> On Wed, Sep 26, 2012 at 10:58:46PM +0000, Auld, Will wrote:
> > Avi, Still working on your suggestions.
> > 
> > Marcelo,
> > 
> > The purpose is to be able to run guests that implement this change and not require they revert to the older method of adjusting the TSC. I am making no assumption about whether the guest checks to see if the times are good enough or just runs an algorithm every time but in any case this would allow the simpler, cleaner and less expensive algorithm to run if it exists. 
> 
> Will, you can choose to not expose the feature. Correct?
> 
> Because this conflicts with the model that has been envisioned and developed by Zachary... for that model to continue to be functional you'll have to make sure the TSC emulation is adjusted accordingly to consider IA32_TSC_ADJUST (for example, when trapping TSC).
> 
> >From that point of view, the patch below is incomplete.
> 
> ... or KVM can choose to never expose the feature via CPUID and handle TSC consistency itself (i understand your perspective of getting a task complete, but unfortunately from my POV its not so simple).
> 
> > Thanks,
> > 
> > Will
> > 
> > >The purpose of the IA32_TSC_ADJUST control is to make it easier for the operating system >(host) to decrease the delta between cores to an acceptable value, so that applications >can make use of direct RDTSC, correct?
> > >
> > >Why is it necessary for the guests to make use of such interface, if the hypervisor >could provide proper TSC?
> > >
> > >(not against exposing it to the guests, just thinking out loud).
> > >
> > >That is, if the purpose of the IA32_TSC_ADJUST is to provide proper synchronized TSC >across cores, and newer guests which should already make use of paravirt clock >interface, what is the point of exposing the feature?
> > 
> > -----Original Message-----
> > From: Marcelo Tosatti [mailto:mtosatti@redhat.com]
> > Sent: Wednesday, September 26, 2012 2:35 PM
> > To: Auld, Will
> > Cc: kvm@vger.kernel.org; Avi Kivity; Zhang, Xiantao
> > Subject: Re: [PATCH] Enabling IA32_TSC_ADJUST for guest VM
> > 
> > On Wed, Sep 19, 2012 at 05:44:46PM +0000, Auld, Will wrote:
> > > >From 9982bb73460b05c1328068aae047b14b2294e2da Mon Sep 17 00:00:00
> > > >2001
> > > From: Will Auld <will.auld@intel.com>
> > > Date: Wed, 12 Sep 2012 18:10:56 -0700
> > > Subject: [PATCH] Enabling IA32_TSC_ADJUST for guest VM
> > > 
> > > CPUID.7.0.EBX[1]=1 indicates IA32_TSC_ADJUST MSR 0x3b is supported
> > > 
> > > Basic design is to emulate the MSR by allowing reads and writes to a guest vcpu specific location to store the value of the emulated MSR while adding the value to the vmcs tsc_offset. In this way the IA32_TSC_ADJUST value will be included in all reads to the TSC MSR whether through rdmsr or rdtsc. This is of course as long as the "use TSC counter offsetting" VM-execution control is enabled as well as the IA32_TSC_ADJUST control.
> > > 
> > > However, because hardware will only return the TSC + IA32_TSC_ADJUST + vmsc tsc_offset for a guest process when it does and rdtsc (with the correct settings) the value of our virtualized IA32_TSC_ADJUST must be stored in one of these three locations.
> > 
> > The purpose of the IA32_TSC_ADJUST control is to make it easier for the operating system (host) to decrease the delta between cores to an acceptable value, so that applications can make use of direct RDTSC, correct?
> > 
> > Why is it necessary for the guests to make use of such interface, if the hypervisor could provide proper TSC?
> > 
> > (not against exposing it to the guests, just thinking out loud).
> > 
> > That is, if the purpose of the IA32_TSC_ADJUST is to provide proper synchronized TSC across cores, and newer guests which should already make use of paravirt clock interface, what is the point of exposing the feature?
> > 
> > >  The argument against storing it in the actual MSR is performance.
> > >  This is likely to be seldom used while the save/restore is required 
> > > on every transition. IA32_TSC_ADJUST was created as a way to solve 
> > > some issues with writing TSC itself so that is not an option  either.
> > > The remaining option, defined above as our solution has  the problem 
> > > of returning incorrect vmcs tsc_offset values (unless  we intercept 
> > > and fix, not done here) as mentioned above. However,  more 
> > > problematic is that storing the data in vmcs tsc_offset will  have a 
> > > different semantic effect on the system than does using  the actual 
> > > MSR. This is illustrated in the following example: The  hypervisor 
> > > set the IA32_TSC_ADJUST, then the guest sets it and a  guest process perfor!
> > > ms a rdtsc. In this case the guest process will  get TSC + 
> > > IA32_TSC_ADJUST_hyperviser + vmsc tsc_offset including 
> > > IA32_TSC_ADJUST_guest. While the total system semantics changed the 
> > > semantics as seen by the guest do not and hence this will not cause 
> > > a problem.
> > > ---
> > 
> > >  arch/x86/include/asm/cpufeature.h |    1 +
> > >  arch/x86/include/asm/kvm_host.h   |    2 ++
> > >  arch/x86/include/asm/msr-index.h  |    1 +
> > >  arch/x86/kvm/cpuid.c              |    4 ++--
> > >  arch/x86/kvm/vmx.c                |   12 ++++++++++++
> > >  arch/x86/kvm/x86.c                |    1 +
> > >  6 files changed, 19 insertions(+), 2 deletions(-)
> > > 
> > > diff --git a/arch/x86/include/asm/cpufeature.h
> > > b/arch/x86/include/asm/cpufeature.h
> > > index 6b7ee5f..e574d81 100644
> > > --- a/arch/x86/include/asm/cpufeature.h
> > > +++ b/arch/x86/include/asm/cpufeature.h
> > > @@ -199,6 +199,7 @@
> > >  
> > >  /* Intel-defined CPU features, CPUID level 0x00000007:0 (ebx), word 9 */
> > >  #define X86_FEATURE_FSGSBASE	(9*32+ 0) /* {RD/WR}{FS/GS}BASE instructions*/
> > > +#define X86_FEATURE_TSC_ADJUST  (9*32+ 1) /* TSC adjustment MSR 
> > > +0x3b */
> > >  #define X86_FEATURE_BMI1	(9*32+ 3) /* 1st group bit manipulation extensions */
> > >  #define X86_FEATURE_HLE		(9*32+ 4) /* Hardware Lock Elision */
> > >  #define X86_FEATURE_AVX2	(9*32+ 5) /* AVX2 instructions */
> > > diff --git a/arch/x86/include/asm/kvm_host.h 
> > > b/arch/x86/include/asm/kvm_host.h index 09155d6..8a001a4 100644
> > > --- a/arch/x86/include/asm/kvm_host.h
> > > +++ b/arch/x86/include/asm/kvm_host.h
> > > @@ -442,6 +442,8 @@ struct kvm_vcpu_arch {
> > >  	u32 virtual_tsc_mult;
> > >  	u32 virtual_tsc_khz;
> > >  
> > > +	s64 tsc_adjust;
> > > +
> > >  	atomic_t nmi_queued;  /* unprocessed asynchronous NMIs */
> > >  	unsigned nmi_pending; /* NMI queued after currently running handler */
> > >  	bool nmi_injected;    /* Trying to inject an NMI this entry */
> > > diff --git a/arch/x86/include/asm/msr-index.h
> > > b/arch/x86/include/asm/msr-index.h
> > > index 957ec87..8e82e29 100644
> > > --- a/arch/x86/include/asm/msr-index.h
> > > +++ b/arch/x86/include/asm/msr-index.h
> > > @@ -231,6 +231,7 @@
> > >  #define MSR_IA32_EBL_CR_POWERON		0x0000002a
> > >  #define MSR_EBC_FREQUENCY_ID		0x0000002c
> > >  #define MSR_IA32_FEATURE_CONTROL        0x0000003a
> > > +#define MSR_TSC_ADJUST				0x0000003b
> > >  
> > >  #define FEATURE_CONTROL_LOCKED				(1<<0)
> > >  #define FEATURE_CONTROL_VMXON_ENABLED_INSIDE_SMX	(1<<1)
> > > diff --git a/arch/x86/kvm/cpuid.c b/arch/x86/kvm/cpuid.c index 
> > > 0595f13..8f5943e 100644
> > > --- a/arch/x86/kvm/cpuid.c
> > > +++ b/arch/x86/kvm/cpuid.c
> > > @@ -248,8 +248,8 @@ static int do_cpuid_ent(struct kvm_cpuid_entry2 
> > > *entry, u32 function,
> > >  
> > >  	/* cpuid 7.0.ebx */
> > >  	const u32 kvm_supported_word9_x86_features =
> > > -		F(FSGSBASE) | F(BMI1) | F(HLE) | F(AVX2) | F(SMEP) |
> > > -		F(BMI2) | F(ERMS) | f_invpcid | F(RTM);
> > > +		F(FSGSBASE) | F(TSC_ADJUST) | F(BMI1) | F(HLE) |
> > > +		F(AVX2) | F(SMEP) | F(BMI2) | F(ERMS) | f_invpcid | F(RTM);
> > >  
> > >  	/* all calls to cpuid_count() should be made on the same cpu */
> > >  	get_cpu();
> > > diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c index
> > > c00f03d..35d11b3 100644
> > > --- a/arch/x86/kvm/vmx.c
> > > +++ b/arch/x86/kvm/vmx.c
> > > @@ -2173,6 +2173,9 @@ static int vmx_get_msr(struct kvm_vcpu *vcpu, u32 msr_index, u64 *pdata)
> > >  	case MSR_IA32_SYSENTER_ESP:
> > >  		data = vmcs_readl(GUEST_SYSENTER_ESP);
> > >  		break;
> > > +	case MSR_TSC_ADJUST:
> > > +		data = (u64)vcpu->arch.tsc_adjust;
> > > +		break;
> > >  	case MSR_TSC_AUX:
> > >  		if (!to_vmx(vcpu)->rdtscp_enabled)
> > >  			return 1;
> > > @@ -2241,6 +2244,13 @@ static int vmx_set_msr(struct kvm_vcpu *vcpu, u32 msr_index, u64 data)
> > >  		}
> > >  		ret = kvm_set_msr_common(vcpu, msr_index, data);
> > >  		break;
> > > +	case MSR_TSC_ADJUST:
> > > +#define DUMMY 1
> > > +		vmx_adjust_tsc_offset(vcpu,
> > > +				(s64)(data-vcpu->arch.tsc_adjust),
> > > +				(bool)DUMMY);
> > > +		vcpu->arch.tsc_adjust = (s64)data;
> > > +		break;
> > >  	case MSR_TSC_AUX:
> > >  		if (!vmx->rdtscp_enabled)
> > >  			return 1;
> > > @@ -3931,6 +3941,8 @@ static int vmx_vcpu_reset(struct kvm_vcpu 
> > > *vcpu)
> > >  
> > >  	vcpu->arch.regs_avail = ~((1 << VCPU_REGS_RIP) | (1 << 
> > > VCPU_REGS_RSP));
> > >  
> > > +	vcpu->arch.tsc_adjust = 0x0;
> > > +
> > >  	vmx->rmode.vm86_active = 0;
> > >  
> > >  	vmx->soft_vnmi_blocked = 0;
> > > diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c index 
> > > 42bce48..6c50f6c 100644
> > > --- a/arch/x86/kvm/x86.c
> > > +++ b/arch/x86/kvm/x86.c
> > > @@ -824,6 +824,7 @@ static u32 msrs_to_save[] = {  static unsigned 
> > > num_msrs_to_save;
> > >  
> > >  static u32 emulated_msrs[] = {
> > > +	MSR_TSC_ADJUST,
> > >  	MSR_IA32_TSCDEADLINE,
> > >  	MSR_IA32_MISC_ENABLE,
> > >  	MSR_IA32_MCG_STATUS,
> > > --
> > > 1.7.1
> > > 
> > > --
> > > To unsubscribe from this list: send the line "unsubscribe kvm" in 
> > > the body of a message to majordomo@vger.kernel.org More majordomo 
> > > info at http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH] Enabling IA32_TSC_ADJUST for guest VM
  2012-09-27 11:31         ` Marcelo Tosatti
@ 2012-09-27 11:48           ` Marcelo Tosatti
  2012-09-28  2:07             ` Auld, Will
  0 siblings, 1 reply; 23+ messages in thread
From: Marcelo Tosatti @ 2012-09-27 11:48 UTC (permalink / raw)
  To: Auld, Will; +Cc: kvm, Avi Kivity, Zhang, Xiantao, Liu, Jinsong

On Thu, Sep 27, 2012 at 08:31:22AM -0300, Marcelo Tosatti wrote:
> On Thu, Sep 27, 2012 at 12:50:16AM +0000, Auld, Will wrote:
> > Marcelo,
> > 
> > I think I am missing something. There should be no needed changes to current algorithms that exist today. Does it seem that I have broken Zachary's implementation somehow?
> 
> Yes. compute_guest_tsc() function must take ia32_tsc_adjust into
> account. guest_read_tsc (and the SVM equivalent) also.

Also, must take into account VMX->SVM migration. In that case, you
should export IA32_TSC_ADJUST along with IA32_TSC MSR.

Which brings us back to the initial question, if there are other means
to provide stable TSC, why use this MSR? For example, VMWare guests have
no need to use this MSR (because the hypervisor provides TSC
guarantees).

Then we come back to the two questions: 

- Is there anyone from Intel working on the Linux host side, where it
  makes sense to use this?

- Are you sure its worthwhile to expose this to KVM guests?

> > 
> > Thanks,
> > 
> > Will
> > 
> > -----Original Message-----
> > From: Marcelo Tosatti [mailto:mtosatti@redhat.com] 
> > Sent: Wednesday, September 26, 2012 5:29 PM
> > To: Auld, Will
> > Cc: kvm@vger.kernel.org; Avi Kivity; Zhang, Xiantao; Liu, Jinsong
> > Subject: Re: [PATCH] Enabling IA32_TSC_ADJUST for guest VM
> > 
> > On Wed, Sep 26, 2012 at 10:58:46PM +0000, Auld, Will wrote:
> > > Avi, Still working on your suggestions.
> > > 
> > > Marcelo,
> > > 
> > > The purpose is to be able to run guests that implement this change and not require they revert to the older method of adjusting the TSC. I am making no assumption about whether the guest checks to see if the times are good enough or just runs an algorithm every time but in any case this would allow the simpler, cleaner and less expensive algorithm to run if it exists. 
> > 
> > Will, you can choose to not expose the feature. Correct?
> > 
> > Because this conflicts with the model that has been envisioned and developed by Zachary... for that model to continue to be functional you'll have to make sure the TSC emulation is adjusted accordingly to consider IA32_TSC_ADJUST (for example, when trapping TSC).
> > 
> > >From that point of view, the patch below is incomplete.
> > 
> > ... or KVM can choose to never expose the feature via CPUID and handle TSC consistency itself (i understand your perspective of getting a task complete, but unfortunately from my POV its not so simple).
> > 
> > > Thanks,
> > > 
> > > Will
> > > 
> > > >The purpose of the IA32_TSC_ADJUST control is to make it easier for the operating system >(host) to decrease the delta between cores to an acceptable value, so that applications >can make use of direct RDTSC, correct?
> > > >
> > > >Why is it necessary for the guests to make use of such interface, if the hypervisor >could provide proper TSC?
> > > >
> > > >(not against exposing it to the guests, just thinking out loud).
> > > >
> > > >That is, if the purpose of the IA32_TSC_ADJUST is to provide proper synchronized TSC >across cores, and newer guests which should already make use of paravirt clock >interface, what is the point of exposing the feature?
> > > 
> > > -----Original Message-----
> > > From: Marcelo Tosatti [mailto:mtosatti@redhat.com]
> > > Sent: Wednesday, September 26, 2012 2:35 PM
> > > To: Auld, Will
> > > Cc: kvm@vger.kernel.org; Avi Kivity; Zhang, Xiantao
> > > Subject: Re: [PATCH] Enabling IA32_TSC_ADJUST for guest VM
> > > 
> > > On Wed, Sep 19, 2012 at 05:44:46PM +0000, Auld, Will wrote:
> > > > >From 9982bb73460b05c1328068aae047b14b2294e2da Mon Sep 17 00:00:00
> > > > >2001
> > > > From: Will Auld <will.auld@intel.com>
> > > > Date: Wed, 12 Sep 2012 18:10:56 -0700
> > > > Subject: [PATCH] Enabling IA32_TSC_ADJUST for guest VM
> > > > 
> > > > CPUID.7.0.EBX[1]=1 indicates IA32_TSC_ADJUST MSR 0x3b is supported
> > > > 
> > > > Basic design is to emulate the MSR by allowing reads and writes to a guest vcpu specific location to store the value of the emulated MSR while adding the value to the vmcs tsc_offset. In this way the IA32_TSC_ADJUST value will be included in all reads to the TSC MSR whether through rdmsr or rdtsc. This is of course as long as the "use TSC counter offsetting" VM-execution control is enabled as well as the IA32_TSC_ADJUST control.
> > > > 
> > > > However, because hardware will only return the TSC + IA32_TSC_ADJUST + vmsc tsc_offset for a guest process when it does and rdtsc (with the correct settings) the value of our virtualized IA32_TSC_ADJUST must be stored in one of these three locations.
> > > 
> > > The purpose of the IA32_TSC_ADJUST control is to make it easier for the operating system (host) to decrease the delta between cores to an acceptable value, so that applications can make use of direct RDTSC, correct?
> > > 
> > > Why is it necessary for the guests to make use of such interface, if the hypervisor could provide proper TSC?
> > > 
> > > (not against exposing it to the guests, just thinking out loud).
> > > 
> > > That is, if the purpose of the IA32_TSC_ADJUST is to provide proper synchronized TSC across cores, and newer guests which should already make use of paravirt clock interface, what is the point of exposing the feature?
> > > 
> > > >  The argument against storing it in the actual MSR is performance.
> > > >  This is likely to be seldom used while the save/restore is required 
> > > > on every transition. IA32_TSC_ADJUST was created as a way to solve 
> > > > some issues with writing TSC itself so that is not an option  either.
> > > > The remaining option, defined above as our solution has  the problem 
> > > > of returning incorrect vmcs tsc_offset values (unless  we intercept 
> > > > and fix, not done here) as mentioned above. However,  more 
> > > > problematic is that storing the data in vmcs tsc_offset will  have a 
> > > > different semantic effect on the system than does using  the actual 
> > > > MSR. This is illustrated in the following example: The  hypervisor 
> > > > set the IA32_TSC_ADJUST, then the guest sets it and a  guest process perfor!
> > > > ms a rdtsc. In this case the guest process will  get TSC + 
> > > > IA32_TSC_ADJUST_hyperviser + vmsc tsc_offset including 
> > > > IA32_TSC_ADJUST_guest. While the total system semantics changed the 
> > > > semantics as seen by the guest do not and hence this will not cause 
> > > > a problem.
> > > > ---
> > > 
> > > >  arch/x86/include/asm/cpufeature.h |    1 +
> > > >  arch/x86/include/asm/kvm_host.h   |    2 ++
> > > >  arch/x86/include/asm/msr-index.h  |    1 +
> > > >  arch/x86/kvm/cpuid.c              |    4 ++--
> > > >  arch/x86/kvm/vmx.c                |   12 ++++++++++++
> > > >  arch/x86/kvm/x86.c                |    1 +
> > > >  6 files changed, 19 insertions(+), 2 deletions(-)
> > > > 
> > > > diff --git a/arch/x86/include/asm/cpufeature.h
> > > > b/arch/x86/include/asm/cpufeature.h
> > > > index 6b7ee5f..e574d81 100644
> > > > --- a/arch/x86/include/asm/cpufeature.h
> > > > +++ b/arch/x86/include/asm/cpufeature.h
> > > > @@ -199,6 +199,7 @@
> > > >  
> > > >  /* Intel-defined CPU features, CPUID level 0x00000007:0 (ebx), word 9 */
> > > >  #define X86_FEATURE_FSGSBASE	(9*32+ 0) /* {RD/WR}{FS/GS}BASE instructions*/
> > > > +#define X86_FEATURE_TSC_ADJUST  (9*32+ 1) /* TSC adjustment MSR 
> > > > +0x3b */
> > > >  #define X86_FEATURE_BMI1	(9*32+ 3) /* 1st group bit manipulation extensions */
> > > >  #define X86_FEATURE_HLE		(9*32+ 4) /* Hardware Lock Elision */
> > > >  #define X86_FEATURE_AVX2	(9*32+ 5) /* AVX2 instructions */
> > > > diff --git a/arch/x86/include/asm/kvm_host.h 
> > > > b/arch/x86/include/asm/kvm_host.h index 09155d6..8a001a4 100644
> > > > --- a/arch/x86/include/asm/kvm_host.h
> > > > +++ b/arch/x86/include/asm/kvm_host.h
> > > > @@ -442,6 +442,8 @@ struct kvm_vcpu_arch {
> > > >  	u32 virtual_tsc_mult;
> > > >  	u32 virtual_tsc_khz;
> > > >  
> > > > +	s64 tsc_adjust;
> > > > +
> > > >  	atomic_t nmi_queued;  /* unprocessed asynchronous NMIs */
> > > >  	unsigned nmi_pending; /* NMI queued after currently running handler */
> > > >  	bool nmi_injected;    /* Trying to inject an NMI this entry */
> > > > diff --git a/arch/x86/include/asm/msr-index.h
> > > > b/arch/x86/include/asm/msr-index.h
> > > > index 957ec87..8e82e29 100644
> > > > --- a/arch/x86/include/asm/msr-index.h
> > > > +++ b/arch/x86/include/asm/msr-index.h
> > > > @@ -231,6 +231,7 @@
> > > >  #define MSR_IA32_EBL_CR_POWERON		0x0000002a
> > > >  #define MSR_EBC_FREQUENCY_ID		0x0000002c
> > > >  #define MSR_IA32_FEATURE_CONTROL        0x0000003a
> > > > +#define MSR_TSC_ADJUST				0x0000003b
> > > >  
> > > >  #define FEATURE_CONTROL_LOCKED				(1<<0)
> > > >  #define FEATURE_CONTROL_VMXON_ENABLED_INSIDE_SMX	(1<<1)
> > > > diff --git a/arch/x86/kvm/cpuid.c b/arch/x86/kvm/cpuid.c index 
> > > > 0595f13..8f5943e 100644
> > > > --- a/arch/x86/kvm/cpuid.c
> > > > +++ b/arch/x86/kvm/cpuid.c
> > > > @@ -248,8 +248,8 @@ static int do_cpuid_ent(struct kvm_cpuid_entry2 
> > > > *entry, u32 function,
> > > >  
> > > >  	/* cpuid 7.0.ebx */
> > > >  	const u32 kvm_supported_word9_x86_features =
> > > > -		F(FSGSBASE) | F(BMI1) | F(HLE) | F(AVX2) | F(SMEP) |
> > > > -		F(BMI2) | F(ERMS) | f_invpcid | F(RTM);
> > > > +		F(FSGSBASE) | F(TSC_ADJUST) | F(BMI1) | F(HLE) |
> > > > +		F(AVX2) | F(SMEP) | F(BMI2) | F(ERMS) | f_invpcid | F(RTM);
> > > >  
> > > >  	/* all calls to cpuid_count() should be made on the same cpu */
> > > >  	get_cpu();
> > > > diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c index
> > > > c00f03d..35d11b3 100644
> > > > --- a/arch/x86/kvm/vmx.c
> > > > +++ b/arch/x86/kvm/vmx.c
> > > > @@ -2173,6 +2173,9 @@ static int vmx_get_msr(struct kvm_vcpu *vcpu, u32 msr_index, u64 *pdata)
> > > >  	case MSR_IA32_SYSENTER_ESP:
> > > >  		data = vmcs_readl(GUEST_SYSENTER_ESP);
> > > >  		break;
> > > > +	case MSR_TSC_ADJUST:
> > > > +		data = (u64)vcpu->arch.tsc_adjust;
> > > > +		break;
> > > >  	case MSR_TSC_AUX:
> > > >  		if (!to_vmx(vcpu)->rdtscp_enabled)
> > > >  			return 1;
> > > > @@ -2241,6 +2244,13 @@ static int vmx_set_msr(struct kvm_vcpu *vcpu, u32 msr_index, u64 data)
> > > >  		}
> > > >  		ret = kvm_set_msr_common(vcpu, msr_index, data);
> > > >  		break;
> > > > +	case MSR_TSC_ADJUST:
> > > > +#define DUMMY 1
> > > > +		vmx_adjust_tsc_offset(vcpu,
> > > > +				(s64)(data-vcpu->arch.tsc_adjust),
> > > > +				(bool)DUMMY);
> > > > +		vcpu->arch.tsc_adjust = (s64)data;
> > > > +		break;
> > > >  	case MSR_TSC_AUX:
> > > >  		if (!vmx->rdtscp_enabled)
> > > >  			return 1;
> > > > @@ -3931,6 +3941,8 @@ static int vmx_vcpu_reset(struct kvm_vcpu 
> > > > *vcpu)
> > > >  
> > > >  	vcpu->arch.regs_avail = ~((1 << VCPU_REGS_RIP) | (1 << 
> > > > VCPU_REGS_RSP));
> > > >  
> > > > +	vcpu->arch.tsc_adjust = 0x0;
> > > > +
> > > >  	vmx->rmode.vm86_active = 0;
> > > >  
> > > >  	vmx->soft_vnmi_blocked = 0;
> > > > diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c index 
> > > > 42bce48..6c50f6c 100644
> > > > --- a/arch/x86/kvm/x86.c
> > > > +++ b/arch/x86/kvm/x86.c
> > > > @@ -824,6 +824,7 @@ static u32 msrs_to_save[] = {  static unsigned 
> > > > num_msrs_to_save;
> > > >  
> > > >  static u32 emulated_msrs[] = {
> > > > +	MSR_TSC_ADJUST,
> > > >  	MSR_IA32_TSCDEADLINE,
> > > >  	MSR_IA32_MISC_ENABLE,
> > > >  	MSR_IA32_MCG_STATUS,
> > > > --
> > > > 1.7.1
> > > > 
> > > > --
> > > > To unsubscribe from this list: send the line "unsubscribe kvm" in 
> > > > the body of a message to majordomo@vger.kernel.org More majordomo 
> > > > info at http://vger.kernel.org/majordomo-info.html
> --
> To unsubscribe from this list: send the line "unsubscribe kvm" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 23+ messages in thread

* RE: [PATCH] Enabling IA32_TSC_ADJUST for guest VM
  2012-09-27 11:48           ` Marcelo Tosatti
@ 2012-09-28  2:07             ` Auld, Will
  2012-09-28 13:24               ` Marcelo Tosatti
  0 siblings, 1 reply; 23+ messages in thread
From: Auld, Will @ 2012-09-28  2:07 UTC (permalink / raw)
  To: Marcelo Tosatti; +Cc: kvm, Avi Kivity, Zhang, Xiantao, Liu, Jinsong, Auld, Will

Marcelo,

I tagged my comments below with "[auld]" to make it easier to read. 

Thanks,

Will

-----Original Message-----
From: Marcelo Tosatti [mailto:mtosatti@redhat.com] 
Sent: Thursday, September 27, 2012 4:49 AM
To: Auld, Will
Cc: kvm@vger.kernel.org; Avi Kivity; Zhang, Xiantao; Liu, Jinsong
Subject: Re: [PATCH] Enabling IA32_TSC_ADJUST for guest VM

On Thu, Sep 27, 2012 at 08:31:22AM -0300, Marcelo Tosatti wrote:
> On Thu, Sep 27, 2012 at 12:50:16AM +0000, Auld, Will wrote:
> > Marcelo,
> > 
> > I think I am missing something. There should be no needed changes to current algorithms that exist today. Does it seem that I have broken Zachary's implementation somehow?
> 
> Yes. compute_guest_tsc() function must take ia32_tsc_adjust into 
> account. guest_read_tsc (and the SVM equivalent) also.

[auld] I don't see how that function is broken. 

Also, must take into account VMX->SVM migration. In that case, you should export IA32_TSC_ADJUST along with IA32_TSC MSR.

[auld] I'll give this more thought. Two different ways to go, allow this to only work on host processors with this feature or enable this for all VM independent of the underlying host processor capability. In the former case migrating cross architecture might be disallowed. In the later case sending only IA32_TSC on migration should be enough as the delta would be accounted for in tsc_offset of the control structure.

Which brings us back to the initial question, if there are other means to provide stable TSC, why use this MSR? For example, VMWare guests have no need to use this MSR (because the hypervisor provides TSC guarantees).

[auld] Using this MSR simplifies the process of synchronizing the tsc for each logical processor because its value does not change with the clock. How do you write the same value to all the IA32_TIME_STAMP_COUNTER MSR? Well, figure out what you want to write there, get all the processors to rendezvous at the same time, have all logical processors complete their writes in a very small amount of time. This is in contrast to deciding the offset to write and then having all the logical processors write the offset. No worries about rendezvous, synchronization of the writes in time and such.  

Then we come back to the two questions: 

- Is there anyone from Intel working on the Linux host side, where it
  makes sense to use this?

[auld] I am not aware of anyone working on this for Linux.

- Are you sure its worthwhile to expose this to KVM guests?

[auld] At least one OS is moving to implement this that is commonly used as a guest. 

> > 
> > Thanks,
> > 
> > Will
> > 
> > -----Original Message-----
> > From: Marcelo Tosatti [mailto:mtosatti@redhat.com]
> > Sent: Wednesday, September 26, 2012 5:29 PM
> > To: Auld, Will
> > Cc: kvm@vger.kernel.org; Avi Kivity; Zhang, Xiantao; Liu, Jinsong
> > Subject: Re: [PATCH] Enabling IA32_TSC_ADJUST for guest VM
> > 
> > On Wed, Sep 26, 2012 at 10:58:46PM +0000, Auld, Will wrote:
> > > Avi, Still working on your suggestions.
> > > 
> > > Marcelo,
> > > 
> > > The purpose is to be able to run guests that implement this change and not require they revert to the older method of adjusting the TSC. I am making no assumption about whether the guest checks to see if the times are good enough or just runs an algorithm every time but in any case this would allow the simpler, cleaner and less expensive algorithm to run if it exists. 
> > 
> > Will, you can choose to not expose the feature. Correct?
> > 
> > Because this conflicts with the model that has been envisioned and developed by Zachary... for that model to continue to be functional you'll have to make sure the TSC emulation is adjusted accordingly to consider IA32_TSC_ADJUST (for example, when trapping TSC).
> > 
> > >From that point of view, the patch below is incomplete.
> > 
> > ... or KVM can choose to never expose the feature via CPUID and handle TSC consistency itself (i understand your perspective of getting a task complete, but unfortunately from my POV its not so simple).
> > 
> > > Thanks,
> > > 
> > > Will
> > > 
> > > >The purpose of the IA32_TSC_ADJUST control is to make it easier for the operating system >(host) to decrease the delta between cores to an acceptable value, so that applications >can make use of direct RDTSC, correct?
> > > >
> > > >Why is it necessary for the guests to make use of such interface, if the hypervisor >could provide proper TSC?
> > > >
> > > >(not against exposing it to the guests, just thinking out loud).
> > > >
> > > >That is, if the purpose of the IA32_TSC_ADJUST is to provide proper synchronized TSC >across cores, and newer guests which should already make use of paravirt clock >interface, what is the point of exposing the feature?
> > > 
> > > -----Original Message-----
> > > From: Marcelo Tosatti [mailto:mtosatti@redhat.com]
> > > Sent: Wednesday, September 26, 2012 2:35 PM
> > > To: Auld, Will
> > > Cc: kvm@vger.kernel.org; Avi Kivity; Zhang, Xiantao
> > > Subject: Re: [PATCH] Enabling IA32_TSC_ADJUST for guest VM
> > > 
> > > On Wed, Sep 19, 2012 at 05:44:46PM +0000, Auld, Will wrote:
> > > > >From 9982bb73460b05c1328068aae047b14b2294e2da Mon Sep 17 
> > > > >00:00:00
> > > > >2001
> > > > From: Will Auld <will.auld@intel.com>
> > > > Date: Wed, 12 Sep 2012 18:10:56 -0700
> > > > Subject: [PATCH] Enabling IA32_TSC_ADJUST for guest VM
> > > > 
> > > > CPUID.7.0.EBX[1]=1 indicates IA32_TSC_ADJUST MSR 0x3b is 
> > > > supported
> > > > 
> > > > Basic design is to emulate the MSR by allowing reads and writes to a guest vcpu specific location to store the value of the emulated MSR while adding the value to the vmcs tsc_offset. In this way the IA32_TSC_ADJUST value will be included in all reads to the TSC MSR whether through rdmsr or rdtsc. This is of course as long as the "use TSC counter offsetting" VM-execution control is enabled as well as the IA32_TSC_ADJUST control.
> > > > 
> > > > However, because hardware will only return the TSC + IA32_TSC_ADJUST + vmsc tsc_offset for a guest process when it does and rdtsc (with the correct settings) the value of our virtualized IA32_TSC_ADJUST must be stored in one of these three locations.
> > > 
> > > The purpose of the IA32_TSC_ADJUST control is to make it easier for the operating system (host) to decrease the delta between cores to an acceptable value, so that applications can make use of direct RDTSC, correct?
> > > 
> > > Why is it necessary for the guests to make use of such interface, if the hypervisor could provide proper TSC?
> > > 
> > > (not against exposing it to the guests, just thinking out loud).
> > > 
> > > That is, if the purpose of the IA32_TSC_ADJUST is to provide proper synchronized TSC across cores, and newer guests which should already make use of paravirt clock interface, what is the point of exposing the feature?
> > > 
> > > >  The argument against storing it in the actual MSR is performance.
> > > >  This is likely to be seldom used while the save/restore is 
> > > > required on every transition. IA32_TSC_ADJUST was created as a 
> > > > way to solve some issues with writing TSC itself so that is not an option  either.
> > > > The remaining option, defined above as our solution has  the 
> > > > problem of returning incorrect vmcs tsc_offset values (unless  
> > > > we intercept and fix, not done here) as mentioned above. 
> > > > However,  more problematic is that storing the data in vmcs 
> > > > tsc_offset will  have a different semantic effect on the system 
> > > > than does using  the actual MSR. This is illustrated in the 
> > > > following example: The  hypervisor set the IA32_TSC_ADJUST, then the guest sets it and a  guest process perfor!
> > > > ms a rdtsc. In this case the guest process will  get TSC + 
> > > > IA32_TSC_ADJUST_hyperviser + vmsc tsc_offset including 
> > > > IA32_TSC_ADJUST_guest. While the total system semantics changed 
> > > > the semantics as seen by the guest do not and hence this will 
> > > > not cause a problem.
> > > > ---
> > > 
> > > >  arch/x86/include/asm/cpufeature.h |    1 +
> > > >  arch/x86/include/asm/kvm_host.h   |    2 ++
> > > >  arch/x86/include/asm/msr-index.h  |    1 +
> > > >  arch/x86/kvm/cpuid.c              |    4 ++--
> > > >  arch/x86/kvm/vmx.c                |   12 ++++++++++++
> > > >  arch/x86/kvm/x86.c                |    1 +
> > > >  6 files changed, 19 insertions(+), 2 deletions(-)
> > > > 
> > > > diff --git a/arch/x86/include/asm/cpufeature.h
> > > > b/arch/x86/include/asm/cpufeature.h
> > > > index 6b7ee5f..e574d81 100644
> > > > --- a/arch/x86/include/asm/cpufeature.h
> > > > +++ b/arch/x86/include/asm/cpufeature.h
> > > > @@ -199,6 +199,7 @@
> > > >  
> > > >  /* Intel-defined CPU features, CPUID level 0x00000007:0 (ebx), word 9 */
> > > >  #define X86_FEATURE_FSGSBASE	(9*32+ 0) /* {RD/WR}{FS/GS}BASE instructions*/
> > > > +#define X86_FEATURE_TSC_ADJUST  (9*32+ 1) /* TSC adjustment MSR 
> > > > +0x3b */
> > > >  #define X86_FEATURE_BMI1	(9*32+ 3) /* 1st group bit manipulation extensions */
> > > >  #define X86_FEATURE_HLE		(9*32+ 4) /* Hardware Lock Elision */
> > > >  #define X86_FEATURE_AVX2	(9*32+ 5) /* AVX2 instructions */
> > > > diff --git a/arch/x86/include/asm/kvm_host.h 
> > > > b/arch/x86/include/asm/kvm_host.h index 09155d6..8a001a4 100644
> > > > --- a/arch/x86/include/asm/kvm_host.h
> > > > +++ b/arch/x86/include/asm/kvm_host.h
> > > > @@ -442,6 +442,8 @@ struct kvm_vcpu_arch {
> > > >  	u32 virtual_tsc_mult;
> > > >  	u32 virtual_tsc_khz;
> > > >  
> > > > +	s64 tsc_adjust;
> > > > +
> > > >  	atomic_t nmi_queued;  /* unprocessed asynchronous NMIs */
> > > >  	unsigned nmi_pending; /* NMI queued after currently running handler */
> > > >  	bool nmi_injected;    /* Trying to inject an NMI this entry */
> > > > diff --git a/arch/x86/include/asm/msr-index.h
> > > > b/arch/x86/include/asm/msr-index.h
> > > > index 957ec87..8e82e29 100644
> > > > --- a/arch/x86/include/asm/msr-index.h
> > > > +++ b/arch/x86/include/asm/msr-index.h
> > > > @@ -231,6 +231,7 @@
> > > >  #define MSR_IA32_EBL_CR_POWERON		0x0000002a
> > > >  #define MSR_EBC_FREQUENCY_ID		0x0000002c
> > > >  #define MSR_IA32_FEATURE_CONTROL        0x0000003a
> > > > +#define MSR_TSC_ADJUST				0x0000003b
> > > >  
> > > >  #define FEATURE_CONTROL_LOCKED				(1<<0)
> > > >  #define FEATURE_CONTROL_VMXON_ENABLED_INSIDE_SMX	(1<<1)
> > > > diff --git a/arch/x86/kvm/cpuid.c b/arch/x86/kvm/cpuid.c index 
> > > > 0595f13..8f5943e 100644
> > > > --- a/arch/x86/kvm/cpuid.c
> > > > +++ b/arch/x86/kvm/cpuid.c
> > > > @@ -248,8 +248,8 @@ static int do_cpuid_ent(struct 
> > > > kvm_cpuid_entry2 *entry, u32 function,
> > > >  
> > > >  	/* cpuid 7.0.ebx */
> > > >  	const u32 kvm_supported_word9_x86_features =
> > > > -		F(FSGSBASE) | F(BMI1) | F(HLE) | F(AVX2) | F(SMEP) |
> > > > -		F(BMI2) | F(ERMS) | f_invpcid | F(RTM);
> > > > +		F(FSGSBASE) | F(TSC_ADJUST) | F(BMI1) | F(HLE) |
> > > > +		F(AVX2) | F(SMEP) | F(BMI2) | F(ERMS) | f_invpcid | F(RTM);
> > > >  
> > > >  	/* all calls to cpuid_count() should be made on the same cpu */
> > > >  	get_cpu();
> > > > diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c index
> > > > c00f03d..35d11b3 100644
> > > > --- a/arch/x86/kvm/vmx.c
> > > > +++ b/arch/x86/kvm/vmx.c
> > > > @@ -2173,6 +2173,9 @@ static int vmx_get_msr(struct kvm_vcpu *vcpu, u32 msr_index, u64 *pdata)
> > > >  	case MSR_IA32_SYSENTER_ESP:
> > > >  		data = vmcs_readl(GUEST_SYSENTER_ESP);
> > > >  		break;
> > > > +	case MSR_TSC_ADJUST:
> > > > +		data = (u64)vcpu->arch.tsc_adjust;
> > > > +		break;
> > > >  	case MSR_TSC_AUX:
> > > >  		if (!to_vmx(vcpu)->rdtscp_enabled)
> > > >  			return 1;
> > > > @@ -2241,6 +2244,13 @@ static int vmx_set_msr(struct kvm_vcpu *vcpu, u32 msr_index, u64 data)
> > > >  		}
> > > >  		ret = kvm_set_msr_common(vcpu, msr_index, data);
> > > >  		break;
> > > > +	case MSR_TSC_ADJUST:
> > > > +#define DUMMY 1
> > > > +		vmx_adjust_tsc_offset(vcpu,
> > > > +				(s64)(data-vcpu->arch.tsc_adjust),
> > > > +				(bool)DUMMY);
> > > > +		vcpu->arch.tsc_adjust = (s64)data;
> > > > +		break;
> > > >  	case MSR_TSC_AUX:
> > > >  		if (!vmx->rdtscp_enabled)
> > > >  			return 1;
> > > > @@ -3931,6 +3941,8 @@ static int vmx_vcpu_reset(struct kvm_vcpu
> > > > *vcpu)
> > > >  
> > > >  	vcpu->arch.regs_avail = ~((1 << VCPU_REGS_RIP) | (1 << 
> > > > VCPU_REGS_RSP));
> > > >  
> > > > +	vcpu->arch.tsc_adjust = 0x0;
> > > > +
> > > >  	vmx->rmode.vm86_active = 0;
> > > >  
> > > >  	vmx->soft_vnmi_blocked = 0;
> > > > diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c index 
> > > > 42bce48..6c50f6c 100644
> > > > --- a/arch/x86/kvm/x86.c
> > > > +++ b/arch/x86/kvm/x86.c
> > > > @@ -824,6 +824,7 @@ static u32 msrs_to_save[] = {  static 
> > > > unsigned num_msrs_to_save;
> > > >  
> > > >  static u32 emulated_msrs[] = {
> > > > +	MSR_TSC_ADJUST,
> > > >  	MSR_IA32_TSCDEADLINE,
> > > >  	MSR_IA32_MISC_ENABLE,
> > > >  	MSR_IA32_MCG_STATUS,
> > > > --
> > > > 1.7.1
> > > > 
> > > > --
> > > > To unsubscribe from this list: send the line "unsubscribe kvm" 
> > > > in the body of a message to majordomo@vger.kernel.org More 
> > > > majordomo info at http://vger.kernel.org/majordomo-info.html
> --
> To unsubscribe from this list: send the line "unsubscribe kvm" in the 
> body of a message to majordomo@vger.kernel.org More majordomo info at  
> http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH] Enabling IA32_TSC_ADJUST for guest VM
  2012-09-28  2:07             ` Auld, Will
@ 2012-09-28 13:24               ` Marcelo Tosatti
  0 siblings, 0 replies; 23+ messages in thread
From: Marcelo Tosatti @ 2012-09-28 13:24 UTC (permalink / raw)
  To: Auld, Will; +Cc: kvm, Avi Kivity, Zhang, Xiantao, Liu, Jinsong

On Fri, Sep 28, 2012 at 02:07:26AM +0000, Auld, Will wrote:
> Marcelo,
> 
> I tagged my comments below with "[auld]" to make it easier to read. 
> 
> Thanks,
> 
> Will
> 
> -----Original Message-----
> From: Marcelo Tosatti [mailto:mtosatti@redhat.com] 
> Sent: Thursday, September 27, 2012 4:49 AM
> To: Auld, Will
> Cc: kvm@vger.kernel.org; Avi Kivity; Zhang, Xiantao; Liu, Jinsong
> Subject: Re: [PATCH] Enabling IA32_TSC_ADJUST for guest VM
> 
> On Thu, Sep 27, 2012 at 08:31:22AM -0300, Marcelo Tosatti wrote:
> > On Thu, Sep 27, 2012 at 12:50:16AM +0000, Auld, Will wrote:
> > > Marcelo,
> > > 
> > > I think I am missing something. There should be no needed changes to current algorithms that exist today. Does it seem that I have broken Zachary's implementation somehow?
> > 
> > Yes. compute_guest_tsc() function must take ia32_tsc_adjust into 
> > account. guest_read_tsc (and the SVM equivalent) also.
> 
> [auld] I don't see how that function is broken. 

compute_guest_tsc() should return the TSC value accordingly to what is
emulated via vcpu->arch.virtual_tsc_mult, but this can be fixed later.

> Also, must take into account VMX->SVM migration. In that case, you should export IA32_TSC_ADJUST along with IA32_TSC MSR.
> 
> [auld] I'll give this more thought. Two different ways to go, allow this to only work on host processors with this feature or enable this for all VM independent of the underlying host processor capability. In the former case migrating cross architecture might be disallowed. In the later case sending only IA32_TSC on migration should be enough as the delta would be accounted for in tsc_offset of the control structure.

That is fine, yes, if you want to migrate across, don't expose the
feature.

> 
> Which brings us back to the initial question, if there are other means to provide stable TSC, why use this MSR? For example, VMWare guests have no need to use this MSR (because the hypervisor provides TSC guarantees).
> 
> [auld] Using this MSR simplifies the process of synchronizing the tsc for each logical processor because its value does not change with the clock. How do you write the same value to all the IA32_TIME_STAMP_COUNTER MSR? Well, figure out what you want to write there, get all the processors to rendezvous at the same time, have all logical processors complete their writes in a very small amount of time. This is in contrast to deciding the offset to write and then having all the logical processors write the offset. No worries about rendezvous, synchronization of the writes in time and such.  
> 
> Then we come back to the two questions: 
> 
> - Is there anyone from Intel working on the Linux host side, where it
>   makes sense to use this?
> 
> [auld] I am not aware of anyone working on this for Linux.
> 
> - Are you sure its worthwhile to expose this to KVM guests?
> 
> [auld] At least one OS is moving to implement this that is commonly used as a guest. 

OK thanks.

> 
> > > 
> > > Thanks,
> > > 
> > > Will
> > > 
> > > -----Original Message-----
> > > From: Marcelo Tosatti [mailto:mtosatti@redhat.com]
> > > Sent: Wednesday, September 26, 2012 5:29 PM
> > > To: Auld, Will
> > > Cc: kvm@vger.kernel.org; Avi Kivity; Zhang, Xiantao; Liu, Jinsong
> > > Subject: Re: [PATCH] Enabling IA32_TSC_ADJUST for guest VM
> > > 
> > > On Wed, Sep 26, 2012 at 10:58:46PM +0000, Auld, Will wrote:
> > > > Avi, Still working on your suggestions.
> > > > 
> > > > Marcelo,
> > > > 
> > > > The purpose is to be able to run guests that implement this change and not require they revert to the older method of adjusting the TSC. I am making no assumption about whether the guest checks to see if the times are good enough or just runs an algorithm every time but in any case this would allow the simpler, cleaner and less expensive algorithm to run if it exists. 
> > > 
> > > Will, you can choose to not expose the feature. Correct?
> > > 
> > > Because this conflicts with the model that has been envisioned and developed by Zachary... for that model to continue to be functional you'll have to make sure the TSC emulation is adjusted accordingly to consider IA32_TSC_ADJUST (for example, when trapping TSC).
> > > 
> > > >From that point of view, the patch below is incomplete.
> > > 
> > > ... or KVM can choose to never expose the feature via CPUID and handle TSC consistency itself (i understand your perspective of getting a task complete, but unfortunately from my POV its not so simple).
> > > 
> > > > Thanks,
> > > > 
> > > > Will
> > > > 
> > > > >The purpose of the IA32_TSC_ADJUST control is to make it easier for the operating system >(host) to decrease the delta between cores to an acceptable value, so that applications >can make use of direct RDTSC, correct?
> > > > >
> > > > >Why is it necessary for the guests to make use of such interface, if the hypervisor >could provide proper TSC?
> > > > >
> > > > >(not against exposing it to the guests, just thinking out loud).
> > > > >
> > > > >That is, if the purpose of the IA32_TSC_ADJUST is to provide proper synchronized TSC >across cores, and newer guests which should already make use of paravirt clock >interface, what is the point of exposing the feature?
> > > > 
> > > > -----Original Message-----
> > > > From: Marcelo Tosatti [mailto:mtosatti@redhat.com]
> > > > Sent: Wednesday, September 26, 2012 2:35 PM
> > > > To: Auld, Will
> > > > Cc: kvm@vger.kernel.org; Avi Kivity; Zhang, Xiantao
> > > > Subject: Re: [PATCH] Enabling IA32_TSC_ADJUST for guest VM
> > > > 
> > > > On Wed, Sep 19, 2012 at 05:44:46PM +0000, Auld, Will wrote:
> > > > > >From 9982bb73460b05c1328068aae047b14b2294e2da Mon Sep 17 
> > > > > >00:00:00
> > > > > >2001
> > > > > From: Will Auld <will.auld@intel.com>
> > > > > Date: Wed, 12 Sep 2012 18:10:56 -0700
> > > > > Subject: [PATCH] Enabling IA32_TSC_ADJUST for guest VM
> > > > > 
> > > > > CPUID.7.0.EBX[1]=1 indicates IA32_TSC_ADJUST MSR 0x3b is 
> > > > > supported
> > > > > 
> > > > > Basic design is to emulate the MSR by allowing reads and writes to a guest vcpu specific location to store the value of the emulated MSR while adding the value to the vmcs tsc_offset. In this way the IA32_TSC_ADJUST value will be included in all reads to the TSC MSR whether through rdmsr or rdtsc. This is of course as long as the "use TSC counter offsetting" VM-execution control is enabled as well as the IA32_TSC_ADJUST control.
> > > > > 
> > > > > However, because hardware will only return the TSC + IA32_TSC_ADJUST + vmsc tsc_offset for a guest process when it does and rdtsc (with the correct settings) the value of our virtualized IA32_TSC_ADJUST must be stored in one of these three locations.
> > > > 
> > > > The purpose of the IA32_TSC_ADJUST control is to make it easier for the operating system (host) to decrease the delta between cores to an acceptable value, so that applications can make use of direct RDTSC, correct?
> > > > 
> > > > Why is it necessary for the guests to make use of such interface, if the hypervisor could provide proper TSC?
> > > > 
> > > > (not against exposing it to the guests, just thinking out loud).
> > > > 
> > > > That is, if the purpose of the IA32_TSC_ADJUST is to provide proper synchronized TSC across cores, and newer guests which should already make use of paravirt clock interface, what is the point of exposing the feature?
> > > > 
> > > > >  The argument against storing it in the actual MSR is performance.
> > > > >  This is likely to be seldom used while the save/restore is 
> > > > > required on every transition. IA32_TSC_ADJUST was created as a 
> > > > > way to solve some issues with writing TSC itself so that is not an option  either.
> > > > > The remaining option, defined above as our solution has  the 
> > > > > problem of returning incorrect vmcs tsc_offset values (unless  
> > > > > we intercept and fix, not done here) as mentioned above. 
> > > > > However,  more problematic is that storing the data in vmcs 
> > > > > tsc_offset will  have a different semantic effect on the system 
> > > > > than does using  the actual MSR. This is illustrated in the 
> > > > > following example: The  hypervisor set the IA32_TSC_ADJUST, then the guest sets it and a  guest process perfor!
> > > > > ms a rdtsc. In this case the guest process will  get TSC + 
> > > > > IA32_TSC_ADJUST_hyperviser + vmsc tsc_offset including 
> > > > > IA32_TSC_ADJUST_guest. While the total system semantics changed 
> > > > > the semantics as seen by the guest do not and hence this will 
> > > > > not cause a problem.
> > > > > ---
> > > > 
> > > > >  arch/x86/include/asm/cpufeature.h |    1 +
> > > > >  arch/x86/include/asm/kvm_host.h   |    2 ++
> > > > >  arch/x86/include/asm/msr-index.h  |    1 +
> > > > >  arch/x86/kvm/cpuid.c              |    4 ++--
> > > > >  arch/x86/kvm/vmx.c                |   12 ++++++++++++
> > > > >  arch/x86/kvm/x86.c                |    1 +
> > > > >  6 files changed, 19 insertions(+), 2 deletions(-)
> > > > > 
> > > > > diff --git a/arch/x86/include/asm/cpufeature.h
> > > > > b/arch/x86/include/asm/cpufeature.h
> > > > > index 6b7ee5f..e574d81 100644
> > > > > --- a/arch/x86/include/asm/cpufeature.h
> > > > > +++ b/arch/x86/include/asm/cpufeature.h
> > > > > @@ -199,6 +199,7 @@
> > > > >  
> > > > >  /* Intel-defined CPU features, CPUID level 0x00000007:0 (ebx), word 9 */
> > > > >  #define X86_FEATURE_FSGSBASE	(9*32+ 0) /* {RD/WR}{FS/GS}BASE instructions*/
> > > > > +#define X86_FEATURE_TSC_ADJUST  (9*32+ 1) /* TSC adjustment MSR 
> > > > > +0x3b */
> > > > >  #define X86_FEATURE_BMI1	(9*32+ 3) /* 1st group bit manipulation extensions */
> > > > >  #define X86_FEATURE_HLE		(9*32+ 4) /* Hardware Lock Elision */
> > > > >  #define X86_FEATURE_AVX2	(9*32+ 5) /* AVX2 instructions */
> > > > > diff --git a/arch/x86/include/asm/kvm_host.h 
> > > > > b/arch/x86/include/asm/kvm_host.h index 09155d6..8a001a4 100644
> > > > > --- a/arch/x86/include/asm/kvm_host.h
> > > > > +++ b/arch/x86/include/asm/kvm_host.h
> > > > > @@ -442,6 +442,8 @@ struct kvm_vcpu_arch {
> > > > >  	u32 virtual_tsc_mult;
> > > > >  	u32 virtual_tsc_khz;
> > > > >  
> > > > > +	s64 tsc_adjust;
> > > > > +
> > > > >  	atomic_t nmi_queued;  /* unprocessed asynchronous NMIs */
> > > > >  	unsigned nmi_pending; /* NMI queued after currently running handler */
> > > > >  	bool nmi_injected;    /* Trying to inject an NMI this entry */
> > > > > diff --git a/arch/x86/include/asm/msr-index.h
> > > > > b/arch/x86/include/asm/msr-index.h
> > > > > index 957ec87..8e82e29 100644
> > > > > --- a/arch/x86/include/asm/msr-index.h
> > > > > +++ b/arch/x86/include/asm/msr-index.h
> > > > > @@ -231,6 +231,7 @@
> > > > >  #define MSR_IA32_EBL_CR_POWERON		0x0000002a
> > > > >  #define MSR_EBC_FREQUENCY_ID		0x0000002c
> > > > >  #define MSR_IA32_FEATURE_CONTROL        0x0000003a
> > > > > +#define MSR_TSC_ADJUST				0x0000003b
> > > > >  
> > > > >  #define FEATURE_CONTROL_LOCKED				(1<<0)
> > > > >  #define FEATURE_CONTROL_VMXON_ENABLED_INSIDE_SMX	(1<<1)
> > > > > diff --git a/arch/x86/kvm/cpuid.c b/arch/x86/kvm/cpuid.c index 
> > > > > 0595f13..8f5943e 100644
> > > > > --- a/arch/x86/kvm/cpuid.c
> > > > > +++ b/arch/x86/kvm/cpuid.c
> > > > > @@ -248,8 +248,8 @@ static int do_cpuid_ent(struct 
> > > > > kvm_cpuid_entry2 *entry, u32 function,
> > > > >  
> > > > >  	/* cpuid 7.0.ebx */
> > > > >  	const u32 kvm_supported_word9_x86_features =
> > > > > -		F(FSGSBASE) | F(BMI1) | F(HLE) | F(AVX2) | F(SMEP) |
> > > > > -		F(BMI2) | F(ERMS) | f_invpcid | F(RTM);
> > > > > +		F(FSGSBASE) | F(TSC_ADJUST) | F(BMI1) | F(HLE) |
> > > > > +		F(AVX2) | F(SMEP) | F(BMI2) | F(ERMS) | f_invpcid | F(RTM);
> > > > >  
> > > > >  	/* all calls to cpuid_count() should be made on the same cpu */
> > > > >  	get_cpu();
> > > > > diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c index
> > > > > c00f03d..35d11b3 100644
> > > > > --- a/arch/x86/kvm/vmx.c
> > > > > +++ b/arch/x86/kvm/vmx.c
> > > > > @@ -2173,6 +2173,9 @@ static int vmx_get_msr(struct kvm_vcpu *vcpu, u32 msr_index, u64 *pdata)
> > > > >  	case MSR_IA32_SYSENTER_ESP:
> > > > >  		data = vmcs_readl(GUEST_SYSENTER_ESP);
> > > > >  		break;
> > > > > +	case MSR_TSC_ADJUST:
> > > > > +		data = (u64)vcpu->arch.tsc_adjust;
> > > > > +		break;
> > > > >  	case MSR_TSC_AUX:
> > > > >  		if (!to_vmx(vcpu)->rdtscp_enabled)
> > > > >  			return 1;
> > > > > @@ -2241,6 +2244,13 @@ static int vmx_set_msr(struct kvm_vcpu *vcpu, u32 msr_index, u64 data)
> > > > >  		}
> > > > >  		ret = kvm_set_msr_common(vcpu, msr_index, data);
> > > > >  		break;
> > > > > +	case MSR_TSC_ADJUST:
> > > > > +#define DUMMY 1
> > > > > +		vmx_adjust_tsc_offset(vcpu,
> > > > > +				(s64)(data-vcpu->arch.tsc_adjust),
> > > > > +				(bool)DUMMY);
> > > > > +		vcpu->arch.tsc_adjust = (s64)data;
> > > > > +		break;
> > > > >  	case MSR_TSC_AUX:
> > > > >  		if (!vmx->rdtscp_enabled)
> > > > >  			return 1;
> > > > > @@ -3931,6 +3941,8 @@ static int vmx_vcpu_reset(struct kvm_vcpu
> > > > > *vcpu)
> > > > >  
> > > > >  	vcpu->arch.regs_avail = ~((1 << VCPU_REGS_RIP) | (1 << 
> > > > > VCPU_REGS_RSP));
> > > > >  
> > > > > +	vcpu->arch.tsc_adjust = 0x0;
> > > > > +
> > > > >  	vmx->rmode.vm86_active = 0;
> > > > >  
> > > > >  	vmx->soft_vnmi_blocked = 0;
> > > > > diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c index 
> > > > > 42bce48..6c50f6c 100644
> > > > > --- a/arch/x86/kvm/x86.c
> > > > > +++ b/arch/x86/kvm/x86.c
> > > > > @@ -824,6 +824,7 @@ static u32 msrs_to_save[] = {  static 
> > > > > unsigned num_msrs_to_save;
> > > > >  
> > > > >  static u32 emulated_msrs[] = {
> > > > > +	MSR_TSC_ADJUST,
> > > > >  	MSR_IA32_TSCDEADLINE,
> > > > >  	MSR_IA32_MISC_ENABLE,
> > > > >  	MSR_IA32_MCG_STATUS,
> > > > > --
> > > > > 1.7.1
> > > > > 
> > > > > --
> > > > > To unsubscribe from this list: send the line "unsubscribe kvm" 
> > > > > in the body of a message to majordomo@vger.kernel.org More 
> > > > > majordomo info at http://vger.kernel.org/majordomo-info.html
> > --
> > To unsubscribe from this list: send the line "unsubscribe kvm" in the 
> > body of a message to majordomo@vger.kernel.org More majordomo info at  
> > http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH] Enabling IA32_TSC_ADJUST for guest VM
  2012-09-19 17:44 [PATCH] Enabling IA32_TSC_ADJUST for guest VM Auld, Will
                   ` (2 preceding siblings ...)
  2012-09-26 21:34 ` Marcelo Tosatti
@ 2012-10-08 17:30 ` Marcelo Tosatti
  2012-10-09 12:12   ` Avi Kivity
  3 siblings, 1 reply; 23+ messages in thread
From: Marcelo Tosatti @ 2012-10-08 17:30 UTC (permalink / raw)
  To: Auld, Will; +Cc: kvm, Avi Kivity, Zhang, Xiantao


On Wed, Sep 19, 2012 at 05:44:46PM +0000, Auld, Will wrote:
> >From 9982bb73460b05c1328068aae047b14b2294e2da Mon Sep 17 00:00:00 2001
> From: Will Auld <will.auld@intel.com>
> Date: Wed, 12 Sep 2012 18:10:56 -0700
> Subject: [PATCH] Enabling IA32_TSC_ADJUST for guest VM
> 
> CPUID.7.0.EBX[1]=1 indicates IA32_TSC_ADJUST MSR 0x3b is supported
> 
> Basic design is to emulate the MSR by allowing reads and writes to a guest vcpu specific location to store the value of the emulated MSR while adding the value to the vmcs tsc_offset. In this way the IA32_TSC_ADJUST value will be included in all reads to the TSC MSR whether through rdmsr or rdtsc. This is of course as long as the "use TSC counter offsetting" VM-execution control is enabled as well as the IA32_TSC_ADJUST control.
> 
> However, because hardware will only return the TSC + IA32_TSC_ADJUST + vmsc tsc_offset for a guest process when it does and rdtsc (with the correct settings) the value of our virtualized IA32_TSC_ADJUST must be stored in one of these three locations. The argument against storing it in the actual MSR is performance. This is likely to be seldom used while the save/restore is required on every transition. IA32_TSC_ADJUST was created as a way to solve some issues with writing TSC itself so that is not an option either. The remaining option, defined above as our solution has the problem of returning incorrect vmcs tsc_offset values (unless we intercept and fix, not done here) as mentioned above. However, more problematic is that storing the data in vmcs tsc_offset will have a different semantic effect on the system than does using the actual MSR. This is illustrated in the following example: The hypervisor set the IA32_TSC_ADJUST, then the guest sets it and a guest process perfor!
>  ms a rdtsc. In this case the guest process will get TSC + IA32_TSC_ADJUST_hyperviser + vmsc tsc_offset including IA32_TSC_ADJUST_guest. While the total system semantics changed the semantics as seen by the guest do not and hence this will not cause a problem.
> ---
>  arch/x86/include/asm/cpufeature.h |    1 +
>  arch/x86/include/asm/kvm_host.h   |    2 ++
>  arch/x86/include/asm/msr-index.h  |    1 +
>  arch/x86/kvm/cpuid.c              |    4 ++--
>  arch/x86/kvm/vmx.c                |   12 ++++++++++++
>  arch/x86/kvm/x86.c                |    1 +
>  6 files changed, 19 insertions(+), 2 deletions(-)
> 
> diff --git a/arch/x86/include/asm/cpufeature.h b/arch/x86/include/asm/cpufeature.h
> index 6b7ee5f..e574d81 100644
> --- a/arch/x86/include/asm/cpufeature.h
> +++ b/arch/x86/include/asm/cpufeature.h
> @@ -199,6 +199,7 @@
>  
>  /* Intel-defined CPU features, CPUID level 0x00000007:0 (ebx), word 9 */
>  #define X86_FEATURE_FSGSBASE	(9*32+ 0) /* {RD/WR}{FS/GS}BASE instructions*/
> +#define X86_FEATURE_TSC_ADJUST  (9*32+ 1) /* TSC adjustment MSR 0x3b */
>  #define X86_FEATURE_BMI1	(9*32+ 3) /* 1st group bit manipulation extensions */
>  #define X86_FEATURE_HLE		(9*32+ 4) /* Hardware Lock Elision */
>  #define X86_FEATURE_AVX2	(9*32+ 5) /* AVX2 instructions */
> diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
> index 09155d6..8a001a4 100644
> --- a/arch/x86/include/asm/kvm_host.h
> +++ b/arch/x86/include/asm/kvm_host.h
> @@ -442,6 +442,8 @@ struct kvm_vcpu_arch {
>  	u32 virtual_tsc_mult;
>  	u32 virtual_tsc_khz;
>  
> +	s64 tsc_adjust;
> +
>  	atomic_t nmi_queued;  /* unprocessed asynchronous NMIs */
>  	unsigned nmi_pending; /* NMI queued after currently running handler */
>  	bool nmi_injected;    /* Trying to inject an NMI this entry */
> diff --git a/arch/x86/include/asm/msr-index.h b/arch/x86/include/asm/msr-index.h
> index 957ec87..8e82e29 100644
> --- a/arch/x86/include/asm/msr-index.h
> +++ b/arch/x86/include/asm/msr-index.h
> @@ -231,6 +231,7 @@
>  #define MSR_IA32_EBL_CR_POWERON		0x0000002a
>  #define MSR_EBC_FREQUENCY_ID		0x0000002c
>  #define MSR_IA32_FEATURE_CONTROL        0x0000003a
> +#define MSR_TSC_ADJUST				0x0000003b
>  
>  #define FEATURE_CONTROL_LOCKED				(1<<0)
>  #define FEATURE_CONTROL_VMXON_ENABLED_INSIDE_SMX	(1<<1)
> diff --git a/arch/x86/kvm/cpuid.c b/arch/x86/kvm/cpuid.c
> index 0595f13..8f5943e 100644
> --- a/arch/x86/kvm/cpuid.c
> +++ b/arch/x86/kvm/cpuid.c
> @@ -248,8 +248,8 @@ static int do_cpuid_ent(struct kvm_cpuid_entry2 *entry, u32 function,
>  
>  	/* cpuid 7.0.ebx */
>  	const u32 kvm_supported_word9_x86_features =
> -		F(FSGSBASE) | F(BMI1) | F(HLE) | F(AVX2) | F(SMEP) |
> -		F(BMI2) | F(ERMS) | f_invpcid | F(RTM);
> +		F(FSGSBASE) | F(TSC_ADJUST) | F(BMI1) | F(HLE) |
> +		F(AVX2) | F(SMEP) | F(BMI2) | F(ERMS) | f_invpcid | F(RTM);
>  
>  	/* all calls to cpuid_count() should be made on the same cpu */
>  	get_cpu();
> diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
> index c00f03d..35d11b3 100644
> --- a/arch/x86/kvm/vmx.c
> +++ b/arch/x86/kvm/vmx.c
> @@ -2173,6 +2173,9 @@ static int vmx_get_msr(struct kvm_vcpu *vcpu, u32 msr_index, u64 *pdata)
>  	case MSR_IA32_SYSENTER_ESP:
>  		data = vmcs_readl(GUEST_SYSENTER_ESP);
>  		break;
> +	case MSR_TSC_ADJUST:
> +		data = (u64)vcpu->arch.tsc_adjust;
> +		break;
>  	case MSR_TSC_AUX:
>  		if (!to_vmx(vcpu)->rdtscp_enabled)
>  			return 1;

From Intel's manual:

• If an execution of WRMSR to the IA32_TIME_STAMP_COUNTER MSR adds (or
subtracts) value X from the TSC,
the logical processor also adds (or subtracts) value X from the
IA32_TSC_ADJUST MSR.

This is not handled in the patch. 

To support migration, it will be necessary to differentiate between
guest initiated and userspace-model initiated msr write. That is, 
only guest initiated TSC writes should affect the value of 
IA32_TSC_ADJUST MSR.

Avi, any better idea?

Will, please write a test case, see

http://git.kernel.org/?p=virt/kvm/kvm-unit-tests.git;a=summary

> @@ -2241,6 +2244,13 @@ static int vmx_set_msr(struct kvm_vcpu *vcpu, u32 msr_index, u64 data)
>  		}
>  		ret = kvm_set_msr_common(vcpu, msr_index, data);
>  		break;
> +	case MSR_TSC_ADJUST:
> +#define DUMMY 1
> +		vmx_adjust_tsc_offset(vcpu,
> +				(s64)(data-vcpu->arch.tsc_adjust),
> +				(bool)DUMMY);
> +		vcpu->arch.tsc_adjust = (s64)data;
> +		break;
>  	case MSR_TSC_AUX:
>  		if (!vmx->rdtscp_enabled)
>  			return 1;
> @@ -3931,6 +3941,8 @@ static int vmx_vcpu_reset(struct kvm_vcpu *vcpu)
>  
>  	vcpu->arch.regs_avail = ~((1 << VCPU_REGS_RIP) | (1 << VCPU_REGS_RSP));
>  
> +	vcpu->arch.tsc_adjust = 0x0;
> +
>  	vmx->rmode.vm86_active = 0;
>  
>  	vmx->soft_vnmi_blocked = 0;
> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> index 42bce48..6c50f6c 100644
> --- a/arch/x86/kvm/x86.c
> +++ b/arch/x86/kvm/x86.c
> @@ -824,6 +824,7 @@ static u32 msrs_to_save[] = {
>  static unsigned num_msrs_to_save;
>  
>  static u32 emulated_msrs[] = {
> +	MSR_TSC_ADJUST,
>  	MSR_IA32_TSCDEADLINE,
>  	MSR_IA32_MISC_ENABLE,
>  	MSR_IA32_MCG_STATUS,
> -- 
> 1.7.1
> 
> --
> To unsubscribe from this list: send the line "unsubscribe kvm" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH] Enabling IA32_TSC_ADJUST for guest VM
  2012-10-08 17:30 ` Marcelo Tosatti
@ 2012-10-09 12:12   ` Avi Kivity
  2012-10-09 14:24     ` Marcelo Tosatti
  2012-10-09 16:10     ` Auld, Will
  0 siblings, 2 replies; 23+ messages in thread
From: Avi Kivity @ 2012-10-09 12:12 UTC (permalink / raw)
  To: Marcelo Tosatti; +Cc: Auld, Will, kvm, Zhang, Xiantao

On 10/08/2012 07:30 PM, Marcelo Tosatti wrote:
> 
> From Intel's manual:
> 
> • If an execution of WRMSR to the IA32_TIME_STAMP_COUNTER MSR adds (or
> subtracts) value X from the TSC,
> the logical processor also adds (or subtracts) value X from the
> IA32_TSC_ADJUST MSR.
> 
> This is not handled in the patch. 
> 
> To support migration, it will be necessary to differentiate between
> guest initiated and userspace-model initiated msr write. That is, 
> only guest initiated TSC writes should affect the value of 
> IA32_TSC_ADJUST MSR.
> 
> Avi, any better idea?
> 

I think we need that anyway, since there are some read-only MSRs that
need to be configured by the host (nvmx capabilities).  So if we add
that feature it will be useful elsewhere.  I don't think it's possible
to do it in any other way:

"Local offset value of the IA32_TSC for a
logical processor. Reset value is Zero. A
write to IA32_TSC will modify the local
offset in IA32_TSC_ADJUST and the
content of IA32_TSC, but does not affect
the internal invariant TSC hardware."

What we want to do is affect the internal invariant TSC hardware, so we
can't do that through the normal means.

btw, will tsc writes from userspace (after live migration) cause tsc
skew?  If so we should think how to model a guest-wide tsc.

-- 
error compiling committee.c: too many arguments to function

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH] Enabling IA32_TSC_ADJUST for guest VM
  2012-10-09 12:12   ` Avi Kivity
@ 2012-10-09 14:24     ` Marcelo Tosatti
  2012-10-09 14:26       ` Avi Kivity
  2012-10-09 16:10     ` Auld, Will
  1 sibling, 1 reply; 23+ messages in thread
From: Marcelo Tosatti @ 2012-10-09 14:24 UTC (permalink / raw)
  To: Avi Kivity; +Cc: Auld, Will, kvm, Zhang, Xiantao

On Tue, Oct 09, 2012 at 02:12:18PM +0200, Avi Kivity wrote:
> On 10/08/2012 07:30 PM, Marcelo Tosatti wrote:
> > 
> > From Intel's manual:
> > 
> > • If an execution of WRMSR to the IA32_TIME_STAMP_COUNTER MSR adds (or
> > subtracts) value X from the TSC,
> > the logical processor also adds (or subtracts) value X from the
> > IA32_TSC_ADJUST MSR.
> > 
> > This is not handled in the patch. 
> > 
> > To support migration, it will be necessary to differentiate between
> > guest initiated and userspace-model initiated msr write. That is, 
> > only guest initiated TSC writes should affect the value of 
> > IA32_TSC_ADJUST MSR.
> > 
> > Avi, any better idea?
> > 
> 
> I think we need that anyway, since there are some read-only MSRs that
> need to be configured by the host (nvmx capabilities).  So if we add
> that feature it will be useful elsewhere.  I don't think it's possible
> to do it in any other way:
> 
> "Local offset value of the IA32_TSC for a
> logical processor. Reset value is Zero. A
> write to IA32_TSC will modify the local
> offset in IA32_TSC_ADJUST and the
> content of IA32_TSC, but does not affect
> the internal invariant TSC hardware."
> 
> What we want to do is affect the internal invariant TSC hardware, so we
> can't do that through the normal means.
> 
> btw, will tsc writes from userspace (after live migration) cause tsc
> skew?  If so we should think how to model a guest-wide tsc.

No because there is an easy shortcut:

    if (level == KVM_PUT_FULL_STATE) {
        /*
         * KVM is yet unable to synchronize TSC values of multiple VCPUs
         * on
         * writeback. Until this is fixed, we only write the offset to
         * SMP
         * guests after migration, desynchronizing the VCPUs, but
         * avoiding
         * huge jump-backs that would occur without any writeback at
         * all.
         */
        if (smp_cpus == 1 || env->tsc != 0) {
            kvm_msr_entry_set(&msrs[n++], MSR_IA32_TSC, env->tsc);
        }
    }



^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH] Enabling IA32_TSC_ADJUST for guest VM
  2012-10-09 14:24     ` Marcelo Tosatti
@ 2012-10-09 14:26       ` Avi Kivity
  2012-10-09 14:27         ` Marcelo Tosatti
  0 siblings, 1 reply; 23+ messages in thread
From: Avi Kivity @ 2012-10-09 14:26 UTC (permalink / raw)
  To: Marcelo Tosatti; +Cc: Auld, Will, kvm, Zhang, Xiantao

On 10/09/2012 04:24 PM, Marcelo Tosatti wrote:
> On Tue, Oct 09, 2012 at 02:12:18PM +0200, Avi Kivity wrote:
>> On 10/08/2012 07:30 PM, Marcelo Tosatti wrote:
>> > 
>> > From Intel's manual:
>> > 
>> > • If an execution of WRMSR to the IA32_TIME_STAMP_COUNTER MSR adds (or
>> > subtracts) value X from the TSC,
>> > the logical processor also adds (or subtracts) value X from the
>> > IA32_TSC_ADJUST MSR.
>> > 
>> > This is not handled in the patch. 
>> > 
>> > To support migration, it will be necessary to differentiate between
>> > guest initiated and userspace-model initiated msr write. That is, 
>> > only guest initiated TSC writes should affect the value of 
>> > IA32_TSC_ADJUST MSR.
>> > 
>> > Avi, any better idea?
>> > 
>> 
>> I think we need that anyway, since there are some read-only MSRs that
>> need to be configured by the host (nvmx capabilities).  So if we add
>> that feature it will be useful elsewhere.  I don't think it's possible
>> to do it in any other way:
>> 
>> "Local offset value of the IA32_TSC for a
>> logical processor. Reset value is Zero. A
>> write to IA32_TSC will modify the local
>> offset in IA32_TSC_ADJUST and the
>> content of IA32_TSC, but does not affect
>> the internal invariant TSC hardware."
>> 
>> What we want to do is affect the internal invariant TSC hardware, so we
>> can't do that through the normal means.
>> 
>> btw, will tsc writes from userspace (after live migration) cause tsc
>> skew?  If so we should think how to model a guest-wide tsc.
> 
> No because there is an easy shortcut:
> 
>     if (level == KVM_PUT_FULL_STATE) {
>         /*
>          * KVM is yet unable to synchronize TSC values of multiple VCPUs
>          * on
>          * writeback. Until this is fixed, we only write the offset to
>          * SMP
>          * guests after migration, desynchronizing the VCPUs, but
>          * avoiding
>          * huge jump-backs that would occur without any writeback at
>          * all.
>          */
>         if (smp_cpus == 1 || env->tsc != 0) {
>             kvm_msr_entry_set(&msrs[n++], MSR_IA32_TSC, env->tsc);
>         }
>     }

Still we write back after migration.  So this needs to be fixed (or I
misunderstood you).


-- 
error compiling committee.c: too many arguments to function

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH] Enabling IA32_TSC_ADJUST for guest VM
  2012-10-09 14:26       ` Avi Kivity
@ 2012-10-09 14:27         ` Marcelo Tosatti
  2012-10-09 14:30           ` Avi Kivity
  0 siblings, 1 reply; 23+ messages in thread
From: Marcelo Tosatti @ 2012-10-09 14:27 UTC (permalink / raw)
  To: Avi Kivity; +Cc: Auld, Will, kvm, Zhang, Xiantao

On Tue, Oct 09, 2012 at 04:26:32PM +0200, Avi Kivity wrote:
> On 10/09/2012 04:24 PM, Marcelo Tosatti wrote:
> > On Tue, Oct 09, 2012 at 02:12:18PM +0200, Avi Kivity wrote:
> >> On 10/08/2012 07:30 PM, Marcelo Tosatti wrote:
> >> > 
> >> > From Intel's manual:
> >> > 
> >> > • If an execution of WRMSR to the IA32_TIME_STAMP_COUNTER MSR adds (or
> >> > subtracts) value X from the TSC,
> >> > the logical processor also adds (or subtracts) value X from the
> >> > IA32_TSC_ADJUST MSR.
> >> > 
> >> > This is not handled in the patch. 
> >> > 
> >> > To support migration, it will be necessary to differentiate between
> >> > guest initiated and userspace-model initiated msr write. That is, 
> >> > only guest initiated TSC writes should affect the value of 
> >> > IA32_TSC_ADJUST MSR.
> >> > 
> >> > Avi, any better idea?
> >> > 
> >> 
> >> I think we need that anyway, since there are some read-only MSRs that
> >> need to be configured by the host (nvmx capabilities).  So if we add
> >> that feature it will be useful elsewhere.  I don't think it's possible
> >> to do it in any other way:
> >> 
> >> "Local offset value of the IA32_TSC for a
> >> logical processor. Reset value is Zero. A
> >> write to IA32_TSC will modify the local
> >> offset in IA32_TSC_ADJUST and the
> >> content of IA32_TSC, but does not affect
> >> the internal invariant TSC hardware."
> >> 
> >> What we want to do is affect the internal invariant TSC hardware, so we
> >> can't do that through the normal means.
> >> 
> >> btw, will tsc writes from userspace (after live migration) cause tsc
> >> skew?  If so we should think how to model a guest-wide tsc.
> > 
> > No because there is an easy shortcut:
> > 
> >     if (level == KVM_PUT_FULL_STATE) {
> >         /*
> >          * KVM is yet unable to synchronize TSC values of multiple VCPUs
> >          * on
> >          * writeback. Until this is fixed, we only write the offset to
> >          * SMP
> >          * guests after migration, desynchronizing the VCPUs, but
> >          * avoiding
> >          * huge jump-backs that would occur without any writeback at
> >          * all.
> >          */
> >         if (smp_cpus == 1 || env->tsc != 0) {
> >             kvm_msr_entry_set(&msrs[n++], MSR_IA32_TSC, env->tsc);
> >         }
> >     }
> 
> Still we write back after migration.  So this needs to be fixed (or I
> misunderstood you).

Handled by kvm_write_tsc() in x86.c. Is this what you mean?


^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH] Enabling IA32_TSC_ADJUST for guest VM
  2012-10-09 14:27         ` Marcelo Tosatti
@ 2012-10-09 14:30           ` Avi Kivity
  2012-10-09 15:52             ` Marcelo Tosatti
  0 siblings, 1 reply; 23+ messages in thread
From: Avi Kivity @ 2012-10-09 14:30 UTC (permalink / raw)
  To: Marcelo Tosatti; +Cc: Auld, Will, kvm, Zhang, Xiantao

On 10/09/2012 04:27 PM, Marcelo Tosatti wrote:
> On Tue, Oct 09, 2012 at 04:26:32PM +0200, Avi Kivity wrote:
>> On 10/09/2012 04:24 PM, Marcelo Tosatti wrote:
>> > On Tue, Oct 09, 2012 at 02:12:18PM +0200, Avi Kivity wrote:
>> >> On 10/08/2012 07:30 PM, Marcelo Tosatti wrote:
>> >> > 
>> >> > From Intel's manual:
>> >> > 
>> >> > • If an execution of WRMSR to the IA32_TIME_STAMP_COUNTER MSR adds (or
>> >> > subtracts) value X from the TSC,
>> >> > the logical processor also adds (or subtracts) value X from the
>> >> > IA32_TSC_ADJUST MSR.
>> >> > 
>> >> > This is not handled in the patch. 
>> >> > 
>> >> > To support migration, it will be necessary to differentiate between
>> >> > guest initiated and userspace-model initiated msr write. That is, 
>> >> > only guest initiated TSC writes should affect the value of 
>> >> > IA32_TSC_ADJUST MSR.
>> >> > 
>> >> > Avi, any better idea?
>> >> > 
>> >> 
>> >> I think we need that anyway, since there are some read-only MSRs that
>> >> need to be configured by the host (nvmx capabilities).  So if we add
>> >> that feature it will be useful elsewhere.  I don't think it's possible
>> >> to do it in any other way:
>> >> 
>> >> "Local offset value of the IA32_TSC for a
>> >> logical processor. Reset value is Zero. A
>> >> write to IA32_TSC will modify the local
>> >> offset in IA32_TSC_ADJUST and the
>> >> content of IA32_TSC, but does not affect
>> >> the internal invariant TSC hardware."
>> >> 
>> >> What we want to do is affect the internal invariant TSC hardware, so we
>> >> can't do that through the normal means.
>> >> 
>> >> btw, will tsc writes from userspace (after live migration) cause tsc
>> >> skew?  If so we should think how to model a guest-wide tsc.
>> > 
>> > No because there is an easy shortcut:
>> > 
>> >     if (level == KVM_PUT_FULL_STATE) {
>> >         /*
>> >          * KVM is yet unable to synchronize TSC values of multiple VCPUs
>> >          * on
>> >          * writeback. Until this is fixed, we only write the offset to
>> >          * SMP
>> >          * guests after migration, desynchronizing the VCPUs, but
>> >          * avoiding
>> >          * huge jump-backs that would occur without any writeback at
>> >          * all.
>> >          */
>> >         if (smp_cpus == 1 || env->tsc != 0) {
>> >             kvm_msr_entry_set(&msrs[n++], MSR_IA32_TSC, env->tsc);
>> >         }
>> >     }
>> 
>> Still we write back after migration.  So this needs to be fixed (or I
>> misunderstood you).
> 
> Handled by kvm_write_tsc() in x86.c. Is this what you mean?
> 

It will generate a call to ->write_tsc_offset().  Will the values be the
same for all vcpus?  Note the inputs won't be the same.


-- 
error compiling committee.c: too many arguments to function

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH] Enabling IA32_TSC_ADJUST for guest VM
  2012-10-09 14:30           ` Avi Kivity
@ 2012-10-09 15:52             ` Marcelo Tosatti
  0 siblings, 0 replies; 23+ messages in thread
From: Marcelo Tosatti @ 2012-10-09 15:52 UTC (permalink / raw)
  To: Avi Kivity; +Cc: Auld, Will, kvm, Zhang, Xiantao

On Tue, Oct 09, 2012 at 04:30:28PM +0200, Avi Kivity wrote:
> On 10/09/2012 04:27 PM, Marcelo Tosatti wrote:
> > On Tue, Oct 09, 2012 at 04:26:32PM +0200, Avi Kivity wrote:
> >> On 10/09/2012 04:24 PM, Marcelo Tosatti wrote:
> >> > On Tue, Oct 09, 2012 at 02:12:18PM +0200, Avi Kivity wrote:
> >> >> On 10/08/2012 07:30 PM, Marcelo Tosatti wrote:
> >> >> > 
> >> >> > From Intel's manual:
> >> >> > 
> >> >> > • If an execution of WRMSR to the IA32_TIME_STAMP_COUNTER MSR adds (or
> >> >> > subtracts) value X from the TSC,
> >> >> > the logical processor also adds (or subtracts) value X from the
> >> >> > IA32_TSC_ADJUST MSR.
> >> >> > 
> >> >> > This is not handled in the patch. 
> >> >> > 
> >> >> > To support migration, it will be necessary to differentiate between
> >> >> > guest initiated and userspace-model initiated msr write. That is, 
> >> >> > only guest initiated TSC writes should affect the value of 
> >> >> > IA32_TSC_ADJUST MSR.
> >> >> > 
> >> >> > Avi, any better idea?
> >> >> > 
> >> >> 
> >> >> I think we need that anyway, since there are some read-only MSRs that
> >> >> need to be configured by the host (nvmx capabilities).  So if we add
> >> >> that feature it will be useful elsewhere.  I don't think it's possible
> >> >> to do it in any other way:
> >> >> 
> >> >> "Local offset value of the IA32_TSC for a
> >> >> logical processor. Reset value is Zero. A
> >> >> write to IA32_TSC will modify the local
> >> >> offset in IA32_TSC_ADJUST and the
> >> >> content of IA32_TSC, but does not affect
> >> >> the internal invariant TSC hardware."
> >> >> 
> >> >> What we want to do is affect the internal invariant TSC hardware, so we
> >> >> can't do that through the normal means.
> >> >> 
> >> >> btw, will tsc writes from userspace (after live migration) cause tsc
> >> >> skew?  If so we should think how to model a guest-wide tsc.
> >> > 
> >> > No because there is an easy shortcut:
> >> > 
> >> >     if (level == KVM_PUT_FULL_STATE) {
> >> >         /*
> >> >          * KVM is yet unable to synchronize TSC values of multiple VCPUs
> >> >          * on
> >> >          * writeback. Until this is fixed, we only write the offset to
> >> >          * SMP
> >> >          * guests after migration, desynchronizing the VCPUs, but
> >> >          * avoiding
> >> >          * huge jump-backs that would occur without any writeback at
> >> >          * all.
> >> >          */
> >> >         if (smp_cpus == 1 || env->tsc != 0) {
> >> >             kvm_msr_entry_set(&msrs[n++], MSR_IA32_TSC, env->tsc);
> >> >         }
> >> >     }
> >> 
> >> Still we write back after migration.  So this needs to be fixed (or I
> >> misunderstood you).
> > 
> > Handled by kvm_write_tsc() in x86.c. Is this what you mean?
> > 
> 
> It will generate a call to ->write_tsc_offset().  Will the values be the
> same for all vcpus?  Note the inputs won't be the same.

Yes:

         * Special case: TSC write with a small delta (1 second) of
         * virtual
         * cycle time against real time is interpreted as an attempt to
         * synchronize the CPU.




^ permalink raw reply	[flat|nested] 23+ messages in thread

* RE: [PATCH] Enabling IA32_TSC_ADJUST for guest VM
  2012-10-09 12:12   ` Avi Kivity
  2012-10-09 14:24     ` Marcelo Tosatti
@ 2012-10-09 16:10     ` Auld, Will
  2012-10-10 12:52       ` Marcelo Tosatti
  1 sibling, 1 reply; 23+ messages in thread
From: Auld, Will @ 2012-10-09 16:10 UTC (permalink / raw)
  To: Avi Kivity, Marcelo Tosatti; +Cc: kvm, Zhang, Xiantao, Auld, Will, Liu, Jinsong

I am just testing the second version of this patch. It addresses all the comments so far except Marcelo's issue with breaking the function compute_guest_tsc(). 

I needed to put the call for updating the TSC_ADJUST_MSR in kvm_write_tsc() to ensure it is only called from user space. Other changes added to vmcs offset should not be tracked in TSC_ADJUST_MSR. 

I had some trouble with the order of initialization during live migration. TSC_ADJUST is initialized first but then wiped out by multiple initializations of tsc. The fix for this is to not update TSC_ADJUST if the vmcs offset is not actually changing with the tsc write. So, after migration outcome is that vmcs offset gets defined independent from the migrating value of TSC_ADJUST. I believe this is what we want to happen.

Thanks,

Will 

-----Original Message-----
From: Avi Kivity [mailto:avi@redhat.com] 
Sent: Tuesday, October 09, 2012 5:12 AM
To: Marcelo Tosatti
Cc: Auld, Will; kvm@vger.kernel.org; Zhang, Xiantao
Subject: Re: [PATCH] Enabling IA32_TSC_ADJUST for guest VM

On 10/08/2012 07:30 PM, Marcelo Tosatti wrote:
> 
> From Intel's manual:
> 
> • If an execution of WRMSR to the IA32_TIME_STAMP_COUNTER MSR adds (or
> subtracts) value X from the TSC,
> the logical processor also adds (or subtracts) value X from the 
> IA32_TSC_ADJUST MSR.
> 
> This is not handled in the patch. 
> 
> To support migration, it will be necessary to differentiate between 
> guest initiated and userspace-model initiated msr write. That is, only 
> guest initiated TSC writes should affect the value of IA32_TSC_ADJUST 
> MSR.
> 
> Avi, any better idea?
> 

I think we need that anyway, since there are some read-only MSRs that need to be configured by the host (nvmx capabilities).  So if we add that feature it will be useful elsewhere.  I don't think it's possible to do it in any other way:

"Local offset value of the IA32_TSC for a logical processor. Reset value is Zero. A write to IA32_TSC will modify the local offset in IA32_TSC_ADJUST and the content of IA32_TSC, but does not affect the internal invariant TSC hardware."

What we want to do is affect the internal invariant TSC hardware, so we can't do that through the normal means.

btw, will tsc writes from userspace (after live migration) cause tsc skew?  If so we should think how to model a guest-wide tsc.

--
error compiling committee.c: too many arguments to function

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH] Enabling IA32_TSC_ADJUST for guest VM
  2012-10-09 16:10     ` Auld, Will
@ 2012-10-10 12:52       ` Marcelo Tosatti
  2012-10-11  0:47         ` Auld, Will
  0 siblings, 1 reply; 23+ messages in thread
From: Marcelo Tosatti @ 2012-10-10 12:52 UTC (permalink / raw)
  To: Auld, Will; +Cc: Avi Kivity, kvm, Zhang, Xiantao, Liu, Jinsong

On Tue, Oct 09, 2012 at 04:10:30PM +0000, Auld, Will wrote:
> I am just testing the second version of this patch. It addresses all the comments so far except Marcelo's issue with breaking the function compute_guest_tsc(). 

Lets try to merge the missing patch from Zachary first (that'll make it
clear).

> 
> I needed to put the call for updating the TSC_ADJUST_MSR in kvm_write_tsc() to ensure it is only called from user space. Other changes added to vmcs offset should not be tracked in TSC_ADJUST_MSR. 

Please have a separate, earlier patch making that explicit (by passing a
bool to kvm_x86_ops->set_msr then to kvm_set_msr_common). "that" =
whether msr write is guest initiated or not.

> I had some trouble with the order of initialization during live migration. TSC_ADJUST is initialized first but then wiped out by multiple initializations of tsc. The fix for this is to not update TSC_ADJUST if the vmcs offset is not actually changing with the tsc write. So, after migration outcome is that vmcs offset gets defined independent from the migrating value of TSC_ADJUST. I believe this is what we want to happen.

Can you please be more explicit regarding "wiped out by multiple
initializations of tsc" ? 

It is probably best to maintain TSC_ADJUST separately, in software, and
then calculate TSC_OFFSET.

> Thanks,
> 
> Will 
> 
> -----Original Message-----
> From: Avi Kivity [mailto:avi@redhat.com] 
> Sent: Tuesday, October 09, 2012 5:12 AM
> To: Marcelo Tosatti
> Cc: Auld, Will; kvm@vger.kernel.org; Zhang, Xiantao
> Subject: Re: [PATCH] Enabling IA32_TSC_ADJUST for guest VM
> 
> On 10/08/2012 07:30 PM, Marcelo Tosatti wrote:
> > 
> > From Intel's manual:
> > 
> > • If an execution of WRMSR to the IA32_TIME_STAMP_COUNTER MSR adds (or
> > subtracts) value X from the TSC,
> > the logical processor also adds (or subtracts) value X from the 
> > IA32_TSC_ADJUST MSR.
> > 
> > This is not handled in the patch. 
> > 
> > To support migration, it will be necessary to differentiate between 
> > guest initiated and userspace-model initiated msr write. That is, only 
> > guest initiated TSC writes should affect the value of IA32_TSC_ADJUST 
> > MSR.
> > 
> > Avi, any better idea?
> > 
> 
> I think we need that anyway, since there are some read-only MSRs that need to be configured by the host (nvmx capabilities).  So if we add that feature it will be useful elsewhere.  I don't think it's possible to do it in any other way:
> 
> "Local offset value of the IA32_TSC for a logical processor. Reset value is Zero. A write to IA32_TSC will modify the local offset in IA32_TSC_ADJUST and the content of IA32_TSC, but does not affect the internal invariant TSC hardware."
> 
> What we want to do is affect the internal invariant TSC hardware, so we can't do that through the normal means.
> 
> btw, will tsc writes from userspace (after live migration) cause tsc skew?  If so we should think how to model a guest-wide tsc.
> 
> --
> error compiling committee.c: too many arguments to function
> N?????r??y????b?X??ǧv?^?)޺{.n?+????h??\x17??ܨ}???Ơz?&j:+v???\a????zZ+??+zf???h???~????i???z?\x1e?w?????????&?)ߢ^[f

^ permalink raw reply	[flat|nested] 23+ messages in thread

* RE: [PATCH] Enabling IA32_TSC_ADJUST for guest VM
  2012-10-10 12:52       ` Marcelo Tosatti
@ 2012-10-11  0:47         ` Auld, Will
  2012-10-11  8:56           ` Marcelo Tosatti
  0 siblings, 1 reply; 23+ messages in thread
From: Auld, Will @ 2012-10-11  0:47 UTC (permalink / raw)
  To: Marcelo Tosatti; +Cc: Avi Kivity, kvm, Zhang, Xiantao, Liu, Jinsong, Auld, Will

Marcelo,

You are suggesting that I:
Kvm_host.h:

Struct kvm_x86_ops {

...
change
	Int (*set_msr)(struct kvm_vcpu * vcpu, u32 mrs_index, u64 data);
to
	Int (*set_msr)(struct kvm_vcpu * vcpu, u32 mrs_index, u64 data, bool from_guest);
...
};

and so on down the line to set_msr_common(), kvm_write_tsc()... in a separate patch before other related patches?

As far as the initialization after live migration, I will provide some output with explanation once I am able to again. At the moment, I have hosed my system and need to figure out what's wrong and fix it first. 

Thanks,

Will

-----Original Message-----
From: Marcelo Tosatti [mailto:mtosatti@redhat.com] 
Sent: Wednesday, October 10, 2012 5:53 AM
To: Auld, Will
Cc: Avi Kivity; kvm@vger.kernel.org; Zhang, Xiantao; Liu, Jinsong
Subject: Re: [PATCH] Enabling IA32_TSC_ADJUST for guest VM

On Tue, Oct 09, 2012 at 04:10:30PM +0000, Auld, Will wrote:
> I am just testing the second version of this patch. It addresses all the comments so far except Marcelo's issue with breaking the function compute_guest_tsc(). 

Lets try to merge the missing patch from Zachary first (that'll make it clear).

> 
> I needed to put the call for updating the TSC_ADJUST_MSR in kvm_write_tsc() to ensure it is only called from user space. Other changes added to vmcs offset should not be tracked in TSC_ADJUST_MSR. 

Please have a separate, earlier patch making that explicit (by passing a bool to kvm_x86_ops->set_msr then to kvm_set_msr_common). "that" = whether msr write is guest initiated or not.

> I had some trouble with the order of initialization during live migration. TSC_ADJUST is initialized first but then wiped out by multiple initializations of tsc. The fix for this is to not update TSC_ADJUST if the vmcs offset is not actually changing with the tsc write. So, after migration outcome is that vmcs offset gets defined independent from the migrating value of TSC_ADJUST. I believe this is what we want to happen.

Can you please be more explicit regarding "wiped out by multiple initializations of tsc" ? 

It is probably best to maintain TSC_ADJUST separately, in software, and then calculate TSC_OFFSET.

> Thanks,
> 
> Will
> 
> -----Original Message-----
> From: Avi Kivity [mailto:avi@redhat.com]
> Sent: Tuesday, October 09, 2012 5:12 AM
> To: Marcelo Tosatti
> Cc: Auld, Will; kvm@vger.kernel.org; Zhang, Xiantao
> Subject: Re: [PATCH] Enabling IA32_TSC_ADJUST for guest VM
> 
> On 10/08/2012 07:30 PM, Marcelo Tosatti wrote:
> > 
> > From Intel's manual:
> > 
> > • If an execution of WRMSR to the IA32_TIME_STAMP_COUNTER MSR adds 
> > (or
> > subtracts) value X from the TSC,
> > the logical processor also adds (or subtracts) value X from the 
> > IA32_TSC_ADJUST MSR.
> > 
> > This is not handled in the patch. 
> > 
> > To support migration, it will be necessary to differentiate between 
> > guest initiated and userspace-model initiated msr write. That is, 
> > only guest initiated TSC writes should affect the value of 
> > IA32_TSC_ADJUST MSR.
> > 
> > Avi, any better idea?
> > 
> 
> I think we need that anyway, since there are some read-only MSRs that need to be configured by the host (nvmx capabilities).  So if we add that feature it will be useful elsewhere.  I don't think it's possible to do it in any other way:
> 
> "Local offset value of the IA32_TSC for a logical processor. Reset value is Zero. A write to IA32_TSC will modify the local offset in IA32_TSC_ADJUST and the content of IA32_TSC, but does not affect the internal invariant TSC hardware."
> 
> What we want to do is affect the internal invariant TSC hardware, so we can't do that through the normal means.
> 
> btw, will tsc writes from userspace (after live migration) cause tsc skew?  If so we should think how to model a guest-wide tsc.
> 
> --
> error compiling committee.c: too many arguments to function 
> N?????r??y????b?X??ǧv?^?)޺{.n?+????h??\x17??ܨ}???Ơz?&j:+v??? ????zZ+??+zf???h???~????i???z?\x1e?w?????????&?)ߢ^[f

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH] Enabling IA32_TSC_ADJUST for guest VM
  2012-10-11  0:47         ` Auld, Will
@ 2012-10-11  8:56           ` Marcelo Tosatti
  0 siblings, 0 replies; 23+ messages in thread
From: Marcelo Tosatti @ 2012-10-11  8:56 UTC (permalink / raw)
  To: Auld, Will; +Cc: Avi Kivity, kvm, Zhang, Xiantao, Liu, Jinsong

On Thu, Oct 11, 2012 at 12:47:39AM +0000, Auld, Will wrote:
> Marcelo,
> 
> You are suggesting that I:
> Kvm_host.h:
> 
> Struct kvm_x86_ops {
> 
> ...
> change
> 	Int (*set_msr)(struct kvm_vcpu * vcpu, u32 mrs_index, u64 data);
> to
> 	Int (*set_msr)(struct kvm_vcpu * vcpu, u32 mrs_index, u64 data, bool from_guest);
> ...
> };
> 
> and so on down the line to set_msr_common(), kvm_write_tsc()... in a separate patch before other related patches?

Yes. 'bool guest_initiated' is nicer IMO.

> As far as the initialization after live migration, I will provide some output with explanation once I am able to again. At the moment, I have hosed my system and need to figure out what's wrong and fix it first. 

Ok no problem.

> Thanks,
> 
> Will
> 
> -----Original Message-----
> From: Marcelo Tosatti [mailto:mtosatti@redhat.com] 
> Sent: Wednesday, October 10, 2012 5:53 AM
> To: Auld, Will
> Cc: Avi Kivity; kvm@vger.kernel.org; Zhang, Xiantao; Liu, Jinsong
> Subject: Re: [PATCH] Enabling IA32_TSC_ADJUST for guest VM
> 
> On Tue, Oct 09, 2012 at 04:10:30PM +0000, Auld, Will wrote:
> > I am just testing the second version of this patch. It addresses all the comments so far except Marcelo's issue with breaking the function compute_guest_tsc(). 
> 
> Lets try to merge the missing patch from Zachary first (that'll make it clear).
> 
> > 
> > I needed to put the call for updating the TSC_ADJUST_MSR in kvm_write_tsc() to ensure it is only called from user space. Other changes added to vmcs offset should not be tracked in TSC_ADJUST_MSR. 
> 
> Please have a separate, earlier patch making that explicit (by passing a bool to kvm_x86_ops->set_msr then to kvm_set_msr_common). "that" = whether msr write is guest initiated or not.
> 
> > I had some trouble with the order of initialization during live migration. TSC_ADJUST is initialized first but then wiped out by multiple initializations of tsc. The fix for this is to not update TSC_ADJUST if the vmcs offset is not actually changing with the tsc write. So, after migration outcome is that vmcs offset gets defined independent from the migrating value of TSC_ADJUST. I believe this is what we want to happen.
> 
> Can you please be more explicit regarding "wiped out by multiple initializations of tsc" ? 
> 
> It is probably best to maintain TSC_ADJUST separately, in software, and then calculate TSC_OFFSET.
> 
> > Thanks,
> > 
> > Will
> > 
> > -----Original Message-----
> > From: Avi Kivity [mailto:avi@redhat.com]
> > Sent: Tuesday, October 09, 2012 5:12 AM
> > To: Marcelo Tosatti
> > Cc: Auld, Will; kvm@vger.kernel.org; Zhang, Xiantao
> > Subject: Re: [PATCH] Enabling IA32_TSC_ADJUST for guest VM
> > 
> > On 10/08/2012 07:30 PM, Marcelo Tosatti wrote:
> > > 
> > > From Intel's manual:
> > > 
> > > • If an execution of WRMSR to the IA32_TIME_STAMP_COUNTER MSR adds 
> > > (or
> > > subtracts) value X from the TSC,
> > > the logical processor also adds (or subtracts) value X from the 
> > > IA32_TSC_ADJUST MSR.
> > > 
> > > This is not handled in the patch. 
> > > 
> > > To support migration, it will be necessary to differentiate between 
> > > guest initiated and userspace-model initiated msr write. That is, 
> > > only guest initiated TSC writes should affect the value of 
> > > IA32_TSC_ADJUST MSR.
> > > 
> > > Avi, any better idea?
> > > 
> > 
> > I think we need that anyway, since there are some read-only MSRs that need to be configured by the host (nvmx capabilities).  So if we add that feature it will be useful elsewhere.  I don't think it's possible to do it in any other way:
> > 
> > "Local offset value of the IA32_TSC for a logical processor. Reset value is Zero. A write to IA32_TSC will modify the local offset in IA32_TSC_ADJUST and the content of IA32_TSC, but does not affect the internal invariant TSC hardware."
> > 
> > What we want to do is affect the internal invariant TSC hardware, so we can't do that through the normal means.
> > 
> > btw, will tsc writes from userspace (after live migration) cause tsc skew?  If so we should think how to model a guest-wide tsc.
> > 
> > --
> > error compiling committee.c: too many arguments to function 
> > N?????r??y????b?X??ǧv?^?)޺{.n?+????h??\x17??ܨ}???Ơz?&j:+v??? ????zZ+??+zf???h???~????i???z?\x1e?w?????????&?)ߢ^[f

^ permalink raw reply	[flat|nested] 23+ messages in thread

end of thread, other threads:[~2012-10-11  9:08 UTC | newest]

Thread overview: 23+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2012-09-19 17:44 [PATCH] Enabling IA32_TSC_ADJUST for guest VM Auld, Will
2012-09-20 11:13 ` Avi Kivity
2012-09-20 11:15 ` Avi Kivity
2012-09-26 21:34 ` Marcelo Tosatti
2012-09-26 22:58   ` Auld, Will
2012-09-27  0:29     ` Marcelo Tosatti
2012-09-27  0:30       ` Marcelo Tosatti
2012-09-27  0:50       ` Auld, Will
2012-09-27 11:31         ` Marcelo Tosatti
2012-09-27 11:48           ` Marcelo Tosatti
2012-09-28  2:07             ` Auld, Will
2012-09-28 13:24               ` Marcelo Tosatti
2012-10-08 17:30 ` Marcelo Tosatti
2012-10-09 12:12   ` Avi Kivity
2012-10-09 14:24     ` Marcelo Tosatti
2012-10-09 14:26       ` Avi Kivity
2012-10-09 14:27         ` Marcelo Tosatti
2012-10-09 14:30           ` Avi Kivity
2012-10-09 15:52             ` Marcelo Tosatti
2012-10-09 16:10     ` Auld, Will
2012-10-10 12:52       ` Marcelo Tosatti
2012-10-11  0:47         ` Auld, Will
2012-10-11  8:56           ` Marcelo Tosatti

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.