kvm.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH v4 00/11] KVM: Implement nested TSC scaling
@ 2021-05-26 18:44 Ilias Stamatis
  2021-05-26 18:44 ` [PATCH v4 01/11] math64.h: Add mul_s64_u64_shr() Ilias Stamatis
                   ` (11 more replies)
  0 siblings, 12 replies; 17+ messages in thread
From: Ilias Stamatis @ 2021-05-26 18:44 UTC (permalink / raw)
  To: kvm, linux-kernel, pbonzini
  Cc: mlevitsk, seanjc, vkuznets, wanpengli, jmattson, joro, zamsden,
	mtosatti, dwmw, ilstam

KVM currently supports hardware-assisted TSC scaling but only for L1;
the feature is not exposed to nested guests. This patch series adds
support for nested TSC scaling and allows both L1 and L2 to be scaled
with different scaling factors. That is achieved by "merging" the 01 and
02 values together.

Most of the logic in this series is implemented in common code (by doing
the necessary restructurings), however the patches add support for VMX
only. Adding support for SVM should be easy at this point and Maxim
Levitsky has volunteered to do this (thanks!).

Changelog:

Only patch 9 needs reviewing at this point, but I am re-sending the
whole series as I also applied nitpicks suggested to some of the other
pathces.

v4:
  - Added vendor callbacks for writing the TSC multiplier
  - Moved the VMCS multiplier writes from the VMCS load path to common
    code that only gets called on TSC ratio changes
  - Merged together patches 10 and 11 of v3
  - Applied all nitpick-feedback of the previous versions

v3:
  - Applied Sean's feedback
  - Refactored patches 7 to 10

v2:
  - Applied all of Maxim's feedback
  - Added a mul_s64_u64_shr function in math64.h
  - Added a separate kvm_scale_tsc_l1 function instead of passing an
    argument to kvm_scale_tsc
  - Implemented the 02 fields calculations in common code
  - Moved all of write_l1_tsc_offset's logic to common code
  - Added a check for whether the TSC is stable in patch 10
  - Used a random L1 factor and a negative offset in patch 10

Ilias Stamatis (11):
  math64.h: Add mul_s64_u64_shr()
  KVM: X86: Store L1's TSC scaling ratio in 'struct kvm_vcpu_arch'
  KVM: X86: Rename kvm_compute_tsc_offset() to
    kvm_compute_l1_tsc_offset()
  KVM: X86: Add a ratio parameter to kvm_scale_tsc()
  KVM: nVMX: Add a TSC multiplier field in VMCS12
  KVM: X86: Add functions for retrieving L2 TSC fields from common code
  KVM: X86: Add functions that calculate the nested TSC fields
  KVM: X86: Move write_l1_tsc_offset() logic to common code and rename
    it
  KVM: X86: Add vendor callbacks for writing the TSC multiplier
  KVM: nVMX: Enable nested TSC scaling
  KVM: selftests: x86: Add vmx_nested_tsc_scaling_test

 arch/x86/include/asm/kvm-x86-ops.h            |   5 +-
 arch/x86/include/asm/kvm_host.h               |  15 +-
 arch/x86/kvm/svm/svm.c                        |  35 ++-
 arch/x86/kvm/vmx/nested.c                     |  33 ++-
 arch/x86/kvm/vmx/vmcs12.c                     |   1 +
 arch/x86/kvm/vmx/vmcs12.h                     |   4 +-
 arch/x86/kvm/vmx/vmx.c                        |  55 ++--
 arch/x86/kvm/vmx/vmx.h                        |  11 +-
 arch/x86/kvm/x86.c                            | 114 +++++++--
 include/linux/math64.h                        |  19 ++
 tools/testing/selftests/kvm/.gitignore        |   1 +
 tools/testing/selftests/kvm/Makefile          |   1 +
 .../kvm/x86_64/vmx_nested_tsc_scaling_test.c  | 242 ++++++++++++++++++
 13 files changed, 451 insertions(+), 85 deletions(-)
 create mode 100644 tools/testing/selftests/kvm/x86_64/vmx_nested_tsc_scaling_test.c

-- 
2.17.1


^ permalink raw reply	[flat|nested] 17+ messages in thread

* [PATCH v4 01/11] math64.h: Add mul_s64_u64_shr()
  2021-05-26 18:44 [PATCH v4 00/11] KVM: Implement nested TSC scaling Ilias Stamatis
@ 2021-05-26 18:44 ` Ilias Stamatis
  2021-05-26 18:44 ` [PATCH v4 02/11] KVM: X86: Store L1's TSC scaling ratio in 'struct kvm_vcpu_arch' Ilias Stamatis
                   ` (10 subsequent siblings)
  11 siblings, 0 replies; 17+ messages in thread
From: Ilias Stamatis @ 2021-05-26 18:44 UTC (permalink / raw)
  To: kvm, linux-kernel, pbonzini
  Cc: mlevitsk, seanjc, vkuznets, wanpengli, jmattson, joro, zamsden,
	mtosatti, dwmw, ilstam

This function is needed for KVM's nested virtualization. The nested TSC
scaling implementation requires multiplying the signed TSC offset with
the unsigned TSC multiplier.

Signed-off-by: Ilias Stamatis <ilstam@amazon.com>
Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com>
---
 include/linux/math64.h | 19 +++++++++++++++++++
 1 file changed, 19 insertions(+)

diff --git a/include/linux/math64.h b/include/linux/math64.h
index 66deb1fdc2ef..2928f03d6d46 100644
--- a/include/linux/math64.h
+++ b/include/linux/math64.h
@@ -3,6 +3,7 @@
 #define _LINUX_MATH64_H
 
 #include <linux/types.h>
+#include <linux/math.h>
 #include <vdso/math64.h>
 #include <asm/div64.h>
 
@@ -234,6 +235,24 @@ static inline u64 mul_u64_u64_shr(u64 a, u64 b, unsigned int shift)
 
 #endif
 
+#ifndef mul_s64_u64_shr
+static inline u64 mul_s64_u64_shr(s64 a, u64 b, unsigned int shift)
+{
+	u64 ret;
+
+	/*
+	 * Extract the sign before the multiplication and put it back
+	 * afterwards if needed.
+	 */
+	ret = mul_u64_u64_shr(abs(a), b, shift);
+
+	if (a < 0)
+		ret = -((s64) ret);
+
+	return ret;
+}
+#endif /* mul_s64_u64_shr */
+
 #ifndef mul_u64_u32_div
 static inline u64 mul_u64_u32_div(u64 a, u32 mul, u32 divisor)
 {
-- 
2.17.1


^ permalink raw reply related	[flat|nested] 17+ messages in thread

* [PATCH v4 02/11] KVM: X86: Store L1's TSC scaling ratio in 'struct kvm_vcpu_arch'
  2021-05-26 18:44 [PATCH v4 00/11] KVM: Implement nested TSC scaling Ilias Stamatis
  2021-05-26 18:44 ` [PATCH v4 01/11] math64.h: Add mul_s64_u64_shr() Ilias Stamatis
@ 2021-05-26 18:44 ` Ilias Stamatis
  2021-05-26 18:44 ` [PATCH v4 03/11] KVM: X86: Rename kvm_compute_tsc_offset() to kvm_compute_l1_tsc_offset() Ilias Stamatis
                   ` (9 subsequent siblings)
  11 siblings, 0 replies; 17+ messages in thread
From: Ilias Stamatis @ 2021-05-26 18:44 UTC (permalink / raw)
  To: kvm, linux-kernel, pbonzini
  Cc: mlevitsk, seanjc, vkuznets, wanpengli, jmattson, joro, zamsden,
	mtosatti, dwmw, ilstam

Store L1's scaling ratio in the kvm_vcpu_arch struct like we already do
for L1's TSC offset. This allows for easy save/restore when we enter and
then exit the nested guest.

Signed-off-by: Ilias Stamatis <ilstam@amazon.com>
Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com>
---
 arch/x86/include/asm/kvm_host.h | 5 +++--
 arch/x86/kvm/vmx/vmx.c          | 4 ++--
 arch/x86/kvm/x86.c              | 6 ++++--
 3 files changed, 9 insertions(+), 6 deletions(-)

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index 55efbacfc244..7dfc609eacd6 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -707,7 +707,7 @@ struct kvm_vcpu_arch {
 	} st;
 
 	u64 l1_tsc_offset;
-	u64 tsc_offset;
+	u64 tsc_offset; /* current tsc offset */
 	u64 last_guest_tsc;
 	u64 last_host_tsc;
 	u64 tsc_offset_adjustment;
@@ -721,7 +721,8 @@ struct kvm_vcpu_arch {
 	u32 virtual_tsc_khz;
 	s64 ia32_tsc_adjust_msr;
 	u64 msr_ia32_power_ctl;
-	u64 tsc_scaling_ratio;
+	u64 l1_tsc_scaling_ratio;
+	u64 tsc_scaling_ratio; /* current scaling ratio */
 
 	atomic_t nmi_queued;  /* unprocessed asynchronous NMIs */
 	unsigned nmi_pending; /* NMI queued after currently running handler */
diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
index 4bceb5ca3a89..3e4dda8177bb 100644
--- a/arch/x86/kvm/vmx/vmx.c
+++ b/arch/x86/kvm/vmx/vmx.c
@@ -7453,10 +7453,10 @@ static int vmx_set_hv_timer(struct kvm_vcpu *vcpu, u64 guest_deadline_tsc,
 		delta_tsc = 0;
 
 	/* Convert to host delta tsc if tsc scaling is enabled */
-	if (vcpu->arch.tsc_scaling_ratio != kvm_default_tsc_scaling_ratio &&
+	if (vcpu->arch.l1_tsc_scaling_ratio != kvm_default_tsc_scaling_ratio &&
 	    delta_tsc && u64_shl_div_u64(delta_tsc,
 				kvm_tsc_scaling_ratio_frac_bits,
-				vcpu->arch.tsc_scaling_ratio, &delta_tsc))
+				vcpu->arch.l1_tsc_scaling_ratio, &delta_tsc))
 		return -ERANGE;
 
 	/*
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index bbc4e04e67ad..6ab95ac188a5 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -2185,6 +2185,7 @@ static int set_tsc_khz(struct kvm_vcpu *vcpu, u32 user_tsc_khz, bool scale)
 
 	/* Guest TSC same frequency as host TSC? */
 	if (!scale) {
+		vcpu->arch.l1_tsc_scaling_ratio = kvm_default_tsc_scaling_ratio;
 		vcpu->arch.tsc_scaling_ratio = kvm_default_tsc_scaling_ratio;
 		return 0;
 	}
@@ -2211,7 +2212,7 @@ static int set_tsc_khz(struct kvm_vcpu *vcpu, u32 user_tsc_khz, bool scale)
 		return -1;
 	}
 
-	vcpu->arch.tsc_scaling_ratio = ratio;
+	vcpu->arch.l1_tsc_scaling_ratio = vcpu->arch.tsc_scaling_ratio = ratio;
 	return 0;
 }
 
@@ -2223,6 +2224,7 @@ static int kvm_set_tsc_khz(struct kvm_vcpu *vcpu, u32 user_tsc_khz)
 	/* tsc_khz can be zero if TSC calibration fails */
 	if (user_tsc_khz == 0) {
 		/* set tsc_scaling_ratio to a safe value */
+		vcpu->arch.l1_tsc_scaling_ratio = kvm_default_tsc_scaling_ratio;
 		vcpu->arch.tsc_scaling_ratio = kvm_default_tsc_scaling_ratio;
 		return -1;
 	}
@@ -2459,7 +2461,7 @@ static inline void adjust_tsc_offset_guest(struct kvm_vcpu *vcpu,
 
 static inline void adjust_tsc_offset_host(struct kvm_vcpu *vcpu, s64 adjustment)
 {
-	if (vcpu->arch.tsc_scaling_ratio != kvm_default_tsc_scaling_ratio)
+	if (vcpu->arch.l1_tsc_scaling_ratio != kvm_default_tsc_scaling_ratio)
 		WARN_ON(adjustment < 0);
 	adjustment = kvm_scale_tsc(vcpu, (u64) adjustment);
 	adjust_tsc_offset_guest(vcpu, adjustment);
-- 
2.17.1


^ permalink raw reply related	[flat|nested] 17+ messages in thread

* [PATCH v4 03/11] KVM: X86: Rename kvm_compute_tsc_offset() to kvm_compute_l1_tsc_offset()
  2021-05-26 18:44 [PATCH v4 00/11] KVM: Implement nested TSC scaling Ilias Stamatis
  2021-05-26 18:44 ` [PATCH v4 01/11] math64.h: Add mul_s64_u64_shr() Ilias Stamatis
  2021-05-26 18:44 ` [PATCH v4 02/11] KVM: X86: Store L1's TSC scaling ratio in 'struct kvm_vcpu_arch' Ilias Stamatis
@ 2021-05-26 18:44 ` Ilias Stamatis
  2021-05-26 18:44 ` [PATCH v4 04/11] KVM: X86: Add a ratio parameter to kvm_scale_tsc() Ilias Stamatis
                   ` (8 subsequent siblings)
  11 siblings, 0 replies; 17+ messages in thread
From: Ilias Stamatis @ 2021-05-26 18:44 UTC (permalink / raw)
  To: kvm, linux-kernel, pbonzini
  Cc: mlevitsk, seanjc, vkuznets, wanpengli, jmattson, joro, zamsden,
	mtosatti, dwmw, ilstam

All existing code uses kvm_compute_tsc_offset() passing L1 TSC values to
it. Let's document this by renaming it to kvm_compute_l1_tsc_offset().

Signed-off-by: Ilias Stamatis <ilstam@amazon.com>
Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com>
---
 arch/x86/kvm/x86.c | 10 +++++-----
 1 file changed, 5 insertions(+), 5 deletions(-)

diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 6ab95ac188a5..7359da170a5f 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -2319,7 +2319,7 @@ u64 kvm_scale_tsc(struct kvm_vcpu *vcpu, u64 tsc)
 }
 EXPORT_SYMBOL_GPL(kvm_scale_tsc);
 
-static u64 kvm_compute_tsc_offset(struct kvm_vcpu *vcpu, u64 target_tsc)
+static u64 kvm_compute_l1_tsc_offset(struct kvm_vcpu *vcpu, u64 target_tsc)
 {
 	u64 tsc;
 
@@ -2363,7 +2363,7 @@ static void kvm_synchronize_tsc(struct kvm_vcpu *vcpu, u64 data)
 	bool synchronizing = false;
 
 	raw_spin_lock_irqsave(&kvm->arch.tsc_write_lock, flags);
-	offset = kvm_compute_tsc_offset(vcpu, data);
+	offset = kvm_compute_l1_tsc_offset(vcpu, data);
 	ns = get_kvmclock_base_ns();
 	elapsed = ns - kvm->arch.last_tsc_nsec;
 
@@ -2402,7 +2402,7 @@ static void kvm_synchronize_tsc(struct kvm_vcpu *vcpu, u64 data)
 		} else {
 			u64 delta = nsec_to_cycles(vcpu, elapsed);
 			data += delta;
-			offset = kvm_compute_tsc_offset(vcpu, data);
+			offset = kvm_compute_l1_tsc_offset(vcpu, data);
 		}
 		matched = true;
 		already_matched = (vcpu->arch.this_tsc_generation == kvm->arch.cur_tsc_generation);
@@ -3235,7 +3235,7 @@ int kvm_set_msr_common(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
 		if (msr_info->host_initiated) {
 			kvm_synchronize_tsc(vcpu, data);
 		} else {
-			u64 adj = kvm_compute_tsc_offset(vcpu, data) - vcpu->arch.l1_tsc_offset;
+			u64 adj = kvm_compute_l1_tsc_offset(vcpu, data) - vcpu->arch.l1_tsc_offset;
 			adjust_tsc_offset_guest(vcpu, adj);
 			vcpu->arch.ia32_tsc_adjust_msr += adj;
 		}
@@ -4123,7 +4123,7 @@ void kvm_arch_vcpu_load(struct kvm_vcpu *vcpu, int cpu)
 			mark_tsc_unstable("KVM discovered backwards TSC");
 
 		if (kvm_check_tsc_unstable()) {
-			u64 offset = kvm_compute_tsc_offset(vcpu,
+			u64 offset = kvm_compute_l1_tsc_offset(vcpu,
 						vcpu->arch.last_guest_tsc);
 			kvm_vcpu_write_tsc_offset(vcpu, offset);
 			vcpu->arch.tsc_catchup = 1;
-- 
2.17.1


^ permalink raw reply related	[flat|nested] 17+ messages in thread

* [PATCH v4 04/11] KVM: X86: Add a ratio parameter to kvm_scale_tsc()
  2021-05-26 18:44 [PATCH v4 00/11] KVM: Implement nested TSC scaling Ilias Stamatis
                   ` (2 preceding siblings ...)
  2021-05-26 18:44 ` [PATCH v4 03/11] KVM: X86: Rename kvm_compute_tsc_offset() to kvm_compute_l1_tsc_offset() Ilias Stamatis
@ 2021-05-26 18:44 ` Ilias Stamatis
  2021-05-26 18:44 ` [PATCH v4 05/11] KVM: nVMX: Add a TSC multiplier field in VMCS12 Ilias Stamatis
                   ` (7 subsequent siblings)
  11 siblings, 0 replies; 17+ messages in thread
From: Ilias Stamatis @ 2021-05-26 18:44 UTC (permalink / raw)
  To: kvm, linux-kernel, pbonzini
  Cc: mlevitsk, seanjc, vkuznets, wanpengli, jmattson, joro, zamsden,
	mtosatti, dwmw, ilstam

Sometimes kvm_scale_tsc() needs to use the current scaling ratio and
other times (like when reading the TSC from user space) it needs to use
L1's scaling ratio. Have the caller specify this by passing the ratio as
a parameter.

Signed-off-by: Ilias Stamatis <ilstam@amazon.com>
Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com>
---
 arch/x86/include/asm/kvm_host.h |  2 +-
 arch/x86/kvm/x86.c              | 27 ++++++++++++++++++---------
 2 files changed, 19 insertions(+), 10 deletions(-)

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index 7dfc609eacd6..b14c2b2b2e21 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -1788,7 +1788,7 @@ static inline bool kvm_is_supported_user_return_msr(u32 msr)
 	return kvm_find_user_return_msr(msr) >= 0;
 }
 
-u64 kvm_scale_tsc(struct kvm_vcpu *vcpu, u64 tsc);
+u64 kvm_scale_tsc(struct kvm_vcpu *vcpu, u64 tsc, u64 ratio);
 u64 kvm_read_l1_tsc(struct kvm_vcpu *vcpu, u64 host_tsc);
 
 unsigned long kvm_get_linear_rip(struct kvm_vcpu *vcpu);
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 7359da170a5f..8cf0dfb6264d 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -2307,10 +2307,9 @@ static inline u64 __scale_tsc(u64 ratio, u64 tsc)
 	return mul_u64_u64_shr(tsc, ratio, kvm_tsc_scaling_ratio_frac_bits);
 }
 
-u64 kvm_scale_tsc(struct kvm_vcpu *vcpu, u64 tsc)
+u64 kvm_scale_tsc(struct kvm_vcpu *vcpu, u64 tsc, u64 ratio)
 {
 	u64 _tsc = tsc;
-	u64 ratio = vcpu->arch.tsc_scaling_ratio;
 
 	if (ratio != kvm_default_tsc_scaling_ratio)
 		_tsc = __scale_tsc(ratio, tsc);
@@ -2323,14 +2322,15 @@ static u64 kvm_compute_l1_tsc_offset(struct kvm_vcpu *vcpu, u64 target_tsc)
 {
 	u64 tsc;
 
-	tsc = kvm_scale_tsc(vcpu, rdtsc());
+	tsc = kvm_scale_tsc(vcpu, rdtsc(), vcpu->arch.l1_tsc_scaling_ratio);
 
 	return target_tsc - tsc;
 }
 
 u64 kvm_read_l1_tsc(struct kvm_vcpu *vcpu, u64 host_tsc)
 {
-	return vcpu->arch.l1_tsc_offset + kvm_scale_tsc(vcpu, host_tsc);
+	return vcpu->arch.l1_tsc_offset +
+		kvm_scale_tsc(vcpu, host_tsc, vcpu->arch.l1_tsc_scaling_ratio);
 }
 EXPORT_SYMBOL_GPL(kvm_read_l1_tsc);
 
@@ -2463,7 +2463,8 @@ static inline void adjust_tsc_offset_host(struct kvm_vcpu *vcpu, s64 adjustment)
 {
 	if (vcpu->arch.l1_tsc_scaling_ratio != kvm_default_tsc_scaling_ratio)
 		WARN_ON(adjustment < 0);
-	adjustment = kvm_scale_tsc(vcpu, (u64) adjustment);
+	adjustment = kvm_scale_tsc(vcpu, (u64) adjustment,
+				   vcpu->arch.l1_tsc_scaling_ratio);
 	adjust_tsc_offset_guest(vcpu, adjustment);
 }
 
@@ -2846,7 +2847,8 @@ static int kvm_guest_time_update(struct kvm_vcpu *v)
 	/* With all the info we got, fill in the values */
 
 	if (kvm_has_tsc_control)
-		tgt_tsc_khz = kvm_scale_tsc(v, tgt_tsc_khz);
+		tgt_tsc_khz = kvm_scale_tsc(v, tgt_tsc_khz,
+					    v->arch.l1_tsc_scaling_ratio);
 
 	if (unlikely(vcpu->hw_tsc_khz != tgt_tsc_khz)) {
 		kvm_get_time_scale(NSEC_PER_SEC, tgt_tsc_khz * 1000LL,
@@ -3537,10 +3539,17 @@ int kvm_get_msr_common(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
 		 * return L1's TSC value to ensure backwards-compatible
 		 * behavior for migration.
 		 */
-		u64 tsc_offset = msr_info->host_initiated ? vcpu->arch.l1_tsc_offset :
-							    vcpu->arch.tsc_offset;
+		u64 offset, ratio;
 
-		msr_info->data = kvm_scale_tsc(vcpu, rdtsc()) + tsc_offset;
+		if (msr_info->host_initiated) {
+			offset = vcpu->arch.l1_tsc_offset;
+			ratio = vcpu->arch.l1_tsc_scaling_ratio;
+		} else {
+			offset = vcpu->arch.tsc_offset;
+			ratio = vcpu->arch.tsc_scaling_ratio;
+		}
+
+		msr_info->data = kvm_scale_tsc(vcpu, rdtsc(), ratio) + offset;
 		break;
 	}
 	case MSR_MTRRcap:
-- 
2.17.1


^ permalink raw reply related	[flat|nested] 17+ messages in thread

* [PATCH v4 05/11] KVM: nVMX: Add a TSC multiplier field in VMCS12
  2021-05-26 18:44 [PATCH v4 00/11] KVM: Implement nested TSC scaling Ilias Stamatis
                   ` (3 preceding siblings ...)
  2021-05-26 18:44 ` [PATCH v4 04/11] KVM: X86: Add a ratio parameter to kvm_scale_tsc() Ilias Stamatis
@ 2021-05-26 18:44 ` Ilias Stamatis
  2021-05-26 18:44 ` [PATCH v4 06/11] KVM: X86: Add functions for retrieving L2 TSC fields from common code Ilias Stamatis
                   ` (6 subsequent siblings)
  11 siblings, 0 replies; 17+ messages in thread
From: Ilias Stamatis @ 2021-05-26 18:44 UTC (permalink / raw)
  To: kvm, linux-kernel, pbonzini
  Cc: mlevitsk, seanjc, vkuznets, wanpengli, jmattson, joro, zamsden,
	mtosatti, dwmw, ilstam

This is required for supporting nested TSC scaling.

Signed-off-by: Ilias Stamatis <ilstam@amazon.com>
Reviewed-by: Jim Mattson <jmattson@google.com>
Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com>
---
 arch/x86/kvm/vmx/vmcs12.c | 1 +
 arch/x86/kvm/vmx/vmcs12.h | 4 +++-
 2 files changed, 4 insertions(+), 1 deletion(-)

diff --git a/arch/x86/kvm/vmx/vmcs12.c b/arch/x86/kvm/vmx/vmcs12.c
index 034adb6404dc..d9f5d7c56ae3 100644
--- a/arch/x86/kvm/vmx/vmcs12.c
+++ b/arch/x86/kvm/vmx/vmcs12.c
@@ -37,6 +37,7 @@ const unsigned short vmcs_field_to_offset_table[] = {
 	FIELD64(VM_ENTRY_MSR_LOAD_ADDR, vm_entry_msr_load_addr),
 	FIELD64(PML_ADDRESS, pml_address),
 	FIELD64(TSC_OFFSET, tsc_offset),
+	FIELD64(TSC_MULTIPLIER, tsc_multiplier),
 	FIELD64(VIRTUAL_APIC_PAGE_ADDR, virtual_apic_page_addr),
 	FIELD64(APIC_ACCESS_ADDR, apic_access_addr),
 	FIELD64(POSTED_INTR_DESC_ADDR, posted_intr_desc_addr),
diff --git a/arch/x86/kvm/vmx/vmcs12.h b/arch/x86/kvm/vmx/vmcs12.h
index 13494956d0e9..bb81a23afe89 100644
--- a/arch/x86/kvm/vmx/vmcs12.h
+++ b/arch/x86/kvm/vmx/vmcs12.h
@@ -70,7 +70,8 @@ struct __packed vmcs12 {
 	u64 eptp_list_address;
 	u64 pml_address;
 	u64 encls_exiting_bitmap;
-	u64 padding64[2]; /* room for future expansion */
+	u64 tsc_multiplier;
+	u64 padding64[1]; /* room for future expansion */
 	/*
 	 * To allow migration of L1 (complete with its L2 guests) between
 	 * machines of different natural widths (32 or 64 bit), we cannot have
@@ -258,6 +259,7 @@ static inline void vmx_check_vmcs12_offsets(void)
 	CHECK_OFFSET(eptp_list_address, 304);
 	CHECK_OFFSET(pml_address, 312);
 	CHECK_OFFSET(encls_exiting_bitmap, 320);
+	CHECK_OFFSET(tsc_multiplier, 328);
 	CHECK_OFFSET(cr0_guest_host_mask, 344);
 	CHECK_OFFSET(cr4_guest_host_mask, 352);
 	CHECK_OFFSET(cr0_read_shadow, 360);
-- 
2.17.1


^ permalink raw reply related	[flat|nested] 17+ messages in thread

* [PATCH v4 06/11] KVM: X86: Add functions for retrieving L2 TSC fields from common code
  2021-05-26 18:44 [PATCH v4 00/11] KVM: Implement nested TSC scaling Ilias Stamatis
                   ` (4 preceding siblings ...)
  2021-05-26 18:44 ` [PATCH v4 05/11] KVM: nVMX: Add a TSC multiplier field in VMCS12 Ilias Stamatis
@ 2021-05-26 18:44 ` Ilias Stamatis
  2021-05-26 18:44 ` [PATCH v4 07/11] KVM: X86: Add functions that calculate the nested TSC fields Ilias Stamatis
                   ` (5 subsequent siblings)
  11 siblings, 0 replies; 17+ messages in thread
From: Ilias Stamatis @ 2021-05-26 18:44 UTC (permalink / raw)
  To: kvm, linux-kernel, pbonzini
  Cc: mlevitsk, seanjc, vkuznets, wanpengli, jmattson, joro, zamsden,
	mtosatti, dwmw, ilstam

In order to implement as much of the nested TSC scaling logic as
possible in common code, we need these vendor callbacks for retrieving
the TSC offset and the TSC multiplier that L1 has set for L2.

Signed-off-by: Ilias Stamatis <ilstam@amazon.com>
Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com>
---
 arch/x86/include/asm/kvm-x86-ops.h |  2 ++
 arch/x86/include/asm/kvm_host.h    |  2 ++
 arch/x86/kvm/svm/svm.c             | 14 ++++++++++++++
 arch/x86/kvm/vmx/vmx.c             | 23 +++++++++++++++++++++++
 arch/x86/kvm/vmx/vmx.h             |  3 +++
 5 files changed, 44 insertions(+)

diff --git a/arch/x86/include/asm/kvm-x86-ops.h b/arch/x86/include/asm/kvm-x86-ops.h
index 323641097f63..2063616fba1c 100644
--- a/arch/x86/include/asm/kvm-x86-ops.h
+++ b/arch/x86/include/asm/kvm-x86-ops.h
@@ -87,6 +87,8 @@ KVM_X86_OP(set_identity_map_addr)
 KVM_X86_OP(get_mt_mask)
 KVM_X86_OP(load_mmu_pgd)
 KVM_X86_OP_NULL(has_wbinvd_exit)
+KVM_X86_OP(get_l2_tsc_offset)
+KVM_X86_OP(get_l2_tsc_multiplier)
 KVM_X86_OP(write_l1_tsc_offset)
 KVM_X86_OP(get_exit_info)
 KVM_X86_OP(check_intercept)
diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index b14c2b2b2e21..0f2cf5d1240c 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -1305,6 +1305,8 @@ struct kvm_x86_ops {
 
 	bool (*has_wbinvd_exit)(void);
 
+	u64 (*get_l2_tsc_offset)(struct kvm_vcpu *vcpu);
+	u64 (*get_l2_tsc_multiplier)(struct kvm_vcpu *vcpu);
 	/* Returns actual tsc_offset set in active VMCS */
 	u64 (*write_l1_tsc_offset)(struct kvm_vcpu *vcpu, u64 offset);
 
diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
index 05eca131eaf2..ca70e46f9194 100644
--- a/arch/x86/kvm/svm/svm.c
+++ b/arch/x86/kvm/svm/svm.c
@@ -1082,6 +1082,18 @@ static void init_sys_seg(struct vmcb_seg *seg, uint32_t type)
 	seg->base = 0;
 }
 
+static u64 svm_get_l2_tsc_offset(struct kvm_vcpu *vcpu)
+{
+	struct vcpu_svm *svm = to_svm(vcpu);
+
+	return svm->nested.ctl.tsc_offset;
+}
+
+static u64 svm_get_l2_tsc_multiplier(struct kvm_vcpu *vcpu)
+{
+	return kvm_default_tsc_scaling_ratio;
+}
+
 static u64 svm_write_l1_tsc_offset(struct kvm_vcpu *vcpu, u64 offset)
 {
 	struct vcpu_svm *svm = to_svm(vcpu);
@@ -4526,6 +4538,8 @@ static struct kvm_x86_ops svm_x86_ops __initdata = {
 
 	.has_wbinvd_exit = svm_has_wbinvd_exit,
 
+	.get_l2_tsc_offset = svm_get_l2_tsc_offset,
+	.get_l2_tsc_multiplier = svm_get_l2_tsc_multiplier,
 	.write_l1_tsc_offset = svm_write_l1_tsc_offset,
 
 	.load_mmu_pgd = svm_load_mmu_pgd,
diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
index 3e4dda8177bb..1c83605eccc1 100644
--- a/arch/x86/kvm/vmx/vmx.c
+++ b/arch/x86/kvm/vmx/vmx.c
@@ -1787,6 +1787,27 @@ static void setup_msrs(struct vcpu_vmx *vmx)
 	vmx->guest_uret_msrs_loaded = false;
 }
 
+u64 vmx_get_l2_tsc_offset(struct kvm_vcpu *vcpu)
+{
+	struct vmcs12 *vmcs12 = get_vmcs12(vcpu);
+
+	if (nested_cpu_has(vmcs12, CPU_BASED_USE_TSC_OFFSETTING))
+		return vmcs12->tsc_offset;
+
+	return 0;
+}
+
+u64 vmx_get_l2_tsc_multiplier(struct kvm_vcpu *vcpu)
+{
+	struct vmcs12 *vmcs12 = get_vmcs12(vcpu);
+
+	if (nested_cpu_has(vmcs12, CPU_BASED_USE_TSC_OFFSETTING) &&
+	    nested_cpu_has2(vmcs12, SECONDARY_EXEC_TSC_SCALING))
+		return vmcs12->tsc_multiplier;
+
+	return kvm_default_tsc_scaling_ratio;
+}
+
 static u64 vmx_write_l1_tsc_offset(struct kvm_vcpu *vcpu, u64 offset)
 {
 	struct vmcs12 *vmcs12 = get_vmcs12(vcpu);
@@ -7700,6 +7721,8 @@ static struct kvm_x86_ops vmx_x86_ops __initdata = {
 
 	.has_wbinvd_exit = cpu_has_vmx_wbinvd_exit,
 
+	.get_l2_tsc_offset = vmx_get_l2_tsc_offset,
+	.get_l2_tsc_multiplier = vmx_get_l2_tsc_multiplier,
 	.write_l1_tsc_offset = vmx_write_l1_tsc_offset,
 
 	.load_mmu_pgd = vmx_load_mmu_pgd,
diff --git a/arch/x86/kvm/vmx/vmx.h b/arch/x86/kvm/vmx/vmx.h
index 16e4e457ba23..aa97c82e3451 100644
--- a/arch/x86/kvm/vmx/vmx.h
+++ b/arch/x86/kvm/vmx/vmx.h
@@ -404,6 +404,9 @@ void vmx_ept_load_pdptrs(struct kvm_vcpu *vcpu);
 void vmx_disable_intercept_for_msr(struct kvm_vcpu *vcpu, u32 msr, int type);
 void vmx_enable_intercept_for_msr(struct kvm_vcpu *vcpu, u32 msr, int type);
 
+u64 vmx_get_l2_tsc_offset(struct kvm_vcpu *vcpu);
+u64 vmx_get_l2_tsc_multiplier(struct kvm_vcpu *vcpu);
+
 static inline void vmx_set_intercept_for_msr(struct kvm_vcpu *vcpu, u32 msr,
 					     int type, bool value)
 {
-- 
2.17.1


^ permalink raw reply related	[flat|nested] 17+ messages in thread

* [PATCH v4 07/11] KVM: X86: Add functions that calculate the nested TSC fields
  2021-05-26 18:44 [PATCH v4 00/11] KVM: Implement nested TSC scaling Ilias Stamatis
                   ` (5 preceding siblings ...)
  2021-05-26 18:44 ` [PATCH v4 06/11] KVM: X86: Add functions for retrieving L2 TSC fields from common code Ilias Stamatis
@ 2021-05-26 18:44 ` Ilias Stamatis
  2021-05-26 18:44 ` [PATCH v4 08/11] KVM: X86: Move write_l1_tsc_offset() logic to common code and rename it Ilias Stamatis
                   ` (4 subsequent siblings)
  11 siblings, 0 replies; 17+ messages in thread
From: Ilias Stamatis @ 2021-05-26 18:44 UTC (permalink / raw)
  To: kvm, linux-kernel, pbonzini
  Cc: mlevitsk, seanjc, vkuznets, wanpengli, jmattson, joro, zamsden,
	mtosatti, dwmw, ilstam

When L2 is entered we need to "merge" the TSC multiplier and TSC offset
values of 01 and 12 together.

The merging is done using the following equations:
  offset_02 = ((offset_01 * mult_12) >> shift_bits) + offset_12
  mult_02 = (mult_01 * mult_12) >> shift_bits

Where shift_bits is kvm_tsc_scaling_ratio_frac_bits.

Signed-off-by: Ilias Stamatis <ilstam@amazon.com>
Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com>
---
 arch/x86/include/asm/kvm_host.h |  2 ++
 arch/x86/kvm/x86.c              | 25 +++++++++++++++++++++++++
 2 files changed, 27 insertions(+)

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index 0f2cf5d1240c..aaf756442ed1 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -1792,6 +1792,8 @@ static inline bool kvm_is_supported_user_return_msr(u32 msr)
 
 u64 kvm_scale_tsc(struct kvm_vcpu *vcpu, u64 tsc, u64 ratio);
 u64 kvm_read_l1_tsc(struct kvm_vcpu *vcpu, u64 host_tsc);
+u64 kvm_calc_nested_tsc_offset(u64 l1_offset, u64 l2_offset, u64 l2_multiplier);
+u64 kvm_calc_nested_tsc_multiplier(u64 l1_multiplier, u64 l2_multiplier);
 
 unsigned long kvm_get_linear_rip(struct kvm_vcpu *vcpu);
 bool kvm_is_linear_rip(struct kvm_vcpu *vcpu, unsigned long linear_rip);
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 8cf0dfb6264d..705de0badc9e 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -2334,6 +2334,31 @@ u64 kvm_read_l1_tsc(struct kvm_vcpu *vcpu, u64 host_tsc)
 }
 EXPORT_SYMBOL_GPL(kvm_read_l1_tsc);
 
+u64 kvm_calc_nested_tsc_offset(u64 l1_offset, u64 l2_offset, u64 l2_multiplier)
+{
+	u64 nested_offset;
+
+	if (l2_multiplier == kvm_default_tsc_scaling_ratio)
+		nested_offset = l1_offset;
+	else
+		nested_offset = mul_s64_u64_shr((s64) l1_offset, l2_multiplier,
+						kvm_tsc_scaling_ratio_frac_bits);
+
+	nested_offset += l2_offset;
+	return nested_offset;
+}
+EXPORT_SYMBOL_GPL(kvm_calc_nested_tsc_offset);
+
+u64 kvm_calc_nested_tsc_multiplier(u64 l1_multiplier, u64 l2_multiplier)
+{
+	if (l2_multiplier != kvm_default_tsc_scaling_ratio)
+		return mul_u64_u64_shr(l1_multiplier, l2_multiplier,
+				       kvm_tsc_scaling_ratio_frac_bits);
+
+	return l1_multiplier;
+}
+EXPORT_SYMBOL_GPL(kvm_calc_nested_tsc_multiplier);
+
 static void kvm_vcpu_write_tsc_offset(struct kvm_vcpu *vcpu, u64 offset)
 {
 	vcpu->arch.l1_tsc_offset = offset;
-- 
2.17.1


^ permalink raw reply related	[flat|nested] 17+ messages in thread

* [PATCH v4 08/11] KVM: X86: Move write_l1_tsc_offset() logic to common code and rename it
  2021-05-26 18:44 [PATCH v4 00/11] KVM: Implement nested TSC scaling Ilias Stamatis
                   ` (6 preceding siblings ...)
  2021-05-26 18:44 ` [PATCH v4 07/11] KVM: X86: Add functions that calculate the nested TSC fields Ilias Stamatis
@ 2021-05-26 18:44 ` Ilias Stamatis
  2021-05-26 18:44 ` [PATCH v4 09/11] KVM: X86: Add vendor callbacks for writing the TSC multiplier Ilias Stamatis
                   ` (3 subsequent siblings)
  11 siblings, 0 replies; 17+ messages in thread
From: Ilias Stamatis @ 2021-05-26 18:44 UTC (permalink / raw)
  To: kvm, linux-kernel, pbonzini
  Cc: mlevitsk, seanjc, vkuznets, wanpengli, jmattson, joro, zamsden,
	mtosatti, dwmw, ilstam

The write_l1_tsc_offset() callback has a misleading name. It does not
set L1's TSC offset, it rather updates the current TSC offset which
might be different if a nested guest is executing. Additionally, both
the vmx and svm implementations use the same logic for calculating the
current TSC before writing it to hardware.

Rename the function and move the common logic to the caller. The vmx/svm
specific code now merely sets the given offset to the corresponding
hardware structure.

Signed-off-by: Ilias Stamatis <ilstam@amazon.com>
Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com>
---
 arch/x86/include/asm/kvm-x86-ops.h |  2 +-
 arch/x86/include/asm/kvm_host.h    |  3 +--
 arch/x86/kvm/svm/svm.c             | 21 ++++-----------------
 arch/x86/kvm/vmx/vmx.c             | 23 +++--------------------
 arch/x86/kvm/x86.c                 | 24 +++++++++++++++++++++---
 5 files changed, 30 insertions(+), 43 deletions(-)

diff --git a/arch/x86/include/asm/kvm-x86-ops.h b/arch/x86/include/asm/kvm-x86-ops.h
index 2063616fba1c..029c9615378f 100644
--- a/arch/x86/include/asm/kvm-x86-ops.h
+++ b/arch/x86/include/asm/kvm-x86-ops.h
@@ -89,7 +89,7 @@ KVM_X86_OP(load_mmu_pgd)
 KVM_X86_OP_NULL(has_wbinvd_exit)
 KVM_X86_OP(get_l2_tsc_offset)
 KVM_X86_OP(get_l2_tsc_multiplier)
-KVM_X86_OP(write_l1_tsc_offset)
+KVM_X86_OP(write_tsc_offset)
 KVM_X86_OP(get_exit_info)
 KVM_X86_OP(check_intercept)
 KVM_X86_OP(handle_exit_irqoff)
diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index aaf756442ed1..f099277b993d 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -1307,8 +1307,7 @@ struct kvm_x86_ops {
 
 	u64 (*get_l2_tsc_offset)(struct kvm_vcpu *vcpu);
 	u64 (*get_l2_tsc_multiplier)(struct kvm_vcpu *vcpu);
-	/* Returns actual tsc_offset set in active VMCS */
-	u64 (*write_l1_tsc_offset)(struct kvm_vcpu *vcpu, u64 offset);
+	void (*write_tsc_offset)(struct kvm_vcpu *vcpu, u64 offset);
 
 	/*
 	 * Retrieve somewhat arbitrary exit information.  Intended to be used
diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
index ca70e46f9194..8dfb2513b72a 100644
--- a/arch/x86/kvm/svm/svm.c
+++ b/arch/x86/kvm/svm/svm.c
@@ -1094,26 +1094,13 @@ static u64 svm_get_l2_tsc_multiplier(struct kvm_vcpu *vcpu)
 	return kvm_default_tsc_scaling_ratio;
 }
 
-static u64 svm_write_l1_tsc_offset(struct kvm_vcpu *vcpu, u64 offset)
+static void svm_write_tsc_offset(struct kvm_vcpu *vcpu, u64 offset)
 {
 	struct vcpu_svm *svm = to_svm(vcpu);
-	u64 g_tsc_offset = 0;
-
-	if (is_guest_mode(vcpu)) {
-		/* Write L1's TSC offset.  */
-		g_tsc_offset = svm->vmcb->control.tsc_offset -
-			       svm->vmcb01.ptr->control.tsc_offset;
-		svm->vmcb01.ptr->control.tsc_offset = offset;
-	}
-
-	trace_kvm_write_tsc_offset(vcpu->vcpu_id,
-				   svm->vmcb->control.tsc_offset - g_tsc_offset,
-				   offset);
-
-	svm->vmcb->control.tsc_offset = offset + g_tsc_offset;
 
+	svm->vmcb01.ptr->control.tsc_offset = vcpu->arch.l1_tsc_offset;
+	svm->vmcb->control.tsc_offset = offset;
 	vmcb_mark_dirty(svm->vmcb, VMCB_INTERCEPTS);
-	return svm->vmcb->control.tsc_offset;
 }
 
 /* Evaluate instruction intercepts that depend on guest CPUID features. */
@@ -4540,7 +4527,7 @@ static struct kvm_x86_ops svm_x86_ops __initdata = {
 
 	.get_l2_tsc_offset = svm_get_l2_tsc_offset,
 	.get_l2_tsc_multiplier = svm_get_l2_tsc_multiplier,
-	.write_l1_tsc_offset = svm_write_l1_tsc_offset,
+	.write_tsc_offset = svm_write_tsc_offset,
 
 	.load_mmu_pgd = svm_load_mmu_pgd,
 
diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
index 1c83605eccc1..4b70431c2edd 100644
--- a/arch/x86/kvm/vmx/vmx.c
+++ b/arch/x86/kvm/vmx/vmx.c
@@ -1808,26 +1808,9 @@ u64 vmx_get_l2_tsc_multiplier(struct kvm_vcpu *vcpu)
 	return kvm_default_tsc_scaling_ratio;
 }
 
-static u64 vmx_write_l1_tsc_offset(struct kvm_vcpu *vcpu, u64 offset)
+static void vmx_write_tsc_offset(struct kvm_vcpu *vcpu, u64 offset)
 {
-	struct vmcs12 *vmcs12 = get_vmcs12(vcpu);
-	u64 g_tsc_offset = 0;
-
-	/*
-	 * We're here if L1 chose not to trap WRMSR to TSC. According
-	 * to the spec, this should set L1's TSC; The offset that L1
-	 * set for L2 remains unchanged, and still needs to be added
-	 * to the newly set TSC to get L2's TSC.
-	 */
-	if (is_guest_mode(vcpu) &&
-	    (vmcs12->cpu_based_vm_exec_control & CPU_BASED_USE_TSC_OFFSETTING))
-		g_tsc_offset = vmcs12->tsc_offset;
-
-	trace_kvm_write_tsc_offset(vcpu->vcpu_id,
-				   vcpu->arch.tsc_offset - g_tsc_offset,
-				   offset);
-	vmcs_write64(TSC_OFFSET, offset + g_tsc_offset);
-	return offset + g_tsc_offset;
+	vmcs_write64(TSC_OFFSET, offset);
 }
 
 /*
@@ -7723,7 +7706,7 @@ static struct kvm_x86_ops vmx_x86_ops __initdata = {
 
 	.get_l2_tsc_offset = vmx_get_l2_tsc_offset,
 	.get_l2_tsc_multiplier = vmx_get_l2_tsc_multiplier,
-	.write_l1_tsc_offset = vmx_write_l1_tsc_offset,
+	.write_tsc_offset = vmx_write_tsc_offset,
 
 	.load_mmu_pgd = vmx_load_mmu_pgd,
 
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 705de0badc9e..801fa1e8e915 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -2359,10 +2359,28 @@ u64 kvm_calc_nested_tsc_multiplier(u64 l1_multiplier, u64 l2_multiplier)
 }
 EXPORT_SYMBOL_GPL(kvm_calc_nested_tsc_multiplier);
 
-static void kvm_vcpu_write_tsc_offset(struct kvm_vcpu *vcpu, u64 offset)
+static void kvm_vcpu_write_tsc_offset(struct kvm_vcpu *vcpu, u64 l1_offset)
 {
-	vcpu->arch.l1_tsc_offset = offset;
-	vcpu->arch.tsc_offset = static_call(kvm_x86_write_l1_tsc_offset)(vcpu, offset);
+	trace_kvm_write_tsc_offset(vcpu->vcpu_id,
+				   vcpu->arch.l1_tsc_offset,
+				   l1_offset);
+
+	vcpu->arch.l1_tsc_offset = l1_offset;
+
+	/*
+	 * If we are here because L1 chose not to trap WRMSR to TSC then
+	 * according to the spec this should set L1's TSC (as opposed to
+	 * setting L1's offset for L2).
+	 */
+	if (is_guest_mode(vcpu))
+		vcpu->arch.tsc_offset = kvm_calc_nested_tsc_offset(
+			l1_offset,
+			static_call(kvm_x86_get_l2_tsc_offset)(vcpu),
+			static_call(kvm_x86_get_l2_tsc_multiplier)(vcpu));
+	else
+		vcpu->arch.tsc_offset = l1_offset;
+
+	static_call(kvm_x86_write_tsc_offset)(vcpu, vcpu->arch.tsc_offset);
 }
 
 static inline bool kvm_check_tsc_unstable(void)
-- 
2.17.1


^ permalink raw reply related	[flat|nested] 17+ messages in thread

* [PATCH v4 09/11] KVM: X86: Add vendor callbacks for writing the TSC multiplier
  2021-05-26 18:44 [PATCH v4 00/11] KVM: Implement nested TSC scaling Ilias Stamatis
                   ` (7 preceding siblings ...)
  2021-05-26 18:44 ` [PATCH v4 08/11] KVM: X86: Move write_l1_tsc_offset() logic to common code and rename it Ilias Stamatis
@ 2021-05-26 18:44 ` Ilias Stamatis
  2021-05-27  8:33   ` Stamatis, Ilias
  2021-05-26 18:44 ` [PATCH v4 10/11] KVM: nVMX: Enable nested TSC scaling Ilias Stamatis
                   ` (2 subsequent siblings)
  11 siblings, 1 reply; 17+ messages in thread
From: Ilias Stamatis @ 2021-05-26 18:44 UTC (permalink / raw)
  To: kvm, linux-kernel, pbonzini
  Cc: mlevitsk, seanjc, vkuznets, wanpengli, jmattson, joro, zamsden,
	mtosatti, dwmw, ilstam

Currently vmx_vcpu_load_vmcs() writes the TSC_MULTIPLIER field of the
VMCS every time the VMCS is loaded. Instead of doing this, set this
field from common code on initialization and whenever the scaling ratio
changes.

Additionally remove vmx->current_tsc_ratio. This field is redundant as
vcpu->arch.tsc_scaling_ratio already tracks the current TSC scaling
ratio. The vmx->current_tsc_ratio field is only used for avoiding
unnecessary writes but it is no longer needed after removing the code
from the VMCS load path.

Suggested-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: Ilias Stamatis <ilstam@amazon.com>
---
 arch/x86/include/asm/kvm-x86-ops.h |  1 +
 arch/x86/include/asm/kvm_host.h    |  1 +
 arch/x86/kvm/svm/svm.c             |  6 ++++++
 arch/x86/kvm/vmx/nested.c          |  9 ++++-----
 arch/x86/kvm/vmx/vmx.c             | 11 ++++++-----
 arch/x86/kvm/vmx/vmx.h             |  8 --------
 arch/x86/kvm/x86.c                 | 28 +++++++++++++++++++++++-----
 7 files changed, 41 insertions(+), 23 deletions(-)

diff --git a/arch/x86/include/asm/kvm-x86-ops.h b/arch/x86/include/asm/kvm-x86-ops.h
index 029c9615378f..34ad7a17458a 100644
--- a/arch/x86/include/asm/kvm-x86-ops.h
+++ b/arch/x86/include/asm/kvm-x86-ops.h
@@ -90,6 +90,7 @@ KVM_X86_OP_NULL(has_wbinvd_exit)
 KVM_X86_OP(get_l2_tsc_offset)
 KVM_X86_OP(get_l2_tsc_multiplier)
 KVM_X86_OP(write_tsc_offset)
+KVM_X86_OP(write_tsc_multiplier)
 KVM_X86_OP(get_exit_info)
 KVM_X86_OP(check_intercept)
 KVM_X86_OP(handle_exit_irqoff)
diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index f099277b993d..a334ce7741ab 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -1308,6 +1308,7 @@ struct kvm_x86_ops {
 	u64 (*get_l2_tsc_offset)(struct kvm_vcpu *vcpu);
 	u64 (*get_l2_tsc_multiplier)(struct kvm_vcpu *vcpu);
 	void (*write_tsc_offset)(struct kvm_vcpu *vcpu, u64 offset);
+	void (*write_tsc_multiplier)(struct kvm_vcpu *vcpu, u64 multiplier);
 
 	/*
 	 * Retrieve somewhat arbitrary exit information.  Intended to be used
diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
index 8dfb2513b72a..cb701b42b08b 100644
--- a/arch/x86/kvm/svm/svm.c
+++ b/arch/x86/kvm/svm/svm.c
@@ -1103,6 +1103,11 @@ static void svm_write_tsc_offset(struct kvm_vcpu *vcpu, u64 offset)
 	vmcb_mark_dirty(svm->vmcb, VMCB_INTERCEPTS);
 }
 
+static void svm_write_tsc_multiplier(struct kvm_vcpu *vcpu, u64 multiplier)
+{
+	wrmsrl(MSR_AMD64_TSC_RATIO, multiplier);
+}
+
 /* Evaluate instruction intercepts that depend on guest CPUID features. */
 static void svm_recalc_instruction_intercepts(struct kvm_vcpu *vcpu,
 					      struct vcpu_svm *svm)
@@ -4528,6 +4533,7 @@ static struct kvm_x86_ops svm_x86_ops __initdata = {
 	.get_l2_tsc_offset = svm_get_l2_tsc_offset,
 	.get_l2_tsc_multiplier = svm_get_l2_tsc_multiplier,
 	.write_tsc_offset = svm_write_tsc_offset,
+	.write_tsc_multiplier = svm_write_tsc_multiplier,
 
 	.load_mmu_pgd = svm_load_mmu_pgd,
 
diff --git a/arch/x86/kvm/vmx/nested.c b/arch/x86/kvm/vmx/nested.c
index 6058a65a6ede..239154d3e4e7 100644
--- a/arch/x86/kvm/vmx/nested.c
+++ b/arch/x86/kvm/vmx/nested.c
@@ -2533,9 +2533,8 @@ static int prepare_vmcs02(struct kvm_vcpu *vcpu, struct vmcs12 *vmcs12,
 	}
 
 	vmcs_write64(TSC_OFFSET, vcpu->arch.tsc_offset);
-
 	if (kvm_has_tsc_control)
-		decache_tsc_multiplier(vmx);
+		vmcs_write64(TSC_MULTIPLIER, vcpu->arch.tsc_scaling_ratio);
 
 	nested_vmx_transition_tlb_flush(vcpu, vmcs12, true);
 
@@ -4501,12 +4500,12 @@ void nested_vmx_vmexit(struct kvm_vcpu *vcpu, u32 vm_exit_reason,
 	vmcs_write32(VM_EXIT_MSR_LOAD_COUNT, vmx->msr_autoload.host.nr);
 	vmcs_write32(VM_ENTRY_MSR_LOAD_COUNT, vmx->msr_autoload.guest.nr);
 	vmcs_write64(TSC_OFFSET, vcpu->arch.tsc_offset);
+	if (kvm_has_tsc_control)
+		vmcs_write64(TSC_MULTIPLIER, vcpu->arch.tsc_scaling_ratio);
+
 	if (vmx->nested.l1_tpr_threshold != -1)
 		vmcs_write32(TPR_THRESHOLD, vmx->nested.l1_tpr_threshold);
 
-	if (kvm_has_tsc_control)
-		decache_tsc_multiplier(vmx);
-
 	if (vmx->nested.change_vmcs01_virtual_apic_mode) {
 		vmx->nested.change_vmcs01_virtual_apic_mode = false;
 		vmx_set_virtual_apic_mode(vcpu);
diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
index 4b70431c2edd..bf845a08995e 100644
--- a/arch/x86/kvm/vmx/vmx.c
+++ b/arch/x86/kvm/vmx/vmx.c
@@ -1390,11 +1390,6 @@ void vmx_vcpu_load_vmcs(struct kvm_vcpu *vcpu, int cpu,
 
 		vmx->loaded_vmcs->cpu = cpu;
 	}
-
-	/* Setup TSC multiplier */
-	if (kvm_has_tsc_control &&
-	    vmx->current_tsc_ratio != vcpu->arch.tsc_scaling_ratio)
-		decache_tsc_multiplier(vmx);
 }
 
 /*
@@ -1813,6 +1808,11 @@ static void vmx_write_tsc_offset(struct kvm_vcpu *vcpu, u64 offset)
 	vmcs_write64(TSC_OFFSET, offset);
 }
 
+static void vmx_write_tsc_multiplier(struct kvm_vcpu *vcpu, u64 multiplier)
+{
+	vmcs_write64(TSC_MULTIPLIER, multiplier);
+}
+
 /*
  * nested_vmx_allowed() checks whether a guest should be allowed to use VMX
  * instructions and MSRs (i.e., nested VMX). Nested VMX is disabled for
@@ -7707,6 +7707,7 @@ static struct kvm_x86_ops vmx_x86_ops __initdata = {
 	.get_l2_tsc_offset = vmx_get_l2_tsc_offset,
 	.get_l2_tsc_multiplier = vmx_get_l2_tsc_multiplier,
 	.write_tsc_offset = vmx_write_tsc_offset,
+	.write_tsc_multiplier = vmx_write_tsc_multiplier,
 
 	.load_mmu_pgd = vmx_load_mmu_pgd,
 
diff --git a/arch/x86/kvm/vmx/vmx.h b/arch/x86/kvm/vmx/vmx.h
index aa97c82e3451..3eaa86a0ba3e 100644
--- a/arch/x86/kvm/vmx/vmx.h
+++ b/arch/x86/kvm/vmx/vmx.h
@@ -322,8 +322,6 @@ struct vcpu_vmx {
 	/* apic deadline value in host tsc */
 	u64 hv_deadline_tsc;
 
-	u64 current_tsc_ratio;
-
 	unsigned long host_debugctlmsr;
 
 	/*
@@ -532,12 +530,6 @@ static inline struct vmcs *alloc_vmcs(bool shadow)
 			      GFP_KERNEL_ACCOUNT);
 }
 
-static inline void decache_tsc_multiplier(struct vcpu_vmx *vmx)
-{
-	vmx->current_tsc_ratio = vmx->vcpu.arch.tsc_scaling_ratio;
-	vmcs_write64(TSC_MULTIPLIER, vmx->current_tsc_ratio);
-}
-
 static inline bool vmx_has_waitpkg(struct vcpu_vmx *vmx)
 {
 	return vmx->secondary_exec_control &
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 801fa1e8e915..8afe2c29183c 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -2179,14 +2179,15 @@ static u32 adjust_tsc_khz(u32 khz, s32 ppm)
 	return v;
 }
 
+static void kvm_vcpu_write_tsc_multiplier(struct kvm_vcpu *vcpu, u64 l1_multiplier);
+
 static int set_tsc_khz(struct kvm_vcpu *vcpu, u32 user_tsc_khz, bool scale)
 {
 	u64 ratio;
 
 	/* Guest TSC same frequency as host TSC? */
 	if (!scale) {
-		vcpu->arch.l1_tsc_scaling_ratio = kvm_default_tsc_scaling_ratio;
-		vcpu->arch.tsc_scaling_ratio = kvm_default_tsc_scaling_ratio;
+		kvm_vcpu_write_tsc_multiplier(vcpu, kvm_default_tsc_scaling_ratio);
 		return 0;
 	}
 
@@ -2212,7 +2213,7 @@ static int set_tsc_khz(struct kvm_vcpu *vcpu, u32 user_tsc_khz, bool scale)
 		return -1;
 	}
 
-	vcpu->arch.l1_tsc_scaling_ratio = vcpu->arch.tsc_scaling_ratio = ratio;
+	kvm_vcpu_write_tsc_multiplier(vcpu, ratio);
 	return 0;
 }
 
@@ -2224,8 +2225,7 @@ static int kvm_set_tsc_khz(struct kvm_vcpu *vcpu, u32 user_tsc_khz)
 	/* tsc_khz can be zero if TSC calibration fails */
 	if (user_tsc_khz == 0) {
 		/* set tsc_scaling_ratio to a safe value */
-		vcpu->arch.l1_tsc_scaling_ratio = kvm_default_tsc_scaling_ratio;
-		vcpu->arch.tsc_scaling_ratio = kvm_default_tsc_scaling_ratio;
+		kvm_vcpu_write_tsc_multiplier(vcpu, kvm_default_tsc_scaling_ratio);
 		return -1;
 	}
 
@@ -2383,6 +2383,23 @@ static void kvm_vcpu_write_tsc_offset(struct kvm_vcpu *vcpu, u64 l1_offset)
 	static_call(kvm_x86_write_tsc_offset)(vcpu, vcpu->arch.tsc_offset);
 }
 
+static void kvm_vcpu_write_tsc_multiplier(struct kvm_vcpu *vcpu, u64 l1_multiplier)
+{
+	vcpu->arch.l1_tsc_scaling_ratio = l1_multiplier;
+
+	/* Userspace is changing the multiplier while L2 is active */
+	if (is_guest_mode(vcpu))
+		vcpu->arch.tsc_scaling_ratio = kvm_calc_nested_tsc_multiplier(
+			l1_multiplier,
+			static_call(kvm_x86_get_l2_tsc_multiplier)(vcpu));
+	else
+		vcpu->arch.tsc_scaling_ratio = l1_multiplier;
+
+	if (kvm_has_tsc_control)
+		static_call(kvm_x86_write_tsc_multiplier)(
+			vcpu, vcpu->arch.tsc_scaling_ratio);
+}
+
 static inline bool kvm_check_tsc_unstable(void)
 {
 #ifdef CONFIG_X86_64
@@ -10444,6 +10461,7 @@ void kvm_arch_vcpu_postcreate(struct kvm_vcpu *vcpu)
 		return;
 	vcpu_load(vcpu);
 	kvm_synchronize_tsc(vcpu, 0);
+	kvm_vcpu_write_tsc_multiplier(vcpu, kvm_default_tsc_scaling_ratio);
 	vcpu_put(vcpu);
 
 	/* poll control enabled by default */
-- 
2.17.1


^ permalink raw reply related	[flat|nested] 17+ messages in thread

* [PATCH v4 10/11] KVM: nVMX: Enable nested TSC scaling
  2021-05-26 18:44 [PATCH v4 00/11] KVM: Implement nested TSC scaling Ilias Stamatis
                   ` (8 preceding siblings ...)
  2021-05-26 18:44 ` [PATCH v4 09/11] KVM: X86: Add vendor callbacks for writing the TSC multiplier Ilias Stamatis
@ 2021-05-26 18:44 ` Ilias Stamatis
  2021-05-26 18:44 ` [PATCH v4 11/11] KVM: selftests: x86: Add vmx_nested_tsc_scaling_test Ilias Stamatis
  2021-06-15  8:44 ` [PATCH v4 00/11] KVM: Implement nested TSC scaling Stamatis, Ilias
  11 siblings, 0 replies; 17+ messages in thread
From: Ilias Stamatis @ 2021-05-26 18:44 UTC (permalink / raw)
  To: kvm, linux-kernel, pbonzini
  Cc: mlevitsk, seanjc, vkuznets, wanpengli, jmattson, joro, zamsden,
	mtosatti, dwmw, ilstam

Calculate the TSC offset and multiplier on nested transitions and expose
the TSC scaling feature to L1.

Signed-off-by: Ilias Stamatis <ilstam@amazon.com>
Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com>
---
 arch/x86/kvm/vmx/nested.c | 24 ++++++++++++++++++------
 1 file changed, 18 insertions(+), 6 deletions(-)

diff --git a/arch/x86/kvm/vmx/nested.c b/arch/x86/kvm/vmx/nested.c
index 239154d3e4e7..e8183e224706 100644
--- a/arch/x86/kvm/vmx/nested.c
+++ b/arch/x86/kvm/vmx/nested.c
@@ -2277,7 +2277,8 @@ static void prepare_vmcs02_early(struct vcpu_vmx *vmx, struct vmcs12 *vmcs12)
 				  SECONDARY_EXEC_ENABLE_USR_WAIT_PAUSE |
 				  SECONDARY_EXEC_VIRTUAL_INTR_DELIVERY |
 				  SECONDARY_EXEC_APIC_REGISTER_VIRT |
-				  SECONDARY_EXEC_ENABLE_VMFUNC);
+				  SECONDARY_EXEC_ENABLE_VMFUNC |
+				  SECONDARY_EXEC_TSC_SCALING);
 		if (nested_cpu_has(vmcs12,
 				   CPU_BASED_ACTIVATE_SECONDARY_CONTROLS))
 			exec_control |= vmcs12->secondary_vm_exec_control;
@@ -2532,6 +2533,15 @@ static int prepare_vmcs02(struct kvm_vcpu *vcpu, struct vmcs12 *vmcs12,
 		vmcs_write64(GUEST_IA32_PAT, vmx->vcpu.arch.pat);
 	}
 
+	vcpu->arch.tsc_offset = kvm_calc_nested_tsc_offset(
+			vcpu->arch.l1_tsc_offset,
+			vmx_get_l2_tsc_offset(vcpu),
+			vmx_get_l2_tsc_multiplier(vcpu));
+
+	vcpu->arch.tsc_scaling_ratio = kvm_calc_nested_tsc_multiplier(
+			vcpu->arch.l1_tsc_scaling_ratio,
+			vmx_get_l2_tsc_multiplier(vcpu));
+
 	vmcs_write64(TSC_OFFSET, vcpu->arch.tsc_offset);
 	if (kvm_has_tsc_control)
 		vmcs_write64(TSC_MULTIPLIER, vcpu->arch.tsc_scaling_ratio);
@@ -3353,8 +3363,6 @@ enum nvmx_vmentry_status nested_vmx_enter_non_root_mode(struct kvm_vcpu *vcpu,
 	}
 
 	enter_guest_mode(vcpu);
-	if (vmcs12->cpu_based_vm_exec_control & CPU_BASED_USE_TSC_OFFSETTING)
-		vcpu->arch.tsc_offset += vmcs12->tsc_offset;
 
 	if (prepare_vmcs02(vcpu, vmcs12, &entry_failure_code)) {
 		exit_reason.basic = EXIT_REASON_INVALID_STATE;
@@ -4462,8 +4470,11 @@ void nested_vmx_vmexit(struct kvm_vcpu *vcpu, u32 vm_exit_reason,
 	if (nested_cpu_has_preemption_timer(vmcs12))
 		hrtimer_cancel(&to_vmx(vcpu)->nested.preemption_timer);
 
-	if (vmcs12->cpu_based_vm_exec_control & CPU_BASED_USE_TSC_OFFSETTING)
-		vcpu->arch.tsc_offset -= vmcs12->tsc_offset;
+	if (nested_cpu_has(vmcs12, CPU_BASED_USE_TSC_OFFSETTING)) {
+		vcpu->arch.tsc_offset = vcpu->arch.l1_tsc_offset;
+		if (nested_cpu_has2(vmcs12, SECONDARY_EXEC_TSC_SCALING))
+			vcpu->arch.tsc_scaling_ratio = vcpu->arch.l1_tsc_scaling_ratio;
+	}
 
 	if (likely(!vmx->fail)) {
 		sync_vmcs02_to_vmcs12(vcpu, vmcs12);
@@ -6473,7 +6484,8 @@ void nested_vmx_setup_ctls_msrs(struct nested_vmx_msrs *msrs, u32 ept_caps)
 		SECONDARY_EXEC_RDRAND_EXITING |
 		SECONDARY_EXEC_ENABLE_INVPCID |
 		SECONDARY_EXEC_RDSEED_EXITING |
-		SECONDARY_EXEC_XSAVES;
+		SECONDARY_EXEC_XSAVES |
+		SECONDARY_EXEC_TSC_SCALING;
 
 	/*
 	 * We can emulate "VMCS shadowing," even if the hardware
-- 
2.17.1


^ permalink raw reply related	[flat|nested] 17+ messages in thread

* [PATCH v4 11/11] KVM: selftests: x86: Add vmx_nested_tsc_scaling_test
  2021-05-26 18:44 [PATCH v4 00/11] KVM: Implement nested TSC scaling Ilias Stamatis
                   ` (9 preceding siblings ...)
  2021-05-26 18:44 ` [PATCH v4 10/11] KVM: nVMX: Enable nested TSC scaling Ilias Stamatis
@ 2021-05-26 18:44 ` Ilias Stamatis
  2021-06-15  8:44 ` [PATCH v4 00/11] KVM: Implement nested TSC scaling Stamatis, Ilias
  11 siblings, 0 replies; 17+ messages in thread
From: Ilias Stamatis @ 2021-05-26 18:44 UTC (permalink / raw)
  To: kvm, linux-kernel, pbonzini
  Cc: mlevitsk, seanjc, vkuznets, wanpengli, jmattson, joro, zamsden,
	mtosatti, dwmw, ilstam

Test that nested TSC scaling works as expected with both L1 and L2
scaled.

Signed-off-by: Ilias Stamatis <ilstam@amazon.com>
Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com>
---
 tools/testing/selftests/kvm/.gitignore        |   1 +
 tools/testing/selftests/kvm/Makefile          |   1 +
 .../kvm/x86_64/vmx_nested_tsc_scaling_test.c  | 242 ++++++++++++++++++
 3 files changed, 244 insertions(+)
 create mode 100644 tools/testing/selftests/kvm/x86_64/vmx_nested_tsc_scaling_test.c

diff --git a/tools/testing/selftests/kvm/.gitignore b/tools/testing/selftests/kvm/.gitignore
index bd83158e0e0b..cc02022f9951 100644
--- a/tools/testing/selftests/kvm/.gitignore
+++ b/tools/testing/selftests/kvm/.gitignore
@@ -29,6 +29,7 @@
 /x86_64/vmx_preemption_timer_test
 /x86_64/vmx_set_nested_state_test
 /x86_64/vmx_tsc_adjust_test
+/x86_64/vmx_nested_tsc_scaling_test
 /x86_64/xapic_ipi_test
 /x86_64/xen_shinfo_test
 /x86_64/xen_vmcall_test
diff --git a/tools/testing/selftests/kvm/Makefile b/tools/testing/selftests/kvm/Makefile
index e439d027939d..1078240b1313 100644
--- a/tools/testing/selftests/kvm/Makefile
+++ b/tools/testing/selftests/kvm/Makefile
@@ -60,6 +60,7 @@ TEST_GEN_PROGS_x86_64 += x86_64/vmx_close_while_nested_test
 TEST_GEN_PROGS_x86_64 += x86_64/vmx_dirty_log_test
 TEST_GEN_PROGS_x86_64 += x86_64/vmx_set_nested_state_test
 TEST_GEN_PROGS_x86_64 += x86_64/vmx_tsc_adjust_test
+TEST_GEN_PROGS_x86_64 += x86_64/vmx_nested_tsc_scaling_test
 TEST_GEN_PROGS_x86_64 += x86_64/xapic_ipi_test
 TEST_GEN_PROGS_x86_64 += x86_64/xss_msr_test
 TEST_GEN_PROGS_x86_64 += x86_64/debug_regs
diff --git a/tools/testing/selftests/kvm/x86_64/vmx_nested_tsc_scaling_test.c b/tools/testing/selftests/kvm/x86_64/vmx_nested_tsc_scaling_test.c
new file mode 100644
index 000000000000..280c01fd2412
--- /dev/null
+++ b/tools/testing/selftests/kvm/x86_64/vmx_nested_tsc_scaling_test.c
@@ -0,0 +1,242 @@
+// SPDX-License-Identifier: GPL-2.0-only
+/*
+ * vmx_nested_tsc_scaling_test
+ *
+ * Copyright 2021 Amazon.com, Inc. or its affiliates. All Rights Reserved.
+ *
+ * This test case verifies that nested TSC scaling behaves as expected when
+ * both L1 and L2 are scaled using different ratios. For this test we scale
+ * L1 down and scale L2 up.
+ */
+
+#include <time.h>
+
+#include "kvm_util.h"
+#include "vmx.h"
+#include "kselftest.h"
+
+
+#define VCPU_ID 0
+
+/* L2 is scaled up (from L1's perspective) by this factor */
+#define L2_SCALE_FACTOR 4ULL
+
+#define TSC_OFFSET_L2 ((uint64_t) -33125236320908)
+#define TSC_MULTIPLIER_L2 (L2_SCALE_FACTOR << 48)
+
+#define L2_GUEST_STACK_SIZE 64
+
+enum { USLEEP, UCHECK_L1, UCHECK_L2 };
+#define GUEST_SLEEP(sec)         ucall(UCALL_SYNC, 2, USLEEP, sec)
+#define GUEST_CHECK(level, freq) ucall(UCALL_SYNC, 2, level, freq)
+
+
+/*
+ * This function checks whether the "actual" TSC frequency of a guest matches
+ * its expected frequency. In order to account for delays in taking the TSC
+ * measurements, a difference of 1% between the actual and the expected value
+ * is tolerated.
+ */
+static void compare_tsc_freq(uint64_t actual, uint64_t expected)
+{
+	uint64_t tolerance, thresh_low, thresh_high;
+
+	tolerance = expected / 100;
+	thresh_low = expected - tolerance;
+	thresh_high = expected + tolerance;
+
+	TEST_ASSERT(thresh_low < actual,
+		"TSC freq is expected to be between %"PRIu64" and %"PRIu64
+		" but it actually is %"PRIu64,
+		thresh_low, thresh_high, actual);
+	TEST_ASSERT(thresh_high > actual,
+		"TSC freq is expected to be between %"PRIu64" and %"PRIu64
+		" but it actually is %"PRIu64,
+		thresh_low, thresh_high, actual);
+}
+
+static void check_tsc_freq(int level)
+{
+	uint64_t tsc_start, tsc_end, tsc_freq;
+
+	/*
+	 * Reading the TSC twice with about a second's difference should give
+	 * us an approximation of the TSC frequency from the guest's
+	 * perspective. Now, this won't be completely accurate, but it should
+	 * be good enough for the purposes of this test.
+	 */
+	tsc_start = rdmsr(MSR_IA32_TSC);
+	GUEST_SLEEP(1);
+	tsc_end = rdmsr(MSR_IA32_TSC);
+
+	tsc_freq = tsc_end - tsc_start;
+
+	GUEST_CHECK(level, tsc_freq);
+}
+
+static void l2_guest_code(void)
+{
+	check_tsc_freq(UCHECK_L2);
+
+	/* exit to L1 */
+	__asm__ __volatile__("vmcall");
+}
+
+static void l1_guest_code(struct vmx_pages *vmx_pages)
+{
+	unsigned long l2_guest_stack[L2_GUEST_STACK_SIZE];
+	uint32_t control;
+
+	/* check that L1's frequency looks alright before launching L2 */
+	check_tsc_freq(UCHECK_L1);
+
+	GUEST_ASSERT(prepare_for_vmx_operation(vmx_pages));
+	GUEST_ASSERT(load_vmcs(vmx_pages));
+
+	/* prepare the VMCS for L2 execution */
+	prepare_vmcs(vmx_pages, l2_guest_code, &l2_guest_stack[L2_GUEST_STACK_SIZE]);
+
+	/* enable TSC offsetting and TSC scaling for L2 */
+	control = vmreadz(CPU_BASED_VM_EXEC_CONTROL);
+	control |= CPU_BASED_USE_MSR_BITMAPS | CPU_BASED_USE_TSC_OFFSETTING;
+	vmwrite(CPU_BASED_VM_EXEC_CONTROL, control);
+
+	control = vmreadz(SECONDARY_VM_EXEC_CONTROL);
+	control |= SECONDARY_EXEC_TSC_SCALING;
+	vmwrite(SECONDARY_VM_EXEC_CONTROL, control);
+
+	vmwrite(TSC_OFFSET, TSC_OFFSET_L2);
+	vmwrite(TSC_MULTIPLIER, TSC_MULTIPLIER_L2);
+	vmwrite(TSC_MULTIPLIER_HIGH, TSC_MULTIPLIER_L2 >> 32);
+
+	/* launch L2 */
+	GUEST_ASSERT(!vmlaunch());
+	GUEST_ASSERT(vmreadz(VM_EXIT_REASON) == EXIT_REASON_VMCALL);
+
+	/* check that L1's frequency still looks good */
+	check_tsc_freq(UCHECK_L1);
+
+	GUEST_DONE();
+}
+
+static void tsc_scaling_check_supported(void)
+{
+	if (!kvm_check_cap(KVM_CAP_TSC_CONTROL)) {
+		print_skip("TSC scaling not supported by the HW");
+		exit(KSFT_SKIP);
+	}
+}
+
+static void stable_tsc_check_supported(void)
+{
+	FILE *fp;
+	char buf[4];
+
+	fp = fopen("/sys/devices/system/clocksource/clocksource0/current_clocksource", "r");
+	if (fp == NULL)
+		goto skip_test;
+
+	if (fgets(buf, sizeof(buf), fp) == NULL)
+		goto skip_test;
+
+	if (strncmp(buf, "tsc", sizeof(buf)))
+		goto skip_test;
+
+	return;
+skip_test:
+	print_skip("Kernel does not use TSC clocksource - assuming that host TSC is not stable");
+	exit(KSFT_SKIP);
+}
+
+int main(int argc, char *argv[])
+{
+	struct kvm_vm *vm;
+	vm_vaddr_t vmx_pages_gva;
+
+	uint64_t tsc_start, tsc_end;
+	uint64_t tsc_khz;
+	uint64_t l1_scale_factor;
+	uint64_t l0_tsc_freq = 0;
+	uint64_t l1_tsc_freq = 0;
+	uint64_t l2_tsc_freq = 0;
+
+	nested_vmx_check_supported();
+	tsc_scaling_check_supported();
+	stable_tsc_check_supported();
+
+	/*
+	 * We set L1's scale factor to be a random number from 2 to 10.
+	 * Ideally we would do the same for L2's factor but that one is
+	 * referenced by both main() and l1_guest_code() and using a global
+	 * variable does not work.
+	 */
+	srand(time(NULL));
+	l1_scale_factor = (rand() % 9) + 2;
+	printf("L1's scale down factor is: %"PRIu64"\n", l1_scale_factor);
+	printf("L2's scale up factor is: %llu\n", L2_SCALE_FACTOR);
+
+	tsc_start = rdtsc();
+	sleep(1);
+	tsc_end = rdtsc();
+
+	l0_tsc_freq = tsc_end - tsc_start;
+	printf("real TSC frequency is around: %"PRIu64"\n", l0_tsc_freq);
+
+	vm = vm_create_default(VCPU_ID, 0, (void *) l1_guest_code);
+	vcpu_alloc_vmx(vm, &vmx_pages_gva);
+	vcpu_args_set(vm, VCPU_ID, 1, vmx_pages_gva);
+
+	tsc_khz = _vcpu_ioctl(vm, VCPU_ID, KVM_GET_TSC_KHZ, NULL);
+	TEST_ASSERT(tsc_khz != -1, "vcpu ioctl KVM_GET_TSC_KHZ failed");
+
+	/* scale down L1's TSC frequency */
+	vcpu_ioctl(vm, VCPU_ID, KVM_SET_TSC_KHZ,
+		  (void *) (tsc_khz / l1_scale_factor));
+
+	for (;;) {
+		volatile struct kvm_run *run = vcpu_state(vm, VCPU_ID);
+		struct ucall uc;
+
+		vcpu_run(vm, VCPU_ID);
+		TEST_ASSERT(run->exit_reason == KVM_EXIT_IO,
+			    "Got exit_reason other than KVM_EXIT_IO: %u (%s)\n",
+			    run->exit_reason,
+			    exit_reason_str(run->exit_reason));
+
+		switch (get_ucall(vm, VCPU_ID, &uc)) {
+		case UCALL_ABORT:
+			TEST_FAIL("%s", (const char *) uc.args[0]);
+		case UCALL_SYNC:
+			switch (uc.args[0]) {
+			case USLEEP:
+				sleep(uc.args[1]);
+				break;
+			case UCHECK_L1:
+				l1_tsc_freq = uc.args[1];
+				printf("L1's TSC frequency is around: %"PRIu64
+				       "\n", l1_tsc_freq);
+
+				compare_tsc_freq(l1_tsc_freq,
+						 l0_tsc_freq / l1_scale_factor);
+				break;
+			case UCHECK_L2:
+				l2_tsc_freq = uc.args[1];
+				printf("L2's TSC frequency is around: %"PRIu64
+				       "\n", l2_tsc_freq);
+
+				compare_tsc_freq(l2_tsc_freq,
+						 l1_tsc_freq * L2_SCALE_FACTOR);
+				break;
+			}
+			break;
+		case UCALL_DONE:
+			goto done;
+		default:
+			TEST_FAIL("Unknown ucall %lu", uc.cmd);
+		}
+	}
+
+done:
+	kvm_vm_free(vm);
+	return 0;
+}
-- 
2.17.1


^ permalink raw reply related	[flat|nested] 17+ messages in thread

* Re: [PATCH v4 09/11] KVM: X86: Add vendor callbacks for writing the TSC multiplier
  2021-05-26 18:44 ` [PATCH v4 09/11] KVM: X86: Add vendor callbacks for writing the TSC multiplier Ilias Stamatis
@ 2021-05-27  8:33   ` Stamatis, Ilias
  2021-05-27 13:08     ` Paolo Bonzini
  0 siblings, 1 reply; 17+ messages in thread
From: Stamatis, Ilias @ 2021-05-27  8:33 UTC (permalink / raw)
  To: linux-kernel, kvm, pbonzini
  Cc: jmattson, Woodhouse, David, vkuznets, joro, mtosatti, zamsden,
	seanjc, mlevitsk, wanpengli

On Wed, 2021-05-26 at 19:44 +0100, Ilias Stamatis wrote:
> Currently vmx_vcpu_load_vmcs() writes the TSC_MULTIPLIER field of the
> VMCS every time the VMCS is loaded. Instead of doing this, set this
> field from common code on initialization and whenever the scaling ratio
> changes.
> 
> Additionally remove vmx->current_tsc_ratio. This field is redundant as
> vcpu->arch.tsc_scaling_ratio already tracks the current TSC scaling
> ratio. The vmx->current_tsc_ratio field is only used for avoiding
> unnecessary writes but it is no longer needed after removing the code
> from the VMCS load path.
> 
> Suggested-by: Sean Christopherson <seanjc@google.com>
> Signed-off-by: Ilias Stamatis <ilstam@amazon.com>
> ---
>  arch/x86/include/asm/kvm-x86-ops.h |  1 +
>  arch/x86/include/asm/kvm_host.h    |  1 +
>  arch/x86/kvm/svm/svm.c             |  6 ++++++
>  arch/x86/kvm/vmx/nested.c          |  9 ++++-----
>  arch/x86/kvm/vmx/vmx.c             | 11 ++++++-----
>  arch/x86/kvm/vmx/vmx.h             |  8 --------
>  arch/x86/kvm/x86.c                 | 28 +++++++++++++++++++++++-----
>  7 files changed, 41 insertions(+), 23 deletions(-)
> 
> diff --git a/arch/x86/include/asm/kvm-x86-ops.h b/arch/x86/include/asm/kvm-x86-ops.h
> index 029c9615378f..34ad7a17458a 100644
> --- a/arch/x86/include/asm/kvm-x86-ops.h
> +++ b/arch/x86/include/asm/kvm-x86-ops.h
> @@ -90,6 +90,7 @@ KVM_X86_OP_NULL(has_wbinvd_exit)
>  KVM_X86_OP(get_l2_tsc_offset)
>  KVM_X86_OP(get_l2_tsc_multiplier)
>  KVM_X86_OP(write_tsc_offset)
> +KVM_X86_OP(write_tsc_multiplier)
>  KVM_X86_OP(get_exit_info)
>  KVM_X86_OP(check_intercept)
>  KVM_X86_OP(handle_exit_irqoff)
> diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
> index f099277b993d..a334ce7741ab 100644
> --- a/arch/x86/include/asm/kvm_host.h
> +++ b/arch/x86/include/asm/kvm_host.h
> @@ -1308,6 +1308,7 @@ struct kvm_x86_ops {
>  	u64 (*get_l2_tsc_offset)(struct kvm_vcpu *vcpu);
>  	u64 (*get_l2_tsc_multiplier)(struct kvm_vcpu *vcpu);
>  	void (*write_tsc_offset)(struct kvm_vcpu *vcpu, u64 offset);
> +	void (*write_tsc_multiplier)(struct kvm_vcpu *vcpu, u64 multiplier);
>  
>  	/*
>  	 * Retrieve somewhat arbitrary exit information.  Intended to be used
> diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
> index 8dfb2513b72a..cb701b42b08b 100644
> --- a/arch/x86/kvm/svm/svm.c
> +++ b/arch/x86/kvm/svm/svm.c
> @@ -1103,6 +1103,11 @@ static void svm_write_tsc_offset(struct kvm_vcpu *vcpu, u64 offset)
>  	vmcb_mark_dirty(svm->vmcb, VMCB_INTERCEPTS);
>  }
>  
> +static void svm_write_tsc_multiplier(struct kvm_vcpu *vcpu, u64 multiplier)
> +{
> +	wrmsrl(MSR_AMD64_TSC_RATIO, multiplier);
> +}
> +
>  /* Evaluate instruction intercepts that depend on guest CPUID features. */
>  static void svm_recalc_instruction_intercepts(struct kvm_vcpu *vcpu,
>  					      struct vcpu_svm *svm)
> @@ -4528,6 +4533,7 @@ static struct kvm_x86_ops svm_x86_ops __initdata = {
>  	.get_l2_tsc_offset = svm_get_l2_tsc_offset,
>  	.get_l2_tsc_multiplier = svm_get_l2_tsc_multiplier,
>  	.write_tsc_offset = svm_write_tsc_offset,
> +	.write_tsc_multiplier = svm_write_tsc_multiplier,
>  
>  	.load_mmu_pgd = svm_load_mmu_pgd,
>  
> diff --git a/arch/x86/kvm/vmx/nested.c b/arch/x86/kvm/vmx/nested.c
> index 6058a65a6ede..239154d3e4e7 100644
> --- a/arch/x86/kvm/vmx/nested.c
> +++ b/arch/x86/kvm/vmx/nested.c
> @@ -2533,9 +2533,8 @@ static int prepare_vmcs02(struct kvm_vcpu *vcpu, struct vmcs12 *vmcs12,
>  	}
>  
>  	vmcs_write64(TSC_OFFSET, vcpu->arch.tsc_offset);
> -
>  	if (kvm_has_tsc_control)
> -		decache_tsc_multiplier(vmx);
> +		vmcs_write64(TSC_MULTIPLIER, vcpu->arch.tsc_scaling_ratio);
>  
>  	nested_vmx_transition_tlb_flush(vcpu, vmcs12, true);
>  
> @@ -4501,12 +4500,12 @@ void nested_vmx_vmexit(struct kvm_vcpu *vcpu, u32 vm_exit_reason,
>  	vmcs_write32(VM_EXIT_MSR_LOAD_COUNT, vmx->msr_autoload.host.nr);
>  	vmcs_write32(VM_ENTRY_MSR_LOAD_COUNT, vmx->msr_autoload.guest.nr);
>  	vmcs_write64(TSC_OFFSET, vcpu->arch.tsc_offset);
> +	if (kvm_has_tsc_control)
> +		vmcs_write64(TSC_MULTIPLIER, vcpu->arch.tsc_scaling_ratio);
> +
>  	if (vmx->nested.l1_tpr_threshold != -1)
>  		vmcs_write32(TPR_THRESHOLD, vmx->nested.l1_tpr_threshold);
>  
> -	if (kvm_has_tsc_control)
> -		decache_tsc_multiplier(vmx);
> -
>  	if (vmx->nested.change_vmcs01_virtual_apic_mode) {
>  		vmx->nested.change_vmcs01_virtual_apic_mode = false;
>  		vmx_set_virtual_apic_mode(vcpu);
> diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
> index 4b70431c2edd..bf845a08995e 100644
> --- a/arch/x86/kvm/vmx/vmx.c
> +++ b/arch/x86/kvm/vmx/vmx.c
> @@ -1390,11 +1390,6 @@ void vmx_vcpu_load_vmcs(struct kvm_vcpu *vcpu, int cpu,
>  
>  		vmx->loaded_vmcs->cpu = cpu;
>  	}
> -
> -	/* Setup TSC multiplier */
> -	if (kvm_has_tsc_control &&
> -	    vmx->current_tsc_ratio != vcpu->arch.tsc_scaling_ratio)
> -		decache_tsc_multiplier(vmx);
>  }
>  
>  /*
> @@ -1813,6 +1808,11 @@ static void vmx_write_tsc_offset(struct kvm_vcpu *vcpu, u64 offset)
>  	vmcs_write64(TSC_OFFSET, offset);
>  }
>  
> +static void vmx_write_tsc_multiplier(struct kvm_vcpu *vcpu, u64 multiplier)
> +{
> +	vmcs_write64(TSC_MULTIPLIER, multiplier);
> +}
> +
>  /*
>   * nested_vmx_allowed() checks whether a guest should be allowed to use VMX
>   * instructions and MSRs (i.e., nested VMX). Nested VMX is disabled for
> @@ -7707,6 +7707,7 @@ static struct kvm_x86_ops vmx_x86_ops __initdata = {
>  	.get_l2_tsc_offset = vmx_get_l2_tsc_offset,
>  	.get_l2_tsc_multiplier = vmx_get_l2_tsc_multiplier,
>  	.write_tsc_offset = vmx_write_tsc_offset,
> +	.write_tsc_multiplier = vmx_write_tsc_multiplier,
>  
>  	.load_mmu_pgd = vmx_load_mmu_pgd,
>  
> diff --git a/arch/x86/kvm/vmx/vmx.h b/arch/x86/kvm/vmx/vmx.h
> index aa97c82e3451..3eaa86a0ba3e 100644
> --- a/arch/x86/kvm/vmx/vmx.h
> +++ b/arch/x86/kvm/vmx/vmx.h
> @@ -322,8 +322,6 @@ struct vcpu_vmx {
>  	/* apic deadline value in host tsc */
>  	u64 hv_deadline_tsc;
>  
> -	u64 current_tsc_ratio;
> -
>  	unsigned long host_debugctlmsr;
>  
>  	/*
> @@ -532,12 +530,6 @@ static inline struct vmcs *alloc_vmcs(bool shadow)
>  			      GFP_KERNEL_ACCOUNT);
>  }
>  
> -static inline void decache_tsc_multiplier(struct vcpu_vmx *vmx)
> -{
> -	vmx->current_tsc_ratio = vmx->vcpu.arch.tsc_scaling_ratio;
> -	vmcs_write64(TSC_MULTIPLIER, vmx->current_tsc_ratio);
> -}
> -
>  static inline bool vmx_has_waitpkg(struct vcpu_vmx *vmx)
>  {
>  	return vmx->secondary_exec_control &
> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> index 801fa1e8e915..8afe2c29183c 100644
> --- a/arch/x86/kvm/x86.c
> +++ b/arch/x86/kvm/x86.c
> @@ -2179,14 +2179,15 @@ static u32 adjust_tsc_khz(u32 khz, s32 ppm)
>  	return v;
>  }
>  
> +static void kvm_vcpu_write_tsc_multiplier(struct kvm_vcpu *vcpu, u64 l1_multiplier);
> +
>  static int set_tsc_khz(struct kvm_vcpu *vcpu, u32 user_tsc_khz, bool scale)
>  {
>  	u64 ratio;
>  
>  	/* Guest TSC same frequency as host TSC? */
>  	if (!scale) {
> -		vcpu->arch.l1_tsc_scaling_ratio = kvm_default_tsc_scaling_ratio;
> -		vcpu->arch.tsc_scaling_ratio = kvm_default_tsc_scaling_ratio;
> +		kvm_vcpu_write_tsc_multiplier(vcpu, kvm_default_tsc_scaling_ratio);
>  		return 0;
>  	}
>  
> @@ -2212,7 +2213,7 @@ static int set_tsc_khz(struct kvm_vcpu *vcpu, u32 user_tsc_khz, bool scale)
>  		return -1;
>  	}
>  
> -	vcpu->arch.l1_tsc_scaling_ratio = vcpu->arch.tsc_scaling_ratio = ratio;
> +	kvm_vcpu_write_tsc_multiplier(vcpu, ratio);
>  	return 0;
>  }
>  
> @@ -2224,8 +2225,7 @@ static int kvm_set_tsc_khz(struct kvm_vcpu *vcpu, u32 user_tsc_khz)
>  	/* tsc_khz can be zero if TSC calibration fails */
>  	if (user_tsc_khz == 0) {
>  		/* set tsc_scaling_ratio to a safe value */
> -		vcpu->arch.l1_tsc_scaling_ratio = kvm_default_tsc_scaling_ratio;
> -		vcpu->arch.tsc_scaling_ratio = kvm_default_tsc_scaling_ratio;
> +		kvm_vcpu_write_tsc_multiplier(vcpu, kvm_default_tsc_scaling_ratio);
>  		return -1;
>  	}
>  
> @@ -2383,6 +2383,23 @@ static void kvm_vcpu_write_tsc_offset(struct kvm_vcpu *vcpu, u64 l1_offset)
>  	static_call(kvm_x86_write_tsc_offset)(vcpu, vcpu->arch.tsc_offset);
>  }
>  
> +static void kvm_vcpu_write_tsc_multiplier(struct kvm_vcpu *vcpu, u64 l1_multiplier)
> +{
> +	vcpu->arch.l1_tsc_scaling_ratio = l1_multiplier;
> +
> +	/* Userspace is changing the multiplier while L2 is active */
> +	if (is_guest_mode(vcpu))
> +		vcpu->arch.tsc_scaling_ratio = kvm_calc_nested_tsc_multiplier(
> +			l1_multiplier,
> +			static_call(kvm_x86_get_l2_tsc_multiplier)(vcpu));
> +	else
> +		vcpu->arch.tsc_scaling_ratio = l1_multiplier;
> +
> +	if (kvm_has_tsc_control)
> +		static_call(kvm_x86_write_tsc_multiplier)(
> +			vcpu, vcpu->arch.tsc_scaling_ratio);
> +}
> +
>  static inline bool kvm_check_tsc_unstable(void)
>  {
>  #ifdef CONFIG_X86_64
> @@ -10444,6 +10461,7 @@ void kvm_arch_vcpu_postcreate(struct kvm_vcpu *vcpu)
>  		return;
>  	vcpu_load(vcpu);
>  	kvm_synchronize_tsc(vcpu, 0);
> +	kvm_vcpu_write_tsc_multiplier(vcpu, kvm_default_tsc_scaling_ratio);

Hmm, I'm actually thinking now that this might not be correct. For example in
case we hotplug a new vCPU but the other vCPUs don't use the default ratio.

How about keeping vmx->current_tsc_ratio after all? Apparently this is another
role it plays. Or move it to kvm->arch?

>  	vcpu_put(vcpu);
>  
>  	/* poll control enabled by default */


^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH v4 09/11] KVM: X86: Add vendor callbacks for writing the TSC multiplier
  2021-05-27  8:33   ` Stamatis, Ilias
@ 2021-05-27 13:08     ` Paolo Bonzini
  2021-05-28 10:44       ` Stamatis, Ilias
  0 siblings, 1 reply; 17+ messages in thread
From: Paolo Bonzini @ 2021-05-27 13:08 UTC (permalink / raw)
  To: Stamatis, Ilias, linux-kernel, kvm
  Cc: jmattson, Woodhouse, David, vkuznets, joro, mtosatti, zamsden,
	seanjc, mlevitsk, wanpengli

On 27/05/21 10:33, Stamatis, Ilias wrote:
>>   #ifdef CONFIG_X86_64
>> @@ -10444,6 +10461,7 @@ void kvm_arch_vcpu_postcreate(struct kvm_vcpu *vcpu)
>>   		return;
>>   	vcpu_load(vcpu);
>>   	kvm_synchronize_tsc(vcpu, 0);
>> +	kvm_vcpu_write_tsc_multiplier(vcpu, kvm_default_tsc_scaling_ratio);
> Hmm, I'm actually thinking now that this might not be correct. For example in
> case we hotplug a new vCPU but the other vCPUs don't use the default ratio.

It is correct, the TSC frequency can be set per CPU (which is useless 
except possibly for debugging OS timekeeping, but still).  So, the 
default kHz after hotplug is the host frequency.

It doesn't really matter because it only affects the fixed delta between 
the hotplugged CPU and the others as soon as userspace sets the 
frequency to the correct value.

Paolo


^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH v4 09/11] KVM: X86: Add vendor callbacks for writing the TSC multiplier
  2021-05-27 13:08     ` Paolo Bonzini
@ 2021-05-28 10:44       ` Stamatis, Ilias
  0 siblings, 0 replies; 17+ messages in thread
From: Stamatis, Ilias @ 2021-05-28 10:44 UTC (permalink / raw)
  To: linux-kernel, kvm, pbonzini
  Cc: jmattson, Woodhouse, David, vkuznets, joro, mtosatti, zamsden,
	seanjc, mlevitsk, wanpengli

On Thu, 2021-05-27 at 15:08 +0200, Paolo Bonzini wrote:
> On 27/05/21 10:33, Stamatis, Ilias wrote:
> > >   #ifdef CONFIG_X86_64
> > > @@ -10444,6 +10461,7 @@ void kvm_arch_vcpu_postcreate(struct kvm_vcpu *vcpu)
> > >              return;
> > >      vcpu_load(vcpu);
> > >      kvm_synchronize_tsc(vcpu, 0);
> > > +    kvm_vcpu_write_tsc_multiplier(vcpu, kvm_default_tsc_scaling_ratio);
> > 
> > Hmm, I'm actually thinking now that this might not be correct. For example in
> > case we hotplug a new vCPU but the other vCPUs don't use the default ratio.
> 
> It is correct, the TSC frequency can be set per CPU (which is useless
> except possibly for debugging OS timekeeping, but still).  So, the
> default kHz after hotplug is the host frequency.
> 
> It doesn't really matter because it only affects the fixed delta between
> the hotplugged CPU and the others as soon as userspace sets the
> frequency to the correct value.
> 
> Paolo
> 

So this patch is wrong anyway. 

kvm_arch_vcpu_create() does a kvm_set_tsc_khz(vcpu, max_tsc_khz) when
initializing the vcpu. This wouldn't normally result in a VMWRITE, but now
(after applying patch 9) it does. The problem is that this write now happens too
early and it raises an exception. To fix this, that line needs to be moved to
kvm_arch_vcpu_postcreate() (like above) but before calling
kvm_synchronize_tsc(vcpu, 0).

I will re-submit this patch with the fix.

Best,
Ilias



^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH v4 00/11] KVM: Implement nested TSC scaling
  2021-05-26 18:44 [PATCH v4 00/11] KVM: Implement nested TSC scaling Ilias Stamatis
                   ` (10 preceding siblings ...)
  2021-05-26 18:44 ` [PATCH v4 11/11] KVM: selftests: x86: Add vmx_nested_tsc_scaling_test Ilias Stamatis
@ 2021-06-15  8:44 ` Stamatis, Ilias
  2021-06-15  9:33   ` Stamatis, Ilias
  11 siblings, 1 reply; 17+ messages in thread
From: Stamatis, Ilias @ 2021-06-15  8:44 UTC (permalink / raw)
  To: linux-kernel, kvm, pbonzini
  Cc: jmattson, Woodhouse, David, vkuznets, joro, mtosatti, zamsden,
	seanjc, mlevitsk, wanpengli

On Wed, 2021-05-26 at 19:44 +0100, Ilias Stamatis wrote:
> KVM currently supports hardware-assisted TSC scaling but only for L1;
> the feature is not exposed to nested guests. This patch series adds
> support for nested TSC scaling and allows both L1 and L2 to be scaled
> with different scaling factors. That is achieved by "merging" the 01 and
> 02 values together.
> 
> Most of the logic in this series is implemented in common code (by doing
> the necessary restructurings), however the patches add support for VMX
> only. Adding support for SVM should be easy at this point and Maxim
> Levitsky has volunteered to do this (thanks!).
> 
> Changelog:
> 
> Only patch 9 needs reviewing at this point, but I am re-sending the
> whole series as I also applied nitpicks suggested to some of the other
> pathces.
> 
> v4:
>   - Added vendor callbacks for writing the TSC multiplier
>   - Moved the VMCS multiplier writes from the VMCS load path to common
>     code that only gets called on TSC ratio changes
>   - Merged together patches 10 and 11 of v3
>   - Applied all nitpick-feedback of the previous versions
> 
> v3:
>   - Applied Sean's feedback
>   - Refactored patches 7 to 10
> 
> v2:
>   - Applied all of Maxim's feedback
>   - Added a mul_s64_u64_shr function in math64.h
>   - Added a separate kvm_scale_tsc_l1 function instead of passing an
>     argument to kvm_scale_tsc
>   - Implemented the 02 fields calculations in common code
>   - Moved all of write_l1_tsc_offset's logic to common code
>   - Added a check for whether the TSC is stable in patch 10
>   - Used a random L1 factor and a negative offset in patch 10
> 
> Ilias Stamatis (11):
>   math64.h: Add mul_s64_u64_shr()
>   KVM: X86: Store L1's TSC scaling ratio in 'struct kvm_vcpu_arch'
>   KVM: X86: Rename kvm_compute_tsc_offset() to
>     kvm_compute_l1_tsc_offset()
>   KVM: X86: Add a ratio parameter to kvm_scale_tsc()
>   KVM: nVMX: Add a TSC multiplier field in VMCS12
>   KVM: X86: Add functions for retrieving L2 TSC fields from common code
>   KVM: X86: Add functions that calculate the nested TSC fields
>   KVM: X86: Move write_l1_tsc_offset() logic to common code and rename
>     it
>   KVM: X86: Add vendor callbacks for writing the TSC multiplier
>   KVM: nVMX: Enable nested TSC scaling
>   KVM: selftests: x86: Add vmx_nested_tsc_scaling_test
> 
>  arch/x86/include/asm/kvm-x86-ops.h            |   5 +-
>  arch/x86/include/asm/kvm_host.h               |  15 +-
>  arch/x86/kvm/svm/svm.c                        |  35 ++-
>  arch/x86/kvm/vmx/nested.c                     |  33 ++-
>  arch/x86/kvm/vmx/vmcs12.c                     |   1 +
>  arch/x86/kvm/vmx/vmcs12.h                     |   4 +-
>  arch/x86/kvm/vmx/vmx.c                        |  55 ++--
>  arch/x86/kvm/vmx/vmx.h                        |  11 +-
>  arch/x86/kvm/x86.c                            | 114 +++++++--
>  include/linux/math64.h                        |  19 ++
>  tools/testing/selftests/kvm/.gitignore        |   1 +
>  tools/testing/selftests/kvm/Makefile          |   1 +
>  .../kvm/x86_64/vmx_nested_tsc_scaling_test.c  | 242 ++++++++++++++++++
>  13 files changed, 451 insertions(+), 85 deletions(-)
>  create mode 100644 tools/testing/selftests/kvm/x86_64/vmx_nested_tsc_scaling_test.c
> 

Hello,

What is the status of these? I think all patches have been reviewed and
approved at this point.

(There's a new revision v6 of patch 9 that has been reviewed too)

Thanks,
Ilias

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH v4 00/11] KVM: Implement nested TSC scaling
  2021-06-15  8:44 ` [PATCH v4 00/11] KVM: Implement nested TSC scaling Stamatis, Ilias
@ 2021-06-15  9:33   ` Stamatis, Ilias
  0 siblings, 0 replies; 17+ messages in thread
From: Stamatis, Ilias @ 2021-06-15  9:33 UTC (permalink / raw)
  To: linux-kernel, kvm, pbonzini
  Cc: jmattson, Woodhouse, David, vkuznets, joro, mtosatti, zamsden,
	seanjc, mlevitsk, wanpengli

On Tue, 2021-06-15 at 09:43 +0100, Ilias Stamatis wrote:
> On Wed, 2021-05-26 at 19:44 +0100, Ilias Stamatis wrote:
> > KVM currently supports hardware-assisted TSC scaling but only for L1;
> > the feature is not exposed to nested guests. This patch series adds
> > support for nested TSC scaling and allows both L1 and L2 to be scaled
> > with different scaling factors. That is achieved by "merging" the 01 and
> > 02 values together.
> > 
> > Most of the logic in this series is implemented in common code (by doing
> > the necessary restructurings), however the patches add support for VMX
> > only. Adding support for SVM should be easy at this point and Maxim
> > Levitsky has volunteered to do this (thanks!).
> > 
> > Changelog:
> > 
> > Only patch 9 needs reviewing at this point, but I am re-sending the
> > whole series as I also applied nitpicks suggested to some of the other
> > pathces.
> > 
> > v4:
> >   - Added vendor callbacks for writing the TSC multiplier
> >   - Moved the VMCS multiplier writes from the VMCS load path to common
> >     code that only gets called on TSC ratio changes
> >   - Merged together patches 10 and 11 of v3
> >   - Applied all nitpick-feedback of the previous versions
> > 
> > v3:
> >   - Applied Sean's feedback
> >   - Refactored patches 7 to 10
> > 
> > v2:
> >   - Applied all of Maxim's feedback
> >   - Added a mul_s64_u64_shr function in math64.h
> >   - Added a separate kvm_scale_tsc_l1 function instead of passing an
> >     argument to kvm_scale_tsc
> >   - Implemented the 02 fields calculations in common code
> >   - Moved all of write_l1_tsc_offset's logic to common code
> >   - Added a check for whether the TSC is stable in patch 10
> >   - Used a random L1 factor and a negative offset in patch 10
> > 
> > Ilias Stamatis (11):
> >   math64.h: Add mul_s64_u64_shr()
> >   KVM: X86: Store L1's TSC scaling ratio in 'struct kvm_vcpu_arch'
> >   KVM: X86: Rename kvm_compute_tsc_offset() to
> >     kvm_compute_l1_tsc_offset()
> >   KVM: X86: Add a ratio parameter to kvm_scale_tsc()
> >   KVM: nVMX: Add a TSC multiplier field in VMCS12
> >   KVM: X86: Add functions for retrieving L2 TSC fields from common code
> >   KVM: X86: Add functions that calculate the nested TSC fields
> >   KVM: X86: Move write_l1_tsc_offset() logic to common code and rename
> >     it
> >   KVM: X86: Add vendor callbacks for writing the TSC multiplier
> >   KVM: nVMX: Enable nested TSC scaling
> >   KVM: selftests: x86: Add vmx_nested_tsc_scaling_test
> > 
> >  arch/x86/include/asm/kvm-x86-ops.h            |   5 +-
> >  arch/x86/include/asm/kvm_host.h               |  15 +-
> >  arch/x86/kvm/svm/svm.c                        |  35 ++-
> >  arch/x86/kvm/vmx/nested.c                     |  33 ++-
> >  arch/x86/kvm/vmx/vmcs12.c                     |   1 +
> >  arch/x86/kvm/vmx/vmcs12.h                     |   4 +-
> >  arch/x86/kvm/vmx/vmx.c                        |  55 ++--
> >  arch/x86/kvm/vmx/vmx.h                        |  11 +-
> >  arch/x86/kvm/x86.c                            | 114 +++++++--
> >  include/linux/math64.h                        |  19 ++
> >  tools/testing/selftests/kvm/.gitignore        |   1 +
> >  tools/testing/selftests/kvm/Makefile          |   1 +
> >  .../kvm/x86_64/vmx_nested_tsc_scaling_test.c  | 242 ++++++++++++++++++
> >  13 files changed, 451 insertions(+), 85 deletions(-)
> >  create mode 100644 tools/testing/selftests/kvm/x86_64/vmx_nested_tsc_scaling_test.c
> > 
> 
> Hello,
> 
> What is the status of these? I think all patches have been reviewed and
> approved at this point.
> 
> (There's a new revision v6 of patch 9 that has been reviewed too)
> 
> Thanks,
> Ilias

Never mind this, I was probably checking at the wrong place, I just realised
that they have been queued since last Friday.

Thanks!

^ permalink raw reply	[flat|nested] 17+ messages in thread

end of thread, other threads:[~2021-06-15  9:33 UTC | newest]

Thread overview: 17+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-05-26 18:44 [PATCH v4 00/11] KVM: Implement nested TSC scaling Ilias Stamatis
2021-05-26 18:44 ` [PATCH v4 01/11] math64.h: Add mul_s64_u64_shr() Ilias Stamatis
2021-05-26 18:44 ` [PATCH v4 02/11] KVM: X86: Store L1's TSC scaling ratio in 'struct kvm_vcpu_arch' Ilias Stamatis
2021-05-26 18:44 ` [PATCH v4 03/11] KVM: X86: Rename kvm_compute_tsc_offset() to kvm_compute_l1_tsc_offset() Ilias Stamatis
2021-05-26 18:44 ` [PATCH v4 04/11] KVM: X86: Add a ratio parameter to kvm_scale_tsc() Ilias Stamatis
2021-05-26 18:44 ` [PATCH v4 05/11] KVM: nVMX: Add a TSC multiplier field in VMCS12 Ilias Stamatis
2021-05-26 18:44 ` [PATCH v4 06/11] KVM: X86: Add functions for retrieving L2 TSC fields from common code Ilias Stamatis
2021-05-26 18:44 ` [PATCH v4 07/11] KVM: X86: Add functions that calculate the nested TSC fields Ilias Stamatis
2021-05-26 18:44 ` [PATCH v4 08/11] KVM: X86: Move write_l1_tsc_offset() logic to common code and rename it Ilias Stamatis
2021-05-26 18:44 ` [PATCH v4 09/11] KVM: X86: Add vendor callbacks for writing the TSC multiplier Ilias Stamatis
2021-05-27  8:33   ` Stamatis, Ilias
2021-05-27 13:08     ` Paolo Bonzini
2021-05-28 10:44       ` Stamatis, Ilias
2021-05-26 18:44 ` [PATCH v4 10/11] KVM: nVMX: Enable nested TSC scaling Ilias Stamatis
2021-05-26 18:44 ` [PATCH v4 11/11] KVM: selftests: x86: Add vmx_nested_tsc_scaling_test Ilias Stamatis
2021-06-15  8:44 ` [PATCH v4 00/11] KVM: Implement nested TSC scaling Stamatis, Ilias
2021-06-15  9:33   ` Stamatis, Ilias

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).