All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH v3 00/12] KVM: Implement nested TSC scaling
@ 2021-05-21 10:24 Ilias Stamatis
  2021-05-21 10:24 ` [PATCH v3 01/12] math64.h: Add mul_s64_u64_shr() Ilias Stamatis
                   ` (12 more replies)
  0 siblings, 13 replies; 39+ messages in thread
From: Ilias Stamatis @ 2021-05-21 10:24 UTC (permalink / raw)
  To: kvm, linux-kernel, pbonzini
  Cc: mlevitsk, seanjc, vkuznets, wanpengli, jmattson, joro, zamsden,
	mtosatti, dwmw, ilstam

KVM currently supports hardware-assisted TSC scaling but only for L1;
the feature is not exposed to nested guests. This patch series adds
support for nested TSC scaling and allows both L1 and L2 to be scaled
with different scaling factors. That is achieved by "merging" the 01 and
02 values together.

Most of the logic in this series is implemented in common code (by doing
the necessary restructurings), however the patches add support for VMX
only. Adding support for SVM should be easy at this point and Maxim
Levitsky has volunteered to do this (thanks!).

Changelog:
v3:
  - Applied Sean's feedback
  - Refactored patches 7 to 10

v2:
  - Applied all of Maxim's feedback
  - Added a mul_s64_u64_shr function in math64.h
  - Added a separate kvm_scale_tsc_l1 function instead of passing an
    argument to kvm_scale_tsc
  - Implemented the 02 fields calculations in common code
  - Moved all of write_l1_tsc_offset's logic to common code
  - Added a check for whether the TSC is stable in patch 10
  - Used a random L1 factor and a negative offset in patch 10

Ilias Stamatis (12):
  math64.h: Add mul_s64_u64_shr()
  KVM: X86: Store L1's TSC scaling ratio in 'struct kvm_vcpu_arch'
  KVM: X86: Rename kvm_compute_tsc_offset() to
    kvm_compute_tsc_offset_l1()
  KVM: X86: Add a ratio parameter to kvm_scale_tsc()
  KVM: VMX: Add a TSC multiplier field in VMCS12
  KVM: X86: Add functions for retrieving L2 TSC fields from common code
  KVM: X86: Add functions that calculate L2's TSC offset and multiplier
  KVM: X86: Move write_l1_tsc_offset() logic to common code and rename
    it
  KVM: VMX: Remove vmx->current_tsc_ratio and decache_tsc_multiplier()
  KVM: VMX: Set the TSC offset and multiplier on nested entry and exit
  KVM: VMX: Expose TSC scaling to L2
  KVM: selftests: x86: Add vmx_nested_tsc_scaling_test

 arch/x86/include/asm/kvm-x86-ops.h            |   4 +-
 arch/x86/include/asm/kvm_host.h               |  14 +-
 arch/x86/kvm/svm/svm.c                        |  29 ++-
 arch/x86/kvm/vmx/nested.c                     |  33 ++-
 arch/x86/kvm/vmx/vmcs12.c                     |   1 +
 arch/x86/kvm/vmx/vmcs12.h                     |   4 +-
 arch/x86/kvm/vmx/vmx.c                        |  49 ++--
 arch/x86/kvm/vmx/vmx.h                        |  11 +-
 arch/x86/kvm/x86.c                            |  91 +++++--
 include/linux/math64.h                        |  19 ++
 tools/testing/selftests/kvm/.gitignore        |   1 +
 tools/testing/selftests/kvm/Makefile          |   1 +
 .../kvm/x86_64/vmx_nested_tsc_scaling_test.c  | 242 ++++++++++++++++++
 13 files changed, 417 insertions(+), 82 deletions(-)
 create mode 100644 tools/testing/selftests/kvm/x86_64/vmx_nested_tsc_scaling_test.c

--
2.17.1


^ permalink raw reply	[flat|nested] 39+ messages in thread

* [PATCH v3 01/12] math64.h: Add mul_s64_u64_shr()
  2021-05-21 10:24 [PATCH v3 00/12] KVM: Implement nested TSC scaling Ilias Stamatis
@ 2021-05-21 10:24 ` Ilias Stamatis
  2021-05-24 17:49   ` Maxim Levitsky
  2021-05-21 10:24 ` [PATCH v3 02/12] KVM: X86: Store L1's TSC scaling ratio in 'struct kvm_vcpu_arch' Ilias Stamatis
                   ` (11 subsequent siblings)
  12 siblings, 1 reply; 39+ messages in thread
From: Ilias Stamatis @ 2021-05-21 10:24 UTC (permalink / raw)
  To: kvm, linux-kernel, pbonzini
  Cc: mlevitsk, seanjc, vkuznets, wanpengli, jmattson, joro, zamsden,
	mtosatti, dwmw, ilstam

This function is needed for KVM's nested virtualization. The nested TSC
scaling implementation requires multiplying the signed TSC offset with
the unsigned TSC multiplier.

Signed-off-by: Ilias Stamatis <ilstam@amazon.com>
---
 include/linux/math64.h | 19 +++++++++++++++++++
 1 file changed, 19 insertions(+)

diff --git a/include/linux/math64.h b/include/linux/math64.h
index 66deb1fdc2ef..2928f03d6d46 100644
--- a/include/linux/math64.h
+++ b/include/linux/math64.h
@@ -3,6 +3,7 @@
 #define _LINUX_MATH64_H
 
 #include <linux/types.h>
+#include <linux/math.h>
 #include <vdso/math64.h>
 #include <asm/div64.h>
 
@@ -234,6 +235,24 @@ static inline u64 mul_u64_u64_shr(u64 a, u64 b, unsigned int shift)
 
 #endif
 
+#ifndef mul_s64_u64_shr
+static inline u64 mul_s64_u64_shr(s64 a, u64 b, unsigned int shift)
+{
+	u64 ret;
+
+	/*
+	 * Extract the sign before the multiplication and put it back
+	 * afterwards if needed.
+	 */
+	ret = mul_u64_u64_shr(abs(a), b, shift);
+
+	if (a < 0)
+		ret = -((s64) ret);
+
+	return ret;
+}
+#endif /* mul_s64_u64_shr */
+
 #ifndef mul_u64_u32_div
 static inline u64 mul_u64_u32_div(u64 a, u32 mul, u32 divisor)
 {
-- 
2.17.1


^ permalink raw reply related	[flat|nested] 39+ messages in thread

* [PATCH v3 02/12] KVM: X86: Store L1's TSC scaling ratio in 'struct kvm_vcpu_arch'
  2021-05-21 10:24 [PATCH v3 00/12] KVM: Implement nested TSC scaling Ilias Stamatis
  2021-05-21 10:24 ` [PATCH v3 01/12] math64.h: Add mul_s64_u64_shr() Ilias Stamatis
@ 2021-05-21 10:24 ` Ilias Stamatis
  2021-05-24 17:49   ` Maxim Levitsky
  2021-05-21 10:24 ` [PATCH v3 03/12] KVM: X86: Rename kvm_compute_tsc_offset() to kvm_compute_tsc_offset_l1() Ilias Stamatis
                   ` (10 subsequent siblings)
  12 siblings, 1 reply; 39+ messages in thread
From: Ilias Stamatis @ 2021-05-21 10:24 UTC (permalink / raw)
  To: kvm, linux-kernel, pbonzini
  Cc: mlevitsk, seanjc, vkuznets, wanpengli, jmattson, joro, zamsden,
	mtosatti, dwmw, ilstam

Store L1's scaling ratio in the kvm_vcpu_arch struct like we already do
for L1's TSC offset. This allows for easy save/restore when we enter and
then exit the nested guest.

Signed-off-by: Ilias Stamatis <ilstam@amazon.com>
---
 arch/x86/include/asm/kvm_host.h | 5 +++--
 arch/x86/kvm/vmx/vmx.c          | 4 ++--
 arch/x86/kvm/x86.c              | 6 ++++--
 3 files changed, 9 insertions(+), 6 deletions(-)

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index 55efbacfc244..7dfc609eacd6 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -707,7 +707,7 @@ struct kvm_vcpu_arch {
 	} st;
 
 	u64 l1_tsc_offset;
-	u64 tsc_offset;
+	u64 tsc_offset; /* current tsc offset */
 	u64 last_guest_tsc;
 	u64 last_host_tsc;
 	u64 tsc_offset_adjustment;
@@ -721,7 +721,8 @@ struct kvm_vcpu_arch {
 	u32 virtual_tsc_khz;
 	s64 ia32_tsc_adjust_msr;
 	u64 msr_ia32_power_ctl;
-	u64 tsc_scaling_ratio;
+	u64 l1_tsc_scaling_ratio;
+	u64 tsc_scaling_ratio; /* current scaling ratio */
 
 	atomic_t nmi_queued;  /* unprocessed asynchronous NMIs */
 	unsigned nmi_pending; /* NMI queued after currently running handler */
diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
index 4bceb5ca3a89..3e4dda8177bb 100644
--- a/arch/x86/kvm/vmx/vmx.c
+++ b/arch/x86/kvm/vmx/vmx.c
@@ -7453,10 +7453,10 @@ static int vmx_set_hv_timer(struct kvm_vcpu *vcpu, u64 guest_deadline_tsc,
 		delta_tsc = 0;
 
 	/* Convert to host delta tsc if tsc scaling is enabled */
-	if (vcpu->arch.tsc_scaling_ratio != kvm_default_tsc_scaling_ratio &&
+	if (vcpu->arch.l1_tsc_scaling_ratio != kvm_default_tsc_scaling_ratio &&
 	    delta_tsc && u64_shl_div_u64(delta_tsc,
 				kvm_tsc_scaling_ratio_frac_bits,
-				vcpu->arch.tsc_scaling_ratio, &delta_tsc))
+				vcpu->arch.l1_tsc_scaling_ratio, &delta_tsc))
 		return -ERANGE;
 
 	/*
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index bbc4e04e67ad..6ab95ac188a5 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -2185,6 +2185,7 @@ static int set_tsc_khz(struct kvm_vcpu *vcpu, u32 user_tsc_khz, bool scale)
 
 	/* Guest TSC same frequency as host TSC? */
 	if (!scale) {
+		vcpu->arch.l1_tsc_scaling_ratio = kvm_default_tsc_scaling_ratio;
 		vcpu->arch.tsc_scaling_ratio = kvm_default_tsc_scaling_ratio;
 		return 0;
 	}
@@ -2211,7 +2212,7 @@ static int set_tsc_khz(struct kvm_vcpu *vcpu, u32 user_tsc_khz, bool scale)
 		return -1;
 	}
 
-	vcpu->arch.tsc_scaling_ratio = ratio;
+	vcpu->arch.l1_tsc_scaling_ratio = vcpu->arch.tsc_scaling_ratio = ratio;
 	return 0;
 }
 
@@ -2223,6 +2224,7 @@ static int kvm_set_tsc_khz(struct kvm_vcpu *vcpu, u32 user_tsc_khz)
 	/* tsc_khz can be zero if TSC calibration fails */
 	if (user_tsc_khz == 0) {
 		/* set tsc_scaling_ratio to a safe value */
+		vcpu->arch.l1_tsc_scaling_ratio = kvm_default_tsc_scaling_ratio;
 		vcpu->arch.tsc_scaling_ratio = kvm_default_tsc_scaling_ratio;
 		return -1;
 	}
@@ -2459,7 +2461,7 @@ static inline void adjust_tsc_offset_guest(struct kvm_vcpu *vcpu,
 
 static inline void adjust_tsc_offset_host(struct kvm_vcpu *vcpu, s64 adjustment)
 {
-	if (vcpu->arch.tsc_scaling_ratio != kvm_default_tsc_scaling_ratio)
+	if (vcpu->arch.l1_tsc_scaling_ratio != kvm_default_tsc_scaling_ratio)
 		WARN_ON(adjustment < 0);
 	adjustment = kvm_scale_tsc(vcpu, (u64) adjustment);
 	adjust_tsc_offset_guest(vcpu, adjustment);
-- 
2.17.1


^ permalink raw reply related	[flat|nested] 39+ messages in thread

* [PATCH v3 03/12] KVM: X86: Rename kvm_compute_tsc_offset() to kvm_compute_tsc_offset_l1()
  2021-05-21 10:24 [PATCH v3 00/12] KVM: Implement nested TSC scaling Ilias Stamatis
  2021-05-21 10:24 ` [PATCH v3 01/12] math64.h: Add mul_s64_u64_shr() Ilias Stamatis
  2021-05-21 10:24 ` [PATCH v3 02/12] KVM: X86: Store L1's TSC scaling ratio in 'struct kvm_vcpu_arch' Ilias Stamatis
@ 2021-05-21 10:24 ` Ilias Stamatis
  2021-05-24 14:21   ` Paolo Bonzini
  2021-05-21 10:24 ` [PATCH v3 04/12] KVM: X86: Add a ratio parameter to kvm_scale_tsc() Ilias Stamatis
                   ` (9 subsequent siblings)
  12 siblings, 1 reply; 39+ messages in thread
From: Ilias Stamatis @ 2021-05-21 10:24 UTC (permalink / raw)
  To: kvm, linux-kernel, pbonzini
  Cc: mlevitsk, seanjc, vkuznets, wanpengli, jmattson, joro, zamsden,
	mtosatti, dwmw, ilstam

All existing code uses kvm_compute_tsc_offset() passing L1 TSC values to
it. Let's document this by renaming it to kvm_compute_tsc_offset_l1().

Signed-off-by: Ilias Stamatis <ilstam@amazon.com>
---
 arch/x86/kvm/x86.c | 10 +++++-----
 1 file changed, 5 insertions(+), 5 deletions(-)

diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 6ab95ac188a5..ac644a1c3285 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -2319,7 +2319,7 @@ u64 kvm_scale_tsc(struct kvm_vcpu *vcpu, u64 tsc)
 }
 EXPORT_SYMBOL_GPL(kvm_scale_tsc);
 
-static u64 kvm_compute_tsc_offset(struct kvm_vcpu *vcpu, u64 target_tsc)
+static u64 kvm_compute_tsc_offset_l1(struct kvm_vcpu *vcpu, u64 target_tsc)
 {
 	u64 tsc;
 
@@ -2363,7 +2363,7 @@ static void kvm_synchronize_tsc(struct kvm_vcpu *vcpu, u64 data)
 	bool synchronizing = false;
 
 	raw_spin_lock_irqsave(&kvm->arch.tsc_write_lock, flags);
-	offset = kvm_compute_tsc_offset(vcpu, data);
+	offset = kvm_compute_tsc_offset_l1(vcpu, data);
 	ns = get_kvmclock_base_ns();
 	elapsed = ns - kvm->arch.last_tsc_nsec;
 
@@ -2402,7 +2402,7 @@ static void kvm_synchronize_tsc(struct kvm_vcpu *vcpu, u64 data)
 		} else {
 			u64 delta = nsec_to_cycles(vcpu, elapsed);
 			data += delta;
-			offset = kvm_compute_tsc_offset(vcpu, data);
+			offset = kvm_compute_tsc_offset_l1(vcpu, data);
 		}
 		matched = true;
 		already_matched = (vcpu->arch.this_tsc_generation == kvm->arch.cur_tsc_generation);
@@ -3235,7 +3235,7 @@ int kvm_set_msr_common(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
 		if (msr_info->host_initiated) {
 			kvm_synchronize_tsc(vcpu, data);
 		} else {
-			u64 adj = kvm_compute_tsc_offset(vcpu, data) - vcpu->arch.l1_tsc_offset;
+			u64 adj = kvm_compute_tsc_offset_l1(vcpu, data) - vcpu->arch.l1_tsc_offset;
 			adjust_tsc_offset_guest(vcpu, adj);
 			vcpu->arch.ia32_tsc_adjust_msr += adj;
 		}
@@ -4123,7 +4123,7 @@ void kvm_arch_vcpu_load(struct kvm_vcpu *vcpu, int cpu)
 			mark_tsc_unstable("KVM discovered backwards TSC");
 
 		if (kvm_check_tsc_unstable()) {
-			u64 offset = kvm_compute_tsc_offset(vcpu,
+			u64 offset = kvm_compute_tsc_offset_l1(vcpu,
 						vcpu->arch.last_guest_tsc);
 			kvm_vcpu_write_tsc_offset(vcpu, offset);
 			vcpu->arch.tsc_catchup = 1;
-- 
2.17.1


^ permalink raw reply related	[flat|nested] 39+ messages in thread

* [PATCH v3 04/12] KVM: X86: Add a ratio parameter to kvm_scale_tsc()
  2021-05-21 10:24 [PATCH v3 00/12] KVM: Implement nested TSC scaling Ilias Stamatis
                   ` (2 preceding siblings ...)
  2021-05-21 10:24 ` [PATCH v3 03/12] KVM: X86: Rename kvm_compute_tsc_offset() to kvm_compute_tsc_offset_l1() Ilias Stamatis
@ 2021-05-21 10:24 ` Ilias Stamatis
  2021-05-24 14:23   ` Paolo Bonzini
  2021-05-21 10:24 ` [PATCH v3 05/12] KVM: VMX: Add a TSC multiplier field in VMCS12 Ilias Stamatis
                   ` (8 subsequent siblings)
  12 siblings, 1 reply; 39+ messages in thread
From: Ilias Stamatis @ 2021-05-21 10:24 UTC (permalink / raw)
  To: kvm, linux-kernel, pbonzini
  Cc: mlevitsk, seanjc, vkuznets, wanpengli, jmattson, joro, zamsden,
	mtosatti, dwmw, ilstam

Sometimes kvm_scale_tsc() needs to use the current scaling ratio and
other times (like when reading the TSC from user space) it needs to use
L1's scaling ratio. Have the caller specify this by passing the ratio as
a parameter.

Signed-off-by: Ilias Stamatis <ilstam@amazon.com>
---
 arch/x86/include/asm/kvm_host.h |  2 +-
 arch/x86/kvm/x86.c              | 26 ++++++++++++++++----------
 2 files changed, 17 insertions(+), 11 deletions(-)

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index 7dfc609eacd6..b14c2b2b2e21 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -1788,7 +1788,7 @@ static inline bool kvm_is_supported_user_return_msr(u32 msr)
 	return kvm_find_user_return_msr(msr) >= 0;
 }
 
-u64 kvm_scale_tsc(struct kvm_vcpu *vcpu, u64 tsc);
+u64 kvm_scale_tsc(struct kvm_vcpu *vcpu, u64 tsc, u64 ratio);
 u64 kvm_read_l1_tsc(struct kvm_vcpu *vcpu, u64 host_tsc);
 
 unsigned long kvm_get_linear_rip(struct kvm_vcpu *vcpu);
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index ac644a1c3285..fdcb4f46a003 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -2307,10 +2307,9 @@ static inline u64 __scale_tsc(u64 ratio, u64 tsc)
 	return mul_u64_u64_shr(tsc, ratio, kvm_tsc_scaling_ratio_frac_bits);
 }
 
-u64 kvm_scale_tsc(struct kvm_vcpu *vcpu, u64 tsc)
+u64 kvm_scale_tsc(struct kvm_vcpu *vcpu, u64 tsc, u64 ratio)
 {
 	u64 _tsc = tsc;
-	u64 ratio = vcpu->arch.tsc_scaling_ratio;
 
 	if (ratio != kvm_default_tsc_scaling_ratio)
 		_tsc = __scale_tsc(ratio, tsc);
@@ -2323,14 +2322,15 @@ static u64 kvm_compute_tsc_offset_l1(struct kvm_vcpu *vcpu, u64 target_tsc)
 {
 	u64 tsc;
 
-	tsc = kvm_scale_tsc(vcpu, rdtsc());
+	tsc = kvm_scale_tsc(vcpu, rdtsc(), vcpu->arch.l1_tsc_scaling_ratio);
 
 	return target_tsc - tsc;
 }
 
 u64 kvm_read_l1_tsc(struct kvm_vcpu *vcpu, u64 host_tsc)
 {
-	return vcpu->arch.l1_tsc_offset + kvm_scale_tsc(vcpu, host_tsc);
+	return vcpu->arch.l1_tsc_offset +
+		kvm_scale_tsc(vcpu, host_tsc, vcpu->arch.l1_tsc_scaling_ratio);
 }
 EXPORT_SYMBOL_GPL(kvm_read_l1_tsc);
 
@@ -2463,7 +2463,8 @@ static inline void adjust_tsc_offset_host(struct kvm_vcpu *vcpu, s64 adjustment)
 {
 	if (vcpu->arch.l1_tsc_scaling_ratio != kvm_default_tsc_scaling_ratio)
 		WARN_ON(adjustment < 0);
-	adjustment = kvm_scale_tsc(vcpu, (u64) adjustment);
+	adjustment = kvm_scale_tsc(vcpu, (u64) adjustment,
+				   vcpu->arch.l1_tsc_scaling_ratio);
 	adjust_tsc_offset_guest(vcpu, adjustment);
 }
 
@@ -2846,7 +2847,8 @@ static int kvm_guest_time_update(struct kvm_vcpu *v)
 	/* With all the info we got, fill in the values */
 
 	if (kvm_has_tsc_control)
-		tgt_tsc_khz = kvm_scale_tsc(v, tgt_tsc_khz);
+		tgt_tsc_khz = kvm_scale_tsc(v, tgt_tsc_khz,
+					    v->arch.l1_tsc_scaling_ratio);
 
 	if (unlikely(vcpu->hw_tsc_khz != tgt_tsc_khz)) {
 		kvm_get_time_scale(NSEC_PER_SEC, tgt_tsc_khz * 1000LL,
@@ -3537,10 +3539,14 @@ int kvm_get_msr_common(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
 		 * return L1's TSC value to ensure backwards-compatible
 		 * behavior for migration.
 		 */
-		u64 tsc_offset = msr_info->host_initiated ? vcpu->arch.l1_tsc_offset :
-							    vcpu->arch.tsc_offset;
-
-		msr_info->data = kvm_scale_tsc(vcpu, rdtsc()) + tsc_offset;
+		if (msr_info->host_initiated)
+			msr_info->data = kvm_scale_tsc(
+				vcpu, rdtsc(), vcpu->arch.l1_tsc_scaling_ratio
+				) + vcpu->arch.l1_tsc_offset;
+		else
+			msr_info->data = kvm_scale_tsc(
+				vcpu, rdtsc(), vcpu->arch.tsc_scaling_ratio
+				) + vcpu->arch.tsc_offset;
 		break;
 	}
 	case MSR_MTRRcap:
-- 
2.17.1


^ permalink raw reply related	[flat|nested] 39+ messages in thread

* [PATCH v3 05/12] KVM: VMX: Add a TSC multiplier field in VMCS12
  2021-05-21 10:24 [PATCH v3 00/12] KVM: Implement nested TSC scaling Ilias Stamatis
                   ` (3 preceding siblings ...)
  2021-05-21 10:24 ` [PATCH v3 04/12] KVM: X86: Add a ratio parameter to kvm_scale_tsc() Ilias Stamatis
@ 2021-05-21 10:24 ` Ilias Stamatis
  2021-05-21 10:24 ` [PATCH v3 06/12] KVM: X86: Add functions for retrieving L2 TSC fields from common code Ilias Stamatis
                   ` (7 subsequent siblings)
  12 siblings, 0 replies; 39+ messages in thread
From: Ilias Stamatis @ 2021-05-21 10:24 UTC (permalink / raw)
  To: kvm, linux-kernel, pbonzini
  Cc: mlevitsk, seanjc, vkuznets, wanpengli, jmattson, joro, zamsden,
	mtosatti, dwmw, ilstam

This is required for supporting nested TSC scaling.

Signed-off-by: Ilias Stamatis <ilstam@amazon.com>
Reviewed-by: Jim Mattson <jmattson@google.com>
Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com>
---
 arch/x86/kvm/vmx/vmcs12.c | 1 +
 arch/x86/kvm/vmx/vmcs12.h | 4 +++-
 2 files changed, 4 insertions(+), 1 deletion(-)

diff --git a/arch/x86/kvm/vmx/vmcs12.c b/arch/x86/kvm/vmx/vmcs12.c
index 034adb6404dc..d9f5d7c56ae3 100644
--- a/arch/x86/kvm/vmx/vmcs12.c
+++ b/arch/x86/kvm/vmx/vmcs12.c
@@ -37,6 +37,7 @@ const unsigned short vmcs_field_to_offset_table[] = {
 	FIELD64(VM_ENTRY_MSR_LOAD_ADDR, vm_entry_msr_load_addr),
 	FIELD64(PML_ADDRESS, pml_address),
 	FIELD64(TSC_OFFSET, tsc_offset),
+	FIELD64(TSC_MULTIPLIER, tsc_multiplier),
 	FIELD64(VIRTUAL_APIC_PAGE_ADDR, virtual_apic_page_addr),
 	FIELD64(APIC_ACCESS_ADDR, apic_access_addr),
 	FIELD64(POSTED_INTR_DESC_ADDR, posted_intr_desc_addr),
diff --git a/arch/x86/kvm/vmx/vmcs12.h b/arch/x86/kvm/vmx/vmcs12.h
index 13494956d0e9..bb81a23afe89 100644
--- a/arch/x86/kvm/vmx/vmcs12.h
+++ b/arch/x86/kvm/vmx/vmcs12.h
@@ -70,7 +70,8 @@ struct __packed vmcs12 {
 	u64 eptp_list_address;
 	u64 pml_address;
 	u64 encls_exiting_bitmap;
-	u64 padding64[2]; /* room for future expansion */
+	u64 tsc_multiplier;
+	u64 padding64[1]; /* room for future expansion */
 	/*
 	 * To allow migration of L1 (complete with its L2 guests) between
 	 * machines of different natural widths (32 or 64 bit), we cannot have
@@ -258,6 +259,7 @@ static inline void vmx_check_vmcs12_offsets(void)
 	CHECK_OFFSET(eptp_list_address, 304);
 	CHECK_OFFSET(pml_address, 312);
 	CHECK_OFFSET(encls_exiting_bitmap, 320);
+	CHECK_OFFSET(tsc_multiplier, 328);
 	CHECK_OFFSET(cr0_guest_host_mask, 344);
 	CHECK_OFFSET(cr4_guest_host_mask, 352);
 	CHECK_OFFSET(cr0_read_shadow, 360);
-- 
2.17.1


^ permalink raw reply related	[flat|nested] 39+ messages in thread

* [PATCH v3 06/12] KVM: X86: Add functions for retrieving L2 TSC fields from common code
  2021-05-21 10:24 [PATCH v3 00/12] KVM: Implement nested TSC scaling Ilias Stamatis
                   ` (4 preceding siblings ...)
  2021-05-21 10:24 ` [PATCH v3 05/12] KVM: VMX: Add a TSC multiplier field in VMCS12 Ilias Stamatis
@ 2021-05-21 10:24 ` Ilias Stamatis
  2021-05-24 17:50   ` Maxim Levitsky
  2021-05-21 10:24 ` [PATCH v3 07/12] KVM: X86: Add functions that calculate L2's TSC offset and multiplier Ilias Stamatis
                   ` (6 subsequent siblings)
  12 siblings, 1 reply; 39+ messages in thread
From: Ilias Stamatis @ 2021-05-21 10:24 UTC (permalink / raw)
  To: kvm, linux-kernel, pbonzini
  Cc: mlevitsk, seanjc, vkuznets, wanpengli, jmattson, joro, zamsden,
	mtosatti, dwmw, ilstam

In order to implement as much of the nested TSC scaling logic as
possible in common code, we need these vendor callbacks for retrieving
the TSC offset and the TSC multiplier that L1 has set for L2.

Signed-off-by: Ilias Stamatis <ilstam@amazon.com>
---
 arch/x86/include/asm/kvm-x86-ops.h |  2 ++
 arch/x86/include/asm/kvm_host.h    |  2 ++
 arch/x86/kvm/svm/svm.c             | 14 ++++++++++++++
 arch/x86/kvm/vmx/vmx.c             | 23 +++++++++++++++++++++++
 arch/x86/kvm/vmx/vmx.h             |  3 +++
 5 files changed, 44 insertions(+)

diff --git a/arch/x86/include/asm/kvm-x86-ops.h b/arch/x86/include/asm/kvm-x86-ops.h
index 323641097f63..2063616fba1c 100644
--- a/arch/x86/include/asm/kvm-x86-ops.h
+++ b/arch/x86/include/asm/kvm-x86-ops.h
@@ -87,6 +87,8 @@ KVM_X86_OP(set_identity_map_addr)
 KVM_X86_OP(get_mt_mask)
 KVM_X86_OP(load_mmu_pgd)
 KVM_X86_OP_NULL(has_wbinvd_exit)
+KVM_X86_OP(get_l2_tsc_offset)
+KVM_X86_OP(get_l2_tsc_multiplier)
 KVM_X86_OP(write_l1_tsc_offset)
 KVM_X86_OP(get_exit_info)
 KVM_X86_OP(check_intercept)
diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index b14c2b2b2e21..0f2cf5d1240c 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -1305,6 +1305,8 @@ struct kvm_x86_ops {
 
 	bool (*has_wbinvd_exit)(void);
 
+	u64 (*get_l2_tsc_offset)(struct kvm_vcpu *vcpu);
+	u64 (*get_l2_tsc_multiplier)(struct kvm_vcpu *vcpu);
 	/* Returns actual tsc_offset set in active VMCS */
 	u64 (*write_l1_tsc_offset)(struct kvm_vcpu *vcpu, u64 offset);
 
diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
index 05eca131eaf2..ca70e46f9194 100644
--- a/arch/x86/kvm/svm/svm.c
+++ b/arch/x86/kvm/svm/svm.c
@@ -1082,6 +1082,18 @@ static void init_sys_seg(struct vmcb_seg *seg, uint32_t type)
 	seg->base = 0;
 }
 
+static u64 svm_get_l2_tsc_offset(struct kvm_vcpu *vcpu)
+{
+	struct vcpu_svm *svm = to_svm(vcpu);
+
+	return svm->nested.ctl.tsc_offset;
+}
+
+static u64 svm_get_l2_tsc_multiplier(struct kvm_vcpu *vcpu)
+{
+	return kvm_default_tsc_scaling_ratio;
+}
+
 static u64 svm_write_l1_tsc_offset(struct kvm_vcpu *vcpu, u64 offset)
 {
 	struct vcpu_svm *svm = to_svm(vcpu);
@@ -4526,6 +4538,8 @@ static struct kvm_x86_ops svm_x86_ops __initdata = {
 
 	.has_wbinvd_exit = svm_has_wbinvd_exit,
 
+	.get_l2_tsc_offset = svm_get_l2_tsc_offset,
+	.get_l2_tsc_multiplier = svm_get_l2_tsc_multiplier,
 	.write_l1_tsc_offset = svm_write_l1_tsc_offset,
 
 	.load_mmu_pgd = svm_load_mmu_pgd,
diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
index 3e4dda8177bb..1c83605eccc1 100644
--- a/arch/x86/kvm/vmx/vmx.c
+++ b/arch/x86/kvm/vmx/vmx.c
@@ -1787,6 +1787,27 @@ static void setup_msrs(struct vcpu_vmx *vmx)
 	vmx->guest_uret_msrs_loaded = false;
 }
 
+u64 vmx_get_l2_tsc_offset(struct kvm_vcpu *vcpu)
+{
+	struct vmcs12 *vmcs12 = get_vmcs12(vcpu);
+
+	if (nested_cpu_has(vmcs12, CPU_BASED_USE_TSC_OFFSETTING))
+		return vmcs12->tsc_offset;
+
+	return 0;
+}
+
+u64 vmx_get_l2_tsc_multiplier(struct kvm_vcpu *vcpu)
+{
+	struct vmcs12 *vmcs12 = get_vmcs12(vcpu);
+
+	if (nested_cpu_has(vmcs12, CPU_BASED_USE_TSC_OFFSETTING) &&
+	    nested_cpu_has2(vmcs12, SECONDARY_EXEC_TSC_SCALING))
+		return vmcs12->tsc_multiplier;
+
+	return kvm_default_tsc_scaling_ratio;
+}
+
 static u64 vmx_write_l1_tsc_offset(struct kvm_vcpu *vcpu, u64 offset)
 {
 	struct vmcs12 *vmcs12 = get_vmcs12(vcpu);
@@ -7700,6 +7721,8 @@ static struct kvm_x86_ops vmx_x86_ops __initdata = {
 
 	.has_wbinvd_exit = cpu_has_vmx_wbinvd_exit,
 
+	.get_l2_tsc_offset = vmx_get_l2_tsc_offset,
+	.get_l2_tsc_multiplier = vmx_get_l2_tsc_multiplier,
 	.write_l1_tsc_offset = vmx_write_l1_tsc_offset,
 
 	.load_mmu_pgd = vmx_load_mmu_pgd,
diff --git a/arch/x86/kvm/vmx/vmx.h b/arch/x86/kvm/vmx/vmx.h
index 16e4e457ba23..aa97c82e3451 100644
--- a/arch/x86/kvm/vmx/vmx.h
+++ b/arch/x86/kvm/vmx/vmx.h
@@ -404,6 +404,9 @@ void vmx_ept_load_pdptrs(struct kvm_vcpu *vcpu);
 void vmx_disable_intercept_for_msr(struct kvm_vcpu *vcpu, u32 msr, int type);
 void vmx_enable_intercept_for_msr(struct kvm_vcpu *vcpu, u32 msr, int type);
 
+u64 vmx_get_l2_tsc_offset(struct kvm_vcpu *vcpu);
+u64 vmx_get_l2_tsc_multiplier(struct kvm_vcpu *vcpu);
+
 static inline void vmx_set_intercept_for_msr(struct kvm_vcpu *vcpu, u32 msr,
 					     int type, bool value)
 {
-- 
2.17.1


^ permalink raw reply related	[flat|nested] 39+ messages in thread

* [PATCH v3 07/12] KVM: X86: Add functions that calculate L2's TSC offset and multiplier
  2021-05-21 10:24 [PATCH v3 00/12] KVM: Implement nested TSC scaling Ilias Stamatis
                   ` (5 preceding siblings ...)
  2021-05-21 10:24 ` [PATCH v3 06/12] KVM: X86: Add functions for retrieving L2 TSC fields from common code Ilias Stamatis
@ 2021-05-21 10:24 ` Ilias Stamatis
  2021-05-24 17:51   ` Maxim Levitsky
  2021-05-21 10:24 ` [PATCH v3 08/12] KVM: X86: Move write_l1_tsc_offset() logic to common code and rename it Ilias Stamatis
                   ` (5 subsequent siblings)
  12 siblings, 1 reply; 39+ messages in thread
From: Ilias Stamatis @ 2021-05-21 10:24 UTC (permalink / raw)
  To: kvm, linux-kernel, pbonzini
  Cc: mlevitsk, seanjc, vkuznets, wanpengli, jmattson, joro, zamsden,
	mtosatti, dwmw, ilstam

When L2 is entered we need to "merge" the TSC multiplier and TSC offset
values of 01 and 12 together.

The merging is done using the following equations:
  offset_02 = ((offset_01 * mult_12) >> shift_bits) + offset_12
  mult_02 = (mult_01 * mult_12) >> shift_bits

Where shift_bits is kvm_tsc_scaling_ratio_frac_bits.

Signed-off-by: Ilias Stamatis <ilstam@amazon.com>
---
 arch/x86/include/asm/kvm_host.h |  2 ++
 arch/x86/kvm/x86.c              | 25 +++++++++++++++++++++++++
 2 files changed, 27 insertions(+)

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index 0f2cf5d1240c..aaf756442ed1 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -1792,6 +1792,8 @@ static inline bool kvm_is_supported_user_return_msr(u32 msr)
 
 u64 kvm_scale_tsc(struct kvm_vcpu *vcpu, u64 tsc, u64 ratio);
 u64 kvm_read_l1_tsc(struct kvm_vcpu *vcpu, u64 host_tsc);
+u64 kvm_calc_nested_tsc_offset(u64 l1_offset, u64 l2_offset, u64 l2_multiplier);
+u64 kvm_calc_nested_tsc_multiplier(u64 l1_multiplier, u64 l2_multiplier);
 
 unsigned long kvm_get_linear_rip(struct kvm_vcpu *vcpu);
 bool kvm_is_linear_rip(struct kvm_vcpu *vcpu, unsigned long linear_rip);
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index fdcb4f46a003..04abaacb9cfc 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -2334,6 +2334,31 @@ u64 kvm_read_l1_tsc(struct kvm_vcpu *vcpu, u64 host_tsc)
 }
 EXPORT_SYMBOL_GPL(kvm_read_l1_tsc);
 
+u64 kvm_calc_nested_tsc_offset(u64 l1_offset, u64 l2_offset, u64 l2_multiplier)
+{
+	u64 nested_offset;
+
+	if (l2_multiplier == kvm_default_tsc_scaling_ratio)
+		nested_offset = l1_offset;
+	else
+		nested_offset = mul_s64_u64_shr((s64) l1_offset, l2_multiplier,
+						kvm_tsc_scaling_ratio_frac_bits);
+
+	nested_offset += l2_offset;
+	return nested_offset;
+}
+EXPORT_SYMBOL_GPL(kvm_calc_nested_tsc_offset);
+
+u64 kvm_calc_nested_tsc_multiplier(u64 l1_multiplier, u64 l2_multiplier)
+{
+	if (l2_multiplier != kvm_default_tsc_scaling_ratio)
+		return mul_u64_u64_shr(l1_multiplier, l2_multiplier,
+				       kvm_tsc_scaling_ratio_frac_bits);
+
+	return l1_multiplier;
+}
+EXPORT_SYMBOL_GPL(kvm_calc_nested_tsc_multiplier);
+
 static void kvm_vcpu_write_tsc_offset(struct kvm_vcpu *vcpu, u64 offset)
 {
 	vcpu->arch.l1_tsc_offset = offset;
-- 
2.17.1


^ permalink raw reply related	[flat|nested] 39+ messages in thread

* [PATCH v3 08/12] KVM: X86: Move write_l1_tsc_offset() logic to common code and rename it
  2021-05-21 10:24 [PATCH v3 00/12] KVM: Implement nested TSC scaling Ilias Stamatis
                   ` (6 preceding siblings ...)
  2021-05-21 10:24 ` [PATCH v3 07/12] KVM: X86: Add functions that calculate L2's TSC offset and multiplier Ilias Stamatis
@ 2021-05-21 10:24 ` Ilias Stamatis
  2021-05-24 17:51   ` Maxim Levitsky
  2021-05-21 10:24 ` [PATCH v3 09/12] KVM: VMX: Remove vmx->current_tsc_ratio and decache_tsc_multiplier() Ilias Stamatis
                   ` (4 subsequent siblings)
  12 siblings, 1 reply; 39+ messages in thread
From: Ilias Stamatis @ 2021-05-21 10:24 UTC (permalink / raw)
  To: kvm, linux-kernel, pbonzini
  Cc: mlevitsk, seanjc, vkuznets, wanpengli, jmattson, joro, zamsden,
	mtosatti, dwmw, ilstam

The write_l1_tsc_offset() callback has a misleading name. It does not
set L1's TSC offset, it rather updates the current TSC offset which
might be different if a nested guest is executing. Additionally, both
the vmx and svm implementations use the same logic for calculating the
current TSC before writing it to hardware.

Rename the function and move the common logic to the caller. The vmx/svm
specific code now merely sets the given offset to the corresponding
hardware structure.

Signed-off-by: Ilias Stamatis <ilstam@amazon.com>
---
 arch/x86/include/asm/kvm-x86-ops.h |  2 +-
 arch/x86/include/asm/kvm_host.h    |  3 +--
 arch/x86/kvm/svm/svm.c             | 21 ++++-----------------
 arch/x86/kvm/vmx/vmx.c             | 23 +++--------------------
 arch/x86/kvm/x86.c                 | 24 +++++++++++++++++++++---
 5 files changed, 30 insertions(+), 43 deletions(-)

diff --git a/arch/x86/include/asm/kvm-x86-ops.h b/arch/x86/include/asm/kvm-x86-ops.h
index 2063616fba1c..029c9615378f 100644
--- a/arch/x86/include/asm/kvm-x86-ops.h
+++ b/arch/x86/include/asm/kvm-x86-ops.h
@@ -89,7 +89,7 @@ KVM_X86_OP(load_mmu_pgd)
 KVM_X86_OP_NULL(has_wbinvd_exit)
 KVM_X86_OP(get_l2_tsc_offset)
 KVM_X86_OP(get_l2_tsc_multiplier)
-KVM_X86_OP(write_l1_tsc_offset)
+KVM_X86_OP(write_tsc_offset)
 KVM_X86_OP(get_exit_info)
 KVM_X86_OP(check_intercept)
 KVM_X86_OP(handle_exit_irqoff)
diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index aaf756442ed1..f099277b993d 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -1307,8 +1307,7 @@ struct kvm_x86_ops {
 
 	u64 (*get_l2_tsc_offset)(struct kvm_vcpu *vcpu);
 	u64 (*get_l2_tsc_multiplier)(struct kvm_vcpu *vcpu);
-	/* Returns actual tsc_offset set in active VMCS */
-	u64 (*write_l1_tsc_offset)(struct kvm_vcpu *vcpu, u64 offset);
+	void (*write_tsc_offset)(struct kvm_vcpu *vcpu, u64 offset);
 
 	/*
 	 * Retrieve somewhat arbitrary exit information.  Intended to be used
diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
index ca70e46f9194..8dfb2513b72a 100644
--- a/arch/x86/kvm/svm/svm.c
+++ b/arch/x86/kvm/svm/svm.c
@@ -1094,26 +1094,13 @@ static u64 svm_get_l2_tsc_multiplier(struct kvm_vcpu *vcpu)
 	return kvm_default_tsc_scaling_ratio;
 }
 
-static u64 svm_write_l1_tsc_offset(struct kvm_vcpu *vcpu, u64 offset)
+static void svm_write_tsc_offset(struct kvm_vcpu *vcpu, u64 offset)
 {
 	struct vcpu_svm *svm = to_svm(vcpu);
-	u64 g_tsc_offset = 0;
-
-	if (is_guest_mode(vcpu)) {
-		/* Write L1's TSC offset.  */
-		g_tsc_offset = svm->vmcb->control.tsc_offset -
-			       svm->vmcb01.ptr->control.tsc_offset;
-		svm->vmcb01.ptr->control.tsc_offset = offset;
-	}
-
-	trace_kvm_write_tsc_offset(vcpu->vcpu_id,
-				   svm->vmcb->control.tsc_offset - g_tsc_offset,
-				   offset);
-
-	svm->vmcb->control.tsc_offset = offset + g_tsc_offset;
 
+	svm->vmcb01.ptr->control.tsc_offset = vcpu->arch.l1_tsc_offset;
+	svm->vmcb->control.tsc_offset = offset;
 	vmcb_mark_dirty(svm->vmcb, VMCB_INTERCEPTS);
-	return svm->vmcb->control.tsc_offset;
 }
 
 /* Evaluate instruction intercepts that depend on guest CPUID features. */
@@ -4540,7 +4527,7 @@ static struct kvm_x86_ops svm_x86_ops __initdata = {
 
 	.get_l2_tsc_offset = svm_get_l2_tsc_offset,
 	.get_l2_tsc_multiplier = svm_get_l2_tsc_multiplier,
-	.write_l1_tsc_offset = svm_write_l1_tsc_offset,
+	.write_tsc_offset = svm_write_tsc_offset,
 
 	.load_mmu_pgd = svm_load_mmu_pgd,
 
diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
index 1c83605eccc1..4b70431c2edd 100644
--- a/arch/x86/kvm/vmx/vmx.c
+++ b/arch/x86/kvm/vmx/vmx.c
@@ -1808,26 +1808,9 @@ u64 vmx_get_l2_tsc_multiplier(struct kvm_vcpu *vcpu)
 	return kvm_default_tsc_scaling_ratio;
 }
 
-static u64 vmx_write_l1_tsc_offset(struct kvm_vcpu *vcpu, u64 offset)
+static void vmx_write_tsc_offset(struct kvm_vcpu *vcpu, u64 offset)
 {
-	struct vmcs12 *vmcs12 = get_vmcs12(vcpu);
-	u64 g_tsc_offset = 0;
-
-	/*
-	 * We're here if L1 chose not to trap WRMSR to TSC. According
-	 * to the spec, this should set L1's TSC; The offset that L1
-	 * set for L2 remains unchanged, and still needs to be added
-	 * to the newly set TSC to get L2's TSC.
-	 */
-	if (is_guest_mode(vcpu) &&
-	    (vmcs12->cpu_based_vm_exec_control & CPU_BASED_USE_TSC_OFFSETTING))
-		g_tsc_offset = vmcs12->tsc_offset;
-
-	trace_kvm_write_tsc_offset(vcpu->vcpu_id,
-				   vcpu->arch.tsc_offset - g_tsc_offset,
-				   offset);
-	vmcs_write64(TSC_OFFSET, offset + g_tsc_offset);
-	return offset + g_tsc_offset;
+	vmcs_write64(TSC_OFFSET, offset);
 }
 
 /*
@@ -7723,7 +7706,7 @@ static struct kvm_x86_ops vmx_x86_ops __initdata = {
 
 	.get_l2_tsc_offset = vmx_get_l2_tsc_offset,
 	.get_l2_tsc_multiplier = vmx_get_l2_tsc_multiplier,
-	.write_l1_tsc_offset = vmx_write_l1_tsc_offset,
+	.write_tsc_offset = vmx_write_tsc_offset,
 
 	.load_mmu_pgd = vmx_load_mmu_pgd,
 
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 04abaacb9cfc..2f91259070e9 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -2359,10 +2359,28 @@ u64 kvm_calc_nested_tsc_multiplier(u64 l1_multiplier, u64 l2_multiplier)
 }
 EXPORT_SYMBOL_GPL(kvm_calc_nested_tsc_multiplier);
 
-static void kvm_vcpu_write_tsc_offset(struct kvm_vcpu *vcpu, u64 offset)
+static void kvm_vcpu_write_tsc_offset(struct kvm_vcpu *vcpu, u64 l1_offset)
 {
-	vcpu->arch.l1_tsc_offset = offset;
-	vcpu->arch.tsc_offset = static_call(kvm_x86_write_l1_tsc_offset)(vcpu, offset);
+	trace_kvm_write_tsc_offset(vcpu->vcpu_id,
+				   vcpu->arch.l1_tsc_offset,
+				   l1_offset);
+
+	vcpu->arch.l1_tsc_offset = l1_offset;
+
+	/*
+	 * If we are here because L1 chose not to trap WRMSR to TSC then
+	 * according to the spec this should set L1's TSC (as opposed to
+	 * setting L1's offset for L2).
+	 */
+	if (is_guest_mode(vcpu))
+		vcpu->arch.tsc_offset = kvm_calc_nested_tsc_offset(
+			l1_offset,
+			static_call(kvm_x86_get_l2_tsc_offset)(vcpu),
+			static_call(kvm_x86_get_l2_tsc_multiplier)(vcpu));
+	else
+		vcpu->arch.tsc_offset = l1_offset;
+
+	static_call(kvm_x86_write_tsc_offset)(vcpu, vcpu->arch.tsc_offset);
 }
 
 static inline bool kvm_check_tsc_unstable(void)
-- 
2.17.1


^ permalink raw reply related	[flat|nested] 39+ messages in thread

* [PATCH v3 09/12] KVM: VMX: Remove vmx->current_tsc_ratio and decache_tsc_multiplier()
  2021-05-21 10:24 [PATCH v3 00/12] KVM: Implement nested TSC scaling Ilias Stamatis
                   ` (7 preceding siblings ...)
  2021-05-21 10:24 ` [PATCH v3 08/12] KVM: X86: Move write_l1_tsc_offset() logic to common code and rename it Ilias Stamatis
@ 2021-05-21 10:24 ` Ilias Stamatis
  2021-05-24 17:53   ` Maxim Levitsky
  2021-05-21 10:24 ` [PATCH v3 10/12] KVM: VMX: Set the TSC offset and multiplier on nested entry and exit Ilias Stamatis
                   ` (3 subsequent siblings)
  12 siblings, 1 reply; 39+ messages in thread
From: Ilias Stamatis @ 2021-05-21 10:24 UTC (permalink / raw)
  To: kvm, linux-kernel, pbonzini
  Cc: mlevitsk, seanjc, vkuznets, wanpengli, jmattson, joro, zamsden,
	mtosatti, dwmw, ilstam

The vmx->current_tsc_ratio field is redundant as
vcpu->arch.tsc_scaling_ratio already tracks the current TSC scaling
ratio. Removing this field makes decache_tsc_multiplier() an one-liner
so remove that too and do a vmcs_write64() directly in order to be more
consistent with surrounding code.

Signed-off-by: Ilias Stamatis <ilstam@amazon.com>
---
 arch/x86/kvm/vmx/nested.c | 9 ++++-----
 arch/x86/kvm/vmx/vmx.c    | 5 ++---
 arch/x86/kvm/vmx/vmx.h    | 8 --------
 3 files changed, 6 insertions(+), 16 deletions(-)

diff --git a/arch/x86/kvm/vmx/nested.c b/arch/x86/kvm/vmx/nested.c
index 6058a65a6ede..239154d3e4e7 100644
--- a/arch/x86/kvm/vmx/nested.c
+++ b/arch/x86/kvm/vmx/nested.c
@@ -2533,9 +2533,8 @@ static int prepare_vmcs02(struct kvm_vcpu *vcpu, struct vmcs12 *vmcs12,
 	}
 
 	vmcs_write64(TSC_OFFSET, vcpu->arch.tsc_offset);
-
 	if (kvm_has_tsc_control)
-		decache_tsc_multiplier(vmx);
+		vmcs_write64(TSC_MULTIPLIER, vcpu->arch.tsc_scaling_ratio);
 
 	nested_vmx_transition_tlb_flush(vcpu, vmcs12, true);
 
@@ -4501,12 +4500,12 @@ void nested_vmx_vmexit(struct kvm_vcpu *vcpu, u32 vm_exit_reason,
 	vmcs_write32(VM_EXIT_MSR_LOAD_COUNT, vmx->msr_autoload.host.nr);
 	vmcs_write32(VM_ENTRY_MSR_LOAD_COUNT, vmx->msr_autoload.guest.nr);
 	vmcs_write64(TSC_OFFSET, vcpu->arch.tsc_offset);
+	if (kvm_has_tsc_control)
+		vmcs_write64(TSC_MULTIPLIER, vcpu->arch.tsc_scaling_ratio);
+
 	if (vmx->nested.l1_tpr_threshold != -1)
 		vmcs_write32(TPR_THRESHOLD, vmx->nested.l1_tpr_threshold);
 
-	if (kvm_has_tsc_control)
-		decache_tsc_multiplier(vmx);
-
 	if (vmx->nested.change_vmcs01_virtual_apic_mode) {
 		vmx->nested.change_vmcs01_virtual_apic_mode = false;
 		vmx_set_virtual_apic_mode(vcpu);
diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
index 4b70431c2edd..7c52c697cfe3 100644
--- a/arch/x86/kvm/vmx/vmx.c
+++ b/arch/x86/kvm/vmx/vmx.c
@@ -1392,9 +1392,8 @@ void vmx_vcpu_load_vmcs(struct kvm_vcpu *vcpu, int cpu,
 	}
 
 	/* Setup TSC multiplier */
-	if (kvm_has_tsc_control &&
-	    vmx->current_tsc_ratio != vcpu->arch.tsc_scaling_ratio)
-		decache_tsc_multiplier(vmx);
+	if (kvm_has_tsc_control)
+		vmcs_write64(TSC_MULTIPLIER, vcpu->arch.tsc_scaling_ratio);
 }
 
 /*
diff --git a/arch/x86/kvm/vmx/vmx.h b/arch/x86/kvm/vmx/vmx.h
index aa97c82e3451..3eaa86a0ba3e 100644
--- a/arch/x86/kvm/vmx/vmx.h
+++ b/arch/x86/kvm/vmx/vmx.h
@@ -322,8 +322,6 @@ struct vcpu_vmx {
 	/* apic deadline value in host tsc */
 	u64 hv_deadline_tsc;
 
-	u64 current_tsc_ratio;
-
 	unsigned long host_debugctlmsr;
 
 	/*
@@ -532,12 +530,6 @@ static inline struct vmcs *alloc_vmcs(bool shadow)
 			      GFP_KERNEL_ACCOUNT);
 }
 
-static inline void decache_tsc_multiplier(struct vcpu_vmx *vmx)
-{
-	vmx->current_tsc_ratio = vmx->vcpu.arch.tsc_scaling_ratio;
-	vmcs_write64(TSC_MULTIPLIER, vmx->current_tsc_ratio);
-}
-
 static inline bool vmx_has_waitpkg(struct vcpu_vmx *vmx)
 {
 	return vmx->secondary_exec_control &
-- 
2.17.1


^ permalink raw reply related	[flat|nested] 39+ messages in thread

* [PATCH v3 10/12] KVM: VMX: Set the TSC offset and multiplier on nested entry and exit
  2021-05-21 10:24 [PATCH v3 00/12] KVM: Implement nested TSC scaling Ilias Stamatis
                   ` (8 preceding siblings ...)
  2021-05-21 10:24 ` [PATCH v3 09/12] KVM: VMX: Remove vmx->current_tsc_ratio and decache_tsc_multiplier() Ilias Stamatis
@ 2021-05-21 10:24 ` Ilias Stamatis
  2021-05-24 17:54   ` Maxim Levitsky
  2021-05-25 16:05   ` Sean Christopherson
  2021-05-21 10:24 ` [PATCH v3 11/12] KVM: VMX: Expose TSC scaling to L2 Ilias Stamatis
                   ` (2 subsequent siblings)
  12 siblings, 2 replies; 39+ messages in thread
From: Ilias Stamatis @ 2021-05-21 10:24 UTC (permalink / raw)
  To: kvm, linux-kernel, pbonzini
  Cc: mlevitsk, seanjc, vkuznets, wanpengli, jmattson, joro, zamsden,
	mtosatti, dwmw, ilstam

Calculate the nested TSC offset and multiplier on entering L2 using the
corresponding functions. Restore the L1 values on L2's exit.

Signed-off-by: Ilias Stamatis <ilstam@amazon.com>
---
 arch/x86/kvm/vmx/nested.c | 18 ++++++++++++++----
 1 file changed, 14 insertions(+), 4 deletions(-)

diff --git a/arch/x86/kvm/vmx/nested.c b/arch/x86/kvm/vmx/nested.c
index 239154d3e4e7..f75c4174cbcf 100644
--- a/arch/x86/kvm/vmx/nested.c
+++ b/arch/x86/kvm/vmx/nested.c
@@ -2532,6 +2532,15 @@ static int prepare_vmcs02(struct kvm_vcpu *vcpu, struct vmcs12 *vmcs12,
 		vmcs_write64(GUEST_IA32_PAT, vmx->vcpu.arch.pat);
 	}
 
+	vcpu->arch.tsc_offset = kvm_calc_nested_tsc_offset(
+			vcpu->arch.l1_tsc_offset,
+			vmx_get_l2_tsc_offset(vcpu),
+			vmx_get_l2_tsc_multiplier(vcpu));
+
+	vcpu->arch.tsc_scaling_ratio = kvm_calc_nested_tsc_multiplier(
+			vcpu->arch.l1_tsc_scaling_ratio,
+			vmx_get_l2_tsc_multiplier(vcpu));
+
 	vmcs_write64(TSC_OFFSET, vcpu->arch.tsc_offset);
 	if (kvm_has_tsc_control)
 		vmcs_write64(TSC_MULTIPLIER, vcpu->arch.tsc_scaling_ratio);
@@ -3353,8 +3362,6 @@ enum nvmx_vmentry_status nested_vmx_enter_non_root_mode(struct kvm_vcpu *vcpu,
 	}
 
 	enter_guest_mode(vcpu);
-	if (vmcs12->cpu_based_vm_exec_control & CPU_BASED_USE_TSC_OFFSETTING)
-		vcpu->arch.tsc_offset += vmcs12->tsc_offset;
 
 	if (prepare_vmcs02(vcpu, vmcs12, &entry_failure_code)) {
 		exit_reason.basic = EXIT_REASON_INVALID_STATE;
@@ -4462,8 +4469,11 @@ void nested_vmx_vmexit(struct kvm_vcpu *vcpu, u32 vm_exit_reason,
 	if (nested_cpu_has_preemption_timer(vmcs12))
 		hrtimer_cancel(&to_vmx(vcpu)->nested.preemption_timer);
 
-	if (vmcs12->cpu_based_vm_exec_control & CPU_BASED_USE_TSC_OFFSETTING)
-		vcpu->arch.tsc_offset -= vmcs12->tsc_offset;
+	if (nested_cpu_has(vmcs12, CPU_BASED_USE_TSC_OFFSETTING)) {
+		vcpu->arch.tsc_offset = vcpu->arch.l1_tsc_offset;
+		if (nested_cpu_has2(vmcs12, SECONDARY_EXEC_TSC_SCALING))
+			vcpu->arch.tsc_scaling_ratio = vcpu->arch.l1_tsc_scaling_ratio;
+	}
 
 	if (likely(!vmx->fail)) {
 		sync_vmcs02_to_vmcs12(vcpu, vmcs12);
-- 
2.17.1


^ permalink raw reply related	[flat|nested] 39+ messages in thread

* [PATCH v3 11/12] KVM: VMX: Expose TSC scaling to L2
  2021-05-21 10:24 [PATCH v3 00/12] KVM: Implement nested TSC scaling Ilias Stamatis
                   ` (9 preceding siblings ...)
  2021-05-21 10:24 ` [PATCH v3 10/12] KVM: VMX: Set the TSC offset and multiplier on nested entry and exit Ilias Stamatis
@ 2021-05-21 10:24 ` Ilias Stamatis
  2021-05-21 10:24 ` [PATCH v3 12/12] KVM: selftests: x86: Add vmx_nested_tsc_scaling_test Ilias Stamatis
  2021-05-24 15:37 ` [PATCH v3 00/12] KVM: Implement nested TSC scaling Paolo Bonzini
  12 siblings, 0 replies; 39+ messages in thread
From: Ilias Stamatis @ 2021-05-21 10:24 UTC (permalink / raw)
  To: kvm, linux-kernel, pbonzini
  Cc: mlevitsk, seanjc, vkuznets, wanpengli, jmattson, joro, zamsden,
	mtosatti, dwmw, ilstam

Expose the TSC scaling feature to nested guests.

Signed-off-by: Ilias Stamatis <ilstam@amazon.com>
Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com>
---
 arch/x86/kvm/vmx/nested.c | 6 ++++--
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/arch/x86/kvm/vmx/nested.c b/arch/x86/kvm/vmx/nested.c
index f75c4174cbcf..e8183e224706 100644
--- a/arch/x86/kvm/vmx/nested.c
+++ b/arch/x86/kvm/vmx/nested.c
@@ -2277,7 +2277,8 @@ static void prepare_vmcs02_early(struct vcpu_vmx *vmx, struct vmcs12 *vmcs12)
 				  SECONDARY_EXEC_ENABLE_USR_WAIT_PAUSE |
 				  SECONDARY_EXEC_VIRTUAL_INTR_DELIVERY |
 				  SECONDARY_EXEC_APIC_REGISTER_VIRT |
-				  SECONDARY_EXEC_ENABLE_VMFUNC);
+				  SECONDARY_EXEC_ENABLE_VMFUNC |
+				  SECONDARY_EXEC_TSC_SCALING);
 		if (nested_cpu_has(vmcs12,
 				   CPU_BASED_ACTIVATE_SECONDARY_CONTROLS))
 			exec_control |= vmcs12->secondary_vm_exec_control;
@@ -6483,7 +6484,8 @@ void nested_vmx_setup_ctls_msrs(struct nested_vmx_msrs *msrs, u32 ept_caps)
 		SECONDARY_EXEC_RDRAND_EXITING |
 		SECONDARY_EXEC_ENABLE_INVPCID |
 		SECONDARY_EXEC_RDSEED_EXITING |
-		SECONDARY_EXEC_XSAVES;
+		SECONDARY_EXEC_XSAVES |
+		SECONDARY_EXEC_TSC_SCALING;
 
 	/*
 	 * We can emulate "VMCS shadowing," even if the hardware
-- 
2.17.1


^ permalink raw reply related	[flat|nested] 39+ messages in thread

* [PATCH v3 12/12] KVM: selftests: x86: Add vmx_nested_tsc_scaling_test
  2021-05-21 10:24 [PATCH v3 00/12] KVM: Implement nested TSC scaling Ilias Stamatis
                   ` (10 preceding siblings ...)
  2021-05-21 10:24 ` [PATCH v3 11/12] KVM: VMX: Expose TSC scaling to L2 Ilias Stamatis
@ 2021-05-21 10:24 ` Ilias Stamatis
  2021-05-24 17:55   ` Maxim Levitsky
  2021-05-24 15:37 ` [PATCH v3 00/12] KVM: Implement nested TSC scaling Paolo Bonzini
  12 siblings, 1 reply; 39+ messages in thread
From: Ilias Stamatis @ 2021-05-21 10:24 UTC (permalink / raw)
  To: kvm, linux-kernel, pbonzini
  Cc: mlevitsk, seanjc, vkuznets, wanpengli, jmattson, joro, zamsden,
	mtosatti, dwmw, ilstam

Test that nested TSC scaling works as expected with both L1 and L2
scaled.

Signed-off-by: Ilias Stamatis <ilstam@amazon.com>
---
 tools/testing/selftests/kvm/.gitignore        |   1 +
 tools/testing/selftests/kvm/Makefile          |   1 +
 .../kvm/x86_64/vmx_nested_tsc_scaling_test.c  | 242 ++++++++++++++++++
 3 files changed, 244 insertions(+)
 create mode 100644 tools/testing/selftests/kvm/x86_64/vmx_nested_tsc_scaling_test.c

diff --git a/tools/testing/selftests/kvm/.gitignore b/tools/testing/selftests/kvm/.gitignore
index bd83158e0e0b..cc02022f9951 100644
--- a/tools/testing/selftests/kvm/.gitignore
+++ b/tools/testing/selftests/kvm/.gitignore
@@ -29,6 +29,7 @@
 /x86_64/vmx_preemption_timer_test
 /x86_64/vmx_set_nested_state_test
 /x86_64/vmx_tsc_adjust_test
+/x86_64/vmx_nested_tsc_scaling_test
 /x86_64/xapic_ipi_test
 /x86_64/xen_shinfo_test
 /x86_64/xen_vmcall_test
diff --git a/tools/testing/selftests/kvm/Makefile b/tools/testing/selftests/kvm/Makefile
index e439d027939d..1078240b1313 100644
--- a/tools/testing/selftests/kvm/Makefile
+++ b/tools/testing/selftests/kvm/Makefile
@@ -60,6 +60,7 @@ TEST_GEN_PROGS_x86_64 += x86_64/vmx_close_while_nested_test
 TEST_GEN_PROGS_x86_64 += x86_64/vmx_dirty_log_test
 TEST_GEN_PROGS_x86_64 += x86_64/vmx_set_nested_state_test
 TEST_GEN_PROGS_x86_64 += x86_64/vmx_tsc_adjust_test
+TEST_GEN_PROGS_x86_64 += x86_64/vmx_nested_tsc_scaling_test
 TEST_GEN_PROGS_x86_64 += x86_64/xapic_ipi_test
 TEST_GEN_PROGS_x86_64 += x86_64/xss_msr_test
 TEST_GEN_PROGS_x86_64 += x86_64/debug_regs
diff --git a/tools/testing/selftests/kvm/x86_64/vmx_nested_tsc_scaling_test.c b/tools/testing/selftests/kvm/x86_64/vmx_nested_tsc_scaling_test.c
new file mode 100644
index 000000000000..2c130250fe3b
--- /dev/null
+++ b/tools/testing/selftests/kvm/x86_64/vmx_nested_tsc_scaling_test.c
@@ -0,0 +1,242 @@
+// SPDX-License-Identifier: GPL-2.0-only
+/*
+ * vmx_nested_tsc_scaling_test
+ *
+ * Copyright (C) 2021 Amazon.com, Inc. or its affiliates.
+ *
+ * This test case verifies that nested TSC scaling behaves as expected when
+ * both L1 and L2 are scaled using different ratios. For this test we scale
+ * L1 down and scale L2 up.
+ */
+
+#include <time.h>
+
+#include "kvm_util.h"
+#include "vmx.h"
+#include "kselftest.h"
+
+
+#define VCPU_ID 0
+
+/* L2 is scaled up (from L1's perspective) by this factor */
+#define L2_SCALE_FACTOR 4ULL
+
+#define TSC_OFFSET_L2 ((uint64_t) -33125236320908)
+#define TSC_MULTIPLIER_L2 (L2_SCALE_FACTOR << 48)
+
+#define L2_GUEST_STACK_SIZE 64
+
+enum { USLEEP, UCHECK_L1, UCHECK_L2 };
+#define GUEST_SLEEP(sec)         ucall(UCALL_SYNC, 2, USLEEP, sec)
+#define GUEST_CHECK(level, freq) ucall(UCALL_SYNC, 2, level, freq)
+
+
+/*
+ * This function checks whether the "actual" TSC frequency of a guest matches
+ * its expected frequency. In order to account for delays in taking the TSC
+ * measurements, a difference of 1% between the actual and the expected value
+ * is tolerated.
+ */
+static void compare_tsc_freq(uint64_t actual, uint64_t expected)
+{
+	uint64_t tolerance, thresh_low, thresh_high;
+
+	tolerance = expected / 100;
+	thresh_low = expected - tolerance;
+	thresh_high = expected + tolerance;
+
+	TEST_ASSERT(thresh_low < actual,
+		"TSC freq is expected to be between %"PRIu64" and %"PRIu64
+		" but it actually is %"PRIu64,
+		thresh_low, thresh_high, actual);
+	TEST_ASSERT(thresh_high > actual,
+		"TSC freq is expected to be between %"PRIu64" and %"PRIu64
+		" but it actually is %"PRIu64,
+		thresh_low, thresh_high, actual);
+}
+
+static void check_tsc_freq(int level)
+{
+	uint64_t tsc_start, tsc_end, tsc_freq;
+
+	/*
+	 * Reading the TSC twice with about a second's difference should give
+	 * us an approximation of the TSC frequency from the guest's
+	 * perspective. Now, this won't be completely accurate, but it should
+	 * be good enough for the purposes of this test.
+	 */
+	tsc_start = rdmsr(MSR_IA32_TSC);
+	GUEST_SLEEP(1);
+	tsc_end = rdmsr(MSR_IA32_TSC);
+
+	tsc_freq = tsc_end - tsc_start;
+
+	GUEST_CHECK(level, tsc_freq);
+}
+
+static void l2_guest_code(void)
+{
+	check_tsc_freq(UCHECK_L2);
+
+	/* exit to L1 */
+	__asm__ __volatile__("vmcall");
+}
+
+static void l1_guest_code(struct vmx_pages *vmx_pages)
+{
+	unsigned long l2_guest_stack[L2_GUEST_STACK_SIZE];
+	uint32_t control;
+
+	/* check that L1's frequency looks alright before launching L2 */
+	check_tsc_freq(UCHECK_L1);
+
+	GUEST_ASSERT(prepare_for_vmx_operation(vmx_pages));
+	GUEST_ASSERT(load_vmcs(vmx_pages));
+
+	/* prepare the VMCS for L2 execution */
+	prepare_vmcs(vmx_pages, l2_guest_code, &l2_guest_stack[L2_GUEST_STACK_SIZE]);
+
+	/* enable TSC offsetting and TSC scaling for L2 */
+	control = vmreadz(CPU_BASED_VM_EXEC_CONTROL);
+	control |= CPU_BASED_USE_MSR_BITMAPS | CPU_BASED_USE_TSC_OFFSETTING;
+	vmwrite(CPU_BASED_VM_EXEC_CONTROL, control);
+
+	control = vmreadz(SECONDARY_VM_EXEC_CONTROL);
+	control |= SECONDARY_EXEC_TSC_SCALING;
+	vmwrite(SECONDARY_VM_EXEC_CONTROL, control);
+
+	vmwrite(TSC_OFFSET, TSC_OFFSET_L2);
+	vmwrite(TSC_MULTIPLIER, TSC_MULTIPLIER_L2);
+	vmwrite(TSC_MULTIPLIER_HIGH, TSC_MULTIPLIER_L2 >> 32);
+
+	/* launch L2 */
+	GUEST_ASSERT(!vmlaunch());
+	GUEST_ASSERT(vmreadz(VM_EXIT_REASON) == EXIT_REASON_VMCALL);
+
+	/* check that L1's frequency still looks good */
+	check_tsc_freq(UCHECK_L1);
+
+	GUEST_DONE();
+}
+
+static void tsc_scaling_check_supported(void)
+{
+	if (!kvm_check_cap(KVM_CAP_TSC_CONTROL)) {
+		print_skip("TSC scaling not supported by the HW");
+		exit(KSFT_SKIP);
+	}
+}
+
+static void stable_tsc_check_supported(void)
+{
+	FILE *fp;
+	char buf[4];
+
+	fp = fopen("/sys/devices/system/clocksource/clocksource0/current_clocksource", "r");
+	if (fp == NULL)
+		goto skip_test;
+
+	if (fgets(buf, sizeof(buf), fp) == NULL)
+		goto skip_test;
+
+	if (strncmp(buf, "tsc", sizeof(buf)))
+		goto skip_test;
+
+	return;
+skip_test:
+	print_skip("TSC is not stable");
+	exit(KSFT_SKIP);
+}
+
+int main(int argc, char *argv[])
+{
+	struct kvm_vm *vm;
+	vm_vaddr_t vmx_pages_gva;
+
+	uint64_t tsc_start, tsc_end;
+	uint64_t tsc_khz;
+	uint64_t l1_scale_factor;
+	uint64_t l0_tsc_freq = 0;
+	uint64_t l1_tsc_freq = 0;
+	uint64_t l2_tsc_freq = 0;
+
+	nested_vmx_check_supported();
+	tsc_scaling_check_supported();
+	stable_tsc_check_supported();
+
+	/*
+	 * We set L1's scale factor to be a random number from 2 to 10.
+	 * Ideally we would do the same for L2's factor but that one is
+	 * referenced by both main() and l1_guest_code() and using a global
+	 * variable does not work.
+	 */
+	srand(time(NULL));
+	l1_scale_factor = (rand() % 9) + 2;
+	printf("L1's scale down factor is: %"PRIu64"\n", l1_scale_factor);
+	printf("L2's scale up factor is: %llu\n", L2_SCALE_FACTOR);
+
+	tsc_start = rdtsc();
+	sleep(1);
+	tsc_end = rdtsc();
+
+	l0_tsc_freq = tsc_end - tsc_start;
+	printf("real TSC frequency is around: %"PRIu64"\n", l0_tsc_freq);
+
+	vm = vm_create_default(VCPU_ID, 0, (void *) l1_guest_code);
+	vcpu_alloc_vmx(vm, &vmx_pages_gva);
+	vcpu_args_set(vm, VCPU_ID, 1, vmx_pages_gva);
+
+	tsc_khz = _vcpu_ioctl(vm, VCPU_ID, KVM_GET_TSC_KHZ, NULL);
+	TEST_ASSERT(tsc_khz != -1, "vcpu ioctl KVM_GET_TSC_KHZ failed");
+
+	/* scale down L1's TSC frequency */
+	vcpu_ioctl(vm, VCPU_ID, KVM_SET_TSC_KHZ,
+		  (void *) (tsc_khz / l1_scale_factor));
+
+	for (;;) {
+		volatile struct kvm_run *run = vcpu_state(vm, VCPU_ID);
+		struct ucall uc;
+
+		vcpu_run(vm, VCPU_ID);
+		TEST_ASSERT(run->exit_reason == KVM_EXIT_IO,
+			    "Got exit_reason other than KVM_EXIT_IO: %u (%s)\n",
+			    run->exit_reason,
+			    exit_reason_str(run->exit_reason));
+
+		switch (get_ucall(vm, VCPU_ID, &uc)) {
+		case UCALL_ABORT:
+			TEST_FAIL("%s", (const char *) uc.args[0]);
+		case UCALL_SYNC:
+			switch (uc.args[0]) {
+			case USLEEP:
+				sleep(uc.args[1]);
+				break;
+			case UCHECK_L1:
+				l1_tsc_freq = uc.args[1];
+				printf("L1's TSC frequency is around: %"PRIu64
+				       "\n", l1_tsc_freq);
+
+				compare_tsc_freq(l1_tsc_freq,
+						 l0_tsc_freq / l1_scale_factor);
+				break;
+			case UCHECK_L2:
+				l2_tsc_freq = uc.args[1];
+				printf("L2's TSC frequency is around: %"PRIu64
+				       "\n", l2_tsc_freq);
+
+				compare_tsc_freq(l2_tsc_freq,
+						 l1_tsc_freq * L2_SCALE_FACTOR);
+				break;
+			}
+			break;
+		case UCALL_DONE:
+			goto done;
+		default:
+			TEST_FAIL("Unknown ucall %lu", uc.cmd);
+		}
+	}
+
+done:
+	kvm_vm_free(vm);
+	return 0;
+}
-- 
2.17.1


^ permalink raw reply related	[flat|nested] 39+ messages in thread

* Re: [PATCH v3 03/12] KVM: X86: Rename kvm_compute_tsc_offset() to kvm_compute_tsc_offset_l1()
  2021-05-21 10:24 ` [PATCH v3 03/12] KVM: X86: Rename kvm_compute_tsc_offset() to kvm_compute_tsc_offset_l1() Ilias Stamatis
@ 2021-05-24 14:21   ` Paolo Bonzini
  2021-05-24 17:49     ` Maxim Levitsky
  0 siblings, 1 reply; 39+ messages in thread
From: Paolo Bonzini @ 2021-05-24 14:21 UTC (permalink / raw)
  To: Ilias Stamatis, kvm, linux-kernel
  Cc: mlevitsk, seanjc, vkuznets, wanpengli, jmattson, joro, zamsden,
	mtosatti, dwmw

On 21/05/21 12:24, Ilias Stamatis wrote:
> +			u64 adj = kvm_compute_tsc_offset_l1(vcpu, data) - vcpu->arch.l1_tsc_offset;

Better: kvm_compute_l1_tsc_offset.  So far anyway I can adjust this myself.

Paolo


^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH v3 04/12] KVM: X86: Add a ratio parameter to kvm_scale_tsc()
  2021-05-21 10:24 ` [PATCH v3 04/12] KVM: X86: Add a ratio parameter to kvm_scale_tsc() Ilias Stamatis
@ 2021-05-24 14:23   ` Paolo Bonzini
  2021-05-24 15:48     ` Sean Christopherson
  2021-05-24 17:50     ` Maxim Levitsky
  0 siblings, 2 replies; 39+ messages in thread
From: Paolo Bonzini @ 2021-05-24 14:23 UTC (permalink / raw)
  To: Ilias Stamatis, kvm, linux-kernel
  Cc: mlevitsk, seanjc, vkuznets, wanpengli, jmattson, joro, zamsden,
	mtosatti, dwmw

On 21/05/21 12:24, Ilias Stamatis wrote:
> @@ -3537,10 +3539,14 @@ int kvm_get_msr_common(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
>   		 * return L1's TSC value to ensure backwards-compatible
>   		 * behavior for migration.
>   		 */
> -		u64 tsc_offset = msr_info->host_initiated ? vcpu->arch.l1_tsc_offset :
> -							    vcpu->arch.tsc_offset;
> -
> -		msr_info->data = kvm_scale_tsc(vcpu, rdtsc()) + tsc_offset;
> +		if (msr_info->host_initiated)
> +			msr_info->data = kvm_scale_tsc(
> +				vcpu, rdtsc(), vcpu->arch.l1_tsc_scaling_ratio
> +				) + vcpu->arch.l1_tsc_offset;

Better indentation:

	msr_info->data = vcpu->arch.l1_tsc_offset +
		kvm_scale_tsc(vcpu, rdtsc(),
			      vcpu->arch.tsc_scaling_ratio);

Same below.

Paolo

> +		else
> +			msr_info->data = kvm_scale_tsc(
> +				vcpu, rdtsc(), vcpu->arch.tsc_scaling_ratio
> +				) + vcpu->arch.tsc_offset;
>   		break;
>   	}
>   	case MSR_MTRRcap:
> 


^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH v3 00/12] KVM: Implement nested TSC scaling
  2021-05-21 10:24 [PATCH v3 00/12] KVM: Implement nested TSC scaling Ilias Stamatis
                   ` (11 preceding siblings ...)
  2021-05-21 10:24 ` [PATCH v3 12/12] KVM: selftests: x86: Add vmx_nested_tsc_scaling_test Ilias Stamatis
@ 2021-05-24 15:37 ` Paolo Bonzini
  12 siblings, 0 replies; 39+ messages in thread
From: Paolo Bonzini @ 2021-05-24 15:37 UTC (permalink / raw)
  To: Ilias Stamatis, kvm, linux-kernel
  Cc: mlevitsk, seanjc, vkuznets, wanpengli, jmattson, joro, zamsden,
	mtosatti, dwmw

On 21/05/21 12:24, Ilias Stamatis wrote:
> KVM currently supports hardware-assisted TSC scaling but only for L1;
> the feature is not exposed to nested guests. This patch series adds
> support for nested TSC scaling and allows both L1 and L2 to be scaled
> with different scaling factors. That is achieved by "merging" the 01 and
> 02 values together.
> 
> Most of the logic in this series is implemented in common code (by doing
> the necessary restructurings), however the patches add support for VMX
> only. Adding support for SVM should be easy at this point and Maxim
> Levitsky has volunteered to do this (thanks!).
> 
> Changelog:
> v3:
>    - Applied Sean's feedback
>    - Refactored patches 7 to 10
> 
> v2:
>    - Applied all of Maxim's feedback
>    - Added a mul_s64_u64_shr function in math64.h
>    - Added a separate kvm_scale_tsc_l1 function instead of passing an
>      argument to kvm_scale_tsc
>    - Implemented the 02 fields calculations in common code
>    - Moved all of write_l1_tsc_offset's logic to common code
>    - Added a check for whether the TSC is stable in patch 10
>    - Used a random L1 factor and a negative offset in patch 10
> 
> Ilias Stamatis (12):
>    math64.h: Add mul_s64_u64_shr()
>    KVM: X86: Store L1's TSC scaling ratio in 'struct kvm_vcpu_arch'
>    KVM: X86: Rename kvm_compute_tsc_offset() to
>      kvm_compute_tsc_offset_l1()
>    KVM: X86: Add a ratio parameter to kvm_scale_tsc()
>    KVM: VMX: Add a TSC multiplier field in VMCS12
>    KVM: X86: Add functions for retrieving L2 TSC fields from common code
>    KVM: X86: Add functions that calculate L2's TSC offset and multiplier
>    KVM: X86: Move write_l1_tsc_offset() logic to common code and rename
>      it
>    KVM: VMX: Remove vmx->current_tsc_ratio and decache_tsc_multiplier()
>    KVM: VMX: Set the TSC offset and multiplier on nested entry and exit
>    KVM: VMX: Expose TSC scaling to L2
>    KVM: selftests: x86: Add vmx_nested_tsc_scaling_test
> 
>   arch/x86/include/asm/kvm-x86-ops.h            |   4 +-
>   arch/x86/include/asm/kvm_host.h               |  14 +-
>   arch/x86/kvm/svm/svm.c                        |  29 ++-
>   arch/x86/kvm/vmx/nested.c                     |  33 ++-
>   arch/x86/kvm/vmx/vmcs12.c                     |   1 +
>   arch/x86/kvm/vmx/vmcs12.h                     |   4 +-
>   arch/x86/kvm/vmx/vmx.c                        |  49 ++--
>   arch/x86/kvm/vmx/vmx.h                        |  11 +-
>   arch/x86/kvm/x86.c                            |  91 +++++--
>   include/linux/math64.h                        |  19 ++
>   tools/testing/selftests/kvm/.gitignore        |   1 +
>   tools/testing/selftests/kvm/Makefile          |   1 +
>   .../kvm/x86_64/vmx_nested_tsc_scaling_test.c  | 242 ++++++++++++++++++
>   13 files changed, 417 insertions(+), 82 deletions(-)
>   create mode 100644 tools/testing/selftests/kvm/x86_64/vmx_nested_tsc_scaling_test.c
> 
> --
> 2.17.1
> 

Queued, thanks.

The new kvm_x86_ops should go in kvm_x86_ops.nested, but those are not 
yet static_calls so we can leave that for later.

Paolo


^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH v3 04/12] KVM: X86: Add a ratio parameter to kvm_scale_tsc()
  2021-05-24 14:23   ` Paolo Bonzini
@ 2021-05-24 15:48     ` Sean Christopherson
  2021-05-24 15:56       ` Paolo Bonzini
  2021-05-24 17:50     ` Maxim Levitsky
  1 sibling, 1 reply; 39+ messages in thread
From: Sean Christopherson @ 2021-05-24 15:48 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: Ilias Stamatis, kvm, linux-kernel, mlevitsk, vkuznets, wanpengli,
	jmattson, joro, zamsden, mtosatti, dwmw

On Mon, May 24, 2021, Paolo Bonzini wrote:
> On 21/05/21 12:24, Ilias Stamatis wrote:
> > @@ -3537,10 +3539,14 @@ int kvm_get_msr_common(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
> >   		 * return L1's TSC value to ensure backwards-compatible
> >   		 * behavior for migration.
> >   		 */
> > -		u64 tsc_offset = msr_info->host_initiated ? vcpu->arch.l1_tsc_offset :
> > -							    vcpu->arch.tsc_offset;
> > -
> > -		msr_info->data = kvm_scale_tsc(vcpu, rdtsc()) + tsc_offset;
> > +		if (msr_info->host_initiated)
> > +			msr_info->data = kvm_scale_tsc(
> > +				vcpu, rdtsc(), vcpu->arch.l1_tsc_scaling_ratio
> > +				) + vcpu->arch.l1_tsc_offset;
> 
> Better indentation:
> 
> 	msr_info->data = vcpu->arch.l1_tsc_offset +
> 		kvm_scale_tsc(vcpu, rdtsc(),

I like this more, but it's still a bit ugly.

> 			      vcpu->arch.tsc_scaling_ratio);

Wrong pairing of ratio and offset (not necessarily what you queued, obviously).

> 
> Same below.

Alternatively, what about something like this?

		u64 offset, ratio;

		if (msr_info->host_initiated) {
			offset = vcpu->arch.l1_tsc_offset;
			ratio = vcpu->arch.l1_tsc_scaling_ratio;
		} else {
			offset = vcpu->arch.tsc_offset;
			ratio = vcpu->arch.tsc_scaling_ratio;
		}
		msr_info->data = kvm_scale_tsc(vcpu, rdtsc(), ratio) + offset;

or an option that would require additional refactoring:

		struct kvm_guest_tsc *tsc;

		tsc = msr_info->host_initiated ? &vcpu->arch.l1_tsc :
						 &vcpu->arch.current_tsc;

		msr_info->data = tsc->offset +
				 kvm_scale_tsc(vcpu, rdtsc(), tsc->scaling_ratio);


> 
> Paolo
> 
> > +		else
> > +			msr_info->data = kvm_scale_tsc(
> > +				vcpu, rdtsc(), vcpu->arch.tsc_scaling_ratio
> > +				) + vcpu->arch.tsc_offset;
> >   		break;
> >   	}
> >   	case MSR_MTRRcap:
> > 
> 

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH v3 04/12] KVM: X86: Add a ratio parameter to kvm_scale_tsc()
  2021-05-24 15:48     ` Sean Christopherson
@ 2021-05-24 15:56       ` Paolo Bonzini
  0 siblings, 0 replies; 39+ messages in thread
From: Paolo Bonzini @ 2021-05-24 15:56 UTC (permalink / raw)
  To: Sean Christopherson
  Cc: Ilias Stamatis, kvm, linux-kernel, mlevitsk, vkuznets, wanpengli,
	jmattson, joro, zamsden, mtosatti, dwmw

On 24/05/21 17:48, Sean Christopherson wrote:
> 
> 		if (msr_info->host_initiated) {
> 			offset = vcpu->arch.l1_tsc_offset;
> 			ratio = vcpu->arch.l1_tsc_scaling_ratio;
> 		} else {
> 			offset = vcpu->arch.tsc_offset;
> 			ratio = vcpu->arch.tsc_scaling_ratio;
> 		}
> 		msr_info->data = kvm_scale_tsc(vcpu, rdtsc(), ratio) + offset;

Looks good, indeed I didn't do this just out of laziness really (and 
instead got a typo).

Paolo


^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH v3 01/12] math64.h: Add mul_s64_u64_shr()
  2021-05-21 10:24 ` [PATCH v3 01/12] math64.h: Add mul_s64_u64_shr() Ilias Stamatis
@ 2021-05-24 17:49   ` Maxim Levitsky
  0 siblings, 0 replies; 39+ messages in thread
From: Maxim Levitsky @ 2021-05-24 17:49 UTC (permalink / raw)
  To: Ilias Stamatis, kvm, linux-kernel, pbonzini
  Cc: seanjc, vkuznets, wanpengli, jmattson, joro, zamsden, mtosatti, dwmw

On Fri, 2021-05-21 at 11:24 +0100, Ilias Stamatis wrote:
> This function is needed for KVM's nested virtualization. The nested TSC
> scaling implementation requires multiplying the signed TSC offset with
> the unsigned TSC multiplier.
> 
> Signed-off-by: Ilias Stamatis <ilstam@amazon.com>
> ---
>  include/linux/math64.h | 19 +++++++++++++++++++
>  1 file changed, 19 insertions(+)
> 
> diff --git a/include/linux/math64.h b/include/linux/math64.h
> index 66deb1fdc2ef..2928f03d6d46 100644
> --- a/include/linux/math64.h
> +++ b/include/linux/math64.h
> @@ -3,6 +3,7 @@
>  #define _LINUX_MATH64_H
>  
>  #include <linux/types.h>
> +#include <linux/math.h>
>  #include <vdso/math64.h>
>  #include <asm/div64.h>
>  
> @@ -234,6 +235,24 @@ static inline u64 mul_u64_u64_shr(u64 a, u64 b, unsigned int shift)
>  
>  #endif
>  
> +#ifndef mul_s64_u64_shr
> +static inline u64 mul_s64_u64_shr(s64 a, u64 b, unsigned int shift)
> +{
> +	u64 ret;
> +
> +	/*
> +	 * Extract the sign before the multiplication and put it back
> +	 * afterwards if needed.
> +	 */
> +	ret = mul_u64_u64_shr(abs(a), b, shift);
> +
> +	if (a < 0)
> +		ret = -((s64) ret);
> +
> +	return ret;
> +}
> +#endif /* mul_s64_u64_shr */
> +
>  #ifndef mul_u64_u32_div
>  static inline u64 mul_u64_u32_div(u64 a, u32 mul, u32 divisor)
>  {

Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com>

Best regards,
	Maxim Levitsky


^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH v3 02/12] KVM: X86: Store L1's TSC scaling ratio in 'struct kvm_vcpu_arch'
  2021-05-21 10:24 ` [PATCH v3 02/12] KVM: X86: Store L1's TSC scaling ratio in 'struct kvm_vcpu_arch' Ilias Stamatis
@ 2021-05-24 17:49   ` Maxim Levitsky
  0 siblings, 0 replies; 39+ messages in thread
From: Maxim Levitsky @ 2021-05-24 17:49 UTC (permalink / raw)
  To: Ilias Stamatis, kvm, linux-kernel, pbonzini
  Cc: seanjc, vkuznets, wanpengli, jmattson, joro, zamsden, mtosatti, dwmw

On Fri, 2021-05-21 at 11:24 +0100, Ilias Stamatis wrote:
> Store L1's scaling ratio in the kvm_vcpu_arch struct like we already do
> for L1's TSC offset. This allows for easy save/restore when we enter and
> then exit the nested guest.
> 
> Signed-off-by: Ilias Stamatis <ilstam@amazon.com>
> ---
>  arch/x86/include/asm/kvm_host.h | 5 +++--
>  arch/x86/kvm/vmx/vmx.c          | 4 ++--
>  arch/x86/kvm/x86.c              | 6 ++++--
>  3 files changed, 9 insertions(+), 6 deletions(-)
> 
> diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
> index 55efbacfc244..7dfc609eacd6 100644
> --- a/arch/x86/include/asm/kvm_host.h
> +++ b/arch/x86/include/asm/kvm_host.h
> @@ -707,7 +707,7 @@ struct kvm_vcpu_arch {
>  	} st;
>  
>  	u64 l1_tsc_offset;
> -	u64 tsc_offset;
> +	u64 tsc_offset; /* current tsc offset */
>  	u64 last_guest_tsc;
>  	u64 last_host_tsc;
>  	u64 tsc_offset_adjustment;
> @@ -721,7 +721,8 @@ struct kvm_vcpu_arch {
>  	u32 virtual_tsc_khz;
>  	s64 ia32_tsc_adjust_msr;
>  	u64 msr_ia32_power_ctl;
> -	u64 tsc_scaling_ratio;
> +	u64 l1_tsc_scaling_ratio;
> +	u64 tsc_scaling_ratio; /* current scaling ratio */
>  
>  	atomic_t nmi_queued;  /* unprocessed asynchronous NMIs */
>  	unsigned nmi_pending; /* NMI queued after currently running handler */
> diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
> index 4bceb5ca3a89..3e4dda8177bb 100644
> --- a/arch/x86/kvm/vmx/vmx.c
> +++ b/arch/x86/kvm/vmx/vmx.c
> @@ -7453,10 +7453,10 @@ static int vmx_set_hv_timer(struct kvm_vcpu *vcpu, u64 guest_deadline_tsc,
>  		delta_tsc = 0;
>  
>  	/* Convert to host delta tsc if tsc scaling is enabled */
> -	if (vcpu->arch.tsc_scaling_ratio != kvm_default_tsc_scaling_ratio &&
> +	if (vcpu->arch.l1_tsc_scaling_ratio != kvm_default_tsc_scaling_ratio &&
>  	    delta_tsc && u64_shl_div_u64(delta_tsc,
>  				kvm_tsc_scaling_ratio_frac_bits,
> -				vcpu->arch.tsc_scaling_ratio, &delta_tsc))
> +				vcpu->arch.l1_tsc_scaling_ratio, &delta_tsc))
>  		return -ERANGE;
>  
>  	/*
> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> index bbc4e04e67ad..6ab95ac188a5 100644
> --- a/arch/x86/kvm/x86.c
> +++ b/arch/x86/kvm/x86.c
> @@ -2185,6 +2185,7 @@ static int set_tsc_khz(struct kvm_vcpu *vcpu, u32 user_tsc_khz, bool scale)
>  
>  	/* Guest TSC same frequency as host TSC? */
>  	if (!scale) {
> +		vcpu->arch.l1_tsc_scaling_ratio = kvm_default_tsc_scaling_ratio;
>  		vcpu->arch.tsc_scaling_ratio = kvm_default_tsc_scaling_ratio;
>  		return 0;
>  	}
> @@ -2211,7 +2212,7 @@ static int set_tsc_khz(struct kvm_vcpu *vcpu, u32 user_tsc_khz, bool scale)
>  		return -1;
>  	}
>  
> -	vcpu->arch.tsc_scaling_ratio = ratio;
> +	vcpu->arch.l1_tsc_scaling_ratio = vcpu->arch.tsc_scaling_ratio = ratio;
>  	return 0;
>  }
>  
> @@ -2223,6 +2224,7 @@ static int kvm_set_tsc_khz(struct kvm_vcpu *vcpu, u32 user_tsc_khz)
>  	/* tsc_khz can be zero if TSC calibration fails */
>  	if (user_tsc_khz == 0) {
>  		/* set tsc_scaling_ratio to a safe value */
> +		vcpu->arch.l1_tsc_scaling_ratio = kvm_default_tsc_scaling_ratio;
>  		vcpu->arch.tsc_scaling_ratio = kvm_default_tsc_scaling_ratio;
>  		return -1;
>  	}
> @@ -2459,7 +2461,7 @@ static inline void adjust_tsc_offset_guest(struct kvm_vcpu *vcpu,
>  
>  static inline void adjust_tsc_offset_host(struct kvm_vcpu *vcpu, s64 adjustment)
>  {
> -	if (vcpu->arch.tsc_scaling_ratio != kvm_default_tsc_scaling_ratio)
> +	if (vcpu->arch.l1_tsc_scaling_ratio != kvm_default_tsc_scaling_ratio)
>  		WARN_ON(adjustment < 0);
>  	adjustment = kvm_scale_tsc(vcpu, (u64) adjustment);
>  	adjust_tsc_offset_guest(vcpu, adjustment);

Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com>

Best regards,
	Maxim Levitsky


^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH v3 03/12] KVM: X86: Rename kvm_compute_tsc_offset() to kvm_compute_tsc_offset_l1()
  2021-05-24 14:21   ` Paolo Bonzini
@ 2021-05-24 17:49     ` Maxim Levitsky
  0 siblings, 0 replies; 39+ messages in thread
From: Maxim Levitsky @ 2021-05-24 17:49 UTC (permalink / raw)
  To: Paolo Bonzini, Ilias Stamatis, kvm, linux-kernel
  Cc: seanjc, vkuznets, wanpengli, jmattson, joro, zamsden, mtosatti, dwmw

On Mon, 2021-05-24 at 16:21 +0200, Paolo Bonzini wrote:
> On 21/05/21 12:24, Ilias Stamatis wrote:
> > +			u64 adj = kvm_compute_tsc_offset_l1(vcpu, data) - vcpu->arch.l1_tsc_offset;
> 
> Better: kvm_compute_l1_tsc_offset.  So far anyway I can adjust this myself.

Nothing against this either!
Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com>



> 
> Paolo
> 



^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH v3 04/12] KVM: X86: Add a ratio parameter to kvm_scale_tsc()
  2021-05-24 14:23   ` Paolo Bonzini
  2021-05-24 15:48     ` Sean Christopherson
@ 2021-05-24 17:50     ` Maxim Levitsky
  1 sibling, 0 replies; 39+ messages in thread
From: Maxim Levitsky @ 2021-05-24 17:50 UTC (permalink / raw)
  To: Paolo Bonzini, Ilias Stamatis, kvm, linux-kernel
  Cc: seanjc, vkuznets, wanpengli, jmattson, joro, zamsden, mtosatti, dwmw

On Mon, 2021-05-24 at 16:23 +0200, Paolo Bonzini wrote:
> On 21/05/21 12:24, Ilias Stamatis wrote:
> > @@ -3537,10 +3539,14 @@ int kvm_get_msr_common(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
> >   		 * return L1's TSC value to ensure backwards-compatible
> >   		 * behavior for migration.
> >   		 */
> > -		u64 tsc_offset = msr_info->host_initiated ? vcpu->arch.l1_tsc_offset :
> > -							    vcpu->arch.tsc_offset;
> > -
> > -		msr_info->data = kvm_scale_tsc(vcpu, rdtsc()) + tsc_offset;
> > +		if (msr_info->host_initiated)
> > +			msr_info->data = kvm_scale_tsc(
> > +				vcpu, rdtsc(), vcpu->arch.l1_tsc_scaling_ratio
> > +				) + vcpu->arch.l1_tsc_offset;
> 
> Better indentation:
> 
> 	msr_info->data = vcpu->arch.l1_tsc_offset +
> 		kvm_scale_tsc(vcpu, rdtsc(),
> 			      vcpu->arch.tsc_scaling_ratio);

Good idea IMHO.
Other than that:

Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com>



> 
> Same below.
> 
> Paolo
> 
> > +		else
> > +			msr_info->data = kvm_scale_tsc(
> > +				vcpu, rdtsc(), vcpu->arch.tsc_scaling_ratio
> > +				) + vcpu->arch.tsc_offset;
> >   		break;
> >   	}
> >   	case MSR_MTRRcap:
> > 



^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH v3 06/12] KVM: X86: Add functions for retrieving L2 TSC fields from common code
  2021-05-21 10:24 ` [PATCH v3 06/12] KVM: X86: Add functions for retrieving L2 TSC fields from common code Ilias Stamatis
@ 2021-05-24 17:50   ` Maxim Levitsky
  0 siblings, 0 replies; 39+ messages in thread
From: Maxim Levitsky @ 2021-05-24 17:50 UTC (permalink / raw)
  To: Ilias Stamatis, kvm, linux-kernel, pbonzini
  Cc: seanjc, vkuznets, wanpengli, jmattson, joro, zamsden, mtosatti, dwmw

On Fri, 2021-05-21 at 11:24 +0100, Ilias Stamatis wrote:
> In order to implement as much of the nested TSC scaling logic as
> possible in common code, we need these vendor callbacks for retrieving
> the TSC offset and the TSC multiplier that L1 has set for L2.
> 
> Signed-off-by: Ilias Stamatis <ilstam@amazon.com>
> ---
>  arch/x86/include/asm/kvm-x86-ops.h |  2 ++
>  arch/x86/include/asm/kvm_host.h    |  2 ++
>  arch/x86/kvm/svm/svm.c             | 14 ++++++++++++++
>  arch/x86/kvm/vmx/vmx.c             | 23 +++++++++++++++++++++++
>  arch/x86/kvm/vmx/vmx.h             |  3 +++
>  5 files changed, 44 insertions(+)
> 
> diff --git a/arch/x86/include/asm/kvm-x86-ops.h b/arch/x86/include/asm/kvm-x86-ops.h
> index 323641097f63..2063616fba1c 100644
> --- a/arch/x86/include/asm/kvm-x86-ops.h
> +++ b/arch/x86/include/asm/kvm-x86-ops.h
> @@ -87,6 +87,8 @@ KVM_X86_OP(set_identity_map_addr)
>  KVM_X86_OP(get_mt_mask)
>  KVM_X86_OP(load_mmu_pgd)
>  KVM_X86_OP_NULL(has_wbinvd_exit)
> +KVM_X86_OP(get_l2_tsc_offset)
> +KVM_X86_OP(get_l2_tsc_multiplier)
>  KVM_X86_OP(write_l1_tsc_offset)
>  KVM_X86_OP(get_exit_info)
>  KVM_X86_OP(check_intercept)
> diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
> index b14c2b2b2e21..0f2cf5d1240c 100644
> --- a/arch/x86/include/asm/kvm_host.h
> +++ b/arch/x86/include/asm/kvm_host.h
> @@ -1305,6 +1305,8 @@ struct kvm_x86_ops {
>  
>  	bool (*has_wbinvd_exit)(void);
>  
> +	u64 (*get_l2_tsc_offset)(struct kvm_vcpu *vcpu);
> +	u64 (*get_l2_tsc_multiplier)(struct kvm_vcpu *vcpu);
>  	/* Returns actual tsc_offset set in active VMCS */
>  	u64 (*write_l1_tsc_offset)(struct kvm_vcpu *vcpu, u64 offset);
>  
> diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
> index 05eca131eaf2..ca70e46f9194 100644
> --- a/arch/x86/kvm/svm/svm.c
> +++ b/arch/x86/kvm/svm/svm.c
> @@ -1082,6 +1082,18 @@ static void init_sys_seg(struct vmcb_seg *seg, uint32_t type)
>  	seg->base = 0;
>  }
>  
> +static u64 svm_get_l2_tsc_offset(struct kvm_vcpu *vcpu)
> +{
> +	struct vcpu_svm *svm = to_svm(vcpu);
> +
> +	return svm->nested.ctl.tsc_offset;
> +}
> +
> +static u64 svm_get_l2_tsc_multiplier(struct kvm_vcpu *vcpu)
> +{
> +	return kvm_default_tsc_scaling_ratio;
I guess this makes sense as long as we don't support nested TSC scaling on SVM.
Or put a WARN_ON here maybe instead. Doesn't matter, as I'll implement this
right after this series is done, which should be very easy.

> +}
> +
>  static u64 svm_write_l1_tsc_offset(struct kvm_vcpu *vcpu, u64 offset)
>  {
>  	struct vcpu_svm *svm = to_svm(vcpu);
> @@ -4526,6 +4538,8 @@ static struct kvm_x86_ops svm_x86_ops __initdata = {
>  
>  	.has_wbinvd_exit = svm_has_wbinvd_exit,
>  
> +	.get_l2_tsc_offset = svm_get_l2_tsc_offset,
> +	.get_l2_tsc_multiplier = svm_get_l2_tsc_multiplier,
>  	.write_l1_tsc_offset = svm_write_l1_tsc_offset,
>  
>  	.load_mmu_pgd = svm_load_mmu_pgd,
> diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
> index 3e4dda8177bb..1c83605eccc1 100644
> --- a/arch/x86/kvm/vmx/vmx.c
> +++ b/arch/x86/kvm/vmx/vmx.c
> @@ -1787,6 +1787,27 @@ static void setup_msrs(struct vcpu_vmx *vmx)
>  	vmx->guest_uret_msrs_loaded = false;
>  }
>  
> +u64 vmx_get_l2_tsc_offset(struct kvm_vcpu *vcpu)
> +{
> +	struct vmcs12 *vmcs12 = get_vmcs12(vcpu);
> +
> +	if (nested_cpu_has(vmcs12, CPU_BASED_USE_TSC_OFFSETTING))
> +		return vmcs12->tsc_offset;
> +
> +	return 0;
> +}
> +
> +u64 vmx_get_l2_tsc_multiplier(struct kvm_vcpu *vcpu)
> +{
> +	struct vmcs12 *vmcs12 = get_vmcs12(vcpu);
> +
> +	if (nested_cpu_has(vmcs12, CPU_BASED_USE_TSC_OFFSETTING) &&
> +	    nested_cpu_has2(vmcs12, SECONDARY_EXEC_TSC_SCALING))
> +		return vmcs12->tsc_multiplier;
> +
> +	return kvm_default_tsc_scaling_ratio;

> +}
> +
>  static u64 vmx_write_l1_tsc_offset(struct kvm_vcpu *vcpu, u64 offset)
>  {
>  	struct vmcs12 *vmcs12 = get_vmcs12(vcpu);
> @@ -7700,6 +7721,8 @@ static struct kvm_x86_ops vmx_x86_ops __initdata = {
>  
>  	.has_wbinvd_exit = cpu_has_vmx_wbinvd_exit,
>  
> +	.get_l2_tsc_offset = vmx_get_l2_tsc_offset,
> +	.get_l2_tsc_multiplier = vmx_get_l2_tsc_multiplier,
>  	.write_l1_tsc_offset = vmx_write_l1_tsc_offset,
>  
>  	.load_mmu_pgd = vmx_load_mmu_pgd,
> diff --git a/arch/x86/kvm/vmx/vmx.h b/arch/x86/kvm/vmx/vmx.h
> index 16e4e457ba23..aa97c82e3451 100644
> --- a/arch/x86/kvm/vmx/vmx.h
> +++ b/arch/x86/kvm/vmx/vmx.h
> @@ -404,6 +404,9 @@ void vmx_ept_load_pdptrs(struct kvm_vcpu *vcpu);
>  void vmx_disable_intercept_for_msr(struct kvm_vcpu *vcpu, u32 msr, int type);
>  void vmx_enable_intercept_for_msr(struct kvm_vcpu *vcpu, u32 msr, int type);
>  
> +u64 vmx_get_l2_tsc_offset(struct kvm_vcpu *vcpu);
> +u64 vmx_get_l2_tsc_multiplier(struct kvm_vcpu *vcpu);
> +
>  static inline void vmx_set_intercept_for_msr(struct kvm_vcpu *vcpu, u32 msr,
>  					     int type, bool value)
>  {

Looks great.
Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com>

Best regards,
	Maxim Levitsky




^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH v3 07/12] KVM: X86: Add functions that calculate L2's TSC offset and multiplier
  2021-05-21 10:24 ` [PATCH v3 07/12] KVM: X86: Add functions that calculate L2's TSC offset and multiplier Ilias Stamatis
@ 2021-05-24 17:51   ` Maxim Levitsky
  0 siblings, 0 replies; 39+ messages in thread
From: Maxim Levitsky @ 2021-05-24 17:51 UTC (permalink / raw)
  To: Ilias Stamatis, kvm, linux-kernel, pbonzini
  Cc: seanjc, vkuznets, wanpengli, jmattson, joro, zamsden, mtosatti, dwmw

On Fri, 2021-05-21 at 11:24 +0100, Ilias Stamatis wrote:
> When L2 is entered we need to "merge" the TSC multiplier and TSC offset
> values of 01 and 12 together.
> 
> The merging is done using the following equations:
>   offset_02 = ((offset_01 * mult_12) >> shift_bits) + offset_12
>   mult_02 = (mult_01 * mult_12) >> shift_bits
> 
> Where shift_bits is kvm_tsc_scaling_ratio_frac_bits.
> 
> Signed-off-by: Ilias Stamatis <ilstam@amazon.com>
> ---
>  arch/x86/include/asm/kvm_host.h |  2 ++
>  arch/x86/kvm/x86.c              | 25 +++++++++++++++++++++++++
>  2 files changed, 27 insertions(+)
> 
> diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
> index 0f2cf5d1240c..aaf756442ed1 100644
> --- a/arch/x86/include/asm/kvm_host.h
> +++ b/arch/x86/include/asm/kvm_host.h
> @@ -1792,6 +1792,8 @@ static inline bool kvm_is_supported_user_return_msr(u32 msr)
>  
>  u64 kvm_scale_tsc(struct kvm_vcpu *vcpu, u64 tsc, u64 ratio);
>  u64 kvm_read_l1_tsc(struct kvm_vcpu *vcpu, u64 host_tsc);
> +u64 kvm_calc_nested_tsc_offset(u64 l1_offset, u64 l2_offset, u64 l2_multiplier);
> +u64 kvm_calc_nested_tsc_multiplier(u64 l1_multiplier, u64 l2_multiplier);
>  
>  unsigned long kvm_get_linear_rip(struct kvm_vcpu *vcpu);
>  bool kvm_is_linear_rip(struct kvm_vcpu *vcpu, unsigned long linear_rip);
> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> index fdcb4f46a003..04abaacb9cfc 100644
> --- a/arch/x86/kvm/x86.c
> +++ b/arch/x86/kvm/x86.c
> @@ -2334,6 +2334,31 @@ u64 kvm_read_l1_tsc(struct kvm_vcpu *vcpu, u64 host_tsc)
>  }
>  EXPORT_SYMBOL_GPL(kvm_read_l1_tsc);
>  
> +u64 kvm_calc_nested_tsc_offset(u64 l1_offset, u64 l2_offset, u64 l2_multiplier)
> +{
> +	u64 nested_offset;
> +
> +	if (l2_multiplier == kvm_default_tsc_scaling_ratio)
> +		nested_offset = l1_offset;
> +	else
> +		nested_offset = mul_s64_u64_shr((s64) l1_offset, l2_multiplier,
> +						kvm_tsc_scaling_ratio_frac_bits);
> +
> +	nested_offset += l2_offset;
> +	return nested_offset;
> +}
> +EXPORT_SYMBOL_GPL(kvm_calc_nested_tsc_offset);
Looks OK.

> +
> +u64 kvm_calc_nested_tsc_multiplier(u64 l1_multiplier, u64 l2_multiplier)
> +{
> +	if (l2_multiplier != kvm_default_tsc_scaling_ratio)
> +		return mul_u64_u64_shr(l1_multiplier, l2_multiplier,
> +				       kvm_tsc_scaling_ratio_frac_bits);
> +
> +	return l1_multiplier;
> +}
> +EXPORT_SYMBOL_GPL(kvm_calc_nested_tsc_multiplier);
Looks OK as well.


> +
>  static void kvm_vcpu_write_tsc_offset(struct kvm_vcpu *vcpu, u64 offset)
>  {
>  	vcpu->arch.l1_tsc_offset = offset;


Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com>

Best regards,
	Maxim Levitsky


^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH v3 08/12] KVM: X86: Move write_l1_tsc_offset() logic to common code and rename it
  2021-05-21 10:24 ` [PATCH v3 08/12] KVM: X86: Move write_l1_tsc_offset() logic to common code and rename it Ilias Stamatis
@ 2021-05-24 17:51   ` Maxim Levitsky
  0 siblings, 0 replies; 39+ messages in thread
From: Maxim Levitsky @ 2021-05-24 17:51 UTC (permalink / raw)
  To: Ilias Stamatis, kvm, linux-kernel, pbonzini
  Cc: seanjc, vkuznets, wanpengli, jmattson, joro, zamsden, mtosatti, dwmw

On Fri, 2021-05-21 at 11:24 +0100, Ilias Stamatis wrote:
> The write_l1_tsc_offset() callback has a misleading name. It does not
> set L1's TSC offset, it rather updates the current TSC offset which
> might be different if a nested guest is executing. Additionally, both
> the vmx and svm implementations use the same logic for calculating the
> current TSC before writing it to hardware.
> 
> Rename the function and move the common logic to the caller. The vmx/svm
> specific code now merely sets the given offset to the corresponding
> hardware structure.
> 
> Signed-off-by: Ilias Stamatis <ilstam@amazon.com>
> ---
>  arch/x86/include/asm/kvm-x86-ops.h |  2 +-
>  arch/x86/include/asm/kvm_host.h    |  3 +--
>  arch/x86/kvm/svm/svm.c             | 21 ++++-----------------
>  arch/x86/kvm/vmx/vmx.c             | 23 +++--------------------
>  arch/x86/kvm/x86.c                 | 24 +++++++++++++++++++++---
>  5 files changed, 30 insertions(+), 43 deletions(-)
> 
> diff --git a/arch/x86/include/asm/kvm-x86-ops.h b/arch/x86/include/asm/kvm-x86-ops.h
> index 2063616fba1c..029c9615378f 100644
> --- a/arch/x86/include/asm/kvm-x86-ops.h
> +++ b/arch/x86/include/asm/kvm-x86-ops.h
> @@ -89,7 +89,7 @@ KVM_X86_OP(load_mmu_pgd)
>  KVM_X86_OP_NULL(has_wbinvd_exit)
>  KVM_X86_OP(get_l2_tsc_offset)
>  KVM_X86_OP(get_l2_tsc_multiplier)
> -KVM_X86_OP(write_l1_tsc_offset)
> +KVM_X86_OP(write_tsc_offset)
>  KVM_X86_OP(get_exit_info)
>  KVM_X86_OP(check_intercept)
>  KVM_X86_OP(handle_exit_irqoff)
> diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
> index aaf756442ed1..f099277b993d 100644
> --- a/arch/x86/include/asm/kvm_host.h
> +++ b/arch/x86/include/asm/kvm_host.h
> @@ -1307,8 +1307,7 @@ struct kvm_x86_ops {
>  
>  	u64 (*get_l2_tsc_offset)(struct kvm_vcpu *vcpu);
>  	u64 (*get_l2_tsc_multiplier)(struct kvm_vcpu *vcpu);
> -	/* Returns actual tsc_offset set in active VMCS */
> -	u64 (*write_l1_tsc_offset)(struct kvm_vcpu *vcpu, u64 offset);
> +	void (*write_tsc_offset)(struct kvm_vcpu *vcpu, u64 offset);
>  
>  	/*
>  	 * Retrieve somewhat arbitrary exit information.  Intended to be used
> diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
> index ca70e46f9194..8dfb2513b72a 100644
> --- a/arch/x86/kvm/svm/svm.c
> +++ b/arch/x86/kvm/svm/svm.c
> @@ -1094,26 +1094,13 @@ static u64 svm_get_l2_tsc_multiplier(struct kvm_vcpu *vcpu)
>  	return kvm_default_tsc_scaling_ratio;
>  }
>  
> -static u64 svm_write_l1_tsc_offset(struct kvm_vcpu *vcpu, u64 offset)
> +static void svm_write_tsc_offset(struct kvm_vcpu *vcpu, u64 offset)
>  {
>  	struct vcpu_svm *svm = to_svm(vcpu);
> -	u64 g_tsc_offset = 0;
> -
> -	if (is_guest_mode(vcpu)) {
> -		/* Write L1's TSC offset.  */
> -		g_tsc_offset = svm->vmcb->control.tsc_offset -
> -			       svm->vmcb01.ptr->control.tsc_offset;
> -		svm->vmcb01.ptr->control.tsc_offset = offset;
> -	}
> -
> -	trace_kvm_write_tsc_offset(vcpu->vcpu_id,
> -				   svm->vmcb->control.tsc_offset - g_tsc_offset,
> -				   offset);
> -
> -	svm->vmcb->control.tsc_offset = offset + g_tsc_offset;
>  
> +	svm->vmcb01.ptr->control.tsc_offset = vcpu->arch.l1_tsc_offset;
> +	svm->vmcb->control.tsc_offset = offset;
>  	vmcb_mark_dirty(svm->vmcb, VMCB_INTERCEPTS);
> -	return svm->vmcb->control.tsc_offset;
>  }
>  
>  /* Evaluate instruction intercepts that depend on guest CPUID features. */
> @@ -4540,7 +4527,7 @@ static struct kvm_x86_ops svm_x86_ops __initdata = {
>  
>  	.get_l2_tsc_offset = svm_get_l2_tsc_offset,
>  	.get_l2_tsc_multiplier = svm_get_l2_tsc_multiplier,
> -	.write_l1_tsc_offset = svm_write_l1_tsc_offset,
> +	.write_tsc_offset = svm_write_tsc_offset,
>  
>  	.load_mmu_pgd = svm_load_mmu_pgd,
>  
> diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
> index 1c83605eccc1..4b70431c2edd 100644
> --- a/arch/x86/kvm/vmx/vmx.c
> +++ b/arch/x86/kvm/vmx/vmx.c
> @@ -1808,26 +1808,9 @@ u64 vmx_get_l2_tsc_multiplier(struct kvm_vcpu *vcpu)
>  	return kvm_default_tsc_scaling_ratio;
>  }
>  
> -static u64 vmx_write_l1_tsc_offset(struct kvm_vcpu *vcpu, u64 offset)
> +static void vmx_write_tsc_offset(struct kvm_vcpu *vcpu, u64 offset)
>  {
> -	struct vmcs12 *vmcs12 = get_vmcs12(vcpu);
> -	u64 g_tsc_offset = 0;
> -
> -	/*
> -	 * We're here if L1 chose not to trap WRMSR to TSC. According
> -	 * to the spec, this should set L1's TSC; The offset that L1
> -	 * set for L2 remains unchanged, and still needs to be added
> -	 * to the newly set TSC to get L2's TSC.
> -	 */
> -	if (is_guest_mode(vcpu) &&
> -	    (vmcs12->cpu_based_vm_exec_control & CPU_BASED_USE_TSC_OFFSETTING))
> -		g_tsc_offset = vmcs12->tsc_offset;
> -
> -	trace_kvm_write_tsc_offset(vcpu->vcpu_id,
> -				   vcpu->arch.tsc_offset - g_tsc_offset,
> -				   offset);
> -	vmcs_write64(TSC_OFFSET, offset + g_tsc_offset);
> -	return offset + g_tsc_offset;
> +	vmcs_write64(TSC_OFFSET, offset);
>  }
>  
>  /*
> @@ -7723,7 +7706,7 @@ static struct kvm_x86_ops vmx_x86_ops __initdata = {
>  
>  	.get_l2_tsc_offset = vmx_get_l2_tsc_offset,
>  	.get_l2_tsc_multiplier = vmx_get_l2_tsc_multiplier,
> -	.write_l1_tsc_offset = vmx_write_l1_tsc_offset,
> +	.write_tsc_offset = vmx_write_tsc_offset,
>  
>  	.load_mmu_pgd = vmx_load_mmu_pgd,
>  
> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> index 04abaacb9cfc..2f91259070e9 100644
> --- a/arch/x86/kvm/x86.c
> +++ b/arch/x86/kvm/x86.c
> @@ -2359,10 +2359,28 @@ u64 kvm_calc_nested_tsc_multiplier(u64 l1_multiplier, u64 l2_multiplier)
>  }
>  EXPORT_SYMBOL_GPL(kvm_calc_nested_tsc_multiplier);
>  
> -static void kvm_vcpu_write_tsc_offset(struct kvm_vcpu *vcpu, u64 offset)
> +static void kvm_vcpu_write_tsc_offset(struct kvm_vcpu *vcpu, u64 l1_offset)
>  {
> -	vcpu->arch.l1_tsc_offset = offset;
> -	vcpu->arch.tsc_offset = static_call(kvm_x86_write_l1_tsc_offset)(vcpu, offset);
> +	trace_kvm_write_tsc_offset(vcpu->vcpu_id,
> +				   vcpu->arch.l1_tsc_offset,
> +				   l1_offset);
> +
> +	vcpu->arch.l1_tsc_offset = l1_offset;
> +
> +	/*
> +	 * If we are here because L1 chose not to trap WRMSR to TSC then
> +	 * according to the spec this should set L1's TSC (as opposed to
> +	 * setting L1's offset for L2).
> +	 */
> +	if (is_guest_mode(vcpu))
> +		vcpu->arch.tsc_offset = kvm_calc_nested_tsc_offset(
> +			l1_offset,
> +			static_call(kvm_x86_get_l2_tsc_offset)(vcpu),
> +			static_call(kvm_x86_get_l2_tsc_multiplier)(vcpu));
> +	else
> +		vcpu->arch.tsc_offset = l1_offset;
> +
> +	static_call(kvm_x86_write_tsc_offset)(vcpu, vcpu->arch.tsc_offset);
>  }
>  
>  static inline bool kvm_check_tsc_unstable(void)


Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com>

Best regards,
	Maxim Levitsky



^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH v3 09/12] KVM: VMX: Remove vmx->current_tsc_ratio and decache_tsc_multiplier()
  2021-05-21 10:24 ` [PATCH v3 09/12] KVM: VMX: Remove vmx->current_tsc_ratio and decache_tsc_multiplier() Ilias Stamatis
@ 2021-05-24 17:53   ` Maxim Levitsky
  2021-05-24 18:44     ` Sean Christopherson
  0 siblings, 1 reply; 39+ messages in thread
From: Maxim Levitsky @ 2021-05-24 17:53 UTC (permalink / raw)
  To: Ilias Stamatis, kvm, linux-kernel, pbonzini
  Cc: seanjc, vkuznets, wanpengli, jmattson, joro, zamsden, mtosatti, dwmw

On Fri, 2021-05-21 at 11:24 +0100, Ilias Stamatis wrote:
> The vmx->current_tsc_ratio field is redundant as
> vcpu->arch.tsc_scaling_ratio already tracks the current TSC scaling
> ratio. Removing this field makes decache_tsc_multiplier() an one-liner
> so remove that too and do a vmcs_write64() directly in order to be more
> consistent with surrounding code.
Not to mention that 'decache_tsc_multiplier' isn't a good name IMHO
for this....


> 
> Signed-off-by: Ilias Stamatis <ilstam@amazon.com>
> ---
>  arch/x86/kvm/vmx/nested.c | 9 ++++-----
>  arch/x86/kvm/vmx/vmx.c    | 5 ++---
>  arch/x86/kvm/vmx/vmx.h    | 8 --------
>  3 files changed, 6 insertions(+), 16 deletions(-)
> 
> diff --git a/arch/x86/kvm/vmx/nested.c b/arch/x86/kvm/vmx/nested.c
> index 6058a65a6ede..239154d3e4e7 100644
> --- a/arch/x86/kvm/vmx/nested.c
> +++ b/arch/x86/kvm/vmx/nested.c
> @@ -2533,9 +2533,8 @@ static int prepare_vmcs02(struct kvm_vcpu *vcpu, struct vmcs12 *vmcs12,
>  	}
>  
>  	vmcs_write64(TSC_OFFSET, vcpu->arch.tsc_offset);
> -
>  	if (kvm_has_tsc_control)
> -		decache_tsc_multiplier(vmx);
> +		vmcs_write64(TSC_MULTIPLIER, vcpu->arch.tsc_scaling_ratio);
>  
>  	nested_vmx_transition_tlb_flush(vcpu, vmcs12, true);
>  
> @@ -4501,12 +4500,12 @@ void nested_vmx_vmexit(struct kvm_vcpu *vcpu, u32 vm_exit_reason,
>  	vmcs_write32(VM_EXIT_MSR_LOAD_COUNT, vmx->msr_autoload.host.nr);
>  	vmcs_write32(VM_ENTRY_MSR_LOAD_COUNT, vmx->msr_autoload.guest.nr);
>  	vmcs_write64(TSC_OFFSET, vcpu->arch.tsc_offset);
> +	if (kvm_has_tsc_control)
> +		vmcs_write64(TSC_MULTIPLIER, vcpu->arch.tsc_scaling_ratio);
> +
>  	if (vmx->nested.l1_tpr_threshold != -1)
>  		vmcs_write32(TPR_THRESHOLD, vmx->nested.l1_tpr_threshold);
>  
> -	if (kvm_has_tsc_control)
> -		decache_tsc_multiplier(vmx);
> -
>  	if (vmx->nested.change_vmcs01_virtual_apic_mode) {
>  		vmx->nested.change_vmcs01_virtual_apic_mode = false;
>  		vmx_set_virtual_apic_mode(vcpu);
> diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
> index 4b70431c2edd..7c52c697cfe3 100644
> --- a/arch/x86/kvm/vmx/vmx.c
> +++ b/arch/x86/kvm/vmx/vmx.c
> @@ -1392,9 +1392,8 @@ void vmx_vcpu_load_vmcs(struct kvm_vcpu *vcpu, int cpu,
>  	}
>  
>  	/* Setup TSC multiplier */
> -	if (kvm_has_tsc_control &&
> -	    vmx->current_tsc_ratio != vcpu->arch.tsc_scaling_ratio)
> -		decache_tsc_multiplier(vmx);
> +	if (kvm_has_tsc_control)
> +		vmcs_write64(TSC_MULTIPLIER, vcpu->arch.tsc_scaling_ratio);

This might have an overhead of writing the TSC scaling ratio even if
it is unchanged. I haven't measured how expensive vmread/vmwrites are but
at least when nested, the vmreads/vmwrites can be very expensive (if they
cause a vmexit).

This is why I think the 'vmx->current_tsc_ratio' exists - to have
a cached value of TSC scale ratio to avoid either 'vmread'ing
or 'vmwrite'ing it without a need.


Best regards,
	Maxim Levitsky

>  }
>  
>  /*
> diff --git a/arch/x86/kvm/vmx/vmx.h b/arch/x86/kvm/vmx/vmx.h
> index aa97c82e3451..3eaa86a0ba3e 100644
> --- a/arch/x86/kvm/vmx/vmx.h
> +++ b/arch/x86/kvm/vmx/vmx.h
> @@ -322,8 +322,6 @@ struct vcpu_vmx {
>  	/* apic deadline value in host tsc */
>  	u64 hv_deadline_tsc;
>  
> -	u64 current_tsc_ratio;
> -
>  	unsigned long host_debugctlmsr;
>  
>  	/*
> @@ -532,12 +530,6 @@ static inline struct vmcs *alloc_vmcs(bool shadow)
>  			      GFP_KERNEL_ACCOUNT);
>  }
>  
> -static inline void decache_tsc_multiplier(struct vcpu_vmx *vmx)
> -{
> -	vmx->current_tsc_ratio = vmx->vcpu.arch.tsc_scaling_ratio;
> -	vmcs_write64(TSC_MULTIPLIER, vmx->current_tsc_ratio);
> -}
> -
>  static inline bool vmx_has_waitpkg(struct vcpu_vmx *vmx)
>  {
>  	return vmx->secondary_exec_control &



^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH v3 10/12] KVM: VMX: Set the TSC offset and multiplier on nested entry and exit
  2021-05-21 10:24 ` [PATCH v3 10/12] KVM: VMX: Set the TSC offset and multiplier on nested entry and exit Ilias Stamatis
@ 2021-05-24 17:54   ` Maxim Levitsky
  2021-05-25 16:05   ` Sean Christopherson
  1 sibling, 0 replies; 39+ messages in thread
From: Maxim Levitsky @ 2021-05-24 17:54 UTC (permalink / raw)
  To: Ilias Stamatis, kvm, linux-kernel, pbonzini
  Cc: seanjc, vkuznets, wanpengli, jmattson, joro, zamsden, mtosatti, dwmw

On Fri, 2021-05-21 at 11:24 +0100, Ilias Stamatis wrote:
> Calculate the nested TSC offset and multiplier on entering L2 using the
> corresponding functions. Restore the L1 values on L2's exit.
> 
> Signed-off-by: Ilias Stamatis <ilstam@amazon.com>
> ---
>  arch/x86/kvm/vmx/nested.c | 18 ++++++++++++++----
>  1 file changed, 14 insertions(+), 4 deletions(-)
> 
> diff --git a/arch/x86/kvm/vmx/nested.c b/arch/x86/kvm/vmx/nested.c
> index 239154d3e4e7..f75c4174cbcf 100644
> --- a/arch/x86/kvm/vmx/nested.c
> +++ b/arch/x86/kvm/vmx/nested.c
> @@ -2532,6 +2532,15 @@ static int prepare_vmcs02(struct kvm_vcpu *vcpu, struct vmcs12 *vmcs12,
>  		vmcs_write64(GUEST_IA32_PAT, vmx->vcpu.arch.pat);
>  	}
>  
> +	vcpu->arch.tsc_offset = kvm_calc_nested_tsc_offset(
> +			vcpu->arch.l1_tsc_offset,
> +			vmx_get_l2_tsc_offset(vcpu),
> +			vmx_get_l2_tsc_multiplier(vcpu));
> +
> +	vcpu->arch.tsc_scaling_ratio = kvm_calc_nested_tsc_multiplier(
> +			vcpu->arch.l1_tsc_scaling_ratio,
> +			vmx_get_l2_tsc_multiplier(vcpu));
> +

This code can be in theory put to the common x86 code,
since it uses the vendor callbacks anyway, but this is
probably not worth it.

>  	vmcs_write64(TSC_OFFSET, vcpu->arch.tsc_offset);
>  	if (kvm_has_tsc_control)
>  		vmcs_write64(TSC_MULTIPLIER, vcpu->arch.tsc_scaling_ratio);
> @@ -3353,8 +3362,6 @@ enum nvmx_vmentry_status nested_vmx_enter_non_root_mode(struct kvm_vcpu *vcpu,
>  	}
>  
>  	enter_guest_mode(vcpu);
> -	if (vmcs12->cpu_based_vm_exec_control & CPU_BASED_USE_TSC_OFFSETTING)
> -		vcpu->arch.tsc_offset += vmcs12->tsc_offset;
>  
>  	if (prepare_vmcs02(vcpu, vmcs12, &entry_failure_code)) {
>  		exit_reason.basic = EXIT_REASON_INVALID_STATE;
> @@ -4462,8 +4469,11 @@ void nested_vmx_vmexit(struct kvm_vcpu *vcpu, u32 vm_exit_reason,
>  	if (nested_cpu_has_preemption_timer(vmcs12))
>  		hrtimer_cancel(&to_vmx(vcpu)->nested.preemption_timer);
>  
> -	if (vmcs12->cpu_based_vm_exec_control & CPU_BASED_USE_TSC_OFFSETTING)
> -		vcpu->arch.tsc_offset -= vmcs12->tsc_offset;
> +	if (nested_cpu_has(vmcs12, CPU_BASED_USE_TSC_OFFSETTING)) {
> +		vcpu->arch.tsc_offset = vcpu->arch.l1_tsc_offset;
> +		if (nested_cpu_has2(vmcs12, SECONDARY_EXEC_TSC_SCALING))
> +			vcpu->arch.tsc_scaling_ratio = vcpu->arch.l1_tsc_scaling_ratio;
> +	}

Same here.
>  
>  	if (likely(!vmx->fail)) {
>  		sync_vmcs02_to_vmcs12(vcpu, vmcs12);


Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com>

Best regards,
	Maxim Levitsky


^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH v3 12/12] KVM: selftests: x86: Add vmx_nested_tsc_scaling_test
  2021-05-21 10:24 ` [PATCH v3 12/12] KVM: selftests: x86: Add vmx_nested_tsc_scaling_test Ilias Stamatis
@ 2021-05-24 17:55   ` Maxim Levitsky
  0 siblings, 0 replies; 39+ messages in thread
From: Maxim Levitsky @ 2021-05-24 17:55 UTC (permalink / raw)
  To: Ilias Stamatis, kvm, linux-kernel, pbonzini
  Cc: seanjc, vkuznets, wanpengli, jmattson, joro, zamsden, mtosatti, dwmw

On Fri, 2021-05-21 at 11:24 +0100, Ilias Stamatis wrote:
> Test that nested TSC scaling works as expected with both L1 and L2
> scaled.
> 
> Signed-off-by: Ilias Stamatis <ilstam@amazon.com>
> ---
>  tools/testing/selftests/kvm/.gitignore        |   1 +
>  tools/testing/selftests/kvm/Makefile          |   1 +
>  .../kvm/x86_64/vmx_nested_tsc_scaling_test.c  | 242 ++++++++++++++++++
>  3 files changed, 244 insertions(+)
>  create mode 100644 tools/testing/selftests/kvm/x86_64/vmx_nested_tsc_scaling_test.c
> 
> diff --git a/tools/testing/selftests/kvm/.gitignore b/tools/testing/selftests/kvm/.gitignore
> index bd83158e0e0b..cc02022f9951 100644
> --- a/tools/testing/selftests/kvm/.gitignore
> +++ b/tools/testing/selftests/kvm/.gitignore
> @@ -29,6 +29,7 @@
>  /x86_64/vmx_preemption_timer_test
>  /x86_64/vmx_set_nested_state_test
>  /x86_64/vmx_tsc_adjust_test
> +/x86_64/vmx_nested_tsc_scaling_test
>  /x86_64/xapic_ipi_test
>  /x86_64/xen_shinfo_test
>  /x86_64/xen_vmcall_test
> diff --git a/tools/testing/selftests/kvm/Makefile b/tools/testing/selftests/kvm/Makefile
> index e439d027939d..1078240b1313 100644
> --- a/tools/testing/selftests/kvm/Makefile
> +++ b/tools/testing/selftests/kvm/Makefile
> @@ -60,6 +60,7 @@ TEST_GEN_PROGS_x86_64 += x86_64/vmx_close_while_nested_test
>  TEST_GEN_PROGS_x86_64 += x86_64/vmx_dirty_log_test
>  TEST_GEN_PROGS_x86_64 += x86_64/vmx_set_nested_state_test
>  TEST_GEN_PROGS_x86_64 += x86_64/vmx_tsc_adjust_test
> +TEST_GEN_PROGS_x86_64 += x86_64/vmx_nested_tsc_scaling_test
>  TEST_GEN_PROGS_x86_64 += x86_64/xapic_ipi_test
>  TEST_GEN_PROGS_x86_64 += x86_64/xss_msr_test
>  TEST_GEN_PROGS_x86_64 += x86_64/debug_regs
> diff --git a/tools/testing/selftests/kvm/x86_64/vmx_nested_tsc_scaling_test.c b/tools/testing/selftests/kvm/x86_64/vmx_nested_tsc_scaling_test.c
> new file mode 100644
> index 000000000000..2c130250fe3b
> --- /dev/null
> +++ b/tools/testing/selftests/kvm/x86_64/vmx_nested_tsc_scaling_test.c
> @@ -0,0 +1,242 @@
> +// SPDX-License-Identifier: GPL-2.0-only
> +/*
> + * vmx_nested_tsc_scaling_test
> + *
> + * Copyright (C) 2021 Amazon.com, Inc. or its affiliates.
> + *
> + * This test case verifies that nested TSC scaling behaves as expected when
> + * both L1 and L2 are scaled using different ratios. For this test we scale
> + * L1 down and scale L2 up.
> + */
> +
> +#include <time.h>
> +
> +#include "kvm_util.h"
> +#include "vmx.h"
> +#include "kselftest.h"
> +
> +
> +#define VCPU_ID 0
> +
> +/* L2 is scaled up (from L1's perspective) by this factor */
> +#define L2_SCALE_FACTOR 4ULL
> +
> +#define TSC_OFFSET_L2 ((uint64_t) -33125236320908)
> +#define TSC_MULTIPLIER_L2 (L2_SCALE_FACTOR << 48)
> +
> +#define L2_GUEST_STACK_SIZE 64
> +
> +enum { USLEEP, UCHECK_L1, UCHECK_L2 };
> +#define GUEST_SLEEP(sec)         ucall(UCALL_SYNC, 2, USLEEP, sec)
> +#define GUEST_CHECK(level, freq) ucall(UCALL_SYNC, 2, level, freq)
> +
> +
> +/*
> + * This function checks whether the "actual" TSC frequency of a guest matches
> + * its expected frequency. In order to account for delays in taking the TSC
> + * measurements, a difference of 1% between the actual and the expected value
> + * is tolerated.
> + */
> +static void compare_tsc_freq(uint64_t actual, uint64_t expected)
> +{
> +	uint64_t tolerance, thresh_low, thresh_high;
> +
> +	tolerance = expected / 100;
> +	thresh_low = expected - tolerance;
> +	thresh_high = expected + tolerance;
> +
> +	TEST_ASSERT(thresh_low < actual,
> +		"TSC freq is expected to be between %"PRIu64" and %"PRIu64
> +		" but it actually is %"PRIu64,
> +		thresh_low, thresh_high, actual);
> +	TEST_ASSERT(thresh_high > actual,
> +		"TSC freq is expected to be between %"PRIu64" and %"PRIu64
> +		" but it actually is %"PRIu64,
> +		thresh_low, thresh_high, actual);
> +}
> +
> +static void check_tsc_freq(int level)
> +{
> +	uint64_t tsc_start, tsc_end, tsc_freq;
> +
> +	/*
> +	 * Reading the TSC twice with about a second's difference should give
> +	 * us an approximation of the TSC frequency from the guest's
> +	 * perspective. Now, this won't be completely accurate, but it should
> +	 * be good enough for the purposes of this test.
> +	 */
> +	tsc_start = rdmsr(MSR_IA32_TSC);
> +	GUEST_SLEEP(1);
> +	tsc_end = rdmsr(MSR_IA32_TSC);
> +
> +	tsc_freq = tsc_end - tsc_start;
> +
> +	GUEST_CHECK(level, tsc_freq);
> +}
> +
> +static void l2_guest_code(void)
> +{
> +	check_tsc_freq(UCHECK_L2);
> +
> +	/* exit to L1 */
> +	__asm__ __volatile__("vmcall");
> +}
> +
> +static void l1_guest_code(struct vmx_pages *vmx_pages)
> +{
> +	unsigned long l2_guest_stack[L2_GUEST_STACK_SIZE];
> +	uint32_t control;
> +
> +	/* check that L1's frequency looks alright before launching L2 */
> +	check_tsc_freq(UCHECK_L1);
> +
> +	GUEST_ASSERT(prepare_for_vmx_operation(vmx_pages));
> +	GUEST_ASSERT(load_vmcs(vmx_pages));
> +
> +	/* prepare the VMCS for L2 execution */
> +	prepare_vmcs(vmx_pages, l2_guest_code, &l2_guest_stack[L2_GUEST_STACK_SIZE]);
> +
> +	/* enable TSC offsetting and TSC scaling for L2 */
> +	control = vmreadz(CPU_BASED_VM_EXEC_CONTROL);
> +	control |= CPU_BASED_USE_MSR_BITMAPS | CPU_BASED_USE_TSC_OFFSETTING;
> +	vmwrite(CPU_BASED_VM_EXEC_CONTROL, control);
> +
> +	control = vmreadz(SECONDARY_VM_EXEC_CONTROL);
> +	control |= SECONDARY_EXEC_TSC_SCALING;
> +	vmwrite(SECONDARY_VM_EXEC_CONTROL, control);
> +
> +	vmwrite(TSC_OFFSET, TSC_OFFSET_L2);
> +	vmwrite(TSC_MULTIPLIER, TSC_MULTIPLIER_L2);
> +	vmwrite(TSC_MULTIPLIER_HIGH, TSC_MULTIPLIER_L2 >> 32);
> +
> +	/* launch L2 */
> +	GUEST_ASSERT(!vmlaunch());
> +	GUEST_ASSERT(vmreadz(VM_EXIT_REASON) == EXIT_REASON_VMCALL);
> +
> +	/* check that L1's frequency still looks good */
> +	check_tsc_freq(UCHECK_L1);
> +
> +	GUEST_DONE();
> +}
> +
> +static void tsc_scaling_check_supported(void)
> +{
> +	if (!kvm_check_cap(KVM_CAP_TSC_CONTROL)) {
> +		print_skip("TSC scaling not supported by the HW");
> +		exit(KSFT_SKIP);
> +	}
> +}
> +
> +static void stable_tsc_check_supported(void)
> +{
> +	FILE *fp;
> +	char buf[4];
> +
> +	fp = fopen("/sys/devices/system/clocksource/clocksource0/current_clocksource", "r");
> +	if (fp == NULL)
> +		goto skip_test;
> +
> +	if (fgets(buf, sizeof(buf), fp) == NULL)
> +		goto skip_test;
> +
> +	if (strncmp(buf, "tsc", sizeof(buf)))
> +		goto skip_test;
> +
> +	return;
> +skip_test:
> +	print_skip("TSC is not stable");

Tiny nitpick: I would print a message that expains the hack a bit better,
something like

"Kernel doesn't use TSC clocksource - assuming that host TSC is not stable - skipping test"


> +	exit(KSFT_SKIP);
> +}
> +
> +int main(int argc, char *argv[])
> +{
> +	struct kvm_vm *vm;
> +	vm_vaddr_t vmx_pages_gva;
> +
> +	uint64_t tsc_start, tsc_end;
> +	uint64_t tsc_khz;
> +	uint64_t l1_scale_factor;
> +	uint64_t l0_tsc_freq = 0;
> +	uint64_t l1_tsc_freq = 0;
> +	uint64_t l2_tsc_freq = 0;
> +
> +	nested_vmx_check_supported();
> +	tsc_scaling_check_supported();
> +	stable_tsc_check_supported();
> +
> +	/*
> +	 * We set L1's scale factor to be a random number from 2 to 10.
> +	 * Ideally we would do the same for L2's factor but that one is
> +	 * referenced by both main() and l1_guest_code() and using a global
> +	 * variable does not work.
> +	 */
> +	srand(time(NULL));
> +	l1_scale_factor = (rand() % 9) + 2;
> +	printf("L1's scale down factor is: %"PRIu64"\n", l1_scale_factor);
> +	printf("L2's scale up factor is: %llu\n", L2_SCALE_FACTOR);
> +
> +	tsc_start = rdtsc();
> +	sleep(1);
> +	tsc_end = rdtsc();
> +
> +	l0_tsc_freq = tsc_end - tsc_start;
> +	printf("real TSC frequency is around: %"PRIu64"\n", l0_tsc_freq);
> +
> +	vm = vm_create_default(VCPU_ID, 0, (void *) l1_guest_code);
> +	vcpu_alloc_vmx(vm, &vmx_pages_gva);
> +	vcpu_args_set(vm, VCPU_ID, 1, vmx_pages_gva);
> +
> +	tsc_khz = _vcpu_ioctl(vm, VCPU_ID, KVM_GET_TSC_KHZ, NULL);
> +	TEST_ASSERT(tsc_khz != -1, "vcpu ioctl KVM_GET_TSC_KHZ failed");
> +
> +	/* scale down L1's TSC frequency */
> +	vcpu_ioctl(vm, VCPU_ID, KVM_SET_TSC_KHZ,
> +		  (void *) (tsc_khz / l1_scale_factor));
> +
> +	for (;;) {
> +		volatile struct kvm_run *run = vcpu_state(vm, VCPU_ID);
> +		struct ucall uc;
> +
> +		vcpu_run(vm, VCPU_ID);
> +		TEST_ASSERT(run->exit_reason == KVM_EXIT_IO,
> +			    "Got exit_reason other than KVM_EXIT_IO: %u (%s)\n",
> +			    run->exit_reason,
> +			    exit_reason_str(run->exit_reason));
> +
> +		switch (get_ucall(vm, VCPU_ID, &uc)) {
> +		case UCALL_ABORT:
> +			TEST_FAIL("%s", (const char *) uc.args[0]);
> +		case UCALL_SYNC:
> +			switch (uc.args[0]) {
> +			case USLEEP:
> +				sleep(uc.args[1]);
> +				break;
> +			case UCHECK_L1:
> +				l1_tsc_freq = uc.args[1];
> +				printf("L1's TSC frequency is around: %"PRIu64
> +				       "\n", l1_tsc_freq);
> +
> +				compare_tsc_freq(l1_tsc_freq,
> +						 l0_tsc_freq / l1_scale_factor);
> +				break;
> +			case UCHECK_L2:
> +				l2_tsc_freq = uc.args[1];
> +				printf("L2's TSC frequency is around: %"PRIu64
> +				       "\n", l2_tsc_freq);
> +
> +				compare_tsc_freq(l2_tsc_freq,
> +						 l1_tsc_freq * L2_SCALE_FACTOR);
> +				break;
> +			}
> +			break;
> +		case UCALL_DONE:
> +			goto done;
> +		default:
> +			TEST_FAIL("Unknown ucall %lu", uc.cmd);
> +		}
> +	}
> +
> +done:
> +	kvm_vm_free(vm);
> +	return 0;
> +}


Overall looks good to me. Thanks!

Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com>

Best regards,
	Maxim Levitsky


^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH v3 09/12] KVM: VMX: Remove vmx->current_tsc_ratio and decache_tsc_multiplier()
  2021-05-24 17:53   ` Maxim Levitsky
@ 2021-05-24 18:44     ` Sean Christopherson
  2021-05-25 10:41       ` Stamatis, Ilias
  0 siblings, 1 reply; 39+ messages in thread
From: Sean Christopherson @ 2021-05-24 18:44 UTC (permalink / raw)
  To: Maxim Levitsky
  Cc: Ilias Stamatis, kvm, linux-kernel, pbonzini, vkuznets, wanpengli,
	jmattson, joro, zamsden, mtosatti, dwmw

On Mon, May 24, 2021, Maxim Levitsky wrote:
> On Fri, 2021-05-21 at 11:24 +0100, Ilias Stamatis wrote:
> > diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
> > index 4b70431c2edd..7c52c697cfe3 100644
> > --- a/arch/x86/kvm/vmx/vmx.c
> > +++ b/arch/x86/kvm/vmx/vmx.c
> > @@ -1392,9 +1392,8 @@ void vmx_vcpu_load_vmcs(struct kvm_vcpu *vcpu, int cpu,
> >  	}
> >  
> >  	/* Setup TSC multiplier */
> > -	if (kvm_has_tsc_control &&
> > -	    vmx->current_tsc_ratio != vcpu->arch.tsc_scaling_ratio)
> > -		decache_tsc_multiplier(vmx);
> > +	if (kvm_has_tsc_control)
> > +		vmcs_write64(TSC_MULTIPLIER, vcpu->arch.tsc_scaling_ratio);
> 
> This might have an overhead of writing the TSC scaling ratio even if
> it is unchanged. I haven't measured how expensive vmread/vmwrites are but
> at least when nested, the vmreads/vmwrites can be very expensive (if they
> cause a vmexit).
> 
> This is why I think the 'vmx->current_tsc_ratio' exists - to have
> a cached value of TSC scale ratio to avoid either 'vmread'ing
> or 'vmwrite'ing it without a need.

Yes, but its existence is a complete hack.  vmx->current_tsc_ratio has the same
scope as vcpu->arch.tsc_scaling_ratio, i.e. vmx == vcpu == vcpu->arch.  Unlike
per-VMCS tracking, it should not be useful, keyword "should".

What I meant by my earlier comment:

  Its use in vmx_vcpu_load_vmcs() is basically "write the VMCS if we forgot to
  earlier", which is all kinds of wrong.

is that vmx_vcpu_load_vmcs() should never write vmcs.TSC_MULTIPLIER.  The correct
behavior is to set the field at VMCS initialization, and then immediately set it
whenever the ratio is changed, e.g. on nested transition, from userspace, etc...
In other words, my unclear feedback was to make it obsolete (and drop it) by 
fixing the underlying mess, not to just drop the optimization hack.

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH v3 09/12] KVM: VMX: Remove vmx->current_tsc_ratio and decache_tsc_multiplier()
  2021-05-24 18:44     ` Sean Christopherson
@ 2021-05-25 10:41       ` Stamatis, Ilias
  2021-05-25 15:58         ` Sean Christopherson
  0 siblings, 1 reply; 39+ messages in thread
From: Stamatis, Ilias @ 2021-05-25 10:41 UTC (permalink / raw)
  To: seanjc, mlevitsk
  Cc: kvm, linux-kernel, jmattson, Woodhouse, David, vkuznets, joro,
	mtosatti, zamsden, pbonzini, wanpengli

On Mon, 2021-05-24 at 18:44 +0000, Sean Christopherson wrote:
> On Mon, May 24, 2021, Maxim Levitsky wrote:
> > On Fri, 2021-05-21 at 11:24 +0100, Ilias Stamatis wrote:
> > > diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
> > > index 4b70431c2edd..7c52c697cfe3 100644
> > > --- a/arch/x86/kvm/vmx/vmx.c
> > > +++ b/arch/x86/kvm/vmx/vmx.c
> > > @@ -1392,9 +1392,8 @@ void vmx_vcpu_load_vmcs(struct kvm_vcpu *vcpu, int cpu,
> > >     }
> > > 
> > >     /* Setup TSC multiplier */
> > > -   if (kvm_has_tsc_control &&
> > > -       vmx->current_tsc_ratio != vcpu->arch.tsc_scaling_ratio)
> > > -           decache_tsc_multiplier(vmx);
> > > +   if (kvm_has_tsc_control)
> > > +           vmcs_write64(TSC_MULTIPLIER, vcpu->arch.tsc_scaling_ratio);
> > 
> > This might have an overhead of writing the TSC scaling ratio even if
> > it is unchanged. I haven't measured how expensive vmread/vmwrites are but
> > at least when nested, the vmreads/vmwrites can be very expensive (if they
> > cause a vmexit).
> > 
> > This is why I think the 'vmx->current_tsc_ratio' exists - to have
> > a cached value of TSC scale ratio to avoid either 'vmread'ing
> > or 'vmwrite'ing it without a need.

Right. I thought the overhead might not be that significant since we're doing
lots of vmwrites on vmentry/vmexit anyway, but yeah, why introduce any kind of
extra overhead anyway.

I'm fine with this particular patch getting dropped. It's not directly related 
to the series anyway.

> 
> Yes, but its existence is a complete hack.  vmx->current_tsc_ratio has the same
> scope as vcpu->arch.tsc_scaling_ratio, i.e. vmx == vcpu == vcpu->arch.  Unlike
> per-VMCS tracking, it should not be useful, keyword "should".
> 
> What I meant by my earlier comment:
> 
>   Its use in vmx_vcpu_load_vmcs() is basically "write the VMCS if we forgot to
>   earlier", which is all kinds of wrong.
> 
> is that vmx_vcpu_load_vmcs() should never write vmcs.TSC_MULTIPLIER.  The correct
> behavior is to set the field at VMCS initialization, and then immediately set it
> whenever the ratio is changed, e.g. on nested transition, from userspace, etc...
> In other words, my unclear feedback was to make it obsolete (and drop it) by
> fixing the underlying mess, not to just drop the optimization hack.

I understood this and replied earlier. The right place for the hw multiplier
field to be updated is inside set_tsc_khz() in common code when the ratio
changes. However, this requires adding another vendor callback etc. As all
this is further refactoring I believe it's better to leave this series as is -
ie only touching code that is directly related to nested TSC scaling and not
try to do everything as part of the same series. This makes testing easier
too. We can still implement these changes later.

Thanks,
Ilias


^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH v3 09/12] KVM: VMX: Remove vmx->current_tsc_ratio and decache_tsc_multiplier()
  2021-05-25 10:41       ` Stamatis, Ilias
@ 2021-05-25 15:58         ` Sean Christopherson
  2021-05-25 16:15           ` Paolo Bonzini
                             ` (2 more replies)
  0 siblings, 3 replies; 39+ messages in thread
From: Sean Christopherson @ 2021-05-25 15:58 UTC (permalink / raw)
  To: Stamatis, Ilias
  Cc: mlevitsk, kvm, linux-kernel, jmattson, Woodhouse, David,
	vkuznets, joro, mtosatti, zamsden, pbonzini, wanpengli

On Tue, May 25, 2021, Stamatis, Ilias wrote:
> On Mon, 2021-05-24 at 18:44 +0000, Sean Christopherson wrote:
> > Yes, but its existence is a complete hack.  vmx->current_tsc_ratio has the same
> > scope as vcpu->arch.tsc_scaling_ratio, i.e. vmx == vcpu == vcpu->arch.  Unlike
> > per-VMCS tracking, it should not be useful, keyword "should".
> > 
> > What I meant by my earlier comment:
> > 
> >   Its use in vmx_vcpu_load_vmcs() is basically "write the VMCS if we forgot to
> >   earlier", which is all kinds of wrong.
> > 
> > is that vmx_vcpu_load_vmcs() should never write vmcs.TSC_MULTIPLIER.  The correct
> > behavior is to set the field at VMCS initialization, and then immediately set it
> > whenever the ratio is changed, e.g. on nested transition, from userspace, etc...
> > In other words, my unclear feedback was to make it obsolete (and drop it) by
> > fixing the underlying mess, not to just drop the optimization hack.
> 
> I understood this and replied earlier. The right place for the hw multiplier
> field to be updated is inside set_tsc_khz() in common code when the ratio
> changes. However, this requires adding another vendor callback etc. As all
> this is further refactoring I believe it's better to leave this series as is -
> ie only touching code that is directly related to nested TSC scaling and not
> try to do everything as part of the same series.

But it directly impacts your code, e.g. the nested enter/exit flows would need
to dance around the decache silliness.  And I believe it even more directly
impacts this series: kvm_set_tsc_khz() fails to handle the case where userspace
invokes KVM_SET_TSC_KHZ while L2 is active.

> This makes testing easier too.

Hmm, sort of.  Yes, the fewer patches/modifications in a series definitely makes
the series itself easier to test.  But stepping back and looking at the total
cost of testing, I would argue that punting related changes to a later time
increases the overall cost.  E.g. if someone else picks up the clean up work,
then they have to redo most, if not all, of the testing that you are already
doing, including getting access to the proper hardware, understanding what tests
to prioritize, etc...  Whereas adding one more patch to your series is an
incremental cost since you already have the hardware setup, know which tests to
run, etc...

> We can still implement these changes later.

We can, but we shouldn't.  Simply dropping vmx->current_tsc_ratio is not an
option; it knowingly introduces a (minor) performance regression, for no reason
other than wanting to avoid code churn.  Piling more stuff on top of the flawed
decache logic is impolite, as it adds more work for the person that ends up
doing the cleanup.  I would 100% agree if this were a significant cleanup and/or
completely unrelated, but IMO that's not the case.

Compile tested only...


diff --git a/arch/x86/include/asm/kvm-x86-ops.h b/arch/x86/include/asm/kvm-x86-ops.h
index 029c9615378f..34ad7a17458a 100644
--- a/arch/x86/include/asm/kvm-x86-ops.h
+++ b/arch/x86/include/asm/kvm-x86-ops.h
@@ -90,6 +90,7 @@ KVM_X86_OP_NULL(has_wbinvd_exit)
 KVM_X86_OP(get_l2_tsc_offset)
 KVM_X86_OP(get_l2_tsc_multiplier)
 KVM_X86_OP(write_tsc_offset)
+KVM_X86_OP(write_tsc_multiplier)
 KVM_X86_OP(get_exit_info)
 KVM_X86_OP(check_intercept)
 KVM_X86_OP(handle_exit_irqoff)
diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index f099277b993d..a334ce7741ab 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -1308,6 +1308,7 @@ struct kvm_x86_ops {
        u64 (*get_l2_tsc_offset)(struct kvm_vcpu *vcpu);
        u64 (*get_l2_tsc_multiplier)(struct kvm_vcpu *vcpu);
        void (*write_tsc_offset)(struct kvm_vcpu *vcpu, u64 offset);
+       void (*write_tsc_multiplier)(struct kvm_vcpu *vcpu, u64 multiplier);

        /*
         * Retrieve somewhat arbitrary exit information.  Intended to be used
diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
index b18f60463073..914afcceb46d 100644
--- a/arch/x86/kvm/svm/svm.c
+++ b/arch/x86/kvm/svm/svm.c
@@ -1103,6 +1103,14 @@ static void svm_write_tsc_offset(struct kvm_vcpu *vcpu, u64 offset)
        vmcb_mark_dirty(svm->vmcb, VMCB_INTERCEPTS);
 }

+static void svm_write_tsc_multiplier(struct kvm_vcpu *vcpu, u64 l1_multiplier)
+{
+       /*
+        * Handled when loading guest state since the ratio is programmed via
+        * MSR_AMD64_TSC_RATIO, not a field in the VMCB.
+        */
+}
+
 /* Evaluate instruction intercepts that depend on guest CPUID features. */
 static void svm_recalc_instruction_intercepts(struct kvm_vcpu *vcpu,
                                              struct vcpu_svm *svm)
@@ -4528,6 +4536,7 @@ static struct kvm_x86_ops svm_x86_ops __initdata = {
        .get_l2_tsc_offset = svm_get_l2_tsc_offset,
        .get_l2_tsc_multiplier = svm_get_l2_tsc_multiplier,
        .write_tsc_offset = svm_write_tsc_offset,
+       .write_tsc_multiplier = svm_write_tsc_multiplier,

        .load_mmu_pgd = svm_load_mmu_pgd,

diff --git a/arch/x86/kvm/vmx/nested.c b/arch/x86/kvm/vmx/nested.c
index 6058a65a6ede..712190493926 100644
--- a/arch/x86/kvm/vmx/nested.c
+++ b/arch/x86/kvm/vmx/nested.c
@@ -2535,7 +2535,7 @@ static int prepare_vmcs02(struct kvm_vcpu *vcpu, struct vmcs12 *vmcs12,
        vmcs_write64(TSC_OFFSET, vcpu->arch.tsc_offset);

        if (kvm_has_tsc_control)
-               decache_tsc_multiplier(vmx);
+               vmcs_write64(TSC_OFFSET, vcpu->arch.tsc_scaling_ratio);

        nested_vmx_transition_tlb_flush(vcpu, vmcs12, true);

@@ -4505,7 +4505,7 @@ void nested_vmx_vmexit(struct kvm_vcpu *vcpu, u32 vm_exit_reason,
                vmcs_write32(TPR_THRESHOLD, vmx->nested.l1_tpr_threshold);

        if (kvm_has_tsc_control)
-               decache_tsc_multiplier(vmx);
+               vmcs_write64(TSC_OFFSET, vcpu->arch.tsc_scaling_ratio);

        if (vmx->nested.change_vmcs01_virtual_apic_mode) {
                vmx->nested.change_vmcs01_virtual_apic_mode = false;
diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
index 4b70431c2edd..bf845a08995e 100644
--- a/arch/x86/kvm/vmx/vmx.c
+++ b/arch/x86/kvm/vmx/vmx.c
@@ -1390,11 +1390,6 @@ void vmx_vcpu_load_vmcs(struct kvm_vcpu *vcpu, int cpu,

                vmx->loaded_vmcs->cpu = cpu;
        }
-
-       /* Setup TSC multiplier */
-       if (kvm_has_tsc_control &&
-           vmx->current_tsc_ratio != vcpu->arch.tsc_scaling_ratio)
-               decache_tsc_multiplier(vmx);
 }

 /*
@@ -1813,6 +1808,11 @@ static void vmx_write_tsc_offset(struct kvm_vcpu *vcpu, u64 offset)
        vmcs_write64(TSC_OFFSET, offset);
...skipping...
+++ b/arch/x86/kvm/vmx/vmx.h
@@ -322,8 +322,6 @@ struct vcpu_vmx {
        /* apic deadline value in host tsc */
        u64 hv_deadline_tsc;

-       u64 current_tsc_ratio;
-
        unsigned long host_debugctlmsr;

        /*
@@ -532,12 +530,6 @@ static inline struct vmcs *alloc_vmcs(bool shadow)
                              GFP_KERNEL_ACCOUNT);
 }

-static inline void decache_tsc_multiplier(struct vcpu_vmx *vmx)
-{
-       vmx->current_tsc_ratio = vmx->vcpu.arch.tsc_scaling_ratio;
-       vmcs_write64(TSC_MULTIPLIER, vmx->current_tsc_ratio);
-}
-
 static inline bool vmx_has_waitpkg(struct vcpu_vmx *vmx)
 {
        return vmx->secondary_exec_control &
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index b61b54cea495..690de1868873 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -2179,14 +2179,16 @@ static u32 adjust_tsc_khz(u32 khz, s32 ppm)
        return v;
 }

+static void kvm_vcpu_write_tsc_multiplier(struct kvm_vcpu *vcpu,
+                                         u64 l1_multiplier);
+
 static int set_tsc_khz(struct kvm_vcpu *vcpu, u32 user_tsc_khz, bool scale)
 {
        u64 ratio;

        /* Guest TSC same frequency as host TSC? */
        if (!scale) {
-               vcpu->arch.l1_tsc_scaling_ratio = kvm_default_tsc_scaling_ratio;
-               vcpu->arch.tsc_scaling_ratio = kvm_default_tsc_scaling_ratio;
+               kvm_vcpu_write_tsc_multiplier(vcpu, kvm_default_tsc_scaling_ratio);
                return 0;
        }

@@ -2212,7 +2214,7 @@ static int set_tsc_khz(struct kvm_vcpu *vcpu, u32 user_tsc_khz, bool scale)
                return -1;
        }

-       vcpu->arch.l1_tsc_scaling_ratio = vcpu->arch.tsc_scaling_ratio = ratio;
+       kvm_vcpu_write_tsc_multiplier(vcpu, ratio);
        return 0;
 }

@@ -2224,8 +2226,7 @@ static int kvm_set_tsc_khz(struct kvm_vcpu *vcpu, u32 user_tsc_khz)
        /* tsc_khz can be zero if TSC calibration fails */
        if (user_tsc_khz == 0) {
                /* set tsc_scaling_ratio to a safe value */
-               vcpu->arch.l1_tsc_scaling_ratio = kvm_default_tsc_scaling_ratio;
-               vcpu->arch.tsc_scaling_ratio = kvm_default_tsc_scaling_ratio;
+               kvm_vcpu_write_tsc_multiplier(vcpu, kvm_default_tsc_scaling_ratio);
                return -1;
        }

@@ -2383,6 +2384,25 @@ static void kvm_vcpu_write_tsc_offset(struct kvm_vcpu *vcpu, u64 l1_offset)
        static_call(kvm_x86_write_tsc_offset)(vcpu, vcpu->arch.tsc_offset);
 }

+static void kvm_vcpu_write_tsc_multiplier(struct kvm_vcpu *vcpu,
+                                         u64 l1_multiplier)
+{
+       if (!kvm_has_tsc_control)
+               return;
+
+       vcpu->arch.l1_tsc_scaling_ratio = l1_multiplier;
+
+       /* Userspace is changing the multiplier while L2 is active... */
+       if (is_guest_mode(vcpu))
+               vcpu->arch.tsc_scaling_ratio = kvm_calc_nested_tsc_multiplier(
+                       l1_multiplier,
+                       static_call(kvm_x86_get_l2_tsc_multiplier)(vcpu));
+       else
+               vcpu->arch.tsc_scaling_ratio = l1_multiplier;
+
+       static_call(kvm_x86_write_tsc_multiplier)(vcpu, vcpu->arch.tsc_scaling_ratio);
+}
+
 static inline bool kvm_check_tsc_unstable(void)
 {
 #ifdef CONFIG_X86_64

^ permalink raw reply related	[flat|nested] 39+ messages in thread

* Re: [PATCH v3 10/12] KVM: VMX: Set the TSC offset and multiplier on nested entry and exit
  2021-05-21 10:24 ` [PATCH v3 10/12] KVM: VMX: Set the TSC offset and multiplier on nested entry and exit Ilias Stamatis
  2021-05-24 17:54   ` Maxim Levitsky
@ 2021-05-25 16:05   ` Sean Christopherson
  1 sibling, 0 replies; 39+ messages in thread
From: Sean Christopherson @ 2021-05-25 16:05 UTC (permalink / raw)
  To: Ilias Stamatis
  Cc: kvm, linux-kernel, pbonzini, mlevitsk, vkuznets, wanpengli,
	jmattson, joro, zamsden, mtosatti, dwmw

"KVM: nVMX:" for the scope.

The shortlog is also a bit confusing.  I usually think of "set == write", i.e.
I expected VMWRITEs in the diff.  The nested_vmx_vmexit() case in particular is
gnarly because the actual VMWRITEs aren't captured in the diff's context.

What about combining this with the next patch that exposes the feature to L1?
E.g. "KVM: nVMX: Enable nested TSC scaling" or so.

That would avoid bikeshedding the meaning of "set", fix the goof in the next patch's
shortlog (KVM exposes the feature to L1, not L2), and eliminate an unnecessary
patch for bisection purposes.  Bisecting to a patch that exposes the feature but
doesn't introduce any actual functionality isn't all that helpful, e.g. if there
is a bug in _this_ patch then bisection will arguably point at the wrong patch.

On Fri, May 21, 2021, Ilias Stamatis wrote:
> Calculate the nested TSC offset and multiplier on entering L2 using the
> corresponding functions. Restore the L1 values on L2's exit.
> 
> Signed-off-by: Ilias Stamatis <ilstam@amazon.com>
> ---
>  arch/x86/kvm/vmx/nested.c | 18 ++++++++++++++----
>  1 file changed, 14 insertions(+), 4 deletions(-)
> 
> diff --git a/arch/x86/kvm/vmx/nested.c b/arch/x86/kvm/vmx/nested.c
> index 239154d3e4e7..f75c4174cbcf 100644
> --- a/arch/x86/kvm/vmx/nested.c
> +++ b/arch/x86/kvm/vmx/nested.c
> @@ -2532,6 +2532,15 @@ static int prepare_vmcs02(struct kvm_vcpu *vcpu, struct vmcs12 *vmcs12,
>  		vmcs_write64(GUEST_IA32_PAT, vmx->vcpu.arch.pat);
>  	}
>  
> +	vcpu->arch.tsc_offset = kvm_calc_nested_tsc_offset(
> +			vcpu->arch.l1_tsc_offset,
> +			vmx_get_l2_tsc_offset(vcpu),
> +			vmx_get_l2_tsc_multiplier(vcpu));
> +
> +	vcpu->arch.tsc_scaling_ratio = kvm_calc_nested_tsc_multiplier(
> +			vcpu->arch.l1_tsc_scaling_ratio,
> +			vmx_get_l2_tsc_multiplier(vcpu));
> +
>  	vmcs_write64(TSC_OFFSET, vcpu->arch.tsc_offset);
>  	if (kvm_has_tsc_control)
>  		vmcs_write64(TSC_MULTIPLIER, vcpu->arch.tsc_scaling_ratio);
> @@ -3353,8 +3362,6 @@ enum nvmx_vmentry_status nested_vmx_enter_non_root_mode(struct kvm_vcpu *vcpu,
>  	}
>  
>  	enter_guest_mode(vcpu);
> -	if (vmcs12->cpu_based_vm_exec_control & CPU_BASED_USE_TSC_OFFSETTING)
> -		vcpu->arch.tsc_offset += vmcs12->tsc_offset;
>  
>  	if (prepare_vmcs02(vcpu, vmcs12, &entry_failure_code)) {
>  		exit_reason.basic = EXIT_REASON_INVALID_STATE;
> @@ -4462,8 +4469,11 @@ void nested_vmx_vmexit(struct kvm_vcpu *vcpu, u32 vm_exit_reason,
>  	if (nested_cpu_has_preemption_timer(vmcs12))
>  		hrtimer_cancel(&to_vmx(vcpu)->nested.preemption_timer);
>  
> -	if (vmcs12->cpu_based_vm_exec_control & CPU_BASED_USE_TSC_OFFSETTING)
> -		vcpu->arch.tsc_offset -= vmcs12->tsc_offset;
> +	if (nested_cpu_has(vmcs12, CPU_BASED_USE_TSC_OFFSETTING)) {
> +		vcpu->arch.tsc_offset = vcpu->arch.l1_tsc_offset;
> +		if (nested_cpu_has2(vmcs12, SECONDARY_EXEC_TSC_SCALING))
> +			vcpu->arch.tsc_scaling_ratio = vcpu->arch.l1_tsc_scaling_ratio;
> +	}
>  
>  	if (likely(!vmx->fail)) {
>  		sync_vmcs02_to_vmcs12(vcpu, vmcs12);
> -- 
> 2.17.1
> 

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH v3 09/12] KVM: VMX: Remove vmx->current_tsc_ratio and decache_tsc_multiplier()
  2021-05-25 15:58         ` Sean Christopherson
@ 2021-05-25 16:15           ` Paolo Bonzini
  2021-05-25 16:34             ` Sean Christopherson
  2021-05-25 18:52           ` Stamatis, Ilias
  2021-05-25 19:25           ` Stamatis, Ilias
  2 siblings, 1 reply; 39+ messages in thread
From: Paolo Bonzini @ 2021-05-25 16:15 UTC (permalink / raw)
  To: Sean Christopherson, Stamatis, Ilias
  Cc: mlevitsk, kvm, linux-kernel, jmattson, Woodhouse, David,
	vkuznets, joro, mtosatti, zamsden, wanpengli

On 25/05/21 17:58, Sean Christopherson wrote:
>> The right place for the hw multiplier
>> field to be updated is inside set_tsc_khz() in common code when the ratio
>> changes.

Sort of, the problem is that you have two VMCS's to update.  If properly 
fixed, the cache is useful to fix the issue with KVM_SET_TSC_KHZ needing 
to update both of them.  For that to work, you'd have to move the cache 
to struct loaded_vmcs.

So you can:

1) move the cached tsc_ratio to struct loaded_vmcs

2) add a function in common code (update_tsc_parameters or something 
like that) to update both the offset and the ratio depending on 
is_guest_mode()

3) call that function from nested vmentry/vmexit

And at that point the cache will do its job and figure out whether a 
vmwrite is needed, on both vmentry and vmexit.

I actually like the idea of storing the expected value in kvm_vcpu and 
the current value in loaded_vmcs.  We might use it for other things such 
as reload_vmcs01_apic_access_page perhaps.

Paolo

>> However, this requires adding another vendor callback etc. As all
>> this is further refactoring I believe it's better to leave this series as is -
>> ie only touching code that is directly related to nested TSC scaling and not
>> try to do everything as part of the same series.
> But it directly impacts your code, e.g. the nested enter/exit flows would need
> to dance around the decache silliness.  And I believe it even more directly
> impacts this series: kvm_set_tsc_khz() fails to handle the case where userspace
> invokes KVM_SET_TSC_KHZ while L2 is active.
> 


^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH v3 09/12] KVM: VMX: Remove vmx->current_tsc_ratio and decache_tsc_multiplier()
  2021-05-25 16:15           ` Paolo Bonzini
@ 2021-05-25 16:34             ` Sean Christopherson
  2021-05-25 17:34               ` Paolo Bonzini
  0 siblings, 1 reply; 39+ messages in thread
From: Sean Christopherson @ 2021-05-25 16:34 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: Stamatis, Ilias, mlevitsk, kvm, linux-kernel, jmattson,
	Woodhouse, David, vkuznets, joro, mtosatti, zamsden, wanpengli

On Tue, May 25, 2021, Paolo Bonzini wrote:
> On 25/05/21 17:58, Sean Christopherson wrote:
> > > The right place for the hw multiplier
> > > field to be updated is inside set_tsc_khz() in common code when the ratio
> > > changes.
> 
> Sort of, the problem is that you have two VMCS's to update.  If properly
> fixed, the cache is useful to fix the issue with KVM_SET_TSC_KHZ needing to
> update both of them.  For that to work, you'd have to move the cache to
> struct loaded_vmcs.

vmcs01 and vmcs02 will get updated at enter/exit, if there's no caching then
it all Just Works.

> So you can:
> 
> 1) move the cached tsc_ratio to struct loaded_vmcs
> 
> 2) add a function in common code (update_tsc_parameters or something like
> that) to update both the offset and the ratio depending on is_guest_mode()
> 
> 3) call that function from nested vmentry/vmexit
> 
> And at that point the cache will do its job and figure out whether a vmwrite
> is needed, on both vmentry and vmexit.
> 
> I actually like the idea of storing the expected value in kvm_vcpu and the
> current value in loaded_vmcs.  We might use it for other things such as
> reload_vmcs01_apic_access_page perhaps.

I'm not necessarily opposed to aggressively shadowing the VMCS, but if we go
that route then it should be a standalone series that implements a framework
that can be easily extended to arbitrary fields.  Adding fields to loaded_vmcs
one at a time will be tedious and error prone.  E.g. what makes TSC_MULTIPLIER
more special than TSC_OFFSET, GUEST_IA32_PAT, GUEST_IA32_DEBUGCTL, GUEST_BNDCFGS,
and other number of fields that are likely to persist for a given vmcs02?

The current caching logic is just plain ugly and should not exist.

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH v3 09/12] KVM: VMX: Remove vmx->current_tsc_ratio and decache_tsc_multiplier()
  2021-05-25 16:34             ` Sean Christopherson
@ 2021-05-25 17:34               ` Paolo Bonzini
  2021-05-25 18:21                 ` Sean Christopherson
  0 siblings, 1 reply; 39+ messages in thread
From: Paolo Bonzini @ 2021-05-25 17:34 UTC (permalink / raw)
  To: Sean Christopherson
  Cc: Stamatis, Ilias, mlevitsk, kvm, linux-kernel, jmattson,
	Woodhouse, David, vkuznets, joro, mtosatti, zamsden, wanpengli

On 25/05/21 18:34, Sean Christopherson wrote:
>> I actually like the idea of storing the expected value in kvm_vcpu and the
>> current value in loaded_vmcs.  We might use it for other things such as
>> reload_vmcs01_apic_access_page perhaps.
> I'm not necessarily opposed to aggressively shadowing the VMCS, but if we go
> that route then it should be a standalone series that implements a framework
> that can be easily extended to arbitrary fields.  Adding fields to loaded_vmcs
> one at a time will be tedious and error prone.  E.g. what makes TSC_MULTIPLIER
> more special than TSC_OFFSET, GUEST_IA32_PAT, GUEST_IA32_DEBUGCTL, GUEST_BNDCFGS,
> and other number of fields that are likely to persist for a given vmcs02?

That it can be changed via ioctls in a way that affects both vmcs01 and 
vmcs02.  So TSC_MULTIPLIER is in the same boat as TSC_OFFSET, which I 
agree we should shadow more aggressively, but the others are different.

Paolo


^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH v3 09/12] KVM: VMX: Remove vmx->current_tsc_ratio and decache_tsc_multiplier()
  2021-05-25 17:34               ` Paolo Bonzini
@ 2021-05-25 18:21                 ` Sean Christopherson
  0 siblings, 0 replies; 39+ messages in thread
From: Sean Christopherson @ 2021-05-25 18:21 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: Stamatis, Ilias, mlevitsk, kvm, linux-kernel, jmattson,
	Woodhouse, David, vkuznets, joro, mtosatti, zamsden, wanpengli

On Tue, May 25, 2021, Paolo Bonzini wrote:
> On 25/05/21 18:34, Sean Christopherson wrote:
> > > I actually like the idea of storing the expected value in kvm_vcpu and the
> > > current value in loaded_vmcs.  We might use it for other things such as
> > > reload_vmcs01_apic_access_page perhaps.
> > I'm not necessarily opposed to aggressively shadowing the VMCS, but if we go
> > that route then it should be a standalone series that implements a framework
> > that can be easily extended to arbitrary fields.  Adding fields to loaded_vmcs
> > one at a time will be tedious and error prone.  E.g. what makes TSC_MULTIPLIER
> > more special than TSC_OFFSET, GUEST_IA32_PAT, GUEST_IA32_DEBUGCTL, GUEST_BNDCFGS,
> > and other number of fields that are likely to persist for a given vmcs02?
> 
> That it can be changed via ioctls in a way that affects both vmcs01 and vmcs02.

That holds true for any MSR that is conditionally loaded/cleared on enter/exit,
e.g. userspace can stuff MSR_IA32_CR_PAT while L2 is active, and that can affect
L1 if L1 is running without VM_EXIT_LOAD_IA32_PAT.

I'm not saying that the above is likely, but neither is changing the TSC scaling
ratio while L2 is active (I assume it occurs on migration, but in the grand
scheme that's not a common operation).

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH v3 09/12] KVM: VMX: Remove vmx->current_tsc_ratio and decache_tsc_multiplier()
  2021-05-25 15:58         ` Sean Christopherson
  2021-05-25 16:15           ` Paolo Bonzini
@ 2021-05-25 18:52           ` Stamatis, Ilias
  2021-05-25 19:25           ` Stamatis, Ilias
  2 siblings, 0 replies; 39+ messages in thread
From: Stamatis, Ilias @ 2021-05-25 18:52 UTC (permalink / raw)
  To: seanjc
  Cc: kvm, linux-kernel, jmattson, Woodhouse, David, vkuznets,
	mtosatti, joro, zamsden, mlevitsk, pbonzini, wanpengli

On Tue, 2021-05-25 at 15:58 +0000, Sean Christopherson wrote:
> On Tue, May 25, 2021, Stamatis, Ilias wrote:
> > On Mon, 2021-05-24 at 18:44 +0000, Sean Christopherson wrote:
> > > Yes, but its existence is a complete hack.  vmx->current_tsc_ratio has the same
> > > scope as vcpu->arch.tsc_scaling_ratio, i.e. vmx == vcpu == vcpu->arch.  Unlike
> > > per-VMCS tracking, it should not be useful, keyword "should".
> > > 
> > > What I meant by my earlier comment:
> > > 
> > >   Its use in vmx_vcpu_load_vmcs() is basically "write the VMCS if we forgot to
> > >   earlier", which is all kinds of wrong.
> > > 
> > > is that vmx_vcpu_load_vmcs() should never write vmcs.TSC_MULTIPLIER.  The correct
> > > behavior is to set the field at VMCS initialization, and then immediately set it
> > > whenever the ratio is changed, e.g. on nested transition, from userspace, etc...
> > > In other words, my unclear feedback was to make it obsolete (and drop it) by
> > > fixing the underlying mess, not to just drop the optimization hack.
> > 
> > I understood this and replied earlier. The right place for the hw multiplier
> > field to be updated is inside set_tsc_khz() in common code when the ratio
> > changes. However, this requires adding another vendor callback etc. As all
> > this is further refactoring I believe it's better to leave this series as is -
> > ie only touching code that is directly related to nested TSC scaling and not
> > try to do everything as part of the same series.
> 
> But it directly impacts your code, e.g. the nested enter/exit flows would need
> to dance around the decache silliness.  And I believe it even more directly
> impacts this series: kvm_set_tsc_khz() fails to handle the case where userspace
> invokes KVM_SET_TSC_KHZ while L2 is active.

Good catch!

> 
> > This makes testing easier too.
> 
> Hmm, sort of.  Yes, the fewer patches/modifications in a series definitely makes
> the series itself easier to test.  But stepping back and looking at the total
> cost of testing, I would argue that punting related changes to a later time
> increases the overall cost.  E.g. if someone else picks up the clean up work,
> then they have to redo most, if not all, of the testing that you are already
> doing, including getting access to the proper hardware, understanding what tests
> to prioritize, etc...  Whereas adding one more patch to your series is an
> incremental cost since you already have the hardware setup, know which tests to
> run, etc...
> 
> > We can still implement these changes later.
> 
> We can, but we shouldn't.  Simply dropping vmx->current_tsc_ratio is not an
> option; it knowingly introduces a (minor) performance regression, for no reason
> other than wanting to avoid code churn.  Piling more stuff on top of the flawed
> decache logic is impolite, as it adds more work for the person that ends up
> doing the cleanup.  I would 100% agree if this were a significant cleanup and/or
> completely unrelated, but IMO that's not the case.
> 
> Compile tested only...

Thank you. 

> 
> 
> diff --git a/arch/x86/include/asm/kvm-x86-ops.h b/arch/x86/include/asm/kvm-x86-ops.h
> index 029c9615378f..34ad7a17458a 100644
> --- a/arch/x86/include/asm/kvm-x86-ops.h
> +++ b/arch/x86/include/asm/kvm-x86-ops.h
> @@ -90,6 +90,7 @@ KVM_X86_OP_NULL(has_wbinvd_exit)
>  KVM_X86_OP(get_l2_tsc_offset)
>  KVM_X86_OP(get_l2_tsc_multiplier)
>  KVM_X86_OP(write_tsc_offset)
> +KVM_X86_OP(write_tsc_multiplier)
>  KVM_X86_OP(get_exit_info)
>  KVM_X86_OP(check_intercept)
>  KVM_X86_OP(handle_exit_irqoff)
> diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
> index f099277b993d..a334ce7741ab 100644
> --- a/arch/x86/include/asm/kvm_host.h
> +++ b/arch/x86/include/asm/kvm_host.h
> @@ -1308,6 +1308,7 @@ struct kvm_x86_ops {
>         u64 (*get_l2_tsc_offset)(struct kvm_vcpu *vcpu);
>         u64 (*get_l2_tsc_multiplier)(struct kvm_vcpu *vcpu);
>         void (*write_tsc_offset)(struct kvm_vcpu *vcpu, u64 offset);
> +       void (*write_tsc_multiplier)(struct kvm_vcpu *vcpu, u64 multiplier);
> 
>         /*
>          * Retrieve somewhat arbitrary exit information.  Intended to be used
> diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
> index b18f60463073..914afcceb46d 100644
> --- a/arch/x86/kvm/svm/svm.c
> +++ b/arch/x86/kvm/svm/svm.c
> @@ -1103,6 +1103,14 @@ static void svm_write_tsc_offset(struct kvm_vcpu *vcpu, u64 offset)
>         vmcb_mark_dirty(svm->vmcb, VMCB_INTERCEPTS);
>  }
> 
> +static void svm_write_tsc_multiplier(struct kvm_vcpu *vcpu, u64 l1_multiplier)
> +{
> +       /*
> +        * Handled when loading guest state since the ratio is programmed via
> +        * MSR_AMD64_TSC_RATIO, not a field in the VMCB.
> +        */
> +}
> +

Ok, what I wanted to avoid really is having to dig into SVM code and see where
exactly it sets the TSC multiplier or having to implement
svm_write_tsc_multiplier as I knew AMD uses an MSR instead of a VMCB field.

But if we are fine with introducing this as is above (for now) I will include 
this in the series, apply the other small changes suggested and re-post the 
patches.




^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH v3 09/12] KVM: VMX: Remove vmx->current_tsc_ratio and decache_tsc_multiplier()
  2021-05-25 15:58         ` Sean Christopherson
  2021-05-25 16:15           ` Paolo Bonzini
  2021-05-25 18:52           ` Stamatis, Ilias
@ 2021-05-25 19:25           ` Stamatis, Ilias
  2021-05-25 23:35             ` Sean Christopherson
  2 siblings, 1 reply; 39+ messages in thread
From: Stamatis, Ilias @ 2021-05-25 19:25 UTC (permalink / raw)
  To: seanjc
  Cc: kvm, linux-kernel, jmattson, Woodhouse, David, vkuznets,
	mtosatti, joro, zamsden, mlevitsk, pbonzini, wanpengli

On Tue, 2021-05-25 at 15:58 +0000, Sean Christopherson wrote:
> On Tue, May 25, 2021, Stamatis, Ilias wrote:
> > On Mon, 2021-05-24 at 18:44 +0000, Sean Christopherson wrote:
> > > Yes, but its existence is a complete hack.  vmx->current_tsc_ratio has the same
> > > scope as vcpu->arch.tsc_scaling_ratio, i.e. vmx == vcpu == vcpu->arch.  Unlike
> > > per-VMCS tracking, it should not be useful, keyword "should".
> > > 
> > > What I meant by my earlier comment:
> > > 
> > >   Its use in vmx_vcpu_load_vmcs() is basically "write the VMCS if we forgot to
> > >   earlier", which is all kinds of wrong.
> > > 
> > > is that vmx_vcpu_load_vmcs() should never write vmcs.TSC_MULTIPLIER.  The correct
> > > behavior is to set the field at VMCS initialization, and then immediately set it
> > > whenever the ratio is changed, e.g. on nested transition, from userspace, etc...
> > > In other words, my unclear feedback was to make it obsolete (and drop it) by
> > > fixing the underlying mess, not to just drop the optimization hack.
> > 
> > I understood this and replied earlier. The right place for the hw multiplier
> > field to be updated is inside set_tsc_khz() in common code when the ratio
> > changes. However, this requires adding another vendor callback etc. As all
> > this is further refactoring I believe it's better to leave this series as is -
> > ie only touching code that is directly related to nested TSC scaling and not
> > try to do everything as part of the same series.
> 
> But it directly impacts your code, e.g. the nested enter/exit flows would need
> to dance around the decache silliness.  And I believe it even more directly
> impacts this series: kvm_set_tsc_khz() fails to handle the case where userspace
> invokes KVM_SET_TSC_KHZ while L2 is active.
> 
> > This makes testing easier too.
> 
> Hmm, sort of.  Yes, the fewer patches/modifications in a series definitely makes
> the series itself easier to test.  But stepping back and looking at the total
> cost of testing, I would argue that punting related changes to a later time
> increases the overall cost.  E.g. if someone else picks up the clean up work,
> then they have to redo most, if not all, of the testing that you are already
> doing, including getting access to the proper hardware, understanding what tests
> to prioritize, etc...  Whereas adding one more patch to your series is an
> incremental cost since you already have the hardware setup, know which tests to
> run, etc...
> 
> > We can still implement these changes later.
> 
> We can, but we shouldn't.  Simply dropping vmx->current_tsc_ratio is not an
> option; it knowingly introduces a (minor) performance regression, for no reason
> other than wanting to avoid code churn.  Piling more stuff on top of the flawed
> decache logic is impolite, as it adds more work for the person that ends up
> doing the cleanup.  I would 100% agree if this were a significant cleanup and/or
> completely unrelated, but IMO that's not the case.
> 
> Compile tested only...
> 
> 
> diff --git a/arch/x86/include/asm/kvm-x86-ops.h b/arch/x86/include/asm/kvm-x86-ops.h
> index 029c9615378f..34ad7a17458a 100644
> --- a/arch/x86/include/asm/kvm-x86-ops.h
> +++ b/arch/x86/include/asm/kvm-x86-ops.h
> @@ -90,6 +90,7 @@ KVM_X86_OP_NULL(has_wbinvd_exit)
>  KVM_X86_OP(get_l2_tsc_offset)
>  KVM_X86_OP(get_l2_tsc_multiplier)
>  KVM_X86_OP(write_tsc_offset)
> +KVM_X86_OP(write_tsc_multiplier)
>  KVM_X86_OP(get_exit_info)
>  KVM_X86_OP(check_intercept)
>  KVM_X86_OP(handle_exit_irqoff)
> diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
> index f099277b993d..a334ce7741ab 100644
> --- a/arch/x86/include/asm/kvm_host.h
> +++ b/arch/x86/include/asm/kvm_host.h
> @@ -1308,6 +1308,7 @@ struct kvm_x86_ops {
>         u64 (*get_l2_tsc_offset)(struct kvm_vcpu *vcpu);
>         u64 (*get_l2_tsc_multiplier)(struct kvm_vcpu *vcpu);
>         void (*write_tsc_offset)(struct kvm_vcpu *vcpu, u64 offset);
> +       void (*write_tsc_multiplier)(struct kvm_vcpu *vcpu, u64 multiplier);
> 
>         /*
>          * Retrieve somewhat arbitrary exit information.  Intended to be used
> diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
> index b18f60463073..914afcceb46d 100644
> --- a/arch/x86/kvm/svm/svm.c
> +++ b/arch/x86/kvm/svm/svm.c
> @@ -1103,6 +1103,14 @@ static void svm_write_tsc_offset(struct kvm_vcpu *vcpu, u64 offset)
>         vmcb_mark_dirty(svm->vmcb, VMCB_INTERCEPTS);
>  }
> 
> +static void svm_write_tsc_multiplier(struct kvm_vcpu *vcpu, u64 l1_multiplier)
> +{
> +       /*
> +        * Handled when loading guest state since the ratio is programmed via
> +        * MSR_AMD64_TSC_RATIO, not a field in the VMCB.
> +        */
> +}
> +
>  /* Evaluate instruction intercepts that depend on guest CPUID features. */
>  static void svm_recalc_instruction_intercepts(struct kvm_vcpu *vcpu,
>                                               struct vcpu_svm *svm)
> @@ -4528,6 +4536,7 @@ static struct kvm_x86_ops svm_x86_ops __initdata = {
>         .get_l2_tsc_offset = svm_get_l2_tsc_offset,
>         .get_l2_tsc_multiplier = svm_get_l2_tsc_multiplier,
>         .write_tsc_offset = svm_write_tsc_offset,
> +       .write_tsc_multiplier = svm_write_tsc_multiplier,
> 
>         .load_mmu_pgd = svm_load_mmu_pgd,
> 
> diff --git a/arch/x86/kvm/vmx/nested.c b/arch/x86/kvm/vmx/nested.c
> index 6058a65a6ede..712190493926 100644
> --- a/arch/x86/kvm/vmx/nested.c
> +++ b/arch/x86/kvm/vmx/nested.c
> @@ -2535,7 +2535,7 @@ static int prepare_vmcs02(struct kvm_vcpu *vcpu, struct vmcs12 *vmcs12,
>         vmcs_write64(TSC_OFFSET, vcpu->arch.tsc_offset);
> 
>         if (kvm_has_tsc_control)
> -               decache_tsc_multiplier(vmx);
> +               vmcs_write64(TSC_OFFSET, vcpu->arch.tsc_scaling_ratio);
> 
>         nested_vmx_transition_tlb_flush(vcpu, vmcs12, true);
> 
> @@ -4505,7 +4505,7 @@ void nested_vmx_vmexit(struct kvm_vcpu *vcpu, u32 vm_exit_reason,
>                 vmcs_write32(TPR_THRESHOLD, vmx->nested.l1_tpr_threshold);
> 
>         if (kvm_has_tsc_control)
> -               decache_tsc_multiplier(vmx);
> +               vmcs_write64(TSC_OFFSET, vcpu->arch.tsc_scaling_ratio);
> 
>         if (vmx->nested.change_vmcs01_virtual_apic_mode) {
>                 vmx->nested.change_vmcs01_virtual_apic_mode = false;
> diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
> index 4b70431c2edd..bf845a08995e 100644
> --- a/arch/x86/kvm/vmx/vmx.c
> +++ b/arch/x86/kvm/vmx/vmx.c
> @@ -1390,11 +1390,6 @@ void vmx_vcpu_load_vmcs(struct kvm_vcpu *vcpu, int cpu,
> 
>                 vmx->loaded_vmcs->cpu = cpu;
>         }
> -
> -       /* Setup TSC multiplier */
> -       if (kvm_has_tsc_control &&
> -           vmx->current_tsc_ratio != vcpu->arch.tsc_scaling_ratio)
> -               decache_tsc_multiplier(vmx);
>  }
> 
>  /*
> @@ -1813,6 +1808,11 @@ static void vmx_write_tsc_offset(struct kvm_vcpu *vcpu, u64 offset)
>         vmcs_write64(TSC_OFFSET, offset);
> ...skipping...
> +++ b/arch/x86/kvm/vmx/vmx.h
> @@ -322,8 +322,6 @@ struct vcpu_vmx {
>         /* apic deadline value in host tsc */
>         u64 hv_deadline_tsc;
> 
> -       u64 current_tsc_ratio;
> -
>         unsigned long host_debugctlmsr;
> 
>         /*
> @@ -532,12 +530,6 @@ static inline struct vmcs *alloc_vmcs(bool shadow)
>                               GFP_KERNEL_ACCOUNT);
>  }
> 
> -static inline void decache_tsc_multiplier(struct vcpu_vmx *vmx)
> -{
> -       vmx->current_tsc_ratio = vmx->vcpu.arch.tsc_scaling_ratio;
> -       vmcs_write64(TSC_MULTIPLIER, vmx->current_tsc_ratio);
> -}
> -
>  static inline bool vmx_has_waitpkg(struct vcpu_vmx *vmx)
>  {
>         return vmx->secondary_exec_control &
> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> index b61b54cea495..690de1868873 100644
> --- a/arch/x86/kvm/x86.c
> +++ b/arch/x86/kvm/x86.c
> @@ -2179,14 +2179,16 @@ static u32 adjust_tsc_khz(u32 khz, s32 ppm)
>         return v;
>  }
> 
> +static void kvm_vcpu_write_tsc_multiplier(struct kvm_vcpu *vcpu,
> +                                         u64 l1_multiplier);
> +
>  static int set_tsc_khz(struct kvm_vcpu *vcpu, u32 user_tsc_khz, bool scale)
>  {
>         u64 ratio;
> 
>         /* Guest TSC same frequency as host TSC? */
>         if (!scale) {
> -               vcpu->arch.l1_tsc_scaling_ratio = kvm_default_tsc_scaling_ratio;
> -               vcpu->arch.tsc_scaling_ratio = kvm_default_tsc_scaling_ratio;
> +               kvm_vcpu_write_tsc_multiplier(vcpu, kvm_default_tsc_scaling_ratio);
>                 return 0;
>         }
> 
> @@ -2212,7 +2214,7 @@ static int set_tsc_khz(struct kvm_vcpu *vcpu, u32 user_tsc_khz, bool scale)
>                 return -1;
>         }
> 
> -       vcpu->arch.l1_tsc_scaling_ratio = vcpu->arch.tsc_scaling_ratio = ratio;
> +       kvm_vcpu_write_tsc_multiplier(vcpu, ratio);
>         return 0;
>  }
> 
> @@ -2224,8 +2226,7 @@ static int kvm_set_tsc_khz(struct kvm_vcpu *vcpu, u32 user_tsc_khz)
>         /* tsc_khz can be zero if TSC calibration fails */
>         if (user_tsc_khz == 0) {
>                 /* set tsc_scaling_ratio to a safe value */
> -               vcpu->arch.l1_tsc_scaling_ratio = kvm_default_tsc_scaling_ratio;
> -               vcpu->arch.tsc_scaling_ratio = kvm_default_tsc_scaling_ratio;
> +               kvm_vcpu_write_tsc_multiplier(vcpu, kvm_default_tsc_scaling_ratio);
>                 return -1;
>         }
> 
> @@ -2383,6 +2384,25 @@ static void kvm_vcpu_write_tsc_offset(struct kvm_vcpu *vcpu, u64 l1_offset)
>         static_call(kvm_x86_write_tsc_offset)(vcpu, vcpu->arch.tsc_offset);
>  }
> 
> +static void kvm_vcpu_write_tsc_multiplier(struct kvm_vcpu *vcpu,
> +                                         u64 l1_multiplier)
> +{
> +       if (!kvm_has_tsc_control)
> +               return;
> +
> +       vcpu->arch.l1_tsc_scaling_ratio = l1_multiplier;
> +
> +       /* Userspace is changing the multiplier while L2 is active... */
> +       if (is_guest_mode(vcpu))
> +               vcpu->arch.tsc_scaling_ratio = kvm_calc_nested_tsc_multiplier(
> +                       l1_multiplier,
> +                       static_call(kvm_x86_get_l2_tsc_multiplier)(vcpu));
> +       else
> +               vcpu->arch.tsc_scaling_ratio = l1_multiplier;
> +
> +       static_call(kvm_x86_write_tsc_multiplier)(vcpu, vcpu->arch.tsc_scaling_ratio);
> +}
> +
>  static inline bool kvm_check_tsc_unstable(void)
>  {
>  #ifdef CONFIG_X86_64

Hmm, this patch actually still removes the caching and introduces a small
performance overhead. For example if neither L1 nor L2 are scaled it will
still do a vmwrite for every L2 entry/write.

So do we want to get rid of decache_tsc_multiplier() but keep 
vmx->current_tsc_ratio and do the check inside write_tsc_multiplier()? Or 
alternatively delete vmx->current_tsc_ratio too and have 
write_tsc_multiplier() receive 2 parameters, one of the old multiplier and 
one of the new?



^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH v3 09/12] KVM: VMX: Remove vmx->current_tsc_ratio and decache_tsc_multiplier()
  2021-05-25 19:25           ` Stamatis, Ilias
@ 2021-05-25 23:35             ` Sean Christopherson
  0 siblings, 0 replies; 39+ messages in thread
From: Sean Christopherson @ 2021-05-25 23:35 UTC (permalink / raw)
  To: Stamatis, Ilias
  Cc: kvm, linux-kernel, jmattson, Woodhouse, David, vkuznets,
	mtosatti, joro, zamsden, mlevitsk, pbonzini, wanpengli

On Tue, May 25, 2021, Stamatis, Ilias wrote:
> Hmm, this patch actually still removes the caching and introduces a small
> performance overhead. For example if neither L1 nor L2 are scaled it will
> still do a vmwrite for every L2 entry/write.

True, but there is an ocean of difference between the relative performance of
vmx_vcpu_load_vmcs() and a nested transition.  vmx_vcpu_load_vmcs() is also
called much more frequently.

> So do we want to get rid of decache_tsc_multiplier() but keep 
> vmx->current_tsc_ratio and do the check inside write_tsc_multiplier()? Or 
> alternatively delete vmx->current_tsc_ratio too and have 
> write_tsc_multiplier() receive 2 parameters, one of the old multiplier and 
> one of the new?

My vote is to kill it, eat the barely-noticeable perf hit on nVMX, and tackle
the aggressive VMCS shadowing in a separate series.

^ permalink raw reply	[flat|nested] 39+ messages in thread

end of thread, other threads:[~2021-05-25 23:35 UTC | newest]

Thread overview: 39+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-05-21 10:24 [PATCH v3 00/12] KVM: Implement nested TSC scaling Ilias Stamatis
2021-05-21 10:24 ` [PATCH v3 01/12] math64.h: Add mul_s64_u64_shr() Ilias Stamatis
2021-05-24 17:49   ` Maxim Levitsky
2021-05-21 10:24 ` [PATCH v3 02/12] KVM: X86: Store L1's TSC scaling ratio in 'struct kvm_vcpu_arch' Ilias Stamatis
2021-05-24 17:49   ` Maxim Levitsky
2021-05-21 10:24 ` [PATCH v3 03/12] KVM: X86: Rename kvm_compute_tsc_offset() to kvm_compute_tsc_offset_l1() Ilias Stamatis
2021-05-24 14:21   ` Paolo Bonzini
2021-05-24 17:49     ` Maxim Levitsky
2021-05-21 10:24 ` [PATCH v3 04/12] KVM: X86: Add a ratio parameter to kvm_scale_tsc() Ilias Stamatis
2021-05-24 14:23   ` Paolo Bonzini
2021-05-24 15:48     ` Sean Christopherson
2021-05-24 15:56       ` Paolo Bonzini
2021-05-24 17:50     ` Maxim Levitsky
2021-05-21 10:24 ` [PATCH v3 05/12] KVM: VMX: Add a TSC multiplier field in VMCS12 Ilias Stamatis
2021-05-21 10:24 ` [PATCH v3 06/12] KVM: X86: Add functions for retrieving L2 TSC fields from common code Ilias Stamatis
2021-05-24 17:50   ` Maxim Levitsky
2021-05-21 10:24 ` [PATCH v3 07/12] KVM: X86: Add functions that calculate L2's TSC offset and multiplier Ilias Stamatis
2021-05-24 17:51   ` Maxim Levitsky
2021-05-21 10:24 ` [PATCH v3 08/12] KVM: X86: Move write_l1_tsc_offset() logic to common code and rename it Ilias Stamatis
2021-05-24 17:51   ` Maxim Levitsky
2021-05-21 10:24 ` [PATCH v3 09/12] KVM: VMX: Remove vmx->current_tsc_ratio and decache_tsc_multiplier() Ilias Stamatis
2021-05-24 17:53   ` Maxim Levitsky
2021-05-24 18:44     ` Sean Christopherson
2021-05-25 10:41       ` Stamatis, Ilias
2021-05-25 15:58         ` Sean Christopherson
2021-05-25 16:15           ` Paolo Bonzini
2021-05-25 16:34             ` Sean Christopherson
2021-05-25 17:34               ` Paolo Bonzini
2021-05-25 18:21                 ` Sean Christopherson
2021-05-25 18:52           ` Stamatis, Ilias
2021-05-25 19:25           ` Stamatis, Ilias
2021-05-25 23:35             ` Sean Christopherson
2021-05-21 10:24 ` [PATCH v3 10/12] KVM: VMX: Set the TSC offset and multiplier on nested entry and exit Ilias Stamatis
2021-05-24 17:54   ` Maxim Levitsky
2021-05-25 16:05   ` Sean Christopherson
2021-05-21 10:24 ` [PATCH v3 11/12] KVM: VMX: Expose TSC scaling to L2 Ilias Stamatis
2021-05-21 10:24 ` [PATCH v3 12/12] KVM: selftests: x86: Add vmx_nested_tsc_scaling_test Ilias Stamatis
2021-05-24 17:55   ` Maxim Levitsky
2021-05-24 15:37 ` [PATCH v3 00/12] KVM: Implement nested TSC scaling Paolo Bonzini

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.