kvm.vger.kernel.org archive mirror
* [PATCH v2 00/10] KVM: Implement nested TSC scaling
@ 2021-05-12 15:09 Ilias Stamatis
  2021-05-12 15:09 ` [PATCH v2 01/10] math64.h: Add mul_s64_u64_shr() Ilias Stamatis
                   ` (9 more replies)
  0 siblings, 10 replies; 25+ messages in thread
From: Ilias Stamatis @ 2021-05-12 15:09 UTC (permalink / raw)
  To: kvm, linux-kernel, pbonzini
  Cc: mlevitsk, seanjc, vkuznets, wanpengli, jmattson, joro, zamsden,
	mtosatti, dwmw, ilstam

KVM currently supports hardware-assisted TSC scaling but only for L1;
the feature is not exposed to nested guests. This patch series adds
support for nested TSC scaling and allows both L1 and L2 to be scaled
with different scaling factors. That is achieved by "merging" the 01 and
12 values together.

Most of the logic in this series is implemented in common code (after
doing the necessary restructuring); however, the patches add support for
VMX only. Adding support for SVM should be easy at this point, and Maxim
Levitsky has volunteered to do it (thanks!).

Changelog:
v2:
  - Applied most (all?) of Maxim's feedback
  - Added a mul_s64_u64_shr function in math64.h
  - Added a separate kvm_scale_tsc_l1 function instead of passing an
    argument to kvm_scale_tsc
  - Implemented the 02 fields calculations in common code
  - Moved all of write_l1_tsc_offset's logic to common code
  - Added a check for whether the TSC is stable in patch 10
  - Used a random L1 factor and a negative offset in patch 10

Ilias Stamatis (10):
  math64.h: Add mul_s64_u64_shr()
  KVM: X86: Store L1's TSC scaling ratio in 'struct kvm_vcpu_arch'
  KVM: X86: Add kvm_scale_tsc_l1() and kvm_compute_tsc_offset_l1()
  KVM: VMX: Add a TSC multiplier field in VMCS12
  KVM: X86: Add functions for retrieving L2 TSC fields from common code
  KVM: X86: Add functions that calculate the 02 TSC offset and
    multiplier
  KVM: X86: Move write_l1_tsc_offset() logic to common code and rename
    it
  KVM: VMX: Set the TSC offset and multiplier on nested entry and exit
  KVM: VMX: Expose TSC scaling to L2
  KVM: selftests: x86: Add vmx_nested_tsc_scaling_test

 arch/x86/include/asm/kvm-x86-ops.h            |   4 +-
 arch/x86/include/asm/kvm_host.h               |  13 +-
 arch/x86/kvm/svm/svm.c                        |  29 ++-
 arch/x86/kvm/vmx/nested.c                     |  19 +-
 arch/x86/kvm/vmx/vmcs12.c                     |   1 +
 arch/x86/kvm/vmx/vmcs12.h                     |   4 +-
 arch/x86/kvm/vmx/vmx.c                        |  42 +--
 arch/x86/kvm/x86.c                            |  93 +++++--
 include/linux/math64.h                        |  19 ++
 tools/testing/selftests/kvm/.gitignore        |   1 +
 tools/testing/selftests/kvm/Makefile          |   1 +
 .../kvm/x86_64/vmx_nested_tsc_scaling_test.c  | 242 ++++++++++++++++++
 12 files changed, 409 insertions(+), 59 deletions(-)
 create mode 100644 tools/testing/selftests/kvm/x86_64/vmx_nested_tsc_scaling_test.c

-- 
2.17.1


^ permalink raw reply	[flat|nested] 25+ messages in thread

* [PATCH v2 01/10] math64.h: Add mul_s64_u64_shr()
  2021-05-12 15:09 [PATCH v2 00/10] KVM: Implement nested TSC scaling Ilias Stamatis
@ 2021-05-12 15:09 ` Ilias Stamatis
  2021-05-18 22:36   ` Sean Christopherson
  2021-05-12 15:09 ` [PATCH v2 02/10] KVM: X86: Store L1's TSC scaling ratio in 'struct kvm_vcpu_arch' Ilias Stamatis
                   ` (8 subsequent siblings)
  9 siblings, 1 reply; 25+ messages in thread
From: Ilias Stamatis @ 2021-05-12 15:09 UTC (permalink / raw)
  To: kvm, linux-kernel, pbonzini
  Cc: mlevitsk, seanjc, vkuznets, wanpengli, jmattson, joro, zamsden,
	mtosatti, dwmw, ilstam

This function is needed for the nested TSC scaling code where we need to
multiply the signed TSC offset by the unsigned TSC multiplier.
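For reference, the sign handling can be modelled in userspace with GCC's
128-bit integers. This is only a hedged sketch of the semantics, not the
kernel implementation (which builds on mul_u64_u64_shr() instead); the
function name with the _model suffix is made up for illustration:

```c
#include <assert.h>
#include <stdint.h>

/*
 * Userspace model of mul_s64_u64_shr(): multiply a signed 64-bit value
 * by an unsigned 64-bit fixed-point multiplier and shift the 128-bit
 * product right. The sign is extracted before the multiplication and
 * re-applied afterwards, mirroring the helper above.
 */
static uint64_t mul_s64_u64_shr_model(int64_t a, uint64_t b, unsigned int shift)
{
	uint64_t abs_a = a < 0 ? -(uint64_t)a : (uint64_t)a;
	uint64_t ret = (uint64_t)(((unsigned __int128)abs_a * b) >> shift);

	return a < 0 ? (uint64_t)-(int64_t)ret : ret;
}
```

With 48 fractional bits (as on VMX), a multiplier of 2ULL << 48 encodes a
ratio of 2.0, so mul_s64_u64_shr_model(-1000, 2ULL << 48, 48) yields the
bit pattern of -2000.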

Signed-off-by: Ilias Stamatis <ilstam@amazon.com>
---
 include/linux/math64.h | 19 +++++++++++++++++++
 1 file changed, 19 insertions(+)

diff --git a/include/linux/math64.h b/include/linux/math64.h
index 66deb1fdc2ef..2928f03d6d46 100644
--- a/include/linux/math64.h
+++ b/include/linux/math64.h
@@ -3,6 +3,7 @@
 #define _LINUX_MATH64_H
 
 #include <linux/types.h>
+#include <linux/math.h>
 #include <vdso/math64.h>
 #include <asm/div64.h>
 
@@ -234,6 +235,24 @@ static inline u64 mul_u64_u64_shr(u64 a, u64 b, unsigned int shift)
 
 #endif
 
+#ifndef mul_s64_u64_shr
+static inline u64 mul_s64_u64_shr(s64 a, u64 b, unsigned int shift)
+{
+	u64 ret;
+
+	/*
+	 * Extract the sign before the multiplication and put it back
+	 * afterwards if needed.
+	 */
+	ret = mul_u64_u64_shr(abs(a), b, shift);
+
+	if (a < 0)
+		ret = -((s64) ret);
+
+	return ret;
+}
+#endif /* mul_s64_u64_shr */
+
 #ifndef mul_u64_u32_div
 static inline u64 mul_u64_u32_div(u64 a, u32 mul, u32 divisor)
 {
-- 
2.17.1



* [PATCH v2 02/10] KVM: X86: Store L1's TSC scaling ratio in 'struct kvm_vcpu_arch'
  2021-05-12 15:09 [PATCH v2 00/10] KVM: Implement nested TSC scaling Ilias Stamatis
  2021-05-12 15:09 ` [PATCH v2 01/10] math64.h: Add mul_s64_u64_shr() Ilias Stamatis
@ 2021-05-12 15:09 ` Ilias Stamatis
  2021-05-18 22:52   ` Sean Christopherson
  2021-05-12 15:09 ` [PATCH v2 03/10] KVM: X86: Add kvm_scale_tsc_l1() and kvm_compute_tsc_offset_l1() Ilias Stamatis
                   ` (7 subsequent siblings)
  9 siblings, 1 reply; 25+ messages in thread
From: Ilias Stamatis @ 2021-05-12 15:09 UTC (permalink / raw)
  To: kvm, linux-kernel, pbonzini
  Cc: mlevitsk, seanjc, vkuznets, wanpengli, jmattson, joro, zamsden,
	mtosatti, dwmw, ilstam

Store L1's scaling ratio in that struct like we already do for L1's TSC
offset. This allows for easy save/restore when we enter and then exit
the nested guest.

Signed-off-by: Ilias Stamatis <ilstam@amazon.com>
Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com>
---
 arch/x86/include/asm/kvm_host.h | 5 +++--
 arch/x86/kvm/x86.c              | 6 ++++--
 2 files changed, 7 insertions(+), 4 deletions(-)

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index 55efbacfc244..7dfc609eacd6 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -707,7 +707,7 @@ struct kvm_vcpu_arch {
 	} st;
 
 	u64 l1_tsc_offset;
-	u64 tsc_offset;
+	u64 tsc_offset; /* current tsc offset */
 	u64 last_guest_tsc;
 	u64 last_host_tsc;
 	u64 tsc_offset_adjustment;
@@ -721,7 +721,8 @@ struct kvm_vcpu_arch {
 	u32 virtual_tsc_khz;
 	s64 ia32_tsc_adjust_msr;
 	u64 msr_ia32_power_ctl;
-	u64 tsc_scaling_ratio;
+	u64 l1_tsc_scaling_ratio;
+	u64 tsc_scaling_ratio; /* current scaling ratio */
 
 	atomic_t nmi_queued;  /* unprocessed asynchronous NMIs */
 	unsigned nmi_pending; /* NMI queued after currently running handler */
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 9b6bca616929..07cf5d7ece38 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -2185,6 +2185,7 @@ static int set_tsc_khz(struct kvm_vcpu *vcpu, u32 user_tsc_khz, bool scale)
 
 	/* Guest TSC same frequency as host TSC? */
 	if (!scale) {
+		vcpu->arch.l1_tsc_scaling_ratio = kvm_default_tsc_scaling_ratio;
 		vcpu->arch.tsc_scaling_ratio = kvm_default_tsc_scaling_ratio;
 		return 0;
 	}
@@ -2211,7 +2212,7 @@ static int set_tsc_khz(struct kvm_vcpu *vcpu, u32 user_tsc_khz, bool scale)
 		return -1;
 	}
 
-	vcpu->arch.tsc_scaling_ratio = ratio;
+	vcpu->arch.l1_tsc_scaling_ratio = vcpu->arch.tsc_scaling_ratio = ratio;
 	return 0;
 }
 
@@ -2223,6 +2224,7 @@ static int kvm_set_tsc_khz(struct kvm_vcpu *vcpu, u32 user_tsc_khz)
 	/* tsc_khz can be zero if TSC calibration fails */
 	if (user_tsc_khz == 0) {
 		/* set tsc_scaling_ratio to a safe value */
+		vcpu->arch.l1_tsc_scaling_ratio = kvm_default_tsc_scaling_ratio;
 		vcpu->arch.tsc_scaling_ratio = kvm_default_tsc_scaling_ratio;
 		return -1;
 	}
@@ -2459,7 +2461,7 @@ static inline void adjust_tsc_offset_guest(struct kvm_vcpu *vcpu,
 
 static inline void adjust_tsc_offset_host(struct kvm_vcpu *vcpu, s64 adjustment)
 {
-	if (vcpu->arch.tsc_scaling_ratio != kvm_default_tsc_scaling_ratio)
+	if (vcpu->arch.l1_tsc_scaling_ratio != kvm_default_tsc_scaling_ratio)
 		WARN_ON(adjustment < 0);
 	adjustment = kvm_scale_tsc(vcpu, (u64) adjustment);
 	adjust_tsc_offset_guest(vcpu, adjustment);
-- 
2.17.1



* [PATCH v2 03/10] KVM: X86: Add kvm_scale_tsc_l1() and kvm_compute_tsc_offset_l1()
  2021-05-12 15:09 [PATCH v2 00/10] KVM: Implement nested TSC scaling Ilias Stamatis
  2021-05-12 15:09 ` [PATCH v2 01/10] math64.h: Add mul_s64_u64_shr() Ilias Stamatis
  2021-05-12 15:09 ` [PATCH v2 02/10] KVM: X86: Store L1's TSC scaling ratio in 'struct kvm_vcpu_arch' Ilias Stamatis
@ 2021-05-12 15:09 ` Ilias Stamatis
  2021-05-18 23:04   ` Sean Christopherson
  2021-05-12 15:09 ` [PATCH v2 04/10] KVM: VMX: Add a TSC multiplier field in VMCS12 Ilias Stamatis
                   ` (6 subsequent siblings)
  9 siblings, 1 reply; 25+ messages in thread
From: Ilias Stamatis @ 2021-05-12 15:09 UTC (permalink / raw)
  To: kvm, linux-kernel, pbonzini
  Cc: mlevitsk, seanjc, vkuznets, wanpengli, jmattson, joro, zamsden,
	mtosatti, dwmw, ilstam

The existing kvm_scale_tsc() scales the TSC using the current TSC
scaling ratio. That used to always equal L1's scaling ratio, but with
nested TSC scaling support this is no longer necessarily the case.

This patch adds a new kvm_scale_tsc_l1() function that scales the TSC
using L1's scaling ratio. The existing kvm_scale_tsc() can still be used
for scaling L2 TSC values.

Additionally, this patch renames the kvm_compute_tsc_offset() function
to kvm_compute_tsc_offset_l1() and has the function treat its TSC
argument as an L1 TSC value. All existing callers pass L1 values to it.
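The underlying fixed-point scaling (__scale_tsc() in x86.c) can be
sketched in userspace as below. The 48 fractional bits and the ratio
values are illustrative (VMX uses 48 fractional bits, SVM 32); the point
is that kvm_scale_tsc_l1() would use the L1 ratio while kvm_scale_tsc()
uses the current one, and the two differ while L2 runs scaled:

```c
#include <assert.h>
#include <stdint.h>

/* Illustrative model of __scale_tsc(): tsc * ratio in fixed point. */
static uint64_t scale_tsc(uint64_t tsc, uint64_t ratio, unsigned int frac_bits)
{
	return (uint64_t)(((unsigned __int128)tsc * ratio) >> frac_bits);
}
```

For example, with an L1 ratio of 2.0 (2ULL << 48) and a current 02 ratio
of 4.0 (4ULL << 48), scaling the same host TSC delta of 1000 gives 2000
for L1 but 4000 for the currently running L2.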

Signed-off-by: Ilias Stamatis <ilstam@amazon.com>
---
 arch/x86/include/asm/kvm_host.h |  1 +
 arch/x86/kvm/x86.c              | 41 ++++++++++++++++++++++-----------
 2 files changed, 29 insertions(+), 13 deletions(-)

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index 7dfc609eacd6..be59197e5eb7 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -1789,6 +1789,7 @@ static inline bool kvm_is_supported_user_return_msr(u32 msr)
 }
 
 u64 kvm_scale_tsc(struct kvm_vcpu *vcpu, u64 tsc);
+u64 kvm_scale_tsc_l1(struct kvm_vcpu *vcpu, u64 tsc);
 u64 kvm_read_l1_tsc(struct kvm_vcpu *vcpu, u64 host_tsc);
 
 unsigned long kvm_get_linear_rip(struct kvm_vcpu *vcpu);
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 07cf5d7ece38..84af1af7a2cc 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -2319,18 +2319,30 @@ u64 kvm_scale_tsc(struct kvm_vcpu *vcpu, u64 tsc)
 }
 EXPORT_SYMBOL_GPL(kvm_scale_tsc);
 
-static u64 kvm_compute_tsc_offset(struct kvm_vcpu *vcpu, u64 target_tsc)
+u64 kvm_scale_tsc_l1(struct kvm_vcpu *vcpu, u64 tsc)
+{
+	u64 _tsc = tsc;
+	u64 ratio = vcpu->arch.l1_tsc_scaling_ratio;
+
+	if (ratio != kvm_default_tsc_scaling_ratio)
+		_tsc = __scale_tsc(ratio, tsc);
+
+	return _tsc;
+}
+EXPORT_SYMBOL_GPL(kvm_scale_tsc_l1);
+
+static u64 kvm_compute_tsc_offset_l1(struct kvm_vcpu *vcpu, u64 target_tsc)
 {
 	u64 tsc;
 
-	tsc = kvm_scale_tsc(vcpu, rdtsc());
+	tsc = kvm_scale_tsc_l1(vcpu, rdtsc());
 
 	return target_tsc - tsc;
 }
 
 u64 kvm_read_l1_tsc(struct kvm_vcpu *vcpu, u64 host_tsc)
 {
-	return vcpu->arch.l1_tsc_offset + kvm_scale_tsc(vcpu, host_tsc);
+	return vcpu->arch.l1_tsc_offset + kvm_scale_tsc_l1(vcpu, host_tsc);
 }
 EXPORT_SYMBOL_GPL(kvm_read_l1_tsc);
 
@@ -2363,7 +2375,7 @@ static void kvm_synchronize_tsc(struct kvm_vcpu *vcpu, u64 data)
 	bool synchronizing = false;
 
 	raw_spin_lock_irqsave(&kvm->arch.tsc_write_lock, flags);
-	offset = kvm_compute_tsc_offset(vcpu, data);
+	offset = kvm_compute_tsc_offset_l1(vcpu, data);
 	ns = get_kvmclock_base_ns();
 	elapsed = ns - kvm->arch.last_tsc_nsec;
 
@@ -2402,7 +2414,7 @@ static void kvm_synchronize_tsc(struct kvm_vcpu *vcpu, u64 data)
 		} else {
 			u64 delta = nsec_to_cycles(vcpu, elapsed);
 			data += delta;
-			offset = kvm_compute_tsc_offset(vcpu, data);
+			offset = kvm_compute_tsc_offset_l1(vcpu, data);
 		}
 		matched = true;
 		already_matched = (vcpu->arch.this_tsc_generation == kvm->arch.cur_tsc_generation);
@@ -2463,7 +2475,7 @@ static inline void adjust_tsc_offset_host(struct kvm_vcpu *vcpu, s64 adjustment)
 {
 	if (vcpu->arch.l1_tsc_scaling_ratio != kvm_default_tsc_scaling_ratio)
 		WARN_ON(adjustment < 0);
-	adjustment = kvm_scale_tsc(vcpu, (u64) adjustment);
+	adjustment = kvm_scale_tsc_l1(vcpu, (u64) adjustment);
 	adjust_tsc_offset_guest(vcpu, adjustment);
 }
 
@@ -2846,7 +2858,7 @@ static int kvm_guest_time_update(struct kvm_vcpu *v)
 	/* With all the info we got, fill in the values */
 
 	if (kvm_has_tsc_control)
-		tgt_tsc_khz = kvm_scale_tsc(v, tgt_tsc_khz);
+		tgt_tsc_khz = kvm_scale_tsc_l1(v, tgt_tsc_khz);
 
 	if (unlikely(vcpu->hw_tsc_khz != tgt_tsc_khz)) {
 		kvm_get_time_scale(NSEC_PER_SEC, tgt_tsc_khz * 1000LL,
@@ -3235,7 +3247,7 @@ int kvm_set_msr_common(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
 		if (msr_info->host_initiated) {
 			kvm_synchronize_tsc(vcpu, data);
 		} else {
-			u64 adj = kvm_compute_tsc_offset(vcpu, data) - vcpu->arch.l1_tsc_offset;
+			u64 adj = kvm_compute_tsc_offset_l1(vcpu, data) - vcpu->arch.l1_tsc_offset;
 			adjust_tsc_offset_guest(vcpu, adj);
 			vcpu->arch.ia32_tsc_adjust_msr += adj;
 		}
@@ -3537,10 +3549,13 @@ int kvm_get_msr_common(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
 		 * return L1's TSC value to ensure backwards-compatible
 		 * behavior for migration.
 		 */
-		u64 tsc_offset = msr_info->host_initiated ? vcpu->arch.l1_tsc_offset :
-							    vcpu->arch.tsc_offset;
-
-		msr_info->data = kvm_scale_tsc(vcpu, rdtsc()) + tsc_offset;
+		if (msr_info->host_initiated) {
+			msr_info->data = kvm_scale_tsc_l1(vcpu, rdtsc()) +
+					 vcpu->arch.l1_tsc_offset;
+		} else {
+			msr_info->data = kvm_scale_tsc(vcpu, rdtsc()) +
+					 vcpu->arch.tsc_offset;
+		}
 		break;
 	}
 	case MSR_MTRRcap:
@@ -4123,7 +4138,7 @@ void kvm_arch_vcpu_load(struct kvm_vcpu *vcpu, int cpu)
 			mark_tsc_unstable("KVM discovered backwards TSC");
 
 		if (kvm_check_tsc_unstable()) {
-			u64 offset = kvm_compute_tsc_offset(vcpu,
+			u64 offset = kvm_compute_tsc_offset_l1(vcpu,
 						vcpu->arch.last_guest_tsc);
 			kvm_vcpu_write_tsc_offset(vcpu, offset);
 			vcpu->arch.tsc_catchup = 1;
-- 
2.17.1



* [PATCH v2 04/10] KVM: VMX: Add a TSC multiplier field in VMCS12
  2021-05-12 15:09 [PATCH v2 00/10] KVM: Implement nested TSC scaling Ilias Stamatis
                   ` (2 preceding siblings ...)
  2021-05-12 15:09 ` [PATCH v2 03/10] KVM: X86: Add kvm_scale_tsc_l1() and kvm_compute_tsc_offset_l1() Ilias Stamatis
@ 2021-05-12 15:09 ` Ilias Stamatis
  2021-05-12 15:09 ` [PATCH v2 05/10] KVM: X86: Add functions for retrieving L2 TSC fields from common code Ilias Stamatis
                   ` (5 subsequent siblings)
  9 siblings, 0 replies; 25+ messages in thread
From: Ilias Stamatis @ 2021-05-12 15:09 UTC (permalink / raw)
  To: kvm, linux-kernel, pbonzini
  Cc: mlevitsk, seanjc, vkuznets, wanpengli, jmattson, joro, zamsden,
	mtosatti, dwmw, ilstam

This is required for supporting nested TSC scaling.

Signed-off-by: Ilias Stamatis <ilstam@amazon.com>
Reviewed-by: Jim Mattson <jmattson@google.com>
Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com>
---
 arch/x86/kvm/vmx/vmcs12.c | 1 +
 arch/x86/kvm/vmx/vmcs12.h | 4 +++-
 2 files changed, 4 insertions(+), 1 deletion(-)

diff --git a/arch/x86/kvm/vmx/vmcs12.c b/arch/x86/kvm/vmx/vmcs12.c
index 034adb6404dc..d9f5d7c56ae3 100644
--- a/arch/x86/kvm/vmx/vmcs12.c
+++ b/arch/x86/kvm/vmx/vmcs12.c
@@ -37,6 +37,7 @@ const unsigned short vmcs_field_to_offset_table[] = {
 	FIELD64(VM_ENTRY_MSR_LOAD_ADDR, vm_entry_msr_load_addr),
 	FIELD64(PML_ADDRESS, pml_address),
 	FIELD64(TSC_OFFSET, tsc_offset),
+	FIELD64(TSC_MULTIPLIER, tsc_multiplier),
 	FIELD64(VIRTUAL_APIC_PAGE_ADDR, virtual_apic_page_addr),
 	FIELD64(APIC_ACCESS_ADDR, apic_access_addr),
 	FIELD64(POSTED_INTR_DESC_ADDR, posted_intr_desc_addr),
diff --git a/arch/x86/kvm/vmx/vmcs12.h b/arch/x86/kvm/vmx/vmcs12.h
index 13494956d0e9..bb81a23afe89 100644
--- a/arch/x86/kvm/vmx/vmcs12.h
+++ b/arch/x86/kvm/vmx/vmcs12.h
@@ -70,7 +70,8 @@ struct __packed vmcs12 {
 	u64 eptp_list_address;
 	u64 pml_address;
 	u64 encls_exiting_bitmap;
-	u64 padding64[2]; /* room for future expansion */
+	u64 tsc_multiplier;
+	u64 padding64[1]; /* room for future expansion */
 	/*
 	 * To allow migration of L1 (complete with its L2 guests) between
 	 * machines of different natural widths (32 or 64 bit), we cannot have
@@ -258,6 +259,7 @@ static inline void vmx_check_vmcs12_offsets(void)
 	CHECK_OFFSET(eptp_list_address, 304);
 	CHECK_OFFSET(pml_address, 312);
 	CHECK_OFFSET(encls_exiting_bitmap, 320);
+	CHECK_OFFSET(tsc_multiplier, 328);
 	CHECK_OFFSET(cr0_guest_host_mask, 344);
 	CHECK_OFFSET(cr4_guest_host_mask, 352);
 	CHECK_OFFSET(cr0_read_shadow, 360);
-- 
2.17.1



* [PATCH v2 05/10] KVM: X86: Add functions for retrieving L2 TSC fields from common code
  2021-05-12 15:09 [PATCH v2 00/10] KVM: Implement nested TSC scaling Ilias Stamatis
                   ` (3 preceding siblings ...)
  2021-05-12 15:09 ` [PATCH v2 04/10] KVM: VMX: Add a TSC multiplier field in VMCS12 Ilias Stamatis
@ 2021-05-12 15:09 ` Ilias Stamatis
  2021-05-12 15:09 ` [PATCH v2 06/10] KVM: X86: Add functions that calculate the 02 TSC offset and multiplier Ilias Stamatis
                   ` (4 subsequent siblings)
  9 siblings, 0 replies; 25+ messages in thread
From: Ilias Stamatis @ 2021-05-12 15:09 UTC (permalink / raw)
  To: kvm, linux-kernel, pbonzini
  Cc: mlevitsk, seanjc, vkuznets, wanpengli, jmattson, joro, zamsden,
	mtosatti, dwmw, ilstam

In order to implement as much of the nested TSC scaling logic as
possible in common code, we need these vendor callbacks for retrieving
the TSC offset and the TSC multiplier that L1 has set for L2.

Signed-off-by: Ilias Stamatis <ilstam@amazon.com>
---
 arch/x86/include/asm/kvm-x86-ops.h |  2 ++
 arch/x86/include/asm/kvm_host.h    |  2 ++
 arch/x86/kvm/svm/svm.c             | 14 ++++++++++++++
 arch/x86/kvm/vmx/vmx.c             | 25 +++++++++++++++++++++++++
 4 files changed, 43 insertions(+)

diff --git a/arch/x86/include/asm/kvm-x86-ops.h b/arch/x86/include/asm/kvm-x86-ops.h
index 323641097f63..2063616fba1c 100644
--- a/arch/x86/include/asm/kvm-x86-ops.h
+++ b/arch/x86/include/asm/kvm-x86-ops.h
@@ -87,6 +87,8 @@ KVM_X86_OP(set_identity_map_addr)
 KVM_X86_OP(get_mt_mask)
 KVM_X86_OP(load_mmu_pgd)
 KVM_X86_OP_NULL(has_wbinvd_exit)
+KVM_X86_OP(get_l2_tsc_offset)
+KVM_X86_OP(get_l2_tsc_multiplier)
 KVM_X86_OP(write_l1_tsc_offset)
 KVM_X86_OP(get_exit_info)
 KVM_X86_OP(check_intercept)
diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index be59197e5eb7..4c4a3fefff57 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -1305,6 +1305,8 @@ struct kvm_x86_ops {
 
 	bool (*has_wbinvd_exit)(void);
 
+	u64 (*get_l2_tsc_offset)(struct kvm_vcpu *vcpu);
+	u64 (*get_l2_tsc_multiplier)(struct kvm_vcpu *vcpu);
 	/* Returns actual tsc_offset set in active VMCS */
 	u64 (*write_l1_tsc_offset)(struct kvm_vcpu *vcpu, u64 offset);
 
diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
index dfa351e605de..679b2fc1a3f9 100644
--- a/arch/x86/kvm/svm/svm.c
+++ b/arch/x86/kvm/svm/svm.c
@@ -1082,6 +1082,18 @@ static void init_sys_seg(struct vmcb_seg *seg, uint32_t type)
 	seg->base = 0;
 }
 
+static u64 svm_get_l2_tsc_offset(struct kvm_vcpu *vcpu)
+{
+	struct vcpu_svm *svm = to_svm(vcpu);
+
+	return svm->nested.ctl.tsc_offset;
+}
+
+static u64 svm_get_l2_tsc_multiplier(struct kvm_vcpu *vcpu)
+{
+	return kvm_default_tsc_scaling_ratio;
+}
+
 static u64 svm_write_l1_tsc_offset(struct kvm_vcpu *vcpu, u64 offset)
 {
 	struct vcpu_svm *svm = to_svm(vcpu);
@@ -4526,6 +4538,8 @@ static struct kvm_x86_ops svm_x86_ops __initdata = {
 
 	.has_wbinvd_exit = svm_has_wbinvd_exit,
 
+	.get_l2_tsc_offset = svm_get_l2_tsc_offset,
+	.get_l2_tsc_multiplier = svm_get_l2_tsc_multiplier,
 	.write_l1_tsc_offset = svm_write_l1_tsc_offset,
 
 	.load_mmu_pgd = svm_load_mmu_pgd,
diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
index 4bceb5ca3a89..575e13bddda8 100644
--- a/arch/x86/kvm/vmx/vmx.c
+++ b/arch/x86/kvm/vmx/vmx.c
@@ -1787,6 +1787,29 @@ static void setup_msrs(struct vcpu_vmx *vmx)
 	vmx->guest_uret_msrs_loaded = false;
 }
 
+static u64 vmx_get_l2_tsc_offset(struct kvm_vcpu *vcpu)
+{
+	struct vmcs12 *vmcs12 = get_vmcs12(vcpu);
+	u64 offset = 0;
+
+	if (vmcs12->cpu_based_vm_exec_control & CPU_BASED_USE_TSC_OFFSETTING)
+		offset = vmcs12->tsc_offset;
+
+	return offset;
+}
+
+static u64 vmx_get_l2_tsc_multiplier(struct kvm_vcpu *vcpu)
+{
+	struct vmcs12 *vmcs12 = get_vmcs12(vcpu);
+	u64 multiplier = kvm_default_tsc_scaling_ratio;
+
+	if (vmcs12->cpu_based_vm_exec_control & CPU_BASED_USE_TSC_OFFSETTING &&
+	    vmcs12->secondary_vm_exec_control & SECONDARY_EXEC_TSC_SCALING)
+		multiplier = vmcs12->tsc_multiplier;
+
+	return multiplier;
+}
+
 static u64 vmx_write_l1_tsc_offset(struct kvm_vcpu *vcpu, u64 offset)
 {
 	struct vmcs12 *vmcs12 = get_vmcs12(vcpu);
@@ -7700,6 +7723,8 @@ static struct kvm_x86_ops vmx_x86_ops __initdata = {
 
 	.has_wbinvd_exit = cpu_has_vmx_wbinvd_exit,
 
+	.get_l2_tsc_offset = vmx_get_l2_tsc_offset,
+	.get_l2_tsc_multiplier = vmx_get_l2_tsc_multiplier,
 	.write_l1_tsc_offset = vmx_write_l1_tsc_offset,
 
 	.load_mmu_pgd = vmx_load_mmu_pgd,
-- 
2.17.1



* [PATCH v2 06/10] KVM: X86: Add functions that calculate the 02 TSC offset and multiplier
  2021-05-12 15:09 [PATCH v2 00/10] KVM: Implement nested TSC scaling Ilias Stamatis
                   ` (4 preceding siblings ...)
  2021-05-12 15:09 ` [PATCH v2 05/10] KVM: X86: Add functions for retrieving L2 TSC fields from common code Ilias Stamatis
@ 2021-05-12 15:09 ` Ilias Stamatis
  2021-05-18 23:21   ` Sean Christopherson
  2021-05-12 15:09 ` [PATCH v2 07/10] KVM: X86: Move write_l1_tsc_offset() logic to common code and rename it Ilias Stamatis
                   ` (3 subsequent siblings)
  9 siblings, 1 reply; 25+ messages in thread
From: Ilias Stamatis @ 2021-05-12 15:09 UTC (permalink / raw)
  To: kvm, linux-kernel, pbonzini
  Cc: mlevitsk, seanjc, vkuznets, wanpengli, jmattson, joro, zamsden,
	mtosatti, dwmw, ilstam

When L2 is entered we need to "merge" the TSC multiplier and TSC offset
values of 01 and 12 together.

The merging is done using the following equations:
  offset_02 = ((offset_01 * mult_12) >> shift_bits) + offset_12
  mult_02 = (mult_01 * mult_12) >> shift_bits

Where shift_bits is kvm_tsc_scaling_ratio_frac_bits.
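The two equations can be checked numerically with a small userspace
sketch (48 fractional bits as on VMX; the ratios and offsets below are
made-up example values, and the helpers model mul_u64_u64_shr() and
mul_s64_u64_shr() with __int128):

```c
#include <assert.h>
#include <stdint.h>

#define FRAC_BITS 48	/* kvm_tsc_scaling_ratio_frac_bits on VMX */

/* Model of mul_u64_u64_shr(a, b, FRAC_BITS). */
static uint64_t mul_u64_u64_shr48(uint64_t a, uint64_t b)
{
	return (uint64_t)(((unsigned __int128)a * b) >> FRAC_BITS);
}

/* Model of mul_s64_u64_shr(a, b, FRAC_BITS): sign handled separately. */
static uint64_t mul_s64_u64_shr48(int64_t a, uint64_t b)
{
	uint64_t abs_a = a < 0 ? -(uint64_t)a : (uint64_t)a;
	uint64_t ret = mul_u64_u64_shr48(abs_a, b);

	return a < 0 ? (uint64_t)-(int64_t)ret : ret;
}

/* mult_02 = (mult_01 * mult_12) >> shift_bits */
static uint64_t merge_mult(uint64_t mult_01, uint64_t mult_12)
{
	return mul_u64_u64_shr48(mult_01, mult_12);
}

/* offset_02 = ((offset_01 * mult_12) >> shift_bits) + offset_12 */
static int64_t merge_offset(int64_t offset_01, int64_t offset_12,
			    uint64_t mult_12)
{
	return (int64_t)mul_s64_u64_shr48(offset_01, mult_12) + offset_12;
}
```

For instance, mult_01 = 2.0 (2ULL << 48) and mult_12 = 0.5
(1ULL << 47) merge to mult_02 = 1.0, and offset_01 = -1000 with
offset_12 = 300 merge to -1000 * 0.5 + 300 = -200.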

Signed-off-by: Ilias Stamatis <ilstam@amazon.com>
---
 arch/x86/include/asm/kvm_host.h |  2 ++
 arch/x86/kvm/x86.c              | 29 +++++++++++++++++++++++++++++
 2 files changed, 31 insertions(+)

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index 4c4a3fefff57..57a25d8e8b0f 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -1793,6 +1793,8 @@ static inline bool kvm_is_supported_user_return_msr(u32 msr)
 u64 kvm_scale_tsc(struct kvm_vcpu *vcpu, u64 tsc);
 u64 kvm_scale_tsc_l1(struct kvm_vcpu *vcpu, u64 tsc);
 u64 kvm_read_l1_tsc(struct kvm_vcpu *vcpu, u64 host_tsc);
+void kvm_set_02_tsc_offset(struct kvm_vcpu *vcpu);
+void kvm_set_02_tsc_multiplier(struct kvm_vcpu *vcpu);
 
 unsigned long kvm_get_linear_rip(struct kvm_vcpu *vcpu);
 bool kvm_is_linear_rip(struct kvm_vcpu *vcpu, unsigned long linear_rip);
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 84af1af7a2cc..1db6cfc2079f 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -2346,6 +2346,35 @@ u64 kvm_read_l1_tsc(struct kvm_vcpu *vcpu, u64 host_tsc)
 }
 EXPORT_SYMBOL_GPL(kvm_read_l1_tsc);
 
+void kvm_set_02_tsc_offset(struct kvm_vcpu *vcpu)
+{
+	u64 l2_offset = static_call(kvm_x86_get_l2_tsc_offset)(vcpu);
+	u64 l2_multiplier = static_call(kvm_x86_get_l2_tsc_multiplier)(vcpu);
+
+	if (l2_multiplier != kvm_default_tsc_scaling_ratio) {
+		vcpu->arch.tsc_offset = mul_s64_u64_shr(
+				(s64) vcpu->arch.l1_tsc_offset,
+				l2_multiplier,
+				kvm_tsc_scaling_ratio_frac_bits);
+	}
+
+	vcpu->arch.tsc_offset += l2_offset;
+}
+EXPORT_SYMBOL_GPL(kvm_set_02_tsc_offset);
+
+void kvm_set_02_tsc_multiplier(struct kvm_vcpu *vcpu)
+{
+	u64 l2_multiplier = static_call(kvm_x86_get_l2_tsc_multiplier)(vcpu);
+
+	if (l2_multiplier != kvm_default_tsc_scaling_ratio) {
+		vcpu->arch.tsc_scaling_ratio = mul_u64_u64_shr(
+				vcpu->arch.l1_tsc_scaling_ratio,
+				l2_multiplier,
+				kvm_tsc_scaling_ratio_frac_bits);
+	}
+}
+EXPORT_SYMBOL_GPL(kvm_set_02_tsc_multiplier);
+
 static void kvm_vcpu_write_tsc_offset(struct kvm_vcpu *vcpu, u64 offset)
 {
 	vcpu->arch.l1_tsc_offset = offset;
-- 
2.17.1



* [PATCH v2 07/10] KVM: X86: Move write_l1_tsc_offset() logic to common code and rename it
  2021-05-12 15:09 [PATCH v2 00/10] KVM: Implement nested TSC scaling Ilias Stamatis
                   ` (5 preceding siblings ...)
  2021-05-12 15:09 ` [PATCH v2 06/10] KVM: X86: Add functions that calculate the 02 TSC offset and multiplier Ilias Stamatis
@ 2021-05-12 15:09 ` Ilias Stamatis
  2021-05-19  0:05   ` Sean Christopherson
  2021-05-12 15:09 ` [PATCH v2 08/10] KVM: VMX: Set the TSC offset and multiplier on nested entry and exit Ilias Stamatis
                   ` (2 subsequent siblings)
  9 siblings, 1 reply; 25+ messages in thread
From: Ilias Stamatis @ 2021-05-12 15:09 UTC (permalink / raw)
  To: kvm, linux-kernel, pbonzini
  Cc: mlevitsk, seanjc, vkuznets, wanpengli, jmattson, joro, zamsden,
	mtosatti, dwmw, ilstam

The write_l1_tsc_offset() callback has a misleading name. It does not
set L1's TSC offset; rather, it updates the current TSC offset, which
might be different when a nested guest is executing. Additionally, both
the VMX and SVM implementations use the same logic for calculating the
current TSC offset before writing it to hardware.

This patch renames the function and moves the common logic to the
caller. The VMX/SVM-specific code now merely writes the given offset
into the corresponding hardware structure.

Signed-off-by: Ilias Stamatis <ilstam@amazon.com>
---
 arch/x86/include/asm/kvm-x86-ops.h |  2 +-
 arch/x86/include/asm/kvm_host.h    |  3 +--
 arch/x86/kvm/svm/svm.c             | 21 ++++-----------------
 arch/x86/kvm/vmx/vmx.c             | 23 +++--------------------
 arch/x86/kvm/x86.c                 | 17 ++++++++++++++++-
 5 files changed, 25 insertions(+), 41 deletions(-)

diff --git a/arch/x86/include/asm/kvm-x86-ops.h b/arch/x86/include/asm/kvm-x86-ops.h
index 2063616fba1c..029c9615378f 100644
--- a/arch/x86/include/asm/kvm-x86-ops.h
+++ b/arch/x86/include/asm/kvm-x86-ops.h
@@ -89,7 +89,7 @@ KVM_X86_OP(load_mmu_pgd)
 KVM_X86_OP_NULL(has_wbinvd_exit)
 KVM_X86_OP(get_l2_tsc_offset)
 KVM_X86_OP(get_l2_tsc_multiplier)
-KVM_X86_OP(write_l1_tsc_offset)
+KVM_X86_OP(write_tsc_offset)
 KVM_X86_OP(get_exit_info)
 KVM_X86_OP(check_intercept)
 KVM_X86_OP(handle_exit_irqoff)
diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index 57a25d8e8b0f..61cf201c001a 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -1307,8 +1307,7 @@ struct kvm_x86_ops {
 
 	u64 (*get_l2_tsc_offset)(struct kvm_vcpu *vcpu);
 	u64 (*get_l2_tsc_multiplier)(struct kvm_vcpu *vcpu);
-	/* Returns actual tsc_offset set in active VMCS */
-	u64 (*write_l1_tsc_offset)(struct kvm_vcpu *vcpu, u64 offset);
+	void (*write_tsc_offset)(struct kvm_vcpu *vcpu, u64 offset);
 
 	/*
 	 * Retrieve somewhat arbitrary exit information.  Intended to be used
diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
index 679b2fc1a3f9..b18f60463073 100644
--- a/arch/x86/kvm/svm/svm.c
+++ b/arch/x86/kvm/svm/svm.c
@@ -1094,26 +1094,13 @@ static u64 svm_get_l2_tsc_multiplier(struct kvm_vcpu *vcpu)
 	return kvm_default_tsc_scaling_ratio;
 }
 
-static u64 svm_write_l1_tsc_offset(struct kvm_vcpu *vcpu, u64 offset)
+static void svm_write_tsc_offset(struct kvm_vcpu *vcpu, u64 offset)
 {
 	struct vcpu_svm *svm = to_svm(vcpu);
-	u64 g_tsc_offset = 0;
-
-	if (is_guest_mode(vcpu)) {
-		/* Write L1's TSC offset.  */
-		g_tsc_offset = svm->vmcb->control.tsc_offset -
-			       svm->vmcb01.ptr->control.tsc_offset;
-		svm->vmcb01.ptr->control.tsc_offset = offset;
-	}
-
-	trace_kvm_write_tsc_offset(vcpu->vcpu_id,
-				   svm->vmcb->control.tsc_offset - g_tsc_offset,
-				   offset);
-
-	svm->vmcb->control.tsc_offset = offset + g_tsc_offset;
 
+	svm->vmcb01.ptr->control.tsc_offset = vcpu->arch.l1_tsc_offset;
+	svm->vmcb->control.tsc_offset = offset;
 	vmcb_mark_dirty(svm->vmcb, VMCB_INTERCEPTS);
-	return svm->vmcb->control.tsc_offset;
 }
 
 /* Evaluate instruction intercepts that depend on guest CPUID features. */
@@ -4540,7 +4527,7 @@ static struct kvm_x86_ops svm_x86_ops __initdata = {
 
 	.get_l2_tsc_offset = svm_get_l2_tsc_offset,
 	.get_l2_tsc_multiplier = svm_get_l2_tsc_multiplier,
-	.write_l1_tsc_offset = svm_write_l1_tsc_offset,
+	.write_tsc_offset = svm_write_tsc_offset,
 
 	.load_mmu_pgd = svm_load_mmu_pgd,
 
diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
index 575e13bddda8..3c4eb14a1e86 100644
--- a/arch/x86/kvm/vmx/vmx.c
+++ b/arch/x86/kvm/vmx/vmx.c
@@ -1810,26 +1810,9 @@ static u64 vmx_get_l2_tsc_multiplier(struct kvm_vcpu *vcpu)
 	return multiplier;
 }
 
-static u64 vmx_write_l1_tsc_offset(struct kvm_vcpu *vcpu, u64 offset)
+static void vmx_write_tsc_offset(struct kvm_vcpu *vcpu, u64 offset)
 {
-	struct vmcs12 *vmcs12 = get_vmcs12(vcpu);
-	u64 g_tsc_offset = 0;
-
-	/*
-	 * We're here if L1 chose not to trap WRMSR to TSC. According
-	 * to the spec, this should set L1's TSC; The offset that L1
-	 * set for L2 remains unchanged, and still needs to be added
-	 * to the newly set TSC to get L2's TSC.
-	 */
-	if (is_guest_mode(vcpu) &&
-	    (vmcs12->cpu_based_vm_exec_control & CPU_BASED_USE_TSC_OFFSETTING))
-		g_tsc_offset = vmcs12->tsc_offset;
-
-	trace_kvm_write_tsc_offset(vcpu->vcpu_id,
-				   vcpu->arch.tsc_offset - g_tsc_offset,
-				   offset);
-	vmcs_write64(TSC_OFFSET, offset + g_tsc_offset);
-	return offset + g_tsc_offset;
+	vmcs_write64(TSC_OFFSET, offset);
 }
 
 /*
@@ -7725,7 +7708,7 @@ static struct kvm_x86_ops vmx_x86_ops __initdata = {
 
 	.get_l2_tsc_offset = vmx_get_l2_tsc_offset,
 	.get_l2_tsc_multiplier = vmx_get_l2_tsc_multiplier,
-	.write_l1_tsc_offset = vmx_write_l1_tsc_offset,
+	.write_tsc_offset = vmx_write_tsc_offset,
 
 	.load_mmu_pgd = vmx_load_mmu_pgd,
 
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 1db6cfc2079f..f3ba1be4d5b9 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -2377,8 +2377,23 @@ EXPORT_SYMBOL_GPL(kvm_set_02_tsc_multiplier);
 
 static void kvm_vcpu_write_tsc_offset(struct kvm_vcpu *vcpu, u64 offset)
 {
+	trace_kvm_write_tsc_offset(vcpu->vcpu_id,
+				   vcpu->arch.l1_tsc_offset,
+				   offset);
+
 	vcpu->arch.l1_tsc_offset = offset;
-	vcpu->arch.tsc_offset = static_call(kvm_x86_write_l1_tsc_offset)(vcpu, offset);
+	vcpu->arch.tsc_offset = offset;
+
+	if (is_guest_mode(vcpu)) {
+		/*
+		 * We're here if L1 chose not to trap WRMSR to TSC and
+		 * according to the spec this should set L1's TSC (as opposed
+		 * to setting L1's offset for L2).
+		 */
+		kvm_set_02_tsc_offset(vcpu);
+	}
+
+	static_call(kvm_x86_write_tsc_offset)(vcpu, vcpu->arch.tsc_offset);
 }
 
 static inline bool kvm_check_tsc_unstable(void)
-- 
2.17.1



* [PATCH v2 08/10] KVM: VMX: Set the TSC offset and multiplier on nested entry and exit
  2021-05-12 15:09 [PATCH v2 00/10] KVM: Implement nested TSC scaling Ilias Stamatis
                   ` (6 preceding siblings ...)
  2021-05-12 15:09 ` [PATCH v2 07/10] KVM: X86: Move write_l1_tsc_offset() logic to common code and rename it Ilias Stamatis
@ 2021-05-12 15:09 ` Ilias Stamatis
  2021-05-19  0:07   ` Sean Christopherson
  2021-05-12 15:09 ` [PATCH v2 09/10] KVM: VMX: Expose TSC scaling to L2 Ilias Stamatis
  2021-05-12 15:09 ` [PATCH v2 10/10] KVM: selftests: x86: Add vmx_nested_tsc_scaling_test Ilias Stamatis
  9 siblings, 1 reply; 25+ messages in thread
From: Ilias Stamatis @ 2021-05-12 15:09 UTC (permalink / raw)
  To: kvm, linux-kernel, pbonzini
  Cc: mlevitsk, seanjc, vkuznets, wanpengli, jmattson, joro, zamsden,
	mtosatti, dwmw, ilstam

Now that nested TSC scaling is supported, we need to calculate the
correct 02 values for both the offset and the multiplier using the
corresponding functions. On L2's exit the L1 values are restored.

Signed-off-by: Ilias Stamatis <ilstam@amazon.com>
---
 arch/x86/kvm/vmx/nested.c | 13 +++++++++----
 1 file changed, 9 insertions(+), 4 deletions(-)

diff --git a/arch/x86/kvm/vmx/nested.c b/arch/x86/kvm/vmx/nested.c
index 6058a65a6ede..f1dff1ebaccb 100644
--- a/arch/x86/kvm/vmx/nested.c
+++ b/arch/x86/kvm/vmx/nested.c
@@ -3354,8 +3354,9 @@ enum nvmx_vmentry_status nested_vmx_enter_non_root_mode(struct kvm_vcpu *vcpu,
 	}
 
 	enter_guest_mode(vcpu);
-	if (vmcs12->cpu_based_vm_exec_control & CPU_BASED_USE_TSC_OFFSETTING)
-		vcpu->arch.tsc_offset += vmcs12->tsc_offset;
+
+	kvm_set_02_tsc_offset(vcpu);
+	kvm_set_02_tsc_multiplier(vcpu);
 
 	if (prepare_vmcs02(vcpu, vmcs12, &entry_failure_code)) {
 		exit_reason.basic = EXIT_REASON_INVALID_STATE;
@@ -4463,8 +4464,12 @@ void nested_vmx_vmexit(struct kvm_vcpu *vcpu, u32 vm_exit_reason,
 	if (nested_cpu_has_preemption_timer(vmcs12))
 		hrtimer_cancel(&to_vmx(vcpu)->nested.preemption_timer);
 
-	if (vmcs12->cpu_based_vm_exec_control & CPU_BASED_USE_TSC_OFFSETTING)
-		vcpu->arch.tsc_offset -= vmcs12->tsc_offset;
+	if (vmcs12->cpu_based_vm_exec_control & CPU_BASED_USE_TSC_OFFSETTING) {
+		vcpu->arch.tsc_offset = vcpu->arch.l1_tsc_offset;
+
+		if (vmcs12->secondary_vm_exec_control & SECONDARY_EXEC_TSC_SCALING)
+			vcpu->arch.tsc_scaling_ratio = vcpu->arch.l1_tsc_scaling_ratio;
+	}
 
 	if (likely(!vmx->fail)) {
 		sync_vmcs02_to_vmcs12(vcpu, vmcs12);
-- 
2.17.1



* [PATCH v2 09/10] KVM: VMX: Expose TSC scaling to L2
  2021-05-12 15:09 [PATCH v2 00/10] KVM: Implement nested TSC scaling Ilias Stamatis
                   ` (7 preceding siblings ...)
  2021-05-12 15:09 ` [PATCH v2 08/10] KVM: VMX: Set the TSC offset and multiplier on nested entry and exit Ilias Stamatis
@ 2021-05-12 15:09 ` Ilias Stamatis
  2021-05-12 15:09 ` [PATCH v2 10/10] KVM: selftests: x86: Add vmx_nested_tsc_scaling_test Ilias Stamatis
  9 siblings, 0 replies; 25+ messages in thread
From: Ilias Stamatis @ 2021-05-12 15:09 UTC (permalink / raw)
  To: kvm, linux-kernel, pbonzini
  Cc: mlevitsk, seanjc, vkuznets, wanpengli, jmattson, joro, zamsden,
	mtosatti, dwmw, ilstam

Expose the TSC scaling feature to nested guests.

Signed-off-by: Ilias Stamatis <ilstam@amazon.com>
Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com>
---
 arch/x86/kvm/vmx/nested.c | 6 ++++--
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/arch/x86/kvm/vmx/nested.c b/arch/x86/kvm/vmx/nested.c
index f1dff1ebaccb..d431e17dbc5b 100644
--- a/arch/x86/kvm/vmx/nested.c
+++ b/arch/x86/kvm/vmx/nested.c
@@ -2277,7 +2277,8 @@ static void prepare_vmcs02_early(struct vcpu_vmx *vmx, struct vmcs12 *vmcs12)
 				  SECONDARY_EXEC_ENABLE_USR_WAIT_PAUSE |
 				  SECONDARY_EXEC_VIRTUAL_INTR_DELIVERY |
 				  SECONDARY_EXEC_APIC_REGISTER_VIRT |
-				  SECONDARY_EXEC_ENABLE_VMFUNC);
+				  SECONDARY_EXEC_ENABLE_VMFUNC |
+				  SECONDARY_EXEC_TSC_SCALING);
 		if (nested_cpu_has(vmcs12,
 				   CPU_BASED_ACTIVATE_SECONDARY_CONTROLS))
 			exec_control |= vmcs12->secondary_vm_exec_control;
@@ -6479,7 +6480,8 @@ void nested_vmx_setup_ctls_msrs(struct nested_vmx_msrs *msrs, u32 ept_caps)
 		SECONDARY_EXEC_RDRAND_EXITING |
 		SECONDARY_EXEC_ENABLE_INVPCID |
 		SECONDARY_EXEC_RDSEED_EXITING |
-		SECONDARY_EXEC_XSAVES;
+		SECONDARY_EXEC_XSAVES |
+		SECONDARY_EXEC_TSC_SCALING;
 
 	/*
 	 * We can emulate "VMCS shadowing," even if the hardware
-- 
2.17.1



* [PATCH v2 10/10] KVM: selftests: x86: Add vmx_nested_tsc_scaling_test
  2021-05-12 15:09 [PATCH v2 00/10] KVM: Implement nested TSC scaling Ilias Stamatis
                   ` (8 preceding siblings ...)
  2021-05-12 15:09 ` [PATCH v2 09/10] KVM: VMX: Expose TSC scaling to L2 Ilias Stamatis
@ 2021-05-12 15:09 ` Ilias Stamatis
  9 siblings, 0 replies; 25+ messages in thread
From: Ilias Stamatis @ 2021-05-12 15:09 UTC (permalink / raw)
  To: kvm, linux-kernel, pbonzini
  Cc: mlevitsk, seanjc, vkuznets, wanpengli, jmattson, joro, zamsden,
	mtosatti, dwmw, ilstam

Test that nested TSC scaling works as expected with both L1 and L2
scaled.

Signed-off-by: Ilias Stamatis <ilstam@amazon.com>
---
 tools/testing/selftests/kvm/.gitignore        |   1 +
 tools/testing/selftests/kvm/Makefile          |   1 +
 .../kvm/x86_64/vmx_nested_tsc_scaling_test.c  | 242 ++++++++++++++++++
 3 files changed, 244 insertions(+)
 create mode 100644 tools/testing/selftests/kvm/x86_64/vmx_nested_tsc_scaling_test.c

diff --git a/tools/testing/selftests/kvm/.gitignore b/tools/testing/selftests/kvm/.gitignore
index bd83158e0e0b..cc02022f9951 100644
--- a/tools/testing/selftests/kvm/.gitignore
+++ b/tools/testing/selftests/kvm/.gitignore
@@ -29,6 +29,7 @@
 /x86_64/vmx_preemption_timer_test
 /x86_64/vmx_set_nested_state_test
 /x86_64/vmx_tsc_adjust_test
+/x86_64/vmx_nested_tsc_scaling_test
 /x86_64/xapic_ipi_test
 /x86_64/xen_shinfo_test
 /x86_64/xen_vmcall_test
diff --git a/tools/testing/selftests/kvm/Makefile b/tools/testing/selftests/kvm/Makefile
index e439d027939d..1078240b1313 100644
--- a/tools/testing/selftests/kvm/Makefile
+++ b/tools/testing/selftests/kvm/Makefile
@@ -60,6 +60,7 @@ TEST_GEN_PROGS_x86_64 += x86_64/vmx_close_while_nested_test
 TEST_GEN_PROGS_x86_64 += x86_64/vmx_dirty_log_test
 TEST_GEN_PROGS_x86_64 += x86_64/vmx_set_nested_state_test
 TEST_GEN_PROGS_x86_64 += x86_64/vmx_tsc_adjust_test
+TEST_GEN_PROGS_x86_64 += x86_64/vmx_nested_tsc_scaling_test
 TEST_GEN_PROGS_x86_64 += x86_64/xapic_ipi_test
 TEST_GEN_PROGS_x86_64 += x86_64/xss_msr_test
 TEST_GEN_PROGS_x86_64 += x86_64/debug_regs
diff --git a/tools/testing/selftests/kvm/x86_64/vmx_nested_tsc_scaling_test.c b/tools/testing/selftests/kvm/x86_64/vmx_nested_tsc_scaling_test.c
new file mode 100644
index 000000000000..f4a8f426281a
--- /dev/null
+++ b/tools/testing/selftests/kvm/x86_64/vmx_nested_tsc_scaling_test.c
@@ -0,0 +1,242 @@
+// SPDX-License-Identifier: GPL-2.0-only
+/*
+ * vmx_nested_tsc_scaling_test
+ *
+ * Copyright (C) 2021 Amazon.com, Inc. or its affiliates.
+ *
+ * This test case verifies that nested TSC scaling behaves as expected when
+ * both L1 and L2 are scaled using different ratios. For this test we scale
+ * L1 down and scale L2 up.
+ */
+
+#include <time.h>
+
+#include "kvm_util.h"
+#include "vmx.h"
+#include "kselftest.h"
+
+
+#define VCPU_ID 0
+
+/* L2 is scaled up (from L1's perspective) by this factor */
+#define L2_SCALE_FACTOR 4ULL
+
+#define TSC_OFFSET_L2 ((uint64_t) -33125236320908)
+#define TSC_MULTIPLIER_L2 (L2_SCALE_FACTOR << 48)
+
+#define L2_GUEST_STACK_SIZE 64
+
+enum { USLEEP, UCHECK_L1, UCHECK_L2 };
+#define GUEST_SLEEP(sec)         ucall(UCALL_SYNC, 2, USLEEP, sec)
+#define GUEST_CHECK(level, freq) ucall(UCALL_SYNC, 2, level, freq)
+
+
+/*
+ * This function checks whether the "actual" TSC frequency of a guest matches
+ * its expected frequency. In order to account for delays in taking the TSC
+ * measurements, a difference of 1% between the actual and the expected value
+ * is tolerated.
+ */
+static void compare_tsc_freq(uint64_t actual, uint64_t expected)
+{
+	uint64_t tolerance, thresh_low, thresh_high;
+
+	tolerance = expected / 100;
+	thresh_low = expected - tolerance;
+	thresh_high = expected + tolerance;
+
+	TEST_ASSERT(thresh_low < actual,
+		"TSC freq is expected to be between %"PRIu64" and %"PRIu64
+		" but it actually is %"PRIu64,
+		thresh_low, thresh_high, actual);
+	TEST_ASSERT(thresh_high > actual,
+		"TSC freq is expected to be between %"PRIu64" and %"PRIu64
+		" but it actually is %"PRIu64,
+		thresh_low, thresh_high, actual);
+}
+
+static void check_tsc_freq(int level)
+{
+	uint64_t tsc_start, tsc_end, tsc_freq;
+
+	/*
+	 * Reading the TSC twice with about a second's difference should give
+	 * us an approximation of the TSC frequency from the guest's
+	 * perspective. Now, this won't be completely accurate, but it should
+	 * be good enough for the purposes of this test.
+	 */
+	tsc_start = rdmsr(MSR_IA32_TSC);
+	GUEST_SLEEP(1);
+	tsc_end = rdmsr(MSR_IA32_TSC);
+
+	tsc_freq = tsc_end - tsc_start;
+
+	GUEST_CHECK(level, tsc_freq);
+}
+
+static void l2_guest_code(void)
+{
+	check_tsc_freq(UCHECK_L2);
+
+	/* exit to L1 */
+	__asm__ __volatile__("vmcall");
+}
+
+static void l1_guest_code(struct vmx_pages *vmx_pages)
+{
+	unsigned long l2_guest_stack[L2_GUEST_STACK_SIZE];
+	uint32_t control;
+
+	/* check that L1's frequency looks alright before launching L2 */
+	check_tsc_freq(UCHECK_L1);
+
+	GUEST_ASSERT(prepare_for_vmx_operation(vmx_pages));
+	GUEST_ASSERT(load_vmcs(vmx_pages));
+
+	/* prepare the VMCS for L2 execution */
+	prepare_vmcs(vmx_pages, l2_guest_code, &l2_guest_stack[L2_GUEST_STACK_SIZE]);
+
+	/* enable TSC offsetting and TSC scaling for L2 */
+	control = vmreadz(CPU_BASED_VM_EXEC_CONTROL);
+	control |= CPU_BASED_USE_MSR_BITMAPS | CPU_BASED_USE_TSC_OFFSETTING;
+	vmwrite(CPU_BASED_VM_EXEC_CONTROL, control);
+
+	control = vmreadz(SECONDARY_VM_EXEC_CONTROL);
+	control |= SECONDARY_EXEC_TSC_SCALING;
+	vmwrite(SECONDARY_VM_EXEC_CONTROL, control);
+
+	vmwrite(TSC_OFFSET, TSC_OFFSET_L2);
+	vmwrite(TSC_MULTIPLIER, TSC_MULTIPLIER_L2);
+	vmwrite(TSC_MULTIPLIER_HIGH, TSC_MULTIPLIER_L2 >> 32);
+
+	/* launch L2 */
+	GUEST_ASSERT(!vmlaunch());
+	GUEST_ASSERT(vmreadz(VM_EXIT_REASON) == EXIT_REASON_VMCALL);
+
+	/* check that L1's frequency still looks good */
+	check_tsc_freq(UCHECK_L1);
+
+	GUEST_DONE();
+}
+
+static void tsc_scaling_check_supported(void)
+{
+	if (!kvm_check_cap(KVM_CAP_TSC_CONTROL)) {
+		print_skip("TSC scaling not supported by the HW");
+		exit(KSFT_SKIP);
+	}
+}
+
+static void stable_tsc_check_supported(void)
+{
+	FILE *fp;
+	char buf[4];
+
+	fp = fopen("/sys/devices/system/clocksource/clocksource0/current_clocksource", "r");
+	if (fp == NULL)
+		goto skip_test;
+
+	if (fgets(buf, sizeof(buf), fp) == NULL)
+		goto skip_test;
+
+	if (strncmp(buf, "tsc", sizeof(buf)))
+		goto skip_test;
+
+	return;
+skip_test:
+	print_skip("TSC is not stable");
+	exit(KSFT_SKIP);
+}
+
+int main(int argc, char *argv[])
+{
+	struct kvm_vm *vm;
+	vm_vaddr_t vmx_pages_gva;
+
+	uint64_t tsc_start, tsc_end;
+	uint64_t tsc_khz;
+	uint64_t l1_scale_factor;
+	uint64_t l0_tsc_freq = 0;
+	uint64_t l1_tsc_freq = 0;
+	uint64_t l2_tsc_freq = 0;
+
+	nested_vmx_check_supported();
+	tsc_scaling_check_supported();
+	stable_tsc_check_supported();
+
+	/*
+	 * We set L1's scale factor to be a random number from 2 to 10.
+	 * Ideally we would do the same for L2's factor but that one is
+	 * referenced by both main() and l1_guest_code() and using a global
+	 * variable does not work.
+	 */
+	srand(time(NULL));
+	l1_scale_factor = (rand() % 9) + 2;
+	printf("L1's scale factor is: %"PRIu64"\n", l1_scale_factor);
+	printf("L2's scale factor is: %llu\n", L2_SCALE_FACTOR);
+
+	tsc_start = rdtsc();
+	sleep(1);
+	tsc_end = rdtsc();
+
+	l0_tsc_freq = tsc_end - tsc_start;
+	printf("real TSC frequency is around: %"PRIu64"\n", l0_tsc_freq);
+
+	vm = vm_create_default(VCPU_ID, 0, (void *) l1_guest_code);
+	vcpu_alloc_vmx(vm, &vmx_pages_gva);
+	vcpu_args_set(vm, VCPU_ID, 1, vmx_pages_gva);
+
+	tsc_khz = _vcpu_ioctl(vm, VCPU_ID, KVM_GET_TSC_KHZ, NULL);
+	TEST_ASSERT(tsc_khz != -1, "vcpu ioctl KVM_GET_TSC_KHZ failed");
+
+	/* scale down L1's TSC frequency */
+	vcpu_ioctl(vm, VCPU_ID, KVM_SET_TSC_KHZ,
+		  (void *) (tsc_khz / l1_scale_factor));
+
+	for (;;) {
+		volatile struct kvm_run *run = vcpu_state(vm, VCPU_ID);
+		struct ucall uc;
+
+		vcpu_run(vm, VCPU_ID);
+		TEST_ASSERT(run->exit_reason == KVM_EXIT_IO,
+			    "Got exit_reason other than KVM_EXIT_IO: %u (%s)\n",
+			    run->exit_reason,
+			    exit_reason_str(run->exit_reason));
+
+		switch (get_ucall(vm, VCPU_ID, &uc)) {
+		case UCALL_ABORT:
+			TEST_FAIL("%s", (const char *) uc.args[0]);
+		case UCALL_SYNC:
+			switch (uc.args[0]) {
+			case USLEEP:
+				sleep(uc.args[1]);
+				break;
+			case UCHECK_L1:
+				l1_tsc_freq = uc.args[1];
+				printf("L1's TSC frequency is around: %"PRIu64
+				       "\n", l1_tsc_freq);
+
+				compare_tsc_freq(l1_tsc_freq,
+						 l0_tsc_freq / l1_scale_factor);
+				break;
+			case UCHECK_L2:
+				l2_tsc_freq = uc.args[1];
+				printf("L2's TSC frequency is around: %"PRIu64
+				       "\n", l2_tsc_freq);
+
+				compare_tsc_freq(l2_tsc_freq,
+						 l1_tsc_freq * L2_SCALE_FACTOR);
+				break;
+			}
+			break;
+		case UCALL_DONE:
+			goto done;
+		default:
+			TEST_FAIL("Unknown ucall %lu", uc.cmd);
+		}
+	}
+
+done:
+	kvm_vm_free(vm);
+	return 0;
+}
-- 
2.17.1



* Re: [PATCH v2 01/10] math64.h: Add mul_s64_u64_shr()
  2021-05-12 15:09 ` [PATCH v2 01/10] math64.h: Add mul_s64_u64_shr() Ilias Stamatis
@ 2021-05-18 22:36   ` Sean Christopherson
  0 siblings, 0 replies; 25+ messages in thread
From: Sean Christopherson @ 2021-05-18 22:36 UTC (permalink / raw)
  To: Ilias Stamatis
  Cc: kvm, linux-kernel, pbonzini, mlevitsk, vkuznets, wanpengli,
	jmattson, joro, zamsden, mtosatti, dwmw

On Wed, May 12, 2021, Ilias Stamatis wrote:
> This function is needed for the nested TSC scaling code where we need to

s/the nested/KVM's nested virtualization.  Non-KVM folk will have no idea what
"nested" means without context.  They may not know what nested virtualization is,
but this will at least give them a starting point for asking the right questions.

> multiply the signed TSC offset with the unsigned TSC multiplier.


* Re: [PATCH v2 02/10] KVM: X86: Store L1's TSC scaling ratio in 'struct kvm_vcpu_arch'
  2021-05-12 15:09 ` [PATCH v2 02/10] KVM: X86: Store L1's TSC scaling ratio in 'struct kvm_vcpu_arch' Ilias Stamatis
@ 2021-05-18 22:52   ` Sean Christopherson
  2021-05-19  8:54     ` Stamatis, Ilias
  0 siblings, 1 reply; 25+ messages in thread
From: Sean Christopherson @ 2021-05-18 22:52 UTC (permalink / raw)
  To: Ilias Stamatis
  Cc: kvm, linux-kernel, pbonzini, mlevitsk, vkuznets, wanpengli,
	jmattson, joro, zamsden, mtosatti, dwmw

On Wed, May 12, 2021, Ilias Stamatis wrote:
> Store L1's scaling ratio in that struct like we already do for L1's TSC

s/that struct/kvm_vcpu_arch.  Forcing the reader to look at the subject to
understand the changelog is annoying, especially when it saves all of a handful
of characters.  E.g. I often read patches without the subject in scope.

> offset. This allows for easy save/restore when we enter and then exit
> the nested guest.
> 
> Signed-off-by: Ilias Stamatis <ilstam@amazon.com>
> Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com>
> ---

...

> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> index 9b6bca616929..07cf5d7ece38 100644
> --- a/arch/x86/kvm/x86.c
> +++ b/arch/x86/kvm/x86.c
> @@ -2185,6 +2185,7 @@ static int set_tsc_khz(struct kvm_vcpu *vcpu, u32 user_tsc_khz, bool scale)
>  
>  	/* Guest TSC same frequency as host TSC? */
>  	if (!scale) {
> +		vcpu->arch.l1_tsc_scaling_ratio = kvm_default_tsc_scaling_ratio;
>  		vcpu->arch.tsc_scaling_ratio = kvm_default_tsc_scaling_ratio;

Looks like these are always set as a pair, maybe add a helper, e.g.

static void kvm_set_l1_tsc_scaling_ratio(struct kvm_vcpu *vcpu, u64 ratio)
{
	vcpu->arch.l1_tsc_scaling_ratio = ratio;
	vcpu->arch.tsc_scaling_ratio = ratio;
}

>  		return 0;
>  	}
> @@ -2211,7 +2212,7 @@ static int set_tsc_khz(struct kvm_vcpu *vcpu, u32 user_tsc_khz, bool scale)
>  		return -1;
>  	}
>  
> -	vcpu->arch.tsc_scaling_ratio = ratio;
> +	vcpu->arch.l1_tsc_scaling_ratio = vcpu->arch.tsc_scaling_ratio = ratio;
>  	return 0;
>  }
>  
> @@ -2223,6 +2224,7 @@ static int kvm_set_tsc_khz(struct kvm_vcpu *vcpu, u32 user_tsc_khz)
>  	/* tsc_khz can be zero if TSC calibration fails */
>  	if (user_tsc_khz == 0) {
>  		/* set tsc_scaling_ratio to a safe value */
> +		vcpu->arch.l1_tsc_scaling_ratio = kvm_default_tsc_scaling_ratio;
>  		vcpu->arch.tsc_scaling_ratio = kvm_default_tsc_scaling_ratio;
>  		return -1;
>  	}
> @@ -2459,7 +2461,7 @@ static inline void adjust_tsc_offset_guest(struct kvm_vcpu *vcpu,
>  
>  static inline void adjust_tsc_offset_host(struct kvm_vcpu *vcpu, s64 adjustment)
>  {
> -	if (vcpu->arch.tsc_scaling_ratio != kvm_default_tsc_scaling_ratio)
> +	if (vcpu->arch.l1_tsc_scaling_ratio != kvm_default_tsc_scaling_ratio)
>  		WARN_ON(adjustment < 0);
>  	adjustment = kvm_scale_tsc(vcpu, (u64) adjustment);
>  	adjust_tsc_offset_guest(vcpu, adjustment);
> -- 
> 2.17.1
> 


* Re: [PATCH v2 03/10] KVM: X86: Add kvm_scale_tsc_l1() and kvm_compute_tsc_offset_l1()
  2021-05-12 15:09 ` [PATCH v2 03/10] KVM: X86: Add kvm_scale_tsc_l1() and kvm_compute_tsc_offset_l1() Ilias Stamatis
@ 2021-05-18 23:04   ` Sean Christopherson
  2021-05-19  9:02     ` Stamatis, Ilias
  0 siblings, 1 reply; 25+ messages in thread
From: Sean Christopherson @ 2021-05-18 23:04 UTC (permalink / raw)
  To: Ilias Stamatis
  Cc: kvm, linux-kernel, pbonzini, mlevitsk, vkuznets, wanpengli,
	jmattson, joro, zamsden, mtosatti, dwmw

On Wed, May 12, 2021, Ilias Stamatis wrote:
> The existing kvm_scale_tsc() scales the TSC using the current TSC
> scaling ratio. That used to be the same as L1's scaling ratio but now
> with nested TSC scaling support it is no longer the case.
> 
> This patch adds a new kvm_scale_tsc_l1() function that scales the TSC
> using L1's scaling ratio. The existing kvm_scale_tsc() can still be used
> for scaling L2 TSC values.
> 
> Additionally, this patch renames the kvm_compute_tsc_offset() function
> to kvm_compute_tsc_offset_l1() and has the function treat its TSC
> argument as an L1 TSC value. All existing code uses this function
> passing L1 values to it.
> 
> Signed-off-by: Ilias Stamatis <ilstam@amazon.com>
> ---
>  arch/x86/include/asm/kvm_host.h |  1 +
>  arch/x86/kvm/x86.c              | 41 ++++++++++++++++++++++-----------
>  2 files changed, 29 insertions(+), 13 deletions(-)
> 
> diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
> index 7dfc609eacd6..be59197e5eb7 100644
> --- a/arch/x86/include/asm/kvm_host.h
> +++ b/arch/x86/include/asm/kvm_host.h
> @@ -1789,6 +1789,7 @@ static inline bool kvm_is_supported_user_return_msr(u32 msr)
>  }
>  
>  u64 kvm_scale_tsc(struct kvm_vcpu *vcpu, u64 tsc);
> +u64 kvm_scale_tsc_l1(struct kvm_vcpu *vcpu, u64 tsc);
>  u64 kvm_read_l1_tsc(struct kvm_vcpu *vcpu, u64 host_tsc);

I don't really care which version is used, but we should be consistent, i.e. choose
kvm_<action>_l1_tsc or kvm_<action>_tsc_l1, not both.  The easy choice is the
former since it's already there.

>  unsigned long kvm_get_linear_rip(struct kvm_vcpu *vcpu);
> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> index 07cf5d7ece38..84af1af7a2cc 100644
> --- a/arch/x86/kvm/x86.c
> +++ b/arch/x86/kvm/x86.c
> @@ -2319,18 +2319,30 @@ u64 kvm_scale_tsc(struct kvm_vcpu *vcpu, u64 tsc)
>  }
>  EXPORT_SYMBOL_GPL(kvm_scale_tsc);
>  
> -static u64 kvm_compute_tsc_offset(struct kvm_vcpu *vcpu, u64 target_tsc)
> +u64 kvm_scale_tsc_l1(struct kvm_vcpu *vcpu, u64 tsc)
> +{
> +	u64 _tsc = tsc;
> +	u64 ratio = vcpu->arch.l1_tsc_scaling_ratio;
> +
> +	if (ratio != kvm_default_tsc_scaling_ratio)
> +		_tsc = __scale_tsc(ratio, tsc);
> +
> +	return _tsc;
> +}

Just make the ratio a param.  This is a complete copy+paste of kvm_scale_tsc(),
with 3 characters added.  And all of the callers are already in an L1-specific
function or have L1 vs. L2 awareness.  IMO, that makes the code less magical, too,
as I don't have to dive into a helper to see that it reads l1_tsc_scaling_ratio
versus tsc_scaling_ratio.

> +EXPORT_SYMBOL_GPL(kvm_scale_tsc_l1);
> +
> +static u64 kvm_compute_tsc_offset_l1(struct kvm_vcpu *vcpu, u64 target_tsc)
>  {
>  	u64 tsc;
>  
> -	tsc = kvm_scale_tsc(vcpu, rdtsc());
> +	tsc = kvm_scale_tsc_l1(vcpu, rdtsc());
>  
>  	return target_tsc - tsc;
>  }
>  
>  u64 kvm_read_l1_tsc(struct kvm_vcpu *vcpu, u64 host_tsc)
>  {
> -	return vcpu->arch.l1_tsc_offset + kvm_scale_tsc(vcpu, host_tsc);
> +	return vcpu->arch.l1_tsc_offset + kvm_scale_tsc_l1(vcpu, host_tsc);
>  }
>  EXPORT_SYMBOL_GPL(kvm_read_l1_tsc);
>  
> @@ -2363,7 +2375,7 @@ static void kvm_synchronize_tsc(struct kvm_vcpu *vcpu, u64 data)
>  	bool synchronizing = false;
>  
>  	raw_spin_lock_irqsave(&kvm->arch.tsc_write_lock, flags);
> -	offset = kvm_compute_tsc_offset(vcpu, data);
> +	offset = kvm_compute_tsc_offset_l1(vcpu, data);
>  	ns = get_kvmclock_base_ns();
>  	elapsed = ns - kvm->arch.last_tsc_nsec;
>  
> @@ -2402,7 +2414,7 @@ static void kvm_synchronize_tsc(struct kvm_vcpu *vcpu, u64 data)
>  		} else {
>  			u64 delta = nsec_to_cycles(vcpu, elapsed);
>  			data += delta;
> -			offset = kvm_compute_tsc_offset(vcpu, data);
> +			offset = kvm_compute_tsc_offset_l1(vcpu, data);
>  		}
>  		matched = true;
>  		already_matched = (vcpu->arch.this_tsc_generation == kvm->arch.cur_tsc_generation);
> @@ -2463,7 +2475,7 @@ static inline void adjust_tsc_offset_host(struct kvm_vcpu *vcpu, s64 adjustment)
>  {
>  	if (vcpu->arch.l1_tsc_scaling_ratio != kvm_default_tsc_scaling_ratio)
>  		WARN_ON(adjustment < 0);
> -	adjustment = kvm_scale_tsc(vcpu, (u64) adjustment);
> +	adjustment = kvm_scale_tsc_l1(vcpu, (u64) adjustment);
>  	adjust_tsc_offset_guest(vcpu, adjustment);
>  }
>  
> @@ -2846,7 +2858,7 @@ static int kvm_guest_time_update(struct kvm_vcpu *v)
>  	/* With all the info we got, fill in the values */
>  
>  	if (kvm_has_tsc_control)
> -		tgt_tsc_khz = kvm_scale_tsc(v, tgt_tsc_khz);
> +		tgt_tsc_khz = kvm_scale_tsc_l1(v, tgt_tsc_khz);
>  
>  	if (unlikely(vcpu->hw_tsc_khz != tgt_tsc_khz)) {
>  		kvm_get_time_scale(NSEC_PER_SEC, tgt_tsc_khz * 1000LL,
> @@ -3235,7 +3247,7 @@ int kvm_set_msr_common(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
>  		if (msr_info->host_initiated) {
>  			kvm_synchronize_tsc(vcpu, data);
>  		} else {
> -			u64 adj = kvm_compute_tsc_offset(vcpu, data) - vcpu->arch.l1_tsc_offset;
> +			u64 adj = kvm_compute_tsc_offset_l1(vcpu, data) - vcpu->arch.l1_tsc_offset;
>  			adjust_tsc_offset_guest(vcpu, adj);
>  			vcpu->arch.ia32_tsc_adjust_msr += adj;
>  		}
> @@ -3537,10 +3549,13 @@ int kvm_get_msr_common(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
>  		 * return L1's TSC value to ensure backwards-compatible
>  		 * behavior for migration.
>  		 */
> -		u64 tsc_offset = msr_info->host_initiated ? vcpu->arch.l1_tsc_offset :
> -							    vcpu->arch.tsc_offset;
> -
> -		msr_info->data = kvm_scale_tsc(vcpu, rdtsc()) + tsc_offset;
> +		if (msr_info->host_initiated) {

Unnecessary curly braces.

> +			msr_info->data = kvm_scale_tsc_l1(vcpu, rdtsc()) +
> +					 vcpu->arch.l1_tsc_offset;
> +		} else {
> +			msr_info->data = kvm_scale_tsc(vcpu, rdtsc()) +
> +					 vcpu->arch.tsc_offset;
> +		}
>  		break;
>  	}
>  	case MSR_MTRRcap:
> @@ -4123,7 +4138,7 @@ void kvm_arch_vcpu_load(struct kvm_vcpu *vcpu, int cpu)
>  			mark_tsc_unstable("KVM discovered backwards TSC");
>  
>  		if (kvm_check_tsc_unstable()) {
> -			u64 offset = kvm_compute_tsc_offset(vcpu,
> +			u64 offset = kvm_compute_tsc_offset_l1(vcpu,
>  						vcpu->arch.last_guest_tsc);
>  			kvm_vcpu_write_tsc_offset(vcpu, offset);
>  			vcpu->arch.tsc_catchup = 1;
> -- 
> 2.17.1
> 


* Re: [PATCH v2 06/10] KVM: X86: Add functions that calculate the 02 TSC offset and multiplier
  2021-05-12 15:09 ` [PATCH v2 06/10] KVM: X86: Add functions that calculate the 02 TSC offset and multiplier Ilias Stamatis
@ 2021-05-18 23:21   ` Sean Christopherson
  2021-05-19 10:15     ` Stamatis, Ilias
  0 siblings, 1 reply; 25+ messages in thread
From: Sean Christopherson @ 2021-05-18 23:21 UTC (permalink / raw)
  To: Ilias Stamatis
  Cc: kvm, linux-kernel, pbonzini, mlevitsk, vkuznets, wanpengli,
	jmattson, joro, zamsden, mtosatti, dwmw

On Wed, May 12, 2021, Ilias Stamatis wrote:
> When L2 is entered we need to "merge" the TSC multiplier and TSC offset
> values of 01 and 12 together.
> 
> The merging is done using the following equations:
>   offset_02 = ((offset_01 * mult_12) >> shift_bits) + offset_12
>   mult_02 = (mult_01 * mult_12) >> shift_bits
> 
> Where shift_bits is kvm_tsc_scaling_ratio_frac_bits.
> 
> Signed-off-by: Ilias Stamatis <ilstam@amazon.com>
> ---
>  arch/x86/include/asm/kvm_host.h |  2 ++
>  arch/x86/kvm/x86.c              | 29 +++++++++++++++++++++++++++++
>  2 files changed, 31 insertions(+)
> 
> diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
> index 4c4a3fefff57..57a25d8e8b0f 100644
> --- a/arch/x86/include/asm/kvm_host.h
> +++ b/arch/x86/include/asm/kvm_host.h
> @@ -1793,6 +1793,8 @@ static inline bool kvm_is_supported_user_return_msr(u32 msr)
>  u64 kvm_scale_tsc(struct kvm_vcpu *vcpu, u64 tsc);
>  u64 kvm_scale_tsc_l1(struct kvm_vcpu *vcpu, u64 tsc);
>  u64 kvm_read_l1_tsc(struct kvm_vcpu *vcpu, u64 host_tsc);
> +void kvm_set_02_tsc_offset(struct kvm_vcpu *vcpu);
> +void kvm_set_02_tsc_multiplier(struct kvm_vcpu *vcpu);
>  
>  unsigned long kvm_get_linear_rip(struct kvm_vcpu *vcpu);
>  bool kvm_is_linear_rip(struct kvm_vcpu *vcpu, unsigned long linear_rip);
> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> index 84af1af7a2cc..1db6cfc2079f 100644
> --- a/arch/x86/kvm/x86.c
> +++ b/arch/x86/kvm/x86.c
> @@ -2346,6 +2346,35 @@ u64 kvm_read_l1_tsc(struct kvm_vcpu *vcpu, u64 host_tsc)
>  }
>  EXPORT_SYMBOL_GPL(kvm_read_l1_tsc);
>  
> +void kvm_set_02_tsc_offset(struct kvm_vcpu *vcpu)

I dislike the "02" nomenclature.  "02" is used specifically to refer to
vmcs02 and vmcb02, whereas these helpers touch KVM's software model, not the CPU
struct.  Can't this simply be "l2"?

> +{
> +	u64 l2_offset = static_call(kvm_x86_get_l2_tsc_offset)(vcpu);
> +	u64 l2_multiplier = static_call(kvm_x86_get_l2_tsc_multiplier)(vcpu);
> +
> +	if (l2_multiplier != kvm_default_tsc_scaling_ratio) {
> +		vcpu->arch.tsc_offset = mul_s64_u64_shr(
> +				(s64) vcpu->arch.l1_tsc_offset,
> +				l2_multiplier,
> +				kvm_tsc_scaling_ratio_frac_bits);
> +	}
> +
> +	vcpu->arch.tsc_offset += l2_offset;
> +}
> +EXPORT_SYMBOL_GPL(kvm_set_02_tsc_offset);
> +
> +void kvm_set_02_tsc_multiplier(struct kvm_vcpu *vcpu)

I normally like splitting patches gratuitously, but in this case I think it would
be better to combine this with the VMX usage in patch 08.  It's impossible to
properly review this patch without looking at its callers.

> +{
> +	u64 l2_multiplier = static_call(kvm_x86_get_l2_tsc_multiplier)(vcpu);

Case in point, calling back into vendor code to get the L2 multiplier is silly,
just have the caller provide it explicitly.

> +	if (l2_multiplier != kvm_default_tsc_scaling_ratio) {

Why does this check against the default ratio instead of L1's ratio?  If L1 is
running a non-default ratio, but L2 is running a default ratio, won't this result
in KVM leaving vcpu->arch.tsc_scaling_ratio at L1's ratio?  Or is there scaling
ratio magic I don't understand (which is likely...)?  If there's magic, can you
add a comment?

Same feedback for the check in the offset version.

> +		vcpu->arch.tsc_scaling_ratio = mul_u64_u64_shr(
> +				vcpu->arch.l1_tsc_scaling_ratio,
> +				l2_multiplier,
> +				kvm_tsc_scaling_ratio_frac_bits);
> +	}
> +}
> +EXPORT_SYMBOL_GPL(kvm_set_02_tsc_multiplier);
> +
>  static void kvm_vcpu_write_tsc_offset(struct kvm_vcpu *vcpu, u64 offset)
>  {
>  	vcpu->arch.l1_tsc_offset = offset;
> -- 
> 2.17.1
> 


* Re: [PATCH v2 07/10] KVM: X86: Move write_l1_tsc_offset() logic to common code and rename it
  2021-05-12 15:09 ` [PATCH v2 07/10] KVM: X86: Move write_l1_tsc_offset() logic to common code and rename it Ilias Stamatis
@ 2021-05-19  0:05   ` Sean Christopherson
  2021-05-19 11:45     ` Stamatis, Ilias
  0 siblings, 1 reply; 25+ messages in thread
From: Sean Christopherson @ 2021-05-19  0:05 UTC (permalink / raw)
  To: Ilias Stamatis
  Cc: kvm, linux-kernel, pbonzini, mlevitsk, vkuznets, wanpengli,
	jmattson, joro, zamsden, mtosatti, dwmw

On Wed, May 12, 2021, Ilias Stamatis wrote:
> The write_l1_tsc_offset() callback has a misleading name. It does not
> set L1's TSC offset; rather, it updates the current TSC offset, which
> might be different if a nested guest is executing. Additionally, both
> the vmx and svm implementations use the same logic for calculating the
> current TSC offset before writing it to hardware.

I don't disagree, but the current name has the advantage of clarifying (well,
hinting) that the offset is L1's offset.  That hint is lost in this refactoring.
Maybe rename "u64 offset" to "u64 l1_tsc_offset"?

> This patch renames the function and moves the common logic to the

Use imperative mood instead of "This patch".  From
Documentation/process/submitting-patches.rst:

  Describe your changes in imperative mood, e.g. "make xyzzy do frotz"
  instead of "[This patch] makes xyzzy do frotz" or "[I] changed xyzzy
  to do frotz", as if you are giving orders to the codebase to change
  its behaviour.

> caller. The vmx/svm-specific code now merely sets the given offset to
> the corresponding hardware structure.
> 
> Signed-off-by: Ilias Stamatis <ilstam@amazon.com>
> ---
>  arch/x86/include/asm/kvm-x86-ops.h |  2 +-
>  arch/x86/include/asm/kvm_host.h    |  3 +--
>  arch/x86/kvm/svm/svm.c             | 21 ++++-----------------
>  arch/x86/kvm/vmx/vmx.c             | 23 +++--------------------
>  arch/x86/kvm/x86.c                 | 17 ++++++++++++++++-
>  5 files changed, 25 insertions(+), 41 deletions(-)
> 
> diff --git a/arch/x86/include/asm/kvm-x86-ops.h b/arch/x86/include/asm/kvm-x86-ops.h
> index 2063616fba1c..029c9615378f 100644
> --- a/arch/x86/include/asm/kvm-x86-ops.h
> +++ b/arch/x86/include/asm/kvm-x86-ops.h
> @@ -89,7 +89,7 @@ KVM_X86_OP(load_mmu_pgd)
>  KVM_X86_OP_NULL(has_wbinvd_exit)
>  KVM_X86_OP(get_l2_tsc_offset)
>  KVM_X86_OP(get_l2_tsc_multiplier)
> -KVM_X86_OP(write_l1_tsc_offset)
> +KVM_X86_OP(write_tsc_offset)
>  KVM_X86_OP(get_exit_info)
>  KVM_X86_OP(check_intercept)
>  KVM_X86_OP(handle_exit_irqoff)
> diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
> index 57a25d8e8b0f..61cf201c001a 100644
> --- a/arch/x86/include/asm/kvm_host.h
> +++ b/arch/x86/include/asm/kvm_host.h
> @@ -1307,8 +1307,7 @@ struct kvm_x86_ops {
>  
>  	u64 (*get_l2_tsc_offset)(struct kvm_vcpu *vcpu);
>  	u64 (*get_l2_tsc_multiplier)(struct kvm_vcpu *vcpu);
> -	/* Returns actual tsc_offset set in active VMCS */
> -	u64 (*write_l1_tsc_offset)(struct kvm_vcpu *vcpu, u64 offset);
> +	void (*write_tsc_offset)(struct kvm_vcpu *vcpu, u64 offset);
>  
>  	/*
>  	 * Retrieve somewhat arbitrary exit information.  Intended to be used
> diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
> index 679b2fc1a3f9..b18f60463073 100644
> --- a/arch/x86/kvm/svm/svm.c
> +++ b/arch/x86/kvm/svm/svm.c
> @@ -1094,26 +1094,13 @@ static u64 svm_get_l2_tsc_multiplier(struct kvm_vcpu *vcpu)
>  	return kvm_default_tsc_scaling_ratio;
>  }
>  
> -static u64 svm_write_l1_tsc_offset(struct kvm_vcpu *vcpu, u64 offset)
> +static void svm_write_tsc_offset(struct kvm_vcpu *vcpu, u64 offset)
>  {
>  	struct vcpu_svm *svm = to_svm(vcpu);
> -	u64 g_tsc_offset = 0;
> -
> -	if (is_guest_mode(vcpu)) {
> -		/* Write L1's TSC offset.  */
> -		g_tsc_offset = svm->vmcb->control.tsc_offset -
> -			       svm->vmcb01.ptr->control.tsc_offset;
> -		svm->vmcb01.ptr->control.tsc_offset = offset;
> -	}
> -
> -	trace_kvm_write_tsc_offset(vcpu->vcpu_id,
> -				   svm->vmcb->control.tsc_offset - g_tsc_offset,
> -				   offset);
> -
> -	svm->vmcb->control.tsc_offset = offset + g_tsc_offset;
>  
> +	svm->vmcb01.ptr->control.tsc_offset = vcpu->arch.l1_tsc_offset;
> +	svm->vmcb->control.tsc_offset = offset;
>  	vmcb_mark_dirty(svm->vmcb, VMCB_INTERCEPTS);
> -	return svm->vmcb->control.tsc_offset;
>  }
>  
>  /* Evaluate instruction intercepts that depend on guest CPUID features. */
> @@ -4540,7 +4527,7 @@ static struct kvm_x86_ops svm_x86_ops __initdata = {
>  
>  	.get_l2_tsc_offset = svm_get_l2_tsc_offset,
>  	.get_l2_tsc_multiplier = svm_get_l2_tsc_multiplier,
> -	.write_l1_tsc_offset = svm_write_l1_tsc_offset,
> +	.write_tsc_offset = svm_write_tsc_offset,
>  
>  	.load_mmu_pgd = svm_load_mmu_pgd,
>  
> diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
> index 575e13bddda8..3c4eb14a1e86 100644
> --- a/arch/x86/kvm/vmx/vmx.c
> +++ b/arch/x86/kvm/vmx/vmx.c
> @@ -1810,26 +1810,9 @@ static u64 vmx_get_l2_tsc_multiplier(struct kvm_vcpu *vcpu)
>  	return multiplier;
>  }
>  
> -static u64 vmx_write_l1_tsc_offset(struct kvm_vcpu *vcpu, u64 offset)
> +static void vmx_write_tsc_offset(struct kvm_vcpu *vcpu, u64 offset)
>  {
> -	struct vmcs12 *vmcs12 = get_vmcs12(vcpu);
> -	u64 g_tsc_offset = 0;
> -
> -	/*
> -	 * We're here if L1 chose not to trap WRMSR to TSC. According
> -	 * to the spec, this should set L1's TSC; The offset that L1
> -	 * set for L2 remains unchanged, and still needs to be added
> -	 * to the newly set TSC to get L2's TSC.
> -	 */
> -	if (is_guest_mode(vcpu) &&
> -	    (vmcs12->cpu_based_vm_exec_control & CPU_BASED_USE_TSC_OFFSETTING))
> -		g_tsc_offset = vmcs12->tsc_offset;
> -
> -	trace_kvm_write_tsc_offset(vcpu->vcpu_id,
> -				   vcpu->arch.tsc_offset - g_tsc_offset,
> -				   offset);
> -	vmcs_write64(TSC_OFFSET, offset + g_tsc_offset);
> -	return offset + g_tsc_offset;
> +	vmcs_write64(TSC_OFFSET, offset);
>  }
>  
>  /*
> @@ -7725,7 +7708,7 @@ static struct kvm_x86_ops vmx_x86_ops __initdata = {
>  
>  	.get_l2_tsc_offset = vmx_get_l2_tsc_offset,
>  	.get_l2_tsc_multiplier = vmx_get_l2_tsc_multiplier,
> -	.write_l1_tsc_offset = vmx_write_l1_tsc_offset,
> +	.write_tsc_offset = vmx_write_tsc_offset,
>  
>  	.load_mmu_pgd = vmx_load_mmu_pgd,
>  
> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> index 1db6cfc2079f..f3ba1be4d5b9 100644
> --- a/arch/x86/kvm/x86.c
> +++ b/arch/x86/kvm/x86.c
> @@ -2377,8 +2377,23 @@ EXPORT_SYMBOL_GPL(kvm_set_02_tsc_multiplier);
>  
>  static void kvm_vcpu_write_tsc_offset(struct kvm_vcpu *vcpu, u64 offset)
>  {
> +	trace_kvm_write_tsc_offset(vcpu->vcpu_id,
> +				   vcpu->arch.l1_tsc_offset,
> +				   offset);
> +
>  	vcpu->arch.l1_tsc_offset = offset;
> -	vcpu->arch.tsc_offset = static_call(kvm_x86_write_l1_tsc_offset)(vcpu, offset);
> +	vcpu->arch.tsc_offset = offset;
> +
> +	if (is_guest_mode(vcpu)) {

Unnecessary curly braces.

> +		/*
> +		 * We're here if L1 chose not to trap WRMSR to TSC and
> +		 * according to the spec this should set L1's TSC (as opposed
> +		 * to setting L1's offset for L2).
> +		 */

While we're shuffling code, can we improve this comment?  It works for the WRMSR
case, but makes no sense in the context of host TSC adjustments.  It's not at all
clear to me that it's even correct or relevant in those cases.

> +		kvm_set_02_tsc_offset(vcpu);

I really dislike that kvm_set_02_tsc_offset() consumes a bunch of variables
_and_ sets arch.tsc_offset, e.g. it's not at all obvious that moving this call
around will break L2.  Even more bizarre is that arch.tsc_offset is conditionally
consumed.  Oh, and kvm_set_02_tsc_offset() is not idempotent since it can do a
RMW on arch.tsc_offset.

The below static_call() dependency doesn't appear to be easily solved, but we
can at least strongly hint that vcpu->arch.tsc_offset is set.  For everything
else, I think we can clean things up by doing this (with the vendor calls
providing the L2 variables directly):

void kvm_set_l2_tsc_offset(struct kvm_vcpu *vcpu, u64 l2_offset,
			   u64 l2_multiplier)
{
	u64 l1_offset = vcpu->arch.l1_tsc_offset;

	if (l2_multiplier != kvm_default_tsc_scaling_ratio)
		l2_offset += mul_s64_u64_shr((s64)l1_offset, l2_multiplier,
					     kvm_tsc_scaling_ratio_frac_bits);

	vcpu->arch.tsc_offset = l2_offset;
}
EXPORT_SYMBOL_GPL(kvm_set_l2_tsc_offset);

static void kvm_vcpu_write_tsc_offset(struct kvm_vcpu *vcpu, u64 offset)
{
	trace_kvm_write_tsc_offset(vcpu->vcpu_id,
				   vcpu->arch.l1_tsc_offset,
				   offset);

	vcpu->arch.l1_tsc_offset = offset;

	if (is_guest_mode(vcpu))
		kvm_set_l2_tsc_offset(vcpu,
				      static_call(kvm_x86_get_l2_tsc_offset)(vcpu),
				      static_call(kvm_x86_get_l2_tsc_multiplier)(vcpu));
	else
		vcpu->arch.tsc_offset = offset;

	static_call(kvm_x86_write_tsc_offset)(vcpu, vcpu->arch.tsc_offset);
}


An alternative would be to explicitly track L1 and L2, and _never_ track the
current TSC values.  I.e. always compute the correct value on the fly.  I think
the only hot path is the TSC deadline timer, and AFAICT that always runs with
L1's timer.  Speaking of which, at the end of this series, vmx_set_hv_timer()
uses L1's TSC but the current scaling ratio; that can't be right.
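To make the idempotency concern concrete, here is a toy user-space model (the struct, function names, and values are invented for illustration, and the multiplier scaling term is dropped to keep it short; this is not kernel code):

```c
#include <assert.h>
#include <stdint.h>

/* Minimal stand-in for the bits of kvm_vcpu_arch involved. */
struct toy_vcpu {
	uint64_t l1_tsc_offset;
	uint64_t tsc_offset;	/* "current" offset programmed into hardware */
};

/*
 * RMW style, mirroring kvm_set_02_tsc_offset(): it reads and then
 * updates tsc_offset, so a second call keeps accumulating the L2
 * offset -- i.e. it is not idempotent.
 */
static void set_02_offset_rmw(struct toy_vcpu *v, uint64_t l2_offset)
{
	v->tsc_offset += l2_offset;
}

/*
 * Pure style, mirroring the proposed kvm_set_l2_tsc_offset(): the
 * result is computed from l1_tsc_offset alone, so repeated calls are
 * harmless.
 */
static void set_l2_offset_pure(struct toy_vcpu *v, uint64_t l2_offset)
{
	v->tsc_offset = v->l1_tsc_offset + l2_offset;
}
```

Calling the RMW variant twice with the same L2 offset keeps accumulating it, while the pure variant converges to the same value no matter how many times it runs.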

> +	}
> +
> +	static_call(kvm_x86_write_tsc_offset)(vcpu, vcpu->arch.tsc_offset);
>  }
>  
>  static inline bool kvm_check_tsc_unstable(void)
> -- 
> 2.17.1
> 

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH v2 08/10] KVM: VMX: Set the TSC offset and multiplier on nested entry and exit
  2021-05-12 15:09 ` [PATCH v2 08/10] KVM: VMX: Set the TSC offset and multiplier on nested entry and exit Ilias Stamatis
@ 2021-05-19  0:07   ` Sean Christopherson
  2021-05-19 11:55     ` Stamatis, Ilias
  0 siblings, 1 reply; 25+ messages in thread
From: Sean Christopherson @ 2021-05-19  0:07 UTC (permalink / raw)
  To: Ilias Stamatis
  Cc: kvm, linux-kernel, pbonzini, mlevitsk, vkuznets, wanpengli,
	jmattson, joro, zamsden, mtosatti, dwmw

On Wed, May 12, 2021, Ilias Stamatis wrote:
> Now that nested TSC scaling is supported we need to calculate the
> correct 02 values for both the offset and the multiplier using the
> corresponding functions. On L2's exit the L1 values are restored.
> 
> Signed-off-by: Ilias Stamatis <ilstam@amazon.com>
> ---
>  arch/x86/kvm/vmx/nested.c | 13 +++++++++----
>  1 file changed, 9 insertions(+), 4 deletions(-)
> 
> diff --git a/arch/x86/kvm/vmx/nested.c b/arch/x86/kvm/vmx/nested.c
> index 6058a65a6ede..f1dff1ebaccb 100644
> --- a/arch/x86/kvm/vmx/nested.c
> +++ b/arch/x86/kvm/vmx/nested.c
> @@ -3354,8 +3354,9 @@ enum nvmx_vmentry_status nested_vmx_enter_non_root_mode(struct kvm_vcpu *vcpu,
>  	}
>  
>  	enter_guest_mode(vcpu);
> -	if (vmcs12->cpu_based_vm_exec_control & CPU_BASED_USE_TSC_OFFSETTING)
> -		vcpu->arch.tsc_offset += vmcs12->tsc_offset;
> +
> +	kvm_set_02_tsc_offset(vcpu);
> +	kvm_set_02_tsc_multiplier(vcpu);

Please move this code into prepare_vmcs02() to co-locate it with the relevant
vmcs02 logic.  If there's something in prepare_vmcs02() that consumes
vcpu->arch.tsc_offset() other than the obvious VMWRITE, I vote to move things
around to fix that rather than create this weird split.

>  	if (prepare_vmcs02(vcpu, vmcs12, &entry_failure_code)) {
>  		exit_reason.basic = EXIT_REASON_INVALID_STATE;
> @@ -4463,8 +4464,12 @@ void nested_vmx_vmexit(struct kvm_vcpu *vcpu, u32 vm_exit_reason,
>  	if (nested_cpu_has_preemption_timer(vmcs12))
>  		hrtimer_cancel(&to_vmx(vcpu)->nested.preemption_timer);
>  
> -	if (vmcs12->cpu_based_vm_exec_control & CPU_BASED_USE_TSC_OFFSETTING)
> -		vcpu->arch.tsc_offset -= vmcs12->tsc_offset;
> +	if (vmcs12->cpu_based_vm_exec_control & CPU_BASED_USE_TSC_OFFSETTING) {
> +		vcpu->arch.tsc_offset = vcpu->arch.l1_tsc_offset;
> +
> +		if (vmcs12->secondary_vm_exec_control & SECONDARY_EXEC_TSC_SCALING)
> +			vcpu->arch.tsc_scaling_ratio = vcpu->arch.l1_tsc_scaling_ratio;
> +	}
>  
>  	if (likely(!vmx->fail)) {
>  		sync_vmcs02_to_vmcs12(vcpu, vmcs12);
> -- 
> 2.17.1
> 


* Re: [PATCH v2 02/10] KVM: X86: Store L1's TSC scaling ratio in 'struct kvm_vcpu_arch'
  2021-05-18 22:52   ` Sean Christopherson
@ 2021-05-19  8:54     ` Stamatis, Ilias
  0 siblings, 0 replies; 25+ messages in thread
From: Stamatis, Ilias @ 2021-05-19  8:54 UTC (permalink / raw)
  To: seanjc
  Cc: kvm, linux-kernel, jmattson, Woodhouse, David, vkuznets, joro,
	mtosatti, zamsden, pbonzini, mlevitsk, wanpengli

On Tue, 2021-05-18 at 22:52 +0000, Sean Christopherson wrote:
> On Wed, May 12, 2021, Ilias Stamatis wrote:
> > Store L1's scaling ratio in that struct like we already do for L1's TSC
> 
> s/that struct/kvm_vcpu_arch.  Forcing the reader to look at the subject to
> understand the changelog is annoying, especially when it saves all of a handful
> of characters.  E.g. I often read patches without the subject in scope.
> 
> > offset. This allows for easy save/restore when we enter and then exit
> > the nested guest.
> > 
> > Signed-off-by: Ilias Stamatis <ilstam@amazon.com>
> > Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com>
> > ---
> 
> ...
> 
> > diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> > index 9b6bca616929..07cf5d7ece38 100644
> > --- a/arch/x86/kvm/x86.c
> > +++ b/arch/x86/kvm/x86.c
> > @@ -2185,6 +2185,7 @@ static int set_tsc_khz(struct kvm_vcpu *vcpu, u32 user_tsc_khz, bool scale)
> > 
> >       /* Guest TSC same frequency as host TSC? */
> >       if (!scale) {
> > +             vcpu->arch.l1_tsc_scaling_ratio = kvm_default_tsc_scaling_ratio;
> >               vcpu->arch.tsc_scaling_ratio = kvm_default_tsc_scaling_ratio;
> 
> Looks like these are always set as a pair, maybe add a helper, e.g.
> 
> static void kvm_set_l1_tsc_scaling_ratio(struct kvm_vcpu *vcpu, u64 ratio)
> {
>         vcpu->arch.l1_tsc_scaling_ratio = ratio;
>         vcpu->arch.tsc_scaling_ratio = ratio;
> }
> 

Hmm, they are not *always* set as a pair. Plus, wouldn't this name be a bit
misleading, suggesting that only L1's scaling ratio is updated while the
current ratio is implicitly changed too?

I'm not sure a helper function would add much
value here.

> >               return 0;
> >       }
> > @@ -2211,7 +2212,7 @@ static int set_tsc_khz(struct kvm_vcpu *vcpu, u32 user_tsc_khz, bool scale)
> >               return -1;
> >       }
> > 
> > -     vcpu->arch.tsc_scaling_ratio = ratio;
> > +     vcpu->arch.l1_tsc_scaling_ratio = vcpu->arch.tsc_scaling_ratio = ratio;
> >       return 0;
> >  }
> > 
> > @@ -2223,6 +2224,7 @@ static int kvm_set_tsc_khz(struct kvm_vcpu *vcpu, u32 user_tsc_khz)
> >       /* tsc_khz can be zero if TSC calibration fails */
> >       if (user_tsc_khz == 0) {
> >               /* set tsc_scaling_ratio to a safe value */
> > +             vcpu->arch.l1_tsc_scaling_ratio = kvm_default_tsc_scaling_ratio;
> >               vcpu->arch.tsc_scaling_ratio = kvm_default_tsc_scaling_ratio;
> >               return -1;
> >       }
> > @@ -2459,7 +2461,7 @@ static inline void adjust_tsc_offset_guest(struct kvm_vcpu *vcpu,
> > 
> >  static inline void adjust_tsc_offset_host(struct kvm_vcpu *vcpu, s64 adjustment)
> >  {
> > -     if (vcpu->arch.tsc_scaling_ratio != kvm_default_tsc_scaling_ratio)
> > +     if (vcpu->arch.l1_tsc_scaling_ratio != kvm_default_tsc_scaling_ratio)
> >               WARN_ON(adjustment < 0);
> >       adjustment = kvm_scale_tsc(vcpu, (u64) adjustment);
> >       adjust_tsc_offset_guest(vcpu, adjustment);
> > --
> > 2.17.1
> > 



* Re: [PATCH v2 03/10] KVM: X86: Add kvm_scale_tsc_l1() and kvm_compute_tsc_offset_l1()
  2021-05-18 23:04   ` Sean Christopherson
@ 2021-05-19  9:02     ` Stamatis, Ilias
  2021-05-19 15:40       ` Sean Christopherson
  0 siblings, 1 reply; 25+ messages in thread
From: Stamatis, Ilias @ 2021-05-19  9:02 UTC (permalink / raw)
  To: seanjc
  Cc: kvm, linux-kernel, jmattson, Woodhouse, David, vkuznets, joro,
	mtosatti, zamsden, pbonzini, mlevitsk, wanpengli

On Tue, 2021-05-18 at 23:04 +0000, Sean Christopherson wrote:
> On Wed, May 12, 2021, Ilias Stamatis wrote:
> > The existing kvm_scale_tsc() scales the TSC using the current TSC
> > scaling ratio. That used to be the same as L1's scaling ratio but now
> > with nested TSC scaling support it is no longer the case.
> > 
> > This patch adds a new kvm_scale_tsc_l1() function that scales the TSC
> > using L1's scaling ratio. The existing kvm_scale_tsc() can still be used
> > for scaling L2 TSC values.
> > 
> > Additionally, this patch renames the kvm_compute_tsc_offset() function
> > to kvm_compute_tsc_offset_l1() and has the function treat its TSC
> > argument as an L1 TSC value. All existing code uses this function
> > passing L1 values to it.
> > 
> > Signed-off-by: Ilias Stamatis <ilstam@amazon.com>
> > ---
> >  arch/x86/include/asm/kvm_host.h |  1 +
> >  arch/x86/kvm/x86.c              | 41 ++++++++++++++++++++++-----------
> >  2 files changed, 29 insertions(+), 13 deletions(-)
> > 
> > diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
> > index 7dfc609eacd6..be59197e5eb7 100644
> > --- a/arch/x86/include/asm/kvm_host.h
> > +++ b/arch/x86/include/asm/kvm_host.h
> > @@ -1789,6 +1789,7 @@ static inline bool kvm_is_supported_user_return_msr(u32 msr)
> >  }
> > 
> >  u64 kvm_scale_tsc(struct kvm_vcpu *vcpu, u64 tsc);
> > +u64 kvm_scale_tsc_l1(struct kvm_vcpu *vcpu, u64 tsc);
> >  u64 kvm_read_l1_tsc(struct kvm_vcpu *vcpu, u64 host_tsc);
> 
> I don't really care which version is used, but we should be consistent, i.e. choose
> kvm_<action>_l1_tsc or kvm_<action>_tsc_l1, not both.  The easy choice is the
> former, since kvm_read_l1_tsc() already exists.

OK

> 
> >  unsigned long kvm_get_linear_rip(struct kvm_vcpu *vcpu);
> > diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> > index 07cf5d7ece38..84af1af7a2cc 100644
> > --- a/arch/x86/kvm/x86.c
> > +++ b/arch/x86/kvm/x86.c
> > @@ -2319,18 +2319,30 @@ u64 kvm_scale_tsc(struct kvm_vcpu *vcpu, u64 tsc)
> >  }
> >  EXPORT_SYMBOL_GPL(kvm_scale_tsc);
> > 
> > -static u64 kvm_compute_tsc_offset(struct kvm_vcpu *vcpu, u64 target_tsc)
> > +u64 kvm_scale_tsc_l1(struct kvm_vcpu *vcpu, u64 tsc)
> > +{
> > +     u64 _tsc = tsc;
> > +     u64 ratio = vcpu->arch.l1_tsc_scaling_ratio;
> > +
> > +     if (ratio != kvm_default_tsc_scaling_ratio)
> > +             _tsc = __scale_tsc(ratio, tsc);
> > +
> > +     return _tsc;
> > +}
> 
> Just make the ratio a param.  This is complete copy+paste of kvm_scale_tsc(),
> with 3 characters added.  And all of the callers are already in an L1-specific
> function or have L1 vs. L2 awareness.  IMO, that makes the code less magical, too,
> as I don't have to dive into a helper to see that it reads l1_tsc_scaling_ratio
> versus tsc_scaling_ratio.
> 

That's how I did it initially but changed it into a separate function after
receiving feedback on v1. I'm neutral, I don't mind changing it back.

More opinions?

> > +EXPORT_SYMBOL_GPL(kvm_scale_tsc_l1);
> > +
> > +static u64 kvm_compute_tsc_offset_l1(struct kvm_vcpu *vcpu, u64 target_tsc)
> >  {
> >       u64 tsc;
> > 
> > -     tsc = kvm_scale_tsc(vcpu, rdtsc());
> > +     tsc = kvm_scale_tsc_l1(vcpu, rdtsc());
> > 
> >       return target_tsc - tsc;
> >  }
> > 
> >  u64 kvm_read_l1_tsc(struct kvm_vcpu *vcpu, u64 host_tsc)
> >  {
> > -     return vcpu->arch.l1_tsc_offset + kvm_scale_tsc(vcpu, host_tsc);
> > +     return vcpu->arch.l1_tsc_offset + kvm_scale_tsc_l1(vcpu, host_tsc);
> >  }
> >  EXPORT_SYMBOL_GPL(kvm_read_l1_tsc);
> > 
> > @@ -2363,7 +2375,7 @@ static void kvm_synchronize_tsc(struct kvm_vcpu *vcpu, u64 data)
> >       bool synchronizing = false;
> > 
> >       raw_spin_lock_irqsave(&kvm->arch.tsc_write_lock, flags);
> > -     offset = kvm_compute_tsc_offset(vcpu, data);
> > +     offset = kvm_compute_tsc_offset_l1(vcpu, data);
> >       ns = get_kvmclock_base_ns();
> >       elapsed = ns - kvm->arch.last_tsc_nsec;
> > 
> > @@ -2402,7 +2414,7 @@ static void kvm_synchronize_tsc(struct kvm_vcpu *vcpu, u64 data)
> >               } else {
> >                       u64 delta = nsec_to_cycles(vcpu, elapsed);
> >                       data += delta;
> > -                     offset = kvm_compute_tsc_offset(vcpu, data);
> > +                     offset = kvm_compute_tsc_offset_l1(vcpu, data);
> >               }
> >               matched = true;
> >               already_matched = (vcpu->arch.this_tsc_generation == kvm->arch.cur_tsc_generation);
> > @@ -2463,7 +2475,7 @@ static inline void adjust_tsc_offset_host(struct kvm_vcpu *vcpu, s64 adjustment)
> >  {
> >       if (vcpu->arch.l1_tsc_scaling_ratio != kvm_default_tsc_scaling_ratio)
> >               WARN_ON(adjustment < 0);
> > -     adjustment = kvm_scale_tsc(vcpu, (u64) adjustment);
> > +     adjustment = kvm_scale_tsc_l1(vcpu, (u64) adjustment);
> >       adjust_tsc_offset_guest(vcpu, adjustment);
> >  }
> > 
> > @@ -2846,7 +2858,7 @@ static int kvm_guest_time_update(struct kvm_vcpu *v)
> >       /* With all the info we got, fill in the values */
> > 
> >       if (kvm_has_tsc_control)
> > -             tgt_tsc_khz = kvm_scale_tsc(v, tgt_tsc_khz);
> > +             tgt_tsc_khz = kvm_scale_tsc_l1(v, tgt_tsc_khz);
> > 
> >       if (unlikely(vcpu->hw_tsc_khz != tgt_tsc_khz)) {
> >               kvm_get_time_scale(NSEC_PER_SEC, tgt_tsc_khz * 1000LL,
> > @@ -3235,7 +3247,7 @@ int kvm_set_msr_common(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
> >               if (msr_info->host_initiated) {
> >                       kvm_synchronize_tsc(vcpu, data);
> >               } else {
> > -                     u64 adj = kvm_compute_tsc_offset(vcpu, data) - vcpu->arch.l1_tsc_offset;
> > +                     u64 adj = kvm_compute_tsc_offset_l1(vcpu, data) - vcpu->arch.l1_tsc_offset;
> >                       adjust_tsc_offset_guest(vcpu, adj);
> >                       vcpu->arch.ia32_tsc_adjust_msr += adj;
> >               }
> > @@ -3537,10 +3549,13 @@ int kvm_get_msr_common(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
> >                * return L1's TSC value to ensure backwards-compatible
> >                * behavior for migration.
> >                */
> > -             u64 tsc_offset = msr_info->host_initiated ? vcpu->arch.l1_tsc_offset :
> > -                                                         vcpu->arch.tsc_offset;
> > -
> > -             msr_info->data = kvm_scale_tsc(vcpu, rdtsc()) + tsc_offset;
> > +             if (msr_info->host_initiated) {
> 
> Unnecessary curly braces.
> 
> > +                     msr_info->data = kvm_scale_tsc_l1(vcpu, rdtsc()) +
> > +                                      vcpu->arch.l1_tsc_offset;
> > +             } else {
> > +                     msr_info->data = kvm_scale_tsc(vcpu, rdtsc()) +
> > +                                      vcpu->arch.tsc_offset;
> > +             }
> >               break;
> >       }
> >       case MSR_MTRRcap:
> > @@ -4123,7 +4138,7 @@ void kvm_arch_vcpu_load(struct kvm_vcpu *vcpu, int cpu)
> >                       mark_tsc_unstable("KVM discovered backwards TSC");
> > 
> >               if (kvm_check_tsc_unstable()) {
> > -                     u64 offset = kvm_compute_tsc_offset(vcpu,
> > +                     u64 offset = kvm_compute_tsc_offset_l1(vcpu,
> >                                               vcpu->arch.last_guest_tsc);
> >                       kvm_vcpu_write_tsc_offset(vcpu, offset);
> >                       vcpu->arch.tsc_catchup = 1;
> > --
> > 2.17.1
> > 



* Re: [PATCH v2 06/10] KVM: X86: Add functions that calculate the 02 TSC offset and multiplier
  2021-05-18 23:21   ` Sean Christopherson
@ 2021-05-19 10:15     ` Stamatis, Ilias
  0 siblings, 0 replies; 25+ messages in thread
From: Stamatis, Ilias @ 2021-05-19 10:15 UTC (permalink / raw)
  To: seanjc
  Cc: kvm, linux-kernel, jmattson, Woodhouse, David, vkuznets, joro,
	mtosatti, zamsden, pbonzini, mlevitsk, wanpengli

On Tue, 2021-05-18 at 23:21 +0000, Sean Christopherson wrote:
> 
> On Wed, May 12, 2021, Ilias Stamatis wrote:
> > When L2 is entered we need to "merge" the TSC multiplier and TSC offset
> > values of 01 and 12 together.
> > 
> > The merging is done using the following equations:
> >   offset_02 = ((offset_01 * mult_12) >> shift_bits) + offset_12
> >   mult_02 = (mult_01 * mult_12) >> shift_bits
> > 
> > Where shift_bits is kvm_tsc_scaling_ratio_frac_bits.
> > 
> > Signed-off-by: Ilias Stamatis <ilstam@amazon.com>
> > ---
> >  arch/x86/include/asm/kvm_host.h |  2 ++
> >  arch/x86/kvm/x86.c              | 29 +++++++++++++++++++++++++++++
> >  2 files changed, 31 insertions(+)
> > 
> > diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
> > index 4c4a3fefff57..57a25d8e8b0f 100644
> > --- a/arch/x86/include/asm/kvm_host.h
> > +++ b/arch/x86/include/asm/kvm_host.h
> > @@ -1793,6 +1793,8 @@ static inline bool kvm_is_supported_user_return_msr(u32 msr)
> >  u64 kvm_scale_tsc(struct kvm_vcpu *vcpu, u64 tsc);
> >  u64 kvm_scale_tsc_l1(struct kvm_vcpu *vcpu, u64 tsc);
> >  u64 kvm_read_l1_tsc(struct kvm_vcpu *vcpu, u64 host_tsc);
> > +void kvm_set_02_tsc_offset(struct kvm_vcpu *vcpu);
> > +void kvm_set_02_tsc_multiplier(struct kvm_vcpu *vcpu);
> > 
> >  unsigned long kvm_get_linear_rip(struct kvm_vcpu *vcpu);
> >  bool kvm_is_linear_rip(struct kvm_vcpu *vcpu, unsigned long linear_rip);
> > diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> > index 84af1af7a2cc..1db6cfc2079f 100644
> > --- a/arch/x86/kvm/x86.c
> > +++ b/arch/x86/kvm/x86.c
> > @@ -2346,6 +2346,35 @@ u64 kvm_read_l1_tsc(struct kvm_vcpu *vcpu, u64 host_tsc)
> >  }
> >  EXPORT_SYMBOL_GPL(kvm_read_l1_tsc);
> > 
> > +void kvm_set_02_tsc_offset(struct kvm_vcpu *vcpu)
> 
> I dislike the "02" nomenclature.  "02" is used specifically to refer to
> vmcs02 and vmcb02, whereas these helpers touch KVM's software model, not the CPU
> struct.  Can't this simply be "l2"?
> 
> > +{
> > +     u64 l2_offset = static_call(kvm_x86_get_l2_tsc_offset)(vcpu);
> > +     u64 l2_multiplier = static_call(kvm_x86_get_l2_tsc_multiplier)(vcpu);
> > +
> > +     if (l2_multiplier != kvm_default_tsc_scaling_ratio) {
> > +             vcpu->arch.tsc_offset = mul_s64_u64_shr(
> > +                             (s64) vcpu->arch.l1_tsc_offset,
> > +                             l2_multiplier,
> > +                             kvm_tsc_scaling_ratio_frac_bits);
> > +     }
> > +
> > +     vcpu->arch.tsc_offset += l2_offset;
> > +}
> > +EXPORT_SYMBOL_GPL(kvm_set_02_tsc_offset);
> > +
> > +void kvm_set_02_tsc_multiplier(struct kvm_vcpu *vcpu)
> 
> I normally like splitting patches gratuitously, but in this case I think it would
> be better to combine this with the VMX usage in patch 08.  It's impossible to
> properly review this patch without looking at its callers.
> 
> > +{
> > +     u64 l2_multiplier = static_call(kvm_x86_get_l2_tsc_multiplier)(vcpu);
> 
> Case in point, calling back into vendor code to get the L2 multiplier is silly,
> just have the caller provide it explicitly.
> 
> > +     if (l2_multiplier != kvm_default_tsc_scaling_ratio) {
> 
> Why does this check against the default ratio instead of L1's ratio?  If L1 is
> running a non-default ratio, but L2 is running a default ratio, won't this result
> in KVM leaving vcpu->arch.tsc_scaling_ratio at L1's ratio?  Or is there scaling
> ratio magic I don't understand (which is likely...)?  If there's magic, can you
> add a comment?
> 

Think of the "default ratio" as a ratio of 1, i.e. L2 is not scaled (from L1's
perspective). So yes, as you say if L1 is running at a non-default ratio, but
L2 is running at default ratio (not scaled), this results in KVM leaving 
arch.tsc_scaling_ratio at L1's ratio (as it should). 

I am not sure a comment is needed here. 

Having said that, theoretically we could omit this check completely and still
get the correct result. But in practice, because of the fixed-point math
involved, a small precision error would creep in and tsc_scaling_ratio would
not end up being exactly the same as l1_tsc_scaling_ratio.
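As a user-space sketch of why the default ratio behaves like 1.0 (the 48 fractional bits match VMX, SVM uses 32; mul_ratio() is a simplified stand-in for the kernel's mul_u64_u64_shr() and relies on the GCC/Clang __int128 extension):

```c
#include <assert.h>
#include <stdint.h>

#define FRAC_BITS 48	/* kvm_tsc_scaling_ratio_frac_bits on VMX */
#define DEFAULT_RATIO (1ULL << FRAC_BITS)	/* fixed-point 1.0: no scaling */

/* (a * b) >> FRAC_BITS with a 128-bit intermediate: fixed-point multiply. */
static uint64_t mul_ratio(uint64_t a, uint64_t b)
{
	return (uint64_t)(((unsigned __int128)a * b) >> FRAC_BITS);
}
```

Multiplying by DEFAULT_RATIO is an identity, so skipping the merge when L2's multiplier equals the default simply leaves L1's ratio in place; e.g. merging two 1.5x ratios yields 2.25x, as the fixed-point multiply implies.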

I will implement the rest of your feedback, thanks.

> 
> Same feedback for the check in the offset version.
> 
> > +             vcpu->arch.tsc_scaling_ratio = mul_u64_u64_shr(
> > +                             vcpu->arch.l1_tsc_scaling_ratio,
> > +                             l2_multiplier,
> > +                             kvm_tsc_scaling_ratio_frac_bits);
> > +     }
> > +}
> > +EXPORT_SYMBOL_GPL(kvm_set_02_tsc_multiplier);
> > +
> >  static void kvm_vcpu_write_tsc_offset(struct kvm_vcpu *vcpu, u64 offset)
> >  {
> >       vcpu->arch.l1_tsc_offset = offset;
> > --
> > 2.17.1
> > 


* Re: [PATCH v2 07/10] KVM: X86: Move write_l1_tsc_offset() logic to common code and rename it
  2021-05-19  0:05   ` Sean Christopherson
@ 2021-05-19 11:45     ` Stamatis, Ilias
  2021-05-19 15:49       ` Sean Christopherson
  0 siblings, 1 reply; 25+ messages in thread
From: Stamatis, Ilias @ 2021-05-19 11:45 UTC (permalink / raw)
  To: seanjc
  Cc: kvm, linux-kernel, jmattson, Woodhouse, David, vkuznets, joro,
	mtosatti, zamsden, pbonzini, mlevitsk, wanpengli

On Wed, 2021-05-19 at 00:05 +0000, Sean Christopherson wrote:
> On Wed, May 12, 2021, Ilias Stamatis wrote:
> > The write_l1_tsc_offset() callback has a misleading name. It does not
> > set L1's TSC offset, it rather updates the current TSC offset which
> > might be different if a nested guest is executing. Additionally, both
> > the vmx and svm implementations use the same logic for calculating the
> > current TSC before writing it to hardware.
> 
> I don't disagree, but the current name has the advantage of clarifying (well,
> hinting) that the offset is L1's offset.  That hint is lost in this refactoring.
> Maybe rename "u64 offset" to "u64 l1_tsc_offset"?
> 
> > This patch renames the function and moves the common logic to the
> 
> Use imperative mood instead of "This patch".  From
> Documentation/process/submitting-patches.rst:
> 
>   Describe your changes in imperative mood, e.g. "make xyzzy do frotz"
>   instead of "[This patch] makes xyzzy do frotz" or "[I] changed xyzzy
>   to do frotz", as if you are giving orders to the codebase to change
>   its behaviour.
> 
> > caller. The vmx/svm-specific code now merely sets the given offset to
> > the corresponding hardware structure.
> > 
> > Signed-off-by: Ilias Stamatis <ilstam@amazon.com>
> > ---
> >  arch/x86/include/asm/kvm-x86-ops.h |  2 +-
> >  arch/x86/include/asm/kvm_host.h    |  3 +--
> >  arch/x86/kvm/svm/svm.c             | 21 ++++-----------------
> >  arch/x86/kvm/vmx/vmx.c             | 23 +++--------------------
> >  arch/x86/kvm/x86.c                 | 17 ++++++++++++++++-
> >  5 files changed, 25 insertions(+), 41 deletions(-)
> > 
> > diff --git a/arch/x86/include/asm/kvm-x86-ops.h b/arch/x86/include/asm/kvm-x86-ops.h
> > index 2063616fba1c..029c9615378f 100644
> > --- a/arch/x86/include/asm/kvm-x86-ops.h
> > +++ b/arch/x86/include/asm/kvm-x86-ops.h
> > @@ -89,7 +89,7 @@ KVM_X86_OP(load_mmu_pgd)
> >  KVM_X86_OP_NULL(has_wbinvd_exit)
> >  KVM_X86_OP(get_l2_tsc_offset)
> >  KVM_X86_OP(get_l2_tsc_multiplier)
> > -KVM_X86_OP(write_l1_tsc_offset)
> > +KVM_X86_OP(write_tsc_offset)
> >  KVM_X86_OP(get_exit_info)
> >  KVM_X86_OP(check_intercept)
> >  KVM_X86_OP(handle_exit_irqoff)
> > diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
> > index 57a25d8e8b0f..61cf201c001a 100644
> > --- a/arch/x86/include/asm/kvm_host.h
> > +++ b/arch/x86/include/asm/kvm_host.h
> > @@ -1307,8 +1307,7 @@ struct kvm_x86_ops {
> > 
> >       u64 (*get_l2_tsc_offset)(struct kvm_vcpu *vcpu);
> >       u64 (*get_l2_tsc_multiplier)(struct kvm_vcpu *vcpu);
> > -     /* Returns actual tsc_offset set in active VMCS */
> > -     u64 (*write_l1_tsc_offset)(struct kvm_vcpu *vcpu, u64 offset);
> > +     void (*write_tsc_offset)(struct kvm_vcpu *vcpu, u64 offset);
> > 
> >       /*
> >        * Retrieve somewhat arbitrary exit information.  Intended to be used
> > diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
> > index 679b2fc1a3f9..b18f60463073 100644
> > --- a/arch/x86/kvm/svm/svm.c
> > +++ b/arch/x86/kvm/svm/svm.c
> > @@ -1094,26 +1094,13 @@ static u64 svm_get_l2_tsc_multiplier(struct kvm_vcpu *vcpu)
> >       return kvm_default_tsc_scaling_ratio;
> >  }
> > 
> > -static u64 svm_write_l1_tsc_offset(struct kvm_vcpu *vcpu, u64 offset)
> > +static void svm_write_tsc_offset(struct kvm_vcpu *vcpu, u64 offset)
> >  {
> >       struct vcpu_svm *svm = to_svm(vcpu);
> > -     u64 g_tsc_offset = 0;
> > -
> > -     if (is_guest_mode(vcpu)) {
> > -             /* Write L1's TSC offset.  */
> > -             g_tsc_offset = svm->vmcb->control.tsc_offset -
> > -                            svm->vmcb01.ptr->control.tsc_offset;
> > -             svm->vmcb01.ptr->control.tsc_offset = offset;
> > -     }
> > -
> > -     trace_kvm_write_tsc_offset(vcpu->vcpu_id,
> > -                                svm->vmcb->control.tsc_offset - g_tsc_offset,
> > -                                offset);
> > -
> > -     svm->vmcb->control.tsc_offset = offset + g_tsc_offset;
> > 
> > +     svm->vmcb01.ptr->control.tsc_offset = vcpu->arch.l1_tsc_offset;
> > +     svm->vmcb->control.tsc_offset = offset;
> >       vmcb_mark_dirty(svm->vmcb, VMCB_INTERCEPTS);
> > -     return svm->vmcb->control.tsc_offset;
> >  }
> > 
> >  /* Evaluate instruction intercepts that depend on guest CPUID features. */
> > @@ -4540,7 +4527,7 @@ static struct kvm_x86_ops svm_x86_ops __initdata = {
> > 
> >       .get_l2_tsc_offset = svm_get_l2_tsc_offset,
> >       .get_l2_tsc_multiplier = svm_get_l2_tsc_multiplier,
> > -     .write_l1_tsc_offset = svm_write_l1_tsc_offset,
> > +     .write_tsc_offset = svm_write_tsc_offset,
> > 
> >       .load_mmu_pgd = svm_load_mmu_pgd,
> > 
> > diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
> > index 575e13bddda8..3c4eb14a1e86 100644
> > --- a/arch/x86/kvm/vmx/vmx.c
> > +++ b/arch/x86/kvm/vmx/vmx.c
> > @@ -1810,26 +1810,9 @@ static u64 vmx_get_l2_tsc_multiplier(struct kvm_vcpu *vcpu)
> >       return multiplier;
> >  }
> > 
> > -static u64 vmx_write_l1_tsc_offset(struct kvm_vcpu *vcpu, u64 offset)
> > +static void vmx_write_tsc_offset(struct kvm_vcpu *vcpu, u64 offset)
> >  {
> > -     struct vmcs12 *vmcs12 = get_vmcs12(vcpu);
> > -     u64 g_tsc_offset = 0;
> > -
> > -     /*
> > -      * We're here if L1 chose not to trap WRMSR to TSC. According
> > -      * to the spec, this should set L1's TSC; The offset that L1
> > -      * set for L2 remains unchanged, and still needs to be added
> > -      * to the newly set TSC to get L2's TSC.
> > -      */
> > -     if (is_guest_mode(vcpu) &&
> > -         (vmcs12->cpu_based_vm_exec_control & CPU_BASED_USE_TSC_OFFSETTING))
> > -             g_tsc_offset = vmcs12->tsc_offset;
> > -
> > -     trace_kvm_write_tsc_offset(vcpu->vcpu_id,
> > -                                vcpu->arch.tsc_offset - g_tsc_offset,
> > -                                offset);
> > -     vmcs_write64(TSC_OFFSET, offset + g_tsc_offset);
> > -     return offset + g_tsc_offset;
> > +     vmcs_write64(TSC_OFFSET, offset);
> >  }
> > 
> >  /*
> > @@ -7725,7 +7708,7 @@ static struct kvm_x86_ops vmx_x86_ops __initdata = {
> > 
> >       .get_l2_tsc_offset = vmx_get_l2_tsc_offset,
> >       .get_l2_tsc_multiplier = vmx_get_l2_tsc_multiplier,
> > -     .write_l1_tsc_offset = vmx_write_l1_tsc_offset,
> > +     .write_tsc_offset = vmx_write_tsc_offset,
> > 
> >       .load_mmu_pgd = vmx_load_mmu_pgd,
> > 
> > diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> > index 1db6cfc2079f..f3ba1be4d5b9 100644
> > --- a/arch/x86/kvm/x86.c
> > +++ b/arch/x86/kvm/x86.c
> > @@ -2377,8 +2377,23 @@ EXPORT_SYMBOL_GPL(kvm_set_02_tsc_multiplier);
> > 
> >  static void kvm_vcpu_write_tsc_offset(struct kvm_vcpu *vcpu, u64 offset)
> >  {
> > +     trace_kvm_write_tsc_offset(vcpu->vcpu_id,
> > +                                vcpu->arch.l1_tsc_offset,
> > +                                offset);
> > +
> >       vcpu->arch.l1_tsc_offset = offset;
> > -     vcpu->arch.tsc_offset = static_call(kvm_x86_write_l1_tsc_offset)(vcpu, offset);
> > +     vcpu->arch.tsc_offset = offset;
> > +
> > +     if (is_guest_mode(vcpu)) {
> 
> Unnecessary curly braces.

Really? Are we supposed to have a 6-line body without braces? I'm not
opposing, I'm just surprised that that's the coding standard.

> 
> > +             /*
> > +              * We're here if L1 chose not to trap WRMSR to TSC and
> > +              * according to the spec this should set L1's TSC (as opposed
> > +              * to setting L1's offset for L2).
> > +              */
> 
> While we're shuffling code, can we improve this comment?  It works for the WRMSR
> case, but makes no sense in the context of host TSC adjustments.  It's not at all
> clear to me that it's even correct or relevant in those cases.
> 

Do you suggest removing it completely, or how would you like it to read? I don't
mind deleting it.

> > +             kvm_set_02_tsc_offset(vcpu);
> 
> I really dislike that kvm_set_02_tsc_offset() consumes a bunch of variables
> _and_ sets arch.tsc_offset, e.g. it's not at all obvious that moving this call
> around will break L2.  Even more bizarre is that arch.tsc_offset is conditionally
> consumed.  Oh, and kvm_set_02_tsc_offset() is not idempotent since it can do a
> RMW on arch.tsc_offset.
> 
> The below static_call() dependency doesn't appear to be easily solved, but we
> can at least strongly hint that vcpu->arch.tsc_offset is set.  For everything
> else, I think we can clean things up by doing this (with the vendor calls
> providing the L2 variables directly):
> 
> void kvm_set_l2_tsc_offset(struct kvm_vcpu *vcpu, u64 l2_offset,
>                            u64 l2_multiplier)
> {
>         u64 l1_offset = vcpu->arch.l1_tsc_offset;
> 
>         if (l2_multiplier != kvm_default_tsc_scaling_ratio)
>                 l2_offset += mul_s64_u64_shr((s64)l1_offset, l2_multiplier,
>                                              kvm_tsc_scaling_ratio_frac_bits);
>         else
>                 l2_offset += l1_offset;
> 
>         vcpu->arch.tsc_offset = l2_offset;
> }
> EXPORT_SYMBOL_GPL(kvm_set_l2_tsc_offset);
> 
> static void kvm_vcpu_write_tsc_offset(struct kvm_vcpu *vcpu, u64 offset)
> {
>         trace_kvm_write_tsc_offset(vcpu->vcpu_id,
>                                    vcpu->arch.l1_tsc_offset,
>                                    offset);
> 
>         vcpu->arch.l1_tsc_offset = offset;
> 
>         if (is_guest_mode(vcpu))
>                 kvm_set_l2_tsc_offset(vcpu,
>                                       static_call(kvm_x86_get_l2_tsc_offset)(vcpu),
>                                       static_call(kvm_x86_get_l2_tsc_multiplier)(vcpu));
>         else
>                 vcpu->arch.tsc_offset = offset;
> 
>         static_call(kvm_x86_write_tsc_offset)(vcpu, vcpu->arch.tsc_offset);
> }
> 

OK, I will change it to what you suggest above.

> 
> An alternative would be to explicitly track L1 and L2, and _never_ track the
> current TSC values.  I.e. always compute the correct value on the fly.  I think
> the only hot path is the TSC deadline timer, and AFAICT that always runs with
> L1's timer.  Speaking of which, at the end of this series, vmx_set_hv_timer()
> uses L1's TSC but the current scaling ratio; that can't be right.
> 

You are right, good catch, I will add this to patch 2.

I think we should leave the vcpu->arch.tsc_scaling_ratio variable as-is for now.

> > +     }
> > +
> > +     static_call(kvm_x86_write_tsc_offset)(vcpu, vcpu->arch.tsc_offset);
> >  }
> > 
> >  static inline bool kvm_check_tsc_unstable(void)
> > --
> > 2.17.1
> > 


^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH v2 08/10] KVM: VMX: Set the TSC offset and multiplier on nested entry and exit
  2021-05-19  0:07   ` Sean Christopherson
@ 2021-05-19 11:55     ` Stamatis, Ilias
  0 siblings, 0 replies; 25+ messages in thread
From: Stamatis, Ilias @ 2021-05-19 11:55 UTC (permalink / raw)
  To: seanjc
  Cc: kvm, linux-kernel, jmattson, Woodhouse, David, vkuznets, joro,
	mtosatti, zamsden, pbonzini, mlevitsk, wanpengli

On Wed, 2021-05-19 at 00:07 +0000, Sean Christopherson wrote:
> On Wed, May 12, 2021, Ilias Stamatis wrote:
> > Now that nested TSC scaling is supported we need to calculate the
> > correct 02 values for both the offset and the multiplier using the
> > corresponding functions. On L2's exit the L1 values are restored.
> > 
> > Signed-off-by: Ilias Stamatis <ilstam@amazon.com>
> > ---
> >  arch/x86/kvm/vmx/nested.c | 13 +++++++++----
> >  1 file changed, 9 insertions(+), 4 deletions(-)
> > 
> > diff --git a/arch/x86/kvm/vmx/nested.c b/arch/x86/kvm/vmx/nested.c
> > index 6058a65a6ede..f1dff1ebaccb 100644
> > --- a/arch/x86/kvm/vmx/nested.c
> > +++ b/arch/x86/kvm/vmx/nested.c
> > @@ -3354,8 +3354,9 @@ enum nvmx_vmentry_status nested_vmx_enter_non_root_mode(struct kvm_vcpu *vcpu,
> >       }
> > 
> >       enter_guest_mode(vcpu);
> > -     if (vmcs12->cpu_based_vm_exec_control & CPU_BASED_USE_TSC_OFFSETTING)
> > -             vcpu->arch.tsc_offset += vmcs12->tsc_offset;
> > +
> > +     kvm_set_02_tsc_offset(vcpu);
> > +     kvm_set_02_tsc_multiplier(vcpu);
> 
> Please move this code into prepare_vmcs02() to co-locate it with the relevant
> vmcs02 logic.  If there's something in prepare_vmcs02() that consumes
> vcpu->arch.tsc_offset() other than the obvious VMWRITE, I vote to move things
> around to fix that rather than create this weird split.
> 

Agreed. It makes more sense.

> >       if (prepare_vmcs02(vcpu, vmcs12, &entry_failure_code)) {
> >               exit_reason.basic = EXIT_REASON_INVALID_STATE;
> > @@ -4463,8 +4464,12 @@ void nested_vmx_vmexit(struct kvm_vcpu *vcpu, u32 vm_exit_reason,
> >       if (nested_cpu_has_preemption_timer(vmcs12))
> >               hrtimer_cancel(&to_vmx(vcpu)->nested.preemption_timer);
> > 
> > -     if (vmcs12->cpu_based_vm_exec_control & CPU_BASED_USE_TSC_OFFSETTING)
> > -             vcpu->arch.tsc_offset -= vmcs12->tsc_offset;
> > +     if (vmcs12->cpu_based_vm_exec_control & CPU_BASED_USE_TSC_OFFSETTING) {
> > +             vcpu->arch.tsc_offset = vcpu->arch.l1_tsc_offset;
> > +
> > +             if (vmcs12->secondary_vm_exec_control & SECONDARY_EXEC_TSC_SCALING)
> > +                     vcpu->arch.tsc_scaling_ratio = vcpu->arch.l1_tsc_scaling_ratio;
> > +     }

I guess these need to stay where they are though.

And thinking about it, the two if conditions are really unnecessary.

Thanks for the reviews!

Ilias

> > 
> >       if (likely(!vmx->fail)) {
> >               sync_vmcs02_to_vmcs12(vcpu, vmcs12);
> > --
> > 2.17.1
> > 





* Re: [PATCH v2 03/10] KVM: X86: Add kvm_scale_tsc_l1() and kvm_compute_tsc_offset_l1()
  2021-05-19  9:02     ` Stamatis, Ilias
@ 2021-05-19 15:40       ` Sean Christopherson
  2021-05-20 18:27         ` Stamatis, Ilias
  0 siblings, 1 reply; 25+ messages in thread
From: Sean Christopherson @ 2021-05-19 15:40 UTC (permalink / raw)
  To: Stamatis, Ilias
  Cc: kvm, linux-kernel, jmattson, Woodhouse, David, vkuznets, joro,
	mtosatti, zamsden, pbonzini, mlevitsk, wanpengli

On Wed, May 19, 2021, Stamatis, Ilias wrote:
> On Tue, 2021-05-18 at 23:04 +0000, Sean Christopherson wrote:
> > On Wed, May 12, 2021, Ilias Stamatis wrote:
> > > diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> > > index 07cf5d7ece38..84af1af7a2cc 100644
> > > --- a/arch/x86/kvm/x86.c
> > > +++ b/arch/x86/kvm/x86.c
> > > @@ -2319,18 +2319,30 @@ u64 kvm_scale_tsc(struct kvm_vcpu *vcpu, u64 tsc)
> > >  }
> > >  EXPORT_SYMBOL_GPL(kvm_scale_tsc);
> > > 
> > > -static u64 kvm_compute_tsc_offset(struct kvm_vcpu *vcpu, u64 target_tsc)
> > > +u64 kvm_scale_tsc_l1(struct kvm_vcpu *vcpu, u64 tsc)
> > > +{
> > > +     u64 _tsc = tsc;
> > > +     u64 ratio = vcpu->arch.l1_tsc_scaling_ratio;
> > > +
> > > +     if (ratio != kvm_default_tsc_scaling_ratio)
> > > +             _tsc = __scale_tsc(ratio, tsc);
> > > +
> > > +     return _tsc;
> > > +}
> > 
> > Just make the ratio a param.  This is complete copy+paste of kvm_scale_tsc(),
> > with 3 characters added.  And all of the callers are already in an L1-specific
> > function or have L1 vs. L2 awareness.  IMO, that makes the code less magical, too,
> > as I don't have to dive into a helper to see that it reads l1_tsc_scaling_ratio
> > versus tsc_scaling_ratio.
> > 
> 
> That's how I did it initially but changed it into a separate function after
> receiving feedback on v1. I'm neutral, I don't mind changing it back.

Ah, I see the conundrum.  The vendor code isn't straightforward because of all
the enabling checks against vmcs12 controls.

Given that, I don't terribly mind the callbacks, but I do think the connection
between the computation and the VMWRITE needs to be more explicit.

Poking around the code, the other thing that would help would be to get rid of
the awful decache_tsc_multiplier().  That helper was added to paper over the
completely broken logic of commit ff2c3a180377 ("KVM: VMX: Setup TSC scaling
ratio when a vcpu is loaded").  Its use in vmx_vcpu_load_vmcs() is basically
"write the VMCS if we forgot to earlier", which is all kinds of wrong.

If we get rid of that stupidity as prep work at the beginning of this series,
and have the "setters" return the computed value, the nested VMX code can
consume the value directly instead of having the subtle dependency on the helpers.

	vmcs_write64(TSC_OFFSET, kvm_calc_l2_tsc_offset(vcpu));

	if (kvm_has_tsc_control)
		vmcs_write64(TSC_MULTIPLIER, kvm_calc_l2_tsc_multiplier(vcpu));


Side topic, the checks against the vmcs12 controls are wrong.  Specifically,
when checking a secondary execution control, KVM needs to first check that the
secondary control is enabled in the primary control.  But we have helpers for that.
The primary control should use its helper, too.  And while you're at it, drop
the local variable in the getter.  I.e.:

diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
index 3c4eb14a1e86..8735f2d71e17 100644
--- a/arch/x86/kvm/vmx/vmx.c
+++ b/arch/x86/kvm/vmx/vmx.c
@@ -1801,13 +1801,12 @@ static u64 vmx_get_l2_tsc_offset(struct kvm_vcpu *vcpu)
 static u64 vmx_get_l2_tsc_multiplier(struct kvm_vcpu *vcpu)
 {
        struct vmcs12 *vmcs12 = get_vmcs12(vcpu);
-       u64 multiplier = kvm_default_tsc_scaling_ratio;

-       if (vmcs12->cpu_based_vm_exec_control & CPU_BASED_USE_TSC_OFFSETTING &&
-           vmcs12->secondary_vm_exec_control & SECONDARY_EXEC_TSC_SCALING)
-               multiplier = vmcs12->tsc_multiplier;
+       if (nested_cpu_has(vmcs12, CPU_BASED_USE_TSC_OFFSETTING) &&
+           nested_cpu_has2(vmcs12, SECONDARY_EXEC_TSC_SCALING))
+               return vmcs12->tsc_multiplier;

-       return multiplier;
+       return kvm_default_tsc_scaling_ratio;
 }

Side topic #2: I now see why the x86.c helpers skip the math if the multiplier
is kvm_default_tsc_scaling_ratio.


* Re: [PATCH v2 07/10] KVM: X86: Move write_l1_tsc_offset() logic to common code and rename it
  2021-05-19 11:45     ` Stamatis, Ilias
@ 2021-05-19 15:49       ` Sean Christopherson
  0 siblings, 0 replies; 25+ messages in thread
From: Sean Christopherson @ 2021-05-19 15:49 UTC (permalink / raw)
  To: Stamatis, Ilias
  Cc: kvm, linux-kernel, jmattson, Woodhouse, David, vkuznets, joro,
	mtosatti, zamsden, pbonzini, mlevitsk, wanpengli

On Wed, May 19, 2021, Stamatis, Ilias wrote:
> On Wed, 2021-05-19 at 00:05 +0000, Sean Christopherson wrote:
> > On Wed, May 12, 2021, Ilias Stamatis wrote:
> > > diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> > > index 1db6cfc2079f..f3ba1be4d5b9 100644
> > > --- a/arch/x86/kvm/x86.c
> > > +++ b/arch/x86/kvm/x86.c
> > > @@ -2377,8 +2377,23 @@ EXPORT_SYMBOL_GPL(kvm_set_02_tsc_multiplier);
> > > 
> > >  static void kvm_vcpu_write_tsc_offset(struct kvm_vcpu *vcpu, u64 offset)
> > >  {
> > > +     trace_kvm_write_tsc_offset(vcpu->vcpu_id,
> > > +                                vcpu->arch.l1_tsc_offset,
> > > +                                offset);
> > > +
> > >       vcpu->arch.l1_tsc_offset = offset;
> > > -     vcpu->arch.tsc_offset = static_call(kvm_x86_write_l1_tsc_offset)(vcpu, offset);
> > > +     vcpu->arch.tsc_offset = offset;
> > > +
> > > +     if (is_guest_mode(vcpu)) {
> > 
> > Unnecessary curly braces.
> 
> Really? Are we supposed to have a 6-line body without braces? I'm not
> opposing, I'm just surprised that that's the coding standard.

Comments don't (technically) count.  I usually avoid the ambiguity by putting
the comment above the if statement.  That also helps with indentation, e.g.

	/*
	 * This is a comment.
	 */
	if (is_guest_mode(vcpu))
		kvm_set_02_tsc_offset(vcpu);

> > > +             /*
> > > +              * We're here if L1 chose not to trap WRMSR to TSC and
> > > +              * according to the spec this should set L1's TSC (as opposed
> > > +              * to setting L1's offset for L2).
> > > +              */
> > 
> > While we're shuffling code, can we improve this comment?  It works for the WRMSR
> > case, but makes no sense in the context of host TSC adjustments.  It's not at all
> > clear to me that it's even correct or relevant in those cases.
> > 
> 
> Do you suggest removing it completely, or how would you like it to read? I don't
> mind deleting it.

Heh, I'd happily write the comment, except I have no idea what the logic is in
the non-WRMSR case.  I do think we need a comment, IMO none of paths that lead
to changing the TSC offset while L2 is active are obvious.


* Re: [PATCH v2 03/10] KVM: X86: Add kvm_scale_tsc_l1() and kvm_compute_tsc_offset_l1()
  2021-05-19 15:40       ` Sean Christopherson
@ 2021-05-20 18:27         ` Stamatis, Ilias
  0 siblings, 0 replies; 25+ messages in thread
From: Stamatis, Ilias @ 2021-05-20 18:27 UTC (permalink / raw)
  To: seanjc
  Cc: kvm, linux-kernel, jmattson, Woodhouse, David, vkuznets,
	mtosatti, joro, zamsden, pbonzini, mlevitsk, wanpengli

On Wed, 2021-05-19 at 15:40 +0000, Sean Christopherson wrote:
> On Wed, May 19, 2021, Stamatis, Ilias wrote:
> > On Tue, 2021-05-18 at 23:04 +0000, Sean Christopherson wrote:
> > > On Wed, May 12, 2021, Ilias Stamatis wrote:
> > > > diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> > > > index 07cf5d7ece38..84af1af7a2cc 100644
> > > > --- a/arch/x86/kvm/x86.c
> > > > +++ b/arch/x86/kvm/x86.c
> > > > @@ -2319,18 +2319,30 @@ u64 kvm_scale_tsc(struct kvm_vcpu *vcpu, u64 tsc)
> > > >  }
> > > >  EXPORT_SYMBOL_GPL(kvm_scale_tsc);
> > > > 
> > > > -static u64 kvm_compute_tsc_offset(struct kvm_vcpu *vcpu, u64 target_tsc)
> > > > +u64 kvm_scale_tsc_l1(struct kvm_vcpu *vcpu, u64 tsc)
> > > > +{
> > > > +     u64 _tsc = tsc;
> > > > +     u64 ratio = vcpu->arch.l1_tsc_scaling_ratio;
> > > > +
> > > > +     if (ratio != kvm_default_tsc_scaling_ratio)
> > > > +             _tsc = __scale_tsc(ratio, tsc);
> > > > +
> > > > +     return _tsc;
> > > > +}
> > > 
> > > Just make the ratio a param.  This is complete copy+paste of kvm_scale_tsc(),
> > > with 3 characters added.  And all of the callers are already in an L1-specific
> > > function or have L1 vs. L2 awareness.  IMO, that makes the code less magical, too,
> > > as I don't have to dive into a helper to see that it reads l1_tsc_scaling_ratio
> > > versus tsc_scaling_ratio.
> > > 
> > 
> > That's how I did it initially but changed it into a separate function after
> > receiving feedback on v1. I'm neutral, I don't mind changing it back.
> 
> Ah, I see the conundrum.  The vendor code isn't straightforward because of all
> the enabling checks against vmcs12 controls.
> 
> Given that, I don't terribly mind the callbacks, but I do think the connection
> between the computation and the VMWRITE needs to be more explicit.
> 
> Poking around the code, the other thing that would help would be to get rid of
> the awful decache_tsc_multiplier().  That helper was added to paper over the
> completely broken logic of commit ff2c3a180377 ("KVM: VMX: Setup TSC scaling
> ratio when a vcpu is loaded").  Its use in vmx_vcpu_load_vmcs() is basically
> "write the VMCS if we forgot to earlier", which is all kinds of wrong.
> 

I am going to add a patch that removes decache_tsc_multiplier() and
vmx->current_tsc_ratio, as the latter is useless since vcpu->arch.tsc_scaling_ratio
is already the current ratio. And without it decache_tsc_multiplier() becomes
a one-liner that is pointless to have; we can call vmcs_write64() directly.

Nevertheless, I am not going to move the code outside of vmx_vcpu_load_vmcs().
Granted, a better place for setting the multiplier in hardware would be
set_tsc_khz(). But that function lives in x86.c, so it would require adding yet
another vendor callback, moving the SVM code too, etc.

Much more refactoring can be done in the KVM code in general, but I don't think
it has to be part of this series. I am going to send the v3 patches tomorrow.

Ilias



Thread overview: 25+ messages

2021-05-12 15:09 [PATCH v2 00/10] KVM: Implement nested TSC scaling Ilias Stamatis
2021-05-12 15:09 ` [PATCH v2 01/10] math64.h: Add mul_s64_u64_shr() Ilias Stamatis
2021-05-18 22:36   ` Sean Christopherson
2021-05-12 15:09 ` [PATCH v2 02/10] KVM: X86: Store L1's TSC scaling ratio in 'struct kvm_vcpu_arch' Ilias Stamatis
2021-05-18 22:52   ` Sean Christopherson
2021-05-19  8:54     ` Stamatis, Ilias
2021-05-12 15:09 ` [PATCH v2 03/10] KVM: X86: Add kvm_scale_tsc_l1() and kvm_compute_tsc_offset_l1() Ilias Stamatis
2021-05-18 23:04   ` Sean Christopherson
2021-05-19  9:02     ` Stamatis, Ilias
2021-05-19 15:40       ` Sean Christopherson
2021-05-20 18:27         ` Stamatis, Ilias
2021-05-12 15:09 ` [PATCH v2 04/10] KVM: VMX: Add a TSC multiplier field in VMCS12 Ilias Stamatis
2021-05-12 15:09 ` [PATCH v2 05/10] KVM: X86: Add functions for retrieving L2 TSC fields from common code Ilias Stamatis
2021-05-12 15:09 ` [PATCH v2 06/10] KVM: X86: Add functions that calculate the 02 TSC offset and multiplier Ilias Stamatis
2021-05-18 23:21   ` Sean Christopherson
2021-05-19 10:15     ` Stamatis, Ilias
2021-05-12 15:09 ` [PATCH v2 07/10] KVM: X86: Move write_l1_tsc_offset() logic to common code and rename it Ilias Stamatis
2021-05-19  0:05   ` Sean Christopherson
2021-05-19 11:45     ` Stamatis, Ilias
2021-05-19 15:49       ` Sean Christopherson
2021-05-12 15:09 ` [PATCH v2 08/10] KVM: VMX: Set the TSC offset and multiplier on nested entry and exit Ilias Stamatis
2021-05-19  0:07   ` Sean Christopherson
2021-05-19 11:55     ` Stamatis, Ilias
2021-05-12 15:09 ` [PATCH v2 09/10] KVM: VMX: Expose TSC scaling to L2 Ilias Stamatis
2021-05-12 15:09 ` [PATCH v2 10/10] KVM: selftests: x86: Add vmx_nested_tsc_scaling_test Ilias Stamatis
