kvm.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH v2 0/4] KVM: x86: hyper-v: TSC page fixes
@ 2021-03-16 14:37 Vitaly Kuznetsov
  2021-03-16 14:37 ` [PATCH v2 1/4] KVM: x86: hyper-v: Limit guest to writing zero to HV_X64_MSR_TSC_EMULATION_STATUS Vitaly Kuznetsov
                   ` (4 more replies)
  0 siblings, 5 replies; 22+ messages in thread
From: Vitaly Kuznetsov @ 2021-03-16 14:37 UTC (permalink / raw)
  To: kvm, Paolo Bonzini
  Cc: Sean Christopherson, Wanpeng Li, Jim Mattson, Marcelo Tosatti

Changes since v1:
- Invalidate TSC page from kvm_gen_update_masterclock() instead of calling
 kvm_hv_setup_tsc_page() for all vCPUs [Paolo]
- Set hv->hv_tsc_page_status = HV_TSC_PAGE_UNSET when TSC page is disabled
 with MSR write. Check both HV_TSC_PAGE_BROKEN/HV_TSC_PAGE_UNSET states
 in kvm_hv_setup_tsc_page()/kvm_hv_invalidate_tsc_page().
- Check for HV_TSC_PAGE_SET state instead of '!hv->tsc_ref.tsc_sequence' in
 get_time_ref_counter().

Original description:

I'm investigating an issue when Linux guest on nested Hyper-V on KVM 
(WSL2 on Win10 on KVM to be precise) hangs after L1 KVM is migrated. Trace
shows us that L2 is trying to set L1's Synthetic Timer and reacting to
this Hyper-V sets Synthetic Timer in KVM but the target value it sets is
always slightly in the past, this causes the timer to expire immediately
and an interrupt storm is thus observed. L2 is not making much forward
progress.

The issue is only observed when re-enlightenment is exposed to L1. KVM
doesn't really support re-enlightenment notifications upon migration,
userspace is supposed to expose it only when TSC scaling is supported
on the destination host. Without re-enlightenment exposed, Hyper-V will
not expose stable TSC page clocksource to its L2s. The issue is observed
when migration happens between hosts supporting TSC scaling. Rumor has it
that it is possible to reproduce the problem even when migrating locally
to the same host, though, I wasn't really able to.

The current speculation is that when Hyper-V is migrated, it uses stale
(cached) TSC page values to compute the difference between its own
clocksource (provided by KVM) and its guests' TSC pages to program
synthetic timers and in some cases, when TSC page is updated, this puts all
stimer expirations in the past. This, in its turn, causes an interrupt
storms (both L0-L1 and L1->L2 as Hyper-V mirrors stimer expirations into
L2).

The proposed fix is to skip updating TSC page clocksource when guest opted
for re-enlightenment notifications (PATCH4). Patches 1-3 are slightly
related fixes to the (mostly theoretical) issues I've stumbled upon while
working on the problem.

Vitaly Kuznetsov (4):
  KVM: x86: hyper-v: Limit guest to writing zero to
    HV_X64_MSR_TSC_EMULATION_STATUS
  KVM: x86: hyper-v: Prevent using not-yet-updated TSC page by secondary
    CPUs
  KVM: x86: hyper-v: Track Hyper-V TSC page status
  KVM: x86: hyper-v: Don't touch TSC page values when guest opted for
    re-enlightenment

 arch/x86/include/asm/kvm_host.h | 10 ++++
 arch/x86/kvm/hyperv.c           | 91 +++++++++++++++++++++++++++++----
 arch/x86/kvm/hyperv.h           |  1 +
 arch/x86/kvm/x86.c              |  2 +
 4 files changed, 94 insertions(+), 10 deletions(-)

-- 
2.30.2


^ permalink raw reply	[flat|nested] 22+ messages in thread

* [PATCH v2 1/4] KVM: x86: hyper-v: Limit guest to writing zero to HV_X64_MSR_TSC_EMULATION_STATUS
  2021-03-16 14:37 [PATCH v2 0/4] KVM: x86: hyper-v: TSC page fixes Vitaly Kuznetsov
@ 2021-03-16 14:37 ` Vitaly Kuznetsov
  2021-03-16 14:37 ` [PATCH v2 2/4] KVM: x86: hyper-v: Prevent using not-yet-updated TSC page by secondary CPUs Vitaly Kuznetsov
                   ` (3 subsequent siblings)
  4 siblings, 0 replies; 22+ messages in thread
From: Vitaly Kuznetsov @ 2021-03-16 14:37 UTC (permalink / raw)
  To: kvm, Paolo Bonzini
  Cc: Sean Christopherson, Wanpeng Li, Jim Mattson, Marcelo Tosatti

HV_X64_MSR_TSC_EMULATION_STATUS indicates whether TSC accesses are emulated
after migration (to accommodate for a different host TSC frequency when TSC
scaling is not supported; we don't implement this in KVM). Guest can use
the same MSR to stop TSC access emulation by writing zero. Writing anything
else is forbidden.

Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
---
 arch/x86/kvm/hyperv.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/arch/x86/kvm/hyperv.c b/arch/x86/kvm/hyperv.c
index 58fa8c029867..eefb85b86fe8 100644
--- a/arch/x86/kvm/hyperv.c
+++ b/arch/x86/kvm/hyperv.c
@@ -1229,6 +1229,9 @@ static int kvm_hv_set_msr_pw(struct kvm_vcpu *vcpu, u32 msr, u64 data,
 		hv->hv_tsc_emulation_control = data;
 		break;
 	case HV_X64_MSR_TSC_EMULATION_STATUS:
+		if (data && !host)
+			return 1;
+
 		hv->hv_tsc_emulation_status = data;
 		break;
 	case HV_X64_MSR_TIME_REF_COUNT:
-- 
2.30.2


^ permalink raw reply related	[flat|nested] 22+ messages in thread

* [PATCH v2 2/4] KVM: x86: hyper-v: Prevent using not-yet-updated TSC page by secondary CPUs
  2021-03-16 14:37 [PATCH v2 0/4] KVM: x86: hyper-v: TSC page fixes Vitaly Kuznetsov
  2021-03-16 14:37 ` [PATCH v2 1/4] KVM: x86: hyper-v: Limit guest to writing zero to HV_X64_MSR_TSC_EMULATION_STATUS Vitaly Kuznetsov
@ 2021-03-16 14:37 ` Vitaly Kuznetsov
  2021-03-18 17:02   ` Marcelo Tosatti
  2021-03-16 14:37 ` [PATCH v2 3/4] KVM: x86: hyper-v: Track Hyper-V TSC page status Vitaly Kuznetsov
                   ` (2 subsequent siblings)
  4 siblings, 1 reply; 22+ messages in thread
From: Vitaly Kuznetsov @ 2021-03-16 14:37 UTC (permalink / raw)
  To: kvm, Paolo Bonzini
  Cc: Sean Christopherson, Wanpeng Li, Jim Mattson, Marcelo Tosatti

When KVM_REQ_MASTERCLOCK_UPDATE request is issued (e.g. after migration)
we need to make sure no vCPU sees stale values in PV clock structures and
thus all vCPUs are kicked with KVM_REQ_CLOCK_UPDATE. Hyper-V TSC page
clocksource is global and kvm_guest_time_update() only updates in on vCPU0
but this is not entirely correct: nothing blocks some other vCPU from
entering the guest before we finish the update on CPU0 and it can read
stale values from the page.

Invalidate TSC page in kvm_gen_update_masterclock() to switch all vCPUs
to using MSR based clocksource (HV_X64_MSR_TIME_REF_COUNT).

Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
---
 arch/x86/kvm/hyperv.c | 23 +++++++++++++++++++++++
 arch/x86/kvm/hyperv.h |  1 +
 arch/x86/kvm/x86.c    |  2 ++
 3 files changed, 26 insertions(+)

diff --git a/arch/x86/kvm/hyperv.c b/arch/x86/kvm/hyperv.c
index eefb85b86fe8..a0e3c49233d4 100644
--- a/arch/x86/kvm/hyperv.c
+++ b/arch/x86/kvm/hyperv.c
@@ -1137,6 +1137,29 @@ void kvm_hv_setup_tsc_page(struct kvm *kvm,
 	mutex_unlock(&hv->hv_lock);
 }
 
+void kvm_hv_invalidate_tsc_page(struct kvm *kvm)
+{
+	struct kvm_hv *hv = to_kvm_hv(kvm);
+	u64 gfn;
+
+	if (!(hv->hv_tsc_page & HV_X64_MSR_TSC_REFERENCE_ENABLE))
+		return;
+
+	mutex_lock(&hv->hv_lock);
+
+	if (!(hv->hv_tsc_page & HV_X64_MSR_TSC_REFERENCE_ENABLE))
+		goto out_unlock;
+
+	gfn = hv->hv_tsc_page >> HV_X64_MSR_TSC_REFERENCE_ADDRESS_SHIFT;
+
+	hv->tsc_ref.tsc_sequence = 0;
+	kvm_write_guest(kvm, gfn_to_gpa(gfn),
+			&hv->tsc_ref, sizeof(hv->tsc_ref.tsc_sequence));
+
+out_unlock:
+	mutex_unlock(&hv->hv_lock);
+}
+
 static int kvm_hv_set_msr_pw(struct kvm_vcpu *vcpu, u32 msr, u64 data,
 			     bool host)
 {
diff --git a/arch/x86/kvm/hyperv.h b/arch/x86/kvm/hyperv.h
index e951af1fcb2c..60547d5cb6d7 100644
--- a/arch/x86/kvm/hyperv.h
+++ b/arch/x86/kvm/hyperv.h
@@ -133,6 +133,7 @@ void kvm_hv_process_stimers(struct kvm_vcpu *vcpu);
 
 void kvm_hv_setup_tsc_page(struct kvm *kvm,
 			   struct pvclock_vcpu_time_info *hv_clock);
+void kvm_hv_invalidate_tsc_page(struct kvm *kvm);
 
 void kvm_hv_init_vm(struct kvm *kvm);
 void kvm_hv_destroy_vm(struct kvm *kvm);
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 47e021bdcc94..a5c5b38735e1 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -2551,6 +2551,8 @@ static void kvm_gen_update_masterclock(struct kvm *kvm)
 	struct kvm_vcpu *vcpu;
 	struct kvm_arch *ka = &kvm->arch;
 
+	kvm_hv_invalidate_tsc_page(kvm);
+
 	spin_lock(&ka->pvclock_gtod_sync_lock);
 	kvm_make_mclock_inprogress_request(kvm);
 	/* no guest entries from this point */
-- 
2.30.2


^ permalink raw reply related	[flat|nested] 22+ messages in thread

* [PATCH v2 3/4] KVM: x86: hyper-v: Track Hyper-V TSC page status
  2021-03-16 14:37 [PATCH v2 0/4] KVM: x86: hyper-v: TSC page fixes Vitaly Kuznetsov
  2021-03-16 14:37 ` [PATCH v2 1/4] KVM: x86: hyper-v: Limit guest to writing zero to HV_X64_MSR_TSC_EMULATION_STATUS Vitaly Kuznetsov
  2021-03-16 14:37 ` [PATCH v2 2/4] KVM: x86: hyper-v: Prevent using not-yet-updated TSC page by secondary CPUs Vitaly Kuznetsov
@ 2021-03-16 14:37 ` Vitaly Kuznetsov
  2021-03-17  8:07   ` Paolo Bonzini
  2021-03-16 14:37 ` [PATCH v2 4/4] KVM: x86: hyper-v: Don't touch TSC page values when guest opted for re-enlightenment Vitaly Kuznetsov
  2021-03-18 14:09 ` [PATCH v2 6/4] selftests: kvm: Add basic Hyper-V clocksources tests Vitaly Kuznetsov
  4 siblings, 1 reply; 22+ messages in thread
From: Vitaly Kuznetsov @ 2021-03-16 14:37 UTC (permalink / raw)
  To: kvm, Paolo Bonzini
  Cc: Sean Christopherson, Wanpeng Li, Jim Mattson, Marcelo Tosatti

Create an infrastructure for tracking Hyper-V TSC page status, i.e. if it
was updated from guest/host side or if we've failed to set it up (because
e.g. guest wrote some garbage to HV_X64_MSR_REFERENCE_TSC) and there's no
need to retry.

Also, in a hypothetical situation when we are in 'always catchup' mode for
TSC we can now avoid contending 'hv->hv_lock' on every guest enter by
setting the state to HV_TSC_PAGE_BROKEN after compute_tsc_page_parameters()
returns false.

Check for HV_TSC_PAGE_SET state instead of '!hv->tsc_ref.tsc_sequence' in
get_time_ref_counter() to properly handle the situation when we failed to
write the updated TSC page values to the guest.

Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
---
 arch/x86/include/asm/kvm_host.h | 10 +++++++
 arch/x86/kvm/hyperv.c           | 49 +++++++++++++++++++++++----------
 2 files changed, 45 insertions(+), 14 deletions(-)

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index 9bc091ecaaeb..71b14e51fdc0 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -884,12 +884,22 @@ struct kvm_hv_syndbg {
 	u64 options;
 };
 
+enum hv_tsc_page_status {
+	HV_TSC_PAGE_UNSET = 0,
+	HV_TSC_PAGE_GUEST_CHANGED,
+	HV_TSC_PAGE_HOST_CHANGED,
+	HV_TSC_PAGE_SET,
+	HV_TSC_PAGE_UPDATING,
+	HV_TSC_PAGE_BROKEN,
+};
+
 /* Hyper-V emulation context */
 struct kvm_hv {
 	struct mutex hv_lock;
 	u64 hv_guest_os_id;
 	u64 hv_hypercall;
 	u64 hv_tsc_page;
+	enum hv_tsc_page_status hv_tsc_page_status;
 
 	/* Hyper-v based guest crash (NT kernel bugcheck) parameters */
 	u64 hv_crash_param[HV_X64_MSR_CRASH_PARAMS];
diff --git a/arch/x86/kvm/hyperv.c b/arch/x86/kvm/hyperv.c
index a0e3c49233d4..5c0f10a2b3ab 100644
--- a/arch/x86/kvm/hyperv.c
+++ b/arch/x86/kvm/hyperv.c
@@ -520,10 +520,10 @@ static u64 get_time_ref_counter(struct kvm *kvm)
 	u64 tsc;
 
 	/*
-	 * The guest has not set up the TSC page or the clock isn't
-	 * stable, fall back to get_kvmclock_ns.
+	 * Fall back to get_kvmclock_ns() when TSC page hasn't been set up,
+	 * is broken, disabled or being updated.
 	 */
-	if (!hv->tsc_ref.tsc_sequence)
+	if (hv->hv_tsc_page_status != HV_TSC_PAGE_SET)
 		return div_u64(get_kvmclock_ns(kvm), 100);
 
 	vcpu = kvm_get_vcpu(kvm, 0);
@@ -1087,7 +1087,8 @@ void kvm_hv_setup_tsc_page(struct kvm *kvm,
 	BUILD_BUG_ON(sizeof(tsc_seq) != sizeof(hv->tsc_ref.tsc_sequence));
 	BUILD_BUG_ON(offsetof(struct ms_hyperv_tsc_page, tsc_sequence) != 0);
 
-	if (!(hv->hv_tsc_page & HV_X64_MSR_TSC_REFERENCE_ENABLE))
+	if (hv->hv_tsc_page_status == HV_TSC_PAGE_BROKEN ||
+	    hv->hv_tsc_page_status == HV_TSC_PAGE_UNSET)
 		return;
 
 	mutex_lock(&hv->hv_lock);
@@ -1101,7 +1102,7 @@ void kvm_hv_setup_tsc_page(struct kvm *kvm,
 	 */
 	if (unlikely(kvm_read_guest(kvm, gfn_to_gpa(gfn),
 				    &tsc_seq, sizeof(tsc_seq))))
-		goto out_unlock;
+		goto out_err;
 
 	/*
 	 * While we're computing and writing the parameters, force the
@@ -1110,15 +1111,15 @@ void kvm_hv_setup_tsc_page(struct kvm *kvm,
 	hv->tsc_ref.tsc_sequence = 0;
 	if (kvm_write_guest(kvm, gfn_to_gpa(gfn),
 			    &hv->tsc_ref, sizeof(hv->tsc_ref.tsc_sequence)))
-		goto out_unlock;
+		goto out_err;
 
 	if (!compute_tsc_page_parameters(hv_clock, &hv->tsc_ref))
-		goto out_unlock;
+		goto out_err;
 
 	/* Ensure sequence is zero before writing the rest of the struct.  */
 	smp_wmb();
 	if (kvm_write_guest(kvm, gfn_to_gpa(gfn), &hv->tsc_ref, sizeof(hv->tsc_ref)))
-		goto out_unlock;
+		goto out_err;
 
 	/*
 	 * Now switch to the TSC page mechanism by writing the sequence.
@@ -1131,8 +1132,15 @@ void kvm_hv_setup_tsc_page(struct kvm *kvm,
 	smp_wmb();
 
 	hv->tsc_ref.tsc_sequence = tsc_seq;
-	kvm_write_guest(kvm, gfn_to_gpa(gfn),
-			&hv->tsc_ref, sizeof(hv->tsc_ref.tsc_sequence));
+	if (kvm_write_guest(kvm, gfn_to_gpa(gfn),
+			    &hv->tsc_ref, sizeof(hv->tsc_ref.tsc_sequence)))
+		goto out_err;
+
+	hv->hv_tsc_page_status = HV_TSC_PAGE_SET;
+	goto out_unlock;
+
+out_err:
+	hv->hv_tsc_page_status = HV_TSC_PAGE_BROKEN;
 out_unlock:
 	mutex_unlock(&hv->hv_lock);
 }
@@ -1142,7 +1150,8 @@ void kvm_hv_invalidate_tsc_page(struct kvm *kvm)
 	struct kvm_hv *hv = to_kvm_hv(kvm);
 	u64 gfn;
 
-	if (!(hv->hv_tsc_page & HV_X64_MSR_TSC_REFERENCE_ENABLE))
+	if (hv->hv_tsc_page_status == HV_TSC_PAGE_BROKEN ||
+	    hv->hv_tsc_page_status == HV_TSC_PAGE_UNSET)
 		return;
 
 	mutex_lock(&hv->hv_lock);
@@ -1150,11 +1159,16 @@ void kvm_hv_invalidate_tsc_page(struct kvm *kvm)
 	if (!(hv->hv_tsc_page & HV_X64_MSR_TSC_REFERENCE_ENABLE))
 		goto out_unlock;
 
+	/* Preserve HV_TSC_PAGE_GUEST_CHANGED/HV_TSC_PAGE_HOST_CHANGED states */
+	if (hv->hv_tsc_page_status == HV_TSC_PAGE_SET)
+		hv->hv_tsc_page_status = HV_TSC_PAGE_UPDATING;
+
 	gfn = hv->hv_tsc_page >> HV_X64_MSR_TSC_REFERENCE_ADDRESS_SHIFT;
 
 	hv->tsc_ref.tsc_sequence = 0;
-	kvm_write_guest(kvm, gfn_to_gpa(gfn),
-			&hv->tsc_ref, sizeof(hv->tsc_ref.tsc_sequence));
+	if (kvm_write_guest(kvm, gfn_to_gpa(gfn),
+			    &hv->tsc_ref, sizeof(hv->tsc_ref.tsc_sequence)))
+		hv->hv_tsc_page_status = HV_TSC_PAGE_BROKEN;
 
 out_unlock:
 	mutex_unlock(&hv->hv_lock);
@@ -1216,8 +1230,15 @@ static int kvm_hv_set_msr_pw(struct kvm_vcpu *vcpu, u32 msr, u64 data,
 	}
 	case HV_X64_MSR_REFERENCE_TSC:
 		hv->hv_tsc_page = data;
-		if (hv->hv_tsc_page & HV_X64_MSR_TSC_REFERENCE_ENABLE)
+		if (hv->hv_tsc_page & HV_X64_MSR_TSC_REFERENCE_ENABLE) {
+			if (!host)
+				hv->hv_tsc_page_status = HV_TSC_PAGE_GUEST_CHANGED;
+			else
+				hv->hv_tsc_page_status = HV_TSC_PAGE_HOST_CHANGED;
 			kvm_make_request(KVM_REQ_MASTERCLOCK_UPDATE, vcpu);
+		} else {
+			hv->hv_tsc_page_status = HV_TSC_PAGE_UNSET;
+		}
 		break;
 	case HV_X64_MSR_CRASH_P0 ... HV_X64_MSR_CRASH_P4:
 		return kvm_hv_msr_set_crash_data(kvm,
-- 
2.30.2


^ permalink raw reply related	[flat|nested] 22+ messages in thread

* [PATCH v2 4/4] KVM: x86: hyper-v: Don't touch TSC page values when guest opted for re-enlightenment
  2021-03-16 14:37 [PATCH v2 0/4] KVM: x86: hyper-v: TSC page fixes Vitaly Kuznetsov
                   ` (2 preceding siblings ...)
  2021-03-16 14:37 ` [PATCH v2 3/4] KVM: x86: hyper-v: Track Hyper-V TSC page status Vitaly Kuznetsov
@ 2021-03-16 14:37 ` Vitaly Kuznetsov
  2021-03-18 14:09 ` [PATCH v2 6/4] selftests: kvm: Add basic Hyper-V clocksources tests Vitaly Kuznetsov
  4 siblings, 0 replies; 22+ messages in thread
From: Vitaly Kuznetsov @ 2021-03-16 14:37 UTC (permalink / raw)
  To: kvm, Paolo Bonzini
  Cc: Sean Christopherson, Wanpeng Li, Jim Mattson, Marcelo Tosatti

When guest opts for re-enlightenment notifications upon migration, it is
in its right to assume that TSC page values never change (as they're only
supposed to change upon migration and the host has to keep things as they
are before it receives confirmation from the guest). This is mostly true
until the guest is migrated somewhere. KVM userspace (e.g. QEMU) will
trigger masterclock update by writing to HV_X64_MSR_REFERENCE_TSC, by
calling KVM_SET_CLOCK,... and as TSC value and kvmclock reading drift
apart (even slightly), the update causes TSC page values to change.

The issue at hand is that when Hyper-V is migrated, it uses stale (cached)
TSC page values to compute the difference between its own clocksource
(provided by KVM) and its guests' TSC pages to program synthetic timers
and in some cases, when TSC page is updated, this puts all stimer
expirations in the past. This, in its turn, causes an interrupt storm
and L2 guests not making much forward progress.

Note, KVM doesn't fully implement re-enlightenment notification. Basically,
the support for reenlightenment MSRs is just a stub and userspace is only
expected to expose the feature when TSC scaling on the expected destination
hosts is available. With TSC scaling, no real re-enlightenment is needed
as TSC frequency doesn't change. With TSC scaling becoming ubiquitous, it
likely makes little sense to fully implement re-enlightenment in KVM.

Prevent TSC page from being updated after migration. In case it's not the
guest who's initiating the change and when TSC page is already enabled,
just keep it as it is: TSC value is supposed to be preserved across
migration and TSC frequency can't change with re-enlightenment enabled.
The guest is doomed anyway if any of this is not true.

Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
---
 arch/x86/kvm/hyperv.c | 26 +++++++++++++++++++++++++-
 1 file changed, 25 insertions(+), 1 deletion(-)

diff --git a/arch/x86/kvm/hyperv.c b/arch/x86/kvm/hyperv.c
index 5c0f10a2b3ab..f98370a39936 100644
--- a/arch/x86/kvm/hyperv.c
+++ b/arch/x86/kvm/hyperv.c
@@ -1077,6 +1077,21 @@ static bool compute_tsc_page_parameters(struct pvclock_vcpu_time_info *hv_clock,
 	return true;
 }
 
+/*
+ * Don't touch TSC page values if the guest has opted for TSC emulation after
+ * migration. KVM doesn't fully support reenlightenment notifications and TSC
+ * access emulation and Hyper-V is known to expect the values in TSC page to
+ * stay constant before TSC access emulation is disabled from guest side
+ * (HV_X64_MSR_TSC_EMULATION_STATUS). KVM userspace is expected to preserve TSC
+ * frequency and guest visible TSC value across migration (and prevent it when
+ * TSC scaling is unsupported).
+ */
+static inline bool tsc_page_update_unsafe(struct kvm_hv *hv)
+{
+	return (hv->hv_tsc_page_status != HV_TSC_PAGE_GUEST_CHANGED) &&
+		hv->hv_tsc_emulation_control;
+}
+
 void kvm_hv_setup_tsc_page(struct kvm *kvm,
 			   struct pvclock_vcpu_time_info *hv_clock)
 {
@@ -1104,6 +1119,14 @@ void kvm_hv_setup_tsc_page(struct kvm *kvm,
 				    &tsc_seq, sizeof(tsc_seq))))
 		goto out_err;
 
+	if (tsc_seq && tsc_page_update_unsafe(hv)) {
+		if (kvm_read_guest(kvm, gfn_to_gpa(gfn), &hv->tsc_ref, sizeof(hv->tsc_ref)))
+			goto out_err;
+
+		hv->hv_tsc_page_status = HV_TSC_PAGE_SET;
+		goto out_unlock;
+	}
+
 	/*
 	 * While we're computing and writing the parameters, force the
 	 * guest to use the time reference count MSR.
@@ -1151,7 +1174,8 @@ void kvm_hv_invalidate_tsc_page(struct kvm *kvm)
 	u64 gfn;
 
 	if (hv->hv_tsc_page_status == HV_TSC_PAGE_BROKEN ||
-	    hv->hv_tsc_page_status == HV_TSC_PAGE_UNSET)
+	    hv->hv_tsc_page_status == HV_TSC_PAGE_UNSET ||
+	    tsc_page_update_unsafe(hv))
 		return;
 
 	mutex_lock(&hv->hv_lock);
-- 
2.30.2


^ permalink raw reply related	[flat|nested] 22+ messages in thread

* Re: [PATCH v2 3/4] KVM: x86: hyper-v: Track Hyper-V TSC page status
  2021-03-16 14:37 ` [PATCH v2 3/4] KVM: x86: hyper-v: Track Hyper-V TSC page status Vitaly Kuznetsov
@ 2021-03-17  8:07   ` Paolo Bonzini
  2021-03-17 11:19     ` [PATCH v2 5/4] KVM: x86: hyper-v: Briefly document enum hv_tsc_page_status Vitaly Kuznetsov
  0 siblings, 1 reply; 22+ messages in thread
From: Paolo Bonzini @ 2021-03-17  8:07 UTC (permalink / raw)
  To: Vitaly Kuznetsov, kvm
  Cc: Sean Christopherson, Wanpeng Li, Jim Mattson, Marcelo Tosatti

On 16/03/21 15:37, Vitaly Kuznetsov wrote:
>   
> +enum hv_tsc_page_status {
> +	HV_TSC_PAGE_UNSET = 0,
> +	HV_TSC_PAGE_GUEST_CHANGED,
> +	HV_TSC_PAGE_HOST_CHANGED,
> +	HV_TSC_PAGE_SET,
> +	HV_TSC_PAGE_UPDATING,
> +	HV_TSC_PAGE_BROKEN,
> +};
> +

Can you add comments here?  No need to send the whole series, I can 
squash in the changes.

Paolo


^ permalink raw reply	[flat|nested] 22+ messages in thread

* [PATCH v2 5/4] KVM: x86: hyper-v: Briefly document enum hv_tsc_page_status
  2021-03-17  8:07   ` Paolo Bonzini
@ 2021-03-17 11:19     ` Vitaly Kuznetsov
  0 siblings, 0 replies; 22+ messages in thread
From: Vitaly Kuznetsov @ 2021-03-17 11:19 UTC (permalink / raw)
  To: kvm, Paolo Bonzini
  Cc: Sean Christopherson, Wanpeng Li, Jim Mattson, Marcelo Tosatti

Add comments to enum hv_tsc_page_status documenting what each state means.

Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
---
 arch/x86/include/asm/kvm_host.h | 7 +++++++
 1 file changed, 7 insertions(+)

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index 71b14e51fdc0..e1b6e2edc828 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -884,12 +884,19 @@ struct kvm_hv_syndbg {
 	u64 options;
 };
 
+/* Current state of Hyper-V TSC page clocksource */
 enum hv_tsc_page_status {
+	/* TSC page was not set up or disabled */
 	HV_TSC_PAGE_UNSET = 0,
+	/* TSC page MSR was written by the guest, update pending */
 	HV_TSC_PAGE_GUEST_CHANGED,
+	/* TSC page MSR was written by KVM userspace, update pending */
 	HV_TSC_PAGE_HOST_CHANGED,
+	/* TSC page was properly set up and is currently active  */
 	HV_TSC_PAGE_SET,
+	/* TSC page is currently being updated and therefore is inactive */
 	HV_TSC_PAGE_UPDATING,
+	/* TSC page was set up with an inaccessible GPA */
 	HV_TSC_PAGE_BROKEN,
 };
 
-- 
2.30.2


^ permalink raw reply related	[flat|nested] 22+ messages in thread

* [PATCH v2 6/4] selftests: kvm: Add basic Hyper-V clocksources tests
  2021-03-16 14:37 [PATCH v2 0/4] KVM: x86: hyper-v: TSC page fixes Vitaly Kuznetsov
                   ` (3 preceding siblings ...)
  2021-03-16 14:37 ` [PATCH v2 4/4] KVM: x86: hyper-v: Don't touch TSC page values when guest opted for re-enlightenment Vitaly Kuznetsov
@ 2021-03-18 14:09 ` Vitaly Kuznetsov
  2021-03-18 14:26   ` Paolo Bonzini
                     ` (2 more replies)
  4 siblings, 3 replies; 22+ messages in thread
From: Vitaly Kuznetsov @ 2021-03-18 14:09 UTC (permalink / raw)
  To: kvm, Paolo Bonzini
  Cc: Sean Christopherson, Wanpeng Li, Jim Mattson, Marcelo Tosatti

Introduce a new selftest for Hyper-V clocksources (MSR-based reference TSC
and TSC page). As a starting point, test the following:
1) Reference TSC is 1Ghz clock.
2) Reference TSC and TSC page give the same reading.
3) TSC page gets updated upon KVM_SET_CLOCK call.
4) TSC page does not get updated when guest opted for reenlightenment.
5) Disabled TSC page doesn't get updated.

Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
---
 tools/testing/selftests/kvm/.gitignore        |   1 +
 tools/testing/selftests/kvm/Makefile          |   1 +
 .../selftests/kvm/x86_64/hyperv_clock.c       | 233 ++++++++++++++++++
 3 files changed, 235 insertions(+)
 create mode 100644 tools/testing/selftests/kvm/x86_64/hyperv_clock.c

diff --git a/tools/testing/selftests/kvm/.gitignore b/tools/testing/selftests/kvm/.gitignore
index 32b87cc77c8e..22be05c55f13 100644
--- a/tools/testing/selftests/kvm/.gitignore
+++ b/tools/testing/selftests/kvm/.gitignore
@@ -9,6 +9,7 @@
 /x86_64/evmcs_test
 /x86_64/get_cpuid_test
 /x86_64/kvm_pv_test
+/x86_64/hyperv_clock
 /x86_64/hyperv_cpuid
 /x86_64/mmio_warning_test
 /x86_64/platform_info_test
diff --git a/tools/testing/selftests/kvm/Makefile b/tools/testing/selftests/kvm/Makefile
index a6d61f451f88..c3672e9087d3 100644
--- a/tools/testing/selftests/kvm/Makefile
+++ b/tools/testing/selftests/kvm/Makefile
@@ -41,6 +41,7 @@ LIBKVM_s390x = lib/s390x/processor.c lib/s390x/ucall.c lib/s390x/diag318_test_ha
 TEST_GEN_PROGS_x86_64 = x86_64/cr4_cpuid_sync_test
 TEST_GEN_PROGS_x86_64 += x86_64/evmcs_test
 TEST_GEN_PROGS_x86_64 += x86_64/get_cpuid_test
+TEST_GEN_PROGS_x86_64 += x86_64/hyperv_clock
 TEST_GEN_PROGS_x86_64 += x86_64/hyperv_cpuid
 TEST_GEN_PROGS_x86_64 += x86_64/kvm_pv_test
 TEST_GEN_PROGS_x86_64 += x86_64/mmio_warning_test
diff --git a/tools/testing/selftests/kvm/x86_64/hyperv_clock.c b/tools/testing/selftests/kvm/x86_64/hyperv_clock.c
new file mode 100644
index 000000000000..39d6491d8458
--- /dev/null
+++ b/tools/testing/selftests/kvm/x86_64/hyperv_clock.c
@@ -0,0 +1,233 @@
+// SPDX-License-Identifier: GPL-2.0-only
+/*
+ * Copyright (C) 2021, Red Hat, Inc.
+ *
+ * Tests for Hyper-V clocksources
+ */
+#include "test_util.h"
+#include "kvm_util.h"
+#include "processor.h"
+
+struct ms_hyperv_tsc_page {
+	volatile u32 tsc_sequence;
+	u32 reserved1;
+	volatile u64 tsc_scale;
+	volatile s64 tsc_offset;
+} __packed;
+
+#define HV_X64_MSR_GUEST_OS_ID			0x40000000
+#define HV_X64_MSR_TIME_REF_COUNT		0x40000020
+#define HV_X64_MSR_REFERENCE_TSC		0x40000021
+#define HV_X64_MSR_TSC_FREQUENCY		0x40000022
+#define HV_X64_MSR_REENLIGHTENMENT_CONTROL	0x40000106
+#define HV_X64_MSR_TSC_EMULATION_CONTROL	0x40000107
+
+/* Simplified mul_u64_u64_shr() */
+static inline u64 mul_u64_u64_shr64(u64 a, u64 b)
+{
+	union {
+		u64 ll;
+		struct {
+			u32 low, high;
+		} l;
+	} rm, rn, rh, a0, b0;
+	u64 c;
+
+	a0.ll = a;
+	b0.ll = b;
+
+	rm.ll = (u64)a0.l.low * b0.l.high;
+	rn.ll = (u64)a0.l.high * b0.l.low;
+	rh.ll = (u64)a0.l.high * b0.l.high;
+
+	rh.l.low = c = rm.l.high + rn.l.high + rh.l.low;
+	rh.l.high = (c >> 32) + rh.l.high;
+
+	return rh.ll;
+}
+
+static inline void nop_loop(void)
+{
+	int i;
+
+	for (i = 0; i < 1000000; i++)
+		asm volatile("nop");
+}
+
+static inline void check_tsc_msr_rdtsc(void)
+{
+	u64 tsc_freq, r1, r2, t1, t2;
+	s64 delta_ns;
+
+	tsc_freq = rdmsr(HV_X64_MSR_TSC_FREQUENCY);
+	GUEST_ASSERT(tsc_freq > 0);
+
+	/* First, check MSR-based clocksource */
+	r1 = rdtsc();
+	t1 = rdmsr(HV_X64_MSR_TIME_REF_COUNT);
+	nop_loop();
+	r2 = rdtsc();
+	t2 = rdmsr(HV_X64_MSR_TIME_REF_COUNT);
+
+	GUEST_ASSERT(t2 > t1);
+
+	/* HV_X64_MSR_TIME_REF_COUNT is in 100ns */
+	delta_ns = ((t2 - t1) * 100) - ((r2 - r1) * 1000000000 / tsc_freq);
+	if (delta_ns < 0)
+		delta_ns = -delta_ns;
+
+	/* 1% tolerance */
+	GUEST_ASSERT(delta_ns * 100 < (t2 - t1) * 100);
+}
+
+static inline void check_tsc_msr_tsc_page(struct ms_hyperv_tsc_page *tsc_page)
+{
+	u64 r1, r2, t1, t2;
+	s64 delta_ns;
+
+	/* Compare TSC page clocksource with HV_X64_MSR_TIME_REF_COUNT */
+	t1 = mul_u64_u64_shr64(rdtsc(), tsc_page->tsc_scale) + tsc_page->tsc_offset;
+	r1 = rdmsr(HV_X64_MSR_TIME_REF_COUNT);
+	nop_loop();
+	t2 = mul_u64_u64_shr64(rdtsc(), tsc_page->tsc_scale) + tsc_page->tsc_offset;
+	r2 = rdmsr(HV_X64_MSR_TIME_REF_COUNT);
+
+	delta_ns = ((r2 - r1) - (t2 - t1)) * 100;
+	if (delta_ns < 0)
+		delta_ns = -delta_ns;
+
+	/* 1% tolerance */
+	GUEST_ASSERT(delta_ns * 100 < (t2 - t1) * 100);
+}
+
+static void guest_main(struct ms_hyperv_tsc_page *tsc_page, vm_paddr_t tsc_page_gpa)
+{
+	u64 tsc_scale, tsc_offset;
+
+	/* Set Guest OS id to enable Hyper-V emulation */
+	GUEST_SYNC(1);
+	wrmsr(HV_X64_MSR_GUEST_OS_ID, (u64)0x8100 << 48);
+	GUEST_SYNC(2);
+
+	check_tsc_msr_rdtsc();
+
+	GUEST_SYNC(3);
+
+	/* Set up TSC page is disabled state, check that it's clean */
+	wrmsr(HV_X64_MSR_REFERENCE_TSC, tsc_page_gpa);
+	GUEST_ASSERT(tsc_page->tsc_sequence == 0);
+	GUEST_ASSERT(tsc_page->tsc_scale == 0);
+	GUEST_ASSERT(tsc_page->tsc_offset == 0);
+
+	GUEST_SYNC(4);
+
+	/* Set up TSC page is enabled state */
+	wrmsr(HV_X64_MSR_REFERENCE_TSC, tsc_page_gpa | 0x1);
+	GUEST_ASSERT(tsc_page->tsc_sequence != 0);
+
+	GUEST_SYNC(5);
+
+	check_tsc_msr_tsc_page(tsc_page);
+
+	GUEST_SYNC(6);
+
+	tsc_offset = tsc_page->tsc_offset;
+	/* Call KVM_SET_CLOCK from userspace, check that TSC page was updated */
+	GUEST_SYNC(7);
+	GUEST_ASSERT(tsc_page->tsc_offset != tsc_offset);
+
+	nop_loop();
+
+	/*
+	 * Enable Re-enlightenment and check that TSC page stays constant across
+	 * KVM_SET_CLOCK.
+	 */
+	wrmsr(HV_X64_MSR_REENLIGHTENMENT_CONTROL, 0x1 << 16 | 0xff);
+	wrmsr(HV_X64_MSR_TSC_EMULATION_CONTROL, 0x1);
+	tsc_offset = tsc_page->tsc_offset;
+	tsc_scale = tsc_page->tsc_scale;
+	GUEST_SYNC(8);
+	GUEST_ASSERT(tsc_page->tsc_offset == tsc_offset);
+	GUEST_ASSERT(tsc_page->tsc_scale == tsc_scale);
+
+	GUEST_SYNC(9);
+
+	check_tsc_msr_tsc_page(tsc_page);
+
+	/*
+	 * Disable re-enlightenment and TSC page, check that KVM doesn't update
+	 * it anymore.
+	 */
+	wrmsr(HV_X64_MSR_REENLIGHTENMENT_CONTROL, 0);
+	wrmsr(HV_X64_MSR_TSC_EMULATION_CONTROL, 0);
+	wrmsr(HV_X64_MSR_REFERENCE_TSC, 0);
+	memset(tsc_page, 0, sizeof(*tsc_page));
+
+	GUEST_SYNC(10);
+	GUEST_ASSERT(tsc_page->tsc_sequence == 0);
+	GUEST_ASSERT(tsc_page->tsc_offset == 0);
+	GUEST_ASSERT(tsc_page->tsc_scale == 0);
+
+	GUEST_DONE();
+}
+
+#define VCPU_ID 0
+
+int main(void)
+{
+	struct kvm_vm *vm;
+	struct kvm_run *run;
+	struct ucall uc;
+	vm_vaddr_t tsc_page_gva;
+	int stage;
+
+	vm = vm_create_default(VCPU_ID, 0, guest_main);
+	run = vcpu_state(vm, VCPU_ID);
+
+	vcpu_set_hv_cpuid(vm, VCPU_ID);
+
+	tsc_page_gva = vm_vaddr_alloc(vm, getpagesize(), 0x10000, 0, 0);
+	memset(addr_gpa2hva(vm, tsc_page_gva), 0x0, getpagesize());
+	TEST_ASSERT((addr_gva2gpa(vm, tsc_page_gva) & (getpagesize() - 1)) == 0,
+		"TSC page has to be page aligned\n");
+	vcpu_args_set(vm, VCPU_ID, 2, tsc_page_gva, addr_gva2gpa(vm, tsc_page_gva));
+
+	for (stage = 1;; stage++) {
+		_vcpu_run(vm, VCPU_ID);
+		TEST_ASSERT(run->exit_reason == KVM_EXIT_IO,
+			    "Stage %d: unexpected exit reason: %u (%s),\n",
+			    stage, run->exit_reason,
+			    exit_reason_str(run->exit_reason));
+
+		switch (get_ucall(vm, VCPU_ID, &uc)) {
+		case UCALL_ABORT:
+			TEST_FAIL("%s at %s:%ld", (const char *)uc.args[0],
+				  __FILE__, uc.args[1]);
+			/* NOT REACHED */
+		case UCALL_SYNC:
+			break;
+		case UCALL_DONE:
+			/* Keep in sync with guest_main() */
+			TEST_ASSERT(stage == 11, "Testing ended prematurely, stage %d\n",
+				    stage);
+			goto out;
+		default:
+			TEST_FAIL("Unknown ucall %lu", uc.cmd);
+		}
+
+		TEST_ASSERT(!strcmp((const char *)uc.args[0], "hello") &&
+			    uc.args[1] == stage,
+			    "Stage %d: Unexpected register values vmexit, got %lx",
+			    stage, (ulong)uc.args[1]);
+
+		/* Reset kvmclock triggering TSC page update */
+		if (stage == 7 || stage == 8 || stage == 10) {
+			struct kvm_clock_data clock = {0};
+
+			vm_ioctl(vm, KVM_SET_CLOCK, &clock);
+		}
+	}
+
+out:
+	kvm_vm_free(vm);
+}
-- 
2.30.2


^ permalink raw reply related	[flat|nested] 22+ messages in thread

* Re: [PATCH v2 6/4] selftests: kvm: Add basic Hyper-V clocksources tests
  2021-03-18 14:09 ` [PATCH v2 6/4] selftests: kvm: Add basic Hyper-V clocksources tests Vitaly Kuznetsov
@ 2021-03-18 14:26   ` Paolo Bonzini
  2021-03-18 14:52     ` Vitaly Kuznetsov
  2021-03-18 15:27   ` Paolo Bonzini
  2021-03-18 16:57   ` Marcelo Tosatti
  2 siblings, 1 reply; 22+ messages in thread
From: Paolo Bonzini @ 2021-03-18 14:26 UTC (permalink / raw)
  To: Vitaly Kuznetsov, kvm
  Cc: Sean Christopherson, Wanpeng Li, Jim Mattson, Marcelo Tosatti

On 18/03/21 15:09, Vitaly Kuznetsov wrote:
> +static inline void check_tsc_msr_tsc_page(struct ms_hyperv_tsc_page *tsc_page)
> +{
> +	u64 r1, r2, t1, t2;
> +	s64 delta_ns;
> +
> +	/* Compare TSC page clocksource with HV_X64_MSR_TIME_REF_COUNT */
> +	t1 = mul_u64_u64_shr64(rdtsc(), tsc_page->tsc_scale) + tsc_page->tsc_offset;
> +	r1 = rdmsr(HV_X64_MSR_TIME_REF_COUNT);
> +	nop_loop();
> +	t2 = mul_u64_u64_shr64(rdtsc(), tsc_page->tsc_scale) + tsc_page->tsc_offset;
> +	r2 = rdmsr(HV_X64_MSR_TIME_REF_COUNT);
> +
> +	delta_ns = ((r2 - r1) - (t2 - t1)) * 100;
> +	if (delta_ns < 0)
> +		delta_ns = -delta_ns;
> +
> +	/* 1% tolerance */
> +	GUEST_ASSERT(delta_ns * 100 < (t2 - t1) * 100);
> +}
> +

I think you should also be able to check r1 and r2 individually, not 
just r1 and r2.  Is that correct?

Paolo


^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH v2 6/4] selftests: kvm: Add basic Hyper-V clocksources tests
  2021-03-18 14:26   ` Paolo Bonzini
@ 2021-03-18 14:52     ` Vitaly Kuznetsov
  2021-03-18 15:01       ` Paolo Bonzini
  0 siblings, 1 reply; 22+ messages in thread
From: Vitaly Kuznetsov @ 2021-03-18 14:52 UTC (permalink / raw)
  To: Paolo Bonzini, kvm
  Cc: Sean Christopherson, Wanpeng Li, Jim Mattson, Marcelo Tosatti

Paolo Bonzini <pbonzini@redhat.com> writes:

> On 18/03/21 15:09, Vitaly Kuznetsov wrote:
>> +static inline void check_tsc_msr_tsc_page(struct ms_hyperv_tsc_page *tsc_page)
>> +{
>> +	u64 r1, r2, t1, t2;
>> +	s64 delta_ns;
>> +
>> +	/* Compare TSC page clocksource with HV_X64_MSR_TIME_REF_COUNT */
>> +	t1 = mul_u64_u64_shr64(rdtsc(), tsc_page->tsc_scale) + tsc_page->tsc_offset;
>> +	r1 = rdmsr(HV_X64_MSR_TIME_REF_COUNT);
>> +	nop_loop();
>> +	t2 = mul_u64_u64_shr64(rdtsc(), tsc_page->tsc_scale) + tsc_page->tsc_offset;
>> +	r2 = rdmsr(HV_X64_MSR_TIME_REF_COUNT);
>> +
>> +	delta_ns = ((r2 - r1) - (t2 - t1)) * 100;
>> +	if (delta_ns < 0)
>> +		delta_ns = -delta_ns;
>> +
>> +	/* 1% tolerance */
>> +	GUEST_ASSERT(delta_ns * 100 < (t2 - t1) * 100);
>> +}
>> +
>
> I think you should also be able to check r1 and r2 individually, not 
> just r1 and r2.  Is that correct?

Right, we could've checked r1 == t1 and r2 == t2 actually (with some
tiny margin of course). Let me try that.

-- 
Vitaly


^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH v2 6/4] selftests: kvm: Add basic Hyper-V clocksources tests
  2021-03-18 14:52     ` Vitaly Kuznetsov
@ 2021-03-18 15:01       ` Paolo Bonzini
  2021-03-18 15:23         ` Vitaly Kuznetsov
  0 siblings, 1 reply; 22+ messages in thread
From: Paolo Bonzini @ 2021-03-18 15:01 UTC (permalink / raw)
  To: Vitaly Kuznetsov, kvm
  Cc: Sean Christopherson, Wanpeng Li, Jim Mattson, Marcelo Tosatti

On 18/03/21 15:52, Vitaly Kuznetsov wrote:
> Paolo Bonzini <pbonzini@redhat.com> writes:
> 
>> On 18/03/21 15:09, Vitaly Kuznetsov wrote:
>>> +static inline void check_tsc_msr_tsc_page(struct ms_hyperv_tsc_page *tsc_page)
>>> +{
>>> +	u64 r1, r2, t1, t2;
>>> +	s64 delta_ns;
>>> +
>>> +	/* Compare TSC page clocksource with HV_X64_MSR_TIME_REF_COUNT */
>>> +	t1 = mul_u64_u64_shr64(rdtsc(), tsc_page->tsc_scale) + tsc_page->tsc_offset;
>>> +	r1 = rdmsr(HV_X64_MSR_TIME_REF_COUNT);
>>> +	nop_loop();
>>> +	t2 = mul_u64_u64_shr64(rdtsc(), tsc_page->tsc_scale) + tsc_page->tsc_offset;
>>> +	r2 = rdmsr(HV_X64_MSR_TIME_REF_COUNT);
>>> +
>>> +	delta_ns = ((r2 - r1) - (t2 - t1)) * 100;
>>> +	if (delta_ns < 0)
>>> +		delta_ns = -delta_ns;
>>> +
>>> +	/* 1% tolerance */
>>> +	GUEST_ASSERT(delta_ns * 100 < (t2 - t1) * 100);
>>> +}
>>> +
>>
>> I think you should also be able to check r1 and r2 individually, not
>> just r1 and r2.  Is that correct?
> 
> Right, we could've checked r1 == t1 and r2 == t2 actually (with some
> tiny margin of course). Let me try that.

np, I can do that too.  Just checking my recollection of the TLFS.

Paolo


^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH v2 6/4] selftests: kvm: Add basic Hyper-V clocksources tests
  2021-03-18 15:01       ` Paolo Bonzini
@ 2021-03-18 15:23         ` Vitaly Kuznetsov
  0 siblings, 0 replies; 22+ messages in thread
From: Vitaly Kuznetsov @ 2021-03-18 15:23 UTC (permalink / raw)
  To: Paolo Bonzini, kvm
  Cc: Sean Christopherson, Wanpeng Li, Jim Mattson, Marcelo Tosatti

[-- Attachment #1: Type: text/plain, Size: 1315 bytes --]

Paolo Bonzini <pbonzini@redhat.com> writes:

> On 18/03/21 15:52, Vitaly Kuznetsov wrote:
>> Paolo Bonzini <pbonzini@redhat.com> writes:
>> 
>>> On 18/03/21 15:09, Vitaly Kuznetsov wrote:
>>>> +static inline void check_tsc_msr_tsc_page(struct ms_hyperv_tsc_page *tsc_page)
>>>> +{
>>>> +	u64 r1, r2, t1, t2;
>>>> +	s64 delta_ns;
>>>> +
>>>> +	/* Compare TSC page clocksource with HV_X64_MSR_TIME_REF_COUNT */
>>>> +	t1 = mul_u64_u64_shr64(rdtsc(), tsc_page->tsc_scale) + tsc_page->tsc_offset;
>>>> +	r1 = rdmsr(HV_X64_MSR_TIME_REF_COUNT);
>>>> +	nop_loop();
>>>> +	t2 = mul_u64_u64_shr64(rdtsc(), tsc_page->tsc_scale) + tsc_page->tsc_offset;
>>>> +	r2 = rdmsr(HV_X64_MSR_TIME_REF_COUNT);
>>>> +
>>>> +	delta_ns = ((r2 - r1) - (t2 - t1)) * 100;
>>>> +	if (delta_ns < 0)
>>>> +		delta_ns = -delta_ns;
>>>> +
>>>> +	/* 1% tolerance */
>>>> +	GUEST_ASSERT(delta_ns * 100 < (t2 - t1) * 100);
>>>> +}
>>>> +
>>>
>>> I think you should also be able to check r1 and r2 individually, not
>>> just r1 and r2.  Is that correct?
>> 
>> Right, we could've checked r1 == t1 and r2 == t2 actually (with some
>> tiny margin of course). Let me try that.
>
> np, I can do that too.  Just checking my recollection of the TLFS.
>

I already hacked it, patch attached :-) Feel free to squash it or put
your own version in.

-- 
Vitaly


[-- Warning: decoded text below may be mangled, UTF-8 assumed --]
[-- Attachment #2: 0001-selftests-KVM-hyper-v-check_tsc_msr_tsc_page-fix.patch --]
[-- Type: text/x-patch, Size: 1846 bytes --]

From dae56d5d5c5fc7442d4dabf96a9c322acb42f458 Mon Sep 17 00:00:00 2001
From: Vitaly Kuznetsov <vkuznets@redhat.com>
Date: Thu, 18 Mar 2021 16:21:18 +0100
Subject: [PATCH] selftests: KVM: hyper-v: check_tsc_msr_tsc_page() fix.

Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
---
 .../selftests/kvm/x86_64/hyperv_clock.c       | 25 ++++++++++---------
 1 file changed, 13 insertions(+), 12 deletions(-)

diff --git a/tools/testing/selftests/kvm/x86_64/hyperv_clock.c b/tools/testing/selftests/kvm/x86_64/hyperv_clock.c
index 39d6491d8458..aacb6d2697da 100644
--- a/tools/testing/selftests/kvm/x86_64/hyperv_clock.c
+++ b/tools/testing/selftests/kvm/x86_64/hyperv_clock.c
@@ -82,22 +82,23 @@ static inline void check_tsc_msr_rdtsc(void)
 
 static inline void check_tsc_msr_tsc_page(struct ms_hyperv_tsc_page *tsc_page)
 {
-	u64 r1, r2, t1, t2;
-	s64 delta_ns;
+	u64 t1, t2;
 
-	/* Compare TSC page clocksource with HV_X64_MSR_TIME_REF_COUNT */
-	t1 = mul_u64_u64_shr64(rdtsc(), tsc_page->tsc_scale) + tsc_page->tsc_offset;
-	r1 = rdmsr(HV_X64_MSR_TIME_REF_COUNT);
+	/*
+	 * Make TSC run for a while to make sure clocksources don't diverge
+	 * over time.
+	 */
 	nop_loop();
-	t2 = mul_u64_u64_shr64(rdtsc(), tsc_page->tsc_scale) + tsc_page->tsc_offset;
-	r2 = rdmsr(HV_X64_MSR_TIME_REF_COUNT);
 
-	delta_ns = ((r2 - r1) - (t2 - t1)) * 100;
-	if (delta_ns < 0)
-		delta_ns = -delta_ns;
+	/*
+	 * Get readings from TSC page and Reference TSC clocksources and make
+	 * sure they match.
+	 */
+	t1 = mul_u64_u64_shr64(rdtsc(), tsc_page->tsc_scale) + tsc_page->tsc_offset;
+	t2 = rdmsr(HV_X64_MSR_TIME_REF_COUNT);
 
-	/* 1% tolerance */
-	GUEST_ASSERT(delta_ns * 100 < (t2 - t1) * 100);
+	/* 10us tolerance */
+	GUEST_ASSERT(t2 - t1 <= 100);
 }
 
 static void guest_main(struct ms_hyperv_tsc_page *tsc_page, vm_paddr_t tsc_page_gpa)
-- 
2.30.2


^ permalink raw reply related	[flat|nested] 22+ messages in thread

* Re: [PATCH v2 6/4] selftests: kvm: Add basic Hyper-V clocksources tests
  2021-03-18 14:09 ` [PATCH v2 6/4] selftests: kvm: Add basic Hyper-V clocksources tests Vitaly Kuznetsov
  2021-03-18 14:26   ` Paolo Bonzini
@ 2021-03-18 15:27   ` Paolo Bonzini
  2021-03-18 16:57   ` Marcelo Tosatti
  2 siblings, 0 replies; 22+ messages in thread
From: Paolo Bonzini @ 2021-03-18 15:27 UTC (permalink / raw)
  To: Vitaly Kuznetsov, kvm
  Cc: Sean Christopherson, Wanpeng Li, Jim Mattson, Marcelo Tosatti

On 18/03/21 15:09, Vitaly Kuznetsov wrote:
> +
> +	/* 1% tolerance */
> +	GUEST_ASSERT(delta_ns * 100 < (t2 - t1) * 100);

Since I needed to add a printf to understand this one, I have also added 
a host-side test using KVM_GET_MSR, it doesn't hurt.

Paolo


^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH v2 6/4] selftests: kvm: Add basic Hyper-V clocksources tests
  2021-03-18 14:09 ` [PATCH v2 6/4] selftests: kvm: Add basic Hyper-V clocksources tests Vitaly Kuznetsov
  2021-03-18 14:26   ` Paolo Bonzini
  2021-03-18 15:27   ` Paolo Bonzini
@ 2021-03-18 16:57   ` Marcelo Tosatti
  2021-03-18 17:50     ` Paolo Bonzini
  2 siblings, 1 reply; 22+ messages in thread
From: Marcelo Tosatti @ 2021-03-18 16:57 UTC (permalink / raw)
  To: Vitaly Kuznetsov
  Cc: kvm, Paolo Bonzini, Sean Christopherson, Wanpeng Li, Jim Mattson

On Thu, Mar 18, 2021 at 03:09:49PM +0100, Vitaly Kuznetsov wrote:
> Introduce a new selftest for Hyper-V clocksources (MSR-based reference TSC
> and TSC page). As a starting point, test the following:
> 1) Reference TSC is 1Ghz clock.
> 2) Reference TSC and TSC page give the same reading.
> 3) TSC page gets updated upon KVM_SET_CLOCK call.
> 4) TSC page does not get updated when guest opted for reenlightenment.
> 5) Disabled TSC page doesn't get updated.
> 
> Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
> ---
>  tools/testing/selftests/kvm/.gitignore        |   1 +
>  tools/testing/selftests/kvm/Makefile          |   1 +
>  .../selftests/kvm/x86_64/hyperv_clock.c       | 233 ++++++++++++++++++
>  3 files changed, 235 insertions(+)
>  create mode 100644 tools/testing/selftests/kvm/x86_64/hyperv_clock.c
> 
> diff --git a/tools/testing/selftests/kvm/.gitignore b/tools/testing/selftests/kvm/.gitignore
> index 32b87cc77c8e..22be05c55f13 100644
> --- a/tools/testing/selftests/kvm/.gitignore
> +++ b/tools/testing/selftests/kvm/.gitignore
> @@ -9,6 +9,7 @@
>  /x86_64/evmcs_test
>  /x86_64/get_cpuid_test
>  /x86_64/kvm_pv_test
> +/x86_64/hyperv_clock
>  /x86_64/hyperv_cpuid
>  /x86_64/mmio_warning_test
>  /x86_64/platform_info_test
> diff --git a/tools/testing/selftests/kvm/Makefile b/tools/testing/selftests/kvm/Makefile
> index a6d61f451f88..c3672e9087d3 100644
> --- a/tools/testing/selftests/kvm/Makefile
> +++ b/tools/testing/selftests/kvm/Makefile
> @@ -41,6 +41,7 @@ LIBKVM_s390x = lib/s390x/processor.c lib/s390x/ucall.c lib/s390x/diag318_test_ha
>  TEST_GEN_PROGS_x86_64 = x86_64/cr4_cpuid_sync_test
>  TEST_GEN_PROGS_x86_64 += x86_64/evmcs_test
>  TEST_GEN_PROGS_x86_64 += x86_64/get_cpuid_test
> +TEST_GEN_PROGS_x86_64 += x86_64/hyperv_clock
>  TEST_GEN_PROGS_x86_64 += x86_64/hyperv_cpuid
>  TEST_GEN_PROGS_x86_64 += x86_64/kvm_pv_test
>  TEST_GEN_PROGS_x86_64 += x86_64/mmio_warning_test
> diff --git a/tools/testing/selftests/kvm/x86_64/hyperv_clock.c b/tools/testing/selftests/kvm/x86_64/hyperv_clock.c
> new file mode 100644
> index 000000000000..39d6491d8458
> --- /dev/null
> +++ b/tools/testing/selftests/kvm/x86_64/hyperv_clock.c
> @@ -0,0 +1,233 @@
> +// SPDX-License-Identifier: GPL-2.0-only
> +/*
> + * Copyright (C) 2021, Red Hat, Inc.
> + *
> + * Tests for Hyper-V clocksources
> + */
> +#include "test_util.h"
> +#include "kvm_util.h"
> +#include "processor.h"
> +
> +struct ms_hyperv_tsc_page {
> +	volatile u32 tsc_sequence;
> +	u32 reserved1;
> +	volatile u64 tsc_scale;
> +	volatile s64 tsc_offset;
> +} __packed;
> +
> +#define HV_X64_MSR_GUEST_OS_ID			0x40000000
> +#define HV_X64_MSR_TIME_REF_COUNT		0x40000020
> +#define HV_X64_MSR_REFERENCE_TSC		0x40000021
> +#define HV_X64_MSR_TSC_FREQUENCY		0x40000022
> +#define HV_X64_MSR_REENLIGHTENMENT_CONTROL	0x40000106
> +#define HV_X64_MSR_TSC_EMULATION_CONTROL	0x40000107
> +
> +/* Simplified mul_u64_u64_shr() */
> +static inline u64 mul_u64_u64_shr64(u64 a, u64 b)
> +{
> +	union {
> +		u64 ll;
> +		struct {
> +			u32 low, high;
> +		} l;
> +	} rm, rn, rh, a0, b0;
> +	u64 c;
> +
> +	a0.ll = a;
> +	b0.ll = b;
> +
> +	rm.ll = (u64)a0.l.low * b0.l.high;
> +	rn.ll = (u64)a0.l.high * b0.l.low;
> +	rh.ll = (u64)a0.l.high * b0.l.high;
> +
> +	rh.l.low = c = rm.l.high + rn.l.high + rh.l.low;
> +	rh.l.high = (c >> 32) + rh.l.high;
> +
> +	return rh.ll;
> +}
> +
> +static inline void nop_loop(void)
> +{
> +	int i;
> +
> +	for (i = 0; i < 1000000; i++)
> +		asm volatile("nop");
> +}
> +
> +static inline void check_tsc_msr_rdtsc(void)
> +{
> +	u64 tsc_freq, r1, r2, t1, t2;
> +	s64 delta_ns;
> +
> +	tsc_freq = rdmsr(HV_X64_MSR_TSC_FREQUENCY);
> +	GUEST_ASSERT(tsc_freq > 0);
> +
> +	/* First, check MSR-based clocksource */
> +	r1 = rdtsc();
> +	t1 = rdmsr(HV_X64_MSR_TIME_REF_COUNT);
> +	nop_loop();
> +	r2 = rdtsc();
> +	t2 = rdmsr(HV_X64_MSR_TIME_REF_COUNT);
> +
> +	GUEST_ASSERT(t2 > t1);
> +
> +	/* HV_X64_MSR_TIME_REF_COUNT is in 100ns */
> +	delta_ns = ((t2 - t1) * 100) - ((r2 - r1) * 1000000000 / tsc_freq);
> +	if (delta_ns < 0)
> +		delta_ns = -delta_ns;

I think this should be monotonically increasing: 

1.	r1 = rdtsc();
2.	t1 = rdmsr(HV_X64_MSR_TIME_REF_COUNT);
3.	nop_loop();
4.	r2 = rdtsc();
5.	t2 = rdmsr(HV_X64_MSR_TIME_REF_COUNT);	

	=>

	r1 <= t1 <= r2 <= t2

> +
> +	/* 1% tolerance */
> +	GUEST_ASSERT(delta_ns * 100 < (t2 - t1) * 100);
> +}

Doesnt an unbounded schedule-out/schedule-in (which resembles
overloaded host) of the qemu-kvm vcpu in any of the 
points 1,2,3,4,5 break the assertion above?



^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH v2 2/4] KVM: x86: hyper-v: Prevent using not-yet-updated TSC page by secondary CPUs
  2021-03-16 14:37 ` [PATCH v2 2/4] KVM: x86: hyper-v: Prevent using not-yet-updated TSC page by secondary CPUs Vitaly Kuznetsov
@ 2021-03-18 17:02   ` Marcelo Tosatti
  2021-03-18 18:04     ` Marcelo Tosatti
  0 siblings, 1 reply; 22+ messages in thread
From: Marcelo Tosatti @ 2021-03-18 17:02 UTC (permalink / raw)
  To: Vitaly Kuznetsov
  Cc: kvm, Paolo Bonzini, Sean Christopherson, Wanpeng Li, Jim Mattson

On Tue, Mar 16, 2021 at 03:37:34PM +0100, Vitaly Kuznetsov wrote:
> When KVM_REQ_MASTERCLOCK_UPDATE request is issued (e.g. after migration)
> we need to make sure no vCPU sees stale values in PV clock structures and
> thus all vCPUs are kicked with KVM_REQ_CLOCK_UPDATE. Hyper-V TSC page
> clocksource is global and kvm_guest_time_update() only updates in on vCPU0
> but this is not entirely correct: nothing blocks some other vCPU from
> entering the guest before we finish the update on CPU0 and it can read
> stale values from the page.
> 
> Invalidate TSC page in kvm_gen_update_masterclock() to switch all vCPUs
> to using MSR based clocksource (HV_X64_MSR_TIME_REF_COUNT).

Hi Vitaly,

Not clear why this is necessary, if the choice was to not touch TSC page
at all, when invariant TSC is supported on the host...

Ah, OK, this is not for the migration with iTSC on destination case,
but any call to kvm_gen_update_masterclock, correct?


^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH v2 6/4] selftests: kvm: Add basic Hyper-V clocksources tests
  2021-03-18 16:57   ` Marcelo Tosatti
@ 2021-03-18 17:50     ` Paolo Bonzini
  2021-03-18 17:55       ` Marcelo Tosatti
  0 siblings, 1 reply; 22+ messages in thread
From: Paolo Bonzini @ 2021-03-18 17:50 UTC (permalink / raw)
  To: Marcelo Tosatti, Vitaly Kuznetsov
  Cc: kvm, Sean Christopherson, Wanpeng Li, Jim Mattson

On 18/03/21 17:57, Marcelo Tosatti wrote:
> I think this should be monotonically increasing:
> 
> 1.	r1 = rdtsc();
> 2.	t1 = rdmsr(HV_X64_MSR_TIME_REF_COUNT);
> 3.	nop_loop();
> 4.	r2 = rdtsc();
> 5.	t2 = rdmsr(HV_X64_MSR_TIME_REF_COUNT);	
> 
>>
>> +	/* 1% tolerance */
>> +	GUEST_ASSERT(delta_ns * 100 < (t2 - t1) * 100);
>> +}
> 
> Doesnt an unbounded schedule-out/schedule-in (which resembles
> overloaded host) of the qemu-kvm vcpu in any of the 
> points 1,2,3,4,5 break the assertion above?


Yes, there's a window of a handful of instructions (at least on 
non-preemptible kernels).  If anyone ever hits it, we can run the test 
100 times and check that it passes at least 95 or 99 of them.

Paolo


^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH v2 6/4] selftests: kvm: Add basic Hyper-V clocksources tests
  2021-03-18 17:50     ` Paolo Bonzini
@ 2021-03-18 17:55       ` Marcelo Tosatti
  2021-03-19  9:35         ` Vitaly Kuznetsov
  0 siblings, 1 reply; 22+ messages in thread
From: Marcelo Tosatti @ 2021-03-18 17:55 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: Vitaly Kuznetsov, kvm, Sean Christopherson, Wanpeng Li, Jim Mattson

On Thu, Mar 18, 2021 at 06:50:35PM +0100, Paolo Bonzini wrote:
> On 18/03/21 17:57, Marcelo Tosatti wrote:
> > I think this should be monotonically increasing:
> > 
> > 1.	r1 = rdtsc();
> > 2.	t1 = rdmsr(HV_X64_MSR_TIME_REF_COUNT);
> > 3.	nop_loop();
> > 4.	r2 = rdtsc();
> > 5.	t2 = rdmsr(HV_X64_MSR_TIME_REF_COUNT);	
> > 
> > > 
> > > +	/* 1% tolerance */
> > > +	GUEST_ASSERT(delta_ns * 100 < (t2 - t1) * 100);
> > > +}
> > 
> > Doesnt an unbounded schedule-out/schedule-in (which resembles
> > overloaded host) of the qemu-kvm vcpu in any of the points 1,2,3,4,5
> > break the assertion above?
> 
> 
> Yes, there's a window of a handful of instructions (at least on
> non-preemptible kernels).  If anyone ever hits it, we can run the test 100
> times and check that it passes at least 95 or 99 of them.
> 
> Paolo

Yep, sounds like a good solution.

However this makes me wonder on the validity of the test: what its
trying to verify, again? (i would check the monotonicity that 
is r1 <= t1 <= r2 <= t2 as well, without the nop_loop in between).


^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH v2 2/4] KVM: x86: hyper-v: Prevent using not-yet-updated TSC page by secondary CPUs
  2021-03-18 17:02   ` Marcelo Tosatti
@ 2021-03-18 18:04     ` Marcelo Tosatti
  2021-03-18 18:05       ` Paolo Bonzini
  0 siblings, 1 reply; 22+ messages in thread
From: Marcelo Tosatti @ 2021-03-18 18:04 UTC (permalink / raw)
  To: Vitaly Kuznetsov
  Cc: kvm, Paolo Bonzini, Sean Christopherson, Wanpeng Li, Jim Mattson

On Thu, Mar 18, 2021 at 02:02:08PM -0300, Marcelo Tosatti wrote:
> On Tue, Mar 16, 2021 at 03:37:34PM +0100, Vitaly Kuznetsov wrote:
> > When KVM_REQ_MASTERCLOCK_UPDATE request is issued (e.g. after migration)
> > we need to make sure no vCPU sees stale values in PV clock structures and
> > thus all vCPUs are kicked with KVM_REQ_CLOCK_UPDATE. Hyper-V TSC page
> > clocksource is global and kvm_guest_time_update() only updates in on vCPU0
> > but this is not entirely correct: nothing blocks some other vCPU from
> > entering the guest before we finish the update on CPU0 and it can read
> > stale values from the page.
> > 
> > Invalidate TSC page in kvm_gen_update_masterclock() to switch all vCPUs
> > to using MSR based clocksource (HV_X64_MSR_TIME_REF_COUNT).
> 
> Hi Vitaly,
> 
> Not clear why this is necessary, if the choice was to not touch TSC page
> at all, when invariant TSC is supported on the host...
	
	s/invariant TSC/TSC scaling/

> Ah, OK, this is not for the migration with iTSC on destination case,
> but any call to kvm_gen_update_masterclock, correct?


^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH v2 2/4] KVM: x86: hyper-v: Prevent using not-yet-updated TSC page by secondary CPUs
  2021-03-18 18:04     ` Marcelo Tosatti
@ 2021-03-18 18:05       ` Paolo Bonzini
  2021-03-18 18:30         ` Marcelo Tosatti
  0 siblings, 1 reply; 22+ messages in thread
From: Paolo Bonzini @ 2021-03-18 18:05 UTC (permalink / raw)
  To: Marcelo Tosatti, Vitaly Kuznetsov
  Cc: kvm, Sean Christopherson, Wanpeng Li, Jim Mattson

On 18/03/21 19:04, Marcelo Tosatti wrote:
>>
>> Not clear why this is necessary, if the choice was to not touch TSC page
>> at all, when invariant TSC is supported on the host...
> 	
> 	s/invariant TSC/TSC scaling/
> 
>> Ah, OK, this is not for the migration with iTSC on destination case,
>> but any call to kvm_gen_update_masterclock, correct?

Yes, any update can be racy.

Paolo


^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH v2 2/4] KVM: x86: hyper-v: Prevent using not-yet-updated TSC page by secondary CPUs
  2021-03-18 18:05       ` Paolo Bonzini
@ 2021-03-18 18:30         ` Marcelo Tosatti
  2021-03-19  9:29           ` Vitaly Kuznetsov
  0 siblings, 1 reply; 22+ messages in thread
From: Marcelo Tosatti @ 2021-03-18 18:30 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: Vitaly Kuznetsov, kvm, Sean Christopherson, Wanpeng Li, Jim Mattson

On Thu, Mar 18, 2021 at 07:05:49PM +0100, Paolo Bonzini wrote:
> On 18/03/21 19:04, Marcelo Tosatti wrote:
> > > 
> > > Not clear why this is necessary, if the choice was to not touch TSC page
> > > at all, when invariant TSC is supported on the host...
> > 	
> > 	s/invariant TSC/TSC scaling/
> > 
> > > Ah, OK, this is not for the migration with iTSC on destination case,
> > > but any call to kvm_gen_update_masterclock, correct?
> 
> Yes, any update can be racy.
> 
> Paolo

Which makes an unrelated KVM_REQ_MASTERCLOCK_UPDATE -> kvm_gen_update_masterclock 
sequence to inadvertedly reduce performance a possibility, unless i am
missing something.

Ah, OK, it should be enabled again at KVM_REQ_CLOCK_UPDATE.

Nevermind, then.


^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH v2 2/4] KVM: x86: hyper-v: Prevent using not-yet-updated TSC page by secondary CPUs
  2021-03-18 18:30         ` Marcelo Tosatti
@ 2021-03-19  9:29           ` Vitaly Kuznetsov
  0 siblings, 0 replies; 22+ messages in thread
From: Vitaly Kuznetsov @ 2021-03-19  9:29 UTC (permalink / raw)
  To: Marcelo Tosatti
  Cc: kvm, Sean Christopherson, Wanpeng Li, Jim Mattson, Paolo Bonzini

Marcelo Tosatti <mtosatti@redhat.com> writes:

> On Thu, Mar 18, 2021 at 07:05:49PM +0100, Paolo Bonzini wrote:
>> On 18/03/21 19:04, Marcelo Tosatti wrote:
>> > > 
>> > > Not clear why this is necessary, if the choice was to not touch TSC page
>> > > at all, when invariant TSC is supported on the host...
>> > 	
>> > 	s/invariant TSC/TSC scaling/
>> > 
>> > > Ah, OK, this is not for the migration with iTSC on destination case,
>> > > but any call to kvm_gen_update_masterclock, correct?
>> 
>> Yes, any update can be racy.
>> 
>> Paolo
>
> Which makes an unrelated KVM_REQ_MASTERCLOCK_UPDATE -> kvm_gen_update_masterclock 
> sequence to inadvertedly reduce performance a possibility, unless i am
> missing something.
>
> Ah, OK, it should be enabled again at KVM_REQ_CLOCK_UPDATE.

Right,

this patch is loosely related to the original problem, it's just
something I've noticed while debugging. Unlike kvmclock, which is
per-cpu, TSC page is global so only vCPU0 updates it in
kvm_guest_time_update() but the potential problem is: some other vCPU
may enter the guest before we manage to update the page on vCPU0 and
this is racy. Here, we just follow TLFS: invalidate TSC page while we're
updating it (on KVM_REQ_MASTERCLOCK_UPDATE).

PATCH4, which avoids updating TSC page when re-enlightenment is enabled,
also disables this invalidation. This is OK as we're not going to update
it.

-- 
Vitaly


^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH v2 6/4] selftests: kvm: Add basic Hyper-V clocksources tests
  2021-03-18 17:55       ` Marcelo Tosatti
@ 2021-03-19  9:35         ` Vitaly Kuznetsov
  0 siblings, 0 replies; 22+ messages in thread
From: Vitaly Kuznetsov @ 2021-03-19  9:35 UTC (permalink / raw)
  To: Marcelo Tosatti, Paolo Bonzini
  Cc: kvm, Sean Christopherson, Wanpeng Li, Jim Mattson

Marcelo Tosatti <mtosatti@redhat.com> writes:

> On Thu, Mar 18, 2021 at 06:50:35PM +0100, Paolo Bonzini wrote:
>> On 18/03/21 17:57, Marcelo Tosatti wrote:
>> > I think this should be monotonically increasing:
>> > 
>> > 1.	r1 = rdtsc();
>> > 2.	t1 = rdmsr(HV_X64_MSR_TIME_REF_COUNT);
>> > 3.	nop_loop();
>> > 4.	r2 = rdtsc();
>> > 5.	t2 = rdmsr(HV_X64_MSR_TIME_REF_COUNT);	
>> > 
>> > > 
>> > > +	/* 1% tolerance */
>> > > +	GUEST_ASSERT(delta_ns * 100 < (t2 - t1) * 100);
>> > > +}
>> > 
>> > Doesnt an unbounded schedule-out/schedule-in (which resembles
>> > overloaded host) of the qemu-kvm vcpu in any of the points 1,2,3,4,5
>> > break the assertion above?
>> 
>> 
>> Yes, there's a window of a handful of instructions (at least on
>> non-preemptible kernels).  If anyone ever hits it, we can run the test 100
>> times and check that it passes at least 95 or 99 of them.
>> 
>> Paolo
>
> Yep, sounds like a good solution.
>
> However this makes me wonder on the validity of the test: what its
> trying to verify, again? (i would check the monotonicity that 
> is r1 <= t1 <= r2 <= t2 as well, without the nop_loop in between).

This particular place tests that Reference TSC
(HV_X64_MSR_TIME_REF_COUNT) is a 1Ghz clock. We test it against raw
TSC. TSC frequency is known to us (HV_X64_MSR_TSC_FREQUENCY) so we can
compare the delta after nop_loop().

We can't directly compare r1 and t1 (and r2 and t2) here because we
don't know the base precisely. We could've probably reset TSC to 0 and
kvmclock (which converts to Reference TSC) to 0 and compare after. For
now, we just check that Reference TSC is ticking as it should.

-- 
Vitaly


^ permalink raw reply	[flat|nested] 22+ messages in thread

end of thread, other threads:[~2021-03-19  9:36 UTC | newest]

Thread overview: 22+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-03-16 14:37 [PATCH v2 0/4] KVM: x86: hyper-v: TSC page fixes Vitaly Kuznetsov
2021-03-16 14:37 ` [PATCH v2 1/4] KVM: x86: hyper-v: Limit guest to writing zero to HV_X64_MSR_TSC_EMULATION_STATUS Vitaly Kuznetsov
2021-03-16 14:37 ` [PATCH v2 2/4] KVM: x86: hyper-v: Prevent using not-yet-updated TSC page by secondary CPUs Vitaly Kuznetsov
2021-03-18 17:02   ` Marcelo Tosatti
2021-03-18 18:04     ` Marcelo Tosatti
2021-03-18 18:05       ` Paolo Bonzini
2021-03-18 18:30         ` Marcelo Tosatti
2021-03-19  9:29           ` Vitaly Kuznetsov
2021-03-16 14:37 ` [PATCH v2 3/4] KVM: x86: hyper-v: Track Hyper-V TSC page status Vitaly Kuznetsov
2021-03-17  8:07   ` Paolo Bonzini
2021-03-17 11:19     ` [PATCH v2 5/4] KVM: x86: hyper-v: Briefly document enum hv_tsc_page_status Vitaly Kuznetsov
2021-03-16 14:37 ` [PATCH v2 4/4] KVM: x86: hyper-v: Don't touch TSC page values when guest opted for re-enlightenment Vitaly Kuznetsov
2021-03-18 14:09 ` [PATCH v2 6/4] selftests: kvm: Add basic Hyper-V clocksources tests Vitaly Kuznetsov
2021-03-18 14:26   ` Paolo Bonzini
2021-03-18 14:52     ` Vitaly Kuznetsov
2021-03-18 15:01       ` Paolo Bonzini
2021-03-18 15:23         ` Vitaly Kuznetsov
2021-03-18 15:27   ` Paolo Bonzini
2021-03-18 16:57   ` Marcelo Tosatti
2021-03-18 17:50     ` Paolo Bonzini
2021-03-18 17:55       ` Marcelo Tosatti
2021-03-19  9:35         ` Vitaly Kuznetsov

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).