kvm.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH 0/4] KVM: x86: hyper-v: TSC page fixes
@ 2021-03-15 14:37 Vitaly Kuznetsov
  2021-03-15 14:37 ` [PATCH 1/4] KVM: x86: hyper-v: Limit guest to writing zero to HV_X64_MSR_TSC_EMULATION_STATUS Vitaly Kuznetsov
                   ` (3 more replies)
  0 siblings, 4 replies; 13+ messages in thread
From: Vitaly Kuznetsov @ 2021-03-15 14:37 UTC (permalink / raw)
  To: kvm, Paolo Bonzini
  Cc: Sean Christopherson, Wanpeng Li, Jim Mattson, Marcelo Tosatti

I'm investigating an issue when Linux guest on nested Hyper-V on KVM 
(WSL2 on Win10 on KVM to be precise) hangs after L1 KVM is migrated. Trace
shows us that L2 is trying to set L1's Synthetic Timer and reacting to
this Hyper-V sets Synthetic Timer in KVM but the target value it sets is
always slightly in the past, this causes the timer to expire immediately
and an interrupt storm is thus observed. L2 is not making much forward
progress.

The issue is only observed when re-enlightenment is exposed to L1. KVM
doesn't really support re-enlightenment notifications upon migration,
userspace is supposed to expose it only when TSC scaling is supported
on the destination host. Without re-enlightenment exposed, Hyper-V will
not expose stable TSC page clocksource to its L2s. The issue is observed
on when migration happens between hosts supporting TSC scaling. Rumor has
it that it is possible to reproduce the problem even when migrating locally
to the same host, though, I wasn't really able to.

The current speculation is that when Hyper-V is migrated, it uses stale
(cached) TSC page values to compute the difference between its own
clocksource (provided by KVM) and its guests' TSC pages to program
synthetic timers and in some cases, when TSC page is updated, this puts all
stimer expirations in the past. This, in its turn, causes an interrupt
storms (both L0-L1 and L1->L2 as Hyper-V mirrors stimer expirations into
L2).

The proposed fix is to skip updating TSC page clocksource when guest opted
for re-enlightenment notifications (PATCH4). Patches 1-3 are slightly
related fixes to the (mostly theoretical) issues I've stumbled upon while
working on the problem.

Vitaly Kuznetsov (4):
  KVM: x86: hyper-v: Limit guest to writing zero to
    HV_X64_MSR_TSC_EMULATION_STATUS
  KVM: x86: hyper-v: Prevent using not-yet-updated TSC page by secondary
    CPUs
  KVM: x86: hyper-v: Track Hyper-V TSC page status
  KVM: x86: hyper-v: Don't touch TSC page values when guest opted for
    re-enlightenment

 arch/x86/include/asm/kvm_host.h |  9 +++++++
 arch/x86/kvm/hyperv.c           | 42 +++++++++++++++++++++++++++++----
 arch/x86/kvm/x86.c              |  5 ++--
 3 files changed, 49 insertions(+), 7 deletions(-)

-- 
2.30.2


^ permalink raw reply	[flat|nested] 13+ messages in thread

* [PATCH 1/4] KVM: x86: hyper-v: Limit guest to writing zero to HV_X64_MSR_TSC_EMULATION_STATUS
  2021-03-15 14:37 [PATCH 0/4] KVM: x86: hyper-v: TSC page fixes Vitaly Kuznetsov
@ 2021-03-15 14:37 ` Vitaly Kuznetsov
  2021-03-15 14:37 ` [PATCH 2/4] KVM: x86: hyper-v: Prevent using not-yet-updated TSC page by secondary CPUs Vitaly Kuznetsov
                   ` (2 subsequent siblings)
  3 siblings, 0 replies; 13+ messages in thread
From: Vitaly Kuznetsov @ 2021-03-15 14:37 UTC (permalink / raw)
  To: kvm, Paolo Bonzini
  Cc: Sean Christopherson, Wanpeng Li, Jim Mattson, Marcelo Tosatti

HV_X64_MSR_TSC_EMULATION_STATUS indicates whether TSC accesses are emulated
after migration (to accommodate for a different host TSC frequency when TSC
scaling is not supported; we don't implement this in KVM). Guest can use
the same MSR to stop TSC access emulation by writing zero. Writing anything
else is forbidden.

Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
---
 arch/x86/kvm/hyperv.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/arch/x86/kvm/hyperv.c b/arch/x86/kvm/hyperv.c
index 58fa8c029867..eefb85b86fe8 100644
--- a/arch/x86/kvm/hyperv.c
+++ b/arch/x86/kvm/hyperv.c
@@ -1229,6 +1229,9 @@ static int kvm_hv_set_msr_pw(struct kvm_vcpu *vcpu, u32 msr, u64 data,
 		hv->hv_tsc_emulation_control = data;
 		break;
 	case HV_X64_MSR_TSC_EMULATION_STATUS:
+		if (data && !host)
+			return 1;
+
 		hv->hv_tsc_emulation_status = data;
 		break;
 	case HV_X64_MSR_TIME_REF_COUNT:
-- 
2.30.2


^ permalink raw reply related	[flat|nested] 13+ messages in thread

* [PATCH 2/4] KVM: x86: hyper-v: Prevent using not-yet-updated TSC page by secondary CPUs
  2021-03-15 14:37 [PATCH 0/4] KVM: x86: hyper-v: TSC page fixes Vitaly Kuznetsov
  2021-03-15 14:37 ` [PATCH 1/4] KVM: x86: hyper-v: Limit guest to writing zero to HV_X64_MSR_TSC_EMULATION_STATUS Vitaly Kuznetsov
@ 2021-03-15 14:37 ` Vitaly Kuznetsov
  2021-03-15 15:45   ` Paolo Bonzini
  2021-03-15 14:37 ` [PATCH 3/4] KVM: x86: hyper-v: Track Hyper-V TSC page status Vitaly Kuznetsov
  2021-03-15 14:37 ` [PATCH 4/4] KVM: x86: hyper-v: Don't touch TSC page values when guest opted for re-enlightenment Vitaly Kuznetsov
  3 siblings, 1 reply; 13+ messages in thread
From: Vitaly Kuznetsov @ 2021-03-15 14:37 UTC (permalink / raw)
  To: kvm, Paolo Bonzini
  Cc: Sean Christopherson, Wanpeng Li, Jim Mattson, Marcelo Tosatti

When KVM_REQ_MASTERCLOCK_UPDATE request is issued (e.g. after migration)
we need to make sure no vCPU sees stale values in PV clock structures and
thus all vCPUs are kicked with KVM_REQ_CLOCK_UPDATE. Hyper-V TSC page
clocksource is global and kvm_guest_time_update() only updates in on vCPU0
but this is not entirely correct: nothing blocks some other vCPU from
entering the guest before we finish the update on CPU0 and it can read
stale values from the page.

Call kvm_hv_setup_tsc_page() on all vCPUs. Normally, KVM_REQ_CLOCK_UPDATE
should be very rare so we may not care much about being wasteful.

Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
---
 arch/x86/kvm/x86.c | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 47e021bdcc94..882c509bfc86 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -2748,8 +2748,9 @@ static int kvm_guest_time_update(struct kvm_vcpu *v)
 				       offsetof(struct compat_vcpu_info, time));
 	if (vcpu->xen.vcpu_time_info_set)
 		kvm_setup_pvclock_page(v, &vcpu->xen.vcpu_time_info_cache, 0);
-	if (v == kvm_get_vcpu(v->kvm, 0))
-		kvm_hv_setup_tsc_page(v->kvm, &vcpu->hv_clock);
+
+	kvm_hv_setup_tsc_page(v->kvm, &vcpu->hv_clock);
+
 	return 0;
 }
 
-- 
2.30.2


^ permalink raw reply related	[flat|nested] 13+ messages in thread

* [PATCH 3/4] KVM: x86: hyper-v: Track Hyper-V TSC page status
  2021-03-15 14:37 [PATCH 0/4] KVM: x86: hyper-v: TSC page fixes Vitaly Kuznetsov
  2021-03-15 14:37 ` [PATCH 1/4] KVM: x86: hyper-v: Limit guest to writing zero to HV_X64_MSR_TSC_EMULATION_STATUS Vitaly Kuznetsov
  2021-03-15 14:37 ` [PATCH 2/4] KVM: x86: hyper-v: Prevent using not-yet-updated TSC page by secondary CPUs Vitaly Kuznetsov
@ 2021-03-15 14:37 ` Vitaly Kuznetsov
  2021-03-15 15:15   ` Sean Christopherson
  2021-03-15 14:37 ` [PATCH 4/4] KVM: x86: hyper-v: Don't touch TSC page values when guest opted for re-enlightenment Vitaly Kuznetsov
  3 siblings, 1 reply; 13+ messages in thread
From: Vitaly Kuznetsov @ 2021-03-15 14:37 UTC (permalink / raw)
  To: kvm, Paolo Bonzini
  Cc: Sean Christopherson, Wanpeng Li, Jim Mattson, Marcelo Tosatti

Create an infrastructure for tracking Hyper-V TSC page status, i.e. if it
was updated from guest/host side or if we've failed to set it up (because
e.g. guest wrote some garbage to HV_X64_MSR_REFERENCE_TSC) and there's no
need to retry.

Also, in a hypothetical situation when we are in 'always catchup' mode for
TSC we can now avoid contending 'hv->hv_lock' on every guest enter by
setting the state to HV_TSC_PAGE_BROKEN after compute_tsc_page_parameters()
returns false.

Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
---
 arch/x86/include/asm/kvm_host.h |  9 +++++++++
 arch/x86/kvm/hyperv.c           | 23 +++++++++++++++++------
 2 files changed, 26 insertions(+), 6 deletions(-)

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index 9bc091ecaaeb..87448e9e6b28 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -884,12 +884,21 @@ struct kvm_hv_syndbg {
 	u64 options;
 };
 
+enum hv_tsc_page_status {
+	HV_TSC_PAGE_UNSET = 0,
+	HV_TSC_PAGE_GUEST_CHANGED,
+	HV_TSC_PAGE_HOST_CHANGED,
+	HV_TSC_PAGE_SET,
+	HV_TSC_PAGE_BROKEN,
+};
+
 /* Hyper-V emulation context */
 struct kvm_hv {
 	struct mutex hv_lock;
 	u64 hv_guest_os_id;
 	u64 hv_hypercall;
 	u64 hv_tsc_page;
+	enum hv_tsc_page_status hv_tsc_page_status;
 
 	/* Hyper-v based guest crash (NT kernel bugcheck) parameters */
 	u64 hv_crash_param[HV_X64_MSR_CRASH_PARAMS];
diff --git a/arch/x86/kvm/hyperv.c b/arch/x86/kvm/hyperv.c
index eefb85b86fe8..2a8d078b16cb 100644
--- a/arch/x86/kvm/hyperv.c
+++ b/arch/x86/kvm/hyperv.c
@@ -1087,7 +1087,7 @@ void kvm_hv_setup_tsc_page(struct kvm *kvm,
 	BUILD_BUG_ON(sizeof(tsc_seq) != sizeof(hv->tsc_ref.tsc_sequence));
 	BUILD_BUG_ON(offsetof(struct ms_hyperv_tsc_page, tsc_sequence) != 0);
 
-	if (!(hv->hv_tsc_page & HV_X64_MSR_TSC_REFERENCE_ENABLE))
+	if (hv->hv_tsc_page_status == HV_TSC_PAGE_BROKEN)
 		return;
 
 	mutex_lock(&hv->hv_lock);
@@ -1101,7 +1101,7 @@ void kvm_hv_setup_tsc_page(struct kvm *kvm,
 	 */
 	if (unlikely(kvm_read_guest(kvm, gfn_to_gpa(gfn),
 				    &tsc_seq, sizeof(tsc_seq))))
-		goto out_unlock;
+		goto out_err;
 
 	/*
 	 * While we're computing and writing the parameters, force the
@@ -1110,15 +1110,15 @@ void kvm_hv_setup_tsc_page(struct kvm *kvm,
 	hv->tsc_ref.tsc_sequence = 0;
 	if (kvm_write_guest(kvm, gfn_to_gpa(gfn),
 			    &hv->tsc_ref, sizeof(hv->tsc_ref.tsc_sequence)))
-		goto out_unlock;
+		goto out_err;
 
 	if (!compute_tsc_page_parameters(hv_clock, &hv->tsc_ref))
-		goto out_unlock;
+		goto out_err;
 
 	/* Ensure sequence is zero before writing the rest of the struct.  */
 	smp_wmb();
 	if (kvm_write_guest(kvm, gfn_to_gpa(gfn), &hv->tsc_ref, sizeof(hv->tsc_ref)))
-		goto out_unlock;
+		goto out_err;
 
 	/*
 	 * Now switch to the TSC page mechanism by writing the sequence.
@@ -1133,6 +1133,12 @@ void kvm_hv_setup_tsc_page(struct kvm *kvm,
 	hv->tsc_ref.tsc_sequence = tsc_seq;
 	kvm_write_guest(kvm, gfn_to_gpa(gfn),
 			&hv->tsc_ref, sizeof(hv->tsc_ref.tsc_sequence));
+
+	hv->hv_tsc_page_status = HV_TSC_PAGE_SET;
+	goto out_unlock;
+
+out_err:
+	hv->hv_tsc_page_status = HV_TSC_PAGE_BROKEN;
 out_unlock:
 	mutex_unlock(&hv->hv_lock);
 }
@@ -1193,8 +1199,13 @@ static int kvm_hv_set_msr_pw(struct kvm_vcpu *vcpu, u32 msr, u64 data,
 	}
 	case HV_X64_MSR_REFERENCE_TSC:
 		hv->hv_tsc_page = data;
-		if (hv->hv_tsc_page & HV_X64_MSR_TSC_REFERENCE_ENABLE)
+		if (hv->hv_tsc_page & HV_X64_MSR_TSC_REFERENCE_ENABLE) {
+			if (!host)
+				hv->hv_tsc_page_status = HV_TSC_PAGE_GUEST_CHANGED;
+			else
+				hv->hv_tsc_page_status = HV_TSC_PAGE_HOST_CHANGED;
 			kvm_make_request(KVM_REQ_MASTERCLOCK_UPDATE, vcpu);
+		}
 		break;
 	case HV_X64_MSR_CRASH_P0 ... HV_X64_MSR_CRASH_P4:
 		return kvm_hv_msr_set_crash_data(kvm,
-- 
2.30.2


^ permalink raw reply related	[flat|nested] 13+ messages in thread

* [PATCH 4/4] KVM: x86: hyper-v: Don't touch TSC page values when guest opted for re-enlightenment
  2021-03-15 14:37 [PATCH 0/4] KVM: x86: hyper-v: TSC page fixes Vitaly Kuznetsov
                   ` (2 preceding siblings ...)
  2021-03-15 14:37 ` [PATCH 3/4] KVM: x86: hyper-v: Track Hyper-V TSC page status Vitaly Kuznetsov
@ 2021-03-15 14:37 ` Vitaly Kuznetsov
  3 siblings, 0 replies; 13+ messages in thread
From: Vitaly Kuznetsov @ 2021-03-15 14:37 UTC (permalink / raw)
  To: kvm, Paolo Bonzini
  Cc: Sean Christopherson, Wanpeng Li, Jim Mattson, Marcelo Tosatti

When guest opts for re-enlightenment notifications upon migration, it is
in its right to assume that TSC page values never change (as they're only
supposed to change upon migration and the host has to keep things as they
are before it receives confirmation from the guest). This is mostly true
until the guest is migrated somewhere. KVM userspace (e.g. QEMU) will
trigger masterclock update by writing to HV_X64_MSR_REFERENCE_TSC, by
calling KVM_SET_CLOCK,... and as TSC value and kvmclock reading drift
apart (even slightly), the update causes TSC page values to change.

The issue at hand is that when Hyper-V is migrated, it uses stale (cached)
TSC page values to compute the difference between its own clocksource
(provided by KVM) and its guests' TSC pages to program synthetic timers
and in some cases, when TSC page is updated, this puts all stimer
expirations in the past. This, in its turn, causes an interrupt storm
and L2 guests not making much forward progress.

Note, KVM doesn't fully implement re-enlightenment notification. Basically,
the support for reenlightenment MSRs is just a stub and userspace is only
expected to expose the feature when TSC scaling on the expected destination
hosts is available. With TSC scaling, no real re-enlightenment is needed
as TSC frequency doesn't change. With TSC scaling becoming ubiquitous, it
likely makes little sense to fully implement re-enlightenment in KVM.

Prevent TSC page from being updated after migration. In case it's not the
guest who's initiating the change and when TSC page is already enabled,
just keep it as it is: TSC value is supposed to be preserved across
migration and TSC frequency can't change with re-enlightenment enabled.
The guest is doomed anyway if any of this is not true.

Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
---
 arch/x86/kvm/hyperv.c | 18 ++++++++++++++++++
 1 file changed, 18 insertions(+)

diff --git a/arch/x86/kvm/hyperv.c b/arch/x86/kvm/hyperv.c
index 2a8d078b16cb..5889de580e37 100644
--- a/arch/x86/kvm/hyperv.c
+++ b/arch/x86/kvm/hyperv.c
@@ -1103,6 +1103,24 @@ void kvm_hv_setup_tsc_page(struct kvm *kvm,
 				    &tsc_seq, sizeof(tsc_seq))))
 		goto out_err;
 
+	/*
+	 * Don't touch TSC page values if the guest has opted for TSC emulation
+	 * after migration. KVM doesn't fully support reenlightenment
+	 * notifications and TSC access emulation and Hyper-V is known to expect
+	 * the values in TSC page to stay constant before TSC access emulation
+	 * is disabled from guest side (HV_X64_MSR_TSC_EMULATION_STATUS).
+	 * Userspace is expected to preserve TSC frequency and guest visible TSC
+	 * value across migration (and prevent it when TSC scaling is
+	 * unsupported).
+	 */
+	if ((hv->hv_tsc_page_status != HV_TSC_PAGE_GUEST_CHANGED) &&
+	    hv->hv_tsc_emulation_control && tsc_seq) {
+		if (kvm_read_guest(kvm, gfn_to_gpa(gfn), &hv->tsc_ref, sizeof(hv->tsc_ref)))
+			goto out_err;
+
+		goto out_unlock;
+	}
+
 	/*
 	 * While we're computing and writing the parameters, force the
 	 * guest to use the time reference count MSR.
-- 
2.30.2


^ permalink raw reply related	[flat|nested] 13+ messages in thread

* Re: [PATCH 3/4] KVM: x86: hyper-v: Track Hyper-V TSC page status
  2021-03-15 14:37 ` [PATCH 3/4] KVM: x86: hyper-v: Track Hyper-V TSC page status Vitaly Kuznetsov
@ 2021-03-15 15:15   ` Sean Christopherson
  2021-03-15 15:34     ` Vitaly Kuznetsov
  0 siblings, 1 reply; 13+ messages in thread
From: Sean Christopherson @ 2021-03-15 15:15 UTC (permalink / raw)
  To: Vitaly Kuznetsov
  Cc: kvm, Paolo Bonzini, Wanpeng Li, Jim Mattson, Marcelo Tosatti

On Mon, Mar 15, 2021, Vitaly Kuznetsov wrote:
> Create an infrastructure for tracking Hyper-V TSC page status, i.e. if it
> was updated from guest/host side or if we've failed to set it up (because
> e.g. guest wrote some garbage to HV_X64_MSR_REFERENCE_TSC) and there's no
> need to retry.
> 
> Also, in a hypothetical situation when we are in 'always catchup' mode for
> TSC we can now avoid contending 'hv->hv_lock' on every guest enter by
> setting the state to HV_TSC_PAGE_BROKEN after compute_tsc_page_parameters()
> returns false.
> 
> Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
> ---
> diff --git a/arch/x86/kvm/hyperv.c b/arch/x86/kvm/hyperv.c
> index eefb85b86fe8..2a8d078b16cb 100644
> --- a/arch/x86/kvm/hyperv.c
> +++ b/arch/x86/kvm/hyperv.c
> @@ -1087,7 +1087,7 @@ void kvm_hv_setup_tsc_page(struct kvm *kvm,
>  	BUILD_BUG_ON(sizeof(tsc_seq) != sizeof(hv->tsc_ref.tsc_sequence));
>  	BUILD_BUG_ON(offsetof(struct ms_hyperv_tsc_page, tsc_sequence) != 0);
>  
> -	if (!(hv->hv_tsc_page & HV_X64_MSR_TSC_REFERENCE_ENABLE))
> +	if (hv->hv_tsc_page_status == HV_TSC_PAGE_BROKEN)
>  		return;
>  
>  	mutex_lock(&hv->hv_lock);

...

> @@ -1133,6 +1133,12 @@ void kvm_hv_setup_tsc_page(struct kvm *kvm,
>  	hv->tsc_ref.tsc_sequence = tsc_seq;
>  	kvm_write_guest(kvm, gfn_to_gpa(gfn),
>  			&hv->tsc_ref, sizeof(hv->tsc_ref.tsc_sequence));
> +
> +	hv->hv_tsc_page_status = HV_TSC_PAGE_SET;
> +	goto out_unlock;
> +
> +out_err:
> +	hv->hv_tsc_page_status = HV_TSC_PAGE_BROKEN;
>  out_unlock:
>  	mutex_unlock(&hv->hv_lock);
>  }
> @@ -1193,8 +1199,13 @@ static int kvm_hv_set_msr_pw(struct kvm_vcpu *vcpu, u32 msr, u64 data,
>  	}
>  	case HV_X64_MSR_REFERENCE_TSC:
>  		hv->hv_tsc_page = data;
> -		if (hv->hv_tsc_page & HV_X64_MSR_TSC_REFERENCE_ENABLE)
> +		if (hv->hv_tsc_page & HV_X64_MSR_TSC_REFERENCE_ENABLE) {
> +			if (!host)
> +				hv->hv_tsc_page_status = HV_TSC_PAGE_GUEST_CHANGED;
> +			else
> +				hv->hv_tsc_page_status = HV_TSC_PAGE_HOST_CHANGED;

Writing the status without taking hv->hv_lock could cause the update to be lost,
e.g. if a different vCPU fails kvm_hv_setup_tsc_page() at the same time, its
write to set status to HV_TSC_PAGE_BROKEN would race with this write.

>  			kvm_make_request(KVM_REQ_MASTERCLOCK_UPDATE, vcpu);
> +		}
>  		break;
>  	case HV_X64_MSR_CRASH_P0 ... HV_X64_MSR_CRASH_P4:
>  		return kvm_hv_msr_set_crash_data(kvm,
> -- 
> 2.30.2
> 

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH 3/4] KVM: x86: hyper-v: Track Hyper-V TSC page status
  2021-03-15 15:15   ` Sean Christopherson
@ 2021-03-15 15:34     ` Vitaly Kuznetsov
  2021-03-16 12:24       ` Vitaly Kuznetsov
  0 siblings, 1 reply; 13+ messages in thread
From: Vitaly Kuznetsov @ 2021-03-15 15:34 UTC (permalink / raw)
  To: Sean Christopherson
  Cc: kvm, Paolo Bonzini, Wanpeng Li, Jim Mattson, Marcelo Tosatti

Sean Christopherson <seanjc@google.com> writes:

> On Mon, Mar 15, 2021, Vitaly Kuznetsov wrote:
>> Create an infrastructure for tracking Hyper-V TSC page status, i.e. if it
>> was updated from guest/host side or if we've failed to set it up (because
>> e.g. guest wrote some garbage to HV_X64_MSR_REFERENCE_TSC) and there's no
>> need to retry.
>> 
>> Also, in a hypothetical situation when we are in 'always catchup' mode for
>> TSC we can now avoid contending 'hv->hv_lock' on every guest enter by
>> setting the state to HV_TSC_PAGE_BROKEN after compute_tsc_page_parameters()
>> returns false.
>> 
>> Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
>> ---
>> diff --git a/arch/x86/kvm/hyperv.c b/arch/x86/kvm/hyperv.c
>> index eefb85b86fe8..2a8d078b16cb 100644
>> --- a/arch/x86/kvm/hyperv.c
>> +++ b/arch/x86/kvm/hyperv.c
>> @@ -1087,7 +1087,7 @@ void kvm_hv_setup_tsc_page(struct kvm *kvm,
>>  	BUILD_BUG_ON(sizeof(tsc_seq) != sizeof(hv->tsc_ref.tsc_sequence));
>>  	BUILD_BUG_ON(offsetof(struct ms_hyperv_tsc_page, tsc_sequence) != 0);
>>  
>> -	if (!(hv->hv_tsc_page & HV_X64_MSR_TSC_REFERENCE_ENABLE))
>> +	if (hv->hv_tsc_page_status == HV_TSC_PAGE_BROKEN)
>>  		return;
>>  
>>  	mutex_lock(&hv->hv_lock);
>
> ...
>
>> @@ -1133,6 +1133,12 @@ void kvm_hv_setup_tsc_page(struct kvm *kvm,
>>  	hv->tsc_ref.tsc_sequence = tsc_seq;
>>  	kvm_write_guest(kvm, gfn_to_gpa(gfn),
>>  			&hv->tsc_ref, sizeof(hv->tsc_ref.tsc_sequence));
>> +
>> +	hv->hv_tsc_page_status = HV_TSC_PAGE_SET;
>> +	goto out_unlock;
>> +
>> +out_err:
>> +	hv->hv_tsc_page_status = HV_TSC_PAGE_BROKEN;
>>  out_unlock:
>>  	mutex_unlock(&hv->hv_lock);
>>  }
>> @@ -1193,8 +1199,13 @@ static int kvm_hv_set_msr_pw(struct kvm_vcpu *vcpu, u32 msr, u64 data,
>>  	}
>>  	case HV_X64_MSR_REFERENCE_TSC:
>>  		hv->hv_tsc_page = data;
>> -		if (hv->hv_tsc_page & HV_X64_MSR_TSC_REFERENCE_ENABLE)
>> +		if (hv->hv_tsc_page & HV_X64_MSR_TSC_REFERENCE_ENABLE) {
>> +			if (!host)
>> +				hv->hv_tsc_page_status = HV_TSC_PAGE_GUEST_CHANGED;
>> +			else
>> +				hv->hv_tsc_page_status = HV_TSC_PAGE_HOST_CHANGED;
>
> Writing the status without taking hv->hv_lock could cause the update to be lost,
> e.g. if a different vCPU fails kvm_hv_setup_tsc_page() at the same time, its
> write to set status to HV_TSC_PAGE_BROKEN would race with this write.
>

Oh, right you are, the lock was somewhere in my brain :-) Will do in v2.

>>  			kvm_make_request(KVM_REQ_MASTERCLOCK_UPDATE, vcpu);
>> +		}
>>  		break;
>>  	case HV_X64_MSR_CRASH_P0 ... HV_X64_MSR_CRASH_P4:
>>  		return kvm_hv_msr_set_crash_data(kvm,
>> -- 
>> 2.30.2
>> 
>

-- 
Vitaly


^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH 2/4] KVM: x86: hyper-v: Prevent using not-yet-updated TSC page by secondary CPUs
  2021-03-15 14:37 ` [PATCH 2/4] KVM: x86: hyper-v: Prevent using not-yet-updated TSC page by secondary CPUs Vitaly Kuznetsov
@ 2021-03-15 15:45   ` Paolo Bonzini
  2021-03-15 15:55     ` Vitaly Kuznetsov
  0 siblings, 1 reply; 13+ messages in thread
From: Paolo Bonzini @ 2021-03-15 15:45 UTC (permalink / raw)
  To: Vitaly Kuznetsov, kvm
  Cc: Sean Christopherson, Wanpeng Li, Jim Mattson, Marcelo Tosatti

On 15/03/21 15:37, Vitaly Kuznetsov wrote:
> When KVM_REQ_MASTERCLOCK_UPDATE request is issued (e.g. after migration)
> we need to make sure no vCPU sees stale values in PV clock structures and
> thus all vCPUs are kicked with KVM_REQ_CLOCK_UPDATE. Hyper-V TSC page
> clocksource is global and kvm_guest_time_update() only updates in on vCPU0
> but this is not entirely correct: nothing blocks some other vCPU from
> entering the guest before we finish the update on CPU0 and it can read
> stale values from the page.
> 
> Call kvm_hv_setup_tsc_page() on all vCPUs. Normally, KVM_REQ_CLOCK_UPDATE
> should be very rare so we may not care much about being wasteful.

I think we should instead write 0 to the page in kvm_gen_update_masterclock.

Paolo

> Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
> ---
>   arch/x86/kvm/x86.c | 5 +++--
>   1 file changed, 3 insertions(+), 2 deletions(-)
> 
> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> index 47e021bdcc94..882c509bfc86 100644
> --- a/arch/x86/kvm/x86.c
> +++ b/arch/x86/kvm/x86.c
> @@ -2748,8 +2748,9 @@ static int kvm_guest_time_update(struct kvm_vcpu *v)
>   				       offsetof(struct compat_vcpu_info, time));
>   	if (vcpu->xen.vcpu_time_info_set)
>   		kvm_setup_pvclock_page(v, &vcpu->xen.vcpu_time_info_cache, 0);
> -	if (v == kvm_get_vcpu(v->kvm, 0))
> -		kvm_hv_setup_tsc_page(v->kvm, &vcpu->hv_clock);
> +
> +	kvm_hv_setup_tsc_page(v->kvm, &vcpu->hv_clock);
> +
>   	return 0;
>   }
>   
> 


^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH 2/4] KVM: x86: hyper-v: Prevent using not-yet-updated TSC page by secondary CPUs
  2021-03-15 15:45   ` Paolo Bonzini
@ 2021-03-15 15:55     ` Vitaly Kuznetsov
  2021-03-15 16:23       ` Paolo Bonzini
  0 siblings, 1 reply; 13+ messages in thread
From: Vitaly Kuznetsov @ 2021-03-15 15:55 UTC (permalink / raw)
  To: Paolo Bonzini, kvm
  Cc: Sean Christopherson, Wanpeng Li, Jim Mattson, Marcelo Tosatti

Paolo Bonzini <pbonzini@redhat.com> writes:

> On 15/03/21 15:37, Vitaly Kuznetsov wrote:
>> When KVM_REQ_MASTERCLOCK_UPDATE request is issued (e.g. after migration)
>> we need to make sure no vCPU sees stale values in PV clock structures and
>> thus all vCPUs are kicked with KVM_REQ_CLOCK_UPDATE. Hyper-V TSC page
>> clocksource is global and kvm_guest_time_update() only updates in on vCPU0
>> but this is not entirely correct: nothing blocks some other vCPU from
>> entering the guest before we finish the update on CPU0 and it can read
>> stale values from the page.
>> 
>> Call kvm_hv_setup_tsc_page() on all vCPUs. Normally, KVM_REQ_CLOCK_UPDATE
>> should be very rare so we may not care much about being wasteful.
>
> I think we should instead write 0 to the page in kvm_gen_update_masterclock.
>

We can do that but we will also need to invalidate
hv->tsc_ref.tsc_sequence to prevent MSR based clocksource
(HV_X64_MSR_TIME_REF_COUNT -> get_time_ref_counter()) from using stale
hv->tsc_ref.tsc_scale/tsc_offset values (in case we had them
calculated).

Also, we can't really disable TSC page for nested scenario when guest
opted for reenlightenment (PATCH4) but we're not going to update the
page anyway so there's not much different.

> Paolo
>
>> Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
>> ---
>>   arch/x86/kvm/x86.c | 5 +++--
>>   1 file changed, 3 insertions(+), 2 deletions(-)
>> 
>> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
>> index 47e021bdcc94..882c509bfc86 100644
>> --- a/arch/x86/kvm/x86.c
>> +++ b/arch/x86/kvm/x86.c
>> @@ -2748,8 +2748,9 @@ static int kvm_guest_time_update(struct kvm_vcpu *v)
>>   				       offsetof(struct compat_vcpu_info, time));
>>   	if (vcpu->xen.vcpu_time_info_set)
>>   		kvm_setup_pvclock_page(v, &vcpu->xen.vcpu_time_info_cache, 0);
>> -	if (v == kvm_get_vcpu(v->kvm, 0))
>> -		kvm_hv_setup_tsc_page(v->kvm, &vcpu->hv_clock);
>> +
>> +	kvm_hv_setup_tsc_page(v->kvm, &vcpu->hv_clock);
>> +
>>   	return 0;
>>   }
>>   
>> 
>

-- 
Vitaly


^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH 2/4] KVM: x86: hyper-v: Prevent using not-yet-updated TSC page by secondary CPUs
  2021-03-15 15:55     ` Vitaly Kuznetsov
@ 2021-03-15 16:23       ` Paolo Bonzini
  2021-03-16 12:29         ` Vitaly Kuznetsov
  0 siblings, 1 reply; 13+ messages in thread
From: Paolo Bonzini @ 2021-03-15 16:23 UTC (permalink / raw)
  To: Vitaly Kuznetsov, kvm
  Cc: Sean Christopherson, Wanpeng Li, Jim Mattson, Marcelo Tosatti

On 15/03/21 16:55, Vitaly Kuznetsov wrote:
>> I think we should instead write 0 to the page in kvm_gen_update_masterclock.
>
> We can do that but we will also need to invalidate
> hv->tsc_ref.tsc_sequence to prevent MSR based clocksource
> (HV_X64_MSR_TIME_REF_COUNT -> get_time_ref_counter()) from using stale
> hv->tsc_ref.tsc_scale/tsc_offset values (in case we had them
> calculated).

Yes, we're talking about the same thing (invalidating tsc_sequence is 
done by writing 0 to it).

Paolo

> Also, we can't really disable TSC page for nested scenario when guest
> opted for reenlightenment (PATCH4) but we're not going to update the
> page anyway so there's not much different.
> 


^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH 3/4] KVM: x86: hyper-v: Track Hyper-V TSC page status
  2021-03-15 15:34     ` Vitaly Kuznetsov
@ 2021-03-16 12:24       ` Vitaly Kuznetsov
  2021-03-16 15:20         ` Sean Christopherson
  0 siblings, 1 reply; 13+ messages in thread
From: Vitaly Kuznetsov @ 2021-03-16 12:24 UTC (permalink / raw)
  To: Sean Christopherson
  Cc: kvm, Paolo Bonzini, Wanpeng Li, Jim Mattson, Marcelo Tosatti

Vitaly Kuznetsov <vkuznets@redhat.com> writes:

> Sean Christopherson <seanjc@google.com> writes:
>
>> On Mon, Mar 15, 2021, Vitaly Kuznetsov wrote:
>>> Create an infrastructure for tracking Hyper-V TSC page status, i.e. if it
>>> was updated from guest/host side or if we've failed to set it up (because
>>> e.g. guest wrote some garbage to HV_X64_MSR_REFERENCE_TSC) and there's no
>>> need to retry.
>>> 
>>> Also, in a hypothetical situation when we are in 'always catchup' mode for
>>> TSC we can now avoid contending 'hv->hv_lock' on every guest enter by
>>> setting the state to HV_TSC_PAGE_BROKEN after compute_tsc_page_parameters()
>>> returns false.
>>> 
>>> Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
>>> ---
>>> diff --git a/arch/x86/kvm/hyperv.c b/arch/x86/kvm/hyperv.c
>>> index eefb85b86fe8..2a8d078b16cb 100644
>>> --- a/arch/x86/kvm/hyperv.c
>>> +++ b/arch/x86/kvm/hyperv.c
>>> @@ -1087,7 +1087,7 @@ void kvm_hv_setup_tsc_page(struct kvm *kvm,
>>>  	BUILD_BUG_ON(sizeof(tsc_seq) != sizeof(hv->tsc_ref.tsc_sequence));
>>>  	BUILD_BUG_ON(offsetof(struct ms_hyperv_tsc_page, tsc_sequence) != 0);
>>>  
>>> -	if (!(hv->hv_tsc_page & HV_X64_MSR_TSC_REFERENCE_ENABLE))
>>> +	if (hv->hv_tsc_page_status == HV_TSC_PAGE_BROKEN)
>>>  		return;
>>>  
>>>  	mutex_lock(&hv->hv_lock);
>>
>> ...
>>
>>> @@ -1133,6 +1133,12 @@ void kvm_hv_setup_tsc_page(struct kvm *kvm,
>>>  	hv->tsc_ref.tsc_sequence = tsc_seq;
>>>  	kvm_write_guest(kvm, gfn_to_gpa(gfn),
>>>  			&hv->tsc_ref, sizeof(hv->tsc_ref.tsc_sequence));
>>> +
>>> +	hv->hv_tsc_page_status = HV_TSC_PAGE_SET;
>>> +	goto out_unlock;
>>> +
>>> +out_err:
>>> +	hv->hv_tsc_page_status = HV_TSC_PAGE_BROKEN;
>>>  out_unlock:
>>>  	mutex_unlock(&hv->hv_lock);
>>>  }
>>> @@ -1193,8 +1199,13 @@ static int kvm_hv_set_msr_pw(struct kvm_vcpu *vcpu, u32 msr, u64 data,
>>>  	}
>>>  	case HV_X64_MSR_REFERENCE_TSC:
>>>  		hv->hv_tsc_page = data;
>>> -		if (hv->hv_tsc_page & HV_X64_MSR_TSC_REFERENCE_ENABLE)
>>> +		if (hv->hv_tsc_page & HV_X64_MSR_TSC_REFERENCE_ENABLE) {
>>> +			if (!host)
>>> +				hv->hv_tsc_page_status = HV_TSC_PAGE_GUEST_CHANGED;
>>> +			else
>>> +				hv->hv_tsc_page_status = HV_TSC_PAGE_HOST_CHANGED;
>>
>> Writing the status without taking hv->hv_lock could cause the update to be lost,
>> e.g. if a different vCPU fails kvm_hv_setup_tsc_page() at the same time, its
>> write to set status to HV_TSC_PAGE_BROKEN would race with this write.
>>
>
> Oh, right you are, the lock was somewhere in my brain :-) Will do in
> v2.

Actually no, kvm_hv_set_msr_pw() is only called from
kvm_hv_set_msr_common() with hv->hv_lock held so we're already
synchronized.

... and of course I figured that our by putting another
mutex_lock()/mutex_unlock() here and then wondering why everything hangs
:-)

>
>>>  			kvm_make_request(KVM_REQ_MASTERCLOCK_UPDATE, vcpu);
>>> +		}
>>>  		break;
>>>  	case HV_X64_MSR_CRASH_P0 ... HV_X64_MSR_CRASH_P4:
>>>  		return kvm_hv_msr_set_crash_data(kvm,
>>> -- 
>>> 2.30.2
>>> 
>>

-- 
Vitaly


^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH 2/4] KVM: x86: hyper-v: Prevent using not-yet-updated TSC page by secondary CPUs
  2021-03-15 16:23       ` Paolo Bonzini
@ 2021-03-16 12:29         ` Vitaly Kuznetsov
  0 siblings, 0 replies; 13+ messages in thread
From: Vitaly Kuznetsov @ 2021-03-16 12:29 UTC (permalink / raw)
  To: Paolo Bonzini, kvm
  Cc: Sean Christopherson, Wanpeng Li, Jim Mattson, Marcelo Tosatti

Paolo Bonzini <pbonzini@redhat.com> writes:

> On 15/03/21 16:55, Vitaly Kuznetsov wrote:
>>> I think we should instead write 0 to the page in kvm_gen_update_masterclock.
>>
>> We can do that but we will also need to invalidate
>> hv->tsc_ref.tsc_sequence to prevent MSR based clocksource
>> (HV_X64_MSR_TIME_REF_COUNT -> get_time_ref_counter()) from using stale
>> hv->tsc_ref.tsc_scale/tsc_offset values (in case we had them
>> calculated).
>
> Yes, we're talking about the same thing (invalidating tsc_sequence is 
> done by writing 0 to it).
>

Yes, 'hv->tsc_ref' is a 'shadow TSC page' which almost always caches
what's in the 'real' one. One notable exception is that after migration
our cache is out of sync until the first successful
kvm_hv_setup_tsc_page() call.

What I was trying to say is that we not only need to write '0' to the
'real' TSC page but also invalidate our internal 'hv->tsc_ref'. Anyway,
I think we're in a violent agreement here, v2 is coming with this change
after I'm done testing it.

Thanks!

> Paolo
>
>> Also, we can't really disable TSC page for nested scenario when guest
>> opted for reenlightenment (PATCH4) but we're not going to update the
>> page anyway so there's not much different.
>> 
>

-- 
Vitaly


^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH 3/4] KVM: x86: hyper-v: Track Hyper-V TSC page status
  2021-03-16 12:24       ` Vitaly Kuznetsov
@ 2021-03-16 15:20         ` Sean Christopherson
  0 siblings, 0 replies; 13+ messages in thread
From: Sean Christopherson @ 2021-03-16 15:20 UTC (permalink / raw)
  To: Vitaly Kuznetsov
  Cc: kvm, Paolo Bonzini, Wanpeng Li, Jim Mattson, Marcelo Tosatti

On Tue, Mar 16, 2021, Vitaly Kuznetsov wrote:
> Vitaly Kuznetsov <vkuznets@redhat.com> writes:
> 
> > Sean Christopherson <seanjc@google.com> writes:
> >
> >> On Mon, Mar 15, 2021, Vitaly Kuznetsov wrote:
> >>> @@ -1193,8 +1199,13 @@ static int kvm_hv_set_msr_pw(struct kvm_vcpu *vcpu, u32 msr, u64 data,
> >>>  	}
> >>>  	case HV_X64_MSR_REFERENCE_TSC:
> >>>  		hv->hv_tsc_page = data;
> >>> -		if (hv->hv_tsc_page & HV_X64_MSR_TSC_REFERENCE_ENABLE)
> >>> +		if (hv->hv_tsc_page & HV_X64_MSR_TSC_REFERENCE_ENABLE) {
> >>> +			if (!host)
> >>> +				hv->hv_tsc_page_status = HV_TSC_PAGE_GUEST_CHANGED;
> >>> +			else
> >>> +				hv->hv_tsc_page_status = HV_TSC_PAGE_HOST_CHANGED;
> >>
> >> Writing the status without taking hv->hv_lock could cause the update to be lost,
> >> e.g. if a different vCPU fails kvm_hv_setup_tsc_page() at the same time, its
> >> write to set status to HV_TSC_PAGE_BROKEN would race with this write.
> >>
> >
> > Oh, right you are, the lock was somewhere in my brain :-) Will do in
> > v2.
> 
> Actually no, kvm_hv_set_msr_pw() is only called from
> kvm_hv_set_msr_common() with hv->hv_lock held so we're already
> synchronized.
> 
> ... and of course I figured that our by putting another
> mutex_lock()/mutex_unlock() here and then wondering why everything hangs
> :-)

Doh, sorry :-/

^ permalink raw reply	[flat|nested] 13+ messages in thread

end of thread, other threads:[~2021-03-16 15:21 UTC | newest]

Thread overview: 13+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-03-15 14:37 [PATCH 0/4] KVM: x86: hyper-v: TSC page fixes Vitaly Kuznetsov
2021-03-15 14:37 ` [PATCH 1/4] KVM: x86: hyper-v: Limit guest to writing zero to HV_X64_MSR_TSC_EMULATION_STATUS Vitaly Kuznetsov
2021-03-15 14:37 ` [PATCH 2/4] KVM: x86: hyper-v: Prevent using not-yet-updated TSC page by secondary CPUs Vitaly Kuznetsov
2021-03-15 15:45   ` Paolo Bonzini
2021-03-15 15:55     ` Vitaly Kuznetsov
2021-03-15 16:23       ` Paolo Bonzini
2021-03-16 12:29         ` Vitaly Kuznetsov
2021-03-15 14:37 ` [PATCH 3/4] KVM: x86: hyper-v: Track Hyper-V TSC page status Vitaly Kuznetsov
2021-03-15 15:15   ` Sean Christopherson
2021-03-15 15:34     ` Vitaly Kuznetsov
2021-03-16 12:24       ` Vitaly Kuznetsov
2021-03-16 15:20         ` Sean Christopherson
2021-03-15 14:37 ` [PATCH 4/4] KVM: x86: hyper-v: Don't touch TSC page values when guest opted for re-enlightenment Vitaly Kuznetsov

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).