linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH 0/1] KVM: correctly restore the TSC value on nested migration
@ 2020-09-17 11:07 Maxim Levitsky
  2020-09-17 11:07 ` [PATCH 1/1] KVM: x86: fix MSR_IA32_TSC read for " Maxim Levitsky
  0 siblings, 1 reply; 7+ messages in thread
From: Maxim Levitsky @ 2020-09-17 11:07 UTC (permalink / raw)
  To: kvm
  Cc: linux-kernel, maintainer:X86 ARCHITECTURE (32-BIT AND 64-BIT),
	Jim Mattson, Sean Christopherson, Borislav Petkov, Joerg Roedel,
	H. Peter Anvin, Paolo Bonzini, Wanpeng Li, Ingo Molnar,
	Thomas Gleixner, Vitaly Kuznetsov, Maxim Levitsky

This patch is a result of a long investigation I made to understand
why the nested migration more often than not makes the nested guest hang.
Sometimes the nested guest recovers and sometimes it hangs forever.

The root cause of this is that reading MSR_IA32_TSC while nested guest is
running returns its TSC value, that is (assuming no tsc scaling)
host tsc + L1 tsc offset + L2 tsc offset.

This is correct but it is a result of a nice curiosity of X86 VMX
(and apparently SVM too, according to my tests) implementation:

As a rule MSR reads done by the guest should either trap to host, or just
return host value, and therefore kvm_get_msr and friends, should basically
always return the L1 value of any msr.

Well, MSR_IA32_TSC is an exception. Intel's PRM states that when you disable
its interception, then in guest mode the host adds the TSC offset to
the read value.

I haven't found anything like that in AMD's PRM but according to the few
tests I made, it behaves the same.

However, there is no such exception when writing MSR_IA32_TSC, and this
poses a problem for nested migration.

When MSR_IA32_TSC is read, we read L2 value (smaller since L2 is started
after L1), and when we restore it after migration, the value is interpreted
as L1 value, thus resulting in huge TSC jump backward which the guest usually
really doesn't like, especially on AMD with APIC deadline timer, which
usually just doesn't fire afterward sending the guest into endless wait for it.

The proposed patch fixes this by making read of MSR_IA32_TSC depend on
'msr_info->host_initiated'

If guest reads the MSR, we add the TSC offset, but when host's qemu reads
the msr we skip that silly emulation of TSC offset, and return the real value
for the L1 guest which is host tsc + L1 offset.

This patch was tested on both SVM and VMX, and on both it fixes hangs.
On VMX since it uses VMX preemption timer for APIC deadline, the guest seems
to recover after a while without that patch.

To make sure that the nested migration happens I usually used
-overcommit cpu_pm=on but I reproduced this with just running an endless loop
in L2.

This is tested both with and without -invtsc,tsc-frequency=...

The migration was done by saving the migration stream to a file, and then
loading the qemu with '-incoming'

Maxim Levitsky (1):
  KVM: x86: fix MSR_IA32_TSC read for nested migration

 arch/x86/kvm/x86.c | 19 ++++++++++++++++++-
 1 file changed, 18 insertions(+), 1 deletion(-)

-- 
2.26.2



^ permalink raw reply	[flat|nested] 7+ messages in thread

* [PATCH 1/1] KVM: x86: fix MSR_IA32_TSC read for nested migration
  2020-09-17 11:07 [PATCH 0/1] KVM: correctly restore the TSC value on nested migration Maxim Levitsky
@ 2020-09-17 11:07 ` Maxim Levitsky
  2020-09-17 16:11   ` Sean Christopherson
  2020-09-17 16:14   ` Sean Christopherson
  0 siblings, 2 replies; 7+ messages in thread
From: Maxim Levitsky @ 2020-09-17 11:07 UTC (permalink / raw)
  To: kvm
  Cc: linux-kernel, maintainer:X86 ARCHITECTURE (32-BIT AND 64-BIT),
	Jim Mattson, Sean Christopherson, Borislav Petkov, Joerg Roedel,
	H. Peter Anvin, Paolo Bonzini, Wanpeng Li, Ingo Molnar,
	Thomas Gleixner, Vitaly Kuznetsov, Maxim Levitsky

MSR reads/writes should always access the L1 state, since the (nested)
hypervisor should intercept all the msrs it wants to adjust, and these
that it doesn't should be read by the guest as if the host had read it.

However IA32_TSC is an exception.Even when not intercepted, guest still
reads the value + TSC offset.
The write however does not take any TSC offset in the account.

This is documented in Intel's PRM and seems also to happen on AMD as well.

This creates a problem when userspace wants to read the IA32_TSC value and then
write it. (e.g for migration)

In this case it reads L2 value but write is interpreted as an L1 value.
To fix this make the userspace initiated reads of IA32_TSC return L1 value
as well.

Huge thanks to Dave Gilbert for helping me understand this very confusing
semantic of MSR writes.

Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com>
---
 arch/x86/kvm/x86.c | 19 ++++++++++++++++++-
 1 file changed, 18 insertions(+), 1 deletion(-)

diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 17f4995e80a7e..d10d5c6add359 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -2025,6 +2025,11 @@ u64 kvm_read_l1_tsc(struct kvm_vcpu *vcpu, u64 host_tsc)
 }
 EXPORT_SYMBOL_GPL(kvm_read_l1_tsc);
 
+static u64 kvm_read_l2_tsc(struct kvm_vcpu *vcpu, u64 host_tsc)
+{
+	return vcpu->arch.tsc_offset + kvm_scale_tsc(vcpu, host_tsc);
+}
+
 static void kvm_vcpu_write_tsc_offset(struct kvm_vcpu *vcpu, u64 offset)
 {
 	vcpu->arch.l1_tsc_offset = offset;
@@ -3220,7 +3225,19 @@ int kvm_get_msr_common(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
 		msr_info->data = vcpu->arch.msr_ia32_power_ctl;
 		break;
 	case MSR_IA32_TSC:
-		msr_info->data = kvm_scale_tsc(vcpu, rdtsc()) + vcpu->arch.tsc_offset;
+		/*
+		 * Intel PRM states that MSR_IA32_TSC read adds the TSC offset
+		 * even when not intercepted. AMD manual doesn't define this
+		 * but appears to behave the same
+		 *
+		 * However when userspace wants to read this MSR, return its
+		 * real L1 value so that its restore will be correct
+		 *
+		 */
+		if (msr_info->host_initiated)
+			msr_info->data = kvm_read_l1_tsc(vcpu, rdtsc());
+		else
+			msr_info->data = kvm_read_l2_tsc(vcpu, rdtsc());
 		break;
 	case MSR_MTRRcap:
 	case 0x200 ... 0x2ff:
-- 
2.26.2


^ permalink raw reply related	[flat|nested] 7+ messages in thread

* Re: [PATCH 1/1] KVM: x86: fix MSR_IA32_TSC read for nested migration
  2020-09-17 11:07 ` [PATCH 1/1] KVM: x86: fix MSR_IA32_TSC read for " Maxim Levitsky
@ 2020-09-17 16:11   ` Sean Christopherson
  2020-09-21  9:25     ` Maxim Levitsky
  2020-09-17 16:14   ` Sean Christopherson
  1 sibling, 1 reply; 7+ messages in thread
From: Sean Christopherson @ 2020-09-17 16:11 UTC (permalink / raw)
  To: Maxim Levitsky
  Cc: kvm, linux-kernel,
	maintainer:X86 ARCHITECTURE (32-BIT AND 64-BIT),
	Jim Mattson, Borislav Petkov, Joerg Roedel, H. Peter Anvin,
	Paolo Bonzini, Wanpeng Li, Ingo Molnar, Thomas Gleixner,
	Vitaly Kuznetsov

On Thu, Sep 17, 2020 at 02:07:23PM +0300, Maxim Levitsky wrote:
> MSR reads/writes should always access the L1 state, since the (nested)
> hypervisor should intercept all the msrs it wants to adjust, and these
> that it doesn't should be read by the guest as if the host had read it.
> 
> However IA32_TSC is an exception.Even when not intercepted, guest still

Missing a space after the period.

> reads the value + TSC offset.
> The write however does not take any TSC offset in the account.

s/in the/into

> This is documented in Intel's PRM and seems also to happen on AMD as well.

Ideally we'd get confirmation from AMD that this is the correct behavior.

> This creates a problem when userspace wants to read the IA32_TSC value and then
> write it. (e.g for migration)
> 
> In this case it reads L2 value but write is interpreted as an L1 value.

It _may_ read the L2 value, e.g. it's not going to read the L2 value if L1
is active.

> To fix this make the userspace initiated reads of IA32_TSC return L1 value
> as well.
> 
> Huge thanks to Dave Gilbert for helping me understand this very confusing
> semantic of MSR writes.
> 
> Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com>
> ---
>  arch/x86/kvm/x86.c | 19 ++++++++++++++++++-
>  1 file changed, 18 insertions(+), 1 deletion(-)
> 
> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> index 17f4995e80a7e..d10d5c6add359 100644
> --- a/arch/x86/kvm/x86.c
> +++ b/arch/x86/kvm/x86.c
> @@ -2025,6 +2025,11 @@ u64 kvm_read_l1_tsc(struct kvm_vcpu *vcpu, u64 host_tsc)
>  }
>  EXPORT_SYMBOL_GPL(kvm_read_l1_tsc);
>  
> +static u64 kvm_read_l2_tsc(struct kvm_vcpu *vcpu, u64 host_tsc)

This is definitely not L2 specific.  I would vote to just omit the helper so
that we don't need to come up with a name that is correct across the board,
e.g. "raw" is also not quite correct.

An alternative would be to do:

	u64 tsc_offset = msr_info->host_initiated ? vcpu->arch.l1_tsc_offset :
						    vcpu->arch.tsc_offset;

	msr_info->data = kvm_scale_tsc(vcpu, rdtsc()) + tsc_offset;

Which I kind of like because the behavioral difference is a bit more obvious.

> +{
> +	return vcpu->arch.tsc_offset + kvm_scale_tsc(vcpu, host_tsc);
> +}
> +
>  static void kvm_vcpu_write_tsc_offset(struct kvm_vcpu *vcpu, u64 offset)
>  {
>  	vcpu->arch.l1_tsc_offset = offset;
> @@ -3220,7 +3225,19 @@ int kvm_get_msr_common(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
>  		msr_info->data = vcpu->arch.msr_ia32_power_ctl;
>  		break;
>  	case MSR_IA32_TSC:
> -		msr_info->data = kvm_scale_tsc(vcpu, rdtsc()) + vcpu->arch.tsc_offset;
> +		/*
> +		 * Intel PRM states that MSR_IA32_TSC read adds the TSC offset
> +		 * even when not intercepted. AMD manual doesn't define this
> +		 * but appears to behave the same
> +		 *
> +		 * However when userspace wants to read this MSR, return its
> +		 * real L1 value so that its restore will be correct
> +		 *

Extra line is unnecessary.

> +		 */
> +		if (msr_info->host_initiated)
> +			msr_info->data = kvm_read_l1_tsc(vcpu, rdtsc());
> +		else
> +			msr_info->data = kvm_read_l2_tsc(vcpu, rdtsc());
>  		break;
>  	case MSR_MTRRcap:
>  	case 0x200 ... 0x2ff:
> -- 
> 2.26.2
> 

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH 1/1] KVM: x86: fix MSR_IA32_TSC read for nested migration
  2020-09-17 11:07 ` [PATCH 1/1] KVM: x86: fix MSR_IA32_TSC read for " Maxim Levitsky
  2020-09-17 16:11   ` Sean Christopherson
@ 2020-09-17 16:14   ` Sean Christopherson
  2020-09-21  9:26     ` Maxim Levitsky
  1 sibling, 1 reply; 7+ messages in thread
From: Sean Christopherson @ 2020-09-17 16:14 UTC (permalink / raw)
  To: Maxim Levitsky
  Cc: kvm, linux-kernel,
	maintainer:X86 ARCHITECTURE (32-BIT AND 64-BIT),
	Jim Mattson, Borislav Petkov, Joerg Roedel, H. Peter Anvin,
	Paolo Bonzini, Wanpeng Li, Ingo Molnar, Thomas Gleixner,
	Vitaly Kuznetsov

On Thu, Sep 17, 2020 at 02:07:23PM +0300, Maxim Levitsky wrote:
> +		 * Intel PRM states that MSR_IA32_TSC read adds the TSC offset

One more nit, "Intel SDM" would be preferred as that's most commonly used in
KVM changelogs, and there are multiple PRM acronyms in Intel's dictionary
these days.

> +		 * even when not intercepted. AMD manual doesn't define this
> +		 * but appears to behave the same
> +		 *
> +		 * However when userspace wants to read this MSR, return its
> +		 * real L1 value so that its restore will be correct
> +		 *
> +		 */
> +		if (msr_info->host_initiated)
> +			msr_info->data = kvm_read_l1_tsc(vcpu, rdtsc());
> +		else
> +			msr_info->data = kvm_read_l2_tsc(vcpu, rdtsc());
>  		break;
>  	case MSR_MTRRcap:
>  	case 0x200 ... 0x2ff:
> -- 
> 2.26.2
> 

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH 1/1] KVM: x86: fix MSR_IA32_TSC read for nested migration
  2020-09-17 16:11   ` Sean Christopherson
@ 2020-09-21  9:25     ` Maxim Levitsky
  2020-09-21  9:39       ` Maxim Levitsky
  0 siblings, 1 reply; 7+ messages in thread
From: Maxim Levitsky @ 2020-09-21  9:25 UTC (permalink / raw)
  To: Sean Christopherson
  Cc: kvm, linux-kernel,
	maintainer:X86 ARCHITECTURE (32-BIT AND 64-BIT),
	Jim Mattson, Borislav Petkov, Joerg Roedel, H. Peter Anvin,
	Paolo Bonzini, Wanpeng Li, Ingo Molnar, Thomas Gleixner,
	Vitaly Kuznetsov

On Thu, 2020-09-17 at 09:11 -0700, Sean Christopherson wrote:
> On Thu, Sep 17, 2020 at 02:07:23PM +0300, Maxim Levitsky wrote:
> > MSR reads/writes should always access the L1 state, since the (nested)
> > hypervisor should intercept all the msrs it wants to adjust, and these
> > that it doesn't should be read by the guest as if the host had read it.
> > 
> > However IA32_TSC is an exception.Even when not intercepted, guest still
> 
> Missing a space after the period.
Fixed
> 
> > reads the value + TSC offset.
> > The write however does not take any TSC offset in the account.
> 
> s/in the/into
Fixed.
> 
> > This is documented in Intel's PRM and seems also to happen on AMD as well.
> 
> Ideally we'd get confirmation from AMD that this is the correct behavior.
It would be great. This isn't a blocker for this patch however since I didn't
change the current emulation behavier which already assumes this.
Also we don't really trap TSC reads, so this code isn't really executed.

(I haven't checked what corner cases when we do this. It can happen in theory,
if MSR read is done from the emulator or something like that).

> 
> > This creates a problem when userspace wants to read the IA32_TSC value and then
> > write it. (e.g for migration)
> > 
> > In this case it reads L2 value but write is interpreted as an L1 value.
> 
> It _may_ read the L2 value, e.g. it's not going to read the L2 value if L1
> is active.

I didn't thought about this this way. I guess I always thought that L2 is,
L2 if L2 is running, otherwise L1, but now I understand what you mean,
and I agree.

> 
> > To fix this make the userspace initiated reads of IA32_TSC return L1 value
> > as well.
> > 
> > Huge thanks to Dave Gilbert for helping me understand this very confusing
> > semantic of MSR writes.
> > 
> > Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com>
> > ---
> >  arch/x86/kvm/x86.c | 19 ++++++++++++++++++-
> >  1 file changed, 18 insertions(+), 1 deletion(-)
> > 
> > diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> > index 17f4995e80a7e..d10d5c6add359 100644
> > --- a/arch/x86/kvm/x86.c
> > +++ b/arch/x86/kvm/x86.c
> > @@ -2025,6 +2025,11 @@ u64 kvm_read_l1_tsc(struct kvm_vcpu *vcpu, u64 host_tsc)
> >  }
> >  EXPORT_SYMBOL_GPL(kvm_read_l1_tsc);
> >  
> > +static u64 kvm_read_l2_tsc(struct kvm_vcpu *vcpu, u64 host_tsc)
> 
> This is definitely not L2 specific.  I would vote to just omit the helper so
> that we don't need to come up with a name that is correct across the board,
> e.g. "raw" is also not quite correct.
Yes, now I see this.

> 
> An alternative would be to do:
> 
> 	u64 tsc_offset = msr_info->host_initiated ? vcpu->arch.l1_tsc_offset :
> 						    vcpu->arch.tsc_offset;
> 
> 	msr_info->data = kvm_scale_tsc(vcpu, rdtsc()) + tsc_offset;
> 
> Which I kind of like because the behavioral difference is a bit more obvious.
Yep did that. The onl minor downside is that I need a C scope in the switch block.
I can add kvm_read_tsc but I think that this is not worth it.

> 
> > +{
> > +	return vcpu->arch.tsc_offset + kvm_scale_tsc(vcpu, host_tsc);
> > +}
> > +
> >  static void kvm_vcpu_write_tsc_offset(struct kvm_vcpu *vcpu, u64 offset)
> >  {
> >  	vcpu->arch.l1_tsc_offset = offset;
> > @@ -3220,7 +3225,19 @@ int kvm_get_msr_common(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
> >  		msr_info->data = vcpu->arch.msr_ia32_power_ctl;
> >  		break;
> >  	case MSR_IA32_TSC:
> > -		msr_info->data = kvm_scale_tsc(vcpu, rdtsc()) + vcpu->arch.tsc_offset;
> > +		/*
> > +		 * Intel PRM states that MSR_IA32_TSC read adds the TSC offset
> > +		 * even when not intercepted. AMD manual doesn't define this
> > +		 * but appears to behave the same
> > +		 *
> > +		 * However when userspace wants to read this MSR, return its
> > +		 * real L1 value so that its restore will be correct
> > +		 *
> 
> Extra line is unnecessary.
This is a bit of my OCD :-) I don't mind to remove it.
> 
> > +		 */
> > +		if (msr_info->host_initiated)
> > +			msr_info->data = kvm_read_l1_tsc(vcpu, rdtsc());
> > +		else
> > +			msr_info->data = kvm_read_l2_tsc(vcpu, rdtsc());
> >  		break;
> >  	case MSR_MTRRcap:
> >  	case 0x200 ... 0x2ff:
> > -- 
> > 2.26.2
> > 

Thanks for the review, the V2 is on the way.
Best regards,
	Maxim Levitsky



^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH 1/1] KVM: x86: fix MSR_IA32_TSC read for nested migration
  2020-09-17 16:14   ` Sean Christopherson
@ 2020-09-21  9:26     ` Maxim Levitsky
  0 siblings, 0 replies; 7+ messages in thread
From: Maxim Levitsky @ 2020-09-21  9:26 UTC (permalink / raw)
  To: Sean Christopherson
  Cc: kvm, linux-kernel,
	maintainer:X86 ARCHITECTURE (32-BIT AND 64-BIT),
	Jim Mattson, Borislav Petkov, Joerg Roedel, H. Peter Anvin,
	Paolo Bonzini, Wanpeng Li, Ingo Molnar, Thomas Gleixner,
	Vitaly Kuznetsov

On Thu, 2020-09-17 at 09:14 -0700, Sean Christopherson wrote:
> On Thu, Sep 17, 2020 at 02:07:23PM +0300, Maxim Levitsky wrote:
> > +		 * Intel PRM states that MSR_IA32_TSC read adds the TSC offset
> 
> One more nit, "Intel SDM" would be preferred as that's most commonly used in
> KVM changelogs, and there are multiple PRM acronyms in Intel's dictionary
> these days.
Fixed.

Best regards,
	Maxim Levitsky

> 
> > +		 * even when not intercepted. AMD manual doesn't define this
> > +		 * but appears to behave the same
> > +		 *
> > +		 * However when userspace wants to read this MSR, return its
> > +		 * real L1 value so that its restore will be correct
> > +		 *
> > +		 */
> > +		if (msr_info->host_initiated)
> > +			msr_info->data = kvm_read_l1_tsc(vcpu, rdtsc());
> > +		else
> > +			msr_info->data = kvm_read_l2_tsc(vcpu, rdtsc());
> >  		break;
> >  	case MSR_MTRRcap:
> >  	case 0x200 ... 0x2ff:
> > -- 
> > 2.26.2
> > 



^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH 1/1] KVM: x86: fix MSR_IA32_TSC read for nested migration
  2020-09-21  9:25     ` Maxim Levitsky
@ 2020-09-21  9:39       ` Maxim Levitsky
  0 siblings, 0 replies; 7+ messages in thread
From: Maxim Levitsky @ 2020-09-21  9:39 UTC (permalink / raw)
  To: Sean Christopherson
  Cc: kvm, linux-kernel,
	maintainer:X86 ARCHITECTURE (32-BIT AND 64-BIT),
	Jim Mattson, Borislav Petkov, Joerg Roedel, H. Peter Anvin,
	Paolo Bonzini, Wanpeng Li, Ingo Molnar, Thomas Gleixner,
	Vitaly Kuznetsov

On Mon, 2020-09-21 at 12:25 +0300, Maxim Levitsky wrote:
> On Thu, 2020-09-17 at 09:11 -0700, Sean Christopherson wrote:
> > On Thu, Sep 17, 2020 at 02:07:23PM +0300, Maxim Levitsky wrote:
> > > MSR reads/writes should always access the L1 state, since the (nested)
> > > hypervisor should intercept all the msrs it wants to adjust, and these
> > > that it doesn't should be read by the guest as if the host had read it.
> > > 
> > > However IA32_TSC is an exception.Even when not intercepted, guest still
> > 
> > Missing a space after the period.
> Fixed
> > > reads the value + TSC offset.
> > > The write however does not take any TSC offset in the account.
> > 
> > s/in the/into
> Fixed.
> > > This is documented in Intel's PRM and seems also to happen on AMD as well.
> > 
> > Ideally we'd get confirmation from AMD that this is the correct behavior.
> It would be great. This isn't a blocker for this patch however since I didn't
> change the current emulation behavier which already assumes this.
> Also we don't really trap TSC reads, so this code isn't really executed.

I did now find out an explict mention that AMD's TSC scaling affects the MSR reads,
and while this doesn't expictily mention the offset, this does give more ground
to the assumption that the offset is added as well.

"Writing to the TSC Ratio MSR allows the hypervisor to control the guest's view of the Time Stamp
Counter. The contents of TSC Ratio MSR sets the value of the TSCRatio. This constant scales the
timestamp value returned when the TSC is read by a guest via the RDTSC or RDTSCP instructions or
when the TSC, MPERF, or MPerfReadOnly MSRs are read via the RDMSR instruction by a guest
running under virtualization."

Best regards,
	Maxim Levitsky

> 
> (I haven't checked what corner cases when we do this. It can happen in theory,
> if MSR read is done from the emulator or something like that).
> 
> > > This creates a problem when userspace wants to read the IA32_TSC value and then
> > > write it. (e.g for migration)
> > > 
> > > In this case it reads L2 value but write is interpreted as an L1 value.
> > 
> > It _may_ read the L2 value, e.g. it's not going to read the L2 value if L1
> > is active.
> 
> I didn't thought about this this way. I guess I always thought that L2 is,
> L2 if L2 is running, otherwise L1, but now I understand what you mean,
> and I agree.
> 
> > > To fix this make the userspace initiated reads of IA32_TSC return L1 value
> > > as well.
> > > 
> > > Huge thanks to Dave Gilbert for helping me understand this very confusing
> > > semantic of MSR writes.
> > > 
> > > Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com>
> > > ---
> > >  arch/x86/kvm/x86.c | 19 ++++++++++++++++++-
> > >  1 file changed, 18 insertions(+), 1 deletion(-)
> > > 
> > > diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> > > index 17f4995e80a7e..d10d5c6add359 100644
> > > --- a/arch/x86/kvm/x86.c
> > > +++ b/arch/x86/kvm/x86.c
> > > @@ -2025,6 +2025,11 @@ u64 kvm_read_l1_tsc(struct kvm_vcpu *vcpu, u64 host_tsc)
> > >  }
> > >  EXPORT_SYMBOL_GPL(kvm_read_l1_tsc);
> > >  
> > > +static u64 kvm_read_l2_tsc(struct kvm_vcpu *vcpu, u64 host_tsc)
> > 
> > This is definitely not L2 specific.  I would vote to just omit the helper so
> > that we don't need to come up with a name that is correct across the board,
> > e.g. "raw" is also not quite correct.
> Yes, now I see this.
> 
> > An alternative would be to do:
> > 
> > 	u64 tsc_offset = msr_info->host_initiated ? vcpu->arch.l1_tsc_offset :
> > 						    vcpu->arch.tsc_offset;
> > 
> > 	msr_info->data = kvm_scale_tsc(vcpu, rdtsc()) + tsc_offset;
> > 
> > Which I kind of like because the behavioral difference is a bit more obvious.
> Yep did that. The onl minor downside is that I need a C scope in the switch block.
> I can add kvm_read_tsc but I think that this is not worth it.
> 
> > > +{
> > > +	return vcpu->arch.tsc_offset + kvm_scale_tsc(vcpu, host_tsc);
> > > +}
> > > +
> > >  static void kvm_vcpu_write_tsc_offset(struct kvm_vcpu *vcpu, u64 offset)
> > >  {
> > >  	vcpu->arch.l1_tsc_offset = offset;
> > > @@ -3220,7 +3225,19 @@ int kvm_get_msr_common(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
> > >  		msr_info->data = vcpu->arch.msr_ia32_power_ctl;
> > >  		break;
> > >  	case MSR_IA32_TSC:
> > > -		msr_info->data = kvm_scale_tsc(vcpu, rdtsc()) + vcpu->arch.tsc_offset;
> > > +		/*
> > > +		 * Intel PRM states that MSR_IA32_TSC read adds the TSC offset
> > > +		 * even when not intercepted. AMD manual doesn't define this
> > > +		 * but appears to behave the same
> > > +		 *
> > > +		 * However when userspace wants to read this MSR, return its
> > > +		 * real L1 value so that its restore will be correct
> > > +		 *
> > 
> > Extra line is unnecessary.
> This is a bit of my OCD :-) I don't mind to remove it.
> > > +		 */
> > > +		if (msr_info->host_initiated)
> > > +			msr_info->data = kvm_read_l1_tsc(vcpu, rdtsc());
> > > +		else
> > > +			msr_info->data = kvm_read_l2_tsc(vcpu, rdtsc());
> > >  		break;
> > >  	case MSR_MTRRcap:
> > >  	case 0x200 ... 0x2ff:
> > > -- 
> > > 2.26.2
> > > 
> 
> Thanks for the review, the V2 is on the way.
> Best regards,
> 	Maxim Levitsky
> 



^ permalink raw reply	[flat|nested] 7+ messages in thread

end of thread, other threads:[~2020-09-21  9:40 UTC | newest]

Thread overview: 7+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-09-17 11:07 [PATCH 0/1] KVM: correctly restore the TSC value on nested migration Maxim Levitsky
2020-09-17 11:07 ` [PATCH 1/1] KVM: x86: fix MSR_IA32_TSC read for " Maxim Levitsky
2020-09-17 16:11   ` Sean Christopherson
2020-09-21  9:25     ` Maxim Levitsky
2020-09-21  9:39       ` Maxim Levitsky
2020-09-17 16:14   ` Sean Christopherson
2020-09-21  9:26     ` Maxim Levitsky

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).