linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Maxim Levitsky <mlevitsk@redhat.com>
To: Paolo Bonzini <pbonzini@redhat.com>,
	Thomas Gleixner <tglx@linutronix.de>,
	Oliver Upton <oupton@google.com>
Cc: kvm list <kvm@vger.kernel.org>, "H. Peter Anvin" <hpa@zytor.com>,
	Jonathan Corbet <corbet@lwn.net>,
	Jim Mattson <jmattson@google.com>,
	Wanpeng Li <wanpengli@tencent.com>,
	"open list:KERNEL SELFTEST FRAMEWORK" 
	<linux-kselftest@vger.kernel.org>,
	Vitaly Kuznetsov <vkuznets@redhat.com>,
	Marcelo Tosatti <mtosatti@redhat.com>,
	Sean Christopherson <sean.j.christopherson@intel.com>,
	open list <linux-kernel@vger.kernel.org>,
	Ingo Molnar <mingo@redhat.com>,
	"maintainer:X86 ARCHITECTURE (32-BIT AND 64-BIT)"
	<x86@kernel.org>, Joerg Roedel <joro@8bytes.org>,
	Borislav Petkov <bp@alien8.de>, Shuah Khan <shuah@kernel.org>,
	Andrew Jones <drjones@redhat.com>,
	"open list:DOCUMENTATION" <linux-doc@vger.kernel.org>
Subject: Re: [PATCH v2 1/3] KVM: x86: implement KVM_{GET|SET}_TSC_STATE
Date: Thu, 10 Dec 2020 16:25:57 +0200	[thread overview]
Message-ID: <7846c5452644a3c029d27361759cb82c2b8d4f2e.camel@redhat.com> (raw)
In-Reply-To: <7c25e8c0-a7d4-8906-ae47-20714e6699fe@redhat.com>

On Thu, 2020-12-10 at 12:48 +0100, Paolo Bonzini wrote:
> On 08/12/20 18:08, Maxim Levitsky wrote:
> > > Even if you support TSCADJUST and let the guest write to it does not
> > > change the per guest offset at all. TSCADJUST is per [v]CPU and adds on
> > > top:
> > > 
> > >      tscvcpu = tsc_host + guest_offset + TSC_ADJUST
> > > 
> > > Scaling is just orthogonal and does not change any of this.
> > 
> > I agree with this, and I think that this is what we will end up doing.
> > Paulo, what do you think about this?
> 
> Yes, you can have a VM ioctl that saves/restores cur_tsc_nsec and 
> cur_tsc_write.  The restore side would loop on all vcpus.

Why would the restore need to run on all VCPUs though?

The picture that I have in mind is that we we will restore the 
cur_tsc_nsec/cur_tsc_write pair once per VM, and then restore the 
TSC_ADJUST value on all vCPUs as if the guest wrote it (with the quirk disabled), 
which will restore all the horrible things that the guest did to its TSC.

Since TSC adjust is enabled by default, if the user disables it in the CPUID, 
we might as well just disable the whole thing, 
make TSC readonly or something similar.

This way we don't need to worry about TSC writes as 
those will be reflected in the TSC_ADJUST.

> 
> However, it is not so easy: 1) it would have to be usable only if 
> KVM_X86_QUIRK_TSC_HOST_ACCESS is false, 2) it would fail if 
> kvm->arch.nr_vcpus_matched_tsc == kvm->online_vcpus (which basically 
> means that userspace didn't mess up the TSC configuration).  If not, it 
> would return -EINVAL.

Note though that we don't track the guest tsc/tsc_adjust writes anymore
via the tsc sync code, and with the quirk disabled we don't track them
even for the host writes, thus the (2) condition will always be true
(the only call left to kvm_synchronize_tsc is in the kvm_arch_vcpu_postcreate)

However I indeed see no reason to allow new per VM API when the masterclock is
disabled, and therefore using cur_tsc_nsec/cur_tsc_write is reasonable.

The cur_tsc_nsec should IMHO be converted to absolute time (CLOCK_REALTIME
or similiar) or should the conversion be done in the userspace? 
I don't know yet if I can convert between different POSIX clocks
in race/error free manner. (e.g get the offset between them).


> 
> Also, while at it let's burn and pour salt on the support for 
> KVM_SET_TSC_KHZ unless TSC scaling is supported, together with 
> vcpu->tsc_catchup and all the "tolerance" crap that is in 
> kvm_set_tsc_khz.  And initialize vcpu->arch.virtual_tsc_khz to 
> kvm->arch.last_tsc_khz before calling kvm_synchronize_tsc.

The tsc_catchup is enabled when host TSC is unstable, so that guest
TSC at least roughtly follows real time (which host kernel gets
by other means).

We push the guest tsc forward roughtly each time VM entry from userspace
happens:

(On each kvm_arch_vcpu_load, we raise KVM_REQ_GLOBAL_CLOCK_UPDATE
request which (with small delay) makes all vcpus go through 
kvm_guest_time_update which pushes the guest tsc forward)

However we also have the 'tsc_always_catchup' mode which is indeed 
something I would like to remove.

It is a poor man simulation of the TSC scaling which is enabled 
when the host doesn't support TSC scaling, but we were asked 
to run at frequency faster than host TSC frequency is.

This way guest tsc runs slower than it should, but on each VM exit,
we punt the guest tsc offset forward so on average the guest tsc
doesn't lag behind.

Unless backward compatibility is an issue, I have no objections
to remove that code.

The tolerance thing might be needed. On many systems 
(including mine new 3970X), the hardware doesn't have means to report 
the unscaled host TSC frequency, thus the kernel has to 
measure it, and since the measurement is not 100% accurate, a slightly
different value is used every time the host boots.

Thus without a small tolerance, we will always have to use TSC scaling,
while migrating even between two identical systems.
I don't know if this is an issue.


Best regards,
	Maxim Levitsky


> 
> Paolo
> 



  reply	other threads:[~2020-12-10 20:19 UTC|newest]

Thread overview: 77+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2020-12-03 17:11 [PATCH v2 0/3] RFC: Precise TSC migration Maxim Levitsky
2020-12-03 17:11 ` [PATCH v2 1/3] KVM: x86: implement KVM_{GET|SET}_TSC_STATE Maxim Levitsky
2020-12-06 16:19   ` Thomas Gleixner
2020-12-07 12:16     ` Maxim Levitsky
2020-12-07 13:16       ` Vitaly Kuznetsov
2020-12-07 17:41         ` Thomas Gleixner
2020-12-08  9:48           ` Peter Zijlstra
2020-12-10 11:42           ` Paolo Bonzini
2020-12-10 12:14             ` Peter Zijlstra
2020-12-10 12:22               ` Paolo Bonzini
2020-12-10 13:01                 ` Peter Zijlstra
2020-12-10 20:20                   ` Thomas Gleixner
2020-12-07 16:38       ` Thomas Gleixner
2020-12-07 16:53         ` Andy Lutomirski
2020-12-07 17:00           ` Maxim Levitsky
2020-12-07 18:04             ` Andy Lutomirski
2020-12-07 23:11               ` Marcelo Tosatti
2020-12-08 17:43                 ` Andy Lutomirski
2020-12-08 19:24                   ` Thomas Gleixner
2020-12-08 20:32                     ` Andy Lutomirski
2020-12-09  0:19                       ` Thomas Gleixner
2020-12-09  4:08                         ` Andy Lutomirski
2020-12-09 10:14                           ` Thomas Gleixner
2020-12-10 23:42                             ` Andy Lutomirski
2020-12-08 11:24               ` Maxim Levitsky
2020-12-08  9:35         ` Peter Zijlstra
2020-12-07 23:34     ` Marcelo Tosatti
2020-12-07 17:29   ` Oliver Upton
2020-12-08 11:13     ` Maxim Levitsky
2020-12-08 15:57       ` Oliver Upton
2020-12-08 15:58         ` Oliver Upton
2020-12-08 17:10           ` Maxim Levitsky
2020-12-08 16:40       ` Thomas Gleixner
2020-12-08 17:08         ` Maxim Levitsky
2020-12-10 11:48           ` Paolo Bonzini
2020-12-10 14:25             ` Maxim Levitsky [this message]
2020-12-07 23:29   ` Marcelo Tosatti
2020-12-08 14:50     ` Maxim Levitsky
2020-12-08 16:02       ` Thomas Gleixner
2020-12-08 16:25         ` Maxim Levitsky
2020-12-08 17:33           ` Andy Lutomirski
2020-12-08 21:25             ` Thomas Gleixner
2020-12-08 18:12           ` Marcelo Tosatti
2020-12-08 21:35             ` Thomas Gleixner
2020-12-08 21:20           ` Thomas Gleixner
2020-12-10 11:48             ` Paolo Bonzini
2020-12-10 14:52               ` Maxim Levitsky
2020-12-10 15:16                 ` Andy Lutomirski
2020-12-10 17:59                   ` Oliver Upton
2020-12-10 18:05                     ` Paolo Bonzini
2020-12-10 18:13                       ` Oliver Upton
2020-12-10 21:25                   ` Thomas Gleixner
2020-12-10 22:01                     ` Andy Lutomirski
2020-12-10 22:28                       ` Thomas Gleixner
2020-12-10 23:19                         ` Andy Lutomirski
2020-12-11  0:03                           ` Thomas Gleixner
2020-12-08 18:11         ` Marcelo Tosatti
2020-12-08 21:33           ` Thomas Gleixner
2020-12-09 16:34             ` Marcelo Tosatti
2020-12-09 20:58               ` Thomas Gleixner
2020-12-10 15:26                 ` Marcelo Tosatti
2020-12-10 21:48                   ` Thomas Gleixner
2020-12-11  0:27                     ` Marcelo Tosatti
2020-12-11 13:30                       ` Thomas Gleixner
2020-12-11 14:18                         ` Marcelo Tosatti
2020-12-11 21:04                           ` Thomas Gleixner
2020-12-11 21:59                             ` Paolo Bonzini
2020-12-12 13:03                               ` Thomas Gleixner
2020-12-15 10:59                               ` Marcelo Tosatti
2020-12-15 16:55                                 ` Andy Lutomirski
2020-12-15 22:34                                 ` Thomas Gleixner
2020-12-11 13:37                       ` Paolo Bonzini
2020-12-08 17:35       ` Marcelo Tosatti
2020-12-03 17:11 ` [PATCH v2 2/3] KVM: x86: introduce KVM_X86_QUIRK_TSC_HOST_ACCESS Maxim Levitsky
2020-12-03 17:11 ` [PATCH v2 3/3] kvm/selftests: update tsc_msrs_test to cover KVM_X86_QUIRK_TSC_HOST_ACCESS Maxim Levitsky
2020-12-07 23:16 ` [PATCH v2 0/3] RFC: Precise TSC migration Marcelo Tosatti
2020-12-10 11:48 ` Paolo Bonzini

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=7846c5452644a3c029d27361759cb82c2b8d4f2e.camel@redhat.com \
    --to=mlevitsk@redhat.com \
    --cc=bp@alien8.de \
    --cc=corbet@lwn.net \
    --cc=drjones@redhat.com \
    --cc=hpa@zytor.com \
    --cc=jmattson@google.com \
    --cc=joro@8bytes.org \
    --cc=kvm@vger.kernel.org \
    --cc=linux-doc@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-kselftest@vger.kernel.org \
    --cc=mingo@redhat.com \
    --cc=mtosatti@redhat.com \
    --cc=oupton@google.com \
    --cc=pbonzini@redhat.com \
    --cc=sean.j.christopherson@intel.com \
    --cc=shuah@kernel.org \
    --cc=tglx@linutronix.de \
    --cc=vkuznets@redhat.com \
    --cc=wanpengli@tencent.com \
    --cc=x86@kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).