kvm.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Oliver Upton <oupton@google.com>
To: Andy Lutomirski <luto@amacapital.net>
Cc: Maxim Levitsky <mlevitsk@redhat.com>,
	Paolo Bonzini <pbonzini@redhat.com>,
	Thomas Gleixner <tglx@linutronix.de>,
	Marcelo Tosatti <mtosatti@redhat.com>,
	kvm list <kvm@vger.kernel.org>, "H. Peter Anvin" <hpa@zytor.com>,
	Jonathan Corbet <corbet@lwn.net>,
	Jim Mattson <jmattson@google.com>,
	Wanpeng Li <wanpengli@tencent.com>,
	"open list:KERNEL SELFTEST FRAMEWORK" 
	<linux-kselftest@vger.kernel.org>,
	Vitaly Kuznetsov <vkuznets@redhat.com>,
	Sean Christopherson <sean.j.christopherson@intel.com>,
	open list <linux-kernel@vger.kernel.org>,
	Ingo Molnar <mingo@redhat.com>,
	"maintainer:X86 ARCHITECTURE (32-BIT AND 64-BIT)"
	<x86@kernel.org>, Joerg Roedel <joro@8bytes.org>,
	Borislav Petkov <bp@alien8.de>, Shuah Khan <shuah@kernel.org>,
	Andrew Jones <drjones@redhat.com>,
	"open list:DOCUMENTATION" <linux-doc@vger.kernel.org>
Subject: Re: [PATCH v2 1/3] KVM: x86: implement KVM_{GET|SET}_TSC_STATE
Date: Thu, 10 Dec 2020 11:59:44 -0600	[thread overview]
Message-ID: <CAOQ_QshW0UvwSS3TUCK5PxkLQhHTqDNXNeMxwVDyf+DXc23fXQ@mail.gmail.com> (raw)
In-Reply-To: <E4F263BE-6CAA-4152-8818-187D34D8D0FD@amacapital.net>

On Thu, Dec 10, 2020 at 9:16 AM Andy Lutomirski <luto@amacapital.net> wrote:
>
>
>
> > On Dec 10, 2020, at 6:52 AM, Maxim Levitsky <mlevitsk@redhat.com> wrote:
> >
> > On Thu, 2020-12-10 at 12:48 +0100, Paolo Bonzini wrote:
> >>> On 08/12/20 22:20, Thomas Gleixner wrote:
> >>> So now life migration comes a long time after timekeeping had set the
> >>> limits and just because it's virt it expects that everything works and it
> >>> just can ignore these limits.
> >>>
> >>> TBH. That's not any different than SMM or hard/firmware taking the
> >>> machine out for lunch. It's exactly the same: It's broken.
> >>
> >> I agree.  If *live* migration stops the VM for 200 seconds, it's broken.
> >>
> >> Sure, there's the case of snapshotting the VM over the weekend.  My
> >> favorite solution would be to just put it in S3 before doing that.  *Do
> >> what bare metal does* and you can't go that wrong.
> >
> > Note though that qemu has a couple of issues with s3, and it is disabled
> > by default in libvirt.
> > I would be very happy to work on improving this if there is a need for that.
>
> There’s also the case where someone has a VM running on a laptop and someone closes the lid. The host QEMU might not have a chance to convince the guest to enter S3.
>
> >
> >
> >>
> >> In general it's userspace policy whether to keep the TSC value the same
> >> across live migration.  There's pros and cons to both approaches, so KVM
> >> should provide the functionality to keep the TSC running (which the
> >> guest will see as a very long, but not extreme SMI), and this is what
> >> this series does.  Maxim will change it to operate per-VM.  Thanks
> >> Thomas, Oliver and everyone else for the input.

So, to be clear, this per-VM ioctl will work something like the following:

static u64 kvm_read_tsc_base(struct kvm *kvm, u64 host_tsc)
{
        return kvm_scale_tsc(kvm, host_tsc) + kvm->host_tsc_offset;
}

case KVM_GET_TSC_BASE:
        struct kvm_tsc_base base = {
                .flags = KVM_TSC_BASE_TIMESTAMP_VALID;
        };
        u64 host_tsc;

        kvm_get_walltime(&base.nsec, &host_tsc);
        base.tsc = kvm_read_tsc_base(kvm, host_tsc);

        copy_to_user(...);

[...]

case KVM_SET_TSC_BASE:
        struct kvm_tsc_base base;
        u64 host_tsc, nsec;
        s64 delta = 0;

        copy_from_user(...);

        kvm_get_walltime(&nsec, &host_tsc);
        delta += base.tsc - kvm_read_tsc_base(kvm, host_tsc);

        if (base.flags & KVM_TSC_BASE_TIMESTAMP_VALID) {
                u64 delta_nsec = nsec - base.nsec;

                if (delta_nsec > 0)
                        delta += nsec_to_cycles(kvm, diff);
                else
                        delta -= nsec_to_cycles(kvm, -diff);
        }

        kvm->host_tsc_offset += delta;
        /* plumb host_tsc_offset through to each vcpu */

However, I don't believe we can assume the guest's TSCs to be synchronized,
even if sane guests will never touch them. In this case, I think a per-vCPU
ioctl is still warranted, allowing userspace to get at the guest CPU adjust
component of Thomas' equation below (paraphrased):

        TSC guest CPU = host tsc base + guest base offset + guest CPU adjust

Alternatively, a write from userspace to the guest's IA32_TSC_ADJUST with
KVM_X86_QUIRK_TSC_HOST_ACCESS could have the same effect, but that seems to be
problematic for a couple reasons. First, depending on the guest's CPUID the
TSC_ADJUST MSR may not even be available, meaning that the guest could've used
IA32_TSC to adjust the TSC (eww). Second, userspace replaying writes to IA32_TSC
(in the case IA32_TSC_ADJUST doesn't exist for the guest) seems _very_
unlikely to work given all the magic handling that KVM does for
writes to it.

Is this roughly where we are or have I entirely missed the mark? :-)

--
Thanks,
Oliver

> >
> > I agree with that.
> >
> > I still think though that we should have a discussion on feasibility
> > of making the kernel time code deal with large *forward* tsc jumps
> > without crashing.
> >
> > If that is indeed hard to do, or will cause performance issues,
> > then I agree that we might indeed inform the guest of time jumps instead.
> >
>
> Tglx, even without fancy shared host/guest timekeeping, count the guest kernel manage to update its timekeeping if the host sent the guest an interrupt or NMI on all CPUs synchronously on resume?
>
> Alternatively, if we had the explicit “max TSC value that makes sense right now” in the timekeeping data, the guest would reliably notice the large jump and could at least do something intelligent about it instead of overflowing its internal calculation.

  reply	other threads:[~2020-12-10 18:02 UTC|newest]

Thread overview: 77+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2020-12-03 17:11 [PATCH v2 0/3] RFC: Precise TSC migration Maxim Levitsky
2020-12-03 17:11 ` [PATCH v2 1/3] KVM: x86: implement KVM_{GET|SET}_TSC_STATE Maxim Levitsky
2020-12-06 16:19   ` Thomas Gleixner
2020-12-07 12:16     ` Maxim Levitsky
2020-12-07 13:16       ` Vitaly Kuznetsov
2020-12-07 17:41         ` Thomas Gleixner
2020-12-08  9:48           ` Peter Zijlstra
2020-12-10 11:42           ` Paolo Bonzini
2020-12-10 12:14             ` Peter Zijlstra
2020-12-10 12:22               ` Paolo Bonzini
2020-12-10 13:01                 ` Peter Zijlstra
2020-12-10 20:20                   ` Thomas Gleixner
2020-12-07 16:38       ` Thomas Gleixner
2020-12-07 16:53         ` Andy Lutomirski
2020-12-07 17:00           ` Maxim Levitsky
2020-12-07 18:04             ` Andy Lutomirski
2020-12-07 23:11               ` Marcelo Tosatti
2020-12-08 17:43                 ` Andy Lutomirski
2020-12-08 19:24                   ` Thomas Gleixner
2020-12-08 20:32                     ` Andy Lutomirski
2020-12-09  0:19                       ` Thomas Gleixner
2020-12-09  4:08                         ` Andy Lutomirski
2020-12-09 10:14                           ` Thomas Gleixner
2020-12-10 23:42                             ` Andy Lutomirski
2020-12-08 11:24               ` Maxim Levitsky
2020-12-08  9:35         ` Peter Zijlstra
2020-12-07 23:34     ` Marcelo Tosatti
2020-12-07 17:29   ` Oliver Upton
2020-12-08 11:13     ` Maxim Levitsky
2020-12-08 15:57       ` Oliver Upton
2020-12-08 15:58         ` Oliver Upton
2020-12-08 17:10           ` Maxim Levitsky
2020-12-08 16:40       ` Thomas Gleixner
2020-12-08 17:08         ` Maxim Levitsky
2020-12-10 11:48           ` Paolo Bonzini
2020-12-10 14:25             ` Maxim Levitsky
2020-12-07 23:29   ` Marcelo Tosatti
2020-12-08 14:50     ` Maxim Levitsky
2020-12-08 16:02       ` Thomas Gleixner
2020-12-08 16:25         ` Maxim Levitsky
2020-12-08 17:33           ` Andy Lutomirski
2020-12-08 21:25             ` Thomas Gleixner
2020-12-08 18:12           ` Marcelo Tosatti
2020-12-08 21:35             ` Thomas Gleixner
2020-12-08 21:20           ` Thomas Gleixner
2020-12-10 11:48             ` Paolo Bonzini
2020-12-10 14:52               ` Maxim Levitsky
2020-12-10 15:16                 ` Andy Lutomirski
2020-12-10 17:59                   ` Oliver Upton [this message]
2020-12-10 18:05                     ` Paolo Bonzini
2020-12-10 18:13                       ` Oliver Upton
2020-12-10 21:25                   ` Thomas Gleixner
2020-12-10 22:01                     ` Andy Lutomirski
2020-12-10 22:28                       ` Thomas Gleixner
2020-12-10 23:19                         ` Andy Lutomirski
2020-12-11  0:03                           ` Thomas Gleixner
2020-12-08 18:11         ` Marcelo Tosatti
2020-12-08 21:33           ` Thomas Gleixner
2020-12-09 16:34             ` Marcelo Tosatti
2020-12-09 20:58               ` Thomas Gleixner
2020-12-10 15:26                 ` Marcelo Tosatti
2020-12-10 21:48                   ` Thomas Gleixner
2020-12-11  0:27                     ` Marcelo Tosatti
2020-12-11 13:30                       ` Thomas Gleixner
2020-12-11 14:18                         ` Marcelo Tosatti
2020-12-11 21:04                           ` Thomas Gleixner
2020-12-11 21:59                             ` Paolo Bonzini
2020-12-12 13:03                               ` Thomas Gleixner
2020-12-15 10:59                               ` Marcelo Tosatti
2020-12-15 16:55                                 ` Andy Lutomirski
2020-12-15 22:34                                 ` Thomas Gleixner
2020-12-11 13:37                       ` Paolo Bonzini
2020-12-08 17:35       ` Marcelo Tosatti
2020-12-03 17:11 ` [PATCH v2 2/3] KVM: x86: introduce KVM_X86_QUIRK_TSC_HOST_ACCESS Maxim Levitsky
2020-12-03 17:11 ` [PATCH v2 3/3] kvm/selftests: update tsc_msrs_test to cover KVM_X86_QUIRK_TSC_HOST_ACCESS Maxim Levitsky
2020-12-07 23:16 ` [PATCH v2 0/3] RFC: Precise TSC migration Marcelo Tosatti
2020-12-10 11:48 ` Paolo Bonzini

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=CAOQ_QshW0UvwSS3TUCK5PxkLQhHTqDNXNeMxwVDyf+DXc23fXQ@mail.gmail.com \
    --to=oupton@google.com \
    --cc=bp@alien8.de \
    --cc=corbet@lwn.net \
    --cc=drjones@redhat.com \
    --cc=hpa@zytor.com \
    --cc=jmattson@google.com \
    --cc=joro@8bytes.org \
    --cc=kvm@vger.kernel.org \
    --cc=linux-doc@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-kselftest@vger.kernel.org \
    --cc=luto@amacapital.net \
    --cc=mingo@redhat.com \
    --cc=mlevitsk@redhat.com \
    --cc=mtosatti@redhat.com \
    --cc=pbonzini@redhat.com \
    --cc=sean.j.christopherson@intel.com \
    --cc=shuah@kernel.org \
    --cc=tglx@linutronix.de \
    --cc=vkuznets@redhat.com \
    --cc=wanpengli@tencent.com \
    --cc=x86@kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).