From mboxrd@z Thu Jan  1 00:00:00 1970
From: Marcelo Tosatti <mtosatti@redhat.com>
Subject: Re: kvmclock doesn't work, help?
Date: Thu, 10 Dec 2015 19:33:47 -0200
Message-ID: <20151210213347.GB4836@amt.cnet>
References: <CALCETrVZwDddGcW8axAb4PP+YZyfz5TGR9xYwZXv3d_aghLBtA@mail.gmail.com>
 <56689A2B.6090500@redhat.com>
 <CALCETrXb2pM86XK_BoXk+LVpmNVF0a59_brDvD338DOJZQH+Dg@mail.gmail.com>
 <5668A76A.7050707@redhat.com>
 <CALCETrU3to=8SNU+0Hr0MNGqs4GxA8j=0KBKWcy-Schu2PoaFw@mail.gmail.com>
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Cc: Paolo Bonzini <pbonzini@redhat.com>,
	kvm list <kvm@vger.kernel.org>,
	Radim Krcmar <rkrcmar@redhat.com>, X86 ML <x86@kernel.org>,
	Alexander Graf <agraf@suse.de>
To: Andy Lutomirski <luto@amacapital.net>
Return-path: <kvm-owner@vger.kernel.org>
Received: from mx1.redhat.com ([209.132.183.28]:57222 "EHLO mx1.redhat.com"
	rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP
	id S1753073AbbLKVTm (ORCPT <rfc822;kvm@vger.kernel.org>);
	Fri, 11 Dec 2015 16:19:42 -0500
Content-Disposition: inline
In-Reply-To: <CALCETrU3to=8SNU+0Hr0MNGqs4GxA8j=0KBKWcy-Schu2PoaFw@mail.gmail.com>
Sender: kvm-owner@vger.kernel.org
List-ID: <kvm.vger.kernel.org>

On Wed, Dec 09, 2015 at 02:27:36PM -0800, Andy Lutomirski wrote:
> On Wed, Dec 9, 2015 at 2:12 PM, Paolo Bonzini <pbonzini@redhat.com> wrote:
> >
> >
> > On 09/12/2015 22:49, Andy Lutomirski wrote:
> >> On Wed, Dec 9, 2015 at 1:16 PM, Paolo Bonzini <pbonzini@redhat.com> wrote:
> >>>
> >>>
> >>> On 09/12/2015 22:10, Andy Lutomirski wrote:
> >>>> Can we please stop making kvmclock more complex?  It's a beast right
> >>>> now, and not in a good way.  It's far too tangled with the vclock
> >>>> machinery on both the host and guest sides, the pvclock stuff is not
> >>>> well thought out (even in principle in an ABI sense), and it's never
> >>>> been clear to my what problem exactly the kvmclock stuff is supposed
> >>>> to solve.
> >>>
> >>> It's supposed to solve the problem that:
> >>>
> >>> - not all hosts have a working TSC
> >>
> >> Fine, but we don't need any vdso integration for that.
> >
> > Well, you still want a fast time source.  That was a given. :)
> 
> If the host can't figure out how to give *itself* a fast time source,
> I'd be surprised if KVM can manage to give the guest a fast, reliable
> time source.
> 
> >
> >>> - even if they all do, virtual machines can be migrated (or
> >>> saved/restored) to a host with a different TSC frequency
> >>>
> >>> - any MMIO- or PIO-based mechanism to access the current time is orders
> >>> of magnitude slower than the TSC and less precise too.
> >>
> >> Yup.  But TSC by itself gets that benefit, too.
> >
> > Yes, the problem is if you want to solve all three of them.  The first
> > two are solved by the ACPI PM timer with a decent resolution (70
> > ns---much faster anyway than an I/O port access).  The third is solved
> > by TSC.  To solve all three, you need kvmclock.
> 
> Still confused.  Is kvmclock really used in cases where even the host
> can't pull of working TSC?
> 
> >
> >>>> I'm somewhat tempted to suggest that we delete kvmclock entirely and
> >>>> start over.  A correctly functioning KVM guest using TSC (i.e.
> >>>> ignoring kvmclock entirely) seems to work rather more reliably and
> >>>> considerably faster than a kvmclock guest.
> >>>
> >>> If all your hosts have a working TSC and you don't do migration or
> >>> save/restore, that's a valid configuration.  It's not a good default,
> >>> however.
> >>
> >> Er?
> >>
> >> kvmclock is still really quite slow and buggy.
> >
> > Unless it takes 3-4000 clock cycles for a gettimeofday, which it
> > shouldn't even with vdso disabled, it's definitely not slower than PIO.
> >
> >> And the patch I identified is definitely a problem here:
> >>
> >> [  136.131241] KVM: disabling fast timing permanently due to inability
> >> to recover from suspend
> >>
> >> I got that on the host with this whitespace-damaged patch:
> >>
> >>         if (backwards_tsc) {
> >>                 u64 delta_cyc = max_tsc - local_tsc;
> >> +               if (!backwards_tsc_observed)
> >> +                       pr_warn("KVM: disabling fast timing
> >> permanently due to inability to recover from suspend\n");
> >>
> >> when I suspended and resumed.
> >>
> >> Can anyone explain what problem
> >> 16a9602158861687c78b6de6dc6a79e6e8a9136f is supposed to solve?  On
> >> brief inspection, it just seems to be incorrect.  Shouldn't KVM's
> >> normal TSC logic handle that case right?  After all, all vcpus should
> >> be paused when we resume from suspend.  At worst, we should just need
> >> kvm_make_request(KVM_REQ_CLOCK_UPDATE, vcpu) on all vcpus.  (Actually,
> >> shouldn't we do that regardless of which way the TSC jumped on
> >> suspend/resume?  After all, the jTSC-to-wall-clock offset is quite
> >> likely to change except on the very small handful of CPUs (if any)
> >> that keep the TSC running in S3 and hibernate.
> >
> > I don't recall the details of that patch, so Marcelo will have to answer
> > this, or Alex too since he chimed in the original thread.  At least it
> > should be made conditional on the existence of a VM at suspend time (and
> > the master clock stuff should be made per VM, as I suggested at
> > https://www.mail-archive.com/kvm@vger.kernel.org/msg102316.html).
> >
> > It would indeed be great if the master clock could be dropped.  But I'm
> > definitely missing some of the subtle details. :(
> 
> Me, too.
> 
> Anyway, see the attached untested patch.  Marcelo?
> 
> --Andy

Read the last email, about the problem.