From mboxrd@z Thu Jan 1 00:00:00 1970
From: Marcelo Tosatti
Subject: Re: kvmclock doesn't work, help?
Date: Thu, 17 Dec 2015 17:08:51 -0200
Message-ID: <20151217190850.GA13981@amt.cnet>
References: <20151210213212.GA4836@amt.cnet> <566EC7AF.3090508@redhat.com> <20151214220027.GA24973@amt.cnet> <566FD25C.5040806@redhat.com> <20151216215731.GA9950@amt.cnet>
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Cc: Paolo Bonzini, kvm list, Radim Krcmar, X86 ML
To: Andy Lutomirski
Return-path:
Received: from mx1.redhat.com ([209.132.183.28]:46616 "EHLO mx1.redhat.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1752424AbbLRAdm (ORCPT ); Thu, 17 Dec 2015 19:33:42 -0500
Content-Disposition: inline
In-Reply-To:
Sender: kvm-owner@vger.kernel.org
List-ID:

On Thu, Dec 17, 2015 at 08:33:17AM -0800, Andy Lutomirski wrote:
> On Wed, Dec 16, 2015 at 1:57 PM, Marcelo Tosatti wrote:
> > On Wed, Dec 16, 2015 at 10:17:16AM -0800, Andy Lutomirski wrote:
> >> On Wed, Dec 16, 2015 at 9:48 AM, Andy Lutomirski wrote:
> >> > On Tue, Dec 15, 2015 at 12:42 AM, Paolo Bonzini wrote:
> >> >>
> >> >>
> >> >> On 14/12/2015 23:31, Andy Lutomirski wrote:
> >> >>> >       RAW TSC    NTP corrected TSC
> >> >>> > t0    10         10
> >> >>> > t1    20         19.99
> >> >>> > t2    30         29.98
> >> >>> > t3    40         39.97
> >> >>> > t4    50         49.96   (1)
> >> >>> >
> >> >>> > ...
> >> >>> >
> >> >>> > if you suddenly switch from RAW TSC to NTP corrected TSC,
> >> >>> > you can see what will happen.
> >> >>>
> >> >>> Sure, but why would you ever switch from one to the other?
> >> >>
> >> >> The guest uses the raw TSC and systemtime = 0 until suspend. After
> >> >> resume, the TSC certainly increases at the same rate as before, but the
> >> >> raw TSC restarted counting from 0 and systemtime has increased slower
> >> >> than the guest kvmclock.
> >> >
> >> > Wait, are we talking about the host's NTP or the guest's NTP?
> >> >
> >> > If it's the host's, then wouldn't systemtime be reset after resume to
> >> > the NTP corrected value? If so, the guest wouldn't see time go
> >> > backwards.
> >> >
> >> > If it's the guest's, then the guest's NTP correction is applied on top
> >> > of kvmclock, and this shouldn't matter.
> >> >
> >> > I still feel like I'm missing something very basic here.
> >> >
> >>
> >> OK, I think I get it.
> >>
> >> Marcelo, I thought that kvmclock was supposed to propagate the host's
> >> correction to the guest. If it did, indeed, propagate the correction
> >> then, after resume, the host's new system_time would match the guest's
> >> idea of it (after accounting for the guest's long nap), and I don't
> >> think there would be a problem.
> >> That being said, I can't find the code in the masterclock stuff that
> >> would actually do this.
> >
> > Guest clock is maintained by guest timekeeping code, which does:
> >
> > timer_interrupt()
> >     offset = read clocksource since last timer interrupt
> >     accumulate_to_systemclock(offset)
> >
> > The frequency correction of NTP in the host can be applied to
> > kvmclock, which will be visible to the guest
> > at "read clocksource since last timer interrupt"
> > (kvmclock_clocksource_read function).
>
> pvclock_clocksource_read? That seems to do the same thing as all the
> other clocksource access functions.
>
> >
> > This does not mean that the NTP correction in the host is propagated
> > to the guest's system clock directly.
> >
> > (For example, the guest can run NTP which is free to do further
> > adjustments at "accumulate_to_systemclock(offset)" time).
>
> Of course. But I expected that, in the absence of NTP on the guest,
> that the guest would track the host's *corrected* time.
>
> >
> >> If, on the other hand, the host's NTP correction is not supposed to
> >> propagate to the guest,
> >
> > This is optional. There is a module option to control this, in fact.
> >
> > It's nice to have, because then you can execute a guest without NTP
> > (say without network connection), and have a kvmclock (kvmclock is a
> > clocksource, not a guest system clock) which is NTP corrected.
>
> Can you point to how this works? I found kvm_guest_time_update, which
> is called under circumstances that I haven't untangled. I can't
> really tell what it's trying to do.

Documentation/virtual/kvm/timekeeping.txt.

> In any case, this still seems much more convoluted than it has to be.
> In the case in which the host has a stable TSC (tsc is selected in the
> core timekeeping code, VCLOCK_TSC is set, etc), which is basically all
> the time on the last few generations of CPUs, then the core
> timekeeping code is already exposing a linear function that's supposed
> to be used for monotonic, cpu-local access to a corrected nanosecond
> counter. It's even in pretty much exactly the right form to pass
> through to the guest via pvclock in the gtod data. Why doesn't KVM
> pass it through verbatim, updated in real time? Is there some legacy
> reason that KVM must apply its own corrections and has to jump through
> hoops to pause vcpus when updating those vcpu's copies of the pvclock
> data?

Read the comment in x86.c which starts with

"
 *
 * Assuming a stable TSC across physical CPUS, and a stable TSC
 * across virtual CPUs, the following condition is possible.
 * Each numbered line represents an event visible to both
 * CPUs at the next numbered event.
"

> >> then shouldn't KVM just update system_time on
> >> resume to whatever the guest would think it had (which I think would
> >> be equivalent to the host's CLOCK_MONOTONIC_RAW value, possibly
> >> shifted by some per-guest constant offset).
> >>
> >> --Andy
> >
> > Sure, you could add a correction to compensate and make sure
> > the guest clock does not see time backwards.
> >
>
> Could you help do that? You understand the code far better than I do.

Sure, you have to save the guest's view of time (system_time + scaled
tsc read) when suspending, and add an offset to get_kernel_ns() to
compensate for the effect of (1) when resuming.

Does that make sense?

> As it stands, it simply doesn't work on any system that suspends and
> resumes (unless maybe the system has the upcoming Intel ART feature,
> and I have no clue when that'll show up).
>
> --Andy
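
To make the save-and-offset idea concrete, here is a rough toy model of it.
This is only a sketch, not KVM code: every name in it (raw_clock,
corrected_clock, guest_view, resume_offset) is hypothetical, the rates are
taken from the RAW TSC / NTP corrected TSC table quoted above, and it
ignores the time that passes while the host is suspended.

```python
# Illustrative model only: names and numbers are assumptions, not KVM code.
# Values are in hundredths of the table's units so the arithmetic stays exact.
RAW_STEP = 1000        # raw TSC advances 10.00 per step
CORRECTED_STEP = 999   # NTP corrected clock advances 9.99 per step
START = 1000           # both clocks read 10.00 at t0


def raw_clock(step):
    """Raw TSC column of the table at step t<step>."""
    return START + RAW_STEP * step


def corrected_clock(step):
    """NTP corrected column of the table at step t<step>."""
    return START + CORRECTED_STEP * step


def guest_view(system_time, scaled_tsc):
    """Guest's kvmclock reading: system_time plus the scaled TSC read."""
    return system_time + scaled_tsc


def resume_offset(saved_view, host_clock_at_resume):
    """Offset to add to the host clock so the guest does not go backwards."""
    return max(0, saved_view - host_clock_at_resume)


# Suspend at t4: the guest has been running on the raw TSC with system_time = 0.
saved_view = guest_view(0, raw_clock(4))

# Resume: the TSC restarts from 0 and system_time comes from the corrected clock.
host_now = corrected_clock(4)
uncompensated = guest_view(host_now, 0)

# Apply the compensating offset on top of the host clock.
offset = resume_offset(saved_view, host_now)
compensated = guest_view(host_now + offset, 0)

print(saved_view, uncompensated, compensated)
```

In this model the uncompensated reading after resume is behind the value
saved at suspend, which is the backwards step from (1), and adding the
offset brings the guest back to at least the saved value. A real fix would
also have to account for the time spent suspended.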