From: Owen Hofmann
Subject: Re: What time is it kvm-clock?
Date: Wed, 24 Feb 2016 11:55:23 -0800
References: <56CDBAB1.6090405@redhat.com> <20160224173821.GA9364@amt.cnet>
To: Andy Lutomirski
Cc: Marcelo Tosatti, Paolo Bonzini, KVM General, Peter Hornyack, Joao Martins

>>> not-really-well-defined hybrid?
>>>
>>> --Andy
>>
>> 1. What is not well defined? I fail to spot anything
>> specific in Owen's e-mail.
>
> If I start a guest and query kvm-clock, I get a nanosecond count.
> AFAIK it is, in fact, ill-defined or at least ill-documented what that
> nanosecond count means.

To try to put the thoughts into specific questions:

- What is the value returned by KVM_GET_CLOCK? How should it be used?
- What is the value returned by a guest read of the kvm-clock
  structure? (This is also Andy's question)

To me there are two possibilities for how to answer the second question:

1) kvm-clock is better than the host TSC: it propagates updates to
   frequency from the host (== CLOCK_MONOTONIC)
2) kvm-clock is a paravirtual source of truth on the guest TSC: whether
   it is stable and its approximate frequency. If the guest needs to
   synchronize to an external source of time, it runs NTP.
   (== CLOCK_MONOTONIC_RAW)

To me, (1) sounds hard, (2) sounds easy, and it's not clear how much
additional value (1) provides. The recent patches Paolo sent move
kvm-clock in the direction of (1), and it sounds like Andy and I might
have slightly different opinions as well.
But mostly I would like some clarity as to which is the stated goal for
kvm-clock, and to have the implementation pick only one of those
options.

>>> > Since we cannot change the past, having kvmclock synchronize with the
>>> > host TSC frequency is the only choice we can make.

I'm not sure I understand what previous decision locks kvm-clock into
the current path. Can you clarify?

On Wed, Feb 24, 2016 at 11:38 AM, Andy Lutomirski wrote:
> On Wed, Feb 24, 2016 at 9:38 AM, Marcelo Tosatti wrote:
>> On Wed, Feb 24, 2016 at 08:44:40AM -0800, Andy Lutomirski wrote:
>>> On Wed, Feb 24, 2016 at 6:14 AM, Paolo Bonzini wrote:
>>> >
>>> >
>>> > On 24/02/2016 03:31, Owen Hofmann wrote:
>>> >> Specifically, what underlying source of time should be exposed through
>>> >> kvm-clock and other paravirtual ABIs like the HyperV reference tsc
>>> >> page? Recently a couple of threads on kvm-list, along with attempts
>>> >> to produce reliable behavior from kvm-clock on our systems have
>>> >> highlighted a tension between the current implementation of kvm-clock
>>> >> and potentially diverging goals for paravirt time. Here are a few:
>>> >>
>>> >> 1) kvmclock doesn't work, help?: http://www.spinics.net/lists/kvm/msg125039.html
>>> >> 2) kvmclock: improve accuracy: http://www.spinics.net/lists/kvm/msg127215.html
>>> >> 3) KVM-clock: http://www.spinics.net/lists/kvm/msg127774.html
>>> >>
>>> >> This question is mostly in regards to kvm-clock in masterclock mode
>>> >> (with PVCLOCK_TSC_STABLE set). In this mode, is kvm-clock intended to
>>> >> expose a source of time that is more 'true' than the underlying TSC?
>>> >> For example, by passing through NTP correction from the host. For the
>>> >> current implementation, the answer seems to be... why not both? Once
>>> >> programmed, kvm-clock or the HyperV TSC page will advance with the TSC
>>> >> multiplied by the frequency specified by kvm.
>>> >> On the other hand,
>>> >> KVM_GET_CLOCK, KVM_SET_CLOCK, and the Windows reference counter MSR
>>> >> are measured against corrected time from the host. A guest reading its
>>> >> pvclock gets a very different result from a host KVM_GET_CLOCK if the
>>> >> guest has run long enough for TSC to diverge from NTP time.
>>> >
>>> > Right, in fact that's why QEMU is not really using KVM_GET_CLOCK
>>> > anymore. In retrospect, the "fix" in QEMU was probably a bad idea. It
>>> > would have been better to fix KVM_GET_CLOCK.
>>> >
>>> >> To me, kvm-clock and the HyperV TSC page are extremely effective as
>>> >> simply a more enlightened path to the host TSC. Maintaining a
>>> >> high-performance path to the TSC in the face of updates is tricky -
>>> >> see the extended comment in pvclock_update_vm_gtod_copy, or the
>>> >> discussion on the patchset in (2). Is the cost of auditing that the
>>> >> path from host gettimeofday update -> kvm -> guest pvclock -> guest
>>> >> gettimeofday both tracks host time correctly and does not produce any
>>> >> backwards warps worth the added value, if it exists? As an
>>> >> alternative, implementing KVM_GET_CLOCK or the reference time MSR as a
>>> >> function of the last update to kvm-clock or the reference TSC page,
>>> >> respectively, sounds very straightforward.
>>> >
>>> > Yes, we could do that too.
>>> >
>>> > I think that vgettsc and do_monotonic_boot also would have to use the
>>> > TSC frequency instead of the NTP-adjusted host clock.
>>> >
>>> >> (Outside of masterclock mode, the requirement that the client
>>> >> synchronizes across cpus for monotonicity smoothes over a lot of
>>> >> complexity - periodically updating kvm-clock to the current time is
>>> >> simple and works.)
>>> >>
>>> >> Regardless of my opinion, I think that a clear statement of the design
>>> >> goals for kvm-clock (and kvm's implementation of the reference TSC
>>> >> page) would be valuable.
>>> >
>>> > Since we cannot change the past, having kvmclock synchronize with the
>>> > host TSC frequency is the only choice we can make.
>>> >
>>>
>>> Could we introduce a new kvm-clock or perhaps opt-in mode that:
>>>
>>> a) uses hypervisor-supplied IO pages and,
>>>
>>> b) synchronizes to host CLOCK_MONOTONIC instead of some bizarre
>>> non-suspend-resume-safe
>>
>> Please be accurate. It is suspend safe.
>>
>
> I'm being accurate enough, I think. Master clock mode is not suspend
> safe. When I suspend and resume my laptop, the master clock code
> determines that it messed up and disables itself. Unloading and
> reloading the kvm modules turns it back on until the next suspend.
>
> I *think* that the underlying issue is that kvm-clock's master clock
> tracks something ill-defined instead of exposing a well-defined host
> clock. If the master clock accurately exposed CLOCK_MONOTONIC_RAW or
> CLOCK_MONOTONIC (I much prefer the latter), then it would be fine
> across suspend/resume.
>
> I think that part of the reason that it doesn't accurately export a
> host clock is that the worst-case performance of atomic updates to the
> pvclock data structures is abysmal due to having the data structures
> living in guest memory. To be able to access and update all relevant
> structures during host clock refreshes, the host would need to pin
> all the pvclock pages for all running guests. This could be partially
> mitigated by only updating pvclock data for running vcpus and for vcpu
> 0 for all running guests synchronously and deferring the rest (8k
> pinned per host cpu, max), but it would still be a mess.
>
> If someone redefined the interface so that the *host* could allocate
> it, then the pages could be shared across all guests and this would be
> vastly simpler and faster.
>
> Also, kvm-clock should really coordinate with the core timekeeping
> code to handle this sort of time base export rather than hooking into
> the host vdso support code.
>
>>> not-really-well-defined hybrid?
>>>
>>> --Andy
>>
>> 1. What is not well defined? I fail to spot anything
>> specific in Owen's e-mail.
>
> If I start a guest and query kvm-clock, I get a nanosecond count.
> AFAIK it is, in fact, ill-defined or at least ill-documented what that
> nanosecond count means.
>
> [cc: Joao. Xen may want to take this stuff into consideration.]
>
> --Andy
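
For anyone following the thread, the divergence described above (a guest
pvclock read versus a host clock that keeps receiving frequency
corrections) can be sketched in a few lines of Python. This is only an
illustration, not kernel code. The field names follow the pvclock
structure, but the 2 GHz TSC, the constant 50 ppm correction, and the
helper host_corrected_ns are assumptions invented for the example.

```python
# Illustrative sketch only: field names follow the pvclock structure,
# every number and the host-side model are hypothetical.

def pvclock_read_ns(tsc, tsc_timestamp, system_time, mul, shift):
    """Guest-side read: scale the TSC delta, then add the base time.

    mul is a 32.32 fixed-point multiplier, shift a signed pre-shift.
    """
    delta = tsc - tsc_timestamp
    if shift >= 0:
        delta <<= shift
    else:
        delta >>= -shift
    return system_time + ((delta * mul) >> 32)


def host_corrected_ns(raw_ns, ppm):
    """Assumed host model: raw time plus a constant frequency correction."""
    return raw_ns + raw_ns * ppm // 1_000_000


TSC_HZ = 2_000_000_000                    # assumed nominal TSC frequency
MUL = (1_000_000_000 << 32) // TSC_HZ     # ns per tick in 32.32 fixed point
SHIFT = 0
SECONDS = 3600                            # guest uptime since the last update

# The guest keeps using the parameters written at the last pvclock update.
guest_ns = pvclock_read_ns(TSC_HZ * SECONDS, 0, 0, MUL, SHIFT)

# The host-side value keeps tracking the corrected clock in this model.
host_ns = host_corrected_ns(guest_ns, ppm=50)

drift_ns = host_ns - guest_ns
print(f"guest/host difference after {SECONDS} s: {drift_ns} ns")
```

If the host-side value were instead computed from the parameters of the
last pvclock update, as suggested earlier in the thread, the two reads
would agree by construction in this model.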