Re: What time is it kvm-clock?

From: Joao Martins <joao.m.martins@oracle.com>
To: Owen Hofmann <osh@google.com>, Andy Lutomirski <luto@amacapital.net>
Cc: Paolo Bonzini <pbonzini@redhat.com>,
	Marcelo Tosatti <mtosatti@redhat.com>,
	xen-devel <xen-devel@lists.xen.org>,
	KVM General <kvm@vger.kernel.org>,
	Peter Hornyack <peterhornyack@google.com>
Subject: Re: What time is it kvm-clock?
Date: Thu, 25 Feb 2016 12:22:02 +0000
Message-ID: <56CEF1EA.6010809__18894.8961244479$1456403008$gmane$org@oracle.com>
In-Reply-To: <CANqFzA6TgdsoUyjKiwNa++et_wWnQgFRhEUZ6NE3_CShh0wc6g@mail.gmail.com>

On 02/24/2016 07:55 PM, Owen Hofmann wrote:
> 
> To try to put the thoughts into specific questions:
> - What is the value returned by KVM_GET_CLOCK? How should it be used?
> - What is the value returned by a guest read of the kvm-clock
> structure? (This is also Andy's question)
> To me there are two possibilities for how to answer the second question:
> 1) kvm-clock is better than the host TSC: it propagates updates to
> frequency from the host (== CLOCK_MONOTONIC)
> 2) kvm-clock is a paravirtual source of truth on the guest TSC:
> whether it is stable and its approximate frequency. If the guest needs
> to synchronize to an external source of time, it runs NTP. (==
> CLOCK_MONOTONIC_RAW)
> 
> To me, (1) sounds hard, (2) sounds easy, and it's not clear how much
> additional value (1) provides. The recent patches Paolo sent move
> kvm-clock in the direction of (1), and it sounds like Andy and I might
> have slightly different opinions as well. But mostly I would like some
> clarity as to which is the stated goal for kvm-clock, and to have the
> implementation pick only one of those options.
> 
>>>>> Since we cannot change the past, having kvmclock synchronize with the
>>>>> host TSC frequency is the only choice we can make.
> 
> I'm not sure I understand what previous decision locks kvm-clock into
> the current path. Can you clarify?
> 
> On Wed, Feb 24, 2016 at 11:38 AM, Andy Lutomirski <luto@amacapital.net> wrote:
>> On Wed, Feb 24, 2016 at 9:38 AM, Marcelo Tosatti <mtosatti@redhat.com> wrote:
>>> On Wed, Feb 24, 2016 at 08:44:40AM -0800, Andy Lutomirski wrote:
>>>> On Wed, Feb 24, 2016 at 6:14 AM, Paolo Bonzini <pbonzini@redhat.com> wrote:
>>>>>
>>>>>
>>>>> On 24/02/2016 03:31, Owen Hofmann wrote:
>>>>>> Specifically, what underlying source of time should be exposed through
>>>>>> kvm-clock and other paravirtual ABIs like the HyperV reference tsc
>>>>>> page?  Recently a couple of threads on kvm-list, along with attempts
>>>>>> to produce reliable behavior from kvm-clock on our systems have
>>>>>> highlighted a tension between the current implementation of kvm-clock
>>>>>> and potentially diverging goals for paravirt time. Here are a few:
>>>>>>
>>>>>> 1) kvmclock doesn't work, help?: http://www.spinics.net/lists/kvm/msg125039.html
>>>>>> 2) kvmclock: improve accuracy: http://www.spinics.net/lists/kvm/msg127215.html
>>>>>> 3) KVM-clock: http://www.spinics.net/lists/kvm/msg127774.html
>>>>>>
>>>>>> This question is mostly in regards to kvm-clock in masterclock mode
>>>>>> (with PVCLOCK_TSC_STABLE set). In this mode, is kvm-clock intended to
>>>>>> expose a source of time that is more 'true' than the underlying TSC?
>>>>>> For example, by passing through NTP correction from the host. For the
>>>>>> current implementation, the answer seems to be... why not both? Once
>>>>>> programmed, kvm-clock or the HyperV TSC page will advance with the TSC
>>>>>> multiplied by the frequency specified by kvm. On the other hand,
>>>>>> KVM_GET_CLOCK, KVM_SET_CLOCK, and the Windows reference counter MSR
>>>>>> are measured against corrected time from the host. A guest reading its
>>>>>> pvclock gets a very different result from a host KVM_GET_CLOCK if the
>>>>>> guest has run long enough for the TSC to diverge from NTP time.
>>>>>
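To put a rough number on that divergence, here is a back-of-the-envelope C sketch. It assumes the host applies a constant frequency correction, expressed in parts per million, while the guest keeps scaling raw TSC ticks with a fixed multiplier. clock_divergence_ns is a hypothetical name and the 20 ppm figure in the note below is an illustrative assumption, not a measurement:

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical helper: nanoseconds by which an uncorrected clock and a
 * clock corrected by adj_ppm parts per million drift apart after
 * elapsed_ns nanoseconds. Positive adj_ppm means the corrected clock
 * runs faster.
 */
static int64_t clock_divergence_ns(int64_t elapsed_ns, int64_t adj_ppm)
{
    assert(elapsed_ns >= 0);

    /* Split the division to keep intermediate products small. */
    return (elapsed_ns / 1000000) * adj_ppm +
           (elapsed_ns % 1000000) * adj_ppm / 1000000;
}
```

With an assumed 20 ppm correction, clock_divergence_ns(86400LL * 1000000000LL, 20) gives 1728000000 ns, i.e. about 1.7 seconds per day of guest uptime.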
>>>>> Right, in fact that's why QEMU is not really using KVM_GET_CLOCK
>>>>> anymore.  In retrospect, the "fix" in QEMU was probably a bad idea.  It
>>>>> would have been better to fix KVM_GET_CLOCK.
>>>>>
>>>>>> To me, kvm-clock and the HyperV TSC page are extremely effective as
>>>>>> simply a more enlightened path to the host TSC. Maintaining a
>>>>>> high-performance path to the TSC in the face of updates is tricky -
>>>>>> see the extended comment in pvclock_update_vm_gtod_copy, or the
>>>>>> discussion on the patchset in (2). Is the cost of auditing that the
>>>>>> path from host gettimeofday update -> kvm -> guest pvclock -> guest
>>>>>> gettimeofday both tracks host time correctly and does not produce any
>>>>>> backwards warps worth the added value, if it exists? As an
>>>>>> alternative, implementing KVM_GET_CLOCK or the reference time MSR as a
>>>>>> function of the last update to kvm-clock or the reference TSC page,
>>>>>> respectively, sounds very straightforward.
>>>>>
>>>>> Yes, we could do that too.
>>>>>
>>>>> I think that vgettsc and do_monotonic_boot also would have to use the
>>>>> TSC frequency instead of the NTP-adjusted host clock.
>>>>>
>>>>>> (Outside of masterclock mode, the requirement that the client
>>>>>> synchronizes across cpus for monotonicity smooths over a lot of
>>>>>> complexity - periodically updating kvm-clock to the current time is
>>>>>> simple and works.)
>>>>>>
>>>>>> Regardless of my opinion, I think that a clear statement of the design
>>>>>> goals for kvm-clock (and kvm's implementation of the reference TSC
>>>>>> page) would be valuable.
>>>>>
>>>>> Since we cannot change the past, having kvmclock synchronize with the
>>>>> host TSC frequency is the only choice we can make.
>>>>>
>>>>
>>>> Could we introduce a new kvm-clock or perhaps opt-in mode that:
>>>>
>>>> a) uses hypervisor-supplied IO pages and,
>>>>
>>>> b) synchronizes to host CLOCK_MONOTONIC instead of some bizarre
>>>> non-suspend-resume-safe
>>>
>>> Please be accurate. It is suspend safe.
>>>
>>
>> I'm being accurate enough, I think.  Master clock mode is not suspend
>> safe.  When I suspend and resume my laptop, the master clock code
>> determines that it messed up and disables itself.  Unloading and
>> reloading the kvm modules turns it back on until the next suspend.
>>
>> I *think* that the underlying issue is that kvm-clock's master clock
>> tracks something ill-defined instead of exposing a well-defined host
>> clock.  If the master clock accurately exposed CLOCK_MONOTONIC_RAW or
>> CLOCK_MONOTONIC (I much prefer the latter), then it would be fine
>> across suspend/resume.
>>
>> I think that part of the reason that it doesn't accurately export a
>> host clock is that the worst-case performance of atomic updates to the
>> pvclock data structures is abysmal due to having the data structures
>> living in guest memory.  To be able to access and update all relevant
>> structures during host clock refreshes, the host would need to pin all
>> the pvclock pages for all running guests.  This could be partially
>> mitigated by only updating pvclock data for running vcpus and for vcpu
>> 0 for all running guests synchronously and deferring the rest (8k
>> pinned per host cpu, max), but it would still be a mess.
>>
>> If someone redefined the interface so that the *host* could allocate
>> it, then the pages could be shared across all guests and this would be
>> vastly simpler and faster.
>>
>> Also, kvm-clock should really coordinate with the core timekeeping
>> code to handle this sort of time base export rather than hooking into
>> the host vdso support code.
>>
>>>> not-really-well-defined hybrid?
>>>>
>>>> --Andy
>>>
>>> 1. What is not well defined? I fail to spot anything
>>> specific in Owen's e-mail.
>>
>> If I start a guest and query kvm-clock, I get a nanosecond count.
>> AFAIK it is, in fact, ill-defined or at least ill-documented what that
>> nanosecond count means.
>>
>> [cc: Joao.  Xen may want to take this stuff into consideration.]
[CC-ing xen-devel folks too]

Joao
>> --Andy
