From mboxrd@z Thu Jan 1 00:00:00 1970 From: Andy Lutomirski Subject: Re: kvmclock doesn't work, help? Date: Fri, 11 Dec 2015 13:57:23 -0800 Message-ID: References: <20151210213212.GA4836@amt.cnet> Mime-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Cc: kvm list , Radim Krcmar , Paolo Bonzini , X86 ML To: Marcelo Tosatti Return-path: Received: from mail-ob0-f171.google.com ([209.85.214.171]:33857 "EHLO mail-ob0-f171.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1753409AbbLKV5n (ORCPT ); Fri, 11 Dec 2015 16:57:43 -0500 Received: by obciw8 with SMTP id iw8so93278442obc.1 for ; Fri, 11 Dec 2015 13:57:43 -0800 (PST) In-Reply-To: <20151210213212.GA4836@amt.cnet> Sender: kvm-owner@vger.kernel.org List-ID: On Thu, Dec 10, 2015 at 1:32 PM, Marcelo Tosatti wrote: > On Wed, Dec 09, 2015 at 01:10:59PM -0800, Andy Lutomirski wrote: >> I'm trying to clean up kvmclock and I can't get it to work at all. My >> host is 4.4.0-rc3-ish on a Skylake laptop that has a working TSC. >> >> If I boot an SMP (2 vcpus) guest, tracing says: >> >> qemu-system-x86-2517 [001] 102242.610654: kvm_update_master_clock: >> masterclock 0 hostclock tsc offsetmatched 0 >> qemu-system-x86-2521 [000] 102242.613742: kvm_track_tsc: >> vcpu_id 0 masterclock 0 offsetmatched 0 nr_online 1 hostclock tsc >> qemu-system-x86-2522 [000] 102242.622959: kvm_track_tsc: >> vcpu_id 1 masterclock 0 offsetmatched 1 nr_online 2 hostclock tsc >> qemu-system-x86-2521 [000] 102242.645123: kvm_track_tsc: >> vcpu_id 0 masterclock 0 offsetmatched 1 nr_online 2 hostclock tsc >> qemu-system-x86-2522 [000] 102242.647291: kvm_track_tsc: >> vcpu_id 1 masterclock 0 offsetmatched 1 nr_online 2 hostclock tsc >> qemu-system-x86-2521 [000] 102242.653369: kvm_track_tsc: >> vcpu_id 0 masterclock 0 offsetmatched 1 nr_online 2 hostclock tsc >> qemu-system-x86-2522 [000] 102242.653429: kvm_track_tsc: >> vcpu_id 1 masterclock 0 offsetmatched 1 nr_online 2 hostclock tsc >> qemu-system-x86-2517 [001] 102242.653447: kvm_update_master_clock: >> masterclock 0 hostclock tsc offsetmatched 1 >> qemu-system-x86-2521 [000] 102242.653657: kvm_update_master_clock: >> masterclock 0 hostclock tsc offsetmatched 1 >> qemu-system-x86-2522 [002] 102242.664448: kvm_update_master_clock: >> masterclock 0 hostclock tsc offsetmatched 1 >> >> >> If I boot a UP guest, tracing says: >> >> qemu-system-x86-2567 [001] 102370.447484: kvm_update_master_clock: >> masterclock 0 hostclock tsc offsetmatched 1 >> qemu-system-x86-2571 [002] 102370.447688: kvm_update_master_clock: >> masterclock 0 hostclock tsc offsetmatched 1 >> >> I suspect, but I haven't verified, that this is fallout from: >> >> commit 16a9602158861687c78b6de6dc6a79e6e8a9136f >> Author: Marcelo Tosatti >> Date: Wed May 14 12:43:24 2014 -0300 >> >> KVM: x86: disable master clock if TSC is reset during suspend >> >> Updating system_time from the kernel clock once master clock >> has been enabled can result in time backwards event, in case >> kernel clock frequency is lower than TSC frequency. >> >> Disable master clock in case it is necessary to update it >> from the resume path. >> >> Signed-off-by: Marcelo Tosatti >> Signed-off-by: Paolo Bonzini >> >> >> Can we please stop making kvmclock more complex? It's a beast right >> now, and not in a good way. It's far too tangled with the vclock >> machinery on both the host and guest sides, the pvclock stuff is not >> well thought out (even in principle in an ABI sense), and it's never >> been clear to my what problem exactly the kvmclock stuff is supposed >> to solve. >> >> I'm somewhat tempted to suggest that we delete kvmclock entirely and >> start over. A correctly functioning KVM guest using TSC (i.e. >> ignoring kvmclock entirely) >> seems to work rather more reliably and >> considerably faster than a kvmclock guest. >> >> --Andy >> >> -- >> Andy Lutomirski >> AMA Capital Management, LLC > > Andy, > > I am all for solving practical problems rather than pleasing aesthetic > pleasure. > >> Updating system_time from the kernel clock once master clock >> has been enabled can result in time backwards event, in case >> kernel clock frequency is lower than TSC frequency. >> >> Disable master clock in case it is necessary to update it >> from the resume path. > >> once master clock >> has been enabled can result in time backwards event, in case >> kernel clock frequency is lower than TSC frequency. > > guest visible clock = tsc_timestamp (updated at time 0) + scaled tsc reads. > > If the effective frequency of the kernel clock is lower (for example > due to NTP correcting the TSC frequency of the system), and you resume > and update the system, the following happens: > > guest visible clock = tsc_timestamp (updated at time 0) + scaled tsc reads=LARGE VALUE. > suspend/resume event. > guest visible clock = tsc_timestamp (updated at time N) + scaled tsc reads=0. > I'm still not seeing the issue. The formula is: (((rdtsc - pvti->tsc_timestamp) * pvti->tsc_to_system_mul) >> pvti->tsc_shift) + pvti->system_time Obviously, if you reset pvti->tsc_timestamp to the current tsc value after suspend/resume, you would also need to update system_time. I don't see what this has to do with suspend/resume or with whether the effective scale factor is greater than or less than one. The only suspend/resume interaction I can see is that, if the host allows the guest-observed TSC value to jump (which is arguably a bug, what that's not important here), it needs to update pvti before resuming the guest. Can you clarify concretely what goes wrong here? (I'm also at a bit of a loss as to why this needs both system_time and tsc_timestamp. They're redundant in the sense that you could set tsc_timestamp to zero and subtract (tsc_timestamp * tsc_to_system_mul) >> tsc_shift to system_time without changing the result of the calculation.) --Andy -- Andy Lutomirski AMA Capital Management, LLC