Subject: Re: [RFC PATCH 3/3] Enable ptp_kvm for arm64
To: Jianyong Wu, netdev@vger.kernel.org, pbonzini@redhat.com,
 sean.j.christopherson@intel.com, richardcochran@gmail.com,
 Mark.Rutland@arm.com, Will.Deacon@arm.com, suzuki.poulose@arm.com
Cc: linux-kernel@vger.kernel.org, Steve.Capper@arm.com, Kaly.Xin@arm.com,
 justin.he@arm.com
References: <20190829063952.18470-1-jianyong.wu@arm.com>
 <20190829063952.18470-4-jianyong.wu@arm.com>
From: Marc Zyngier
Organization: Approximate
Message-ID: <4d04867c-2188-9574-fbd1-2356c6b99b7d@kernel.org>
Date: Thu, 29 Aug 2019 11:32:41 +0100
In-Reply-To: <20190829063952.18470-4-jianyong.wu@arm.com>

On 29/08/2019 07:39, Jianyong Wu wrote:
> Currently in arm64 virtualization environment, there is no mechanism to
> keep time sync between guest and host. Time in guest will drift compared
> with host after boot up as they may both use third party time sources
> to correct their time respectively. The time deviation will be in order
> of milliseconds but some scenarios ask for higher time precision, like
> in cloud envirenment, we want all the VMs running in the host aquire the
> same level accuracy from host clock.
>
> Use of kvm ptp clock, which choose the host clock source clock as a
> reference clock to sync time clock between guest and host has been adopted
> by x86 which makes the time sync order from milliseconds to nanoseconds.
>
> This patch enable kvm ptp on arm64 and we get the similar clock drift as
> found with x86 with kvm ptp.
>
> Test result comparison between with kvm ptp and without it in arm64 are
> as follows. This test derived from the result of command 'chronyc
> sources'. we should take more cure of the last sample column which shows
> the offset between the local clock and the source at the last measurement.
>
> no kvm ptp in guest:
> MS Name/IP address    Stratum Poll Reach LastRx Last sample
> ========================================================================
> ^* dns1.synet.edu.cn        2    6   377     13  +1040us[+1581us] +/- 21ms
> ^* dns1.synet.edu.cn        2    6   377     21  +1040us[+1581us] +/- 21ms
> ^* dns1.synet.edu.cn        2    6   377     29  +1040us[+1581us] +/- 21ms
> ^* dns1.synet.edu.cn        2    6   377     37  +1040us[+1581us] +/- 21ms
> ^* dns1.synet.edu.cn        2    6   377     45  +1040us[+1581us] +/- 21ms
> ^* dns1.synet.edu.cn        2    6   377     53  +1040us[+1581us] +/- 21ms
> ^* dns1.synet.edu.cn        2    6   377     61  +1040us[+1581us] +/- 21ms
> ^* dns1.synet.edu.cn        2    6   377      4  -130us[ +796us] +/- 21ms
> ^* dns1.synet.edu.cn        2    6   377     12  -130us[ +796us] +/- 21ms
> ^* dns1.synet.edu.cn        2    6   377     20  -130us[ +796us] +/- 21ms
>
> in host:
> MS Name/IP address    Stratum Poll Reach LastRx Last sample
> ========================================================================
> ^* 120.25.115.20            2    7   377     72  -470us[ -603us] +/- 18ms
> ^* 120.25.115.20            2    7   377     92  -470us[ -603us] +/- 18ms
> ^* 120.25.115.20            2    7   377    112  -470us[ -603us] +/- 18ms
> ^* 120.25.115.20            2    7   377      2  +872ns[-6808ns] +/- 17ms
> ^* 120.25.115.20            2    7   377     22  +872ns[-6808ns] +/- 17ms
> ^* 120.25.115.20            2    7   377     43  +872ns[-6808ns] +/- 17ms
> ^* 120.25.115.20            2    7   377     63  +872ns[-6808ns] +/- 17ms
> ^* 120.25.115.20            2    7   377     83  +872ns[-6808ns] +/- 17ms
> ^* 120.25.115.20            2    7   377    103  +872ns[-6808ns] +/- 17ms
> ^* 120.25.115.20            2    7   377    123  +872ns[-6808ns] +/- 17ms
>
> The dns1.synet.edu.cn is the network reference clock for guest and
> 120.25.115.20 is the network reference clock for host. we can't get the
> clock error between guest and host directly, but a roughly estimated value
> will be in order of hundreds of us to ms.
>
> with kvm ptp in guest:
> chrony has been disabled in host to remove the disturb by network clock.

Is that a realistic use case? Why should the host not use NTP?
>
> MS Name/IP address    Stratum Poll Reach LastRx Last sample
> ========================================================================
> * PHC0                      0    3   377      8  -7ns[ +1ns] +/- 3ns
> * PHC0                      0    3   377      8  +1ns[ +16ns] +/- 3ns
> * PHC0                      0    3   377      6  -4ns[ -0ns] +/- 6ns
> * PHC0                      0    3   377      6  -8ns[ -12ns] +/- 5ns
> * PHC0                      0    3   377      5  +2ns[ +4ns] +/- 4ns
> * PHC0                      0    3   377     13  +2ns[ +4ns] +/- 4ns
> * PHC0                      0    3   377     12  -4ns[ -6ns] +/- 4ns
> * PHC0                      0    3   377     11  -8ns[ -11ns] +/- 6ns
> * PHC0                      0    3   377     10  -14ns[ -20ns] +/- 4ns
> * PHC0                      0    3   377      8  +4ns[ +5ns] +/- 4ns
>
> The PHC0 is the ptp clock which choose the host clock as its source
> clock. So we can be sure to say that the clock error between host and guest
> is in order of ns.
>
> Signed-off-by: Jianyong Wu
> ---
>  arch/arm64/include/asm/arch_timer.h  |  3 ++
>  arch/arm64/kvm/arch_ptp_kvm.c        | 76 ++++++++++++++++++++++++++++
>  drivers/clocksource/arm_arch_timer.c |  6 ++-
>  drivers/ptp/Kconfig                  |  2 +-
>  include/linux/arm-smccc.h            | 14 +++++
>  virt/kvm/arm/psci.c                  | 17 +++++++
>  6 files changed, 115 insertions(+), 3 deletions(-)
>  create mode 100644 arch/arm64/kvm/arch_ptp_kvm.c

Please split this patch into two parts: the hypervisor code in a patch
and the guest code in another patch. Having both of them together is
confusing.

>
> diff --git a/arch/arm64/include/asm/arch_timer.h b/arch/arm64/include/asm/arch_timer.h
> index 6756178c27db..880576a814b6 100644
> --- a/arch/arm64/include/asm/arch_timer.h
> +++ b/arch/arm64/include/asm/arch_timer.h
> @@ -229,4 +229,7 @@ static inline int arch_timer_arch_init(void)
>  	return 0;
>  }
>
> +extern struct clocksource clocksource_counter;
> +extern u64 arch_counter_read(struct clocksource *cs);

I'm definitely not keen on exposing the internals of the arch_timer
driver to random subsystems. Furthermore, you seem to expect that the
guest kernel will only use the arch timer as a clocksource, and nothing
really guarantees that (in which case get_device_system_crosststamp will
fail).
It looks to me that we'd be better off exposing a core timekeeping API
that populates a struct system_counterval_t based on the *current*
timekeeper monotonic clocksource. This would simplify the split between
generic and arch-specific code. Whether or not tglx will be happy with
the idea is another problem, but I'm certainly not taking any change to
the arch timer code based on this.

> +
>  #endif
> diff --git a/arch/arm64/kvm/arch_ptp_kvm.c b/arch/arm64/kvm/arch_ptp_kvm.c

We don't put non-hypervisor code in arch/arm64/kvm. Please move it back
to drivers/ptp (as well as its x86 counterpart), and just link the two
parts there. This should also allow this to be enabled for 32bit guests.

> new file mode 100644
> index 000000000000..6b2165ebce62
> --- /dev/null
> +++ b/arch/arm64/kvm/arch_ptp_kvm.c
> @@ -0,0 +1,76 @@
> +// SPDX-License-Identifier: GPL-2.0-only
> +/*
> + *  Virtual PTP 1588 clock for use with KVM guests
> + *  Copyright (C) 2019 ARM Ltd.
> + *  All Rights Reserved
> + */
> +
> +#include
> +#include
> +#include
> +#include
> +#include
> +#include
> +#include
> +
> +/*
> + * as trap call cause delay, this function will return the delay in nanosecond
> + */
> +static u64 arm_smccc_1_1_invoke_delay(u32 id, struct arm_smccc_res *res)
> +{
> +	u64 ns, t1, t2;
> +
> +	t1 = sched_clock();
> +	arm_smccc_1_1_invoke(id, res);
> +	t2 = sched_clock();
> +	t2 -= t1;
> +	ns = t2;
> +	return ns;

I think you can get rid of the ns variable here...
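To illustrate the ns remark, here is a rough userspace sketch of the same
helper with only one temporary. The names fake_sched_clock(), fake_hvc()
and struct fake_smccc_res are made-up stand-ins for sched_clock(),
arm_smccc_1_1_invoke() and struct arm_smccc_res so that the snippet builds
outside the kernel; the 1500ns trap cost is an arbitrary assumption.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for struct arm_smccc_res (userspace sketch only). */
struct fake_smccc_res {
	unsigned long a0, a1, a2, a3;
};

/* Fake monotonic clock, advanced by the fake hypercall below. */
static uint64_t fake_now_ns;

/* Hypothetical stand-in for sched_clock(). */
static uint64_t fake_sched_clock(void)
{
	return fake_now_ns;
}

/* Hypothetical stand-in for arm_smccc_1_1_invoke(): pretend the trap
 * to the hypervisor costs 1500ns and returns zeroed registers. */
static void fake_hvc(uint32_t id, struct fake_smccc_res *res)
{
	(void)id;
	assert(res != NULL);
	res->a0 = res->a1 = res->a2 = res->a3 = 0;
	fake_now_ns += 1500;
}

/* Issue the call and return the time spent in it, in nanoseconds.
 * A single temporary is enough: no separate t2 or ns is needed. */
uint64_t invoke_delay(uint32_t id, struct fake_smccc_res *res)
{
	uint64_t t1 = fake_sched_clock();

	fake_hvc(id, res);

	return fake_sched_clock() - t1;
}
```

In the real helper the body would keep calling sched_clock() and
arm_smccc_1_1_invoke(); only the local variables change.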
> +}
> +
> +int kvm_arch_ptp_init(void)
> +{
> +	return 0;
> +}
> +
> +int kvm_arch_ptp_get_clock(struct timespec64 *ts)
> +{
> +	u64 ns;
> +	struct arm_smccc_res hvc_res;
> +
> +	if (!kvm_arm_hyp_service_available(
> +			ARM_SMCCC_VENDOR_HYP_KVM_PTP_FUNC_ID)) {
> +		return -EOPNOTSUPP;
> +	}
> +	ns = arm_smccc_1_1_invoke_delay(ARM_SMCCC_VENDOR_HYP_KVM_PTP_FUNC_ID,
> +					&hvc_res);
> +	ts->tv_sec = hvc_res.a0;
> +	ts->tv_nsec = hvc_res.a1;
> +	timespec64_add_ns(ts, ns);
> +	return 0;
> +}
> +
> +int kvm_arch_ptp_get_clock_fn(long *cycle, struct timespec64 *ts,
> +			      struct clocksource **cs)
> +{
> +	u64 ns;
> +	struct arm_smccc_res hvc_res;
> +
> +	if (!kvm_arm_hyp_service_available(
> +			ARM_SMCCC_VENDOR_HYP_KVM_PTP_FUNC_ID)) {
> +		return -EOPNOTSUPP;
> +	}
> +	ns = arm_smccc_1_1_invoke_delay(ARM_SMCCC_VENDOR_HYP_KVM_PTP_FUNC_ID,
> +					&hvc_res);
> +	ts->tv_sec = hvc_res.a0;
> +	ts->tv_nsec = hvc_res.a1;
> +	timespec64_add_ns(ts, ns);
> +	*cycle = hvc_res.a2;
> +	*cs = &clocksource_counter;
> +
> +	return 0;
> +}

Why do we have two functions doing almost the same thing? Why do you call
kvm_arm_hyp_service_available each and every time? Isn't it enough to
check in kvm_arch_ptp_init()?

> +
> +MODULE_AUTHOR("Marcelo Tosatti ");
> +MODULE_DESCRIPTION("PTP clock using KVMCLOCK");
> +MODULE_LICENSE("GPL");

This should only exist in the generic code.
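Going back to the two near-identical getters and the repeated availability
check: one possible shape, sketched in userspace C, is to probe the service
once at init time and have a single getter with an optional cycle output.
Everything named fake_* below, as well as struct ts64, ptp_init() and
ptp_get_clock(), is hypothetical and only exists to make the snippet
self-contained; the hypercall delay compensation is left out for brevity.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for struct timespec64. */
struct ts64 {
	int64_t tv_sec;
	long tv_nsec;
};

/* Counts how often the (fake) hypervisor is asked about the service. */
int probe_count;

/* Hypothetical stand-in for kvm_arm_hyp_service_available(). */
static bool fake_hyp_service_available(void)
{
	probe_count++;
	return true;
}

/* Hypothetical stand-in for the PTP hypercall: returns fixed values. */
static void fake_ptp_hvc(unsigned long *sec, unsigned long *nsec,
			 unsigned long *cycles)
{
	*sec = 1567074761UL;
	*nsec = 500;
	*cycles = 42;
}

/* Result of the one-off probe, cached for the lifetime of the driver. */
static bool ptp_service_ok;

/* Probe the service exactly once, at init time. */
int ptp_init(void)
{
	ptp_service_ok = fake_hyp_service_available();
	return ptp_service_ok ? 0 : -EOPNOTSUPP;
}

/* Single getter: cycle may be NULL when the caller only wants the time. */
int ptp_get_clock(struct ts64 *ts, uint64_t *cycle)
{
	unsigned long sec, nsec, cycles;

	assert(ts != NULL);
	if (!ptp_service_ok)
		return -EOPNOTSUPP;

	fake_ptp_hvc(&sec, &nsec, &cycles);
	ts->tv_sec = (int64_t)sec;
	ts->tv_nsec = (long)nsec;
	if (cycle)
		*cycle = cycles;

	return 0;
}
```

A caller that only needs the time passes NULL for cycle, while the
cross-timestamp path passes a pointer, so the second function goes away.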
> diff --git a/drivers/clocksource/arm_arch_timer.c b/drivers/clocksource/arm_arch_timer.c
> index 07e57a49d1e8..021e3f69364c 100644
> --- a/drivers/clocksource/arm_arch_timer.c
> +++ b/drivers/clocksource/arm_arch_timer.c
> @@ -175,23 +175,25 @@ static notrace u64 arch_counter_get_cntvct(void)
>  u64 (*arch_timer_read_counter)(void) = arch_counter_get_cntvct;
>  EXPORT_SYMBOL_GPL(arch_timer_read_counter);
>
> -static u64 arch_counter_read(struct clocksource *cs)
> +u64 arch_counter_read(struct clocksource *cs)
>  {
>  	return arch_timer_read_counter();
>  }
> +EXPORT_SYMBOL(arch_counter_read);
>
>  static u64 arch_counter_read_cc(const struct cyclecounter *cc)
>  {
>  	return arch_timer_read_counter();
>  }
>
> -static struct clocksource clocksource_counter = {
> +struct clocksource clocksource_counter = {
>  	.name	= "arch_sys_counter",
>  	.rating	= 400,
>  	.read	= arch_counter_read,
>  	.mask	= CLOCKSOURCE_MASK(56),
>  	.flags	= CLOCK_SOURCE_IS_CONTINUOUS,
>  };
> +EXPORT_SYMBOL(clocksource_counter);

I've said what I thought about this. Not happening.
>
>  static struct cyclecounter cyclecounter __ro_after_init = {
>  	.read	= arch_counter_read_cc,
> diff --git a/drivers/ptp/Kconfig b/drivers/ptp/Kconfig
> index 9b8fee5178e8..e032fafdafa7 100644
> --- a/drivers/ptp/Kconfig
> +++ b/drivers/ptp/Kconfig
> @@ -110,7 +110,7 @@ config PTP_1588_CLOCK_PCH
>  config PTP_1588_CLOCK_KVM
>  	tristate "KVM virtual PTP clock"
>  	depends on PTP_1588_CLOCK
> -	depends on KVM_GUEST && X86
> +	depends on KVM_GUEST && X86 || ARM64
>  	default y
>  	help
>  	  This driver adds support for using kvm infrastructure as a PTP
> diff --git a/include/linux/arm-smccc.h b/include/linux/arm-smccc.h
> index a6e4d3e3d10a..2a222a1a8594 100644
> --- a/include/linux/arm-smccc.h
> +++ b/include/linux/arm-smccc.h
> @@ -94,6 +94,7 @@
>
>  /* KVM "vendor specific" services */
>  #define ARM_SMCCC_KVM_FUNC_FEATURES		0
> +#define ARM_SMCCC_KVM_PTP			1
>  #define ARM_SMCCC_KVM_FUNC_FEATURES_2		127
>  #define ARM_SMCCC_KVM_NUM_FUNCS			128
>
> @@ -102,6 +103,16 @@
>  			   ARM_SMCCC_SMC_32,				\
>  			   ARM_SMCCC_OWNER_VENDOR_HYP,			\
>  			   ARM_SMCCC_KVM_FUNC_FEATURES)
> +/*
> + * This ID used for virtual ptp kvm clock and it will pass second value
> + * and nanosecond value of host real time and system counter by vcpu
> + * register to guest.
> + */
> +#define ARM_SMCCC_VENDOR_HYP_KVM_PTP_FUNC_ID				\
> +	ARM_SMCCC_CALL_VAL(ARM_SMCCC_FAST_CALL,				\
> +			   ARM_SMCCC_SMC_32,				\
> +			   ARM_SMCCC_OWNER_VENDOR_HYP,			\
> +			   ARM_SMCCC_KVM_PTP)
>
>  #ifndef __ASSEMBLY__
>
> @@ -373,5 +384,8 @@ asmlinkage void __arm_smccc_hvc(unsigned long a0, unsigned long a1,
>  		method;							\
>  	})
>
> +#include
> +#include
> +
>  #endif /*__ASSEMBLY__*/
>  #endif /*__LINUX_ARM_SMCCC_H*/
> diff --git a/virt/kvm/arm/psci.c b/virt/kvm/arm/psci.c
> index 0debf49bf259..7fffdb25d32c 100644
> --- a/virt/kvm/arm/psci.c
> +++ b/virt/kvm/arm/psci.c
> @@ -392,6 +392,8 @@ int kvm_hvc_call_handler(struct kvm_vcpu *vcpu)
>  	u32 func_id = smccc_get_function(vcpu);
>  	u32 val[4] = {};
>  	u32 option;
> +	struct timespec *ts;
> +	u64 cnt;
>
>  	val[0] = SMCCC_RET_NOT_SUPPORTED;
>
> @@ -431,6 +433,21 @@ int kvm_hvc_call_handler(struct kvm_vcpu *vcpu)
>  	case ARM_SMCCC_VENDOR_HYP_KVM_FEATURES_FUNC_ID:
>  		val[0] = BIT(ARM_SMCCC_KVM_FUNC_FEATURES);
>  		break;
> +	/*
> +	 * This will used for virtual ptp kvm clock. three
> +	 * values will be passed back.
> +	 * reg0 stores seconds of host real time;
> +	 * reg1 stores nanoseconds of host real time;
> +	 * reg2 stotes system counter cycle value.
                stores
> +	 */
> +	case ARM_SMCCC_VENDOR_HYP_KVM_PTP_FUNC_ID:
> +		getnstimeofday(ts);
> +		cnt = arch_timer_read_counter();
> +		val[0] = ts->tv_sec;
> +		val[1] = ts->tv_nsec;
> +		val[2] = cnt;

Can you explain what the purpose of exposing this counter is? The guest
should have access to the physical counter already.

> +		val[3] = 0;
> +		break;

This will probably conflict with Steven's stolen time series. Not a big
deal though.

>  	default:
>  		return kvm_psci_call(vcpu);
>  	}
>

Other questions: how does this work with VM migration? Especially when
moving from a hypervisor that supports the feature to one that doesn't?

Thanks,

	M.
-- 
Jazz is not dead, it just smells funny...