All of lore.kernel.org
 help / color / mirror / Atom feed
From: Marcelo Tosatti <mtosatti@redhat.com>
To: Oliver Upton <oupton@google.com>, Thomas Gleixner <tglx@linutronix.de>
Cc: kvm@vger.kernel.org, kvmarm@lists.cs.columbia.edu,
	Paolo Bonzini <pbonzini@redhat.com>,
	Sean Christopherson <seanjc@google.com>,
	Marc Zyngier <maz@kernel.org>, Peter Shier <pshier@google.com>,
	Jim Mattson <jmattson@google.com>,
	David Matlack <dmatlack@google.com>,
	Ricardo Koller <ricarkol@google.com>,
	Jing Zhang <jingzhangos@google.com>,
	Raghavendra Rao Anata <rananta@google.com>,
	James Morse <james.morse@arm.com>,
	Alexandru Elisei <alexandru.elisei@arm.com>,
	Suzuki K Poulose <suzuki.poulose@arm.com>,
	linux-arm-kernel@lists.infradead.org,
	Andrew Jones <drjones@redhat.com>, Will Deacon <will@kernel.org>,
	Catalin Marinas <catalin.marinas@arm.com>,
	Thomas Gleixner <tglx@linutronix.de>
Subject: Re: [PATCH v8 4/7] KVM: x86: Report host tsc and realtime values in KVM_GET_CLOCK
Date: Thu, 30 Sep 2021 16:21:07 -0300	[thread overview]
Message-ID: <20210930192107.GB19068@fuller.cnet> (raw)
In-Reply-To: <20210929185629.GA10933@fuller.cnet>

On Wed, Sep 29, 2021 at 03:56:29PM -0300, Marcelo Tosatti wrote:
> Oliver,
> 
> Do you have any numbers for the improvement in guests CLOCK_REALTIME
> accuracy across migration, when this is in place?
> 
> On Thu, Sep 16, 2021 at 06:15:35PM +0000, Oliver Upton wrote:
> > Handling the migration of TSCs correctly is difficult, in part because
> > Linux does not provide userspace with the ability to retrieve a (TSC,
> > realtime) clock pair for a single instant in time. In lieu of a more
> > convenient facility, KVM can report similar information in the kvm_clock
> > structure.
> > 
> > Provide userspace with a host TSC & realtime pair iff the realtime clock
> > is based on the TSC. If userspace provides KVM_SET_CLOCK with a valid
> > realtime value, advance the KVM clock by the amount of elapsed time. Do
> > not step the KVM clock backwards, though, as it is a monotonic
> > oscillator.
> > 
> > Suggested-by: Paolo Bonzini <pbonzini@redhat.com>
> > Signed-off-by: Oliver Upton <oupton@google.com>
> > ---
> >  Documentation/virt/kvm/api.rst  | 42 ++++++++++++++++++++++++++-------
> >  arch/x86/include/asm/kvm_host.h |  3 +++
> >  arch/x86/kvm/x86.c              | 36 +++++++++++++++++++++-------
> >  include/uapi/linux/kvm.h        |  7 +++++-
> >  4 files changed, 70 insertions(+), 18 deletions(-)
> > 
> > diff --git a/Documentation/virt/kvm/api.rst b/Documentation/virt/kvm/api.rst
> > index a6729c8cf063..d0b9c986cf6c 100644
> > --- a/Documentation/virt/kvm/api.rst
> > +++ b/Documentation/virt/kvm/api.rst
> > @@ -993,20 +993,34 @@ such as migration.
> >  When KVM_CAP_ADJUST_CLOCK is passed to KVM_CHECK_EXTENSION, it returns the
> >  set of bits that KVM can return in struct kvm_clock_data's flag member.
> >  
> > -The only flag defined now is KVM_CLOCK_TSC_STABLE.  If set, the returned
> > -value is the exact kvmclock value seen by all VCPUs at the instant
> > -when KVM_GET_CLOCK was called.  If clear, the returned value is simply
> > -CLOCK_MONOTONIC plus a constant offset; the offset can be modified
> > -with KVM_SET_CLOCK.  KVM will try to make all VCPUs follow this clock,
> > -but the exact value read by each VCPU could differ, because the host
> > -TSC is not stable.
> > +FLAGS:
> > +
> > +KVM_CLOCK_TSC_STABLE.  If set, the returned value is the exact kvmclock
> > +value seen by all VCPUs at the instant when KVM_GET_CLOCK was called.
> > +If clear, the returned value is simply CLOCK_MONOTONIC plus a constant
> > +offset; the offset can be modified with KVM_SET_CLOCK.  KVM will try
> > +to make all VCPUs follow this clock, but the exact value read by each
> > +VCPU could differ, because the host TSC is not stable.
> > +
> > +KVM_CLOCK_REALTIME.  If set, the `realtime` field in the kvm_clock_data
> > +structure is populated with the value of the host's real time
> > +clocksource at the instant when KVM_GET_CLOCK was called. If clear,
> > +the `realtime` field does not contain a value.
> > +
> > +KVM_CLOCK_HOST_TSC.  If set, the `host_tsc` field in the kvm_clock_data
> > +structure is populated with the value of the host's timestamp counter (TSC)
> > +at the instant when KVM_GET_CLOCK was called. If clear, the `host_tsc` field
> > +does not contain a value.
> >  
> >  ::
> >  
> >    struct kvm_clock_data {
> >  	__u64 clock;  /* kvmclock current value */
> >  	__u32 flags;
> > -	__u32 pad[9];
> > +	__u32 pad0;
> > +	__u64 realtime;
> > +	__u64 host_tsc;
> > +	__u32 pad[4];
> >    };
> >  
> >  
> > @@ -1023,12 +1037,22 @@ Sets the current timestamp of kvmclock to the value specified in its parameter.
> >  In conjunction with KVM_GET_CLOCK, it is used to ensure monotonicity on scenarios
> >  such as migration.
> >  
> > +FLAGS:
> > +
> > +KVM_CLOCK_REALTIME.  If set, KVM will compare the value of the `realtime` field
> > +with the value of the host's real time clocksource at the instant when
> > +KVM_SET_CLOCK was called. The difference in elapsed time is added to the final
> > +kvmclock value that will be provided to guests.
> > +
> >  ::
> >  
> >    struct kvm_clock_data {
> >  	__u64 clock;  /* kvmclock current value */
> >  	__u32 flags;
> > -	__u32 pad[9];
> > +	__u32 pad0;
> > +	__u64 realtime;
> > +	__u64 host_tsc;
> > +	__u32 pad[4];
> >    };
> >  
> >  
> > diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
> > index be6805fc0260..9c34b5b63e39 100644
> > --- a/arch/x86/include/asm/kvm_host.h
> > +++ b/arch/x86/include/asm/kvm_host.h
> > @@ -1936,4 +1936,7 @@ int kvm_cpu_dirty_log_size(void);
> >  
> >  int alloc_all_memslots_rmaps(struct kvm *kvm);
> >  
> > +#define KVM_CLOCK_VALID_FLAGS						\
> > +	(KVM_CLOCK_TSC_STABLE | KVM_CLOCK_REALTIME | KVM_CLOCK_HOST_TSC)
> > +
> >  #endif /* _ASM_X86_KVM_HOST_H */
> > diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> > index 523c4e5c109f..cb5d5cad5124 100644
> > --- a/arch/x86/kvm/x86.c
> > +++ b/arch/x86/kvm/x86.c
> > @@ -2815,10 +2815,20 @@ static void get_kvmclock(struct kvm *kvm, struct kvm_clock_data *data)
> >  	get_cpu();
> >  
> >  	if (__this_cpu_read(cpu_tsc_khz)) {
> > +#ifdef CONFIG_X86_64
> > +		struct timespec64 ts;
> > +
> > +		if (kvm_get_walltime_and_clockread(&ts, &data->host_tsc)) {
> > +			data->realtime = ts.tv_nsec + NSEC_PER_SEC * ts.tv_sec;
> > +			data->flags |= KVM_CLOCK_REALTIME | KVM_CLOCK_HOST_TSC;
> > +		} else
> > +#endif
> > +		data->host_tsc = rdtsc();
> > +
> >  		kvm_get_time_scale(NSEC_PER_SEC, __this_cpu_read(cpu_tsc_khz) * 1000LL,
> >  				   &hv_clock.tsc_shift,
> >  				   &hv_clock.tsc_to_system_mul);
> > -		data->clock = __pvclock_read_cycles(&hv_clock, rdtsc());
> > +		data->clock = __pvclock_read_cycles(&hv_clock, data->host_tsc);
> >  	} else {
> >  		data->clock = get_kvmclock_base_ns() + ka->kvmclock_offset;
> >  	}
> > @@ -4062,7 +4072,7 @@ int kvm_vm_ioctl_check_extension(struct kvm *kvm, long ext)
> >  		r = KVM_SYNC_X86_VALID_FIELDS;
> >  		break;
> >  	case KVM_CAP_ADJUST_CLOCK:
> > -		r = KVM_CLOCK_TSC_STABLE;
> > +		r = KVM_CLOCK_VALID_FLAGS;
> >  		break;
> >  	case KVM_CAP_X86_DISABLE_EXITS:
> >  		r |=  KVM_X86_DISABLE_EXITS_HLT | KVM_X86_DISABLE_EXITS_PAUSE |
> > @@ -5859,12 +5869,12 @@ static int kvm_vm_ioctl_set_clock(struct kvm *kvm, void __user *argp)
> >  {
> >  	struct kvm_arch *ka = &kvm->arch;
> >  	struct kvm_clock_data data;
> > -	u64 now_ns;
> > +	u64 now_raw_ns;
> >  
> >  	if (copy_from_user(&data, argp, sizeof(data)))
> >  		return -EFAULT;
> >  
> > -	if (data.flags)
> > +	if (data.flags & ~KVM_CLOCK_REALTIME)
> >  		return -EINVAL;
> >  
> >  	kvm_hv_invalidate_tsc_page(kvm);
> > @@ -5878,11 +5888,21 @@ static int kvm_vm_ioctl_set_clock(struct kvm *kvm, void __user *argp)
> >  	 * is slightly ahead) here we risk going negative on unsigned
> >  	 * 'system_time' when 'data.clock' is very small.
> >  	 */
> > -	if (kvm->arch.use_master_clock)
> > -		now_ns = ka->master_kernel_ns;
> > +	if (data.flags & KVM_CLOCK_REALTIME) {
> > +		u64 now_real_ns = ktime_get_real_ns();
> > +
> > +		/*
> > +		 * Avoid stepping the kvmclock backwards.
> > +		 */
> > +		if (now_real_ns > data.realtime)
> > +			data.clock += now_real_ns - data.realtime;
> > +	}
> 
> Forward jumps can also cause problems, for example:
> 
> * Kernel watchdogs
> 
> * https://patchwork.ozlabs.org/project/qemu-devel/patch/20130618233825.GA19042@amt.cnet/
> 
> So perhaps limiting the amount of forward jump that is allowed 
> would be a good thing? (which can happen if the two hosts realtime
> clocks are off).
> 
> Now by how much, i am not sure.
> 
> Or, as mentioned earlier, only enable KVM_CLOCK_REALTIME if userspace
> KVM code checks clock synchronization.
> 
> Thomas, CC'ed, has deeper understanding of problems with 
> forward time jumps than I do. Thomas, any comments?

Thomas,

Based on the earlier discussion about the problems of synchronizing
the guests clock via a notification to the NTP/Chrony daemon 
(where there is a window where applications can read the stale
value of the clock), a possible solution would be triggering
an NMI on the destination (so that it runs ASAP, with higher
priority than application/kernel).

What would this NMI do, exactly?

> As a note: this makes it not OK to use KVM_CLOCK_REALTIME flag 
> for either vm pause / vm resume (well, if paused for long periods of time) 
> or savevm / restorevm.

Maybe with the NMI above, it would be possible to use
the realtime clock as a way to know time elapsed between
events and advance guest clock without the current 
problematic window.


WARNING: multiple messages have this Message-ID (diff)
From: Marcelo Tosatti <mtosatti@redhat.com>
To: Oliver Upton <oupton@google.com>, Thomas Gleixner <tglx@linutronix.de>
Cc: Catalin Marinas <catalin.marinas@arm.com>,
	kvm@vger.kernel.org, Will Deacon <will@kernel.org>,
	Marc Zyngier <maz@kernel.org>, Peter Shier <pshier@google.com>,
	David Matlack <dmatlack@google.com>,
	Paolo Bonzini <pbonzini@redhat.com>,
	Thomas Gleixner <tglx@linutronix.de>,
	kvmarm@lists.cs.columbia.edu,
	linux-arm-kernel@lists.infradead.org,
	Jim Mattson <jmattson@google.com>
Subject: Re: [PATCH v8 4/7] KVM: x86: Report host tsc and realtime values in KVM_GET_CLOCK
Date: Thu, 30 Sep 2021 16:21:07 -0300	[thread overview]
Message-ID: <20210930192107.GB19068@fuller.cnet> (raw)
In-Reply-To: <20210929185629.GA10933@fuller.cnet>

On Wed, Sep 29, 2021 at 03:56:29PM -0300, Marcelo Tosatti wrote:
> Oliver,
> 
> Do you have any numbers for the improvement in guests CLOCK_REALTIME
> accuracy across migration, when this is in place?
> 
> On Thu, Sep 16, 2021 at 06:15:35PM +0000, Oliver Upton wrote:
> > Handling the migration of TSCs correctly is difficult, in part because
> > Linux does not provide userspace with the ability to retrieve a (TSC,
> > realtime) clock pair for a single instant in time. In lieu of a more
> > convenient facility, KVM can report similar information in the kvm_clock
> > structure.
> > 
> > Provide userspace with a host TSC & realtime pair iff the realtime clock
> > is based on the TSC. If userspace provides KVM_SET_CLOCK with a valid
> > realtime value, advance the KVM clock by the amount of elapsed time. Do
> > not step the KVM clock backwards, though, as it is a monotonic
> > oscillator.
> > 
> > Suggested-by: Paolo Bonzini <pbonzini@redhat.com>
> > Signed-off-by: Oliver Upton <oupton@google.com>
> > ---
> >  Documentation/virt/kvm/api.rst  | 42 ++++++++++++++++++++++++++-------
> >  arch/x86/include/asm/kvm_host.h |  3 +++
> >  arch/x86/kvm/x86.c              | 36 +++++++++++++++++++++-------
> >  include/uapi/linux/kvm.h        |  7 +++++-
> >  4 files changed, 70 insertions(+), 18 deletions(-)
> > 
> > diff --git a/Documentation/virt/kvm/api.rst b/Documentation/virt/kvm/api.rst
> > index a6729c8cf063..d0b9c986cf6c 100644
> > --- a/Documentation/virt/kvm/api.rst
> > +++ b/Documentation/virt/kvm/api.rst
> > @@ -993,20 +993,34 @@ such as migration.
> >  When KVM_CAP_ADJUST_CLOCK is passed to KVM_CHECK_EXTENSION, it returns the
> >  set of bits that KVM can return in struct kvm_clock_data's flag member.
> >  
> > -The only flag defined now is KVM_CLOCK_TSC_STABLE.  If set, the returned
> > -value is the exact kvmclock value seen by all VCPUs at the instant
> > -when KVM_GET_CLOCK was called.  If clear, the returned value is simply
> > -CLOCK_MONOTONIC plus a constant offset; the offset can be modified
> > -with KVM_SET_CLOCK.  KVM will try to make all VCPUs follow this clock,
> > -but the exact value read by each VCPU could differ, because the host
> > -TSC is not stable.
> > +FLAGS:
> > +
> > +KVM_CLOCK_TSC_STABLE.  If set, the returned value is the exact kvmclock
> > +value seen by all VCPUs at the instant when KVM_GET_CLOCK was called.
> > +If clear, the returned value is simply CLOCK_MONOTONIC plus a constant
> > +offset; the offset can be modified with KVM_SET_CLOCK.  KVM will try
> > +to make all VCPUs follow this clock, but the exact value read by each
> > +VCPU could differ, because the host TSC is not stable.
> > +
> > +KVM_CLOCK_REALTIME.  If set, the `realtime` field in the kvm_clock_data
> > +structure is populated with the value of the host's real time
> > +clocksource at the instant when KVM_GET_CLOCK was called. If clear,
> > +the `realtime` field does not contain a value.
> > +
> > +KVM_CLOCK_HOST_TSC.  If set, the `host_tsc` field in the kvm_clock_data
> > +structure is populated with the value of the host's timestamp counter (TSC)
> > +at the instant when KVM_GET_CLOCK was called. If clear, the `host_tsc` field
> > +does not contain a value.
> >  
> >  ::
> >  
> >    struct kvm_clock_data {
> >  	__u64 clock;  /* kvmclock current value */
> >  	__u32 flags;
> > -	__u32 pad[9];
> > +	__u32 pad0;
> > +	__u64 realtime;
> > +	__u64 host_tsc;
> > +	__u32 pad[4];
> >    };
> >  
> >  
> > @@ -1023,12 +1037,22 @@ Sets the current timestamp of kvmclock to the value specified in its parameter.
> >  In conjunction with KVM_GET_CLOCK, it is used to ensure monotonicity on scenarios
> >  such as migration.
> >  
> > +FLAGS:
> > +
> > +KVM_CLOCK_REALTIME.  If set, KVM will compare the value of the `realtime` field
> > +with the value of the host's real time clocksource at the instant when
> > +KVM_SET_CLOCK was called. The difference in elapsed time is added to the final
> > +kvmclock value that will be provided to guests.
> > +
> >  ::
> >  
> >    struct kvm_clock_data {
> >  	__u64 clock;  /* kvmclock current value */
> >  	__u32 flags;
> > -	__u32 pad[9];
> > +	__u32 pad0;
> > +	__u64 realtime;
> > +	__u64 host_tsc;
> > +	__u32 pad[4];
> >    };
> >  
> >  
> > diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
> > index be6805fc0260..9c34b5b63e39 100644
> > --- a/arch/x86/include/asm/kvm_host.h
> > +++ b/arch/x86/include/asm/kvm_host.h
> > @@ -1936,4 +1936,7 @@ int kvm_cpu_dirty_log_size(void);
> >  
> >  int alloc_all_memslots_rmaps(struct kvm *kvm);
> >  
> > +#define KVM_CLOCK_VALID_FLAGS						\
> > +	(KVM_CLOCK_TSC_STABLE | KVM_CLOCK_REALTIME | KVM_CLOCK_HOST_TSC)
> > +
> >  #endif /* _ASM_X86_KVM_HOST_H */
> > diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> > index 523c4e5c109f..cb5d5cad5124 100644
> > --- a/arch/x86/kvm/x86.c
> > +++ b/arch/x86/kvm/x86.c
> > @@ -2815,10 +2815,20 @@ static void get_kvmclock(struct kvm *kvm, struct kvm_clock_data *data)
> >  	get_cpu();
> >  
> >  	if (__this_cpu_read(cpu_tsc_khz)) {
> > +#ifdef CONFIG_X86_64
> > +		struct timespec64 ts;
> > +
> > +		if (kvm_get_walltime_and_clockread(&ts, &data->host_tsc)) {
> > +			data->realtime = ts.tv_nsec + NSEC_PER_SEC * ts.tv_sec;
> > +			data->flags |= KVM_CLOCK_REALTIME | KVM_CLOCK_HOST_TSC;
> > +		} else
> > +#endif
> > +		data->host_tsc = rdtsc();
> > +
> >  		kvm_get_time_scale(NSEC_PER_SEC, __this_cpu_read(cpu_tsc_khz) * 1000LL,
> >  				   &hv_clock.tsc_shift,
> >  				   &hv_clock.tsc_to_system_mul);
> > -		data->clock = __pvclock_read_cycles(&hv_clock, rdtsc());
> > +		data->clock = __pvclock_read_cycles(&hv_clock, data->host_tsc);
> >  	} else {
> >  		data->clock = get_kvmclock_base_ns() + ka->kvmclock_offset;
> >  	}
> > @@ -4062,7 +4072,7 @@ int kvm_vm_ioctl_check_extension(struct kvm *kvm, long ext)
> >  		r = KVM_SYNC_X86_VALID_FIELDS;
> >  		break;
> >  	case KVM_CAP_ADJUST_CLOCK:
> > -		r = KVM_CLOCK_TSC_STABLE;
> > +		r = KVM_CLOCK_VALID_FLAGS;
> >  		break;
> >  	case KVM_CAP_X86_DISABLE_EXITS:
> >  		r |=  KVM_X86_DISABLE_EXITS_HLT | KVM_X86_DISABLE_EXITS_PAUSE |
> > @@ -5859,12 +5869,12 @@ static int kvm_vm_ioctl_set_clock(struct kvm *kvm, void __user *argp)
> >  {
> >  	struct kvm_arch *ka = &kvm->arch;
> >  	struct kvm_clock_data data;
> > -	u64 now_ns;
> > +	u64 now_raw_ns;
> >  
> >  	if (copy_from_user(&data, argp, sizeof(data)))
> >  		return -EFAULT;
> >  
> > -	if (data.flags)
> > +	if (data.flags & ~KVM_CLOCK_REALTIME)
> >  		return -EINVAL;
> >  
> >  	kvm_hv_invalidate_tsc_page(kvm);
> > @@ -5878,11 +5888,21 @@ static int kvm_vm_ioctl_set_clock(struct kvm *kvm, void __user *argp)
> >  	 * is slightly ahead) here we risk going negative on unsigned
> >  	 * 'system_time' when 'data.clock' is very small.
> >  	 */
> > -	if (kvm->arch.use_master_clock)
> > -		now_ns = ka->master_kernel_ns;
> > +	if (data.flags & KVM_CLOCK_REALTIME) {
> > +		u64 now_real_ns = ktime_get_real_ns();
> > +
> > +		/*
> > +		 * Avoid stepping the kvmclock backwards.
> > +		 */
> > +		if (now_real_ns > data.realtime)
> > +			data.clock += now_real_ns - data.realtime;
> > +	}
> 
> Forward jumps can also cause problems, for example:
> 
> * Kernel watchdogs
> 
> * https://patchwork.ozlabs.org/project/qemu-devel/patch/20130618233825.GA19042@amt.cnet/
> 
> So perhaps limiting the amount of forward jump that is allowed 
> would be a good thing? (which can happen if the two hosts realtime
> clocks are off).
> 
> Now by how much, i am not sure.
> 
> Or, as mentioned earlier, only enable KVM_CLOCK_REALTIME if userspace
> KVM code checks clock synchronization.
> 
> Thomas, CC'ed, has deeper understanding of problems with 
> forward time jumps than I do. Thomas, any comments?

Thomas,

Based on the earlier discussion about the problems of synchronizing
the guests clock via a notification to the NTP/Chrony daemon 
(where there is a window where applications can read the stale
value of the clock), a possible solution would be triggering
an NMI on the destination (so that it runs ASAP, with higher
priority than application/kernel).

What would this NMI do, exactly?

> As a note: this makes it not OK to use KVM_CLOCK_REALTIME flag 
> for either vm pause / vm resume (well, if paused for long periods of time) 
> or savevm / restorevm.

Maybe with the NMI above, it would be possible to use
the realtime clock as a way to know time elapsed between
events and advance guest clock without the current 
problematic window.

_______________________________________________
kvmarm mailing list
kvmarm@lists.cs.columbia.edu
https://lists.cs.columbia.edu/mailman/listinfo/kvmarm

WARNING: multiple messages have this Message-ID (diff)
From: Marcelo Tosatti <mtosatti@redhat.com>
To: Oliver Upton <oupton@google.com>, Thomas Gleixner <tglx@linutronix.de>
Cc: kvm@vger.kernel.org, kvmarm@lists.cs.columbia.edu,
	Paolo Bonzini <pbonzini@redhat.com>,
	Sean Christopherson <seanjc@google.com>,
	Marc Zyngier <maz@kernel.org>, Peter Shier <pshier@google.com>,
	Jim Mattson <jmattson@google.com>,
	David Matlack <dmatlack@google.com>,
	Ricardo Koller <ricarkol@google.com>,
	Jing Zhang <jingzhangos@google.com>,
	Raghavendra Rao Anata <rananta@google.com>,
	James Morse <james.morse@arm.com>,
	Alexandru Elisei <alexandru.elisei@arm.com>,
	Suzuki K Poulose <suzuki.poulose@arm.com>,
	linux-arm-kernel@lists.infradead.org,
	Andrew Jones <drjones@redhat.com>, Will Deacon <will@kernel.org>,
	Catalin Marinas <catalin.marinas@arm.com>,
	Thomas Gleixner <tglx@linutronix.de>
Subject: Re: [PATCH v8 4/7] KVM: x86: Report host tsc and realtime values in KVM_GET_CLOCK
Date: Thu, 30 Sep 2021 16:21:07 -0300	[thread overview]
Message-ID: <20210930192107.GB19068@fuller.cnet> (raw)
In-Reply-To: <20210929185629.GA10933@fuller.cnet>

On Wed, Sep 29, 2021 at 03:56:29PM -0300, Marcelo Tosatti wrote:
> Oliver,
> 
> Do you have any numbers for the improvement in guests CLOCK_REALTIME
> accuracy across migration, when this is in place?
> 
> On Thu, Sep 16, 2021 at 06:15:35PM +0000, Oliver Upton wrote:
> > Handling the migration of TSCs correctly is difficult, in part because
> > Linux does not provide userspace with the ability to retrieve a (TSC,
> > realtime) clock pair for a single instant in time. In lieu of a more
> > convenient facility, KVM can report similar information in the kvm_clock
> > structure.
> > 
> > Provide userspace with a host TSC & realtime pair iff the realtime clock
> > is based on the TSC. If userspace provides KVM_SET_CLOCK with a valid
> > realtime value, advance the KVM clock by the amount of elapsed time. Do
> > not step the KVM clock backwards, though, as it is a monotonic
> > oscillator.
> > 
> > Suggested-by: Paolo Bonzini <pbonzini@redhat.com>
> > Signed-off-by: Oliver Upton <oupton@google.com>
> > ---
> >  Documentation/virt/kvm/api.rst  | 42 ++++++++++++++++++++++++++-------
> >  arch/x86/include/asm/kvm_host.h |  3 +++
> >  arch/x86/kvm/x86.c              | 36 +++++++++++++++++++++-------
> >  include/uapi/linux/kvm.h        |  7 +++++-
> >  4 files changed, 70 insertions(+), 18 deletions(-)
> > 
> > diff --git a/Documentation/virt/kvm/api.rst b/Documentation/virt/kvm/api.rst
> > index a6729c8cf063..d0b9c986cf6c 100644
> > --- a/Documentation/virt/kvm/api.rst
> > +++ b/Documentation/virt/kvm/api.rst
> > @@ -993,20 +993,34 @@ such as migration.
> >  When KVM_CAP_ADJUST_CLOCK is passed to KVM_CHECK_EXTENSION, it returns the
> >  set of bits that KVM can return in struct kvm_clock_data's flag member.
> >  
> > -The only flag defined now is KVM_CLOCK_TSC_STABLE.  If set, the returned
> > -value is the exact kvmclock value seen by all VCPUs at the instant
> > -when KVM_GET_CLOCK was called.  If clear, the returned value is simply
> > -CLOCK_MONOTONIC plus a constant offset; the offset can be modified
> > -with KVM_SET_CLOCK.  KVM will try to make all VCPUs follow this clock,
> > -but the exact value read by each VCPU could differ, because the host
> > -TSC is not stable.
> > +FLAGS:
> > +
> > +KVM_CLOCK_TSC_STABLE.  If set, the returned value is the exact kvmclock
> > +value seen by all VCPUs at the instant when KVM_GET_CLOCK was called.
> > +If clear, the returned value is simply CLOCK_MONOTONIC plus a constant
> > +offset; the offset can be modified with KVM_SET_CLOCK.  KVM will try
> > +to make all VCPUs follow this clock, but the exact value read by each
> > +VCPU could differ, because the host TSC is not stable.
> > +
> > +KVM_CLOCK_REALTIME.  If set, the `realtime` field in the kvm_clock_data
> > +structure is populated with the value of the host's real time
> > +clocksource at the instant when KVM_GET_CLOCK was called. If clear,
> > +the `realtime` field does not contain a value.
> > +
> > +KVM_CLOCK_HOST_TSC.  If set, the `host_tsc` field in the kvm_clock_data
> > +structure is populated with the value of the host's timestamp counter (TSC)
> > +at the instant when KVM_GET_CLOCK was called. If clear, the `host_tsc` field
> > +does not contain a value.
> >  
> >  ::
> >  
> >    struct kvm_clock_data {
> >  	__u64 clock;  /* kvmclock current value */
> >  	__u32 flags;
> > -	__u32 pad[9];
> > +	__u32 pad0;
> > +	__u64 realtime;
> > +	__u64 host_tsc;
> > +	__u32 pad[4];
> >    };
> >  
> >  
> > @@ -1023,12 +1037,22 @@ Sets the current timestamp of kvmclock to the value specified in its parameter.
> >  In conjunction with KVM_GET_CLOCK, it is used to ensure monotonicity on scenarios
> >  such as migration.
> >  
> > +FLAGS:
> > +
> > +KVM_CLOCK_REALTIME.  If set, KVM will compare the value of the `realtime` field
> > +with the value of the host's real time clocksource at the instant when
> > +KVM_SET_CLOCK was called. The difference in elapsed time is added to the final
> > +kvmclock value that will be provided to guests.
> > +
> >  ::
> >  
> >    struct kvm_clock_data {
> >  	__u64 clock;  /* kvmclock current value */
> >  	__u32 flags;
> > -	__u32 pad[9];
> > +	__u32 pad0;
> > +	__u64 realtime;
> > +	__u64 host_tsc;
> > +	__u32 pad[4];
> >    };
> >  
> >  
> > diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
> > index be6805fc0260..9c34b5b63e39 100644
> > --- a/arch/x86/include/asm/kvm_host.h
> > +++ b/arch/x86/include/asm/kvm_host.h
> > @@ -1936,4 +1936,7 @@ int kvm_cpu_dirty_log_size(void);
> >  
> >  int alloc_all_memslots_rmaps(struct kvm *kvm);
> >  
> > +#define KVM_CLOCK_VALID_FLAGS						\
> > +	(KVM_CLOCK_TSC_STABLE | KVM_CLOCK_REALTIME | KVM_CLOCK_HOST_TSC)
> > +
> >  #endif /* _ASM_X86_KVM_HOST_H */
> > diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> > index 523c4e5c109f..cb5d5cad5124 100644
> > --- a/arch/x86/kvm/x86.c
> > +++ b/arch/x86/kvm/x86.c
> > @@ -2815,10 +2815,20 @@ static void get_kvmclock(struct kvm *kvm, struct kvm_clock_data *data)
> >  	get_cpu();
> >  
> >  	if (__this_cpu_read(cpu_tsc_khz)) {
> > +#ifdef CONFIG_X86_64
> > +		struct timespec64 ts;
> > +
> > +		if (kvm_get_walltime_and_clockread(&ts, &data->host_tsc)) {
> > +			data->realtime = ts.tv_nsec + NSEC_PER_SEC * ts.tv_sec;
> > +			data->flags |= KVM_CLOCK_REALTIME | KVM_CLOCK_HOST_TSC;
> > +		} else
> > +#endif
> > +		data->host_tsc = rdtsc();
> > +
> >  		kvm_get_time_scale(NSEC_PER_SEC, __this_cpu_read(cpu_tsc_khz) * 1000LL,
> >  				   &hv_clock.tsc_shift,
> >  				   &hv_clock.tsc_to_system_mul);
> > -		data->clock = __pvclock_read_cycles(&hv_clock, rdtsc());
> > +		data->clock = __pvclock_read_cycles(&hv_clock, data->host_tsc);
> >  	} else {
> >  		data->clock = get_kvmclock_base_ns() + ka->kvmclock_offset;
> >  	}
> > @@ -4062,7 +4072,7 @@ int kvm_vm_ioctl_check_extension(struct kvm *kvm, long ext)
> >  		r = KVM_SYNC_X86_VALID_FIELDS;
> >  		break;
> >  	case KVM_CAP_ADJUST_CLOCK:
> > -		r = KVM_CLOCK_TSC_STABLE;
> > +		r = KVM_CLOCK_VALID_FLAGS;
> >  		break;
> >  	case KVM_CAP_X86_DISABLE_EXITS:
> >  		r |=  KVM_X86_DISABLE_EXITS_HLT | KVM_X86_DISABLE_EXITS_PAUSE |
> > @@ -5859,12 +5869,12 @@ static int kvm_vm_ioctl_set_clock(struct kvm *kvm, void __user *argp)
> >  {
> >  	struct kvm_arch *ka = &kvm->arch;
> >  	struct kvm_clock_data data;
> > -	u64 now_ns;
> > +	u64 now_raw_ns;
> >  
> >  	if (copy_from_user(&data, argp, sizeof(data)))
> >  		return -EFAULT;
> >  
> > -	if (data.flags)
> > +	if (data.flags & ~KVM_CLOCK_REALTIME)
> >  		return -EINVAL;
> >  
> >  	kvm_hv_invalidate_tsc_page(kvm);
> > @@ -5878,11 +5888,21 @@ static int kvm_vm_ioctl_set_clock(struct kvm *kvm, void __user *argp)
> >  	 * is slightly ahead) here we risk going negative on unsigned
> >  	 * 'system_time' when 'data.clock' is very small.
> >  	 */
> > -	if (kvm->arch.use_master_clock)
> > -		now_ns = ka->master_kernel_ns;
> > +	if (data.flags & KVM_CLOCK_REALTIME) {
> > +		u64 now_real_ns = ktime_get_real_ns();
> > +
> > +		/*
> > +		 * Avoid stepping the kvmclock backwards.
> > +		 */
> > +		if (now_real_ns > data.realtime)
> > +			data.clock += now_real_ns - data.realtime;
> > +	}
> 
> Forward jumps can also cause problems, for example:
> 
> * Kernel watchdogs
> 
> * https://patchwork.ozlabs.org/project/qemu-devel/patch/20130618233825.GA19042@amt.cnet/
> 
> So perhaps limiting the amount of forward jump that is allowed 
> would be a good thing? (which can happen if the two hosts realtime
> clocks are off).
> 
> Now by how much, i am not sure.
> 
> Or, as mentioned earlier, only enable KVM_CLOCK_REALTIME if userspace
> KVM code checks clock synchronization.
> 
> Thomas, CC'ed, has deeper understanding of problems with 
> forward time jumps than I do. Thomas, any comments?

Thomas,

Based on the earlier discussion about the problems of synchronizing
the guests clock via a notification to the NTP/Chrony daemon 
(where there is a window where applications can read the stale
value of the clock), a possible solution would be triggering
an NMI on the destination (so that it runs ASAP, with higher
priority than application/kernel).

What would this NMI do, exactly?

> As a note: this makes it not OK to use KVM_CLOCK_REALTIME flag 
> for either vm pause / vm resume (well, if paused for long periods of time) 
> or savevm / restorevm.

Maybe with the NMI above, it would be possible to use
the realtime clock as a way to know time elapsed between
events and advance guest clock without the current 
problematic window.


_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

  reply	other threads:[~2021-09-30 19:23 UTC|newest]

Thread overview: 113+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2021-09-16 18:15 [PATCH v8 0/7] KVM: x86: Add idempotent controls for migrating system counter state Oliver Upton
2021-09-16 18:15 ` Oliver Upton
2021-09-16 18:15 ` Oliver Upton
2021-09-16 18:15 ` [PATCH v8 1/7] kvm: x86: abstract locking around pvclock_update_vm_gtod_copy Oliver Upton
2021-09-16 18:15   ` Oliver Upton
2021-09-16 18:15   ` Oliver Upton
2021-09-16 18:15 ` [PATCH v8 2/7] KVM: x86: extract KVM_GET_CLOCK/KVM_SET_CLOCK to separate functions Oliver Upton
2021-09-16 18:15   ` Oliver Upton
2021-09-16 18:15   ` Oliver Upton
2021-09-16 18:15 ` [PATCH v8 3/7] KVM: x86: Fix potential race in KVM_GET_CLOCK Oliver Upton
2021-09-16 18:15   ` Oliver Upton
2021-09-16 18:15   ` Oliver Upton
2021-09-29 13:33   ` Marcelo Tosatti
2021-09-29 13:33     ` Marcelo Tosatti
2021-09-29 13:33     ` Marcelo Tosatti
2021-09-16 18:15 ` [PATCH v8 4/7] KVM: x86: Report host tsc and realtime values " Oliver Upton
2021-09-16 18:15   ` Oliver Upton
2021-09-16 18:15   ` Oliver Upton
2021-09-28 18:53   ` Marcelo Tosatti
2021-09-28 18:53     ` Marcelo Tosatti
2021-09-28 18:53     ` Marcelo Tosatti
2021-09-29 11:20     ` Paolo Bonzini
2021-09-29 11:20       ` Paolo Bonzini
2021-09-29 11:20       ` Paolo Bonzini
2021-09-29 18:56   ` Marcelo Tosatti
2021-09-29 18:56     ` Marcelo Tosatti
2021-09-29 18:56     ` Marcelo Tosatti
2021-09-30 19:21     ` Marcelo Tosatti [this message]
2021-09-30 19:21       ` Marcelo Tosatti
2021-09-30 19:21       ` Marcelo Tosatti
2021-09-30 23:02       ` Thomas Gleixner
2021-09-30 23:02         ` Thomas Gleixner
2021-09-30 23:02         ` Thomas Gleixner
2021-10-01 12:05         ` Marcelo Tosatti
2021-10-01 12:05           ` Marcelo Tosatti
2021-10-01 12:05           ` Marcelo Tosatti
2021-10-01 12:10           ` Marcelo Tosatti
2021-10-01 12:10             ` Marcelo Tosatti
2021-10-01 12:10             ` Marcelo Tosatti
2021-10-01 19:59           ` Thomas Gleixner
2021-10-01 19:59             ` Thomas Gleixner
2021-10-01 19:59             ` Thomas Gleixner
2021-10-01 21:03             ` Oliver Upton
2021-10-01 21:03               ` Oliver Upton
2021-10-01 21:03               ` Oliver Upton
2021-10-01 14:17         ` Paolo Bonzini
2021-10-01 14:17           ` Paolo Bonzini
2021-10-01 14:17           ` Paolo Bonzini
2021-10-01 14:39   ` Paolo Bonzini
2021-10-01 14:39     ` Paolo Bonzini
2021-10-01 14:39     ` Paolo Bonzini
2021-10-01 14:41     ` Paolo Bonzini
2021-10-01 14:41       ` Paolo Bonzini
2021-10-01 14:41       ` Paolo Bonzini
2021-10-01 15:39       ` Oliver Upton
2021-10-01 15:39         ` Oliver Upton
2021-10-01 15:39         ` Oliver Upton
2021-10-01 16:42         ` Paolo Bonzini
2021-10-01 16:42           ` Paolo Bonzini
2021-10-01 16:42           ` Paolo Bonzini
2024-01-17 14:28   ` David Woodhouse
2024-01-17 14:28     ` David Woodhouse
2021-09-16 18:15 ` [PATCH v8 5/7] kvm: x86: protect masterclock with a seqcount Oliver Upton
2021-09-16 18:15   ` Oliver Upton
2021-09-16 18:15   ` Oliver Upton
2021-09-24 16:42   ` Paolo Bonzini
2021-09-24 16:42     ` Paolo Bonzini
2021-09-24 16:42     ` Paolo Bonzini
2021-09-30 17:51   ` Marcelo Tosatti
2021-09-30 17:51     ` Marcelo Tosatti
2021-09-30 17:51     ` Marcelo Tosatti
2021-10-01 16:48   ` Paolo Bonzini
2021-10-01 16:48     ` Paolo Bonzini
2021-10-01 16:48     ` Paolo Bonzini
2021-09-16 18:15 ` [PATCH v8 6/7] KVM: x86: Refactor tsc synchronization code Oliver Upton
2021-09-16 18:15   ` Oliver Upton
2021-09-16 18:15   ` Oliver Upton
2021-09-16 18:15 ` [PATCH v8 7/7] KVM: x86: Expose TSC offset controls to userspace Oliver Upton
2021-09-16 18:15   ` Oliver Upton
2021-09-16 18:15   ` Oliver Upton
2021-09-30 19:14   ` Marcelo Tosatti
2021-09-30 19:14     ` Marcelo Tosatti
2021-09-30 19:14     ` Marcelo Tosatti
2021-10-01  9:17     ` Paolo Bonzini
2021-10-01  9:17       ` Paolo Bonzini
2021-10-01  9:17       ` Paolo Bonzini
2021-10-01 10:32       ` Marcelo Tosatti
2021-10-01 10:32         ` Marcelo Tosatti
2021-10-01 10:32         ` Marcelo Tosatti
2021-10-01 15:12         ` Paolo Bonzini
2021-10-01 15:12           ` Paolo Bonzini
2021-10-01 15:12           ` Paolo Bonzini
2021-10-01 19:11           ` Marcelo Tosatti
2021-10-01 19:11             ` Marcelo Tosatti
2021-10-01 19:11             ` Marcelo Tosatti
2021-10-01 19:33             ` Oliver Upton
2021-10-01 19:33               ` Oliver Upton
2021-10-01 19:33               ` Oliver Upton
2021-10-04 14:30               ` Marcelo Tosatti
2021-10-04 14:30                 ` Marcelo Tosatti
2021-10-04 14:30                 ` Marcelo Tosatti
2021-10-04 11:44             ` Paolo Bonzini
2021-10-04 11:44               ` Paolo Bonzini
2021-10-04 11:44               ` Paolo Bonzini
2021-10-05 15:22   ` Sean Christopherson
2021-10-05 15:22     ` Sean Christopherson
2021-10-05 15:22     ` Sean Christopherson
2022-02-23 10:02   ` David Woodhouse
2022-02-23 10:02     ` David Woodhouse
2022-02-23 10:02     ` David Woodhouse
2021-09-24 16:43 ` [PATCH v8 0/7] KVM: x86: Add idempotent controls for migrating system counter state Paolo Bonzini
2021-09-24 16:43   ` Paolo Bonzini
2021-09-24 16:43   ` Paolo Bonzini

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20210930192107.GB19068@fuller.cnet \
    --to=mtosatti@redhat.com \
    --cc=alexandru.elisei@arm.com \
    --cc=catalin.marinas@arm.com \
    --cc=dmatlack@google.com \
    --cc=drjones@redhat.com \
    --cc=james.morse@arm.com \
    --cc=jingzhangos@google.com \
    --cc=jmattson@google.com \
    --cc=kvm@vger.kernel.org \
    --cc=kvmarm@lists.cs.columbia.edu \
    --cc=linux-arm-kernel@lists.infradead.org \
    --cc=maz@kernel.org \
    --cc=oupton@google.com \
    --cc=pbonzini@redhat.com \
    --cc=pshier@google.com \
    --cc=rananta@google.com \
    --cc=ricarkol@google.com \
    --cc=seanjc@google.com \
    --cc=suzuki.poulose@arm.com \
    --cc=tglx@linutronix.de \
    --cc=will@kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.