* [PATCH 1/2] KVM: x86: reorganize pvclock_gtod_data members
From: Paolo Bonzini @ 2020-01-22 14:22 UTC
To: linux-kernel, kvm; +Cc: mtosatti, stable
We will need a copy of tk->offs_boot in the next patch. Store it and
clean up the struct: instead of storing tk->tkr_xxx.base with tk->offs_boot
included, store the raw value in struct pvclock_clock and add tk->offs_boot
at read time in do_monotonic_raw. tk->tkr_xxx.xtime_nsec also moves
to struct pvclock_clock, where do_monotonic_raw and do_realtime read it.
While at it, fix a (usually harmless) typo in do_monotonic_raw, which
was using gtod->clock.shift instead of gtod->raw_clock.shift.
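A minimal userspace sketch of the reorganized read path, assuming plain integers in place of ktime_t and a caller-supplied cycle count in place of vgettsc(). The struct and function names are hypothetical; only the field names follow the patch. It shows where base_cycles, raw_clock.shift, offset and offs_boot each enter the computation.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical, simplified model of the patched struct pvclock_clock.
 * Field names follow the patch; types and helpers are illustrative. */
struct pvclock_clock_model {
	uint64_t cycle_last;  /* clocksource reading at the last timekeeper update */
	uint64_t mask;        /* clocksource width mask */
	uint32_t mult;        /* cycles -> shifted-nanoseconds multiplier */
	uint32_t shift;       /* number of fractional nanosecond bits */
	uint64_t base_cycles; /* tkr_xxx.xtime_nsec: nanoseconds << shift */
	int64_t offset;       /* tkr_xxx.base, in nanoseconds */
};

/* Mirrors the arithmetic of the patched do_monotonic_raw(): scale the
 * elapsed cycles, shift by raw_clock.shift (not clock.shift), then add
 * the raw base and offs_boot at read time instead of a precomputed
 * boot_ns_raw. */
static int64_t model_monotonic_raw(const struct pvclock_clock_model *raw,
				   int64_t offs_boot, uint64_t now_cycles)
{
	uint64_t delta = (now_cycles - raw->cycle_last) & raw->mask;
	uint64_t ns = raw->base_cycles + delta * raw->mult;

	assert(raw->shift < 64);
	ns >>= raw->shift;
	return (int64_t)ns + raw->offset + offs_boot;
}
```

Keeping offset and offs_boot separate means the same stored offs_boot can later be reused on its own, which is what the next patch needs.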
Fixes: 53fafdbb8b21f ("KVM: x86: switch KVMCLOCK base to monotonic raw clock")
Cc: stable@vger.kernel.org
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
---
arch/x86/kvm/x86.c | 29 ++++++++++++-----------------
1 file changed, 12 insertions(+), 17 deletions(-)
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 89621025577a..1b4273cce63c 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -1532,6 +1532,8 @@ struct pvclock_clock {
u64 mask;
u32 mult;
u32 shift;
+ u64 base_cycles;
+ u64 offset;
};
struct pvclock_gtod_data {
@@ -1540,11 +1542,8 @@ struct pvclock_gtod_data {
struct pvclock_clock clock; /* extract of a clocksource struct */
struct pvclock_clock raw_clock; /* extract of a clocksource struct */
- u64 boot_ns_raw;
- u64 boot_ns;
- u64 nsec_base;
+ ktime_t offs_boot;
u64 wall_time_sec;
- u64 monotonic_raw_nsec;
};
static struct pvclock_gtod_data pvclock_gtod_data;
@@ -1552,10 +1551,6 @@ struct pvclock_gtod_data {
static void update_pvclock_gtod(struct timekeeper *tk)
{
struct pvclock_gtod_data *vdata = &pvclock_gtod_data;
- u64 boot_ns, boot_ns_raw;
-
- boot_ns = ktime_to_ns(ktime_add(tk->tkr_mono.base, tk->offs_boot));
- boot_ns_raw = ktime_to_ns(ktime_add(tk->tkr_raw.base, tk->offs_boot));
write_seqcount_begin(&vdata->seq);
@@ -1565,20 +1560,20 @@ static void update_pvclock_gtod(struct timekeeper *tk)
vdata->clock.mask = tk->tkr_mono.mask;
vdata->clock.mult = tk->tkr_mono.mult;
vdata->clock.shift = tk->tkr_mono.shift;
+ vdata->clock.base_cycles = tk->tkr_mono.xtime_nsec;
+ vdata->clock.offset = tk->tkr_mono.base;
vdata->raw_clock.vclock_mode = tk->tkr_raw.clock->archdata.vclock_mode;
vdata->raw_clock.cycle_last = tk->tkr_raw.cycle_last;
vdata->raw_clock.mask = tk->tkr_raw.mask;
vdata->raw_clock.mult = tk->tkr_raw.mult;
vdata->raw_clock.shift = tk->tkr_raw.shift;
-
- vdata->boot_ns = boot_ns;
- vdata->nsec_base = tk->tkr_mono.xtime_nsec;
+ vdata->raw_clock.base_cycles = tk->tkr_raw.xtime_nsec;
+ vdata->raw_clock.offset = tk->tkr_raw.base;
vdata->wall_time_sec = tk->xtime_sec;
- vdata->boot_ns_raw = boot_ns_raw;
- vdata->monotonic_raw_nsec = tk->tkr_raw.xtime_nsec;
+ vdata->offs_boot = tk->offs_boot;
write_seqcount_end(&vdata->seq);
}
@@ -2048,10 +2043,10 @@ static int do_monotonic_raw(s64 *t, u64 *tsc_timestamp)
do {
seq = read_seqcount_begin(&gtod->seq);
- ns = gtod->monotonic_raw_nsec;
+ ns = gtod->raw_clock.base_cycles;
ns += vgettsc(&gtod->raw_clock, tsc_timestamp, &mode);
- ns >>= gtod->clock.shift;
- ns += gtod->boot_ns_raw;
+ ns >>= gtod->raw_clock.shift;
+ ns += ktime_to_ns(ktime_add(gtod->raw_clock.offset, gtod->offs_boot));
} while (unlikely(read_seqcount_retry(&gtod->seq, seq)));
*t = ns;
@@ -2068,7 +2063,7 @@ static int do_realtime(struct timespec64 *ts, u64 *tsc_timestamp)
do {
seq = read_seqcount_begin(&gtod->seq);
ts->tv_sec = gtod->wall_time_sec;
- ns = gtod->nsec_base;
+ ns = gtod->clock.base_cycles;
ns += vgettsc(&gtod->clock, tsc_timestamp, &mode);
ns >>= gtod->clock.shift;
} while (unlikely(read_seqcount_retry(&gtod->seq, seq)));
--
1.8.3.1
* [PATCH 2/2] KVM: x86: use raw clock values consistently
From: Paolo Bonzini @ 2020-01-22 14:22 UTC
To: linux-kernel, kvm; +Cc: mtosatti, stable
Commit 53fafdbb8b21f ("KVM: x86: switch KVMCLOCK base to monotonic raw
clock") changed kvmclock to use tkr_raw instead of tkr_mono. However,
the default kvmclock_offset for the VM was still based on the monotonic
clock and, if the raw clock drifted enough from the monotonic clock,
this could cause a negative system_time to be written to the guest's
struct pvclock. RHEL5 does not like it and (if it boots fast enough to
observe a negative time value) it hangs.
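The failure mode can be modelled with plain nanosecond counters. This is a sketch under an assumed scenario (one day of host uptime, with NTP having run the monotonic clock 10 ppm faster than the raw clock; both numbers are hypothetical), and the helper names are illustrative, not KVM functions. kvmclock_offset is the negated host base reading at VM creation, and system_time is the raw-based reading plus that offset, so a monotonic-based offset leaves the accumulated drift in system_time.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: kvmclock_offset as set at VM creation, i.e. the
 * negated host base reading taken at that moment. */
static int64_t model_vm_offset(int64_t host_base_ns_at_creation)
{
	return -host_base_ns_at_creation;
}

/* Hypothetical helper: system_time as written to the guest, i.e. the
 * host raw-based boot time plus kvmclock_offset. */
static int64_t model_system_time(int64_t raw_boot_ns, int64_t kvmclock_offset)
{
	return raw_boot_ns + kvmclock_offset;
}

/* Assumed scenario: one day of host uptime, in nanoseconds. */
static const int64_t model_day_ns = 86400LL * 1000000000LL;
```

With the raw clock 10 ppm behind (model_day_ns - model_day_ns / 100000), an offset taken from the monotonic clock gives a freshly created VM a system_time of -864000000 ns (about -0.86 s), while an offset taken from the same raw-based clock gives 0.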
There is another thing to be careful about: getboottime64 returns the
host boot time in tkr_mono units, and subtracting tkr_raw units will
cause the wallclock to be off if tkr_raw drifts from tkr_mono. To
avoid this, compute the wallclock delta from the current time instead
of being clever and using getboottime64.
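A userspace sketch of the new wallclock computation, assuming plain / and % in place of the kernel's do_div() and a hypothetical struct in place of struct pvclock_wall_clock. Because the guest adds its system time to the value written here, the host writes the current real time minus the current kvmclock reading, so both terms come from the clocks actually in use rather than from getboottime64().

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for struct pvclock_wall_clock (sec, nsec). */
struct wall_clock_model {
	uint32_t sec;
	uint32_t nsec;
};

/* Mirrors the patched kvm_write_wall_clock(): wc = realtime - kvmclock,
 * split into seconds and nanoseconds. */
static struct wall_clock_model model_wall_clock(uint64_t real_ns,
						uint64_t kvmclock_ns)
{
	uint64_t wall_nsec = real_ns - kvmclock_ns;
	struct wall_clock_model wc;

	wc.nsec = (uint32_t)(wall_nsec % 1000000000u); /* do_div() remainder */
	wc.sec = (uint32_t)(wall_nsec / 1000000000u);  /* overflows in 2106 */
	return wc;
}

/* What the guest reconstructs: wall clock plus its system time. */
static uint64_t model_guest_realtime(struct wall_clock_model wc,
				     uint64_t kvmclock_ns)
{
	return (uint64_t)wc.sec * 1000000000u + wc.nsec + kvmclock_ns;
}
```

As long as wc.sec does not truncate, the guest's reconstruction returns exactly the host real time sampled when the wall clock was written, whatever rate the kvmclock base runs at.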
Fixes: 53fafdbb8b21f ("KVM: x86: switch KVMCLOCK base to monotonic raw clock")
Cc: stable@vger.kernel.org
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
---
arch/x86/kvm/x86.c | 38 +++++++++++++++++++++++---------------
1 file changed, 23 insertions(+), 15 deletions(-)
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 1b4273cce63c..b5e0648580e1 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -1577,6 +1577,18 @@ static void update_pvclock_gtod(struct timekeeper *tk)
write_seqcount_end(&vdata->seq);
}
+
+static s64 get_kvmclock_base_ns(void)
+{
+ /* Count up from boot time, but with the frequency of the raw clock. */
+ return ktime_to_ns(ktime_add(ktime_get_raw(), pvclock_gtod_data.offs_boot));
+}
+#else
+static s64 get_kvmclock_base_ns(void)
+{
+ /* Master clock not used, so we can just use CLOCK_BOOTTIME. */
+ return ktime_get_boottime_ns();
+}
#endif
void kvm_set_pending_timer(struct kvm_vcpu *vcpu)
@@ -1590,7 +1602,7 @@ static void kvm_write_wall_clock(struct kvm *kvm, gpa_t wall_clock)
int version;
int r;
struct pvclock_wall_clock wc;
- struct timespec64 boot;
+ u64 wall_nsec;
if (!wall_clock)
return;
@@ -1610,17 +1622,12 @@ static void kvm_write_wall_clock(struct kvm *kvm, gpa_t wall_clock)
/*
* The guest calculates current wall clock time by adding
* system time (updated by kvm_guest_time_update below) to the
- * wall clock specified here. guest system time equals host
- * system time for us, thus we must fill in host boot time here.
+ * wall clock specified here. We do the reverse here.
*/
- getboottime64(&boot);
+ wall_nsec = ktime_get_real_ns() - get_kvmclock_ns(kvm);
- if (kvm->arch.kvmclock_offset) {
- struct timespec64 ts = ns_to_timespec64(kvm->arch.kvmclock_offset);
- boot = timespec64_sub(boot, ts);
- }
- wc.sec = (u32)boot.tv_sec; /* overflow in 2106 guest time */
- wc.nsec = boot.tv_nsec;
+ wc.nsec = do_div(wall_nsec, 1000000000);
+ wc.sec = (u32)wall_nsec; /* overflow in 2106 guest time */
wc.version = version;
kvm_write_guest(kvm, wall_clock, &wc, sizeof(wc));
@@ -1868,7 +1875,7 @@ void kvm_write_tsc(struct kvm_vcpu *vcpu, struct msr_data *msr)
raw_spin_lock_irqsave(&kvm->arch.tsc_write_lock, flags);
offset = kvm_compute_tsc_offset(vcpu, data);
- ns = ktime_get_boottime_ns();
+ ns = get_kvmclock_base_ns();
elapsed = ns - kvm->arch.last_tsc_nsec;
if (vcpu->arch.virtual_tsc_khz) {
@@ -2206,7 +2213,7 @@ u64 get_kvmclock_ns(struct kvm *kvm)
spin_lock(&ka->pvclock_gtod_sync_lock);
if (!ka->use_master_clock) {
spin_unlock(&ka->pvclock_gtod_sync_lock);
- return ktime_get_boottime_ns() + ka->kvmclock_offset;
+ return get_kvmclock_base_ns() + ka->kvmclock_offset;
}
hv_clock.tsc_timestamp = ka->master_cycle_now;
@@ -2222,7 +2229,7 @@ u64 get_kvmclock_ns(struct kvm *kvm)
&hv_clock.tsc_to_system_mul);
ret = __pvclock_read_cycles(&hv_clock, rdtsc());
} else
- ret = ktime_get_boottime_ns() + ka->kvmclock_offset;
+ ret = get_kvmclock_base_ns() + ka->kvmclock_offset;
put_cpu();
@@ -2321,7 +2328,7 @@ static int kvm_guest_time_update(struct kvm_vcpu *v)
}
if (!use_master_clock) {
host_tsc = rdtsc();
- kernel_ns = ktime_get_boottime_ns();
+ kernel_ns = get_kvmclock_base_ns();
}
tsc_timestamp = kvm_read_l1_tsc(v, host_tsc);
@@ -2361,6 +2368,7 @@ static int kvm_guest_time_update(struct kvm_vcpu *v)
vcpu->hv_clock.tsc_timestamp = tsc_timestamp;
vcpu->hv_clock.system_time = kernel_ns + v->kvm->arch.kvmclock_offset;
vcpu->last_guest_tsc = tsc_timestamp;
+ WARN_ON(vcpu->hv_clock.system_time < 0);
/* If the host uses TSC clocksource, then it is stable */
pvclock_flags = 0;
@@ -9473,7 +9481,7 @@ int kvm_arch_init_vm(struct kvm *kvm, unsigned long type)
mutex_init(&kvm->arch.apic_map_lock);
spin_lock_init(&kvm->arch.pvclock_gtod_sync_lock);
- kvm->arch.kvmclock_offset = -ktime_get_boottime_ns();
+ kvm->arch.kvmclock_offset = -get_kvmclock_base_ns();
pvclock_update_vm_gtod_copy(kvm);
kvm->arch.guest_can_read_msr_platform_info = true;
--
1.8.3.1
* Re: [PATCH 1/2] KVM: x86: reorganize pvclock_gtod_data members
From: Vitaly Kuznetsov @ 2020-01-23 11:32 UTC
To: Paolo Bonzini; +Cc: mtosatti, stable, linux-kernel, kvm
Paolo Bonzini <pbonzini@redhat.com> writes:
> We will need a copy of tk->offs_boot in the next patch. Store it and
> clean up the struct: instead of storing tk->tkr_xxx.base with tk->offs_boot
> included, store the raw value in struct pvclock_clock and add tk->offs_boot
> at read time in do_monotonic_raw. tk->tkr_xxx.xtime_nsec also moves
> to struct pvclock_clock, where do_monotonic_raw and do_realtime read it.
>
> While at it, fix a (usually harmless) typo in do_monotonic_raw, which
> was using gtod->clock.shift instead of gtod->raw_clock.shift.
>
> Fixes: 53fafdbb8b21f ("KVM: x86: switch KVMCLOCK base to monotonic raw clock")
> Cc: stable@vger.kernel.org
> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
> ---
> arch/x86/kvm/x86.c | 29 ++++++++++++-----------------
> 1 file changed, 12 insertions(+), 17 deletions(-)
>
> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> index 89621025577a..1b4273cce63c 100644
> --- a/arch/x86/kvm/x86.c
> +++ b/arch/x86/kvm/x86.c
> @@ -1532,6 +1532,8 @@ struct pvclock_clock {
> u64 mask;
> u32 mult;
> u32 shift;
> + u64 base_cycles;
> + u64 offset;
> };
>
> struct pvclock_gtod_data {
> @@ -1540,11 +1542,8 @@ struct pvclock_gtod_data {
> struct pvclock_clock clock; /* extract of a clocksource struct */
> struct pvclock_clock raw_clock; /* extract of a clocksource struct */
>
> - u64 boot_ns_raw;
> - u64 boot_ns;
> - u64 nsec_base;
> + ktime_t offs_boot;
> u64 wall_time_sec;
> - u64 monotonic_raw_nsec;
> };
>
> static struct pvclock_gtod_data pvclock_gtod_data;
> @@ -1552,10 +1551,6 @@ struct pvclock_gtod_data {
> static void update_pvclock_gtod(struct timekeeper *tk)
> {
> struct pvclock_gtod_data *vdata = &pvclock_gtod_data;
> - u64 boot_ns, boot_ns_raw;
> -
> - boot_ns = ktime_to_ns(ktime_add(tk->tkr_mono.base, tk->offs_boot));
> - boot_ns_raw = ktime_to_ns(ktime_add(tk->tkr_raw.base, tk->offs_boot));
>
> write_seqcount_begin(&vdata->seq);
>
> @@ -1565,20 +1560,20 @@ static void update_pvclock_gtod(struct timekeeper *tk)
> vdata->clock.mask = tk->tkr_mono.mask;
> vdata->clock.mult = tk->tkr_mono.mult;
> vdata->clock.shift = tk->tkr_mono.shift;
> + vdata->clock.base_cycles = tk->tkr_mono.xtime_nsec;
> + vdata->clock.offset = tk->tkr_mono.base;
>
> vdata->raw_clock.vclock_mode = tk->tkr_raw.clock->archdata.vclock_mode;
> vdata->raw_clock.cycle_last = tk->tkr_raw.cycle_last;
> vdata->raw_clock.mask = tk->tkr_raw.mask;
> vdata->raw_clock.mult = tk->tkr_raw.mult;
> vdata->raw_clock.shift = tk->tkr_raw.shift;
> -
> - vdata->boot_ns = boot_ns;
> - vdata->nsec_base = tk->tkr_mono.xtime_nsec;
> + vdata->raw_clock.base_cycles = tk->tkr_raw.xtime_nsec;
> + vdata->raw_clock.offset = tk->tkr_raw.base;
This is likely a personal preference, but the suggested naming is a bit
confusing: we use 'base_cycles' to hold 'xtime_nsec' and 'offset' to
hold ... 'base'. Not that I think 'struct timekeeper' is perfect,
but at least it is documented. Should we maybe just stick to its naming
(and name the 'struct pvclock_clock' fields accordingly)?
>
> vdata->wall_time_sec = tk->xtime_sec;
>
> - vdata->boot_ns_raw = boot_ns_raw;
> - vdata->monotonic_raw_nsec = tk->tkr_raw.xtime_nsec;
> + vdata->offs_boot = tk->offs_boot;
>
> write_seqcount_end(&vdata->seq);
> }
> @@ -2048,10 +2043,10 @@ static int do_monotonic_raw(s64 *t, u64 *tsc_timestamp)
>
> do {
> seq = read_seqcount_begin(&gtod->seq);
> - ns = gtod->monotonic_raw_nsec;
> + ns = gtod->raw_clock.base_cycles;
> ns += vgettsc(&gtod->raw_clock, tsc_timestamp, &mode);
> - ns >>= gtod->clock.shift;
> - ns += gtod->boot_ns_raw;
> + ns >>= gtod->raw_clock.shift;
> + ns += ktime_to_ns(ktime_add(gtod->raw_clock.offset, gtod->offs_boot));
> } while (unlikely(read_seqcount_retry(&gtod->seq, seq)));
> *t = ns;
>
> @@ -2068,7 +2063,7 @@ static int do_realtime(struct timespec64 *ts, u64 *tsc_timestamp)
> do {
> seq = read_seqcount_begin(&gtod->seq);
> ts->tv_sec = gtod->wall_time_sec;
> - ns = gtod->nsec_base;
> + ns = gtod->clock.base_cycles;
> ns += vgettsc(&gtod->clock, tsc_timestamp, &mode);
> ns >>= gtod->clock.shift;
> } while (unlikely(read_seqcount_retry(&gtod->seq, seq)));
FWIW,
Reviewed-by: Vitaly Kuznetsov <vkuznets@redhat.com>
--
Vitaly
* Re: [PATCH 1/2] KVM: x86: reorganize pvclock_gtod_data members
From: Paolo Bonzini @ 2020-01-23 11:35 UTC
To: Vitaly Kuznetsov; +Cc: mtosatti, stable, linux-kernel, kvm
On 23/01/20 12:32, Vitaly Kuznetsov wrote:
> This is likely a personal preference, but the suggested naming is a bit
> confusing: we use 'base_cycles' to hold 'xtime_nsec' and 'offset' to
> hold ... 'base'. Not that I think 'struct timekeeper' is perfect,
> but at least it is documented. Should we maybe just stick to its naming
> (and name the 'struct pvclock_clock' fields accordingly)?
>
The problem is that xtime_nsec is not nanoseconds, and I'd really not
want to have a worse name just for consistency. :( I chose
"base_cycles" as an incremental improvement over nsec_base, even though
that meant also changing struct timekeeper's "base" to "offset".
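To illustrate why xtime_nsec is not a plain nanosecond count: the timekeeper stores it as nanoseconds scaled by 2^shift, so its low shift bits are a fraction of a nanosecond. A minimal sketch with hypothetical helper names:

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: whole nanoseconds in a shifted value such as
 * tkr_raw.xtime_nsec (stored as ns << shift). */
static uint64_t model_whole_ns(uint64_t shifted_ns, uint32_t shift)
{
	assert(shift < 64);
	return shifted_ns >> shift;
}

/* Hypothetical helper: the sub-nanosecond remainder, in units of
 * 2^-shift ns. */
static uint64_t model_fraction(uint64_t shifted_ns, uint32_t shift)
{
	assert(shift < 64);
	return shifted_ns & ((UINT64_C(1) << shift) - 1);
}
```

For example, with shift = 8 the value (1500 << 8) | 128 encodes 1500.5 ns: 1500 whole nanoseconds plus 128/256 of a nanosecond.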
Paolo
* Re: [PATCH 2/2] KVM: x86: use raw clock values consistently
From: Vitaly Kuznetsov @ 2020-01-23 13:43 UTC
To: Paolo Bonzini; +Cc: mtosatti, stable, linux-kernel, kvm
Paolo Bonzini <pbonzini@redhat.com> writes:
> Commit 53fafdbb8b21f ("KVM: x86: switch KVMCLOCK base to monotonic raw
> clock") changed kvmclock to use tkr_raw instead of tkr_mono. However,
> the default kvmclock_offset for the VM was still based on the monotonic
> clock and, if the raw clock drifted enough from the monotonic clock,
> this could cause a negative system_time to be written to the guest's
> struct pvclock. RHEL5 does not like it and (if it boots fast enough to
> observe a negative time value) it hangs.
>
> There is another thing to be careful about: getboottime64 returns the
> host boot time in tkr_mono units, and subtracting tkr_raw units will
> cause the wallclock to be off if tkr_raw drifts from tkr_mono. To
> avoid this, compute the wallclock delta from the current time instead
> of being clever and using getboottime64.
>
> Fixes: 53fafdbb8b21f ("KVM: x86: switch KVMCLOCK base to monotonic raw clock")
> Cc: stable@vger.kernel.org
> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
> ---
> arch/x86/kvm/x86.c | 38 +++++++++++++++++++++++---------------
> 1 file changed, 23 insertions(+), 15 deletions(-)
>
> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> index 1b4273cce63c..b5e0648580e1 100644
> --- a/arch/x86/kvm/x86.c
> +++ b/arch/x86/kvm/x86.c
> @@ -1577,6 +1577,18 @@ static void update_pvclock_gtod(struct timekeeper *tk)
>
> write_seqcount_end(&vdata->seq);
> }
> +
> +static s64 get_kvmclock_base_ns(void)
> +{
> + /* Count up from boot time, but with the frequency of the raw clock. */
> + return ktime_to_ns(ktime_add(ktime_get_raw(), pvclock_gtod_data.offs_boot));
> +}
> +#else
> +static s64 get_kvmclock_base_ns(void)
> +{
> + /* Master clock not used, so we can just use CLOCK_BOOTTIME. */
> + return ktime_get_boottime_ns();
> +}
> #endif
But we could still have used the RAW+offs_boot version, right? And this is
basically just to preserve the existing behavior on !x86_64.
>
> void kvm_set_pending_timer(struct kvm_vcpu *vcpu)
> @@ -1590,7 +1602,7 @@ static void kvm_write_wall_clock(struct kvm *kvm, gpa_t wall_clock)
> int version;
> int r;
> struct pvclock_wall_clock wc;
> - struct timespec64 boot;
> + u64 wall_nsec;
>
> if (!wall_clock)
> return;
> @@ -1610,17 +1622,12 @@ static void kvm_write_wall_clock(struct kvm *kvm, gpa_t wall_clock)
> /*
> * The guest calculates current wall clock time by adding
> * system time (updated by kvm_guest_time_update below) to the
> - * wall clock specified here. guest system time equals host
> - * system time for us, thus we must fill in host boot time here.
> + * wall clock specified here. We do the reverse here.
> */
> - getboottime64(&boot);
> + wall_nsec = ktime_get_real_ns() - get_kvmclock_ns(kvm);
There are not that many hosts with more than 50 years of uptime, and likely
none running Linux with live kernel patching support, so I bet no one will
ever see this overflowing. However, as wall_nsec is u64 and we're
dealing with kvmclock here, I'd suggest adding a WARN_ON().
>
> - if (kvm->arch.kvmclock_offset) {
> - struct timespec64 ts = ns_to_timespec64(kvm->arch.kvmclock_offset);
> - boot = timespec64_sub(boot, ts);
> - }
> - wc.sec = (u32)boot.tv_sec; /* overflow in 2106 guest time */
> - wc.nsec = boot.tv_nsec;
> + wc.nsec = do_div(wall_nsec, 1000000000);
> + wc.sec = (u32)wall_nsec; /* overflow in 2106 guest time */
> wc.version = version;
>
> kvm_write_guest(kvm, wall_clock, &wc, sizeof(wc));
> @@ -1868,7 +1875,7 @@ void kvm_write_tsc(struct kvm_vcpu *vcpu, struct msr_data *msr)
>
> raw_spin_lock_irqsave(&kvm->arch.tsc_write_lock, flags);
> offset = kvm_compute_tsc_offset(vcpu, data);
> - ns = ktime_get_boottime_ns();
> + ns = get_kvmclock_base_ns();
> elapsed = ns - kvm->arch.last_tsc_nsec;
>
> if (vcpu->arch.virtual_tsc_khz) {
> @@ -2206,7 +2213,7 @@ u64 get_kvmclock_ns(struct kvm *kvm)
> spin_lock(&ka->pvclock_gtod_sync_lock);
> if (!ka->use_master_clock) {
> spin_unlock(&ka->pvclock_gtod_sync_lock);
> - return ktime_get_boottime_ns() + ka->kvmclock_offset;
> + return get_kvmclock_base_ns() + ka->kvmclock_offset;
> }
>
> hv_clock.tsc_timestamp = ka->master_cycle_now;
> @@ -2222,7 +2229,7 @@ u64 get_kvmclock_ns(struct kvm *kvm)
> &hv_clock.tsc_to_system_mul);
> ret = __pvclock_read_cycles(&hv_clock, rdtsc());
> } else
> - ret = ktime_get_boottime_ns() + ka->kvmclock_offset;
> + ret = get_kvmclock_base_ns() + ka->kvmclock_offset;
>
> put_cpu();
>
> @@ -2321,7 +2328,7 @@ static int kvm_guest_time_update(struct kvm_vcpu *v)
> }
> if (!use_master_clock) {
> host_tsc = rdtsc();
> - kernel_ns = ktime_get_boottime_ns();
> + kernel_ns = get_kvmclock_base_ns();
> }
>
> tsc_timestamp = kvm_read_l1_tsc(v, host_tsc);
> @@ -2361,6 +2368,7 @@ static int kvm_guest_time_update(struct kvm_vcpu *v)
> vcpu->hv_clock.tsc_timestamp = tsc_timestamp;
> vcpu->hv_clock.system_time = kernel_ns + v->kvm->arch.kvmclock_offset;
> vcpu->last_guest_tsc = tsc_timestamp;
> + WARN_ON(vcpu->hv_clock.system_time < 0);
>
> /* If the host uses TSC clocksource, then it is stable */
> pvclock_flags = 0;
> @@ -9473,7 +9481,7 @@ int kvm_arch_init_vm(struct kvm *kvm, unsigned long type)
> mutex_init(&kvm->arch.apic_map_lock);
> spin_lock_init(&kvm->arch.pvclock_gtod_sync_lock);
>
> - kvm->arch.kvmclock_offset = -ktime_get_boottime_ns();
> + kvm->arch.kvmclock_offset = -get_kvmclock_base_ns();
> pvclock_update_vm_gtod_copy(kvm);
>
> kvm->arch.guest_can_read_msr_platform_info = true;
This looks correct to me, but kvmclock is a glorious beast, so take this
with a grain of salt :)
Reviewed-by: Vitaly Kuznetsov <vkuznets@redhat.com>
--
Vitaly
* Re: [PATCH 2/2] KVM: x86: use raw clock values consistently
From: Paolo Bonzini @ 2020-01-23 13:54 UTC
To: Vitaly Kuznetsov; +Cc: mtosatti, stable, linux-kernel, kvm
On 23/01/20 14:43, Vitaly Kuznetsov wrote:
>> +
>> +static s64 get_kvmclock_base_ns(void)
>> +{
>> + /* Count up from boot time, but with the frequency of the raw clock. */
>> + return ktime_to_ns(ktime_add(ktime_get_raw(), pvclock_gtod_data.offs_boot));
>> +}
>> +#else
>> +static s64 get_kvmclock_base_ns(void)
>> +{
>> + /* Master clock not used, so we can just use CLOCK_BOOTTIME. */
>> + return ktime_get_boottime_ns();
>> +}
>> #endif
> But we could still have used the RAW+offs_boot version, right? And this is
> basically just to preserve the existing behavior on !x86_64.
Yes, there's no reason to restrict the pvclock_gtod notifier to x86_64.
But this is stable material so I kept it easy.
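Both variants count from host boot; they differ only in rate. A rough userspace approximation using the Linux clock IDs CLOCK_MONOTONIC_RAW, CLOCK_MONOTONIC and CLOCK_BOOTTIME, assuming offs_boot can be estimated as CLOCK_BOOTTIME minus CLOCK_MONOTONIC (the time spent in suspend). The function names are hypothetical, and the separate clock reads are not atomic, so this is only an estimate.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE
#endif
#include <assert.h>
#include <stdint.h>
#include <time.h>

/* Reads one clock as nanoseconds. */
static int64_t model_read_ns(clockid_t id)
{
	struct timespec ts;
	int ret = clock_gettime(id, &ts);

	assert(ret == 0);
	(void)ret;
	return (int64_t)ts.tv_sec * 1000000000LL + ts.tv_nsec;
}

/* Approximates the CONFIG_X86_64 variant: raw-clock rate, counted from boot. */
static int64_t model_raw_plus_offs_boot_ns(void)
{
	/* Read MONOTONIC before BOOTTIME so the estimate is never below
	 * the true offs_boot. */
	int64_t mono = model_read_ns(CLOCK_MONOTONIC);
	int64_t boot = model_read_ns(CLOCK_BOOTTIME);

	return model_read_ns(CLOCK_MONOTONIC_RAW) + (boot - mono);
}

/* Approximates the #else variant: plain CLOCK_BOOTTIME (NTP-adjusted rate). */
static int64_t model_boottime_ns(void)
{
	return model_read_ns(CLOCK_BOOTTIME);
}
```

The two readings differ by the accumulated NTP adjustment of CLOCK_MONOTONIC relative to CLOCK_MONOTONIC_RAW, which is the drift this series is about.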
>>
>> - getboottime64(&boot);
>> + wall_nsec = ktime_get_real_ns() - get_kvmclock_ns(kvm);
>
> There are not that many hosts with more than 50 years of uptime, and likely
> none running Linux with live kernel patching support, so I bet no one will
> ever see this overflowing. However, as wall_nsec is u64 and we're
> dealing with kvmclock here, I'd suggest adding a WARN_ON().
You're off by a factor of 10: 2^64 nanoseconds is about 584 years
(584*365*86400*10^9 ns). :)
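The arithmetic can be checked directly. A small sketch (365-day years, hypothetical function names) that also shows the signed limit, which is the one that applies to s64 and ktime_t values:

```c
#include <assert.h>
#include <stdint.h>

/* Nanoseconds in a 365-day year. */
static const uint64_t ns_per_year = 365ULL * 86400ULL * 1000000000ULL;

/* Whole years representable by an unsigned 64-bit nanosecond count. */
static uint64_t years_in_u64_ns(void)
{
	return UINT64_MAX / ns_per_year;
}

/* Whole years representable by a signed 64-bit count (s64, ktime_t). */
static uint64_t years_in_s64_ns(void)
{
	return (uint64_t)INT64_MAX / ns_per_year;
}
```

This gives 584 years for a u64 nanosecond count and 292 years for an s64 one.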
Paolo