kvm.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH 0/2] read kvmclock from guest memory if !correct_tsc_shift
@ 2023-01-20  1:11 Marcelo Tosatti
  2023-01-20  1:11 ` [PATCH 1/2] linux-headers: sync KVM_CLOCK_CORRECT_TSC_SHIFT flag Marcelo Tosatti
                   ` (2 more replies)
  0 siblings, 3 replies; 4+ messages in thread
From: Marcelo Tosatti @ 2023-01-20  1:11 UTC (permalink / raw)
  To: qemu-devel; +Cc: Paolo Bonzini, kvm

Before kernel commit 78db6a5037965429c04d708281f35a6e5562d31b,
kvm_guest_time_update() would use vcpu->virtual_tsc_khz to calculate
tsc_shift value in the vcpus pvclock structure written to guest memory.

For those kernels, if vcpu->virtual_tsc_khz != tsc_khz (which can be the
case when guest state is restored via migration, or if tsc-khz option is
passed to QEMU), and TSC scaling is not enabled (which happens if the
difference between the frequency requested via KVM_SET_TSC_KHZ and the
host TSC KHZ is smaller than 250ppm), then there can be a difference
between what KVM_GET_CLOCK would return and what the guest reads as
kvmclock value.

The effect is that the guest sees a jump in kvmclock value
(either forwards or backwards) in such case.

To fix incoming migration from pre-78db6a5037965 hosts,
read kvmclock value from guest memory.

Unless the KVM_CLOCK_CORRECT_TSC_SHIFT bit indicates
that the value retrieved by KVM_GET_CLOCK on the source
is safe to be used.



^ permalink raw reply	[flat|nested] 4+ messages in thread

* [PATCH 1/2] linux-headers: sync KVM_CLOCK_CORRECT_TSC_SHIFT flag
  2023-01-20  1:11 [PATCH 0/2] read kvmclock from guest memory if !correct_tsc_shift Marcelo Tosatti
@ 2023-01-20  1:11 ` Marcelo Tosatti
  2023-01-20  1:11 ` [PATCH 2/2] hw/i386/kvm/clock.c: read kvmclock from guest memory if !correct_tsc_shift Marcelo Tosatti
  2023-01-20  8:54 ` [PATCH 0/2] " Paolo Bonzini
  2 siblings, 0 replies; 4+ messages in thread
From: Marcelo Tosatti @ 2023-01-20  1:11 UTC (permalink / raw)
  To: qemu-devel; +Cc: Paolo Bonzini, kvm, Marcelo Tosatti

Sync new KVM_CLOCK_CORRECT_TSC_SHIFT from upstream Linux headers.

Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>

Index: qemu/linux-headers/linux/kvm.h
===================================================================
--- qemu.orig/linux-headers/linux/kvm.h
+++ qemu/linux-headers/linux/kvm.h
@@ -1300,6 +1300,9 @@ struct kvm_irqfd {
 #define KVM_CLOCK_TSC_STABLE		2
 #define KVM_CLOCK_REALTIME		(1 << 2)
 #define KVM_CLOCK_HOST_TSC		(1 << 3)
+/* whether tsc_shift as seen by the guest matches guest visible TSC */
+/* This is true since commit 78db6a5037965429c04d708281f35a6e5562d31b */
+#define KVM_CLOCK_CORRECT_TSC_SHIFT	(1 << 4)
 
 struct kvm_clock_data {
 	__u64 clock;



^ permalink raw reply	[flat|nested] 4+ messages in thread

* [PATCH 2/2] hw/i386/kvm/clock.c: read kvmclock from guest memory if !correct_tsc_shift
  2023-01-20  1:11 [PATCH 0/2] read kvmclock from guest memory if !correct_tsc_shift Marcelo Tosatti
  2023-01-20  1:11 ` [PATCH 1/2] linux-headers: sync KVM_CLOCK_CORRECT_TSC_SHIFT flag Marcelo Tosatti
@ 2023-01-20  1:11 ` Marcelo Tosatti
  2023-01-20  8:54 ` [PATCH 0/2] " Paolo Bonzini
  2 siblings, 0 replies; 4+ messages in thread
From: Marcelo Tosatti @ 2023-01-20  1:11 UTC (permalink / raw)
  To: qemu-devel; +Cc: Paolo Bonzini, kvm, Marcelo Tosatti

Before kernel commit 78db6a5037965429c04d708281f35a6e5562d31b,
kvm_guest_time_update() would use vcpu->virtual_tsc_khz to calculate
tsc_shift value in the vcpus pvclock structure written to guest memory.

For those kernels, if vcpu->virtual_tsc_khz != tsc_khz (which can be the
case when guest state is restored via migration, or if tsc-khz option is
passed to QEMU), and TSC scaling is not enabled (which happens if the
difference between the frequency requested via KVM_SET_TSC_KHZ and the
host TSC KHZ is smaller than 250ppm), then there can be a difference
between what KVM_GET_CLOCK would return and what the guest reads as
kvmclock value.

The effect is that the guest sees a jump in kvmclock value
(either forwards or backwards) in such case.

To fix incoming migration from pre-78db6a5037965 hosts, 
read kvmclock value from guest memory.

Unless the KVM_CLOCK_CORRECT_TSC_SHIFT bit indicates
that the value retrieved by KVM_GET_CLOCK on the source
is safe to be used.

Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>

Index: qemu/hw/i386/kvm/clock.c
===================================================================
--- qemu.orig/hw/i386/kvm/clock.c
+++ qemu/hw/i386/kvm/clock.c
@@ -50,6 +50,16 @@ struct KVMClockState {
     /* whether the 'clock' value was obtained in a host with
      * reliable KVM_GET_CLOCK */
     bool clock_is_reliable;
+
+    /* whether machine type supports correct_tsc_shift */
+    bool mach_use_correct_tsc_shift;
+
+    /*
+     * whether the 'clock' value was obtained in a host
+     * that computes correct tsc_shift field (the one
+     * written to guest memory)
+     */
+    bool clock_correct_tsc_shift;
 };
 
 struct pvclock_vcpu_time_info {
@@ -150,6 +160,8 @@ static void kvm_update_clock(KVMClockSta
      *               read from memory
      */
     s->clock_is_reliable = kvm_has_adjust_clock_stable();
+
+    s->clock_correct_tsc_shift = kvm_has_correct_tsc_shift();
 }
 
 static void do_kvmclock_ctrl(CPUState *cpu, run_on_cpu_data data)
@@ -176,7 +188,7 @@ static void kvmclock_vm_state_change(voi
          * If the host where s->clock was read did not support reliable
          * KVM_GET_CLOCK, read kvmclock value from memory.
          */
-        if (!s->clock_is_reliable) {
+        if (!s->clock_is_reliable || !s->clock_correct_tsc_shift) {
             uint64_t pvclock_via_mem = kvmclock_current_nsec(s);
             /* We can't rely on the saved clock value, just discard it */
             if (pvclock_via_mem) {
@@ -252,14 +264,40 @@ static const VMStateDescription kvmclock
 };
 
 /*
+ * Sending clock_correct_tsc_shift=true means that the destination
+ * can use VMSTATE_UINT64(clock, KVMClockState) value,
+ * instead of reading from guest memory.
+ */
+static bool kvmclock_clock_correct_tsc_shift_needed(void *opaque)
+{
+    KVMClockState *s = opaque;
+
+    return s->mach_use_correct_tsc_shift;
+}
+
+static const VMStateDescription kvmclock_correct_tsc_shift = {
+    .name = "kvmclock/clock_correct_tsc_shift",
+    .version_id = 1,
+    .minimum_version_id = 1,
+    .needed = kvmclock_clock_correct_tsc_shift_needed,
+    .fields = (VMStateField[]) {
+        VMSTATE_BOOL(clock_correct_tsc_shift, KVMClockState),
+        VMSTATE_END_OF_LIST()
+    }
+};
+
+/*
  * When migrating, assume the source has an unreliable
- * KVM_GET_CLOCK unless told otherwise.
+ * KVM_GET_CLOCK (and computes tsc shift
+ * in guest memory using vcpu->virtual_tsc_khz),
+ * unless told otherwise.
  */
 static int kvmclock_pre_load(void *opaque)
 {
     KVMClockState *s = opaque;
 
     s->clock_is_reliable = false;
+    s->clock_correct_tsc_shift = false;
 
     return 0;
 }
@@ -301,6 +339,7 @@ static const VMStateDescription kvmclock
     },
     .subsections = (const VMStateDescription * []) {
         &kvmclock_reliable_get_clock,
+        &kvmclock_correct_tsc_shift,
         NULL
     }
 };
@@ -308,6 +347,8 @@ static const VMStateDescription kvmclock
 static Property kvmclock_properties[] = {
     DEFINE_PROP_BOOL("x-mach-use-reliable-get-clock", KVMClockState,
                       mach_use_reliable_get_clock, true),
+    DEFINE_PROP_BOOL("x-mach-use-correct-tsc-shift", KVMClockState,
+                      mach_use_correct_tsc_shift, true),
     DEFINE_PROP_END_OF_LIST(),
 };
 
Index: qemu/target/i386/kvm/kvm.c
===================================================================
--- qemu.orig/target/i386/kvm/kvm.c
+++ qemu/target/i386/kvm/kvm.c
@@ -164,6 +164,13 @@ bool kvm_has_adjust_clock_stable(void)
     return (ret & KVM_CLOCK_TSC_STABLE);
 }
 
+bool kvm_has_correct_tsc_shift(void)
+{
+    int ret = kvm_check_extension(kvm_state, KVM_CAP_ADJUST_CLOCK);
+
+    return ret & KVM_CLOCK_CORRECT_TSC_SHIFT;
+}
+
 bool kvm_has_adjust_clock(void)
 {
     return kvm_check_extension(kvm_state, KVM_CAP_ADJUST_CLOCK);
Index: qemu/target/i386/kvm/kvm_i386.h
===================================================================
--- qemu.orig/target/i386/kvm/kvm_i386.h
+++ qemu/target/i386/kvm/kvm_i386.h
@@ -35,6 +35,7 @@
 bool kvm_has_smm(void);
 bool kvm_has_adjust_clock(void);
 bool kvm_has_adjust_clock_stable(void);
+bool kvm_has_correct_tsc_shift(void);
 bool kvm_has_exception_payload(void);
 void kvm_synchronize_all_tsc(void);
 void kvm_arch_reset_vcpu(X86CPU *cs);



^ permalink raw reply	[flat|nested] 4+ messages in thread

* Re: [PATCH 0/2] read kvmclock from guest memory if !correct_tsc_shift
  2023-01-20  1:11 [PATCH 0/2] read kvmclock from guest memory if !correct_tsc_shift Marcelo Tosatti
  2023-01-20  1:11 ` [PATCH 1/2] linux-headers: sync KVM_CLOCK_CORRECT_TSC_SHIFT flag Marcelo Tosatti
  2023-01-20  1:11 ` [PATCH 2/2] hw/i386/kvm/clock.c: read kvmclock from guest memory if !correct_tsc_shift Marcelo Tosatti
@ 2023-01-20  8:54 ` Paolo Bonzini
  2 siblings, 0 replies; 4+ messages in thread
From: Paolo Bonzini @ 2023-01-20  8:54 UTC (permalink / raw)
  To: Marcelo Tosatti, qemu-devel; +Cc: kvm

On 1/20/23 02:11, Marcelo Tosatti wrote:
> Before kernel commit 78db6a5037965429c04d708281f35a6e5562d31b,
> kvm_guest_time_update() would use vcpu->virtual_tsc_khz to calculate
> tsc_shift value in the vcpus pvclock structure written to guest memory.

To clarify, the problem is that kvm_guest_time_update() uses the guest
TSC frequency *that userspace desired* instead of the *actual* TSC 
frequency.  Because, within the 250 ppm tolerance, TSC scaling is not 
enabled, the guest kvmclock is incorrect; KVM_GET_CLOCK instead returns 
the correct value, and the bug occurs when migrating from a host that is 
publishing a buggy kvmclock to the guest.

> For those kernels, if vcpu->virtual_tsc_khz != tsc_khz (which can be the
> case when guest state is restored via migration, or if tsc-khz option is
> passed to QEMU), and TSC scaling is not enabled (which happens if the
> difference between the frequency requested via KVM_SET_TSC_KHZ and the
> host TSC KHZ is smaller than 250ppm), then there can be a difference
> between what KVM_GET_CLOCK would return and what the guest reads as
> kvmclock value.

In practice, to trigger the bug you need to do two migrations from a 
six-year-old kernel; I just can't see too many people stumbling upon 
this in the wild, and I don't think it makes sense to hobble _all_ 
migrations from a kernel that is less than six years old for such an 
edge case.  New versions of QEMU do not even support running with such 
old kernels (it will for example complain about no support for certain 
KVM PV features).

It is not a huge request for the user to know if they are in the 
problematic case.  It is easiest to use a custom QEMU on the 
destination, and always compute the kvmclock value from memory if the 
page is valid.

Once you do a migration to the custom QEMU + a fixed kernel, the bug is 
gone for good and there is no need to introduce new user API for that.

Paolo

> The effect is that the guest sees a jump in kvmclock value
> (either forwards or backwards) in such case.
> 
> To fix incoming migration from pre-78db6a5037965 hosts,
> read kvmclock value from guest memory.
> 
> Unless the KVM_CLOCK_CORRECT_TSC_SHIFT bit indicates
> that the value retrieved by KVM_GET_CLOCK on the source
> is safe to be used.
> 
> 
> 


^ permalink raw reply	[flat|nested] 4+ messages in thread

end of thread, other threads:[~2023-01-20  8:55 UTC | newest]

Thread overview: 4+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2023-01-20  1:11 [PATCH 0/2] read kvmclock from guest memory if !correct_tsc_shift Marcelo Tosatti
2023-01-20  1:11 ` [PATCH 1/2] linux-headers: sync KVM_CLOCK_CORRECT_TSC_SHIFT flag Marcelo Tosatti
2023-01-20  1:11 ` [PATCH 2/2] hw/i386/kvm/clock.c: read kvmclock from guest memory if !correct_tsc_shift Marcelo Tosatti
2023-01-20  8:54 ` [PATCH 0/2] " Paolo Bonzini

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).