All of lore.kernel.org
 help / color / mirror / Atom feed
From: Jack Allister <jalliste@amazon.com>
To: Paolo Bonzini <pbonzini@redhat.com>,
	Jonathan Corbet <corbet@lwn.net>,
	Sean Christopherson <seanjc@google.com>,
	Thomas Gleixner <tglx@linutronix.de>,
	Ingo Molnar <mingo@redhat.com>, Borislav Petkov <bp@alien8.de>,
	Dave Hansen <dave.hansen@linux.intel.com>, <x86@kernel.org>,
	"H. Peter Anvin" <hpa@zytor.com>
Cc: David Woodhouse <dwmw2@infradead.org>,
	Paul Durrant <paul@xen.org>,
	"Jack Allister" <jalliste@amazon.com>, <kvm@vger.kernel.org>,
	<linux-doc@vger.kernel.org>, <linux-kernel@vger.kernel.org>
Subject: [PATCH 1/2] KVM: x86: Add KVM_[GS]ET_CLOCK_GUEST for KVM clock drift fixup
Date: Mon, 8 Apr 2024 22:07:03 +0000	[thread overview]
Message-ID: <20240408220705.7637-2-jalliste@amazon.com> (raw)
In-Reply-To: <20240408220705.7637-1-jalliste@amazon.com>

There is a potential for drift between the TSC and a KVM/PV clock when the
guest TSC is scaled (as seen previously in [1]). Which fixed drift between
timers over the lifetime of a VM.

However, there is another factor which will cause a drift. In a situation
such as a kexec/live-update of the kernel or a live-migration of a VM the
PV clock information is recalculated by KVM (KVM_REQ_MASTERCLOCK_UPDATE).
This update samples a new system_time & tsc_timestamp to be used in the
structure.

For example, when a guest is running with a TSC frequency of 1.5GHz but the
host frequency is 3.0GHz upon an update of the PV time information a delta
of ~3500ns is observed between the TSC and the KVM/PV clock. There is no
reason why a fixup creating an accuracy of ±1ns cannot be achieved.

Additional interfaces are added to retrieve & fixup the PV time information
when a VMM may believe is appropriate (deserialization after live-update/
migration). KVM_GET_CLOCK_GUEST can be used for the VMM to retrieve the
currently used PV time information and then when the VMM believes a drift
may occur can then instruct KVM to perform a correction via the setter
KVM_SET_CLOCK_GUEST.

The KVM_SET_CLOCK_GUEST ioctl works under the following premise. The host
TSC & kernel timstamp are sampled at a singular point in time. Using the
already known scaling/offset for L1 the guest TSC is then derived from this
information.

From here two PV time information structures are created, one which is the
original time information structure prior to whatever may have caused a PV
clock re-calculation (live-update/migration). The second is then using the
singular point in time sampled just prior. An individual KVM/PV clock for
each of the PV time information structures using the singular guest TSC.

A delta is then determined between the two calculated PV times, which is
then used as a correction offset added onto the kvmclock_offset for the VM.

[1]: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=451a707813ae

Suggested-by: David Woodhouse <dwmw2@infradead.org>
Signed-off-by: Jack Allister <jalliste@amazon.com>
CC: Paul Durrant <paul@xen.org>
---
 Documentation/virt/kvm/api.rst | 43 +++++++++++++++++
 arch/x86/kvm/x86.c             | 87 ++++++++++++++++++++++++++++++++++
 include/uapi/linux/kvm.h       |  3 ++
 3 files changed, 133 insertions(+)

diff --git a/Documentation/virt/kvm/api.rst b/Documentation/virt/kvm/api.rst
index 0b5a33ee71ee..5f74d8ac1002 100644
--- a/Documentation/virt/kvm/api.rst
+++ b/Documentation/virt/kvm/api.rst
@@ -6352,6 +6352,49 @@ a single guest_memfd file, but the bound ranges must not overlap).
 
 See KVM_SET_USER_MEMORY_REGION2 for additional details.
 
+4.143 KVM_GET_CLOCK_GUEST
+----------------------------
+
+:Capability: none
+:Architectures: x86
+:Type: vm ioctl
+:Parameters: struct pvclock_vcpu_time_info (out)
+:Returns: 0 on success, <0 on error
+
+Retrieves the current time information structure used for KVM/PV clocks.
+On x86 a PV clock is derived from the current TSC and is then scaled based
+upon the a specified multiplier and shift. The result of this is then added
+to a system time.
+
+The guest needs a way to determine the system time, multiplier and shift. This
+can be done by multiple ways, for KVM guests this can be via an MSR write to
+MSR_KVM_SYSTEM_TIME / MSR_KVM_SYSTEM_TIME_NEW which defines the guest physical
+address KVM shall put the structure. On Xen guests this can be found in the Xen
+vcpu_info.
+
+This is structure is useful information for a VMM to also know when taking into
+account potential timer drift on live-update/migration.
+
+4.144 KVM_SET_CLOCK_GUEST
+----------------------------
+
+:Capability: none
+:Architectures: x86
+:Type: vm ioctl
+:Parameters: struct pvclock_vcpu_time_info (in)
+:Returns: 0 on success, <0 on error
+
+Triggers KVM to perform a correction of the KVM/PV clock structure based upon a
+known prior PV clock structure (see KVM_GET_CLOCK_GUEST).
+
+If a VM is utilizing TSC scaling there is a potential for a drift between the
+KVM/PV clock and the TSC itself. This is due to the loss of precision when
+performing a multiply and shift rather than divide for the TSC.
+
+To perform the correction a delta is calculated between the original time info
+(which is assumed correct) at a singular point in time X. The KVM clock offset
+is then offset by this delta.
+
 5. The kvm_run structure
 ========================
 
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 47d9f03b7778..5d2e10cd1c30 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -6988,6 +6988,87 @@ static int kvm_vm_ioctl_set_clock(struct kvm *kvm, void __user *argp)
 	return 0;
 }
 
+static struct kvm_vcpu *kvm_get_bsp_vcpu(struct kvm *kvm)
+{
+	struct kvm_vcpu *vcpu = NULL;
+	int i;
+
+	for (i = 0; i < KVM_MAX_VCPUS; i++) {
+		vcpu = kvm_get_vcpu_by_id(kvm, i);
+		if (!vcpu)
+			continue;
+
+		if (kvm_vcpu_is_reset_bsp(vcpu))
+			break;
+	}
+
+	return vcpu;
+}
+
+static int kvm_vm_ioctl_get_clock_guest(struct kvm *kvm, void __user *argp)
+{
+	struct kvm_vcpu *vcpu;
+
+	vcpu = kvm_get_bsp_vcpu(kvm);
+	if (!vcpu)
+		return -EINVAL;
+
+	if (!vcpu->arch.hv_clock.tsc_timestamp || !vcpu->arch.hv_clock.system_time)
+		return -EIO;
+
+	if (copy_to_user(argp, &vcpu->arch.hv_clock, sizeof(vcpu->arch.hv_clock)))
+		return -EFAULT;
+
+	return 0;
+}
+
+static int kvm_vm_ioctl_set_clock_guest(struct kvm *kvm, void __user *argp)
+{
+	struct kvm_vcpu *vcpu;
+	struct pvclock_vcpu_time_info orig_pvti;
+	struct pvclock_vcpu_time_info dummy_pvti;
+	int64_t kernel_ns;
+	uint64_t host_tsc, guest_tsc;
+	uint64_t clock_orig, clock_dummy;
+	int64_t correction;
+	unsigned long i;
+
+	vcpu = kvm_get_bsp_vcpu(kvm);
+	if (!vcpu)
+		return -EINVAL;
+
+	if (copy_from_user(&orig_pvti, argp, sizeof(orig_pvti)))
+		return -EFAULT;
+
+	/*
+	 * Sample the kernel time and host TSC at a singular point.
+	 * We then calculate the guest TSC using this exact point in time,
+	 * From here we can then determine the delta using the
+	 * PV time info requested from the user and what we currently have
+	 * using the fixed point in time. This delta is then used as a
+	 * correction factor to fixup the potential drift.
+	 */
+	if (!kvm_get_time_and_clockread(&kernel_ns, &host_tsc))
+		return -EFAULT;
+
+	guest_tsc = kvm_read_l1_tsc(vcpu, host_tsc);
+
+	dummy_pvti = orig_pvti;
+	dummy_pvti.tsc_timestamp = guest_tsc;
+	dummy_pvti.system_time = kernel_ns + kvm->arch.kvmclock_offset;
+
+	clock_orig = __pvclock_read_cycles(&orig_pvti, guest_tsc);
+	clock_dummy = __pvclock_read_cycles(&dummy_pvti, guest_tsc);
+
+	correction = clock_orig - clock_dummy;
+	kvm->arch.kvmclock_offset += correction;
+
+	kvm_for_each_vcpu(i, vcpu, kvm)
+		kvm_make_request(KVM_REQ_CLOCK_UPDATE, vcpu);
+
+	return 0;
+}
+
 int kvm_arch_vm_ioctl(struct file *filp, unsigned int ioctl, unsigned long arg)
 {
 	struct kvm *kvm = filp->private_data;
@@ -7246,6 +7327,12 @@ int kvm_arch_vm_ioctl(struct file *filp, unsigned int ioctl, unsigned long arg)
 	case KVM_GET_CLOCK:
 		r = kvm_vm_ioctl_get_clock(kvm, argp);
 		break;
+	case KVM_SET_CLOCK_GUEST:
+		r = kvm_vm_ioctl_set_clock_guest(kvm, argp);
+		break;
+	case KVM_GET_CLOCK_GUEST:
+		r = kvm_vm_ioctl_get_clock_guest(kvm, argp);
+		break;
 	case KVM_SET_TSC_KHZ: {
 		u32 user_tsc_khz;
 
diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h
index 2190adbe3002..0d306311e4d6 100644
--- a/include/uapi/linux/kvm.h
+++ b/include/uapi/linux/kvm.h
@@ -1548,4 +1548,7 @@ struct kvm_create_guest_memfd {
 	__u64 reserved[6];
 };
 
+#define KVM_SET_CLOCK_GUEST       _IOW(KVMIO,  0xd5, struct pvclock_vcpu_time_info)
+#define KVM_GET_CLOCK_GUEST       _IOR(KVMIO,  0xd6, struct pvclock_vcpu_time_info)
+
 #endif /* __LINUX_KVM_H */
-- 
2.40.1


  reply	other threads:[~2024-04-08 22:07 UTC|newest]

Thread overview: 32+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2024-04-08 22:07 [PATCH 0/2] Add API to correct KVM/PV clock drift Jack Allister
2024-04-08 22:07 ` Jack Allister [this message]
2024-04-09  0:34   ` [PATCH 1/2] KVM: x86: Add KVM_[GS]ET_CLOCK_GUEST for KVM clock drift fixup Dongli Zhang
2024-04-09  3:50     ` David Woodhouse
2024-04-10 10:08     ` Allister, Jack
2024-04-08 22:07 ` [PATCH 2/2] KVM: selftests: Add KVM/PV clock selftest to prove timer drift correction Jack Allister
2024-04-09  0:43   ` Dongli Zhang
2024-04-09  4:23     ` David Woodhouse
2024-04-10 10:15     ` Allister, Jack
2024-04-11 13:28       ` David Woodhouse
2024-04-19 17:13   ` Chen, Zide
     [not found]     ` <17F1A2E9-6BAD-40E7-ACDD-B110CFC124B3@infradead.org>
2024-04-19 18:43       ` David Woodhouse
2024-04-19 23:54         ` Chen, Zide
2024-04-20 10:32           ` David Woodhouse
2024-04-20 16:03           ` David Woodhouse
2024-04-22 22:02             ` Chen, Zide
2024-04-23  7:49               ` David Woodhouse
2024-04-23 17:59                 ` Chen, Zide
2024-04-23 21:02                   ` David Woodhouse
2024-04-24 12:58               ` David Woodhouse
2024-04-19 19:34     ` David Woodhouse
2024-04-19 23:53       ` Chen, Zide
2024-04-10  9:52 ` [PATCH v2 0/2] Add API for accurate KVM/PV clock migration Jack Allister
2024-04-10  9:52   ` [PATCH v2 1/2] KVM: x86: Add KVM_[GS]ET_CLOCK_GUEST for accurate KVM " Jack Allister
2024-04-10 10:29     ` Paul Durrant
2024-04-10 12:09       ` David Woodhouse
2024-04-10 12:43         ` Paul Durrant
2024-04-17 19:50           ` David Woodhouse
2024-04-15  7:16     ` David Woodhouse
2024-04-10  9:52   ` [PATCH v2 2/2] KVM: selftests: Add KVM/PV clock selftest to prove timer correction Jack Allister
2024-04-10 10:36     ` Paul Durrant
2024-04-12  8:19     ` Dongli Zhang

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20240408220705.7637-2-jalliste@amazon.com \
    --to=jalliste@amazon.com \
    --cc=bp@alien8.de \
    --cc=corbet@lwn.net \
    --cc=dave.hansen@linux.intel.com \
    --cc=dwmw2@infradead.org \
    --cc=hpa@zytor.com \
    --cc=kvm@vger.kernel.org \
    --cc=linux-doc@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=mingo@redhat.com \
    --cc=paul@xen.org \
    --cc=pbonzini@redhat.com \
    --cc=seanjc@google.com \
    --cc=tglx@linutronix.de \
    --cc=x86@kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.