All of lore.kernel.org
 help / color / mirror / Atom feed
* [RFC v2 0/2] kvm: Use host timekeeping in guest.
@ 2019-10-10  7:30 Suleiman Souhlal
  2019-10-10  7:30 ` [RFC v2 1/2] kvm: Mechanism to copy host timekeeping parameters into guest Suleiman Souhlal
                   ` (3 more replies)
  0 siblings, 4 replies; 13+ messages in thread
From: Suleiman Souhlal @ 2019-10-10  7:30 UTC (permalink / raw)
  To: pbonzini, rkrcmar, tglx
  Cc: john.stultz, sboyd, linux-kernel, kvm, ssouhlal, tfiga, vkuznets,
	Suleiman Souhlal

This RFC is to try to solve the following problem:

We have some applications that are currently running in their
own namespace, that still talk to other processes on the
machine, using IPC, and expect to run on the same machine.

We want to move them into a virtual machine, for the usual
benefits of virtualization.

However, some of these programs use CLOCK_MONOTONIC and
CLOCK_BOOTTIME timestamps, as part of their protocol, when talking
to the host.

Generally speaking, we have multiple event sources, for example
sensors, input devices, display controller vsync, etc and we would
like to rely on them in the guest for various scenarios.

As a specific example, we are trying to run some wayland clients
(in the guest) who talk to the server (in the host), and the server
gives input events based on host time. Additionally, there are also
vsync events that the clients use for timing their rendering.

Another use case we have are timestamps from IIO sensors and cameras.
There are applications that need to determine how the timestamps
relate to the current time and the only way to get current time is
clock_gettime(), which would return a value from a different time
domain than the timestamps.

In this case, it is not feasible to change these programs, due to
the number of the places we would have to change.

We spent some time thinking about this, and the best solution we
could come up with was the following:

Make the guest kernel return the same CLOCK_MONOTONIC and
CLOCK_GETTIME timestamps as the host.

To do that, I am changing kvmclock to request to the host to copy
its timekeeping parameters (mult, base, cycle_last, etc), so that
the guest timekeeper can use the same values, so that time can
be synchronized between the guest and the host.

Any suggestions or feedback would be highly appreciated.

Changes in v2:
- Move out of kvmclock and into its own clocksource and file.
- Remove timekeeping.c #ifdefs.
- Fix i386 build.

Suleiman Souhlal (2):
  kvm: Mechanism to copy host timekeeping parameters into guest.
  x86/kvmclock: Introduce kvm-hostclock clocksource.

 arch/x86/Kconfig                     |   9 ++
 arch/x86/include/asm/kvm_host.h      |   3 +
 arch/x86/include/asm/kvmclock.h      |  12 +++
 arch/x86/include/asm/pvclock-abi.h   |  27 ++++++
 arch/x86/include/uapi/asm/kvm_para.h |   1 +
 arch/x86/kernel/Makefile             |   2 +
 arch/x86/kernel/kvmclock.c           |   5 +-
 arch/x86/kernel/kvmhostclock.c       | 130 +++++++++++++++++++++++++++
 arch/x86/kvm/x86.c                   | 121 +++++++++++++++++++++++++
 include/linux/timekeeper_internal.h  |   8 ++
 kernel/time/timekeeping.c            |   2 +
 11 files changed, 319 insertions(+), 1 deletion(-)
 create mode 100644 arch/x86/kernel/kvmhostclock.c

-- 
2.23.0.581.g78d2f28ef7-goog


^ permalink raw reply	[flat|nested] 13+ messages in thread

* [RFC v2 1/2] kvm: Mechanism to copy host timekeeping parameters into guest.
  2019-10-10  7:30 [RFC v2 0/2] kvm: Use host timekeeping in guest Suleiman Souhlal
@ 2019-10-10  7:30 ` Suleiman Souhlal
  2019-10-10  8:31   ` Vitaly Kuznetsov
  2019-10-10 10:50   ` Paolo Bonzini
  2019-10-10  7:30 ` [RFC v2 2/2] x86/kvmclock: Introduce kvm-hostclock clocksource Suleiman Souhlal
                   ` (2 subsequent siblings)
  3 siblings, 2 replies; 13+ messages in thread
From: Suleiman Souhlal @ 2019-10-10  7:30 UTC (permalink / raw)
  To: pbonzini, rkrcmar, tglx
  Cc: john.stultz, sboyd, linux-kernel, kvm, ssouhlal, tfiga, vkuznets,
	Suleiman Souhlal

This is used to synchronize time between host and guest.
The guest can request the (guest) physical address it wants the
data in through the MSR_KVM_TIMEKEEPER_EN MSR.

It currently assumes the host timekeeper is "tsc".

Signed-off-by: Suleiman Souhlal <suleiman@google.com>
---
 arch/x86/include/asm/kvm_host.h      |   3 +
 arch/x86/include/asm/pvclock-abi.h   |  27 ++++++
 arch/x86/include/uapi/asm/kvm_para.h |   1 +
 arch/x86/kvm/x86.c                   | 121 +++++++++++++++++++++++++++
 4 files changed, 152 insertions(+)

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index 50eb430b0ad8..4d622450cb4a 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -659,7 +659,10 @@ struct kvm_vcpu_arch {
 	struct pvclock_vcpu_time_info hv_clock;
 	unsigned int hw_tsc_khz;
 	struct gfn_to_hva_cache pv_time;
+	struct gfn_to_hva_cache pv_timekeeper_g2h;
+	struct pvclock_timekeeper pv_timekeeper;
 	bool pv_time_enabled;
+	bool pv_timekeeper_enabled;
 	/* set guest stopped flag in pvclock flags field */
 	bool pvclock_set_guest_stopped_request;
 
diff --git a/arch/x86/include/asm/pvclock-abi.h b/arch/x86/include/asm/pvclock-abi.h
index 1436226efe3e..2809008b9b26 100644
--- a/arch/x86/include/asm/pvclock-abi.h
+++ b/arch/x86/include/asm/pvclock-abi.h
@@ -40,6 +40,33 @@ struct pvclock_wall_clock {
 	u32   nsec;
 } __attribute__((__packed__));
 
+struct pvclock_read_base {
+	u64 mask;
+	u64 cycle_last;
+	u32 mult;
+	u32 shift;
+	u64 xtime_nsec;
+	u64 base;
+} __attribute__((__packed__));
+
+struct pvclock_timekeeper {
+	u64 gen;
+	u64 flags;
+	struct pvclock_read_base tkr_mono;
+	struct pvclock_read_base tkr_raw;
+	u64 xtime_sec;
+	u64 ktime_sec;
+	u64 wall_to_monotonic_sec;
+	u64 wall_to_monotonic_nsec;
+	u64 offs_real;
+	u64 offs_boot;
+	u64 offs_tai;
+	u64 raw_sec;
+	u64 tsc_offset;
+} __attribute__((__packed__));
+
+#define	PVCLOCK_TIMEKEEPER_ENABLED (1 << 0)
+
 #define PVCLOCK_TSC_STABLE_BIT	(1 << 0)
 #define PVCLOCK_GUEST_STOPPED	(1 << 1)
 /* PVCLOCK_COUNTS_FROM_ZERO broke ABI and can't be used anymore. */
diff --git a/arch/x86/include/uapi/asm/kvm_para.h b/arch/x86/include/uapi/asm/kvm_para.h
index 2a8e0b6b9805..3ebb1d87db3a 100644
--- a/arch/x86/include/uapi/asm/kvm_para.h
+++ b/arch/x86/include/uapi/asm/kvm_para.h
@@ -50,6 +50,7 @@
 #define MSR_KVM_STEAL_TIME  0x4b564d03
 #define MSR_KVM_PV_EOI_EN      0x4b564d04
 #define MSR_KVM_POLL_CONTROL	0x4b564d05
+#define MSR_KVM_TIMEKEEPER_EN	0x4b564d06
 
 struct kvm_steal_time {
 	__u64 steal;
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 661e2bf38526..937f83cdda4b 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -157,6 +157,8 @@ module_param(force_emulation_prefix, bool, S_IRUGO);
 int __read_mostly pi_inject_timer = -1;
 module_param(pi_inject_timer, bint, S_IRUGO | S_IWUSR);
 
+static atomic_t pv_timekeepers_nr;
+
 #define KVM_NR_SHARED_MSRS 16
 
 struct kvm_shared_msrs_global {
@@ -2729,6 +2731,16 @@ int kvm_set_msr_common(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
 
 		break;
 	}
+	case MSR_KVM_TIMEKEEPER_EN:
+		if (kvm_gfn_to_hva_cache_init(vcpu->kvm,
+		    &vcpu->arch.pv_timekeeper_g2h, data,
+		    sizeof(struct pvclock_timekeeper)))
+			vcpu->arch.pv_timekeeper_enabled = false;
+		else {
+			vcpu->arch.pv_timekeeper_enabled = true;
+			atomic_inc(&pv_timekeepers_nr);
+		}
+		break;
 	case MSR_KVM_ASYNC_PF_EN:
 		if (kvm_pv_enable_async_pf(vcpu, data))
 			return 1;
@@ -7097,6 +7109,109 @@ static struct perf_guest_info_callbacks kvm_guest_cbs = {
 };
 
 #ifdef CONFIG_X86_64
+static DEFINE_SPINLOCK(shadow_pvtk_lock);
+static struct pvclock_timekeeper shadow_pvtk;
+
+static void
+pvclock_copy_read_base(struct pvclock_read_base *pvtkr,
+    struct tk_read_base *tkr)
+{
+	pvtkr->cycle_last = tkr->cycle_last;
+	pvtkr->mult = tkr->mult;
+	pvtkr->shift = tkr->shift;
+	pvtkr->mask = tkr->mask;
+	pvtkr->xtime_nsec = tkr->xtime_nsec;
+	pvtkr->base = tkr->base;
+}
+
+static void
+kvm_copy_into_pvtk(struct kvm_vcpu *vcpu)
+{
+	struct pvclock_timekeeper *pvtk;
+	unsigned long flags;
+
+	if (!vcpu->arch.pv_timekeeper_enabled)
+		return;
+
+	pvtk = &vcpu->arch.pv_timekeeper;
+	if (pvclock_gtod_data.clock.vclock_mode == VCLOCK_TSC) {
+		pvtk->flags |= PVCLOCK_TIMEKEEPER_ENABLED;
+		spin_lock_irqsave(&shadow_pvtk_lock, flags);
+		pvtk->tkr_mono = shadow_pvtk.tkr_mono;
+		pvtk->tkr_raw = shadow_pvtk.tkr_raw;
+
+		pvtk->xtime_sec = shadow_pvtk.xtime_sec;
+		pvtk->ktime_sec = shadow_pvtk.ktime_sec;
+		pvtk->wall_to_monotonic_sec =
+		    shadow_pvtk.wall_to_monotonic_sec;
+		pvtk->wall_to_monotonic_nsec =
+		    shadow_pvtk.wall_to_monotonic_nsec;
+		pvtk->offs_real = shadow_pvtk.offs_real;
+		pvtk->offs_boot = shadow_pvtk.offs_boot;
+		pvtk->offs_tai = shadow_pvtk.offs_tai;
+		pvtk->raw_sec = shadow_pvtk.raw_sec;
+		spin_unlock_irqrestore(&shadow_pvtk_lock, flags);
+
+		pvtk->tsc_offset = kvm_x86_ops->read_l1_tsc_offset(vcpu);
+	} else
+		pvtk->flags &= ~PVCLOCK_TIMEKEEPER_ENABLED;
+
+	BUILD_BUG_ON(offsetof(struct pvclock_timekeeper, gen) != 0);
+
+	/*
+	 * Make the gen count odd to indicate we are in the process of
+	 * updating.
+	 */
+	vcpu->arch.pv_timekeeper.gen++;
+	vcpu->arch.pv_timekeeper.gen |= 1;
+
+	/*
+	 * See comment in kvm_guest_time_update() for why we have to do
+	 * multiple writes.
+	 */
+	kvm_write_guest_cached(vcpu->kvm, &vcpu->arch.pv_timekeeper_g2h,
+	    &vcpu->arch.pv_timekeeper, sizeof(vcpu->arch.pv_timekeeper.gen));
+
+	smp_wmb();
+
+	kvm_write_guest_cached(vcpu->kvm, &vcpu->arch.pv_timekeeper_g2h,
+	    &vcpu->arch.pv_timekeeper, sizeof(vcpu->arch.pv_timekeeper));
+
+	smp_wmb();
+
+	vcpu->arch.pv_timekeeper.gen++;
+
+	kvm_write_guest_cached(vcpu->kvm, &vcpu->arch.pv_timekeeper_g2h,
+	    &vcpu->arch.pv_timekeeper, sizeof(vcpu->arch.pv_timekeeper.gen));
+}
+
+static void
+update_shadow_pvtk(struct timekeeper *tk)
+{
+	struct pvclock_timekeeper *pvtk;
+	unsigned long flags;
+
+	pvtk = &shadow_pvtk;
+
+	if (atomic_read(&pv_timekeepers_nr) == 0 ||
+	    pvclock_gtod_data.clock.vclock_mode != VCLOCK_TSC)
+		return;
+
+	spin_lock_irqsave(&shadow_pvtk_lock, flags);
+	pvclock_copy_read_base(&pvtk->tkr_mono, &tk->tkr_mono);
+	pvclock_copy_read_base(&pvtk->tkr_raw, &tk->tkr_raw);
+
+	pvtk->xtime_sec = tk->xtime_sec;
+	pvtk->ktime_sec = tk->ktime_sec;
+	pvtk->wall_to_monotonic_sec = tk->wall_to_monotonic.tv_sec;
+	pvtk->wall_to_monotonic_nsec = tk->wall_to_monotonic.tv_nsec;
+	pvtk->offs_real = tk->offs_real;
+	pvtk->offs_boot = tk->offs_boot;
+	pvtk->offs_tai = tk->offs_tai;
+	pvtk->raw_sec = tk->raw_sec;
+	spin_unlock_irqrestore(&shadow_pvtk_lock, flags);
+}
+
 static void pvclock_gtod_update_fn(struct work_struct *work)
 {
 	struct kvm *kvm;
@@ -7124,6 +7239,7 @@ static int pvclock_gtod_notify(struct notifier_block *nb, unsigned long unused,
 	struct timekeeper *tk = priv;
 
 	update_pvclock_gtod(tk);
+	update_shadow_pvtk(tk);
 
 	/* disable master clock if host does not trust, or does not
 	 * use, TSC based clocksource.
@@ -7940,6 +8056,8 @@ static int vcpu_enter_guest(struct kvm_vcpu *vcpu)
 
 	bool req_immediate_exit = false;
 
+	kvm_copy_into_pvtk(vcpu);
+
 	if (kvm_request_pending(vcpu)) {
 		if (kvm_check_request(KVM_REQ_GET_VMCS12_PAGES, vcpu))
 			kvm_x86_ops->get_vmcs12_pages(vcpu);
@@ -9020,6 +9138,9 @@ void kvm_arch_vcpu_free(struct kvm_vcpu *vcpu)
 
 	kvmclock_reset(vcpu);
 
+	if (vcpu->arch.pv_timekeeper_enabled)
+		atomic_dec(&pv_timekeepers_nr);
+
 	kvm_x86_ops->vcpu_free(vcpu);
 	free_cpumask_var(wbinvd_dirty_mask);
 }
-- 
2.23.0.581.g78d2f28ef7-goog


^ permalink raw reply related	[flat|nested] 13+ messages in thread

* [RFC v2 2/2] x86/kvmclock: Introduce kvm-hostclock clocksource.
  2019-10-10  7:30 [RFC v2 0/2] kvm: Use host timekeeping in guest Suleiman Souhlal
  2019-10-10  7:30 ` [RFC v2 1/2] kvm: Mechanism to copy host timekeeping parameters into guest Suleiman Souhlal
@ 2019-10-10  7:30 ` Suleiman Souhlal
  2019-10-10  8:37   ` Vitaly Kuznetsov
  2019-10-10 10:55   ` Paolo Bonzini
  2019-10-10 10:39 ` [RFC v2 0/2] kvm: Use host timekeeping in guest Roman Kagan
  2019-10-10 10:43 ` Paolo Bonzini
  3 siblings, 2 replies; 13+ messages in thread
From: Suleiman Souhlal @ 2019-10-10  7:30 UTC (permalink / raw)
  To: pbonzini, rkrcmar, tglx
  Cc: john.stultz, sboyd, linux-kernel, kvm, ssouhlal, tfiga, vkuznets,
	Suleiman Souhlal

When kvm-hostclock is selected, and the host supports it, update our
timekeeping parameters to be the same as the host.
This lets us have our time synchronized with the host's,
even in the presence of host NTP or suspend.

Signed-off-by: Suleiman Souhlal <suleiman@google.com>
---
 arch/x86/Kconfig                    |   9 ++
 arch/x86/include/asm/kvmclock.h     |  12 +++
 arch/x86/kernel/Makefile            |   2 +
 arch/x86/kernel/kvmclock.c          |   5 +-
 arch/x86/kernel/kvmhostclock.c      | 130 ++++++++++++++++++++++++++++
 include/linux/timekeeper_internal.h |   8 ++
 kernel/time/timekeeping.c           |   2 +
 7 files changed, 167 insertions(+), 1 deletion(-)
 create mode 100644 arch/x86/kernel/kvmhostclock.c

diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
index d6e1faa28c58..c5b1257ea969 100644
--- a/arch/x86/Kconfig
+++ b/arch/x86/Kconfig
@@ -839,6 +839,15 @@ config PARAVIRT_TIME_ACCOUNTING
 config PARAVIRT_CLOCK
 	bool
 
+config KVM_HOSTCLOCK
+	bool "kvmclock uses host timekeeping"
+	depends on KVM_GUEST
+	help
+	  Select this option to make the guest use the same timekeeping
+	  parameters as the host. This means that time will be almost
+	  exactly the same between the two. Only works if the host uses "tsc"
+	  clocksource.
+
 config JAILHOUSE_GUEST
 	bool "Jailhouse non-root cell support"
 	depends on X86_64 && PCI
diff --git a/arch/x86/include/asm/kvmclock.h b/arch/x86/include/asm/kvmclock.h
index eceea9299097..de1a590ff97e 100644
--- a/arch/x86/include/asm/kvmclock.h
+++ b/arch/x86/include/asm/kvmclock.h
@@ -2,6 +2,18 @@
 #ifndef _ASM_X86_KVM_CLOCK_H
 #define _ASM_X86_KVM_CLOCK_H
 
+#include <linux/timekeeper_internal.h>
+
 extern struct clocksource kvm_clock;
 
+unsigned long kvm_get_tsc_khz(void);
+
+#ifdef CONFIG_KVM_HOSTCLOCK
+void kvm_hostclock_init(void);
+#else
+static inline void kvm_hostclock_init(void)
+{
+}
+#endif /* KVM_HOSTCLOCK */
+
 #endif /* _ASM_X86_KVM_CLOCK_H */
diff --git a/arch/x86/kernel/Makefile b/arch/x86/kernel/Makefile
index 3578ad248bc9..bc7be935fc5e 100644
--- a/arch/x86/kernel/Makefile
+++ b/arch/x86/kernel/Makefile
@@ -17,6 +17,7 @@ CFLAGS_REMOVE_tsc.o = -pg
 CFLAGS_REMOVE_paravirt-spinlocks.o = -pg
 CFLAGS_REMOVE_pvclock.o = -pg
 CFLAGS_REMOVE_kvmclock.o = -pg
+CFLAGS_REMOVE_kvmhostclock.o = -pg
 CFLAGS_REMOVE_ftrace.o = -pg
 CFLAGS_REMOVE_early_printk.o = -pg
 CFLAGS_REMOVE_head64.o = -pg
@@ -112,6 +113,7 @@ obj-$(CONFIG_AMD_NB)		+= amd_nb.o
 obj-$(CONFIG_DEBUG_NMI_SELFTEST) += nmi_selftest.o
 
 obj-$(CONFIG_KVM_GUEST)		+= kvm.o kvmclock.o
+obj-$(CONFIG_KVM_HOSTCLOCK)	+= kvmhostclock.o
 obj-$(CONFIG_PARAVIRT)		+= paravirt.o paravirt_patch.o
 obj-$(CONFIG_PARAVIRT_SPINLOCKS)+= paravirt-spinlocks.o
 obj-$(CONFIG_PARAVIRT_CLOCK)	+= pvclock.o
diff --git a/arch/x86/kernel/kvmclock.c b/arch/x86/kernel/kvmclock.c
index 904494b924c1..4ab862de9777 100644
--- a/arch/x86/kernel/kvmclock.c
+++ b/arch/x86/kernel/kvmclock.c
@@ -125,7 +125,7 @@ static inline void kvm_sched_clock_init(bool stable)
  * poll of guests can be running and trouble each other. So we preset
  * lpj here
  */
-static unsigned long kvm_get_tsc_khz(void)
+unsigned long kvm_get_tsc_khz(void)
 {
 	setup_force_cpu_cap(X86_FEATURE_TSC_KNOWN_FREQ);
 	return pvclock_tsc_khz(this_cpu_pvti());
@@ -366,5 +366,8 @@ void __init kvmclock_init(void)
 		kvm_clock.rating = 299;
 
 	clocksource_register_hz(&kvm_clock, NSEC_PER_SEC);
+
+	kvm_hostclock_init();
+
 	pv_info.name = "KVM";
 }
diff --git a/arch/x86/kernel/kvmhostclock.c b/arch/x86/kernel/kvmhostclock.c
new file mode 100644
index 000000000000..9971343c2bed
--- /dev/null
+++ b/arch/x86/kernel/kvmhostclock.c
@@ -0,0 +1,130 @@
+// SPDX-License-Identifier: GPL-2.0-only
+/*
+ * KVM clocksource that uses host timekeeping.
+ * Copyright (c) 2019 Suleiman Souhlal, Google LLC
+ */
+
+#include <linux/clocksource.h>
+#include <linux/kvm_para.h>
+#include <asm/pvclock.h>
+#include <asm/msr.h>
+#include <asm/kvmclock.h>
+#include <linux/timekeeper_internal.h>
+
+struct pvclock_timekeeper pv_timekeeper;
+
+static bool pv_timekeeper_enabled;
+static bool pv_timekeeper_present;
+static int old_vclock_mode;
+
+static u64
+kvm_hostclock_get_cycles(struct clocksource *cs)
+{
+	return rdtsc_ordered();
+}
+
+static int
+kvm_hostclock_enable(struct clocksource *cs)
+{
+	pv_timekeeper_enabled = 1;
+
+	old_vclock_mode = kvm_clock.archdata.vclock_mode;
+	kvm_clock.archdata.vclock_mode = VCLOCK_TSC;
+	return 0;
+}
+
+static void
+kvm_hostclock_disable(struct clocksource *cs)
+{
+	pv_timekeeper_enabled = 0;
+	kvm_clock.archdata.vclock_mode = old_vclock_mode;
+}
+
+struct clocksource kvm_hostclock = {
+	.name = "kvm-hostclock",
+	.read = kvm_hostclock_get_cycles,
+	.enable = kvm_hostclock_enable,
+	.disable = kvm_hostclock_disable,
+	.rating = 401, /* Higher than kvm-clock */
+	.mask = CLOCKSOURCE_MASK(64),
+	.flags = CLOCK_SOURCE_IS_CONTINUOUS,
+};
+
+static void
+pvclock_copy_into_read_base(struct pvclock_timekeeper *pvtk,
+    struct tk_read_base *tkr, struct pvclock_read_base *pvtkr)
+{
+	int shift_diff;
+
+	tkr->mask = pvtkr->mask;
+	tkr->cycle_last = pvtkr->cycle_last + pvtk->tsc_offset;
+	tkr->mult = pvtkr->mult;
+	shift_diff = tkr->shift - pvtkr->shift;
+	tkr->shift = pvtkr->shift;
+	tkr->xtime_nsec = pvtkr->xtime_nsec;
+	tkr->base = pvtkr->base;
+}
+
+static u64
+pvtk_read_begin(struct pvclock_timekeeper *pvtk)
+{
+	u64 gen;
+
+	gen = pvtk->gen & ~1;
+	/* Make sure that the gen count is read before the data. */
+	virt_rmb();
+
+	return gen;
+}
+
+static bool
+pvtk_read_retry(struct pvclock_timekeeper *pvtk, u64 gen)
+{
+	/* Make sure that the gen count is re-read after the data. */
+	virt_rmb();
+	return unlikely(gen != pvtk->gen);
+}
+
+void
+kvm_clock_copy_into_tk(struct timekeeper *tk)
+{
+	struct pvclock_timekeeper *pvtk;
+	u64 gen;
+
+	if (!pv_timekeeper_enabled)
+		return;
+
+	pvtk = &pv_timekeeper;
+	do {
+		gen = pvtk_read_begin(pvtk);
+		if (!(pv_timekeeper.flags & PVCLOCK_TIMEKEEPER_ENABLED))
+			return;
+
+		pvclock_copy_into_read_base(pvtk, &tk->tkr_mono,
+		    &pvtk->tkr_mono);
+		pvclock_copy_into_read_base(pvtk, &tk->tkr_raw, &pvtk->tkr_raw);
+
+		tk->xtime_sec = pvtk->xtime_sec;
+		tk->ktime_sec = pvtk->ktime_sec;
+		tk->wall_to_monotonic.tv_sec = pvtk->wall_to_monotonic_sec;
+		tk->wall_to_monotonic.tv_nsec = pvtk->wall_to_monotonic_nsec;
+		tk->offs_real = pvtk->offs_real;
+		tk->offs_boot = pvtk->offs_boot;
+		tk->offs_tai = pvtk->offs_tai;
+		tk->raw_sec = pvtk->raw_sec;
+	} while (pvtk_read_retry(pvtk, gen));
+}
+
+void __init
+kvm_hostclock_init(void)
+{
+	unsigned long pa;
+
+	pa = __pa(&pv_timekeeper);
+	wrmsrl(MSR_KVM_TIMEKEEPER_EN, pa);
+	if (pv_timekeeper.flags & PVCLOCK_TIMEKEEPER_ENABLED) {
+		pv_timekeeper_present = 1;
+
+		clocksource_register_khz(&kvm_hostclock, kvm_get_tsc_khz());
+	}
+}
diff --git a/include/linux/timekeeper_internal.h b/include/linux/timekeeper_internal.h
index 84ff2844df2a..43b036375cdc 100644
--- a/include/linux/timekeeper_internal.h
+++ b/include/linux/timekeeper_internal.h
@@ -153,4 +153,12 @@ static inline void update_vsyscall_tz(void)
 }
 #endif
 
+#ifdef CONFIG_KVM_HOSTCLOCK
+void kvm_clock_copy_into_tk(struct timekeeper *tk);
+#else
+static inline void kvm_clock_copy_into_tk(struct timekeeper *tk)
+{
+}
+#endif
+
 #endif /* _LINUX_TIMEKEEPER_INTERNAL_H */
diff --git a/kernel/time/timekeeping.c b/kernel/time/timekeeping.c
index ca69290bee2a..09bcf13b2334 100644
--- a/kernel/time/timekeeping.c
+++ b/kernel/time/timekeeping.c
@@ -2107,6 +2107,8 @@ static void timekeeping_advance(enum timekeeping_adv_mode mode)
 	clock_set |= accumulate_nsecs_to_secs(tk);
 
 	write_seqcount_begin(&tk_core.seq);
+	kvm_clock_copy_into_tk(tk);
+
 	/*
 	 * Update the real timekeeper.
 	 *
-- 
2.23.0.581.g78d2f28ef7-goog


^ permalink raw reply related	[flat|nested] 13+ messages in thread

* Re: [RFC v2 1/2] kvm: Mechanism to copy host timekeeping parameters into guest.
  2019-10-10  7:30 ` [RFC v2 1/2] kvm: Mechanism to copy host timekeeping parameters into guest Suleiman Souhlal
@ 2019-10-10  8:31   ` Vitaly Kuznetsov
  2019-10-10 10:50   ` Paolo Bonzini
  1 sibling, 0 replies; 13+ messages in thread
From: Vitaly Kuznetsov @ 2019-10-10  8:31 UTC (permalink / raw)
  To: Suleiman Souhlal, pbonzini, rkrcmar, tglx
  Cc: john.stultz, sboyd, linux-kernel, kvm, ssouhlal, tfiga, Suleiman Souhlal

Suleiman Souhlal <suleiman@google.com> writes:

> This is used to synchronize time between host and guest.
> The guest can request the (guest) physical address it wants the
> data in through the MSR_KVM_TIMEKEEPER_EN MSR.
>
> It currently assumes the host timekeeper is "tsc".
>
> Signed-off-by: Suleiman Souhlal <suleiman@google.com>
> ---
>  arch/x86/include/asm/kvm_host.h      |   3 +
>  arch/x86/include/asm/pvclock-abi.h   |  27 ++++++
>  arch/x86/include/uapi/asm/kvm_para.h |   1 +
>  arch/x86/kvm/x86.c                   | 121 +++++++++++++++++++++++++++
>  4 files changed, 152 insertions(+)
>
> diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
> index 50eb430b0ad8..4d622450cb4a 100644
> --- a/arch/x86/include/asm/kvm_host.h
> +++ b/arch/x86/include/asm/kvm_host.h
> @@ -659,7 +659,10 @@ struct kvm_vcpu_arch {
>  	struct pvclock_vcpu_time_info hv_clock;
>  	unsigned int hw_tsc_khz;
>  	struct gfn_to_hva_cache pv_time;
> +	struct gfn_to_hva_cache pv_timekeeper_g2h;
> +	struct pvclock_timekeeper pv_timekeeper;
>  	bool pv_time_enabled;
> +	bool pv_timekeeper_enabled;
>  	/* set guest stopped flag in pvclock flags field */
>  	bool pvclock_set_guest_stopped_request;
>  
> diff --git a/arch/x86/include/asm/pvclock-abi.h b/arch/x86/include/asm/pvclock-abi.h
> index 1436226efe3e..2809008b9b26 100644
> --- a/arch/x86/include/asm/pvclock-abi.h
> +++ b/arch/x86/include/asm/pvclock-abi.h
> @@ -40,6 +40,33 @@ struct pvclock_wall_clock {
>  	u32   nsec;
>  } __attribute__((__packed__));
>  
> +struct pvclock_read_base {
> +	u64 mask;
> +	u64 cycle_last;
> +	u32 mult;
> +	u32 shift;
> +	u64 xtime_nsec;
> +	u64 base;
> +} __attribute__((__packed__));
> +
> +struct pvclock_timekeeper {
> +	u64 gen;
> +	u64 flags;
> +	struct pvclock_read_base tkr_mono;
> +	struct pvclock_read_base tkr_raw;
> +	u64 xtime_sec;
> +	u64 ktime_sec;
> +	u64 wall_to_monotonic_sec;
> +	u64 wall_to_monotonic_nsec;
> +	u64 offs_real;
> +	u64 offs_boot;
> +	u64 offs_tai;
> +	u64 raw_sec;
> +	u64 tsc_offset;
> +} __attribute__((__packed__));
> +

AFAIU these structures mirror struct tk_read_base and struct timekeeper
but these are intenal to timekeeper so the risk I see is: we decide to
change timekeeper internals and these structures become hard-to-mirror
so I'd think about versioning them from the very beginning, e.g.: host
reports supported timekeer versions in MSR_KVM_TIMEKEEPER_EN (renamed)
and then guest asks for a particular version. This way we can deprecate
old versions eventually.

> +#define	PVCLOCK_TIMEKEEPER_ENABLED (1 << 0)
> +
>  #define PVCLOCK_TSC_STABLE_BIT	(1 << 0)
>  #define PVCLOCK_GUEST_STOPPED	(1 << 1)
>  /* PVCLOCK_COUNTS_FROM_ZERO broke ABI and can't be used anymore. */
> diff --git a/arch/x86/include/uapi/asm/kvm_para.h b/arch/x86/include/uapi/asm/kvm_para.h
> index 2a8e0b6b9805..3ebb1d87db3a 100644
> --- a/arch/x86/include/uapi/asm/kvm_para.h
> +++ b/arch/x86/include/uapi/asm/kvm_para.h
> @@ -50,6 +50,7 @@
>  #define MSR_KVM_STEAL_TIME  0x4b564d03
>  #define MSR_KVM_PV_EOI_EN      0x4b564d04
>  #define MSR_KVM_POLL_CONTROL	0x4b564d05
> +#define MSR_KVM_TIMEKEEPER_EN	0x4b564d06
>  
>  struct kvm_steal_time {
>  	__u64 steal;
> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> index 661e2bf38526..937f83cdda4b 100644
> --- a/arch/x86/kvm/x86.c
> +++ b/arch/x86/kvm/x86.c
> @@ -157,6 +157,8 @@ module_param(force_emulation_prefix, bool, S_IRUGO);
>  int __read_mostly pi_inject_timer = -1;
>  module_param(pi_inject_timer, bint, S_IRUGO | S_IWUSR);
>  
> +static atomic_t pv_timekeepers_nr;
> +
>  #define KVM_NR_SHARED_MSRS 16
>  
>  struct kvm_shared_msrs_global {
> @@ -2729,6 +2731,16 @@ int kvm_set_msr_common(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
>  
>  		break;
>  	}
> +	case MSR_KVM_TIMEKEEPER_EN:
> +		if (kvm_gfn_to_hva_cache_init(vcpu->kvm,
> +		    &vcpu->arch.pv_timekeeper_g2h, data,
> +		    sizeof(struct pvclock_timekeeper)))
> +			vcpu->arch.pv_timekeeper_enabled = false;
> +		else {
> +			vcpu->arch.pv_timekeeper_enabled = true;
> +			atomic_inc(&pv_timekeepers_nr);
> +		}
> +		break;
>  	case MSR_KVM_ASYNC_PF_EN:
>  		if (kvm_pv_enable_async_pf(vcpu, data))
>  			return 1;
> @@ -7097,6 +7109,109 @@ static struct perf_guest_info_callbacks kvm_guest_cbs = {
>  };
>  
>  #ifdef CONFIG_X86_64
> +static DEFINE_SPINLOCK(shadow_pvtk_lock);
> +static struct pvclock_timekeeper shadow_pvtk;
> +
> +static void
> +pvclock_copy_read_base(struct pvclock_read_base *pvtkr,
> +    struct tk_read_base *tkr)
> +{
> +	pvtkr->cycle_last = tkr->cycle_last;
> +	pvtkr->mult = tkr->mult;
> +	pvtkr->shift = tkr->shift;
> +	pvtkr->mask = tkr->mask;
> +	pvtkr->xtime_nsec = tkr->xtime_nsec;
> +	pvtkr->base = tkr->base;
> +}
> +
> +static void
> +kvm_copy_into_pvtk(struct kvm_vcpu *vcpu)
> +{
> +	struct pvclock_timekeeper *pvtk;
> +	unsigned long flags;
> +
> +	if (!vcpu->arch.pv_timekeeper_enabled)
> +		return;
> +
> +	pvtk = &vcpu->arch.pv_timekeeper;
> +	if (pvclock_gtod_data.clock.vclock_mode == VCLOCK_TSC) {
> +		pvtk->flags |= PVCLOCK_TIMEKEEPER_ENABLED;
> +		spin_lock_irqsave(&shadow_pvtk_lock, flags);
> +		pvtk->tkr_mono = shadow_pvtk.tkr_mono;
> +		pvtk->tkr_raw = shadow_pvtk.tkr_raw;
> +
> +		pvtk->xtime_sec = shadow_pvtk.xtime_sec;
> +		pvtk->ktime_sec = shadow_pvtk.ktime_sec;
> +		pvtk->wall_to_monotonic_sec =
> +		    shadow_pvtk.wall_to_monotonic_sec;
> +		pvtk->wall_to_monotonic_nsec =
> +		    shadow_pvtk.wall_to_monotonic_nsec;
> +		pvtk->offs_real = shadow_pvtk.offs_real;
> +		pvtk->offs_boot = shadow_pvtk.offs_boot;
> +		pvtk->offs_tai = shadow_pvtk.offs_tai;
> +		pvtk->raw_sec = shadow_pvtk.raw_sec;
> +		spin_unlock_irqrestore(&shadow_pvtk_lock, flags);
> +
> +		pvtk->tsc_offset = kvm_x86_ops->read_l1_tsc_offset(vcpu);
> +	} else
> +		pvtk->flags &= ~PVCLOCK_TIMEKEEPER_ENABLED;
> +
> +	BUILD_BUG_ON(offsetof(struct pvclock_timekeeper, gen) != 0);
> +
> +	/*
> +	 * Make the gen count odd to indicate we are in the process of
> +	 * updating.
> +	 */
> +	vcpu->arch.pv_timekeeper.gen++;
> +	vcpu->arch.pv_timekeeper.gen |= 1;
> +
> +	/*
> +	 * See comment in kvm_guest_time_update() for why we have to do
> +	 * multiple writes.
> +	 */
> +	kvm_write_guest_cached(vcpu->kvm, &vcpu->arch.pv_timekeeper_g2h,
> +	    &vcpu->arch.pv_timekeeper, sizeof(vcpu->arch.pv_timekeeper.gen));
> +
> +	smp_wmb();
> +
> +	kvm_write_guest_cached(vcpu->kvm, &vcpu->arch.pv_timekeeper_g2h,
> +	    &vcpu->arch.pv_timekeeper, sizeof(vcpu->arch.pv_timekeeper));
> +
> +	smp_wmb();
> +
> +	vcpu->arch.pv_timekeeper.gen++;
> +
> +	kvm_write_guest_cached(vcpu->kvm, &vcpu->arch.pv_timekeeper_g2h,
> +	    &vcpu->arch.pv_timekeeper, sizeof(vcpu->arch.pv_timekeeper.gen));
> +}
> +
> +static void
> +update_shadow_pvtk(struct timekeeper *tk)
> +{
> +	struct pvclock_timekeeper *pvtk;
> +	unsigned long flags;
> +
> +	pvtk = &shadow_pvtk;
> +
> +	if (atomic_read(&pv_timekeepers_nr) == 0 ||
> +	    pvclock_gtod_data.clock.vclock_mode != VCLOCK_TSC)
> +		return;
> +
> +	spin_lock_irqsave(&shadow_pvtk_lock, flags);
> +	pvclock_copy_read_base(&pvtk->tkr_mono, &tk->tkr_mono);
> +	pvclock_copy_read_base(&pvtk->tkr_raw, &tk->tkr_raw);
> +
> +	pvtk->xtime_sec = tk->xtime_sec;
> +	pvtk->ktime_sec = tk->ktime_sec;
> +	pvtk->wall_to_monotonic_sec = tk->wall_to_monotonic.tv_sec;
> +	pvtk->wall_to_monotonic_nsec = tk->wall_to_monotonic.tv_nsec;
> +	pvtk->offs_real = tk->offs_real;
> +	pvtk->offs_boot = tk->offs_boot;
> +	pvtk->offs_tai = tk->offs_tai;
> +	pvtk->raw_sec = tk->raw_sec;
> +	spin_unlock_irqrestore(&shadow_pvtk_lock, flags);
> +}
> +
>  static void pvclock_gtod_update_fn(struct work_struct *work)
>  {
>  	struct kvm *kvm;
> @@ -7124,6 +7239,7 @@ static int pvclock_gtod_notify(struct notifier_block *nb, unsigned long unused,
>  	struct timekeeper *tk = priv;
>  
>  	update_pvclock_gtod(tk);
> +	update_shadow_pvtk(tk);
>  
>  	/* disable master clock if host does not trust, or does not
>  	 * use, TSC based clocksource.
> @@ -7940,6 +8056,8 @@ static int vcpu_enter_guest(struct kvm_vcpu *vcpu)
>  
>  	bool req_immediate_exit = false;
>  
> +	kvm_copy_into_pvtk(vcpu);

It doesn't matter for pvclock_gtod_update_fn() but vcpu_enter_guest() is
performance critical. I would suggest to introduce a static key to avoid
this completely when no guest is using this new timekeeper.

> +
>  	if (kvm_request_pending(vcpu)) {
>  		if (kvm_check_request(KVM_REQ_GET_VMCS12_PAGES, vcpu))
>  			kvm_x86_ops->get_vmcs12_pages(vcpu);
> @@ -9020,6 +9138,9 @@ void kvm_arch_vcpu_free(struct kvm_vcpu *vcpu)
>  
>  	kvmclock_reset(vcpu);
>  
> +	if (vcpu->arch.pv_timekeeper_enabled)
> +		atomic_dec(&pv_timekeepers_nr);
> +
>  	kvm_x86_ops->vcpu_free(vcpu);
>  	free_cpumask_var(wbinvd_dirty_mask);
>  }

-- 
Vitaly

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [RFC v2 2/2] x86/kvmclock: Introduce kvm-hostclock clocksource.
  2019-10-10  7:30 ` [RFC v2 2/2] x86/kvmclock: Introduce kvm-hostclock clocksource Suleiman Souhlal
@ 2019-10-10  8:37   ` Vitaly Kuznetsov
  2019-10-10 10:55   ` Paolo Bonzini
  1 sibling, 0 replies; 13+ messages in thread
From: Vitaly Kuznetsov @ 2019-10-10  8:37 UTC (permalink / raw)
  To: Suleiman Souhlal
  Cc: john.stultz, sboyd, linux-kernel, kvm, ssouhlal, tfiga,
	Suleiman Souhlal, pbonzini, tglx

Suleiman Souhlal <suleiman@google.com> writes:

> When kvm-hostclock is selected, and the host supports it, update our
> timekeeping parameters to be the same as the host.
> This lets us have our time synchronized with the host's,
> even in the presence of host NTP or suspend.
>
> Signed-off-by: Suleiman Souhlal <suleiman@google.com>
> ---
>  arch/x86/Kconfig                    |   9 ++
>  arch/x86/include/asm/kvmclock.h     |  12 +++
>  arch/x86/kernel/Makefile            |   2 +
>  arch/x86/kernel/kvmclock.c          |   5 +-
>  arch/x86/kernel/kvmhostclock.c      | 130 ++++++++++++++++++++++++++++
>  include/linux/timekeeper_internal.h |   8 ++
>  kernel/time/timekeeping.c           |   2 +
>  7 files changed, 167 insertions(+), 1 deletion(-)
>  create mode 100644 arch/x86/kernel/kvmhostclock.c
>
> diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
> index d6e1faa28c58..c5b1257ea969 100644
> --- a/arch/x86/Kconfig
> +++ b/arch/x86/Kconfig
> @@ -839,6 +839,15 @@ config PARAVIRT_TIME_ACCOUNTING
>  config PARAVIRT_CLOCK
>  	bool
>  
> +config KVM_HOSTCLOCK
> +	bool "kvmclock uses host timekeeping"
> +	depends on KVM_GUEST
> +	help
> +	  Select this option to make the guest use the same timekeeping
> +	  parameters as the host. This means that time will be almost
> +	  exactly the same between the two. Only works if the host uses "tsc"
> +	  clocksource.
> +
>  config JAILHOUSE_GUEST
>  	bool "Jailhouse non-root cell support"
>  	depends on X86_64 && PCI
> diff --git a/arch/x86/include/asm/kvmclock.h b/arch/x86/include/asm/kvmclock.h
> index eceea9299097..de1a590ff97e 100644
> --- a/arch/x86/include/asm/kvmclock.h
> +++ b/arch/x86/include/asm/kvmclock.h
> @@ -2,6 +2,18 @@
>  #ifndef _ASM_X86_KVM_CLOCK_H
>  #define _ASM_X86_KVM_CLOCK_H
>  
> +#include <linux/timekeeper_internal.h>
> +
>  extern struct clocksource kvm_clock;
>  
> +unsigned long kvm_get_tsc_khz(void);
> +
> +#ifdef CONFIG_KVM_HOSTCLOCK
> +void kvm_hostclock_init(void);
> +#else
> +static inline void kvm_hostclock_init(void)
> +{
> +}
> +#endif /* KVM_HOSTCLOCK */
> +
>  #endif /* _ASM_X86_KVM_CLOCK_H */
> diff --git a/arch/x86/kernel/Makefile b/arch/x86/kernel/Makefile
> index 3578ad248bc9..bc7be935fc5e 100644
> --- a/arch/x86/kernel/Makefile
> +++ b/arch/x86/kernel/Makefile
> @@ -17,6 +17,7 @@ CFLAGS_REMOVE_tsc.o = -pg
>  CFLAGS_REMOVE_paravirt-spinlocks.o = -pg
>  CFLAGS_REMOVE_pvclock.o = -pg
>  CFLAGS_REMOVE_kvmclock.o = -pg
> +CFLAGS_REMOVE_kvmhostclock.o = -pg
>  CFLAGS_REMOVE_ftrace.o = -pg
>  CFLAGS_REMOVE_early_printk.o = -pg
>  CFLAGS_REMOVE_head64.o = -pg
> @@ -112,6 +113,7 @@ obj-$(CONFIG_AMD_NB)		+= amd_nb.o
>  obj-$(CONFIG_DEBUG_NMI_SELFTEST) += nmi_selftest.o
>  
>  obj-$(CONFIG_KVM_GUEST)		+= kvm.o kvmclock.o
> +obj-$(CONFIG_KVM_HOSTCLOCK)	+= kvmhostclock.o
>  obj-$(CONFIG_PARAVIRT)		+= paravirt.o paravirt_patch.o
>  obj-$(CONFIG_PARAVIRT_SPINLOCKS)+= paravirt-spinlocks.o
>  obj-$(CONFIG_PARAVIRT_CLOCK)	+= pvclock.o
> diff --git a/arch/x86/kernel/kvmclock.c b/arch/x86/kernel/kvmclock.c
> index 904494b924c1..4ab862de9777 100644
> --- a/arch/x86/kernel/kvmclock.c
> +++ b/arch/x86/kernel/kvmclock.c
> @@ -125,7 +125,7 @@ static inline void kvm_sched_clock_init(bool stable)
>   * poll of guests can be running and trouble each other. So we preset
>   * lpj here
>   */
> -static unsigned long kvm_get_tsc_khz(void)
> +unsigned long kvm_get_tsc_khz(void)
>  {
>  	setup_force_cpu_cap(X86_FEATURE_TSC_KNOWN_FREQ);
>  	return pvclock_tsc_khz(this_cpu_pvti());
> @@ -366,5 +366,8 @@ void __init kvmclock_init(void)
>  		kvm_clock.rating = 299;
>  
>  	clocksource_register_hz(&kvm_clock, NSEC_PER_SEC);
> +
> +	kvm_hostclock_init();
> +
>  	pv_info.name = "KVM";
>  }
> diff --git a/arch/x86/kernel/kvmhostclock.c b/arch/x86/kernel/kvmhostclock.c
> new file mode 100644
> index 000000000000..9971343c2bed
> --- /dev/null
> +++ b/arch/x86/kernel/kvmhostclock.c
> @@ -0,0 +1,130 @@
> +// SPDX-License-Identifier: GPL-2.0-only
> +/*
> + * KVM clocksource that uses host timekeeping.
> + * Copyright (c) 2019 Suleiman Souhlal, Google LLC
> + */
> +
> +#include <linux/clocksource.h>
> +#include <linux/kvm_para.h>
> +#include <asm/pvclock.h>
> +#include <asm/msr.h>
> +#include <asm/kvmclock.h>
> +#include <linux/timekeeper_internal.h>
> +
> +struct pvclock_timekeeper pv_timekeeper;
> +
> +static bool pv_timekeeper_enabled;
> +static bool pv_timekeeper_present;
> +static int old_vclock_mode;
> +
> +static u64
> +kvm_hostclock_get_cycles(struct clocksource *cs)
> +{
> +	return rdtsc_ordered();
> +}
> +
> +static int
> +kvm_hostclock_enable(struct clocksource *cs)
> +{
> +	pv_timekeeper_enabled = 1;
> +
> +	old_vclock_mode = kvm_clock.archdata.vclock_mode;
> +	kvm_clock.archdata.vclock_mode = VCLOCK_TSC;
> +	return 0;
> +}
> +
> +static void
> +kvm_hostclock_disable(struct clocksource *cs)
> +{
> +	pv_timekeeper_enabled = 0;
> +	kvm_clock.archdata.vclock_mode = old_vclock_mode;
> +}
> +
> +struct clocksource kvm_hostclock = {
> +	.name = "kvm-hostclock",
> +	.read = kvm_hostclock_get_cycles,
> +	.enable = kvm_hostclock_enable,
> +	.disable = kvm_hostclock_disable,
> +	.rating = 401, /* Higher than kvm-clock */

I would've offload the decision which clock is better to the host. If I
understand corectly the new clock is not compatible with live migration,
right? So how can the guest know if the host plans to migrate it or not?
I'd suggest adding two separate CPUID bits:
- kvm-hostclock present and enabled
- kvm-hostclock is prefered over kvm-clock.

> +	.mask = CLOCKSOURCE_MASK(64),
> +	.flags = CLOCK_SOURCE_IS_CONTINUOUS,
> +};
> +
> +static void
> +pvclock_copy_into_read_base(struct pvclock_timekeeper *pvtk,
> +    struct tk_read_base *tkr, struct pvclock_read_base *pvtkr)
> +{
> +	int shift_diff;
> +
> +	tkr->mask = pvtkr->mask;
> +	tkr->cycle_last = pvtkr->cycle_last + pvtk->tsc_offset;
> +	tkr->mult = pvtkr->mult;
> +	shift_diff = tkr->shift - pvtkr->shift;
> +	tkr->shift = pvtkr->shift;
> +	tkr->xtime_nsec = pvtkr->xtime_nsec;
> +	tkr->base = pvtkr->base;
> +}
> +
> +static u64
> +pvtk_read_begin(struct pvclock_timekeeper *pvtk)
> +{
> +	u64 gen;
> +
> +	gen = pvtk->gen & ~1;
> +	/* Make sure that the gen count is read before the data. */
> +	virt_rmb();
> +
> +	return gen;
> +}
> +
> +static bool
> +pvtk_read_retry(struct pvclock_timekeeper *pvtk, u64 gen)
> +{
> +	/* Make sure that the gen count is re-read after the data. */
> +	virt_rmb();
> +	return unlikely(gen != pvtk->gen);
> +}
> +
> +void
> +kvm_clock_copy_into_tk(struct timekeeper *tk)
> +{
> +	struct pvclock_timekeeper *pvtk;
> +	u64 gen;
> +
> +	if (!pv_timekeeper_enabled)
> +		return;
> +
> +	pvtk = &pv_timekeeper;
> +	do {
> +		gen = pvtk_read_begin(pvtk);
> +		if (!(pv_timekeeper.flags & PVCLOCK_TIMEKEEPER_ENABLED))
> +			return;
> +
> +		pvclock_copy_into_read_base(pvtk, &tk->tkr_mono,
> +		    &pvtk->tkr_mono);
> +		pvclock_copy_into_read_base(pvtk, &tk->tkr_raw, &pvtk->tkr_raw);
> +
> +		tk->xtime_sec = pvtk->xtime_sec;
> +		tk->ktime_sec = pvtk->ktime_sec;
> +		tk->wall_to_monotonic.tv_sec = pvtk->wall_to_monotonic_sec;
> +		tk->wall_to_monotonic.tv_nsec = pvtk->wall_to_monotonic_nsec;
> +		tk->offs_real = pvtk->offs_real;
> +		tk->offs_boot = pvtk->offs_boot;
> +		tk->offs_tai = pvtk->offs_tai;
> +		tk->raw_sec = pvtk->raw_sec;
> +	} while (pvtk_read_retry(pvtk, gen));
> +}
> +
> +void __init
> +kvm_hostclock_init(void)
> +{
> +	unsigned long pa;
> +
> +	pa = __pa(&pv_timekeeper);
> +	wrmsrl(MSR_KVM_TIMEKEEPER_EN, pa);

Shouldn't you check for the presence of the MSR somehow? (CPUID bit probably)

> +	if (pv_timekeeper.flags & PVCLOCK_TIMEKEEPER_ENABLED) {
> +		pv_timekeeper_present = 1;
> +
> +		clocksource_register_khz(&kvm_hostclock, kvm_get_tsc_khz());
> +	}
> +}
> diff --git a/include/linux/timekeeper_internal.h b/include/linux/timekeeper_internal.h
> index 84ff2844df2a..43b036375cdc 100644
> --- a/include/linux/timekeeper_internal.h
> +++ b/include/linux/timekeeper_internal.h
> @@ -153,4 +153,12 @@ static inline void update_vsyscall_tz(void)
>  }
>  #endif
>  
> +#ifdef CONFIG_KVM_HOSTCLOCK
> +void kvm_clock_copy_into_tk(struct timekeeper *tk);
> +#else
> +static inline void kvm_clock_copy_into_tk(struct timekeeper *tk)
> +{
> +}
> +#endif
> +
>  #endif /* _LINUX_TIMEKEEPER_INTERNAL_H */
> diff --git a/kernel/time/timekeeping.c b/kernel/time/timekeeping.c
> index ca69290bee2a..09bcf13b2334 100644
> --- a/kernel/time/timekeeping.c
> +++ b/kernel/time/timekeeping.c
> @@ -2107,6 +2107,8 @@ static void timekeeping_advance(enum timekeeping_adv_mode mode)
>  	clock_set |= accumulate_nsecs_to_secs(tk);
>  
>  	write_seqcount_begin(&tk_core.seq);
> +	kvm_clock_copy_into_tk(tk);
> +
>  	/*
>  	 * Update the real timekeeper.
>  	 *

-- 
Vitaly

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [RFC v2 0/2] kvm: Use host timekeeping in guest.
  2019-10-10  7:30 [RFC v2 0/2] kvm: Use host timekeeping in guest Suleiman Souhlal
  2019-10-10  7:30 ` [RFC v2 1/2] kvm: Mechanism to copy host timekeeping parameters into guest Suleiman Souhlal
  2019-10-10  7:30 ` [RFC v2 2/2] x86/kvmclock: Introduce kvm-hostclock clocksource Suleiman Souhlal
@ 2019-10-10 10:39 ` Roman Kagan
  2019-10-11 18:50   ` Konrad Rzeszutek Wilk
  2019-10-15  5:08   ` Suleiman Souhlal
  2019-10-10 10:43 ` Paolo Bonzini
  3 siblings, 2 replies; 13+ messages in thread
From: Roman Kagan @ 2019-10-10 10:39 UTC (permalink / raw)
  To: Suleiman Souhlal
  Cc: pbonzini, rkrcmar, tglx, john.stultz, sboyd, linux-kernel, kvm,
	ssouhlal, tfiga, vkuznets

On Thu, Oct 10, 2019 at 04:30:53PM +0900, Suleiman Souhlal wrote:
> This RFC is to try to solve the following problem:
> 
> We have some applications that are currently running in their
> own namespace, that still talk to other processes on the
> machine, using IPC, and expect to run on the same machine.
> 
> We want to move them into a virtual machine, for the usual
> benefits of virtualization.
> 
> However, some of these programs use CLOCK_MONOTONIC and
> CLOCK_BOOTTIME timestamps, as part of their protocol, when talking
> to the host.
> 
> Generally speaking, we have multiple event sources, for example
> sensors, input devices, display controller vsync, etc and we would
> like to rely on them in the guest for various scenarios.
> 
> As a specific example, we are trying to run some wayland clients
> (in the guest) who talk to the server (in the host), and the server
> gives input events based on host time. Additionally, there are also
> vsync events that the clients use for timing their rendering.
> 
> Another use case we have are timestamps from IIO sensors and cameras.
> There are applications that need to determine how the timestamps
> relate to the current time and the only way to get current time is
> clock_gettime(), which would return a value from a different time
> domain than the timestamps.
> 
> In this case, it is not feasible to change these programs, due to
> the number of the places we would have to change.
> 
> We spent some time thinking about this, and the best solution we
> could come up with was the following:
> 
> Make the guest kernel return the same CLOCK_MONOTONIC and
> CLOCK_GETTIME timestamps as the host.
> 
> To do that, I am changing kvmclock to request to the host to copy
> its timekeeping parameters (mult, base, cycle_last, etc), so that
> the guest timekeeper can use the same values, so that time can
> be synchronized between the guest and the host.

I wonder how feasible it is to map the host's vdso into the guest and
thus make the guest use the *same* (as opposed to "synchronized") clock
as the host's userspace?  Another benefit is that it's essentially an
ABI so is not changed as liberally as internal structures like
timekeeper, etc.  There is probably certain complication in handling the
syscall fallback in the vdso when used in the guest kernel, though.

You'll also need to ensure neither tsc scaling and nor offsetting is
applied to the VM once this clock is enabled.

Roman.

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [RFC v2 0/2] kvm: Use host timekeeping in guest.
  2019-10-10  7:30 [RFC v2 0/2] kvm: Use host timekeeping in guest Suleiman Souhlal
                   ` (2 preceding siblings ...)
  2019-10-10 10:39 ` [RFC v2 0/2] kvm: Use host timekeeping in guest Roman Kagan
@ 2019-10-10 10:43 ` Paolo Bonzini
  3 siblings, 0 replies; 13+ messages in thread
From: Paolo Bonzini @ 2019-10-10 10:43 UTC (permalink / raw)
  To: Suleiman Souhlal, rkrcmar, tglx
  Cc: john.stultz, sboyd, linux-kernel, kvm, ssouhlal, tfiga, vkuznets

On 10/10/19 09:30, Suleiman Souhlal wrote:
> 
> Changes in v2:
> - Move out of kvmclock and into its own clocksource and file.
> - Remove timekeeping.c #ifdefs.
> - Fix i386 build.

This is now pretty clean, so my objections are more or less gone.  I
haven't put much thought into this, but are all fields of struct
timekeeping necessary?  Some of them are redundant with the existing
wallclock MSRs.  The handling of versioning probably varies depending on
the exact set of fields, too.

Paolo

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [RFC v2 1/2] kvm: Mechanism to copy host timekeeping parameters into guest.
  2019-10-10  7:30 ` [RFC v2 1/2] kvm: Mechanism to copy host timekeeping parameters into guest Suleiman Souhlal
  2019-10-10  8:31   ` Vitaly Kuznetsov
@ 2019-10-10 10:50   ` Paolo Bonzini
  1 sibling, 0 replies; 13+ messages in thread
From: Paolo Bonzini @ 2019-10-10 10:50 UTC (permalink / raw)
  To: Suleiman Souhlal, rkrcmar, tglx
  Cc: john.stultz, sboyd, linux-kernel, kvm, ssouhlal, tfiga, vkuznets

On 10/10/19 09:30, Suleiman Souhlal wrote:
>  	kvmclock_reset(vcpu);
>  
> +	if (vcpu->arch.pv_timekeeper_enabled)
> +		atomic_dec(&pv_timekeepers_nr);
> +

Please make the new MSR systemwide, there's no need to have one copy per
vCPU.  Also, it has to be cleared on reset.  I have just sent a patch
"[PATCH] kvm: clear kvmclock MSR on reset" and CCed you, you can add the
clearing to kvmclock_reset with that patch applied.

Paolo

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [RFC v2 2/2] x86/kvmclock: Introduce kvm-hostclock clocksource.
  2019-10-10  7:30 ` [RFC v2 2/2] x86/kvmclock: Introduce kvm-hostclock clocksource Suleiman Souhlal
  2019-10-10  8:37   ` Vitaly Kuznetsov
@ 2019-10-10 10:55   ` Paolo Bonzini
  2019-10-15  8:39     ` Suleiman Souhlal
  1 sibling, 1 reply; 13+ messages in thread
From: Paolo Bonzini @ 2019-10-10 10:55 UTC (permalink / raw)
  To: Suleiman Souhlal, rkrcmar, tglx
  Cc: john.stultz, sboyd, linux-kernel, kvm, ssouhlal, tfiga, vkuznets

On 10/10/19 09:30, Suleiman Souhlal wrote:
> +kvm_hostclock_enable(struct clocksource *cs)
> +{
> +	pv_timekeeper_enabled = 1;
> +
> +	old_vclock_mode = kvm_clock.archdata.vclock_mode;
> +	kvm_clock.archdata.vclock_mode = VCLOCK_TSC;
> +	return 0;
> +}
> +
> +static void
> +kvm_hostclock_disable(struct clocksource *cs)
> +{
> +	pv_timekeeper_enabled = 0;
> +	kvm_clock.archdata.vclock_mode = old_vclock_mode;
> +}
> +

Why do you poke at kvm_clock?  Instead you should add

	.archdata               = { .vclock_mode = VCLOCK_TSC },

to the kvm_hostclock declaration.

Please also check that the invariant TSC CPUID bit
CPUID[0x80000007].EDX[8] is set before enabling this feature.

Paolo

> +	pvtk = &pv_timekeeper;
> +	do {
> +		gen = pvtk_read_begin(pvtk);
> +		if (!(pv_timekeeper.flags & PVCLOCK_TIMEKEEPER_ENABLED))
> +			return;
> +
> +		pvclock_copy_into_read_base(pvtk, &tk->tkr_mono,
> +		    &pvtk->tkr_mono);
> +		pvclock_copy_into_read_base(pvtk, &tk->tkr_raw, &pvtk->tkr_raw);
> +
> +		tk->xtime_sec = pvtk->xtime_sec;
> +		tk->ktime_sec = pvtk->ktime_sec;
> +		tk->wall_to_monotonic.tv_sec = pvtk->wall_to_monotonic_sec;
> +		tk->wall_to_monotonic.tv_nsec = pvtk->wall_to_monotonic_nsec;
> +		tk->offs_real = pvtk->offs_real;
> +		tk->offs_boot = pvtk->offs_boot;
> +		tk->offs_tai = pvtk->offs_tai;
> +		tk->raw_sec = pvtk->raw_sec;
> +	} while (pvtk_read_retry(pvtk, gen));
> +}
> +

Should you write an "enabled value" (basically the flags) into pvtk as well?

> 
> +kvm_hostclock_init(void)
> +{
> +	unsigned long pa;
> +
> +	pa = __pa(&pv_timekeeper);
> +	wrmsrl(MSR_KVM_TIMEKEEPER_EN, pa);


As Vitaly said, a new CPUID bit must be defined in
Documentation/virt/kvm/cpuid.txt, and used here.  Also please make bit 0
an enable bit.

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [RFC v2 0/2] kvm: Use host timekeeping in guest.
  2019-10-10 10:39 ` [RFC v2 0/2] kvm: Use host timekeeping in guest Roman Kagan
@ 2019-10-11 18:50   ` Konrad Rzeszutek Wilk
  2019-10-15  5:08   ` Suleiman Souhlal
  1 sibling, 0 replies; 13+ messages in thread
From: Konrad Rzeszutek Wilk @ 2019-10-11 18:50 UTC (permalink / raw)
  To: Roman Kagan, Suleiman Souhlal, pbonzini, rkrcmar, tglx,
	john.stultz, sboyd, linux-kernel, kvm, ssouhlal, tfiga, vkuznets

.snip..
> I wonder how feasible it is to map the host's vdso into the guest and
> thus make the guest use the *same* (as opposed to "synchronized") clock
> as the host's userspace?  Another benefit is that it's essentially an
> ABI so is not changed as liberally as internal structures like
> timekeeper, etc.  There is probably certain complication in handling the
> syscall fallback in the vdso when used in the guest kernel, though.
> 
> You'll also need to ensure neither tsc scaling and nor offsetting is
> applied to the VM once this clock is enabled.

This is how Xen does it - you can register the hypervisor to timestamp
your vDSO regions if you want it. See xen_setup_vsyscall_time_info

> 
> Roman.

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [RFC v2 0/2] kvm: Use host timekeeping in guest.
  2019-10-10 10:39 ` [RFC v2 0/2] kvm: Use host timekeeping in guest Roman Kagan
  2019-10-11 18:50   ` Konrad Rzeszutek Wilk
@ 2019-10-15  5:08   ` Suleiman Souhlal
  1 sibling, 0 replies; 13+ messages in thread
From: Suleiman Souhlal @ 2019-10-15  5:08 UTC (permalink / raw)
  To: Roman Kagan
  Cc: pbonzini, rkrcmar, tglx, john.stultz, sboyd, linux-kernel, kvm,
	ssouhlal, tfiga, vkuznets, konrad.wilk

On Thu, Oct 10, 2019 at 7:39 PM Roman Kagan <rkagan@virtuozzo.com> wrote:
>
> I wonder how feasible it is to map the host's vdso into the guest and
> thus make the guest use the *same* (as opposed to "synchronized") clock
> as the host's userspace?  Another benefit is that it's essentially an
> ABI so is not changed as liberally as internal structures like
> timekeeper, etc.  There is probably certain complication in handling the
> syscall fallback in the vdso when used in the guest kernel, though.
>
> You'll also need to ensure neither tsc scaling and nor offsetting is
> applied to the VM once this clock is enabled.

That is what I initially wanted to do, but I couldn't find an easy way
to map a host page into the guest, outside of the regular userspace
(ioctl) KVM way of adding memory to a VM.

-- Suleiman

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [RFC v2 2/2] x86/kvmclock: Introduce kvm-hostclock clocksource.
  2019-10-10 10:55   ` Paolo Bonzini
@ 2019-10-15  8:39     ` Suleiman Souhlal
  2019-10-15  8:59       ` Paolo Bonzini
  0 siblings, 1 reply; 13+ messages in thread
From: Suleiman Souhlal @ 2019-10-15  8:39 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: rkrcmar, Thomas Gleixner, John Stultz, sboyd, Linux Kernel, kvm,
	Suleiman Souhlal, tfiga, Vitaly Kuznetsov

On Thu, Oct 10, 2019 at 7:55 PM Paolo Bonzini <pbonzini@redhat.com> wrote:
>
> On 10/10/19 09:30, Suleiman Souhlal wrote:
> > +kvm_hostclock_enable(struct clocksource *cs)
> > +{
> > +     pv_timekeeper_enabled = 1;
> > +
> > +     old_vclock_mode = kvm_clock.archdata.vclock_mode;
> > +     kvm_clock.archdata.vclock_mode = VCLOCK_TSC;
> > +     return 0;
> > +}
> > +
> > +static void
> > +kvm_hostclock_disable(struct clocksource *cs)
> > +{
> > +     pv_timekeeper_enabled = 0;
> > +     kvm_clock.archdata.vclock_mode = old_vclock_mode;
> > +}
> > +
>
> Why do you poke at kvm_clock?  Instead you should add
>
>         .archdata               = { .vclock_mode = VCLOCK_TSC },
>
> to the kvm_hostclock declaration.

Yeah, what I was doing does not make sense.

> Please also check that the invariant TSC CPUID bit
> CPUID[0x80000007].EDX[8] is set before enabling this feature.

Will do.

>
> Paolo
>
> > +     pvtk = &pv_timekeeper;
> > +     do {
> > +             gen = pvtk_read_begin(pvtk);
> > +             if (!(pv_timekeeper.flags & PVCLOCK_TIMEKEEPER_ENABLED))
> > +                     return;
> > +
> > +             pvclock_copy_into_read_base(pvtk, &tk->tkr_mono,
> > +                 &pvtk->tkr_mono);
> > +             pvclock_copy_into_read_base(pvtk, &tk->tkr_raw, &pvtk->tkr_raw);
> > +
> > +             tk->xtime_sec = pvtk->xtime_sec;
> > +             tk->ktime_sec = pvtk->ktime_sec;
> > +             tk->wall_to_monotonic.tv_sec = pvtk->wall_to_monotonic_sec;
> > +             tk->wall_to_monotonic.tv_nsec = pvtk->wall_to_monotonic_nsec;
> > +             tk->offs_real = pvtk->offs_real;
> > +             tk->offs_boot = pvtk->offs_boot;
> > +             tk->offs_tai = pvtk->offs_tai;
> > +             tk->raw_sec = pvtk->raw_sec;
> > +     } while (pvtk_read_retry(pvtk, gen));
> > +}
> > +
>
> Should you write an "enabled value" (basically the flags) into pvtk as well?

I think we have that already (pvtk->flags).
I'll change the if statement above to use pvtk instead of pv_timekeeper.

> > +kvm_hostclock_init(void)
> > +{
> > +     unsigned long pa;
> > +
> > +     pa = __pa(&pv_timekeeper);
> > +     wrmsrl(MSR_KVM_TIMEKEEPER_EN, pa);
>
>
> As Vitaly said, a new CPUID bit must be defined in
> Documentation/virt/kvm/cpuid.txt, and used here.  Also please make bit 0
> an enable bit.

I think I might not be able to move the enable bit to 0 because we
need the generation count (pvclock_timekeeper.gen) to be the first
word of the struct due to the way we copy the data to userspace,
similarly to how kvm_setup_pvclock_page() does it.

I also noticed the comment in kvm_copy_into_pvtk() references the
wrong function, which I will fix in the next revision.

Thanks for the feedback.

-- Suleiman

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [RFC v2 2/2] x86/kvmclock: Introduce kvm-hostclock clocksource.
  2019-10-15  8:39     ` Suleiman Souhlal
@ 2019-10-15  8:59       ` Paolo Bonzini
  0 siblings, 0 replies; 13+ messages in thread
From: Paolo Bonzini @ 2019-10-15  8:59 UTC (permalink / raw)
  To: Suleiman Souhlal
  Cc: rkrcmar, Thomas Gleixner, John Stultz, sboyd, Linux Kernel, kvm,
	Suleiman Souhlal, tfiga, Vitaly Kuznetsov

On 15/10/19 10:39, Suleiman Souhlal wrote:
> I think we have that already (pvtk->flags).
> I'll change the if statement above to use pvtk instead of pv_timekeeper.

Of course, thanks.

>>> +kvm_hostclock_init(void)
>>> +{
>>> +     unsigned long pa;
>>> +
>>> +     pa = __pa(&pv_timekeeper);
>>> +     wrmsrl(MSR_KVM_TIMEKEEPER_EN, pa);
>>
>>
>> As Vitaly said, a new CPUID bit must be defined in
>> Documentation/virt/kvm/cpuid.txt, and used here.  Also please make bit 0
>> an enable bit.
> 
> I think I might not be able to move the enable bit to 0 because we
> need the generation count (pvclock_timekeeper.gen) to be the first
> word of the struct due to the way we copy the data to userspace,
> similarly to how kvm_setup_pvclock_page() does it.

I mean bit 0 of the MSR.

Thanks,

Paolo

^ permalink raw reply	[flat|nested] 13+ messages in thread

end of thread, other threads:[~2019-10-15  8:59 UTC | newest]

Thread overview: 13+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2019-10-10  7:30 [RFC v2 0/2] kvm: Use host timekeeping in guest Suleiman Souhlal
2019-10-10  7:30 ` [RFC v2 1/2] kvm: Mechanism to copy host timekeeping parameters into guest Suleiman Souhlal
2019-10-10  8:31   ` Vitaly Kuznetsov
2019-10-10 10:50   ` Paolo Bonzini
2019-10-10  7:30 ` [RFC v2 2/2] x86/kvmclock: Introduce kvm-hostclock clocksource Suleiman Souhlal
2019-10-10  8:37   ` Vitaly Kuznetsov
2019-10-10 10:55   ` Paolo Bonzini
2019-10-15  8:39     ` Suleiman Souhlal
2019-10-15  8:59       ` Paolo Bonzini
2019-10-10 10:39 ` [RFC v2 0/2] kvm: Use host timekeeping in guest Roman Kagan
2019-10-11 18:50   ` Konrad Rzeszutek Wilk
2019-10-15  5:08   ` Suleiman Souhlal
2019-10-10 10:43 ` Paolo Bonzini

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.