linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [RFC 0/2] kvm: Use host timekeeping in guest.
@ 2019-09-20  6:27 Suleiman Souhlal
  2019-09-20  6:27 ` [RFC 1/2] kvm: Mechanism to copy host timekeeping parameters into guest Suleiman Souhlal
                   ` (2 more replies)
  0 siblings, 3 replies; 9+ messages in thread
From: Suleiman Souhlal @ 2019-09-20  6:27 UTC (permalink / raw)
  To: pbonzini, rkrcmar, tglx
  Cc: john.stultz, sboyd, linux-kernel, kvm, Suleiman Souhlal

This RFC is to try to solve the following problem:

We have some applications that are currently running in their
own namespace, that still talk to other processes on the
machine, using IPC, and expect to run on the same machine.

We want to move them into a virtual machine, for the usual
benefits of virtualization.

However, some of these programs use CLOCK_MONOTONIC and 
CLOCK_BOOTTIME timestamps, as part of their protocol, when talking
to the host.

Generally speaking, we have multiple event sources, for example
sensors, input devices, display controller vsync, etc and we would
like to rely on them in the guest for various scenarios.

As a specific example, we are trying to run some wayland clients
(in the guest) who talk to the server (in the host), and the server
gives input events based on host time. Additionally, there are also
vsync events that the clients use for timing their rendering.

Another use case we have are timestamps from IIO sensors and cameras.
There are applications that need to determine how the timestamps
relate to the current time and the only way to get current time is
clock_gettime(), which would return a value from a different time
domain than the timestamps.

In this case, it is not feasible to change these programs, due to
the number of the places we would have to change.

We spent some time thinking about this, and the best solution we
could come up with was the following:

Make the guest kernel return the same CLOCK_MONOTONIC and
CLOCK_GETTIME timestamps as the host.

To do that, I am changing kvmclock to request to the host to copy
its timekeeping parameters (mult, base, cycle_last, etc), so that
the guest timekeeper can use the same values, so that time can
be synchronized between the guest and the host.

Any suggestions or feedback would be highly appreciated.

Suleiman Souhlal (2):
  kvm: Mechanism to copy host timekeeping parameters into guest.
  x86/kvmclock: Use host timekeeping.

 arch/x86/Kconfig                     |   9 ++
 arch/x86/include/asm/kvm_host.h      |   3 +
 arch/x86/include/asm/kvmclock.h      |   2 +
 arch/x86/include/asm/pvclock-abi.h   |  27 ++++++
 arch/x86/include/uapi/asm/kvm_para.h |   1 +
 arch/x86/kernel/kvmclock.c           | 127 ++++++++++++++++++++++++++-
 arch/x86/kvm/x86.c                   | 121 +++++++++++++++++++++++++
 kernel/time/timekeeping.c            |  21 +++++
 8 files changed, 307 insertions(+), 4 deletions(-)

-- 
2.23.0.237.gc6a4ce50a0-goog


^ permalink raw reply	[flat|nested] 9+ messages in thread

* [RFC 1/2] kvm: Mechanism to copy host timekeeping parameters into guest.
  2019-09-20  6:27 [RFC 0/2] kvm: Use host timekeeping in guest Suleiman Souhlal
@ 2019-09-20  6:27 ` Suleiman Souhlal
  2019-09-20  6:27 ` [RFC 2/2] x86/kvmclock: Use host timekeeping Suleiman Souhlal
  2019-09-20  7:48 ` [RFC 0/2] kvm: Use host timekeeping in guest Paolo Bonzini
  2 siblings, 0 replies; 9+ messages in thread
From: Suleiman Souhlal @ 2019-09-20  6:27 UTC (permalink / raw)
  To: pbonzini, rkrcmar, tglx
  Cc: john.stultz, sboyd, linux-kernel, kvm, Suleiman Souhlal

This is used to synchronize time between host and guest.
The guest can request the (guest) physical address it wants the
data in through the MSR_KVM_TIMEKEEPER_EN MSR.

We maintain a shadow copy of the timekeeper that gets updated
whenever the timekeeper gets updated, and then copied into the
guest.

It currently assumes the host timekeeper is "tsc".

Signed-off-by: Suleiman Souhlal <suleiman@google.com>
---
 arch/x86/include/asm/kvm_host.h      |   3 +
 arch/x86/include/asm/pvclock-abi.h   |  27 ++++++
 arch/x86/include/uapi/asm/kvm_para.h |   1 +
 arch/x86/kvm/x86.c                   | 121 +++++++++++++++++++++++++++
 4 files changed, 152 insertions(+)

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index bdc16b0aa7c6..b1b4c3a80b8d 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -666,7 +666,10 @@ struct kvm_vcpu_arch {
 	struct pvclock_vcpu_time_info hv_clock;
 	unsigned int hw_tsc_khz;
 	struct gfn_to_hva_cache pv_time;
+	struct gfn_to_hva_cache pv_timekeeper_g2h;
+	struct pvclock_timekeeper pv_timekeeper;
 	bool pv_time_enabled;
+	bool pv_timekeeper_enabled;
 	/* set guest stopped flag in pvclock flags field */
 	bool pvclock_set_guest_stopped_request;
 
diff --git a/arch/x86/include/asm/pvclock-abi.h b/arch/x86/include/asm/pvclock-abi.h
index 1436226efe3e..2809008b9b26 100644
--- a/arch/x86/include/asm/pvclock-abi.h
+++ b/arch/x86/include/asm/pvclock-abi.h
@@ -40,6 +40,33 @@ struct pvclock_wall_clock {
 	u32   nsec;
 } __attribute__((__packed__));
 
+struct pvclock_read_base {
+	u64 mask;
+	u64 cycle_last;
+	u32 mult;
+	u32 shift;
+	u64 xtime_nsec;
+	u64 base;
+} __attribute__((__packed__));
+
+struct pvclock_timekeeper {
+	u64 gen;
+	u64 flags;
+	struct pvclock_read_base tkr_mono;
+	struct pvclock_read_base tkr_raw;
+	u64 xtime_sec;
+	u64 ktime_sec;
+	u64 wall_to_monotonic_sec;
+	u64 wall_to_monotonic_nsec;
+	u64 offs_real;
+	u64 offs_boot;
+	u64 offs_tai;
+	u64 raw_sec;
+	u64 tsc_offset;
+} __attribute__((__packed__));
+
+#define	PVCLOCK_TIMEKEEPER_ENABLED (1 << 0)
+
 #define PVCLOCK_TSC_STABLE_BIT	(1 << 0)
 #define PVCLOCK_GUEST_STOPPED	(1 << 1)
 /* PVCLOCK_COUNTS_FROM_ZERO broke ABI and can't be used anymore. */
diff --git a/arch/x86/include/uapi/asm/kvm_para.h b/arch/x86/include/uapi/asm/kvm_para.h
index 2a8e0b6b9805..3ebb1d87db3a 100644
--- a/arch/x86/include/uapi/asm/kvm_para.h
+++ b/arch/x86/include/uapi/asm/kvm_para.h
@@ -50,6 +50,7 @@
 #define MSR_KVM_STEAL_TIME  0x4b564d03
 #define MSR_KVM_PV_EOI_EN      0x4b564d04
 #define MSR_KVM_POLL_CONTROL	0x4b564d05
+#define MSR_KVM_TIMEKEEPER_EN	0x4b564d06
 
 struct kvm_steal_time {
 	__u64 steal;
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 91602d310a3f..06a940a74005 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -157,6 +157,8 @@ module_param(force_emulation_prefix, bool, S_IRUGO);
 int __read_mostly pi_inject_timer = -1;
 module_param(pi_inject_timer, bint, S_IRUGO | S_IWUSR);
 
+static atomic_t pv_timekeepers_nr;
+
 #define KVM_NR_SHARED_MSRS 16
 
 struct kvm_shared_msrs_global {
@@ -2621,6 +2623,16 @@ int kvm_set_msr_common(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
 
 		break;
 	}
+	case MSR_KVM_TIMEKEEPER_EN:
+		if (kvm_gfn_to_hva_cache_init(vcpu->kvm,
+		    &vcpu->arch.pv_timekeeper_g2h, data,
+		    sizeof(struct pvclock_timekeeper)))
+			vcpu->arch.pv_timekeeper_enabled = false;
+		else {
+			vcpu->arch.pv_timekeeper_enabled = true;
+			atomic_inc(&pv_timekeepers_nr);
+		}
+		break;
 	case MSR_KVM_ASYNC_PF_EN:
 		if (kvm_pv_enable_async_pf(vcpu, data))
 			return 1;
@@ -6965,6 +6977,109 @@ static struct perf_guest_info_callbacks kvm_guest_cbs = {
 	.handle_intel_pt_intr	= kvm_handle_intel_pt_intr,
 };
 
+static DEFINE_SPINLOCK(shadow_pvtk_lock);
+static struct pvclock_timekeeper shadow_pvtk;
+
+static void
+pvclock_copy_read_base(struct pvclock_read_base *pvtkr,
+    struct tk_read_base *tkr)
+{
+	pvtkr->cycle_last = tkr->cycle_last;
+	pvtkr->mult = tkr->mult;
+	pvtkr->shift = tkr->shift;
+	pvtkr->mask = tkr->mask;
+	pvtkr->xtime_nsec = tkr->xtime_nsec;
+	pvtkr->base = tkr->base;
+}
+
+static void
+kvm_copy_into_pvtk(struct kvm_vcpu *vcpu)
+{
+	struct pvclock_timekeeper *pvtk;
+	unsigned long flags;
+
+	if (!vcpu->arch.pv_timekeeper_enabled)
+		return;
+
+	pvtk = &vcpu->arch.pv_timekeeper;
+	if (pvclock_gtod_data.clock.vclock_mode == VCLOCK_TSC) {
+		pvtk->flags |= PVCLOCK_TIMEKEEPER_ENABLED;
+		spin_lock_irqsave(&shadow_pvtk_lock, flags);
+		pvtk->tkr_mono = shadow_pvtk.tkr_mono;
+		pvtk->tkr_raw = shadow_pvtk.tkr_raw;
+
+		pvtk->xtime_sec = shadow_pvtk.xtime_sec;
+		pvtk->ktime_sec = shadow_pvtk.ktime_sec;
+		pvtk->wall_to_monotonic_sec =
+		    shadow_pvtk.wall_to_monotonic_sec;
+		pvtk->wall_to_monotonic_nsec =
+		    shadow_pvtk.wall_to_monotonic_nsec;
+		pvtk->offs_real = shadow_pvtk.offs_real;
+		pvtk->offs_boot = shadow_pvtk.offs_boot;
+		pvtk->offs_tai = shadow_pvtk.offs_tai;
+		pvtk->raw_sec = shadow_pvtk.raw_sec;
+		spin_unlock_irqrestore(&shadow_pvtk_lock, flags);
+
+		pvtk->tsc_offset = kvm_x86_ops->read_l1_tsc_offset(vcpu);
+	} else
+		pvtk->flags &= ~PVCLOCK_TIMEKEEPER_ENABLED;
+
+	BUILD_BUG_ON(offsetof(struct pvclock_timekeeper, gen) != 0);
+
+	/*
+	 * Make the gen count odd to indicate we are in the process of
+	 * updating.
+	 */
+	vcpu->arch.pv_timekeeper.gen++;
+	vcpu->arch.pv_timekeeper.gen |= 1;
+
+	/*
+	 * See comment in kvm_guest_time_update() for why we have to do
+	 * multiple writes.
+	 */
+	kvm_write_guest_cached(vcpu->kvm, &vcpu->arch.pv_timekeeper_g2h,
+	    &vcpu->arch.pv_timekeeper, sizeof(vcpu->arch.pv_timekeeper.gen));
+
+	smp_wmb();
+
+	kvm_write_guest_cached(vcpu->kvm, &vcpu->arch.pv_timekeeper_g2h,
+	    &vcpu->arch.pv_timekeeper, sizeof(vcpu->arch.pv_timekeeper));
+
+	smp_wmb();
+
+	vcpu->arch.pv_timekeeper.gen++;
+
+	kvm_write_guest_cached(vcpu->kvm, &vcpu->arch.pv_timekeeper_g2h,
+	    &vcpu->arch.pv_timekeeper, sizeof(vcpu->arch.pv_timekeeper.gen));
+}
+
+static void
+update_shadow_pvtk(struct timekeeper *tk)
+{
+	struct pvclock_timekeeper *pvtk;
+	unsigned long flags;
+
+	pvtk = &shadow_pvtk;
+
+	if (atomic_read(&pv_timekeepers_nr) == 0 ||
+	    pvclock_gtod_data.clock.vclock_mode != VCLOCK_TSC)
+		return;
+
+	spin_lock_irqsave(&shadow_pvtk_lock, flags);
+	pvclock_copy_read_base(&pvtk->tkr_mono, &tk->tkr_mono);
+	pvclock_copy_read_base(&pvtk->tkr_raw, &tk->tkr_raw);
+
+	pvtk->xtime_sec = tk->xtime_sec;
+	pvtk->ktime_sec = tk->ktime_sec;
+	pvtk->wall_to_monotonic_sec = tk->wall_to_monotonic.tv_sec;
+	pvtk->wall_to_monotonic_nsec = tk->wall_to_monotonic.tv_nsec;
+	pvtk->offs_real = tk->offs_real;
+	pvtk->offs_boot = tk->offs_boot;
+	pvtk->offs_tai = tk->offs_tai;
+	pvtk->raw_sec = tk->raw_sec;
+	spin_unlock_irqrestore(&shadow_pvtk_lock, flags);
+}
+
 #ifdef CONFIG_X86_64
 static void pvclock_gtod_update_fn(struct work_struct *work)
 {
@@ -6993,6 +7108,7 @@ static int pvclock_gtod_notify(struct notifier_block *nb, unsigned long unused,
 	struct timekeeper *tk = priv;
 
 	update_pvclock_gtod(tk);
+	update_shadow_pvtk(tk);
 
 	/* disable master clock if host does not trust, or does not
 	 * use, TSC based clocksource.
@@ -7809,6 +7925,8 @@ static int vcpu_enter_guest(struct kvm_vcpu *vcpu)
 
 	bool req_immediate_exit = false;
 
+	kvm_copy_into_pvtk(vcpu);
+
 	if (kvm_request_pending(vcpu)) {
 		if (kvm_check_request(KVM_REQ_GET_VMCS12_PAGES, vcpu))
 			kvm_x86_ops->get_vmcs12_pages(vcpu);
@@ -8891,6 +9009,9 @@ void kvm_arch_vcpu_free(struct kvm_vcpu *vcpu)
 
 	kvmclock_reset(vcpu);
 
+	if (vcpu->arch.pv_timekeeper_enabled)
+		atomic_dec(&pv_timekeepers_nr);
+
 	kvm_x86_ops->vcpu_free(vcpu);
 	free_cpumask_var(wbinvd_dirty_mask);
 }
-- 
2.23.0.237.gc6a4ce50a0-goog


^ permalink raw reply related	[flat|nested] 9+ messages in thread

* [RFC 2/2] x86/kvmclock: Use host timekeeping.
  2019-09-20  6:27 [RFC 0/2] kvm: Use host timekeeping in guest Suleiman Souhlal
  2019-09-20  6:27 ` [RFC 1/2] kvm: Mechanism to copy host timekeeping parameters into guest Suleiman Souhlal
@ 2019-09-20  6:27 ` Suleiman Souhlal
  2019-09-20 13:33   ` Vitaly Kuznetsov
  2019-09-20  7:48 ` [RFC 0/2] kvm: Use host timekeeping in guest Paolo Bonzini
  2 siblings, 1 reply; 9+ messages in thread
From: Suleiman Souhlal @ 2019-09-20  6:27 UTC (permalink / raw)
  To: pbonzini, rkrcmar, tglx
  Cc: john.stultz, sboyd, linux-kernel, kvm, Suleiman Souhlal

When CONFIG_KVMCLOCK_HOST_TIMEKEEPING is enabled, and the host
supports it, update our timekeeping parameters to be the same as
the host. This lets us have our time synchronized with the host's,
even in the presence of host NTP or suspend.

When enabled, kvmclock uses raw tsc instead of pvclock.

When enabled, syscalls that can change time, such as settimeofday(2)
or adj_timex(2) are disabled in the guest.

Signed-off-by: Suleiman Souhlal <suleiman@google.com>
---
 arch/x86/Kconfig                |   9 +++
 arch/x86/include/asm/kvmclock.h |   2 +
 arch/x86/kernel/kvmclock.c      | 127 +++++++++++++++++++++++++++++++-
 kernel/time/timekeeping.c       |  21 ++++++
 4 files changed, 155 insertions(+), 4 deletions(-)

diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
index 4195f44c6a09..37299377d9d7 100644
--- a/arch/x86/Kconfig
+++ b/arch/x86/Kconfig
@@ -837,6 +837,15 @@ config PARAVIRT_TIME_ACCOUNTING
 config PARAVIRT_CLOCK
 	bool
 
+config KVMCLOCK_HOST_TIMEKEEPING
+	bool "kvmclock uses host timekeeping"
+	depends on KVM_GUEST
+	---help---
+	  Select this option to make the guest use the same timekeeping
+	  parameters as the host. This means that time will be almost
+	  exactly the same between the two. Only works if the host uses "tsc"
+	  clocksource.
+
 config JAILHOUSE_GUEST
 	bool "Jailhouse non-root cell support"
 	depends on X86_64 && PCI
diff --git a/arch/x86/include/asm/kvmclock.h b/arch/x86/include/asm/kvmclock.h
index eceea9299097..cad7eae62b54 100644
--- a/arch/x86/include/asm/kvmclock.h
+++ b/arch/x86/include/asm/kvmclock.h
@@ -4,4 +4,6 @@
 
 extern struct clocksource kvm_clock;
 
+bool kvm_clock_copy_into_tk(struct timekeeper *tk);
+
 #endif /* _ASM_X86_KVM_CLOCK_H */
diff --git a/arch/x86/kernel/kvmclock.c b/arch/x86/kernel/kvmclock.c
index 904494b924c1..e3a33dcaffc8 100644
--- a/arch/x86/kernel/kvmclock.c
+++ b/arch/x86/kernel/kvmclock.c
@@ -16,6 +16,7 @@
 #include <linux/mm.h>
 #include <linux/slab.h>
 #include <linux/set_memory.h>
+#include <linux/timekeeper_internal.h>
 
 #include <asm/hypervisor.h>
 #include <asm/mem_encrypt.h>
@@ -29,6 +30,9 @@ static int msr_kvm_system_time __ro_after_init = MSR_KVM_SYSTEM_TIME;
 static int msr_kvm_wall_clock __ro_after_init = MSR_KVM_WALL_CLOCK;
 static u64 kvm_sched_clock_offset __ro_after_init;
 
+static bool pv_timekeeper_enabled;
+static bool pv_timekeeper_present;
+
 static int __init parse_no_kvmclock(char *arg)
 {
 	kvmclock = 0;
@@ -54,6 +58,8 @@ static struct pvclock_wall_clock wall_clock __bss_decrypted;
 static DEFINE_PER_CPU(struct pvclock_vsyscall_time_info *, hv_clock_per_cpu);
 static struct pvclock_vsyscall_time_info *hvclock_mem;
 
+struct pvclock_timekeeper pv_timekeeper;
+
 static inline struct pvclock_vcpu_time_info *this_cpu_pvti(void)
 {
 	return &this_cpu_read(hv_clock_per_cpu)->pvti;
@@ -94,7 +100,10 @@ static u64 kvm_clock_read(void)
 
 static u64 kvm_clock_get_cycles(struct clocksource *cs)
 {
-	return kvm_clock_read();
+	if (pv_timekeeper_present)
+		return rdtsc_ordered();
+	else
+		return kvm_clock_read();
 }
 
 static u64 kvm_sched_clock_read(void)
@@ -159,9 +168,26 @@ bool kvm_check_and_clear_guest_paused(void)
 	return ret;
 }
 
+int
+kvm_clock_enable(struct clocksource *cs)
+{
+	if (pv_timekeeper_present)
+		pv_timekeeper_enabled = 1;
+
+	return 0;
+}
+
+void
+kvm_clock_disable(struct clocksource *cs)
+{
+	pv_timekeeper_enabled = 0;
+}
+
 struct clocksource kvm_clock = {
 	.name	= "kvm-clock",
 	.read	= kvm_clock_get_cycles,
+	.enable = kvm_clock_enable,
+	.disable = kvm_clock_disable,
 	.rating	= 400,
 	.mask	= CLOCKSOURCE_MASK(64),
 	.flags	= CLOCK_SOURCE_IS_CONTINUOUS,
@@ -197,6 +223,86 @@ static void kvm_setup_secondary_clock(void)
 }
 #endif
 
+#ifdef CONFIG_KVMCLOCK_HOST_TIMEKEEPING
+static void
+pvclock_copy_into_read_base(struct pvclock_timekeeper *pvtk,
+    struct tk_read_base *tkr, struct pvclock_read_base *pvtkr)
+{
+	int shift_diff;
+
+	tkr->mask = pvtkr->mask;
+	tkr->cycle_last = pvtkr->cycle_last + pvtk->tsc_offset;
+	tkr->mult = pvtkr->mult;
+	shift_diff = tkr->shift - pvtkr->shift;
+	tkr->shift = pvtkr->shift;
+	tkr->xtime_nsec = pvtkr->xtime_nsec;
+	tkr->base = pvtkr->base;
+}
+
+static u64
+pvtk_read_begin(struct pvclock_timekeeper *pvtk)
+{
+	u64 gen;
+
+	gen = pvtk->gen & ~1;
+	/* Make sure that the gen count is read before the data. */
+	virt_rmb();
+
+	return gen;
+}
+
+static bool
+pvtk_read_retry(struct pvclock_timekeeper *pvtk, u64 gen)
+{
+	/* Make sure that the gen count is re-read after the data. */
+	virt_rmb();
+	return unlikely(gen != pvtk->gen);
+}
+
+bool
+kvm_clock_copy_into_tk(struct timekeeper *tk)
+{
+	struct pvclock_timekeeper *pvtk;
+	u64 gen;
+
+	if (!pv_timekeeper_enabled)
+		return false;
+
+	pvtk = &pv_timekeeper;
+	do {
+		gen = pvtk_read_begin(pvtk);
+		if (!(pv_timekeeper.flags & PVCLOCK_TIMEKEEPER_ENABLED))
+			return false;
+
+		pvclock_copy_into_read_base(pvtk, &tk->tkr_mono,
+		    &pvtk->tkr_mono);
+		pvclock_copy_into_read_base(pvtk, &tk->tkr_raw, &pvtk->tkr_raw);
+
+		tk->xtime_sec = pvtk->xtime_sec;
+		tk->ktime_sec = pvtk->ktime_sec;
+		tk->wall_to_monotonic.tv_sec = pvtk->wall_to_monotonic_sec;
+		tk->wall_to_monotonic.tv_nsec = pvtk->wall_to_monotonic_nsec;
+		tk->offs_real = pvtk->offs_real;
+		tk->offs_boot = pvtk->offs_boot;
+		tk->offs_tai = pvtk->offs_tai;
+		tk->raw_sec = pvtk->raw_sec;
+	} while (pvtk_read_retry(pvtk, gen));
+
+	return true;
+}
+
+static void
+kvm_register_timekeeper(void)
+{
+	unsigned long pa;
+
+	pa = __pa(&pv_timekeeper);
+	wrmsrl(MSR_KVM_TIMEKEEPER_EN, pa);
+	if (pv_timekeeper.flags & PVCLOCK_TIMEKEEPER_ENABLED)
+		pv_timekeeper_present = 1;
+}
+#endif /* CONFIG_KVMCLOCK_HOST_TIMEKEEPING */
+
 /*
  * After the clock is registered, the host will keep writing to the
  * registered memory location. If the guest happens to shutdown, this memory
@@ -271,8 +377,10 @@ static int __init kvm_setup_vsyscall_timeinfo(void)
 	flags = pvclock_read_flags(&hv_clock_boot[0].pvti);
 	if (!(flags & PVCLOCK_TSC_STABLE_BIT))
 		return 0;
-
-	kvm_clock.archdata.vclock_mode = VCLOCK_PVCLOCK;
+	if (pv_timekeeper_present)
+		kvm_clock.archdata.vclock_mode = VCLOCK_TSC;
+	else
+		kvm_clock.archdata.vclock_mode = VCLOCK_PVCLOCK;
 #endif
 
 	kvmclock_init_mem();
@@ -312,6 +420,10 @@ void __init kvmclock_init(void)
 	if (!kvm_para_available() || !kvmclock)
 		return;
 
+#ifdef CONFIG_KVMCLOCK_HOST_TIMEKEEPING
+	kvm_register_timekeeper();
+#endif
+
 	if (kvm_para_has_feature(KVM_FEATURE_CLOCKSOURCE2)) {
 		msr_kvm_system_time = MSR_KVM_SYSTEM_TIME_NEW;
 		msr_kvm_wall_clock = MSR_KVM_WALL_CLOCK_NEW;
@@ -360,11 +472,18 @@ void __init kvmclock_init(void)
 	 * can use TSC as clocksource.
 	 *
 	 */
+#ifndef CONFIG_KVMCLOCK_HOST_TIMEKEEPING
 	if (boot_cpu_has(X86_FEATURE_CONSTANT_TSC) &&
 	    boot_cpu_has(X86_FEATURE_NONSTOP_TSC) &&
 	    !check_tsc_unstable())
 		kvm_clock.rating = 299;
+#endif
+
+	if (pv_timekeeper_present)
+		clocksource_register_khz(&kvm_clock,
+		    pvclock_tsc_khz(this_cpu_pvti()));
+	else
+		clocksource_register_hz(&kvm_clock, NSEC_PER_SEC);
 
-	clocksource_register_hz(&kvm_clock, NSEC_PER_SEC);
 	pv_info.name = "KVM";
 }
diff --git a/kernel/time/timekeeping.c b/kernel/time/timekeeping.c
index ca69290bee2a..c6e4785babb1 100644
--- a/kernel/time/timekeeping.c
+++ b/kernel/time/timekeeping.c
@@ -23,6 +23,10 @@
 #include <linux/compiler.h>
 #include <linux/audit.h>
 
+#ifdef CONFIG_KVMCLOCK_HOST_TIMEKEEPING
+#include <asm/kvmclock.h>
+#endif
+
 #include "tick-internal.h"
 #include "ntp_internal.h"
 #include "timekeeping_internal.h"
@@ -1223,6 +1227,7 @@ EXPORT_SYMBOL_GPL(get_device_system_crosststamp);
  */
 int do_settimeofday64(const struct timespec64 *ts)
 {
+#ifndef CONFIG_KVMCLOCK_HOST_TIMEKEEPING
 	struct timekeeper *tk = &tk_core.timekeeper;
 	struct timespec64 ts_delta, xt;
 	unsigned long flags;
@@ -1261,9 +1266,13 @@ int do_settimeofday64(const struct timespec64 *ts)
 		audit_tk_injoffset(ts_delta);
 
 	return ret;
+#else /* CONFIG_KVMCLOCK_HOST_TIMEKEEPING */
+	return -EINVAL;
+#endif /* CONFIG_KVMCLOCK_HOST_TIMEKEEPING */
 }
 EXPORT_SYMBOL(do_settimeofday64);
 
+#ifndef CONFIG_KVMCLOCK_HOST_TIMEKEEPING
 /**
  * timekeeping_inject_offset - Adds or subtracts from the current time.
  * @tv:		pointer to the timespec variable containing the offset
@@ -1307,6 +1316,7 @@ static int timekeeping_inject_offset(const struct timespec64 *ts)
 
 	return ret;
 }
+#endif /* CONFIG_KVMCLOCK_HOST_TIMEKEEPING */
 
 /*
  * Indicates if there is an offset between the system clock and the hardware
@@ -1332,6 +1342,7 @@ int persistent_clock_is_local;
  */
 void timekeeping_warp_clock(void)
 {
+#ifndef CONFIG_KVMCLOCK_HOST_TIMEKEEPING
 	if (sys_tz.tz_minuteswest != 0) {
 		struct timespec64 adjust;
 
@@ -1340,6 +1351,7 @@ void timekeeping_warp_clock(void)
 		adjust.tv_nsec = 0;
 		timekeeping_inject_offset(&adjust);
 	}
+#endif /* CONFIG_KVMCLOCK_HOST_TIMEKEEPING */
 }
 
 /**
@@ -2107,6 +2119,9 @@ static void timekeeping_advance(enum timekeeping_adv_mode mode)
 	clock_set |= accumulate_nsecs_to_secs(tk);
 
 	write_seqcount_begin(&tk_core.seq);
+#ifdef CONFIG_KVMCLOCK_HOST_TIMEKEEPING
+	kvm_clock_copy_into_tk(tk);
+#endif
 	/*
 	 * Update the real timekeeper.
 	 *
@@ -2241,6 +2256,7 @@ ktime_t ktime_get_update_offsets_now(unsigned int *cwsseq, ktime_t *offs_real,
 	return base;
 }
 
+#ifndef CONFIG_KVMCLOCK_HOST_TIMEKEEPING
 /**
  * timekeeping_validate_timex - Ensures the timex is ok for use in do_adjtimex
  */
@@ -2305,6 +2321,7 @@ static int timekeeping_validate_timex(const struct __kernel_timex *txc)
 
 	return 0;
 }
+#endif /* CONFIG_KVMCLOCK_HOST_TIMEKEEPING */
 
 
 /**
@@ -2312,6 +2329,7 @@ static int timekeeping_validate_timex(const struct __kernel_timex *txc)
  */
 int do_adjtimex(struct __kernel_timex *txc)
 {
+#ifndef CONFIG_KVMCLOCK_HOST_TIMEKEEPING
 	struct timekeeper *tk = &tk_core.timekeeper;
 	struct audit_ntp_data ad;
 	unsigned long flags;
@@ -2368,6 +2386,9 @@ int do_adjtimex(struct __kernel_timex *txc)
 	ntp_notify_cmos_timer();
 
 	return ret;
+#else /* CONFIG_KVMCLOCK_HOST_TIMEKEEPING */
+	return -EINVAL;
+#endif /* CONFIG_KVMCLOCK_HOST_TIMEKEEPING */
 }
 
 #ifdef CONFIG_NTP_PPS
-- 
2.23.0.237.gc6a4ce50a0-goog


^ permalink raw reply related	[flat|nested] 9+ messages in thread

* Re: [RFC 0/2] kvm: Use host timekeeping in guest.
  2019-09-20  6:27 [RFC 0/2] kvm: Use host timekeeping in guest Suleiman Souhlal
  2019-09-20  6:27 ` [RFC 1/2] kvm: Mechanism to copy host timekeeping parameters into guest Suleiman Souhlal
  2019-09-20  6:27 ` [RFC 2/2] x86/kvmclock: Use host timekeeping Suleiman Souhlal
@ 2019-09-20  7:48 ` Paolo Bonzini
  2019-09-20 10:23   ` Thomas Gleixner
  2 siblings, 1 reply; 9+ messages in thread
From: Paolo Bonzini @ 2019-09-20  7:48 UTC (permalink / raw)
  To: Suleiman Souhlal, rkrcmar, tglx; +Cc: john.stultz, sboyd, linux-kernel, kvm

On 20/09/19 08:27, Suleiman Souhlal wrote:
> To do that, I am changing kvmclock to request to the host to copy
> its timekeeping parameters (mult, base, cycle_last, etc), so that
> the guest timekeeper can use the same values, so that time can
> be synchronized between the guest and the host.
> 
> Any suggestions or feedback would be highly appreciated.

I'm not a timekeeping maintainer, but I don't think the
kernel/time/timekeeping.c changes are acceptable.

Is the PTP driver not enough?

Paolo

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [RFC 0/2] kvm: Use host timekeeping in guest.
  2019-09-20  7:48 ` [RFC 0/2] kvm: Use host timekeeping in guest Paolo Bonzini
@ 2019-09-20 10:23   ` Thomas Gleixner
  2019-09-24  8:08     ` Suleiman Souhlal
  0 siblings, 1 reply; 9+ messages in thread
From: Thomas Gleixner @ 2019-09-20 10:23 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: Suleiman Souhlal, rkrcmar, john.stultz, sboyd, linux-kernel, kvm

On Fri, 20 Sep 2019, Paolo Bonzini wrote:

> On 20/09/19 08:27, Suleiman Souhlal wrote:
> > To do that, I am changing kvmclock to request to the host to copy
> > its timekeeping parameters (mult, base, cycle_last, etc), so that
> > the guest timekeeper can use the same values, so that time can
> > be synchronized between the guest and the host.
> > 
> > Any suggestions or feedback would be highly appreciated.
> 
> I'm not a timekeeping maintainer, but I don't think the
> kernel/time/timekeeping.c changes are acceptable.

Indeed. #ifdef WHATEVERTHEHECK does not go anywhere. If at all this needs
to be a runtime switch, but I have yet to understand the whole picture of
this.

Thanks,

	tglx

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [RFC 2/2] x86/kvmclock: Use host timekeeping.
  2019-09-20  6:27 ` [RFC 2/2] x86/kvmclock: Use host timekeeping Suleiman Souhlal
@ 2019-09-20 13:33   ` Vitaly Kuznetsov
  2019-09-24  8:10     ` Suleiman Souhlal
  0 siblings, 1 reply; 9+ messages in thread
From: Vitaly Kuznetsov @ 2019-09-20 13:33 UTC (permalink / raw)
  To: Suleiman Souhlal
  Cc: john.stultz, sboyd, linux-kernel, kvm, Suleiman Souhlal,
	pbonzini, rkrcmar, tglx

Suleiman Souhlal <suleiman@google.com> writes:

> When CONFIG_KVMCLOCK_HOST_TIMEKEEPING is enabled, and the host
> supports it, update our timekeeping parameters to be the same as
> the host. This lets us have our time synchronized with the host's,
> even in the presence of host NTP or suspend.
>
> When enabled, kvmclock uses raw tsc instead of pvclock.
>
> When enabled, syscalls that can change time, such as settimeofday(2)
> or adj_timex(2) are disabled in the guest.
>
> Signed-off-by: Suleiman Souhlal <suleiman@google.com>
> ---
>  arch/x86/Kconfig                |   9 +++
>  arch/x86/include/asm/kvmclock.h |   2 +
>  arch/x86/kernel/kvmclock.c      | 127 +++++++++++++++++++++++++++++++-
>  kernel/time/timekeeping.c       |  21 ++++++
>  4 files changed, 155 insertions(+), 4 deletions(-)
>
> diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
> index 4195f44c6a09..37299377d9d7 100644
> --- a/arch/x86/Kconfig
> +++ b/arch/x86/Kconfig
> @@ -837,6 +837,15 @@ config PARAVIRT_TIME_ACCOUNTING
>  config PARAVIRT_CLOCK
>  	bool
>  
> +config KVMCLOCK_HOST_TIMEKEEPING
> +	bool "kvmclock uses host timekeeping"
> +	depends on KVM_GUEST
> +	---help---
> +	  Select this option to make the guest use the same timekeeping
> +	  parameters as the host. This means that time will be almost
> +	  exactly the same between the two. Only works if the host uses "tsc"
> +	  clocksource.
> +

I'd also like to speak up against this config, it is confusing. In case
the goal is to come up with a TSC-based clock for guests which will
return the same as clock_gettime() on the host (or, is the goal to just
have the same reading for all guests on the host?) I'd suggest we create
a separate (from KVMCLOCK) clocksource (mirroring host timekeeper) and
guests will be free to pick the one they like.

-- 
Vitaly

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [RFC 0/2] kvm: Use host timekeeping in guest.
  2019-09-20 10:23   ` Thomas Gleixner
@ 2019-09-24  8:08     ` Suleiman Souhlal
  0 siblings, 0 replies; 9+ messages in thread
From: Suleiman Souhlal @ 2019-09-24  8:08 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: Paolo Bonzini, rkrcmar, john.stultz, sboyd, Linux Kernel, kvm,
	Tomasz Figa

On Fri, Sep 20, 2019 at 7:23 PM Thomas Gleixner <tglx@linutronix.de> wrote:
>
> On Fri, 20 Sep 2019, Paolo Bonzini wrote:
>
> > On 20/09/19 08:27, Suleiman Souhlal wrote:
> > > To do that, I am changing kvmclock to request to the host to copy
> > > its timekeeping parameters (mult, base, cycle_last, etc), so that
> > > the guest timekeeper can use the same values, so that time can
> > > be synchronized between the guest and the host.
> > >
> > > Any suggestions or feedback would be highly appreciated.
> >
> > I'm not a timekeeping maintainer, but I don't think the
> > kernel/time/timekeeping.c changes are acceptable.
>
> Indeed. #ifdef WHATEVERTHEHECK does not go anywhere. If at all this needs
> to be a runtime switch, but I have yet to understand the whole picture of
> this.

Yeah, I will try to make this a runtime switch.

As for the PTP driver, I don't think it will work for us because we
need CLOCK_MONOTONIC and CLOCK_BOOTTIME to match the host, and from my
understanding, PTP doesn't solve that.

Thanks,
-- Suleiman

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [RFC 2/2] x86/kvmclock: Use host timekeeping.
  2019-09-20 13:33   ` Vitaly Kuznetsov
@ 2019-09-24  8:10     ` Suleiman Souhlal
  2019-09-24 11:14       ` Vitaly Kuznetsov
  0 siblings, 1 reply; 9+ messages in thread
From: Suleiman Souhlal @ 2019-09-24  8:10 UTC (permalink / raw)
  To: Vitaly Kuznetsov
  Cc: John Stultz, sboyd, Linux Kernel, kvm, Paolo Bonzini, rkrcmar,
	Thomas Gleixner, Tomasz Figa

On Fri, Sep 20, 2019 at 10:33 PM Vitaly Kuznetsov <vkuznets@redhat.com> wrote:
>
> Suleiman Souhlal <suleiman@google.com> writes:
>
> > When CONFIG_KVMCLOCK_HOST_TIMEKEEPING is enabled, and the host
> > supports it, update our timekeeping parameters to be the same as
> > the host. This lets us have our time synchronized with the host's,
> > even in the presence of host NTP or suspend.
> >
> > When enabled, kvmclock uses raw tsc instead of pvclock.
> >
> > When enabled, syscalls that can change time, such as settimeofday(2)
> > or adj_timex(2) are disabled in the guest.
> >
> > Signed-off-by: Suleiman Souhlal <suleiman@google.com>
> > ---
> >  arch/x86/Kconfig                |   9 +++
> >  arch/x86/include/asm/kvmclock.h |   2 +
> >  arch/x86/kernel/kvmclock.c      | 127 +++++++++++++++++++++++++++++++-
> >  kernel/time/timekeeping.c       |  21 ++++++
> >  4 files changed, 155 insertions(+), 4 deletions(-)
> >
> > diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
> > index 4195f44c6a09..37299377d9d7 100644
> > --- a/arch/x86/Kconfig
> > +++ b/arch/x86/Kconfig
> > @@ -837,6 +837,15 @@ config PARAVIRT_TIME_ACCOUNTING
> >  config PARAVIRT_CLOCK
> >       bool
> >
> > +config KVMCLOCK_HOST_TIMEKEEPING
> > +     bool "kvmclock uses host timekeeping"
> > +     depends on KVM_GUEST
> > +     ---help---
> > +       Select this option to make the guest use the same timekeeping
> > +       parameters as the host. This means that time will be almost
> > +       exactly the same between the two. Only works if the host uses "tsc"
> > +       clocksource.
> > +
>
> I'd also like to speak up against this config, it is confusing. In case
> the goal is to come up with a TSC-based clock for guests which will
> return the same as clock_gettime() on the host (or, is the goal to just
> have the same reading for all guests on the host?) I'd suggest we create
> a separate (from KVMCLOCK) clocksource (mirroring host timekeeper) and
> guests will be free to pick the one they like.

Fair enough. I'll do that in the next version of the patch.

The goal is to have a guest clock that gives the same
clock_gettime(CLOCK_MONOTONIC) as the host.

Thanks,
-- Suleiman

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [RFC 2/2] x86/kvmclock: Use host timekeeping.
  2019-09-24  8:10     ` Suleiman Souhlal
@ 2019-09-24 11:14       ` Vitaly Kuznetsov
  0 siblings, 0 replies; 9+ messages in thread
From: Vitaly Kuznetsov @ 2019-09-24 11:14 UTC (permalink / raw)
  To: Suleiman Souhlal
  Cc: John Stultz, sboyd, Linux Kernel, kvm, Paolo Bonzini, rkrcmar,
	Thomas Gleixner, Tomasz Figa

Suleiman Souhlal <suleiman@google.com> writes:

> On Fri, Sep 20, 2019 at 10:33 PM Vitaly Kuznetsov <vkuznets@redhat.com> wrote:
>>
>> Suleiman Souhlal <suleiman@google.com> writes:
>>
>> > When CONFIG_KVMCLOCK_HOST_TIMEKEEPING is enabled, and the host
>> > supports it, update our timekeeping parameters to be the same as
>> > the host. This lets us have our time synchronized with the host's,
>> > even in the presence of host NTP or suspend.
>> >
>> > When enabled, kvmclock uses raw tsc instead of pvclock.
>> >
>> > When enabled, syscalls that can change time, such as settimeofday(2)
>> > or adj_timex(2) are disabled in the guest.
>> >
>> > Signed-off-by: Suleiman Souhlal <suleiman@google.com>
>> > ---
>> >  arch/x86/Kconfig                |   9 +++
>> >  arch/x86/include/asm/kvmclock.h |   2 +
>> >  arch/x86/kernel/kvmclock.c      | 127 +++++++++++++++++++++++++++++++-
>> >  kernel/time/timekeeping.c       |  21 ++++++
>> >  4 files changed, 155 insertions(+), 4 deletions(-)
>> >
>> > diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
>> > index 4195f44c6a09..37299377d9d7 100644
>> > --- a/arch/x86/Kconfig
>> > +++ b/arch/x86/Kconfig
>> > @@ -837,6 +837,15 @@ config PARAVIRT_TIME_ACCOUNTING
>> >  config PARAVIRT_CLOCK
>> >       bool
>> >
>> > +config KVMCLOCK_HOST_TIMEKEEPING
>> > +     bool "kvmclock uses host timekeeping"
>> > +     depends on KVM_GUEST
>> > +     ---help---
>> > +       Select this option to make the guest use the same timekeeping
>> > +       parameters as the host. This means that time will be almost
>> > +       exactly the same between the two. Only works if the host uses "tsc"
>> > +       clocksource.
>> > +
>>
>> I'd also like to speak up against this config, it is confusing. In case
>> the goal is to come up with a TSC-based clock for guests which will
>> return the same as clock_gettime() on the host (or, is the goal to just
>> have the same reading for all guests on the host?) I'd suggest we create
>> a separate (from KVMCLOCK) clocksource (mirroring host timekeeper) and
>> guests will be free to pick the one they like.
>
> Fair enough. I'll do that in the next version of the patch.
>
> The goal is to have a guest clock that gives the same
> clock_gettime(CLOCK_MONOTONIC) as the host.
>

KVMCLOCK has lots of legacy derived from times when TSC synchronization
was not a given (I heard that this is still sometimes problematic with
multi-socket systems but oh well). If I was to design a new clock I'd
probably mirror Hyper-V's TSC page clocksource invalidating the page
when host timekeeper values are updated (and making guest spin).

The tricky part with this approach is probably tsc scaling/tsc
offsetting which is still going to be controlled by kvmclock (so the
guest has no option to read 'pure TSC'). As an alternative, you can make
kvmclock un-pluggable so when it's not enabled TSC frequency/offset can
remain intact. You can, of course, try to update timekeeper values to
match the new frequency/offset every time they change but rounding
errors may bite.

-- 
Vitaly

^ permalink raw reply	[flat|nested] 9+ messages in thread

end of thread, other threads:[~2019-09-24 11:14 UTC | newest]

Thread overview: 9+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2019-09-20  6:27 [RFC 0/2] kvm: Use host timekeeping in guest Suleiman Souhlal
2019-09-20  6:27 ` [RFC 1/2] kvm: Mechanism to copy host timekeeping parameters into guest Suleiman Souhlal
2019-09-20  6:27 ` [RFC 2/2] x86/kvmclock: Use host timekeeping Suleiman Souhlal
2019-09-20 13:33   ` Vitaly Kuznetsov
2019-09-24  8:10     ` Suleiman Souhlal
2019-09-24 11:14       ` Vitaly Kuznetsov
2019-09-20  7:48 ` [RFC 0/2] kvm: Use host timekeeping in guest Paolo Bonzini
2019-09-20 10:23   ` Thomas Gleixner
2019-09-24  8:08     ` Suleiman Souhlal

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).