* [patch 00/15] pvclock vsyscall support + KVM hypervisor support
@ 2012-10-16 17:56 Marcelo Tosatti
  2012-10-16 17:56 ` [patch 01/15] KVM: x86: retain pvclock guest stopped bit in guest memory Marcelo Tosatti
                   ` (14 more replies)
  0 siblings, 15 replies; 16+ messages in thread
From: Marcelo Tosatti @ 2012-10-16 17:56 UTC (permalink / raw)
  To: kvm; +Cc: johnstul, jeremy

This patchset, based on earlier work by Jeremy Fitzhardinge, implements
paravirtual clock vsyscall support.

It should be possible to implement Xen support relatively easily.

It reduces clock_gettime from 500 to 200 cycles on my testbox
(that measurement includes an mfence).
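
For reference, numbers of that order can be reproduced with a userspace
loop like the one below (a minimal sketch, not part of this series;
cycle counts depend on the CPU):

	#include <stdint.h>
	#include <stdio.h>
	#include <time.h>
	#include <x86intrin.h>		/* __rdtsc(), _mm_mfence() */

	int main(void)
	{
		struct timespec ts;
		uint64_t t0, t1;
		const int iters = 1000000;

		t0 = __rdtsc();
		for (int i = 0; i < iters; i++) {
			clock_gettime(CLOCK_MONOTONIC, &ts);
			_mm_mfence();	/* include an mfence, as above */
		}
		t1 = __rdtsc();

		printf("%lu cycles/call\n", (unsigned long)((t1 - t0) / iters));
		return 0;
	}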

NOTE: exporting the pvclock TSC stable bit when the guest TSCs are
not synchronized is incorrect; the next version will have that fixed.

Please review.





* [patch 01/15] KVM: x86: retain pvclock guest stopped bit in guest memory
  2012-10-16 17:56 [patch 00/15] pvclock vsyscall support + KVM hypervisor support Marcelo Tosatti
@ 2012-10-16 17:56 ` Marcelo Tosatti
  2012-10-16 17:56 ` [patch 02/15] x86: pvclock: make sure rdtsc doesnt speculate out of region Marcelo Tosatti
                   ` (13 subsequent siblings)
  14 siblings, 0 replies; 16+ messages in thread
From: Marcelo Tosatti @ 2012-10-16 17:56 UTC (permalink / raw)
  To: kvm; +Cc: johnstul, jeremy, Marcelo Tosatti

[-- Attachment #1: x86-kvm-retain-guest-stopped.patch --]
[-- Type: text/plain, Size: 1659 bytes --]

Otherwise it is possible for an unrelated KVM_REQ_CLOCK_UPDATE (such as one
due to CPU migration) to clear the bit.

Noticed by Paolo Bonzini.

Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>

Index: vsyscall/arch/x86/kvm/x86.c
===================================================================
--- vsyscall.orig/arch/x86/kvm/x86.c
+++ vsyscall/arch/x86/kvm/x86.c
@@ -1143,6 +1143,7 @@ static int kvm_guest_time_update(struct 
 	unsigned long this_tsc_khz;
 	s64 kernel_ns, max_kernel_ns;
 	u64 tsc_timestamp;
+	struct pvclock_vcpu_time_info *guest_hv_clock;
 	u8 pvclock_flags;
 
 	/* Keep irq disabled to prevent changes to the clock */
@@ -1226,13 +1227,6 @@ static int kvm_guest_time_update(struct 
 	vcpu->last_kernel_ns = kernel_ns;
 	vcpu->last_guest_tsc = tsc_timestamp;
 
-	pvclock_flags = 0;
-	if (vcpu->pvclock_set_guest_stopped_request) {
-		pvclock_flags |= PVCLOCK_GUEST_STOPPED;
-		vcpu->pvclock_set_guest_stopped_request = false;
-	}
-
-	vcpu->hv_clock.flags = pvclock_flags;
 
 	/*
 	 * The interface expects us to write an even number signaling that the
@@ -1243,6 +1237,18 @@ static int kvm_guest_time_update(struct 
 
 	shared_kaddr = kmap_atomic(vcpu->time_page);
 
+	guest_hv_clock = shared_kaddr + vcpu->time_offset;
+
+	/* retain PVCLOCK_GUEST_STOPPED if set in guest copy */
+	pvclock_flags = (guest_hv_clock->flags & PVCLOCK_GUEST_STOPPED);
+
+	if (vcpu->pvclock_set_guest_stopped_request) {
+		pvclock_flags |= PVCLOCK_GUEST_STOPPED;
+		vcpu->pvclock_set_guest_stopped_request = false;
+	}
+
+	vcpu->hv_clock.flags = pvclock_flags;
+
 	memcpy(shared_kaddr + vcpu->time_offset, &vcpu->hv_clock,
 	       sizeof(vcpu->hv_clock));
 




* [patch 02/15] x86: pvclock: make sure rdtsc doesnt speculate out of region
  2012-10-16 17:56 [patch 00/15] pvclock vsyscall support + KVM hypervisor support Marcelo Tosatti
  2012-10-16 17:56 ` [patch 01/15] KVM: x86: retain pvclock guest stopped bit in guest memory Marcelo Tosatti
@ 2012-10-16 17:56 ` Marcelo Tosatti
  2012-10-16 17:56 ` [patch 03/15] x86: pvclock: remove pvclock_shadow_time Marcelo Tosatti
                   ` (12 subsequent siblings)
  14 siblings, 0 replies; 16+ messages in thread
From: Marcelo Tosatti @ 2012-10-16 17:56 UTC (permalink / raw)
  To: kvm; +Cc: johnstul, jeremy, Marcelo Tosatti

[-- Attachment #1: 01-pvclock-read-rdtsc-barrier --]
[-- Type: text/plain, Size: 745 bytes --]

Originally from Jeremy Fitzhardinge.

pvclock_get_time_values, which contains the memory barriers, will be
removed by the next patch.
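
For reference, rdtsc_barrier() in this tree expands, via the alternatives
mechanism, to roughly the following (from asm/barrier.h; which fence is
used depends on the CPU):

	static __always_inline void rdtsc_barrier(void)
	{
		/* patched at boot to whichever fence this CPU needs */
		alternative(ASM_NOP3, "mfence", X86_FEATURE_MFENCE_RDTSC);
		alternative(ASM_NOP3, "lfence", X86_FEATURE_LFENCE_RDTSC);
	}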

Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>

Index: vsyscall/arch/x86/kernel/pvclock.c
===================================================================
--- vsyscall.orig/arch/x86/kernel/pvclock.c
+++ vsyscall/arch/x86/kernel/pvclock.c
@@ -97,10 +97,10 @@ cycle_t pvclock_clocksource_read(struct 
 
 	do {
 		version = pvclock_get_time_values(&shadow, src);
-		barrier();
+		rdtsc_barrier();
 		offset = pvclock_get_nsec_offset(&shadow);
 		ret = shadow.system_timestamp + offset;
-		barrier();
+		rdtsc_barrier();
 	} while (version != src->version);
 
 	if ((valid_flags & PVCLOCK_TSC_STABLE_BIT) &&




* [patch 03/15] x86: pvclock: remove pvclock_shadow_time
  2012-10-16 17:56 [patch 00/15] pvclock vsyscall support + KVM hypervisor support Marcelo Tosatti
  2012-10-16 17:56 ` [patch 01/15] KVM: x86: retain pvclock guest stopped bit in guest memory Marcelo Tosatti
  2012-10-16 17:56 ` [patch 02/15] x86: pvclock: make sure rdtsc doesnt speculate out of region Marcelo Tosatti
@ 2012-10-16 17:56 ` Marcelo Tosatti
  2012-10-16 17:56 ` [patch 04/15] x86: pvclock: create helper for pvclock data retrieval Marcelo Tosatti
                   ` (11 subsequent siblings)
  14 siblings, 0 replies; 16+ messages in thread
From: Marcelo Tosatti @ 2012-10-16 17:56 UTC (permalink / raw)
  To: kvm; +Cc: johnstul, jeremy, Marcelo Tosatti

[-- Attachment #1: 02-pvclock-remove-shadow-time --]
[-- Type: text/plain, Size: 2949 bytes --]

Originally from Jeremy Fitzhardinge.

We can read the information directly from "struct pvclock_vcpu_time_info",
so remove pvclock_shadow_time.

Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>

Index: vsyscall/arch/x86/kernel/pvclock.c
===================================================================
--- vsyscall.orig/arch/x86/kernel/pvclock.c
+++ vsyscall/arch/x86/kernel/pvclock.c
@@ -19,21 +19,6 @@
 #include <linux/percpu.h>
 #include <asm/pvclock.h>
 
-/*
- * These are perodically updated
- *    xen: magic shared_info page
- *    kvm: gpa registered via msr
- * and then copied here.
- */
-struct pvclock_shadow_time {
-	u64 tsc_timestamp;     /* TSC at last update of time vals.  */
-	u64 system_timestamp;  /* Time, in nanosecs, since boot.    */
-	u32 tsc_to_nsec_mul;
-	int tsc_shift;
-	u32 version;
-	u8  flags;
-};
-
 static u8 valid_flags __read_mostly = 0;
 
 void pvclock_set_flags(u8 flags)
@@ -41,32 +26,11 @@ void pvclock_set_flags(u8 flags)
 	valid_flags = flags;
 }
 
-static u64 pvclock_get_nsec_offset(struct pvclock_shadow_time *shadow)
+static u64 pvclock_get_nsec_offset(const struct pvclock_vcpu_time_info *src)
 {
-	u64 delta = native_read_tsc() - shadow->tsc_timestamp;
-	return pvclock_scale_delta(delta, shadow->tsc_to_nsec_mul,
-				   shadow->tsc_shift);
-}
-
-/*
- * Reads a consistent set of time-base values from hypervisor,
- * into a shadow data area.
- */
-static unsigned pvclock_get_time_values(struct pvclock_shadow_time *dst,
-					struct pvclock_vcpu_time_info *src)
-{
-	do {
-		dst->version = src->version;
-		rmb();		/* fetch version before data */
-		dst->tsc_timestamp     = src->tsc_timestamp;
-		dst->system_timestamp  = src->system_time;
-		dst->tsc_to_nsec_mul   = src->tsc_to_system_mul;
-		dst->tsc_shift         = src->tsc_shift;
-		dst->flags             = src->flags;
-		rmb();		/* test version after fetching data */
-	} while ((src->version & 1) || (dst->version != src->version));
-
-	return dst->version;
+	u64 delta = native_read_tsc() - src->tsc_timestamp;
+	return pvclock_scale_delta(delta, src->tsc_to_system_mul,
+				   src->tsc_shift);
 }
 
 unsigned long pvclock_tsc_khz(struct pvclock_vcpu_time_info *src)
@@ -90,21 +54,20 @@ void pvclock_resume(void)
 
 cycle_t pvclock_clocksource_read(struct pvclock_vcpu_time_info *src)
 {
-	struct pvclock_shadow_time shadow;
 	unsigned version;
 	cycle_t ret, offset;
 	u64 last;
 
 	do {
-		version = pvclock_get_time_values(&shadow, src);
+		version = src->version;
 		rdtsc_barrier();
-		offset = pvclock_get_nsec_offset(&shadow);
-		ret = shadow.system_timestamp + offset;
+		offset = pvclock_get_nsec_offset(src);
+		ret = src->system_time + offset;
 		rdtsc_barrier();
-	} while (version != src->version);
+	} while ((src->version & 1) || version != src->version);
 
 	if ((valid_flags & PVCLOCK_TSC_STABLE_BIT) &&
-		(shadow.flags & PVCLOCK_TSC_STABLE_BIT))
+		(src->flags & PVCLOCK_TSC_STABLE_BIT))
 		return ret;
 
 	/*




* [patch 04/15] x86: pvclock: create helper for pvclock data retrieval
  2012-10-16 17:56 [patch 00/15] pvclock vsyscall support + KVM hypervisor support Marcelo Tosatti
                   ` (2 preceding siblings ...)
  2012-10-16 17:56 ` [patch 03/15] x86: pvclock: remove pvclock_shadow_time Marcelo Tosatti
@ 2012-10-16 17:56 ` Marcelo Tosatti
  2012-10-16 17:56 ` [patch 05/15] x86: pvclock: fix flags usage race Marcelo Tosatti
                   ` (10 subsequent siblings)
  14 siblings, 0 replies; 16+ messages in thread
From: Marcelo Tosatti @ 2012-10-16 17:56 UTC (permalink / raw)
  To: kvm; +Cc: johnstul, jeremy, Marcelo Tosatti

[-- Attachment #1: 03-move-pvread-to-pvheader --]
[-- Type: text/plain, Size: 2146 bytes --]

Originally from Jeremy Fitzhardinge.

So that the code can be reused by the vsyscall code in a later patch.

Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>

Index: vsyscall/arch/x86/kernel/pvclock.c
===================================================================
--- vsyscall.orig/arch/x86/kernel/pvclock.c
+++ vsyscall/arch/x86/kernel/pvclock.c
@@ -26,13 +26,6 @@ void pvclock_set_flags(u8 flags)
 	valid_flags = flags;
 }
 
-static u64 pvclock_get_nsec_offset(const struct pvclock_vcpu_time_info *src)
-{
-	u64 delta = native_read_tsc() - src->tsc_timestamp;
-	return pvclock_scale_delta(delta, src->tsc_to_system_mul,
-				   src->tsc_shift);
-}
-
 unsigned long pvclock_tsc_khz(struct pvclock_vcpu_time_info *src)
 {
 	u64 pv_tsc_khz = 1000000ULL << 32;
@@ -55,15 +48,11 @@ void pvclock_resume(void)
 cycle_t pvclock_clocksource_read(struct pvclock_vcpu_time_info *src)
 {
 	unsigned version;
-	cycle_t ret, offset;
+	cycle_t ret;
 	u64 last;
 
 	do {
-		version = src->version;
-		rdtsc_barrier();
-		offset = pvclock_get_nsec_offset(src);
-		ret = src->system_time + offset;
-		rdtsc_barrier();
+		version = __pvclock_read_cycles(src, &ret);
 	} while ((src->version & 1) || version != src->version);
 
 	if ((valid_flags & PVCLOCK_TSC_STABLE_BIT) &&
Index: vsyscall/arch/x86/include/asm/pvclock.h
===================================================================
--- vsyscall.orig/arch/x86/include/asm/pvclock.h
+++ vsyscall/arch/x86/include/asm/pvclock.h
@@ -56,4 +56,29 @@ static inline u64 pvclock_scale_delta(u6
 	return product;
 }
 
+static __always_inline
+u64 pvclock_get_nsec_offset(const struct pvclock_vcpu_time_info *src)
+{
+	u64 delta = __native_read_tsc() - src->tsc_timestamp;
+	return pvclock_scale_delta(delta, src->tsc_to_system_mul,
+				   src->tsc_shift);
+}
+
+static __always_inline
+unsigned __pvclock_read_cycles(const struct pvclock_vcpu_time_info *src,
+			       cycle_t *cycles)
+{
+	unsigned version;
+	cycle_t ret, offset;
+
+	version = src->version;
+	rdtsc_barrier();
+	offset = pvclock_get_nsec_offset(src);
+	ret = src->system_time + offset;
+	rdtsc_barrier();
+
+	*cycles = ret;
+	return version;
+}
+
 #endif /* _ASM_X86_PVCLOCK_H */




* [patch 05/15] x86: pvclock: fix flags usage race
  2012-10-16 17:56 [patch 00/15] pvclock vsyscall support + KVM hypervisor support Marcelo Tosatti
                   ` (3 preceding siblings ...)
  2012-10-16 17:56 ` [patch 04/15] x86: pvclock: create helper for pvclock data retrieval Marcelo Tosatti
@ 2012-10-16 17:56 ` Marcelo Tosatti
  2012-10-16 17:56 ` [patch 06/15] x86: pvclock: introduce helper to read flags Marcelo Tosatti
                   ` (9 subsequent siblings)
  14 siblings, 0 replies; 16+ messages in thread
From: Marcelo Tosatti @ 2012-10-16 17:56 UTC (permalink / raw)
  To: kvm; +Cc: johnstul, jeremy, Marcelo Tosatti

[-- Attachment #1: 04-pvclock-read-cycles-return-flags --]
[-- Type: text/plain, Size: 1640 bytes --]

The validity of the values returned by pvclock (including the flags) is
guaranteed by version checks.

That is, a read of src->flags outside of version-check protection can
refer to a different paravirt clock update by the hypervisor.
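
Schematically (an illustrative interleaving, not literal code):

	guest                                hypervisor
	-----                                ----------
	version = src->version;  /* N */
	read tsc, system_time    /* N */
	recheck src->version     /* N, loop exits */
	                                     src->version++;    /* N+1 begins */
	                                     src->flags = ...;  /* N+1 */
	flags = src->flags;      /* sees N+1, not N */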

Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>

Index: vsyscall/arch/x86/include/asm/pvclock.h
===================================================================
--- vsyscall.orig/arch/x86/include/asm/pvclock.h
+++ vsyscall/arch/x86/include/asm/pvclock.h
@@ -66,18 +66,21 @@ u64 pvclock_get_nsec_offset(const struct
 
 static __always_inline
 unsigned __pvclock_read_cycles(const struct pvclock_vcpu_time_info *src,
-			       cycle_t *cycles)
+			       cycle_t *cycles, u8 *flags)
 {
 	unsigned version;
 	cycle_t ret, offset;
+	u8 ret_flags;
 
 	version = src->version;
 	rdtsc_barrier();
 	offset = pvclock_get_nsec_offset(src);
 	ret = src->system_time + offset;
+	ret_flags = src->flags;
 	rdtsc_barrier();
 
 	*cycles = ret;
+	*flags = ret_flags;
 	return version;
 }
 
Index: vsyscall/arch/x86/kernel/pvclock.c
===================================================================
--- vsyscall.orig/arch/x86/kernel/pvclock.c
+++ vsyscall/arch/x86/kernel/pvclock.c
@@ -50,13 +50,14 @@ cycle_t pvclock_clocksource_read(struct 
 	unsigned version;
 	cycle_t ret;
 	u64 last;
+	u8 flags;
 
 	do {
-		version = __pvclock_read_cycles(src, &ret);
+		version = __pvclock_read_cycles(src, &ret, &flags);
 	} while ((src->version & 1) || version != src->version);
 
 	if ((valid_flags & PVCLOCK_TSC_STABLE_BIT) &&
-		(src->flags & PVCLOCK_TSC_STABLE_BIT))
+		(flags & PVCLOCK_TSC_STABLE_BIT))
 		return ret;
 
 	/*




* [patch 06/15] x86: pvclock: introduce helper to read flags
  2012-10-16 17:56 [patch 00/15] pvclock vsyscall support + KVM hypervisor support Marcelo Tosatti
                   ` (4 preceding siblings ...)
  2012-10-16 17:56 ` [patch 05/15] x86: pvclock: fix flags usage race Marcelo Tosatti
@ 2012-10-16 17:56 ` Marcelo Tosatti
  2012-10-16 17:56 ` [patch 07/15] sched: add notifier for cross-cpu migrations Marcelo Tosatti
                   ` (8 subsequent siblings)
  14 siblings, 0 replies; 16+ messages in thread
From: Marcelo Tosatti @ 2012-10-16 17:56 UTC (permalink / raw)
  To: kvm; +Cc: johnstul, jeremy, Marcelo Tosatti

[-- Attachment #1: 05-pvclock-add-get-flags --]
[-- Type: text/plain, Size: 1278 bytes --]

Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>


Index: vsyscall/arch/x86/kernel/pvclock.c
===================================================================
--- vsyscall.orig/arch/x86/kernel/pvclock.c
+++ vsyscall/arch/x86/kernel/pvclock.c
@@ -45,6 +45,19 @@ void pvclock_resume(void)
 	atomic64_set(&last_value, 0);
 }
 
+u8 pvclock_read_flags(struct pvclock_vcpu_time_info *src)
+{
+	unsigned version;
+	cycle_t ret;
+	u8 flags;
+
+	do {
+		version = __pvclock_read_cycles(src, &ret, &flags);
+	} while ((src->version & 1) || version != src->version);
+
+	return flags & valid_flags;
+}
+
 cycle_t pvclock_clocksource_read(struct pvclock_vcpu_time_info *src)
 {
 	unsigned version;
Index: vsyscall/arch/x86/include/asm/pvclock.h
===================================================================
--- vsyscall.orig/arch/x86/include/asm/pvclock.h
+++ vsyscall/arch/x86/include/asm/pvclock.h
@@ -6,6 +6,7 @@
 
 /* some helper functions for xen and kvm pv clock sources */
 cycle_t pvclock_clocksource_read(struct pvclock_vcpu_time_info *src);
+u8 pvclock_read_flags(struct pvclock_vcpu_time_info *src);
 void pvclock_set_flags(u8 flags);
 unsigned long pvclock_tsc_khz(struct pvclock_vcpu_time_info *src);
 void pvclock_read_wallclock(struct pvclock_wall_clock *wall,




* [patch 07/15] sched: add notifier for cross-cpu migrations
  2012-10-16 17:56 [patch 00/15] pvclock vsyscall support + KVM hypervisor support Marcelo Tosatti
                   ` (5 preceding siblings ...)
  2012-10-16 17:56 ` [patch 06/15] x86: pvclock: introduce helper to read flags Marcelo Tosatti
@ 2012-10-16 17:56 ` Marcelo Tosatti
  2012-10-16 17:56 ` [patch 08/15] x86: pvclock: generic pvclock vsyscall initialization Marcelo Tosatti
                   ` (7 subsequent siblings)
  14 siblings, 0 replies; 16+ messages in thread
From: Marcelo Tosatti @ 2012-10-16 17:56 UTC (permalink / raw)
  To: kvm; +Cc: johnstul, jeremy, Marcelo Tosatti

[-- Attachment #1: 06-add-task-migration-notifier --]
[-- Type: text/plain, Size: 1732 bytes --]

Originally from Jeremy Fitzhardinge.
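
A consumer registers as with any other atomic notifier chain; a sketch
(the names here are made up for illustration -- patch 8 adds the real
user, pvclock_init_vsyscall):

	static int my_migrate_cb(struct notifier_block *nb, unsigned long val,
				 void *data)
	{
		struct task_migration_notifier *mn = data;

		/* mn->task is moving from mn->from_cpu to mn->to_cpu */
		return NOTIFY_DONE;
	}

	static struct notifier_block my_migrate_nb = {
		.notifier_call = my_migrate_cb,
	};

	static int __init my_init(void)
	{
		register_task_migration_notifier(&my_migrate_nb);
		return 0;
	}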

Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>

Index: vsyscall/include/linux/sched.h
===================================================================
--- vsyscall.orig/include/linux/sched.h
+++ vsyscall/include/linux/sched.h
@@ -107,6 +107,14 @@ extern unsigned long this_cpu_load(void)
 extern void calc_global_load(unsigned long ticks);
 extern void update_cpu_load_nohz(void);
 
+/* Notifier for when a task gets migrated to a new CPU */
+struct task_migration_notifier {
+	struct task_struct *task;
+	int from_cpu;
+	int to_cpu;
+};
+extern void register_task_migration_notifier(struct notifier_block *n);
+
 extern unsigned long get_parent_ip(unsigned long addr);
 
 struct seq_file;
Index: vsyscall/kernel/sched/core.c
===================================================================
--- vsyscall.orig/kernel/sched/core.c
+++ vsyscall/kernel/sched/core.c
@@ -922,6 +922,13 @@ void check_preempt_curr(struct rq *rq, s
 		rq->skip_clock_update = 1;
 }
 
+static ATOMIC_NOTIFIER_HEAD(task_migration_notifier);
+
+void register_task_migration_notifier(struct notifier_block *n)
+{
+	atomic_notifier_chain_register(&task_migration_notifier, n);
+}
+
 #ifdef CONFIG_SMP
 void set_task_cpu(struct task_struct *p, unsigned int new_cpu)
 {
@@ -952,8 +959,16 @@ void set_task_cpu(struct task_struct *p,
 	trace_sched_migrate_task(p, new_cpu);
 
 	if (task_cpu(p) != new_cpu) {
+		struct task_migration_notifier tmn;
+
 		p->se.nr_migrations++;
 		perf_sw_event(PERF_COUNT_SW_CPU_MIGRATIONS, 1, NULL, 0);
+
+		tmn.task = p;
+		tmn.from_cpu = task_cpu(p);
+		tmn.to_cpu = new_cpu;
+
+		atomic_notifier_call_chain(&task_migration_notifier, 0, &tmn);
 	}
 
 	__set_task_cpu(p, new_cpu);




* [patch 08/15] x86: pvclock: generic pvclock vsyscall initialization
  2012-10-16 17:56 [patch 00/15] pvclock vsyscall support + KVM hypervisor support Marcelo Tosatti
                   ` (6 preceding siblings ...)
  2012-10-16 17:56 ` [patch 07/15] sched: add notifier for cross-cpu migrations Marcelo Tosatti
@ 2012-10-16 17:56 ` Marcelo Tosatti
  2012-10-16 17:56 ` [patch 09/15] x86: kvm guest: pvclock vsyscall support Marcelo Tosatti
                   ` (6 subsequent siblings)
  14 siblings, 0 replies; 16+ messages in thread
From: Marcelo Tosatti @ 2012-10-16 17:56 UTC (permalink / raw)
  To: kvm; +Cc: johnstul, jeremy, Marcelo Tosatti

[-- Attachment #1: 07-add-pvclock-structs-and-fixmap --]
[-- Type: text/plain, Size: 5321 bytes --]

Originally from Jeremy Fitzhardinge.

Introduce generic, non-hypervisor-specific pvclock initialization
routines.
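
To size the fixmap mapping below: each per-vcpu entry is padded to
SMP_CACHE_BYTES, so assuming SMP_CACHE_BYTES = 64 and PAGE_SIZE = 4096,
one page holds 64 entries and, for example, NR_CPUS = 128 yields
((128-1)/64)+1 = 2 pages.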

Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>

Index: vsyscall/arch/x86/kernel/pvclock.c
===================================================================
--- vsyscall.orig/arch/x86/kernel/pvclock.c
+++ vsyscall/arch/x86/kernel/pvclock.c
@@ -17,6 +17,10 @@
 
 #include <linux/kernel.h>
 #include <linux/percpu.h>
+#include <linux/notifier.h>
+#include <linux/sched.h>
+#include <linux/gfp.h>
+#include <linux/bootmem.h>
 #include <asm/pvclock.h>
 
 static u8 valid_flags __read_mostly = 0;
@@ -122,3 +126,70 @@ void pvclock_read_wallclock(struct pvclo
 
 	set_normalized_timespec(ts, now.tv_sec, now.tv_nsec);
 }
+
+#ifdef CONFIG_PARAVIRT_CLOCK_VSYSCALL
+
+static aligned_pvti_t *pvclock_vdso_info;
+
+static struct pvclock_vsyscall_time_info *pvclock_get_vsyscall_user_time_info(int cpu)
+{
+	if (pvclock_vdso_info == NULL) {
+		BUG();
+		return NULL;
+	}
+
+	return &pvclock_vdso_info[cpu].info;
+}
+
+struct pvclock_vcpu_time_info *pvclock_get_vsyscall_time_info(int cpu)
+{
+	return &pvclock_get_vsyscall_user_time_info(cpu)->pvti;
+}
+
+int pvclock_task_migrate(struct notifier_block *nb, unsigned long l, void *v)
+{
+	struct task_migration_notifier *mn = v;
+	struct pvclock_vsyscall_time_info *pvti;
+
+	pvti = pvclock_get_vsyscall_user_time_info(mn->from_cpu);
+
+	if (pvti == NULL)
+		return NOTIFY_DONE;
+
+	pvti->migrate_count++;
+
+	return NOTIFY_DONE;
+}
+
+static struct notifier_block pvclock_migrate = {
+	.notifier_call = pvclock_task_migrate,
+};
+
+/*
+ * Initialize the generic pvclock vsyscall state.  This allocates one
+ * or more pages for the per-vcpu pvclock information and sets up a
+ * fixmap mapping for them.
+ */
+int __init pvclock_init_vsyscall(void)
+{
+	int idx;
+	unsigned int size = PVCLOCK_VSYSCALL_NR_PAGES*PAGE_SIZE;
+
+	pvclock_vdso_info = __alloc_bootmem(size, PAGE_SIZE, 0);
+	if (!pvclock_vdso_info)
+		return -ENOMEM;
+
+	memset(pvclock_vdso_info, 0, size);
+
+	for (idx = 0; idx <= (PVCLOCK_FIXMAP_END-PVCLOCK_FIXMAP_BEGIN); idx++) {
+		__set_fixmap(PVCLOCK_FIXMAP_BEGIN + idx,
+			     __pa(pvclock_vdso_info) + (idx*PAGE_SIZE),
+		     	     PAGE_KERNEL_VVAR);
+	}
+
+	register_task_migration_notifier(&pvclock_migrate);
+
+	return 0;
+}
+
+#endif /* CONFIG_PARAVIRT_CLOCK_VSYSCALL */
Index: vsyscall/arch/x86/include/asm/fixmap.h
===================================================================
--- vsyscall.orig/arch/x86/include/asm/fixmap.h
+++ vsyscall/arch/x86/include/asm/fixmap.h
@@ -19,6 +19,7 @@
 #include <asm/acpi.h>
 #include <asm/apicdef.h>
 #include <asm/page.h>
+#include <asm/pvclock.h>
 #ifdef CONFIG_X86_32
 #include <linux/threads.h>
 #include <asm/kmap_types.h>
@@ -81,6 +82,10 @@ enum fixed_addresses {
 	VVAR_PAGE,
 	VSYSCALL_HPET,
 #endif
+#ifdef CONFIG_PARAVIRT_CLOCK_VSYSCALL
+	PVCLOCK_FIXMAP_BEGIN,
+	PVCLOCK_FIXMAP_END = PVCLOCK_FIXMAP_BEGIN+PVCLOCK_VSYSCALL_NR_PAGES-1,
+#endif
 	FIX_DBGP_BASE,
 	FIX_EARLYCON_MEM_BASE,
 #ifdef CONFIG_PROVIDE_OHCI1394_DMA_INIT
Index: vsyscall/arch/x86/include/asm/pvclock.h
===================================================================
--- vsyscall.orig/arch/x86/include/asm/pvclock.h
+++ vsyscall/arch/x86/include/asm/pvclock.h
@@ -13,6 +13,8 @@ void pvclock_read_wallclock(struct pvclo
 			    struct pvclock_vcpu_time_info *vcpu,
 			    struct timespec *ts);
 void pvclock_resume(void);
+int __init pvclock_init_vsyscall(void);
+struct pvclock_vcpu_time_info *pvclock_get_vsyscall_time_info(int cpu);
 
 /*
  * Scale a 64-bit delta by scaling and multiplying by a 32-bit fraction,
@@ -85,4 +87,24 @@ unsigned __pvclock_read_cycles(const str
 	return version;
 }
 
+#ifdef CONFIG_PARAVIRT_CLOCK_VSYSCALL
+
+struct pvclock_vsyscall_time_info {
+	struct pvclock_vcpu_time_info pvti;
+	u32 migrate_count;
+};
+
+typedef union {
+	struct pvclock_vsyscall_time_info info;
+	char pad[SMP_CACHE_BYTES];
+} aligned_pvti_t ____cacheline_aligned;
+
+#define PVTI_SIZE sizeof(aligned_pvti_t)
+#if NR_CPUS == 1
+#define PVCLOCK_VSYSCALL_NR_PAGES 1
+#else
+#define PVCLOCK_VSYSCALL_NR_PAGES ((NR_CPUS-1)/(PAGE_SIZE/PVTI_SIZE))+1
+#endif
+#endif /* CONFIG_PARAVIRT_CLOCK_VSYSCALL */
+
 #endif /* _ASM_X86_PVCLOCK_H */
Index: vsyscall/arch/x86/include/asm/clocksource.h
===================================================================
--- vsyscall.orig/arch/x86/include/asm/clocksource.h
+++ vsyscall/arch/x86/include/asm/clocksource.h
@@ -8,6 +8,7 @@
 #define VCLOCK_NONE 0  /* No vDSO clock available.	*/
 #define VCLOCK_TSC  1  /* vDSO should use vread_tsc.	*/
 #define VCLOCK_HPET 2  /* vDSO should use vread_hpet.	*/
+#define VCLOCK_PVCLOCK 3 /* vDSO should use vread_pvclock. */
 
 struct arch_clocksource_data {
 	int vclock_mode;
Index: vsyscall/arch/x86/Kconfig
===================================================================
--- vsyscall.orig/arch/x86/Kconfig
+++ vsyscall/arch/x86/Kconfig
@@ -632,6 +632,13 @@ config PARAVIRT_SPINLOCKS
 
 config PARAVIRT_CLOCK
 	bool
+config PARAVIRT_CLOCK_VSYSCALL
+	bool "Paravirt clock vsyscall support"
+	depends on PARAVIRT_CLOCK && GENERIC_TIME_VSYSCALL
+	---help---
+	  Enable performance critical clock related system calls to
+	  be executed in userspace, provided that the hypervisor
+	  supports it.
 
 endif
 




* [patch 09/15] x86: kvm guest: pvclock vsyscall support
  2012-10-16 17:56 [patch 00/15] pvclock vsyscall support + KVM hypervisor support Marcelo Tosatti
                   ` (7 preceding siblings ...)
  2012-10-16 17:56 ` [patch 08/15] x86: pvclock: generic pvclock vsyscall initialization Marcelo Tosatti
@ 2012-10-16 17:56 ` Marcelo Tosatti
  2012-10-16 17:56 ` [patch 10/15] x86: vsyscall: pass mode to gettime backend Marcelo Tosatti
                   ` (5 subsequent siblings)
  14 siblings, 0 replies; 16+ messages in thread
From: Marcelo Tosatti @ 2012-10-16 17:56 UTC (permalink / raw)
  To: kvm; +Cc: johnstul, jeremy, Marcelo Tosatti

[-- Attachment #1: 08-add-pvclock-vsyscall-kvm-support --]
[-- Type: text/plain, Size: 4252 bytes --]

Allow the hypervisor to update a userspace-visible copy of the
pvclock data.

Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>

Index: vsyscall/arch/x86/kernel/kvmclock.c
===================================================================
--- vsyscall.orig/arch/x86/kernel/kvmclock.c
+++ vsyscall/arch/x86/kernel/kvmclock.c
@@ -31,6 +31,9 @@ static int kvmclock = 1;
 static int msr_kvm_system_time = MSR_KVM_SYSTEM_TIME;
 static int msr_kvm_wall_clock = MSR_KVM_WALL_CLOCK;
 
+/* set when the generic vsyscall pvclock elements are setup */
+bool vsyscall_clock_initializable = false;
+
 static int parse_no_kvmclock(char *arg)
 {
 	kvmclock = 0;
@@ -151,6 +154,28 @@ int kvm_register_clock(char *txt)
 	return ret;
 }
 
+static int kvm_register_vsyscall_clock(char *txt)
+{
+#ifdef CONFIG_PARAVIRT_CLOCK_VSYSCALL
+	int cpu = smp_processor_id();
+	int low, high, ret;
+	struct pvclock_vcpu_time_info *info;
+
+	info = pvclock_get_vsyscall_time_info(cpu);
+
+	low = (int)__pa(info) | 1;
+	high = ((u64)__pa(info) >> 32);
+	ret = native_write_msr_safe(MSR_KVM_USERSPACE_TIME, low, high);
+	printk(KERN_INFO "kvm-clock: cpu %d, msr %x:%x, %s\n",
+	       cpu, high, low, txt);
+
+	return ret;
+#else
+	return 0;
+#endif
+}
+
+
 static void kvm_save_sched_clock_state(void)
 {
 }
@@ -158,6 +183,8 @@ static void kvm_save_sched_clock_state(v
 static void kvm_restore_sched_clock_state(void)
 {
 	kvm_register_clock("primary cpu clock, resume");
+	if (vsyscall_clock_initializable)
+		kvm_register_vsyscall_clock("primary cpu vsyscall clock, resume");
 }
 
 #ifdef CONFIG_X86_LOCAL_APIC
@@ -168,6 +195,8 @@ static void __cpuinit kvm_setup_secondar
 	 * we shouldn't fail.
 	 */
 	WARN_ON(kvm_register_clock("secondary cpu clock"));
+	if (vsyscall_clock_initializable)
+		kvm_register_vsyscall_clock("secondary cpu vsyscall clock");
 }
 #endif
 
@@ -182,6 +211,8 @@ static void __cpuinit kvm_setup_secondar
 #ifdef CONFIG_KEXEC
 static void kvm_crash_shutdown(struct pt_regs *regs)
 {
+	if (vsyscall_clock_initializable)
+		native_write_msr(MSR_KVM_USERSPACE_TIME, 0, 0);
 	native_write_msr(msr_kvm_system_time, 0, 0);
 	kvm_disable_steal_time();
 	native_machine_crash_shutdown(regs);
@@ -190,6 +221,8 @@ static void kvm_crash_shutdown(struct pt
 
 static void kvm_shutdown(void)
 {
+	if (vsyscall_clock_initializable)
+		native_write_msr(MSR_KVM_USERSPACE_TIME, 0, 0);
 	native_write_msr(msr_kvm_system_time, 0, 0);
 	kvm_disable_steal_time();
 	native_machine_shutdown();
@@ -233,3 +266,27 @@ void __init kvmclock_init(void)
 	if (kvm_para_has_feature(KVM_FEATURE_CLOCKSOURCE_STABLE_BIT))
 		pvclock_set_flags(PVCLOCK_TSC_STABLE_BIT);
 }
+
+int kvm_setup_vsyscall_timeinfo(void)
+{
+#ifdef CONFIG_PARAVIRT_CLOCK_VSYSCALL
+	int ret;
+	struct pvclock_vcpu_time_info *vcpu_time;
+	u8 flags;
+
+	vcpu_time = &get_cpu_var(hv_clock);
+	flags = pvclock_read_flags(vcpu_time);
+	put_cpu_var(hv_clock);
+
+	if (!(flags & PVCLOCK_TSC_STABLE_BIT))
+		return 1;
+
+	if ((ret = pvclock_init_vsyscall()))
+		return ret;
+
+	kvm_clock.archdata.vclock_mode = VCLOCK_PVCLOCK;
+	vsyscall_clock_initializable = true;
+#endif /* CONFIG_PARAVIRT_CLOCK_VSYSCALL */
+	return 0;
+}
+
Index: vsyscall/arch/x86/kernel/kvm.c
===================================================================
--- vsyscall.orig/arch/x86/kernel/kvm.c
+++ vsyscall/arch/x86/kernel/kvm.c
@@ -42,6 +42,7 @@
 #include <asm/apic.h>
 #include <asm/apicdef.h>
 #include <asm/hypervisor.h>
+#include <asm/kvm_guest.h>
 
 static int kvmapf = 1;
 
@@ -468,6 +469,9 @@ void __init kvm_guest_init(void)
 	if (kvm_para_has_feature(KVM_FEATURE_PV_EOI))
 		apic_set_eoi_write(kvm_guest_apic_eoi_write);
 
+	if (kvm_para_has_feature(KVM_FEATURE_USERSPACE_CLOCKSOURCE))
+		kvm_setup_vsyscall_timeinfo();
+
 #ifdef CONFIG_SMP
 	smp_ops.smp_prepare_boot_cpu = kvm_smp_prepare_boot_cpu;
 	register_cpu_notifier(&kvm_cpu_notifier);
Index: vsyscall/arch/x86/include/asm/kvm_guest.h
===================================================================
--- /dev/null
+++ vsyscall/arch/x86/include/asm/kvm_guest.h
@@ -0,0 +1,8 @@
+#ifndef _ASM_X86_KVM_GUEST_H
+#define _ASM_X86_KVM_GUEST_H
+
+extern bool vsyscall_clock_initializable;
+
+int kvm_setup_vsyscall_timeinfo(void);
+
+#endif /* _ASM_X86_KVM_GUEST_H */




* [patch 10/15] x86: vsyscall: pass mode to gettime backend
  2012-10-16 17:56 [patch 00/15] pvclock vsyscall support + KVM hypervisor support Marcelo Tosatti
                   ` (8 preceding siblings ...)
  2012-10-16 17:56 ` [patch 09/15] x86: kvm guest: pvclock vsyscall support Marcelo Tosatti
@ 2012-10-16 17:56 ` Marcelo Tosatti
  2012-10-16 17:56 ` [patch 11/15] x86: vdso: pvclock gettime support Marcelo Tosatti
                   ` (4 subsequent siblings)
  14 siblings, 0 replies; 16+ messages in thread
From: Marcelo Tosatti @ 2012-10-16 17:56 UTC (permalink / raw)
  To: kvm; +Cc: johnstul, jeremy, Marcelo Tosatti

[-- Attachment #1: 09-vclock-gettime-return-mode --]
[-- Type: text/plain, Size: 1109 bytes --]

Required by the next patch.

Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>

Index: vsyscall/arch/x86/vdso/vclock_gettime.c
===================================================================
--- vsyscall.orig/arch/x86/vdso/vclock_gettime.c
+++ vsyscall/arch/x86/vdso/vclock_gettime.c
@@ -80,7 +80,7 @@ notrace static long vdso_fallback_gtod(s
 }
 
 
-notrace static inline u64 vgetsns(void)
+notrace static inline u64 vgetsns(int *mode)
 {
 	long v;
 	cycles_t cycles;
@@ -107,7 +107,7 @@ notrace static int __always_inline do_re
 		mode = gtod->clock.vclock_mode;
 		ts->tv_sec = gtod->wall_time_sec;
 		ns = gtod->wall_time_snsec;
-		ns += vgetsns();
+		ns += vgetsns(&mode);
 		ns >>= gtod->clock.shift;
 	} while (unlikely(read_seqcount_retry(&gtod->seq, seq)));
 
@@ -127,7 +127,7 @@ notrace static int do_monotonic(struct t
 		mode = gtod->clock.vclock_mode;
 		ts->tv_sec = gtod->monotonic_time_sec;
 		ns = gtod->monotonic_time_snsec;
-		ns += vgetsns();
+		ns += vgetsns(&mode);
 		ns >>= gtod->clock.shift;
 	} while (unlikely(read_seqcount_retry(&gtod->seq, seq)));
 	timespec_add_ns(ts, ns);




* [patch 11/15] x86: vdso: pvclock gettime support
  2012-10-16 17:56 [patch 00/15] pvclock vsyscall support + KVM hypervisor support Marcelo Tosatti
                   ` (9 preceding siblings ...)
  2012-10-16 17:56 ` [patch 10/15] x86: vsyscall: pass mode to gettime backend Marcelo Tosatti
@ 2012-10-16 17:56 ` Marcelo Tosatti
  2012-10-16 17:56 ` [patch 12/15] KVM: x86: introduce facility to support vsyscall pvclock, via MSR Marcelo Tosatti
                   ` (3 subsequent siblings)
  14 siblings, 0 replies; 16+ messages in thread
From: Marcelo Tosatti @ 2012-10-16 17:56 UTC (permalink / raw)
  To: kvm; +Cc: johnstul, jeremy, Marcelo Tosatti

[-- Attachment #1: 10-add-pvclock-vdso-code --]
[-- Type: text/plain, Size: 4010 bytes --]

Improve the performance of the time system calls when using Linux
pvclock, by reading the time info from the fixmap-visible copy of the
pvclock data.
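
The "& 0xfff" masking below mirrors the encoding used by vgetcpu (see the
end of this patch): the value read via RDTSCP or the per-CPU segment limit
packs the CPU number into the low 12 bits, with the node number in the
bits above, roughly:

	cpu  = p & 0xfff;	/* low 12 bits */
	node = p >> 12;		/* remaining bits */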

Originally from Jeremy Fitzhardinge.

Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>

Index: vsyscall/arch/x86/vdso/vclock_gettime.c
===================================================================
--- vsyscall.orig/arch/x86/vdso/vclock_gettime.c
+++ vsyscall/arch/x86/vdso/vclock_gettime.c
@@ -22,6 +22,7 @@
 #include <asm/hpet.h>
 #include <asm/unistd.h>
 #include <asm/io.h>
+#include <asm/pvclock.h>
 
 #define gtod (&VVAR(vsyscall_gtod_data))
 
@@ -62,6 +63,69 @@ static notrace cycle_t vread_hpet(void)
 	return readl((const void __iomem *)fix_to_virt(VSYSCALL_HPET) + 0xf0);
 }
 
+#ifdef CONFIG_PARAVIRT_CLOCK_VSYSCALL
+
+static notrace const struct pvclock_vsyscall_time_info *get_pvti(int cpu)
+{
+	const aligned_pvti_t *pvti_base;
+	int idx = cpu / (PAGE_SIZE/PVTI_SIZE);
+	int offset = cpu % (PAGE_SIZE/PVTI_SIZE);
+
+	BUG_ON(PVCLOCK_FIXMAP_BEGIN + idx > PVCLOCK_FIXMAP_END);
+
+	pvti_base = (aligned_pvti_t *)__fix_to_virt(PVCLOCK_FIXMAP_BEGIN+idx);
+
+	return &pvti_base[offset].info;
+}
+
+static notrace cycle_t vread_pvclock(int *mode)
+{
+	const struct pvclock_vsyscall_time_info *pvti;
+	cycle_t ret;
+	u64 last;
+	u32 version;
+	u32 migrate_count;
+	u8 flags;
+	unsigned cpu, cpu1;
+
+
+	/*
+	 * When looping to get a consistent (time-info, tsc) pair, we
+	 * also need to deal with the possibility we can switch vcpus,
+	 * so make sure we always re-fetch time-info for the current vcpu.
+	 */
+	do {
+		cpu = __getcpu() & 0xfff;
+		pvti = get_pvti(cpu);
+
+		migrate_count = pvti->migrate_count;
+
+		version = __pvclock_read_cycles(&pvti->pvti, &ret, &flags);
+
+		/*
+		 * Test we're still on the cpu as well as the version.
+		 * We could have been migrated just after the first
+		 * vgetcpu but before fetching the version, so we
+		 * wouldn't notice a version change.
+		 */
+		cpu1 = __getcpu() & 0xfff;
+	} while (unlikely(cpu != cpu1 ||
+			  (pvti->pvti.version & 1) ||
+			  pvti->pvti.version != version ||
+			  pvti->migrate_count != migrate_count));
+
+	if (unlikely(!(flags & PVCLOCK_TSC_STABLE_BIT)))
+		*mode = VCLOCK_NONE;
+
+	last = VVAR(vsyscall_gtod_data).clock.cycle_last;
+
+	if (likely(ret >= last))
+		return ret;
+
+	return last;
+}
+#endif
+
 notrace static long vdso_fallback_gettime(long clock, struct timespec *ts)
 {
 	long ret;
@@ -88,6 +152,8 @@ notrace static inline u64 vgetsns(int *m
 		cycles = vread_tsc();
 	else if (gtod->clock.vclock_mode == VCLOCK_HPET)
 		cycles = vread_hpet();
+	else if (gtod->clock.vclock_mode == VCLOCK_PVCLOCK)
+		cycles = vread_pvclock(mode);
 	else
 		return 0;
 	v = (cycles - gtod->clock.cycle_last) & gtod->clock.mask;
Index: vsyscall/arch/x86/include/asm/vsyscall.h
===================================================================
--- vsyscall.orig/arch/x86/include/asm/vsyscall.h
+++ vsyscall/arch/x86/include/asm/vsyscall.h
@@ -33,6 +33,21 @@ extern void map_vsyscall(void);
  */
 extern bool emulate_vsyscall(struct pt_regs *regs, unsigned long address);
 
+static inline unsigned int __getcpu(void)
+{
+	unsigned int p;
+
+	if (VVAR(vgetcpu_mode) == VGETCPU_RDTSCP) {
+		/* Load per CPU data from RDTSCP */
+		native_read_tscp(&p);
+	} else {
+		/* Load per CPU data from GDT */
+		asm("lsl %1,%0" : "=r" (p) : "r" (__PER_CPU_SEG));
+	}
+
+	return p;
+}
+
 #endif /* __KERNEL__ */
 
 #endif /* _ASM_X86_VSYSCALL_H */
Index: vsyscall/arch/x86/vdso/vgetcpu.c
===================================================================
--- vsyscall.orig/arch/x86/vdso/vgetcpu.c
+++ vsyscall/arch/x86/vdso/vgetcpu.c
@@ -17,13 +17,8 @@ __vdso_getcpu(unsigned *cpu, unsigned *n
 {
 	unsigned int p;
 
-	if (VVAR(vgetcpu_mode) == VGETCPU_RDTSCP) {
-		/* Load per CPU data from RDTSCP */
-		native_read_tscp(&p);
-	} else {
-		/* Load per CPU data from GDT */
-		asm("lsl %1,%0" : "=r" (p) : "r" (__PER_CPU_SEG));
-	}
+	p = __getcpu();
+
 	if (cpu)
 		*cpu = p & 0xfff;
 	if (node)




* [patch 12/15] KVM: x86: introduce facility to support vsyscall pvclock, via MSR
  2012-10-16 17:56 [patch 00/15] pvclock vsyscall support + KVM hypervisor support Marcelo Tosatti
                   ` (10 preceding siblings ...)
  2012-10-16 17:56 ` [patch 11/15] x86: vdso: pvclock gettime support Marcelo Tosatti
@ 2012-10-16 17:56 ` Marcelo Tosatti
  2012-10-16 17:56 ` [patch 13/15] KVM: x86: pass host_tsc to read_l1_tsc Marcelo Tosatti
                   ` (2 subsequent siblings)
  14 siblings, 0 replies; 16+ messages in thread
From: Marcelo Tosatti @ 2012-10-16 17:56 UTC (permalink / raw)
  To: kvm; +Cc: johnstul, jeremy, Marcelo Tosatti

[-- Attachment #1: 11-host-add-userspace-time-msr --]
[-- Type: text/plain, Size: 9580 bytes --]

Allow a guest to register a second location for the VCPU time info
structure for each vcpu (as described by MSR_KVM_SYSTEM_TIME_NEW).
This is intended to allow the guest kernel to map this information
into a usermode accessible page, so that usermode can efficiently
calculate system time from the TSC without having to make a syscall.
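
The MSR encoding follows MSR_KVM_SYSTEM_TIME_NEW: the written value is
the guest physical address of the structure, with bit 0 acting as the
enable bit. A guest-side sketch (the real registration is added in
patch 9):

	u64 pa = __pa(info);	/* guest physical address of the copy */

	wrmsrl(MSR_KVM_USERSPACE_TIME, pa | 1);	/* bit 0 = enable updates */
	/* ... */
	wrmsrl(MSR_KVM_USERSPACE_TIME, 0);	/* disable, e.g. on shutdown */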

Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>

Index: vsyscall/arch/x86/include/asm/kvm_para.h
===================================================================
--- vsyscall.orig/arch/x86/include/asm/kvm_para.h
+++ vsyscall/arch/x86/include/asm/kvm_para.h
@@ -23,6 +23,7 @@
 #define KVM_FEATURE_ASYNC_PF		4
 #define KVM_FEATURE_STEAL_TIME		5
 #define KVM_FEATURE_PV_EOI		6
+#define KVM_FEATURE_USERSPACE_CLOCKSOURCE 7
 
 /* The last 8 bits are used to indicate how to interpret the flags field
  * in pvclock structure. If no bits are set, all flags are ignored.
@@ -39,6 +40,7 @@
 #define MSR_KVM_ASYNC_PF_EN 0x4b564d02
 #define MSR_KVM_STEAL_TIME  0x4b564d03
 #define MSR_KVM_PV_EOI_EN      0x4b564d04
+#define MSR_KVM_USERSPACE_TIME      0x4b564d05
 
 struct kvm_steal_time {
 	__u64 steal;
Index: vsyscall/Documentation/virtual/kvm/msr.txt
===================================================================
--- vsyscall.orig/Documentation/virtual/kvm/msr.txt
+++ vsyscall/Documentation/virtual/kvm/msr.txt
@@ -125,6 +125,22 @@ MSR_KVM_SYSTEM_TIME_NEW:  0x4b564d01
 	Availability of this MSR must be checked via bit 3 in 0x4000001 cpuid
 	leaf prior to usage.
 
+MSR_KVM_USERSPACE_TIME:  0x4b564d05
+
+Allow a guest to register a second location for the VCPU time info
+structure for each vcpu (as described by MSR_KVM_SYSTEM_TIME_NEW).
+This is intended to allow the guest kernel to map this information
+into a usermode accessible page, so that usermode can efficiently
+calculate system time from the TSC without having to make a syscall.
+
+Relationship with master copy (MSR_KVM_SYSTEM_TIME_NEW):
+
+- This MSR must be enabled only when the master is enabled.
+- Disabling updates to the master automatically disables
+updates for this copy.
+
+Availability of this MSR must be checked via bit 7 in 0x4000001 cpuid
+leaf prior to usage.
 
 MSR_KVM_WALL_CLOCK:  0x11
 
Index: vsyscall/arch/x86/include/asm/kvm_host.h
===================================================================
--- vsyscall.orig/arch/x86/include/asm/kvm_host.h
+++ vsyscall/arch/x86/include/asm/kvm_host.h
@@ -415,10 +415,13 @@ struct kvm_vcpu_arch {
 	int (*complete_userspace_io)(struct kvm_vcpu *vcpu);
 
 	gpa_t time;
+	gpa_t uspace_time;
 	struct pvclock_vcpu_time_info hv_clock;
 	unsigned int hw_tsc_khz;
 	unsigned int time_offset;
+	unsigned int uspace_time_offset;
 	struct page *time_page;
+	struct page *uspace_time_page;
 	/* set guest stopped flag in pvclock flags field */
 	bool pvclock_set_guest_stopped_request;
 
Index: vsyscall/arch/x86/kvm/x86.c
===================================================================
--- vsyscall.orig/arch/x86/kvm/x86.c
+++ vsyscall/arch/x86/kvm/x86.c
@@ -809,13 +809,13 @@ EXPORT_SYMBOL_GPL(kvm_rdpmc);
  * kvm-specific. Those are put in the beginning of the list.
  */
 
-#define KVM_SAVE_MSRS_BEGIN	10
+#define KVM_SAVE_MSRS_BEGIN	11
 static u32 msrs_to_save[] = {
 	MSR_KVM_SYSTEM_TIME, MSR_KVM_WALL_CLOCK,
 	MSR_KVM_SYSTEM_TIME_NEW, MSR_KVM_WALL_CLOCK_NEW,
 	HV_X64_MSR_GUEST_OS_ID, HV_X64_MSR_HYPERCALL,
 	HV_X64_MSR_APIC_ASSIST_PAGE, MSR_KVM_ASYNC_PF_EN, MSR_KVM_STEAL_TIME,
-	MSR_KVM_PV_EOI_EN,
+	MSR_KVM_PV_EOI_EN, MSR_KVM_USERSPACE_TIME,
 	MSR_IA32_SYSENTER_CS, MSR_IA32_SYSENTER_ESP, MSR_IA32_SYSENTER_EIP,
 	MSR_STAR,
 #ifdef CONFIG_X86_64
@@ -1135,16 +1135,43 @@ void kvm_write_tsc(struct kvm_vcpu *vcpu
 
 EXPORT_SYMBOL_GPL(kvm_write_tsc);
 
+static void kvm_write_pvtime(struct kvm_vcpu *v, struct page *page,
+			     unsigned int offset_in_page, gpa_t gpa)
+{
+	struct kvm_vcpu_arch *vcpu = &v->arch;
+	void *shared_kaddr;
+	struct pvclock_vcpu_time_info *guest_hv_clock;
+	u8 pvclock_flags;
+
+	shared_kaddr = kmap_atomic(page);
+
+	guest_hv_clock = shared_kaddr + offset_in_page;
+
+	/* retain PVCLOCK_GUEST_STOPPED if set in guest copy */
+	pvclock_flags = (guest_hv_clock->flags & PVCLOCK_GUEST_STOPPED);
+
+	if (vcpu->pvclock_set_guest_stopped_request) {
+		pvclock_flags |= PVCLOCK_GUEST_STOPPED;
+		vcpu->pvclock_set_guest_stopped_request = false;
+	}
+
+	vcpu->hv_clock.flags = pvclock_flags;
+
+	memcpy(shared_kaddr + offset_in_page, &vcpu->hv_clock,
+	       sizeof(vcpu->hv_clock));
+
+	kunmap_atomic(shared_kaddr);
+
+	mark_page_dirty(v->kvm, gpa >> PAGE_SHIFT);
+}
+
 static int kvm_guest_time_update(struct kvm_vcpu *v)
 {
 	unsigned long flags;
 	struct kvm_vcpu_arch *vcpu = &v->arch;
-	void *shared_kaddr;
 	unsigned long this_tsc_khz;
 	s64 kernel_ns, max_kernel_ns;
 	u64 tsc_timestamp;
-	struct pvclock_vcpu_time_info *guest_hv_clock;
-	u8 pvclock_flags;
 
 	/* Keep irq disabled to prevent changes to the clock */
 	local_irq_save(flags);
@@ -1235,26 +1262,11 @@ static int kvm_guest_time_update(struct 
 	 */
 	vcpu->hv_clock.version += 2;
 
-	shared_kaddr = kmap_atomic(vcpu->time_page);
-
-	guest_hv_clock = shared_kaddr + vcpu->time_offset;
-
-	/* retain PVCLOCK_GUEST_STOPPED if set in guest copy */
-	pvclock_flags = (guest_hv_clock->flags & PVCLOCK_GUEST_STOPPED);
+ 	kvm_write_pvtime(v, vcpu->time_page, vcpu->time_offset, vcpu->time);
+ 	if (vcpu->uspace_time_page)
+ 		kvm_write_pvtime(v, vcpu->uspace_time_page,
+ 				 vcpu->uspace_time_offset, vcpu->uspace_time);
 
-	if (vcpu->pvclock_set_guest_stopped_request) {
-		pvclock_flags |= PVCLOCK_GUEST_STOPPED;
-		vcpu->pvclock_set_guest_stopped_request = false;
-	}
-
-	vcpu->hv_clock.flags = pvclock_flags;
-
-	memcpy(shared_kaddr + vcpu->time_offset, &vcpu->hv_clock,
-	       sizeof(vcpu->hv_clock));
-
-	kunmap_atomic(shared_kaddr);
-
-	mark_page_dirty(v->kvm, vcpu->time >> PAGE_SHIFT);
 	return 0;
 }
 
@@ -1549,6 +1561,15 @@ static void kvmclock_reset(struct kvm_vc
 	}
 }
 
+static void kvmclock_uspace_reset(struct kvm_vcpu *vcpu)
+{
+	vcpu->arch.uspace_time = 0;
+	if (vcpu->arch.uspace_time_page) {
+		kvm_release_page_dirty(vcpu->arch.uspace_time_page);
+		vcpu->arch.uspace_time_page = NULL;
+	}
+}
+
 static void accumulate_steal_time(struct kvm_vcpu *vcpu)
 {
 	u64 delta;
@@ -1639,6 +1660,31 @@ int kvm_set_msr_common(struct kvm_vcpu *
 		vcpu->kvm->arch.wall_clock = data;
 		kvm_write_wall_clock(vcpu->kvm, data);
 		break;
+	case MSR_KVM_USERSPACE_TIME: {
+		kvmclock_uspace_reset(vcpu);
+
+		if (!vcpu->arch.time_page && (data & 1))
+			return 1;
+
+		vcpu->arch.uspace_time = data;
+		kvm_make_request(KVM_REQ_CLOCK_UPDATE, vcpu);
+
+		/* we verify if the enable bit is set... */
+		if (!(data & 1))
+			break;
+
+		/* ...but clean it before doing the actual write */
+		vcpu->arch.uspace_time_offset = data & ~(PAGE_MASK | 1);
+
+		vcpu->arch.uspace_time_page = gfn_to_page(vcpu->kvm,
+							  data >> PAGE_SHIFT);
+
+		if (is_error_page(vcpu->arch.uspace_time_page)) {
+			kvm_release_page_clean(vcpu->arch.uspace_time_page);
+			vcpu->arch.uspace_time_page = NULL;
+		}
+		break;
+	}
 	case MSR_KVM_SYSTEM_TIME_NEW:
 	case MSR_KVM_SYSTEM_TIME: {
 		kvmclock_reset(vcpu);
@@ -1647,8 +1693,10 @@ int kvm_set_msr_common(struct kvm_vcpu *
 		kvm_make_request(KVM_REQ_CLOCK_UPDATE, vcpu);
 
 		/* we verify if the enable bit is set... */
-		if (!(data & 1))
+		if (!(data & 1)) {
+			kvmclock_uspace_reset(vcpu);
 			break;
+		}
 
 		/* ...but clean it before doing the actual write */
 		vcpu->arch.time_offset = data & ~(PAGE_MASK | 1);
@@ -1656,8 +1704,10 @@ int kvm_set_msr_common(struct kvm_vcpu *
 		vcpu->arch.time_page =
 				gfn_to_page(vcpu->kvm, data >> PAGE_SHIFT);
 
-		if (is_error_page(vcpu->arch.time_page))
+		if (is_error_page(vcpu->arch.time_page)) {
 			vcpu->arch.time_page = NULL;
+			kvmclock_uspace_reset(vcpu);
+		}
 
 		break;
 	}
@@ -2010,6 +2060,9 @@ int kvm_get_msr_common(struct kvm_vcpu *
 	case MSR_KVM_SYSTEM_TIME_NEW:
 		data = vcpu->arch.time;
 		break;
+	case MSR_KVM_USERSPACE_TIME:
+		data = vcpu->arch.uspace_time;
+		break;
 	case MSR_KVM_ASYNC_PF_EN:
 		data = vcpu->arch.apf.msr_val;
 		break;
@@ -2195,6 +2248,7 @@ int kvm_dev_ioctl_check_extension(long e
 	case KVM_CAP_KVMCLOCK_CTRL:
 	case KVM_CAP_READONLY_MEM:
 	case KVM_CAP_IRQFD_RESAMPLE:
+	case KVM_CAP_USERSPACE_CLOCKSOURCE:
 		r = 1;
 		break;
 	case KVM_CAP_COALESCED_MMIO:
@@ -6017,6 +6071,7 @@ void kvm_put_guest_fpu(struct kvm_vcpu *
 
 void kvm_arch_vcpu_free(struct kvm_vcpu *vcpu)
 {
+	kvmclock_uspace_reset(vcpu);
 	kvmclock_reset(vcpu);
 
 	free_cpumask_var(vcpu->arch.wbinvd_dirty_mask);
Index: vsyscall/arch/x86/kvm/cpuid.c
===================================================================
--- vsyscall.orig/arch/x86/kvm/cpuid.c
+++ vsyscall/arch/x86/kvm/cpuid.c
@@ -411,7 +411,9 @@ static int do_cpuid_ent(struct kvm_cpuid
 			     (1 << KVM_FEATURE_CLOCKSOURCE2) |
 			     (1 << KVM_FEATURE_ASYNC_PF) |
 			     (1 << KVM_FEATURE_PV_EOI) |
-			     (1 << KVM_FEATURE_CLOCKSOURCE_STABLE_BIT);
+			     (1 << KVM_FEATURE_CLOCKSOURCE_STABLE_BIT) |
+			     (1 << KVM_FEATURE_USERSPACE_CLOCKSOURCE);
+
 
 		if (sched_info_on())
 			entry->eax |= (1 << KVM_FEATURE_STEAL_TIME);
Index: vsyscall/include/uapi/linux/kvm.h
===================================================================
--- vsyscall.orig/include/uapi/linux/kvm.h
+++ vsyscall/include/uapi/linux/kvm.h
@@ -626,6 +626,7 @@ struct kvm_ppc_smmu_info {
 #define KVM_CAP_READONLY_MEM 81
 #endif
 #define KVM_CAP_IRQFD_RESAMPLE 82
+#define KVM_CAP_USERSPACE_CLOCKSOURCE 83
 
 #ifdef KVM_CAP_IRQ_ROUTING
 




* [patch 13/15] KVM: x86: pass host_tsc to read_l1_tsc
  2012-10-16 17:56 [patch 00/15] pvclock vsyscall support + KVM hypervisor support Marcelo Tosatti
                   ` (11 preceding siblings ...)
  2012-10-16 17:56 ` [patch 12/15] KVM: x86: introduce facility to support vsyscall pvclock, via MSR Marcelo Tosatti
@ 2012-10-16 17:56 ` Marcelo Tosatti
  2012-10-16 17:56 ` [patch 14/15] time: export time information for KVM pvclock Marcelo Tosatti
  2012-10-16 17:56 ` [patch 15/15] KVM: x86: implement PVCLOCK_TSC_STABLE_BIT pvclock flag Marcelo Tosatti
  14 siblings, 0 replies; 16+ messages in thread
From: Marcelo Tosatti @ 2012-10-16 17:56 UTC (permalink / raw)
  To: kvm; +Cc: johnstul, jeremy, Marcelo Tosatti

[-- Attachment #1: 12-kvm-read-l1-tsc-pass-tscvalue --]
[-- Type: text/plain, Size: 3372 bytes --]

Allow the caller to pass host tsc value to kvm_x86_ops->read_l1_tsc().

Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>

Index: vsyscall/arch/x86/include/asm/kvm_host.h
===================================================================
--- vsyscall.orig/arch/x86/include/asm/kvm_host.h
+++ vsyscall/arch/x86/include/asm/kvm_host.h
@@ -703,7 +703,7 @@ struct kvm_x86_ops {
 	void (*write_tsc_offset)(struct kvm_vcpu *vcpu, u64 offset);
 
 	u64 (*compute_tsc_offset)(struct kvm_vcpu *vcpu, u64 target_tsc);
-	u64 (*read_l1_tsc)(struct kvm_vcpu *vcpu);
+	u64 (*read_l1_tsc)(struct kvm_vcpu *vcpu, u64 host_tsc);
 
 	void (*get_exit_info)(struct kvm_vcpu *vcpu, u64 *info1, u64 *info2);
 
Index: vsyscall/arch/x86/kvm/lapic.c
===================================================================
--- vsyscall.orig/arch/x86/kvm/lapic.c
+++ vsyscall/arch/x86/kvm/lapic.c
@@ -1011,7 +1011,7 @@ static void start_apic_timer(struct kvm_
 		local_irq_save(flags);
 
 		now = apic->lapic_timer.timer.base->get_time();
-		guest_tsc = kvm_x86_ops->read_l1_tsc(vcpu);
+		guest_tsc = kvm_x86_ops->read_l1_tsc(vcpu, native_read_tsc());
 		if (likely(tscdeadline > guest_tsc)) {
 			ns = (tscdeadline - guest_tsc) * 1000000ULL;
 			do_div(ns, this_tsc_khz);
Index: vsyscall/arch/x86/kvm/svm.c
===================================================================
--- vsyscall.orig/arch/x86/kvm/svm.c
+++ vsyscall/arch/x86/kvm/svm.c
@@ -3008,11 +3008,11 @@ static int cr8_write_interception(struct
 	return 0;
 }
 
-u64 svm_read_l1_tsc(struct kvm_vcpu *vcpu)
+u64 svm_read_l1_tsc(struct kvm_vcpu *vcpu, u64 host_tsc)
 {
 	struct vmcb *vmcb = get_host_vmcb(to_svm(vcpu));
 	return vmcb->control.tsc_offset +
-		svm_scale_tsc(vcpu, native_read_tsc());
+		svm_scale_tsc(vcpu, host_tsc);
 }
 
 static int svm_get_msr(struct kvm_vcpu *vcpu, unsigned ecx, u64 *data)
Index: vsyscall/arch/x86/kvm/vmx.c
===================================================================
--- vsyscall.orig/arch/x86/kvm/vmx.c
+++ vsyscall/arch/x86/kvm/vmx.c
@@ -1839,11 +1839,10 @@ static u64 guest_read_tsc(void)
  * Like guest_read_tsc, but always returns L1's notion of the timestamp
  * counter, even if a nested guest (L2) is currently running.
  */
-u64 vmx_read_l1_tsc(struct kvm_vcpu *vcpu)
+u64 vmx_read_l1_tsc(struct kvm_vcpu *vcpu, u64 host_tsc)
 {
-	u64 host_tsc, tsc_offset;
+	u64 tsc_offset;
 
-	rdtscll(host_tsc);
 	tsc_offset = is_guest_mode(vcpu) ?
 		to_vmx(vcpu)->nested.vmcs01_tsc_offset :
 		vmcs_read64(TSC_OFFSET);
Index: vsyscall/arch/x86/kvm/x86.c
===================================================================
--- vsyscall.orig/arch/x86/kvm/x86.c
+++ vsyscall/arch/x86/kvm/x86.c
@@ -1175,7 +1175,7 @@ static int kvm_guest_time_update(struct 
 
 	/* Keep irq disabled to prevent changes to the clock */
 	local_irq_save(flags);
-	tsc_timestamp = kvm_x86_ops->read_l1_tsc(v);
+	tsc_timestamp = kvm_x86_ops->read_l1_tsc(v, native_read_tsc());
 	kernel_ns = get_kernel_ns();
 	this_tsc_khz = __get_cpu_var(cpu_tsc_khz);
 	if (unlikely(this_tsc_khz == 0)) {
@@ -5429,7 +5429,8 @@ static int vcpu_enter_guest(struct kvm_v
 	if (hw_breakpoint_active())
 		hw_breakpoint_restore();
 
-	vcpu->arch.last_guest_tsc = kvm_x86_ops->read_l1_tsc(vcpu);
+	vcpu->arch.last_guest_tsc = kvm_x86_ops->read_l1_tsc(vcpu,
+							   native_read_tsc());
 
 	vcpu->mode = OUTSIDE_GUEST_MODE;
 	smp_wmb();




* [patch 14/15] time: export time information for KVM pvclock
  2012-10-16 17:56 [patch 00/15] pvclock vsyscall support + KVM hypervisor support Marcelo Tosatti
                   ` (12 preceding siblings ...)
  2012-10-16 17:56 ` [patch 13/15] KVM: x86: pass host_tsc to read_l1_tsc Marcelo Tosatti
@ 2012-10-16 17:56 ` Marcelo Tosatti
  2012-10-16 17:56 ` [patch 15/15] KVM: x86: implement PVCLOCK_TSC_STABLE_BIT pvclock flag Marcelo Tosatti
  14 siblings, 0 replies; 16+ messages in thread
From: Marcelo Tosatti @ 2012-10-16 17:56 UTC (permalink / raw)
  To: kvm; +Cc: johnstul, jeremy, Marcelo Tosatti

[-- Attachment #1: 13-time-add-pvclock-gtod-data --]
[-- Type: text/plain, Size: 2535 bytes --]

As suggested by John, export time data similarly to how it is done by
the vsyscall support. This allows KVM to retrieve the necessary
information to implement vsyscall support in KVM guests.
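
Consumers read the exported data under the seqcount, mirroring the
vsyscall reader; a sketch (patch 15 adds the real KVM-side reader):

	unsigned seq;
	int mode;

	do {
		seq = read_seqcount_begin(&pvclock_gtod_data.seq);
		mode = pvclock_gtod_data.clock.vclock_mode;
		/* ... copy whatever other fields are needed ... */
	} while (read_seqcount_retry(&pvclock_gtod_data.seq, seq));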

Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>

Index: vsyscall/include/linux/pvclock_gtod.h
===================================================================
--- /dev/null
+++ vsyscall/include/linux/pvclock_gtod.h
@@ -0,0 +1,23 @@
+#ifndef _PVCLOCK_GTOD_H
+#define _PVCLOCK_GTOD_H
+
+#include <linux/clocksource.h>
+
+struct pvclock_gtod_data {
+	seqcount_t	seq;
+
+	struct { /* extract of a clocksource struct */
+		int vclock_mode;
+		cycle_t	cycle_last;
+		cycle_t	mask;
+		u32	mult;
+		u32	shift;
+	} clock;
+
+	/* open coded 'struct timespec' */
+	u64		monotonic_time_snsec;
+	time_t		monotonic_time_sec;
+};
+extern struct pvclock_gtod_data pvclock_gtod_data;
+
+#endif /* _PVCLOCK_GTOD_H */
Index: vsyscall/kernel/time/timekeeping.c
===================================================================
--- vsyscall.orig/kernel/time/timekeeping.c
+++ vsyscall/kernel/time/timekeeping.c
@@ -21,6 +21,7 @@
 #include <linux/time.h>
 #include <linux/tick.h>
 #include <linux/stop_machine.h>
+#include <linux/pvclock_gtod.h>
 
 
 static struct timekeeper timekeeper;
@@ -180,6 +181,37 @@ static inline s64 timekeeping_get_ns_raw
 	return nsec + arch_gettimeoffset();
 }
 
+struct pvclock_gtod_data pvclock_gtod_data;
+EXPORT_SYMBOL_GPL(pvclock_gtod_data);
+
+static void update_pvclock_gtod(struct timekeeper *tk)
+{
+	struct pvclock_gtod_data *vdata = &pvclock_gtod_data;
+
+	write_seqcount_begin(&vdata->seq);
+
+	/* copy vsyscall data */
+	vdata->clock.vclock_mode	= tk->clock->archdata.vclock_mode;
+	vdata->clock.cycle_last		= tk->clock->cycle_last;
+	vdata->clock.mask		= tk->clock->mask;
+	vdata->clock.mult		= tk->mult;
+	vdata->clock.shift		= tk->shift;
+
+	vdata->monotonic_time_sec	= tk->xtime_sec
+					+ tk->wall_to_monotonic.tv_sec;
+	vdata->monotonic_time_snsec	= tk->xtime_nsec
+					+ (tk->wall_to_monotonic.tv_nsec
+						<< tk->shift);
+	while (vdata->monotonic_time_snsec >=
+					(((u64)NSEC_PER_SEC) << tk->shift)) {
+		vdata->monotonic_time_snsec -=
+					((u64)NSEC_PER_SEC) << tk->shift;
+		vdata->monotonic_time_sec++;
+	}
+
+	write_seqcount_end(&vdata->seq);
+}
+
 /* must hold write on timekeeper.lock */
 static void timekeeping_update(struct timekeeper *tk, bool clearntp)
 {
@@ -188,6 +220,7 @@ static void timekeeping_update(struct ti
 		ntp_clear();
 	}
 	update_vsyscall(tk);
+	update_pvclock_gtod(tk);
 }
 
 /**




* [patch 15/15] KVM: x86: implement PVCLOCK_TSC_STABLE_BIT pvclock flag
  2012-10-16 17:56 [patch 00/15] pvclock vsyscall support + KVM hypervisor support Marcelo Tosatti
                   ` (13 preceding siblings ...)
  2012-10-16 17:56 ` [patch 14/15] time: export time information for KVM pvclock Marcelo Tosatti
@ 2012-10-16 17:56 ` Marcelo Tosatti
  14 siblings, 0 replies; 16+ messages in thread
From: Marcelo Tosatti @ 2012-10-16 17:56 UTC (permalink / raw)
  To: kvm; +Cc: johnstul, jeremy, Marcelo Tosatti

[-- Attachment #1: 14-host-pass-stable-pvclock-flag --]
[-- Type: text/plain, Size: 5697 bytes --]

KVM added a global variable to guarantee monotonicity in the guest. 
It is necessary because the time between

	1. ktime_get_ts(&timespec);
	2. rdtscll(tsc);

is variable. That is, given a host with a stable TSC, suppose that
two VCPUs read the same time via ktime_get_ts() above.

The time required to execute 2. is not the same on those two instances
executing on different VCPUs (cache misses, interrupts...).

If the TSC value that is used by the host to interpolate when 
calculating the monotonic time is the same value used to calculate
the tsc_timestamp value stored in the pvclock data structure, then
this problem disappears. Monotonicity is then guaranteed by the
synchronicity of the host TSCs.
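
For reference, the guest side computes (see pvclock_get_nsec_offset()
earlier in this series) roughly:

	guest_ns = system_time +
		   pvclock_scale_delta(rdtsc - tsc_timestamp,
				       tsc_to_system_mul, tsc_shift);

so if tsc_timestamp is derived from the same host TSC read that produced
kernel_ns, the interpolation error between VCPUs cancels out.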

Set the TSC stable pvclock flag in that case, allowing the guest to
read the clock from userspace.

Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>

Index: vsyscall/arch/x86/kvm/x86.c
===================================================================
--- vsyscall.orig/arch/x86/kvm/x86.c
+++ vsyscall/arch/x86/kvm/x86.c
@@ -46,6 +46,7 @@
 #include <linux/uaccess.h>
 #include <linux/hash.h>
 #include <linux/pci.h>
+#include <linux/pvclock_gtod.h>
 #include <trace/events/kvm.h>
 
 #define CREATE_TRACE_POINTS
@@ -1135,8 +1136,91 @@ void kvm_write_tsc(struct kvm_vcpu *vcpu
 
 EXPORT_SYMBOL_GPL(kvm_write_tsc);
 
+static cycle_t read_tsc(void)
+{
+	cycle_t ret;
+	u64 last;
+
+	/*
+	 * Empirically, a fence (of type that depends on the CPU)
+	 * before rdtsc is enough to ensure that rdtsc is ordered
+	 * with respect to loads.  The various CPU manuals are unclear
+	 * as to whether rdtsc can be reordered with later loads,
+	 * but no one has ever seen it happen.
+	 */
+	rdtsc_barrier();
+	ret = (cycle_t)vget_cycles();
+
+	last = pvclock_gtod_data.clock.cycle_last;
+
+	if (likely(ret >= last))
+		return ret;
+
+	/*
+	 * GCC likes to generate cmov here, but this branch is extremely
+	 * predictable (it's just a function of time and the likely is
+	 * very likely) and there's a data dependence, so force GCC
+	 * to generate a branch instead.  I don't barrier() because
+	 * we don't actually need a barrier, and if this function
+	 * ever gets inlined it will generate worse code.
+	 */
+	asm volatile ("");
+	return last;
+}
+
+static inline u64 vgettsc(cycle_t *cycle_now)
+{
+	long v;
+	struct pvclock_gtod_data *gtod = &pvclock_gtod_data;
+
+	*cycle_now = read_tsc();
+
+	v = (*cycle_now - gtod->clock.cycle_last) & gtod->clock.mask;
+	return v * gtod->clock.mult;
+}
+
+static int do_monotonic(struct timespec *ts, cycle_t *cycle_now)
+{
+	unsigned long seq;
+	u64 ns;
+	int mode;
+	struct pvclock_gtod_data *gtod = &pvclock_gtod_data;
+
+	ts->tv_nsec = 0;
+	do {
+		seq = read_seqcount_begin(&gtod->seq);
+		mode = gtod->clock.vclock_mode;
+		ts->tv_sec = gtod->monotonic_time_sec;
+		ns = gtod->monotonic_time_snsec;
+		ns += vgettsc(cycle_now);
+		ns >>= gtod->clock.shift;
+	} while (unlikely(read_seqcount_retry(&gtod->seq, seq)));
+	timespec_add_ns(ts, ns);
+
+	return mode;
+}
+
+/* returns true if host is using tsc clocksource */
+static bool kvm_get_time_and_clockread(s64 *kernel_ns, cycle_t *cycle_now)
+{
+	struct timespec ts;
+
+	/* checked again under seqlock below */
+	if (pvclock_gtod_data.clock.vclock_mode != VCLOCK_TSC)
+		return false;
+
+	if (do_monotonic(&ts, cycle_now) != VCLOCK_TSC)
+		return false;
+
+	monotonic_to_bootbased(&ts);
+	*kernel_ns = timespec_to_ns(&ts);
+
+	return true;
+}
+
 static void kvm_write_pvtime(struct kvm_vcpu *v, struct page *page,
-			     unsigned int offset_in_page, gpa_t gpa)
+			     unsigned int offset_in_page, gpa_t gpa,
+			     bool host_tsc_clocksource)
 {
 	struct kvm_vcpu_arch *vcpu = &v->arch;
 	void *shared_kaddr;
@@ -1155,6 +1239,10 @@ static void kvm_write_pvtime(struct kvm_
 		vcpu->pvclock_set_guest_stopped_request = false;
 	}
 
+	/* If the host uses TSC clocksource, then it is stable */
+	if (host_tsc_clocksource)
+		pvclock_flags |= PVCLOCK_TSC_STABLE_BIT;
+
 	vcpu->hv_clock.flags = pvclock_flags;
 
 	memcpy(shared_kaddr + offset_in_page, &vcpu->hv_clock,
@@ -1172,11 +1260,12 @@ static int kvm_guest_time_update(struct 
 	unsigned long this_tsc_khz;
 	s64 kernel_ns, max_kernel_ns;
 	u64 tsc_timestamp;
+	cycle_t cycle_now;
+	u64 host_tsc;
+	bool host_tsc_clocksource;
 
 	/* Keep irq disabled to prevent changes to the clock */
 	local_irq_save(flags);
-	tsc_timestamp = kvm_x86_ops->read_l1_tsc(v, native_read_tsc());
-	kernel_ns = get_kernel_ns();
 	this_tsc_khz = __get_cpu_var(cpu_tsc_khz);
 	if (unlikely(this_tsc_khz == 0)) {
 		local_irq_restore(flags);
@@ -1185,6 +1274,20 @@ static int kvm_guest_time_update(struct 
 	}
 
 	/*
+ 	 * If the host uses TSC clock, then passthrough TSC as stable
+	 * to the guest.
+	 */
+	host_tsc_clocksource = kvm_get_time_and_clockread(&kernel_ns, &cycle_now);
+	if (host_tsc_clocksource)
+		host_tsc = cycle_now;
+	else {
+		host_tsc = native_read_tsc();
+		kernel_ns = get_kernel_ns();
+	}
+
+	tsc_timestamp = kvm_x86_ops->read_l1_tsc(v, host_tsc);
+
+	/*
 	 * We may have to catch up the TSC to match elapsed wall clock
 	 * time for two reasons, even if kvmclock is used.
 	 *   1) CPU could have been running below the maximum TSC rate
@@ -1262,10 +1365,12 @@ static int kvm_guest_time_update(struct 
 	 */
 	vcpu->hv_clock.version += 2;
 
- 	kvm_write_pvtime(v, vcpu->time_page, vcpu->time_offset, vcpu->time);
+ 	kvm_write_pvtime(v, vcpu->time_page, vcpu->time_offset, vcpu->time,
+			 host_tsc_clocksource);
  	if (vcpu->uspace_time_page)
  		kvm_write_pvtime(v, vcpu->uspace_time_page,
- 				 vcpu->uspace_time_offset, vcpu->uspace_time);
+ 				 vcpu->uspace_time_offset, vcpu->uspace_time,
+				 host_tsc_clocksource);
 
 	return 0;
 }



