All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH v2 0/7] Pvclock fixes , version two
@ 2010-05-03 15:52 Glauber Costa
  2010-05-03 15:52 ` [PATCH v2 1/7] Enable pvclock flags in vcpu_time_info structure Glauber Costa
  2010-05-05  3:57 ` [PATCH v2 0/7] Pvclock fixes , version two Zachary Amsden
  0 siblings, 2 replies; 12+ messages in thread
From: Glauber Costa @ 2010-05-03 15:52 UTC (permalink / raw)
  To: kvm; +Cc: linux-kernel, avi, zamsden

Here is a new version of this series.

I am definitely leaving any warp calculations out, as Jeremy wisely
points out that Chuck Norris should perish before we warp.

Also, in this series, I am using KVM_GET_SUPPORTED_CPUID to export
our features to userspace, as avi suggets. (patch 4/7), and starting
to use the flag that indicates possible tsc stability (patch 7/7), although
it is still off by default.

Glauber Costa (7):
  Enable pvclock flags in vcpu_time_info structure
  Add a global synchronization point for pvclock
  change msr numbers for kvmclock
  export paravirtual cpuid flags in KVM_GET_SUPPORTED_CPUID
  Try using new kvm clock msrs
  don't compute pvclock adjustments if we trust the tsc
  Tell the guest we'll warn it about tsc stability

 arch/x86/include/asm/kvm_para.h    |   13 ++++++++
 arch/x86/include/asm/pvclock-abi.h |    4 ++-
 arch/x86/include/asm/pvclock.h     |    1 +
 arch/x86/kernel/kvmclock.c         |   56 ++++++++++++++++++++++-------------
 arch/x86/kernel/pvclock.c          |   37 +++++++++++++++++++++++
 arch/x86/kvm/x86.c                 |   37 +++++++++++++++++++++++-
 6 files changed, 125 insertions(+), 23 deletions(-)


^ permalink raw reply	[flat|nested] 12+ messages in thread

* [PATCH v2 1/7] Enable pvclock flags in vcpu_time_info structure
  2010-05-03 15:52 [PATCH v2 0/7] Pvclock fixes , version two Glauber Costa
@ 2010-05-03 15:52 ` Glauber Costa
  2010-05-03 15:52   ` [PATCH v2 2/7] Add a global synchronization point for pvclock Glauber Costa
  2010-05-05  8:29   ` [PATCH v2 1/7] Enable pvclock flags in vcpu_time_info structure Avi Kivity
  2010-05-05  3:57 ` [PATCH v2 0/7] Pvclock fixes , version two Zachary Amsden
  1 sibling, 2 replies; 12+ messages in thread
From: Glauber Costa @ 2010-05-03 15:52 UTC (permalink / raw)
  To: kvm; +Cc: linux-kernel, avi, zamsden, Jeremy Fitzhardinge

This patch removes one padding byte and transform it into a flags
field. New versions of guests using pvclock will query these flags
upon each read.

Flags, however, will only be interpreted when the guest decides to.
It uses the pvclock_valid_flags function to signal that a specific
set of flags should be taken into consideration. Which flags are valid
are usually devised via HV negotiation.

Signed-off-by: Glauber Costa <glommer@redhat.com>
CC: Jeremy Fitzhardinge <jeremy@goop.org>
---
 arch/x86/include/asm/pvclock-abi.h |    3 ++-
 arch/x86/include/asm/pvclock.h     |    1 +
 arch/x86/kernel/pvclock.c          |    9 +++++++++
 3 files changed, 12 insertions(+), 1 deletions(-)

diff --git a/arch/x86/include/asm/pvclock-abi.h b/arch/x86/include/asm/pvclock-abi.h
index 6d93508..ec5c41a 100644
--- a/arch/x86/include/asm/pvclock-abi.h
+++ b/arch/x86/include/asm/pvclock-abi.h
@@ -29,7 +29,8 @@ struct pvclock_vcpu_time_info {
 	u64   system_time;
 	u32   tsc_to_system_mul;
 	s8    tsc_shift;
-	u8    pad[3];
+	u8    flags;
+	u8    pad[2];
 } __attribute__((__packed__)); /* 32 bytes */
 
 struct pvclock_wall_clock {
diff --git a/arch/x86/include/asm/pvclock.h b/arch/x86/include/asm/pvclock.h
index 53235fd..cd02f32 100644
--- a/arch/x86/include/asm/pvclock.h
+++ b/arch/x86/include/asm/pvclock.h
@@ -6,6 +6,7 @@
 
 /* some helper functions for xen and kvm pv clock sources */
 cycle_t pvclock_clocksource_read(struct pvclock_vcpu_time_info *src);
+void pvclock_set_flags(u8 flags);
 unsigned long pvclock_tsc_khz(struct pvclock_vcpu_time_info *src);
 void pvclock_read_wallclock(struct pvclock_wall_clock *wall,
 			    struct pvclock_vcpu_time_info *vcpu,
diff --git a/arch/x86/kernel/pvclock.c b/arch/x86/kernel/pvclock.c
index 03801f2..aa2262b 100644
--- a/arch/x86/kernel/pvclock.c
+++ b/arch/x86/kernel/pvclock.c
@@ -31,8 +31,16 @@ struct pvclock_shadow_time {
 	u32 tsc_to_nsec_mul;
 	int tsc_shift;
 	u32 version;
+	u8  flags;
 };
 
+static u8 valid_flags = 0;
+
+void pvclock_set_flags(u8 flags)
+{
+	valid_flags = flags;
+}
+
 /*
  * Scale a 64-bit delta by scaling and multiplying by a 32-bit fraction,
  * yielding a 64-bit result.
@@ -91,6 +99,7 @@ static unsigned pvclock_get_time_values(struct pvclock_shadow_time *dst,
 		dst->system_timestamp  = src->system_time;
 		dst->tsc_to_nsec_mul   = src->tsc_to_system_mul;
 		dst->tsc_shift         = src->tsc_shift;
+		dst->flags             = src->flags;
 		rmb();		/* test version after fetching data */
 	} while ((src->version & 1) || (dst->version != src->version));
 
-- 
1.6.2.2


^ permalink raw reply related	[flat|nested] 12+ messages in thread

* [PATCH v2 2/7] Add a global synchronization point for pvclock
  2010-05-03 15:52 ` [PATCH v2 1/7] Enable pvclock flags in vcpu_time_info structure Glauber Costa
@ 2010-05-03 15:52   ` Glauber Costa
  2010-05-03 15:52     ` [PATCH v2 3/7] change msr numbers for kvmclock Glauber Costa
  2010-05-05  8:29   ` [PATCH v2 1/7] Enable pvclock flags in vcpu_time_info structure Avi Kivity
  1 sibling, 1 reply; 12+ messages in thread
From: Glauber Costa @ 2010-05-03 15:52 UTC (permalink / raw)
  To: kvm; +Cc: linux-kernel, avi, zamsden, Jeremy Fitzhardinge, Marcelo Tosatti

In recent stress tests, it was found that pvclock-based systems
could seriously warp in smp systems. Using ingo's time-warp-test.c,
I could trigger a scenario as bad as 1.5mi warps a minute in some systems.
(to be fair, it wasn't that bad in most of them). Investigating further, I
found out that such warps were caused by the very offset-based calculation
pvclock is based on.

This happens even on some machines that report constant_tsc in its tsc flags,
specially on multi-socket ones.

Two reads of the same kernel timestamp at approx the same time, will likely
have tsc timestamped in different occasions too. This means the delta we
calculate is unpredictable at best, and can probably be smaller in a cpu
that is legitimately reading clock in a forward ocasion.

Some adjustments on the host could make this window less likely to happen,
but still, it pretty much poses as an intrinsic problem of the mechanism.

A while ago, I though about using a shared variable anyway, to hold clock
last state, but gave up due to the high contention locking was likely
to introduce, possibly rendering the thing useless on big machines. I argue,
however, that locking is not necessary.

We do a read-and-return sequence in pvclock, and between read and return,
the global value can have changed. However, it can only have changed
by means of an addition of a positive value. So if we detected that our
clock timestamp is less than the current global, we know that we need to
return a higher one, even though it is not exactly the one we compared to.

OTOH, if we detect we're greater than the current time source, we atomically
replace the value with our new readings. This do causes contention on big
boxes (but big here means *BIG*), but it seems like a good trade off, since
it provide us with a time source guaranteed to be stable wrt time warps.

After this patch is applied, I don't see a single warp in time during 5 days
of execution, in any of the machines I saw them before.

Signed-off-by: Glauber Costa <glommer@redhat.com>
CC: Jeremy Fitzhardinge <jeremy@goop.org>
CC: Avi Kivity <avi@redhat.com>
CC: Marcelo Tosatti <mtosatti@redhat.com>
CC: Zachary Amsden <zamsden@redhat.com>
---
 arch/x86/kernel/pvclock.c |   24 ++++++++++++++++++++++++
 1 files changed, 24 insertions(+), 0 deletions(-)

diff --git a/arch/x86/kernel/pvclock.c b/arch/x86/kernel/pvclock.c
index aa2262b..3dff5fa 100644
--- a/arch/x86/kernel/pvclock.c
+++ b/arch/x86/kernel/pvclock.c
@@ -118,11 +118,14 @@ unsigned long pvclock_tsc_khz(struct pvclock_vcpu_time_info *src)
 	return pv_tsc_khz;
 }
 
+static atomic64_t last_value = ATOMIC64_INIT(0);
+
 cycle_t pvclock_clocksource_read(struct pvclock_vcpu_time_info *src)
 {
 	struct pvclock_shadow_time shadow;
 	unsigned version;
 	cycle_t ret, offset;
+	u64 last;
 
 	do {
 		version = pvclock_get_time_values(&shadow, src);
@@ -132,6 +135,27 @@ cycle_t pvclock_clocksource_read(struct pvclock_vcpu_time_info *src)
 		barrier();
 	} while (version != src->version);
 
+	/*
+	 * Assumption here is that last_value, a global accumulator, always goes
+	 * forward. If we are less than that, we should not be much smaller.
+	 * We assume there is an error marging we're inside, and then the correction
+	 * does not sacrifice accuracy.
+	 *
+	 * For reads: global may have changed between test and return,
+	 * but this means someone else updated poked the clock at a later time.
+	 * We just need to make sure we are not seeing a backwards event.
+	 *
+	 * For updates: last_value = ret is not enough, since two vcpus could be
+	 * updating at the same time, and one of them could be slightly behind,
+	 * making the assumption that last_value always go forward fail to hold.
+	 */
+	last = atomic64_read(&last_value);
+	do {
+		if (ret < last)
+			return last;
+		last = atomic64_cmpxchg(&last_value, last, ret);
+	} while (unlikely(last != ret));
+
 	return ret;
 }
 
-- 
1.6.2.2


^ permalink raw reply related	[flat|nested] 12+ messages in thread

* [PATCH v2 3/7] change msr numbers for kvmclock
  2010-05-03 15:52   ` [PATCH v2 2/7] Add a global synchronization point for pvclock Glauber Costa
@ 2010-05-03 15:52     ` Glauber Costa
  2010-05-03 15:52       ` [PATCH v2 4/7] export paravirtual cpuid flags in KVM_GET_SUPPORTED_CPUID Glauber Costa
  0 siblings, 1 reply; 12+ messages in thread
From: Glauber Costa @ 2010-05-03 15:52 UTC (permalink / raw)
  To: kvm; +Cc: linux-kernel, avi, zamsden

Avi pointed out a while ago that those MSRs falls into the pentium
PMU range. So the idea here is to add new ones, and after a while,
deprecate the old ones.

Signed-off-by: Glauber Costa <glommer@redhat.com>
---
 arch/x86/include/asm/kvm_para.h |    4 ++++
 arch/x86/kvm/x86.c              |    7 ++++++-
 2 files changed, 10 insertions(+), 1 deletions(-)

diff --git a/arch/x86/include/asm/kvm_para.h b/arch/x86/include/asm/kvm_para.h
index ffae142..9734808 100644
--- a/arch/x86/include/asm/kvm_para.h
+++ b/arch/x86/include/asm/kvm_para.h
@@ -20,6 +20,10 @@
 #define MSR_KVM_WALL_CLOCK  0x11
 #define MSR_KVM_SYSTEM_TIME 0x12
 
+/* Custom MSRs falls in the range 0x4b564d00-0x4b564dff */
+#define MSR_KVM_WALL_CLOCK_NEW  0x4b564d00
+#define MSR_KVM_SYSTEM_TIME_NEW 0x4b564d01
+
 #define KVM_MAX_MMU_OP_BATCH           32
 
 /* Operations for KVM_HC_MMU_OP */
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 6b2ce1d..eb84947 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -664,9 +664,10 @@ static inline u32 bit(int bitno)
  * kvm-specific. Those are put in the beginning of the list.
  */
 
-#define KVM_SAVE_MSRS_BEGIN	5
+#define KVM_SAVE_MSRS_BEGIN	7
 static u32 msrs_to_save[] = {
 	MSR_KVM_SYSTEM_TIME, MSR_KVM_WALL_CLOCK,
+	MSR_KVM_SYSTEM_TIME_NEW, MSR_KVM_WALL_CLOCK_NEW,
 	HV_X64_MSR_GUEST_OS_ID, HV_X64_MSR_HYPERCALL,
 	HV_X64_MSR_APIC_ASSIST_PAGE,
 	MSR_IA32_SYSENTER_CS, MSR_IA32_SYSENTER_ESP, MSR_IA32_SYSENTER_EIP,
@@ -1192,10 +1193,12 @@ int kvm_set_msr_common(struct kvm_vcpu *vcpu, u32 msr, u64 data)
 	case MSR_IA32_MISC_ENABLE:
 		vcpu->arch.ia32_misc_enable_msr = data;
 		break;
+	case MSR_KVM_WALL_CLOCK_NEW:
 	case MSR_KVM_WALL_CLOCK:
 		vcpu->kvm->arch.wall_clock = data;
 		kvm_write_wall_clock(vcpu->kvm, data);
 		break;
+	case MSR_KVM_SYSTEM_TIME_NEW:
 	case MSR_KVM_SYSTEM_TIME: {
 		if (vcpu->arch.time_page) {
 			kvm_release_page_dirty(vcpu->arch.time_page);
@@ -1467,9 +1470,11 @@ int kvm_get_msr_common(struct kvm_vcpu *vcpu, u32 msr, u64 *pdata)
 		data = vcpu->arch.efer;
 		break;
 	case MSR_KVM_WALL_CLOCK:
+	case MSR_KVM_WALL_CLOCK_NEW:
 		data = vcpu->kvm->arch.wall_clock;
 		break;
 	case MSR_KVM_SYSTEM_TIME:
+	case MSR_KVM_SYSTEM_TIME_NEW:
 		data = vcpu->arch.time;
 		break;
 	case MSR_IA32_P5_MC_ADDR:
-- 
1.6.2.2


^ permalink raw reply related	[flat|nested] 12+ messages in thread

* [PATCH v2 4/7] export paravirtual cpuid flags in KVM_GET_SUPPORTED_CPUID
  2010-05-03 15:52     ` [PATCH v2 3/7] change msr numbers for kvmclock Glauber Costa
@ 2010-05-03 15:52       ` Glauber Costa
  2010-05-03 15:52         ` [PATCH v2 5/7] Try using new kvm clock msrs Glauber Costa
  2010-05-05  8:23         ` [PATCH v2 4/7] export paravirtual cpuid flags in KVM_GET_SUPPORTED_CPUID Avi Kivity
  0 siblings, 2 replies; 12+ messages in thread
From: Glauber Costa @ 2010-05-03 15:52 UTC (permalink / raw)
  To: kvm; +Cc: linux-kernel, avi, zamsden

Right now, we were using individual KVM_CAP entities to communicate
userspace about which cpuids we support. This is suboptimal, since it
generates a delay between the feature arriving in the host, and
being available at the guest.

A much better mechanism is to list para features in KVM_GET_SUPPORTED_CPUID.
This makes userspace automatically aware of what we provide. And if we
ever add a new cpuid bit in the future, we have to do that again,
which create some complexity and delay in feature adoption.

Signed-off-by: Glauber Costa <glommer@redhat.com>
---
 arch/x86/include/asm/kvm_para.h |    4 ++++
 arch/x86/kvm/x86.c              |   27 +++++++++++++++++++++++++++
 2 files changed, 31 insertions(+), 0 deletions(-)

diff --git a/arch/x86/include/asm/kvm_para.h b/arch/x86/include/asm/kvm_para.h
index 9734808..f019f8c 100644
--- a/arch/x86/include/asm/kvm_para.h
+++ b/arch/x86/include/asm/kvm_para.h
@@ -16,6 +16,10 @@
 #define KVM_FEATURE_CLOCKSOURCE		0
 #define KVM_FEATURE_NOP_IO_DELAY	1
 #define KVM_FEATURE_MMU_OP		2
+/* This indicates that the new set of kvmclock msrs
+ * are available. The use of 0x11 and 0x12 is deprecated
+ */
+#define KVM_FEATURE_CLOCKSOURCE2        3
 
 #define MSR_KVM_WALL_CLOCK  0x11
 #define MSR_KVM_SYSTEM_TIME 0x12
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index eb84947..8a7cdda 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -1971,6 +1971,20 @@ static void do_cpuid_ent(struct kvm_cpuid_entry2 *entry, u32 function,
 		}
 		break;
 	}
+	case 0x40000000: {
+		char signature[] = "KVMKVMKVM";
+		u32 *sigptr = (u32 *)signature;
+		entry->eax = 1;
+		entry->ebx = sigptr[0];
+		entry->ecx = sigptr[1];
+		entry->edx = sigptr[2];
+		break;
+	}
+	case 0x40000001:
+		entry->eax = (1 << KVM_FEATURE_CLOCKSOURCE) |
+		(1 << KVM_FEATURE_NOP_IO_DELAY) |
+		(1 << KVM_FEATURE_CLOCKSOURCE2);
+		break;
 	case 0x80000000:
 		entry->eax = min(entry->eax, 0x8000001a);
 		break;
@@ -2017,6 +2031,19 @@ static int kvm_dev_ioctl_get_supported_cpuid(struct kvm_cpuid2 *cpuid,
 	for (func = 0x80000001; func <= limit && nent < cpuid->nent; ++func)
 		do_cpuid_ent(&cpuid_entries[nent], func, 0,
 			     &nent, cpuid->nent);
+
+	
+
+	r = -E2BIG;
+	if (nent >= cpuid->nent)
+		goto out_free;
+
+	do_cpuid_ent(&cpuid_entries[nent], 0x40000000, 0, &nent, cpuid->nent);
+	limit = cpuid_entries[nent - 1].eax;
+	for (func = 0x40000001; func <= limit && nent < cpuid->nent; ++func)
+		do_cpuid_ent(&cpuid_entries[nent], func, 0,
+			     &nent, cpuid->nent);
+
 	r = -E2BIG;
 	if (nent >= cpuid->nent)
 		goto out_free;
-- 
1.6.2.2


^ permalink raw reply related	[flat|nested] 12+ messages in thread

* [PATCH v2 5/7] Try using new kvm clock msrs
  2010-05-03 15:52       ` [PATCH v2 4/7] export paravirtual cpuid flags in KVM_GET_SUPPORTED_CPUID Glauber Costa
@ 2010-05-03 15:52         ` Glauber Costa
  2010-05-03 15:52           ` [PATCH v2 6/7] don't compute pvclock adjustments if we trust the tsc Glauber Costa
  2010-05-05  8:23         ` [PATCH v2 4/7] export paravirtual cpuid flags in KVM_GET_SUPPORTED_CPUID Avi Kivity
  1 sibling, 1 reply; 12+ messages in thread
From: Glauber Costa @ 2010-05-03 15:52 UTC (permalink / raw)
  To: kvm; +Cc: linux-kernel, avi, zamsden

We now added a new set of clock-related msrs in replacement of the old
ones. In theory, we could just try to use them and get a return value
indicating they do not exist, due to our use of kvm_write_msr_save.

However, kvm clock registration happens very early, and if we ever
try to write to a non-existant MSR, we raise a lethal #GP, since our
idt handlers are not in place yet.

So this patch tests for a cpuid feature exported by the host to
decide which set of msrs are supported.

Signed-off-by: Glauber Costa <glommer@redhat.com>
---
 arch/x86/kernel/kvmclock.c |   53 ++++++++++++++++++++++++++-----------------
 1 files changed, 32 insertions(+), 21 deletions(-)

diff --git a/arch/x86/kernel/kvmclock.c b/arch/x86/kernel/kvmclock.c
index feaeb0d..59c740f 100644
--- a/arch/x86/kernel/kvmclock.c
+++ b/arch/x86/kernel/kvmclock.c
@@ -29,6 +29,8 @@
 #define KVM_SCALE 22
 
 static int kvmclock = 1;
+static int msr_kvm_system_time = MSR_KVM_SYSTEM_TIME;
+static int msr_kvm_wall_clock = MSR_KVM_WALL_CLOCK;
 
 static int parse_no_kvmclock(char *arg)
 {
@@ -54,7 +56,8 @@ static unsigned long kvm_get_wallclock(void)
 
 	low = (int)__pa_symbol(&wall_clock);
 	high = ((u64)__pa_symbol(&wall_clock) >> 32);
-	native_write_msr(MSR_KVM_WALL_CLOCK, low, high);
+
+	native_write_msr(msr_kvm_wall_clock, low, high);
 
 	vcpu_time = &get_cpu_var(hv_clock);
 	pvclock_read_wallclock(&wall_clock, vcpu_time, &ts);
@@ -130,7 +133,8 @@ static int kvm_register_clock(char *txt)
 	high = ((u64)__pa(&per_cpu(hv_clock, cpu)) >> 32);
 	printk(KERN_INFO "kvm-clock: cpu %d, msr %x:%x, %s\n",
 	       cpu, high, low, txt);
-	return native_write_msr_safe(MSR_KVM_SYSTEM_TIME, low, high);
+
+	return native_write_msr_safe(msr_kvm_system_time, low, high);
 }
 
 #ifdef CONFIG_X86_LOCAL_APIC
@@ -165,14 +169,14 @@ static void __init kvm_smp_prepare_boot_cpu(void)
 #ifdef CONFIG_KEXEC
 static void kvm_crash_shutdown(struct pt_regs *regs)
 {
-	native_write_msr_safe(MSR_KVM_SYSTEM_TIME, 0, 0);
+	native_write_msr(msr_kvm_system_time, 0, 0);
 	native_machine_crash_shutdown(regs);
 }
 #endif
 
 static void kvm_shutdown(void)
 {
-	native_write_msr_safe(MSR_KVM_SYSTEM_TIME, 0, 0);
+	native_write_msr(msr_kvm_system_time, 0, 0);
 	native_machine_shutdown();
 }
 
@@ -181,27 +185,34 @@ void __init kvmclock_init(void)
 	if (!kvm_para_available())
 		return;
 
-	if (kvmclock && kvm_para_has_feature(KVM_FEATURE_CLOCKSOURCE)) {
-		if (kvm_register_clock("boot clock"))
-			return;
-		pv_time_ops.sched_clock = kvm_clock_read;
-		x86_platform.calibrate_tsc = kvm_get_tsc_khz;
-		x86_platform.get_wallclock = kvm_get_wallclock;
-		x86_platform.set_wallclock = kvm_set_wallclock;
+	if (kvmclock && kvm_para_has_feature(KVM_FEATURE_CLOCKSOURCE2)) {
+		msr_kvm_system_time = MSR_KVM_SYSTEM_TIME_NEW;
+		msr_kvm_wall_clock = MSR_KVM_WALL_CLOCK_NEW;
+	} else if (!(kvmclock && kvm_para_has_feature(KVM_FEATURE_CLOCKSOURCE)))
+		return;
+
+	printk(KERN_INFO "kvm-clock: Using msrs %x and %x",
+		msr_kvm_system_time, msr_kvm_wall_clock);
+
+	if (kvm_register_clock("boot clock"))
+		return;
+	pv_time_ops.sched_clock = kvm_clock_read;
+	x86_platform.calibrate_tsc = kvm_get_tsc_khz;
+	x86_platform.get_wallclock = kvm_get_wallclock;
+	x86_platform.set_wallclock = kvm_set_wallclock;
 #ifdef CONFIG_X86_LOCAL_APIC
-		x86_cpuinit.setup_percpu_clockev =
-			kvm_setup_secondary_clock;
+	x86_cpuinit.setup_percpu_clockev =
+		kvm_setup_secondary_clock;
 #endif
 #ifdef CONFIG_SMP
-		smp_ops.smp_prepare_boot_cpu = kvm_smp_prepare_boot_cpu;
+	smp_ops.smp_prepare_boot_cpu = kvm_smp_prepare_boot_cpu;
 #endif
-		machine_ops.shutdown  = kvm_shutdown;
+	machine_ops.shutdown  = kvm_shutdown;
 #ifdef CONFIG_KEXEC
-		machine_ops.crash_shutdown  = kvm_crash_shutdown;
+	machine_ops.crash_shutdown  = kvm_crash_shutdown;
 #endif
-		kvm_get_preset_lpj();
-		clocksource_register(&kvm_clock);
-		pv_info.paravirt_enabled = 1;
-		pv_info.name = "KVM";
-	}
+	kvm_get_preset_lpj();
+	clocksource_register(&kvm_clock);
+	pv_info.paravirt_enabled = 1;
+	pv_info.name = "KVM";
 }
-- 
1.6.2.2


^ permalink raw reply related	[flat|nested] 12+ messages in thread

* [PATCH v2 6/7] don't compute pvclock adjustments if we trust the tsc
  2010-05-03 15:52         ` [PATCH v2 5/7] Try using new kvm clock msrs Glauber Costa
@ 2010-05-03 15:52           ` Glauber Costa
  2010-05-03 15:52             ` [PATCH v2 7/7] Tell the guest we'll warn it about tsc stability Glauber Costa
  2010-05-05  8:28             ` [PATCH v2 6/7] don't compute pvclock adjustments if we trust the tsc Avi Kivity
  0 siblings, 2 replies; 12+ messages in thread
From: Glauber Costa @ 2010-05-03 15:52 UTC (permalink / raw)
  To: kvm; +Cc: linux-kernel, avi, zamsden

If the HV told us we can fully trust the TSC, skip any
correction

Signed-off-by: Glauber Costa <glommer@redhat.com>
---
 arch/x86/include/asm/kvm_para.h    |    5 +++++
 arch/x86/include/asm/pvclock-abi.h |    1 +
 arch/x86/kernel/kvmclock.c         |    3 +++
 arch/x86/kernel/pvclock.c          |    4 ++++
 4 files changed, 13 insertions(+), 0 deletions(-)

diff --git a/arch/x86/include/asm/kvm_para.h b/arch/x86/include/asm/kvm_para.h
index f019f8c..6f1b878 100644
--- a/arch/x86/include/asm/kvm_para.h
+++ b/arch/x86/include/asm/kvm_para.h
@@ -21,6 +21,11 @@
  */
 #define KVM_FEATURE_CLOCKSOURCE2        3
 
+/* The last 8 bits are used to indicate how to interpret the flags field
+ * in pvclock structure. If no bits are set, all flags are ignored.
+ */
+#define KVM_FEATURE_CLOCKSOURCE_STABLE_TSC	24
+
 #define MSR_KVM_WALL_CLOCK  0x11
 #define MSR_KVM_SYSTEM_TIME 0x12
 
diff --git a/arch/x86/include/asm/pvclock-abi.h b/arch/x86/include/asm/pvclock-abi.h
index ec5c41a..b123bd7 100644
--- a/arch/x86/include/asm/pvclock-abi.h
+++ b/arch/x86/include/asm/pvclock-abi.h
@@ -39,5 +39,6 @@ struct pvclock_wall_clock {
 	u32   nsec;
 } __attribute__((__packed__));
 
+#define PVCLOCK_STABLE_TSC	(1 << 0)
 #endif /* __ASSEMBLY__ */
 #endif /* _ASM_X86_PVCLOCK_ABI_H */
diff --git a/arch/x86/kernel/kvmclock.c b/arch/x86/kernel/kvmclock.c
index 59c740f..518de01 100644
--- a/arch/x86/kernel/kvmclock.c
+++ b/arch/x86/kernel/kvmclock.c
@@ -215,4 +215,7 @@ void __init kvmclock_init(void)
 	clocksource_register(&kvm_clock);
 	pv_info.paravirt_enabled = 1;
 	pv_info.name = "KVM";
+
+	if (kvm_para_has_feature(KVM_FEATURE_CLOCKSOURCE_STABLE_TSC))
+		pvclock_set_flags(PVCLOCK_STABLE_TSC);
 }
diff --git a/arch/x86/kernel/pvclock.c b/arch/x86/kernel/pvclock.c
index 3dff5fa..0d4fe0c 100644
--- a/arch/x86/kernel/pvclock.c
+++ b/arch/x86/kernel/pvclock.c
@@ -135,6 +135,10 @@ cycle_t pvclock_clocksource_read(struct pvclock_vcpu_time_info *src)
 		barrier();
 	} while (version != src->version);
 
+	if ((valid_flags & PVCLOCK_STABLE_TSC) &&
+		(shadow.flags & PVCLOCK_STABLE_TSC))
+		return ret;
+
 	/*
 	 * Assumption here is that last_value, a global accumulator, always goes
 	 * forward. If we are less than that, we should not be much smaller.
-- 
1.6.2.2


^ permalink raw reply related	[flat|nested] 12+ messages in thread

* [PATCH v2 7/7] Tell the guest we'll warn it about tsc stability
  2010-05-03 15:52           ` [PATCH v2 6/7] don't compute pvclock adjustments if we trust the tsc Glauber Costa
@ 2010-05-03 15:52             ` Glauber Costa
  2010-05-05  8:28             ` [PATCH v2 6/7] don't compute pvclock adjustments if we trust the tsc Avi Kivity
  1 sibling, 0 replies; 12+ messages in thread
From: Glauber Costa @ 2010-05-03 15:52 UTC (permalink / raw)
  To: kvm; +Cc: linux-kernel, avi, zamsden

This patch puts up the flag that tells the guest that we'll warn it
about the tsc being trustworthy or not. By now, we also say
it is not.
---
 arch/x86/kvm/x86.c |    5 ++++-
 1 files changed, 4 insertions(+), 1 deletions(-)

diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 8a7cdda..63a2acc 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -855,6 +855,8 @@ static void kvm_write_guest_time(struct kvm_vcpu *v)
 	vcpu->hv_clock.system_time = ts.tv_nsec +
 				     (NSEC_PER_SEC * (u64)ts.tv_sec) + v->kvm->arch.kvmclock_offset;
 
+	vcpu->hv_clock.flags = 0;
+
 	/*
 	 * The interface expects us to write an even number signaling that the
 	 * update is finished. Since the guest won't see the intermediate
@@ -1983,7 +1985,8 @@ static void do_cpuid_ent(struct kvm_cpuid_entry2 *entry, u32 function,
 	case 0x40000001:
 		entry->eax = (1 << KVM_FEATURE_CLOCKSOURCE) |
 		(1 << KVM_FEATURE_NOP_IO_DELAY) |
-		(1 << KVM_FEATURE_CLOCKSOURCE2);
+		(1 << KVM_FEATURE_CLOCKSOURCE2) |
+		(1 << KVM_FEATURE_CLOCKSOURCE_STABLE_TSC);
 		break;
 	case 0x80000000:
 		entry->eax = min(entry->eax, 0x8000001a);
-- 
1.6.2.2


^ permalink raw reply related	[flat|nested] 12+ messages in thread

* Re: [PATCH v2 0/7] Pvclock fixes , version two
  2010-05-03 15:52 [PATCH v2 0/7] Pvclock fixes , version two Glauber Costa
  2010-05-03 15:52 ` [PATCH v2 1/7] Enable pvclock flags in vcpu_time_info structure Glauber Costa
@ 2010-05-05  3:57 ` Zachary Amsden
  1 sibling, 0 replies; 12+ messages in thread
From: Zachary Amsden @ 2010-05-05  3:57 UTC (permalink / raw)
  To: Glauber Costa; +Cc: kvm, linux-kernel, avi

On 05/03/2010 05:52 AM, Glauber Costa wrote:
> Here is a new version of this series.
>
> I am definitely leaving any warp calculations out, as Jeremy wisely
> points out that Chuck Norris should perish before we warp.
>
> Also, in this series, I am using KVM_GET_SUPPORTED_CPUID to export
> our features to userspace, as avi suggets. (patch 4/7), and starting
> to use the flag that indicates possible tsc stability (patch 7/7), although
> it is still off by default.
>
> Glauber Costa (7):
>    Enable pvclock flags in vcpu_time_info structure
>    Add a global synchronization point for pvclock
>    change msr numbers for kvmclock
>    export paravirtual cpuid flags in KVM_GET_SUPPORTED_CPUID
>    Try using new kvm clock msrs
>    don't compute pvclock adjustments if we trust the tsc
>    Tell the guest we'll warn it about tsc stability
>
>   arch/x86/include/asm/kvm_para.h    |   13 ++++++++
>   arch/x86/include/asm/pvclock-abi.h |    4 ++-
>   arch/x86/include/asm/pvclock.h     |    1 +
>   arch/x86/kernel/kvmclock.c         |   56 ++++++++++++++++++++++-------------
>   arch/x86/kernel/pvclock.c          |   37 +++++++++++++++++++++++
>   arch/x86/kvm/x86.c                 |   37 +++++++++++++++++++++++-
>   6 files changed, 125 insertions(+), 23 deletions(-)
>
>    
Acked-by: Zachary Amsden <zamsden@redhat.com>

I'll rebase on top of these if they get pulled.

And obviously, there is no point in checking for warp if Chuck Norris 
has perished.  I happen to have a special place in my heart for nunchaku 
masters despite the fact that this created a generation of teenagers who 
insisted they were called numb-chucks.

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH v2 4/7] export paravirtual cpuid flags in KVM_GET_SUPPORTED_CPUID
  2010-05-03 15:52       ` [PATCH v2 4/7] export paravirtual cpuid flags in KVM_GET_SUPPORTED_CPUID Glauber Costa
  2010-05-03 15:52         ` [PATCH v2 5/7] Try using new kvm clock msrs Glauber Costa
@ 2010-05-05  8:23         ` Avi Kivity
  1 sibling, 0 replies; 12+ messages in thread
From: Avi Kivity @ 2010-05-05  8:23 UTC (permalink / raw)
  To: Glauber Costa; +Cc: kvm, linux-kernel, zamsden

On 05/03/2010 06:52 PM, Glauber Costa wrote:
> Right now, we were using individual KVM_CAP entities to communicate
> userspace about which cpuids we support. This is suboptimal, since it
> generates a delay between the feature arriving in the host, and
> being available at the guest.
>
> A much better mechanism is to list para features in KVM_GET_SUPPORTED_CPUID.
> This makes userspace automatically aware of what we provide. And if we
> ever add a new cpuid bit in the future, we have to do that again,
> which create some complexity and delay in feature adoption.
>
> Signed-off-by: Glauber Costa<glommer@redhat.com>
> ---
>   arch/x86/include/asm/kvm_para.h |    4 ++++
>   arch/x86/kvm/x86.c              |   27 +++++++++++++++++++++++++++
>   2 files changed, 31 insertions(+), 0 deletions(-)
>
> diff --git a/arch/x86/include/asm/kvm_para.h b/arch/x86/include/asm/kvm_para.h
> index 9734808..f019f8c 100644
> --- a/arch/x86/include/asm/kvm_para.h
> +++ b/arch/x86/include/asm/kvm_para.h
> @@ -16,6 +16,10 @@
>   #define KVM_FEATURE_CLOCKSOURCE		0
>   #define KVM_FEATURE_NOP_IO_DELAY	1
>   #define KVM_FEATURE_MMU_OP		2
> +/* This indicates that the new set of kvmclock msrs
> + * are available. The use of 0x11 and 0x12 is deprecated
> + */
> +#define KVM_FEATURE_CLOCKSOURCE2        3
>    

Separate patch.

>
>   #define MSR_KVM_WALL_CLOCK  0x11
>   #define MSR_KVM_SYSTEM_TIME 0x12
> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> index eb84947..8a7cdda 100644
> --- a/arch/x86/kvm/x86.c
> +++ b/arch/x86/kvm/x86.c
> @@ -1971,6 +1971,20 @@ static void do_cpuid_ent(struct kvm_cpuid_entry2 *entry, u32 function,
>   		}
>   		break;
>   	}
> +	case 0x40000000: {
>    

Use symbolic name, please.

> +		char signature[] = "KVMKVMKVM";
> +		u32 *sigptr = (u32 *)signature;
> +		entry->eax = 1;
>    

Where did this come from?

> +		entry->ebx = sigptr[0];
> +		entry->ecx = sigptr[1];
> +		entry->edx = sigptr[2];
>    

Overflow, you're reading 12 bytes from a 10-byte variable.

> +		break;
> +	}
> +	case 0x40000001:
> +		entry->eax = (1<<  KVM_FEATURE_CLOCKSOURCE) |
> +		(1<<  KVM_FEATURE_NOP_IO_DELAY) |
> +		(1<<  KVM_FEATURE_CLOCKSOURCE2);
>    

Indentation...

Also, have to initialize all fields, since the real cpu won't initialize 
them for you.

Sidenote: the real cpu may be a kvm vcpu, so it may in fact support 
those features.

> +		break;
>   	case 0x80000000:
>   		entry->eax = min(entry->eax, 0x8000001a);
>   		break;
> @@ -2017,6 +2031,19 @@ static int kvm_dev_ioctl_get_supported_cpuid(struct kvm_cpuid2 *cpuid,
>   	for (func = 0x80000001; func<= limit&&  nent<  cpuid->nent; ++func)
>   		do_cpuid_ent(&cpuid_entries[nent], func, 0,
>   			&nent, cpuid->nent);
> +
> +	
> +
> +	r = -E2BIG;
> +	if (nent>= cpuid->nent)
> +		goto out_free;
> +
> +	do_cpuid_ent(&cpuid_entries[nent], 0x40000000, 0,&nent, cpuid->nent);
> +	limit = cpuid_entries[nent - 1].eax;
>    

The kvm cpuid does not follow the limit thing.

> +	for (func = 0x40000001; func<= limit&&  nent<  cpuid->nent; ++func)
> +		do_cpuid_ent(&cpuid_entries[nent], func, 0,
> +			&nent, cpuid->nent);
> +
>   	r = -E2BIG;
>    

To avoid confusion, please write Documentation/kvm/cpuid.txt based on 
the current qemu-kvm code, and implement this patch according to the 
documentation.

-- 
error compiling committee.c: too many arguments to function


^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH v2 6/7] don't compute pvclock adjustments if we trust the tsc
  2010-05-03 15:52           ` [PATCH v2 6/7] don't compute pvclock adjustments if we trust the tsc Glauber Costa
  2010-05-03 15:52             ` [PATCH v2 7/7] Tell the guest we'll warn it about tsc stability Glauber Costa
@ 2010-05-05  8:28             ` Avi Kivity
  1 sibling, 0 replies; 12+ messages in thread
From: Avi Kivity @ 2010-05-05  8:28 UTC (permalink / raw)
  To: Glauber Costa; +Cc: kvm, linux-kernel, zamsden

On 05/03/2010 06:52 PM, Glauber Costa wrote:
> If the HV told us we can fully trust the TSC, skip any
> correction
>
>    

> Signed-off-by: Glauber Costa<glommer@redhat.com>
> ---
>   arch/x86/include/asm/kvm_para.h    |    5 +++++
>   arch/x86/include/asm/pvclock-abi.h |    1 +
>   arch/x86/kernel/kvmclock.c         |    3 +++
>   arch/x86/kernel/pvclock.c          |    4 ++++
>   4 files changed, 13 insertions(+), 0 deletions(-)
>
> diff --git a/arch/x86/include/asm/kvm_para.h b/arch/x86/include/asm/kvm_para.h
> index f019f8c..6f1b878 100644
> --- a/arch/x86/include/asm/kvm_para.h
> +++ b/arch/x86/include/asm/kvm_para.h
> @@ -21,6 +21,11 @@
>    */
>   #define KVM_FEATURE_CLOCKSOURCE2        3
>
> +/* The last 8 bits are used to indicate how to interpret the flags field
> + * in pvclock structure. If no bits are set, all flags are ignored.
> + */
> +#define KVM_FEATURE_CLOCKSOURCE_STABLE_TSC	24
>    

This needs documentation (in cpuid.txt).  The flag doesn't mean the TSC 
is stable, rather it means the pvclock tsc stable bit is valid.


-- 
error compiling committee.c: too many arguments to function


^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH v2 1/7] Enable pvclock flags in vcpu_time_info structure
  2010-05-03 15:52 ` [PATCH v2 1/7] Enable pvclock flags in vcpu_time_info structure Glauber Costa
  2010-05-03 15:52   ` [PATCH v2 2/7] Add a global synchronization point for pvclock Glauber Costa
@ 2010-05-05  8:29   ` Avi Kivity
  1 sibling, 0 replies; 12+ messages in thread
From: Avi Kivity @ 2010-05-05  8:29 UTC (permalink / raw)
  To: Glauber Costa; +Cc: kvm, linux-kernel, zamsden, Jeremy Fitzhardinge

On 05/03/2010 06:52 PM, Glauber Costa wrote:
> This patch removes one padding byte and transform it into a flags
> field. New versions of guests using pvclock will query these flags
> upon each read.
>
> Flags, however, will only be interpreted when the guest decides to.
> It uses the pvclock_valid_flags function to signal that a specific
> set of flags should be taken into consideration. Which flags are valid
> are usually devised via HV negotiation.
>
> Signed-off-by: Glauber Costa<glommer@redhat.com>
> CC: Jeremy Fitzhardinge<jeremy@goop.org>
> ---
>   arch/x86/include/asm/pvclock-abi.h |    3 ++-
>   arch/x86/include/asm/pvclock.h     |    1 +
>   arch/x86/kernel/pvclock.c          |    9 +++++++++
>   3 files changed, 12 insertions(+), 1 deletions(-)
>
> diff --git a/arch/x86/include/asm/pvclock-abi.h b/arch/x86/include/asm/pvclock-abi.h
> index 6d93508..ec5c41a 100644
> --- a/arch/x86/include/asm/pvclock-abi.h
> +++ b/arch/x86/include/asm/pvclock-abi.h
> @@ -29,7 +29,8 @@ struct pvclock_vcpu_time_info {
>   	u64   system_time;
>   	u32   tsc_to_system_mul;
>   	s8    tsc_shift;
> -	u8    pad[3];
> +	u8    flags;
> +	u8    pad[2];
>   } __attribute__((__packed__)); /* 32 bytes */
>
>   struct pvclock_wall_clock {
> diff --git a/arch/x86/include/asm/pvclock.h b/arch/x86/include/asm/pvclock.h
> index 53235fd..cd02f32 100644
> --- a/arch/x86/include/asm/pvclock.h
> +++ b/arch/x86/include/asm/pvclock.h
> @@ -6,6 +6,7 @@
>
>   /* some helper functions for xen and kvm pv clock sources */
>   cycle_t pvclock_clocksource_read(struct pvclock_vcpu_time_info *src);
> +void pvclock_set_flags(u8 flags);
>   unsigned long pvclock_tsc_khz(struct pvclock_vcpu_time_info *src);
>   void pvclock_read_wallclock(struct pvclock_wall_clock *wall,
>   			    struct pvclock_vcpu_time_info *vcpu,
> diff --git a/arch/x86/kernel/pvclock.c b/arch/x86/kernel/pvclock.c
> index 03801f2..aa2262b 100644
> --- a/arch/x86/kernel/pvclock.c
> +++ b/arch/x86/kernel/pvclock.c
> @@ -31,8 +31,16 @@ struct pvclock_shadow_time {
>   	u32 tsc_to_nsec_mul;
>   	int tsc_shift;
>   	u32 version;
> +	u8  flags;
>   };
>
> +static u8 valid_flags = 0;
> +
>    

Minor optimization: __read_mostly.

-- 
error compiling committee.c: too many arguments to function


^ permalink raw reply	[flat|nested] 12+ messages in thread

end of thread, other threads:[~2010-05-05  8:29 UTC | newest]

Thread overview: 12+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2010-05-03 15:52 [PATCH v2 0/7] Pvclock fixes , version two Glauber Costa
2010-05-03 15:52 ` [PATCH v2 1/7] Enable pvclock flags in vcpu_time_info structure Glauber Costa
2010-05-03 15:52   ` [PATCH v2 2/7] Add a global synchronization point for pvclock Glauber Costa
2010-05-03 15:52     ` [PATCH v2 3/7] change msr numbers for kvmclock Glauber Costa
2010-05-03 15:52       ` [PATCH v2 4/7] export paravirtual cpuid flags in KVM_GET_SUPPORTED_CPUID Glauber Costa
2010-05-03 15:52         ` [PATCH v2 5/7] Try using new kvm clock msrs Glauber Costa
2010-05-03 15:52           ` [PATCH v2 6/7] don't compute pvclock adjustments if we trust the tsc Glauber Costa
2010-05-03 15:52             ` [PATCH v2 7/7] Tell the guest we'll warn it about tsc stability Glauber Costa
2010-05-05  8:28             ` [PATCH v2 6/7] don't compute pvclock adjustments if we trust the tsc Avi Kivity
2010-05-05  8:23         ` [PATCH v2 4/7] export paravirtual cpuid flags in KVM_GET_SUPPORTED_CPUID Avi Kivity
2010-05-05  8:29   ` [PATCH v2 1/7] Enable pvclock flags in vcpu_time_info structure Avi Kivity
2010-05-05  3:57 ` [PATCH v2 0/7] Pvclock fixes , version two Zachary Amsden

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.