All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH 0/6] pvclock fixes
@ 2010-04-26 17:46 Glauber Costa
  2010-04-26 17:46 ` [PATCH 1/6] Enable pvclock flags in vcpu_time_info structure Glauber Costa
  2010-04-27  2:21 ` [PATCH 0/6] pvclock fixes Zachary Amsden
  0 siblings, 2 replies; 25+ messages in thread
From: Glauber Costa @ 2010-04-26 17:46 UTC (permalink / raw)
  To: kvm; +Cc: linux-kernel, avi

Hi,

This is the last series I've sent, with comments from you merged.
The first 5 patches are the same, only with the suggested fixes.
I am leaving documentation out, since the basics won't change, and
we're still discussing the details.

Patch 6 is new, and is the guest side of the skipping updates
avi asked for. I haven't yet done any HV work on this
(specially because I am not convinced exactly where it is safe to
do).

Let me know what you think.

Thanks

Glauber Costa (6):
  Enable pvclock flags in vcpu_time_info structure
  Add a global synchronization point for pvclock
  change msr numbers for kvmclock
  export new cpuid KVM_CAP
  Try using new kvm clock msrs
  don't compute pvclock adjustments if we trust the tsc

 arch/x86/include/asm/kvm_para.h    |   13 ++++++++
 arch/x86/include/asm/pvclock-abi.h |    4 ++-
 arch/x86/include/asm/pvclock.h     |    1 +
 arch/x86/kernel/kvmclock.c         |   59 +++++++++++++++++++++++-------------
 arch/x86/kernel/pvclock.c          |   37 ++++++++++++++++++++++
 arch/x86/kvm/x86.c                 |   13 +++++++-
 include/linux/kvm.h                |    1 +
 7 files changed, 105 insertions(+), 23 deletions(-)


^ permalink raw reply	[flat|nested] 25+ messages in thread

* [PATCH 1/6] Enable pvclock flags in vcpu_time_info structure
  2010-04-26 17:46 [PATCH 0/6] pvclock fixes Glauber Costa
@ 2010-04-26 17:46 ` Glauber Costa
  2010-04-26 17:46   ` [PATCH 2/6] Add a global synchronization point for pvclock Glauber Costa
                     ` (2 more replies)
  2010-04-27  2:21 ` [PATCH 0/6] pvclock fixes Zachary Amsden
  1 sibling, 3 replies; 25+ messages in thread
From: Glauber Costa @ 2010-04-26 17:46 UTC (permalink / raw)
  To: kvm; +Cc: linux-kernel, avi, Jeremy Fitzhardinge

This patch removes one padding byte and transform it into a flags
field. New versions of guests using pvclock will query these flags
upon each read.

Flags, however, will only be interpreted when the guest decides to.
It uses the pvclock_valid_flags function to signal that a specific
set of flags should be taken into consideration. Which flags are valid
are usually devised via HV negotiation.

Signed-off-by: Glauber Costa <glommer@redhat.com>
CC: Jeremy Fitzhardinge <jeremy@goop.org>
---
 arch/x86/include/asm/pvclock-abi.h |    3 ++-
 arch/x86/include/asm/pvclock.h     |    1 +
 arch/x86/kernel/pvclock.c          |    9 +++++++++
 3 files changed, 12 insertions(+), 1 deletions(-)

diff --git a/arch/x86/include/asm/pvclock-abi.h b/arch/x86/include/asm/pvclock-abi.h
index 6d93508..ec5c41a 100644
--- a/arch/x86/include/asm/pvclock-abi.h
+++ b/arch/x86/include/asm/pvclock-abi.h
@@ -29,7 +29,8 @@ struct pvclock_vcpu_time_info {
 	u64   system_time;
 	u32   tsc_to_system_mul;
 	s8    tsc_shift;
-	u8    pad[3];
+	u8    flags;
+	u8    pad[2];
 } __attribute__((__packed__)); /* 32 bytes */
 
 struct pvclock_wall_clock {
diff --git a/arch/x86/include/asm/pvclock.h b/arch/x86/include/asm/pvclock.h
index 53235fd..c50823f 100644
--- a/arch/x86/include/asm/pvclock.h
+++ b/arch/x86/include/asm/pvclock.h
@@ -6,6 +6,7 @@
 
 /* some helper functions for xen and kvm pv clock sources */
 cycle_t pvclock_clocksource_read(struct pvclock_vcpu_time_info *src);
+void pvclock_valid_flags(u8 flags);
 unsigned long pvclock_tsc_khz(struct pvclock_vcpu_time_info *src);
 void pvclock_read_wallclock(struct pvclock_wall_clock *wall,
 			    struct pvclock_vcpu_time_info *vcpu,
diff --git a/arch/x86/kernel/pvclock.c b/arch/x86/kernel/pvclock.c
index 03801f2..8f4af7b 100644
--- a/arch/x86/kernel/pvclock.c
+++ b/arch/x86/kernel/pvclock.c
@@ -31,8 +31,16 @@ struct pvclock_shadow_time {
 	u32 tsc_to_nsec_mul;
 	int tsc_shift;
 	u32 version;
+	u8  flags;
 };
 
+static u8 valid_flags = 0;
+
+void pvclock_valid_flags(u8 flags)
+{
+	valid_flags = flags;
+}
+
 /*
  * Scale a 64-bit delta by scaling and multiplying by a 32-bit fraction,
  * yielding a 64-bit result.
@@ -91,6 +99,7 @@ static unsigned pvclock_get_time_values(struct pvclock_shadow_time *dst,
 		dst->system_timestamp  = src->system_time;
 		dst->tsc_to_nsec_mul   = src->tsc_to_system_mul;
 		dst->tsc_shift         = src->tsc_shift;
+		dst->flags             = src->flags;
 		rmb();		/* test version after fetching data */
 	} while ((src->version & 1) || (dst->version != src->version));
 
-- 
1.6.2.2


^ permalink raw reply related	[flat|nested] 25+ messages in thread

* [PATCH 2/6] Add a global synchronization point for pvclock
  2010-04-26 17:46 ` [PATCH 1/6] Enable pvclock flags in vcpu_time_info structure Glauber Costa
@ 2010-04-26 17:46   ` Glauber Costa
  2010-04-26 17:46     ` [PATCH 3/6] change msr numbers for kvmclock Glauber Costa
  2010-04-27 13:28     ` [PATCH 2/6] Add a global synchronization point for pvclock Marcelo Tosatti
  2010-04-26 18:11   ` [PATCH 1/6] Enable pvclock flags in vcpu_time_info structure Jeremy Fitzhardinge
  2010-04-27 18:07   ` Avi Kivity
  2 siblings, 2 replies; 25+ messages in thread
From: Glauber Costa @ 2010-04-26 17:46 UTC (permalink / raw)
  To: kvm
  Cc: linux-kernel, avi, Jeremy Fitzhardinge, Marcelo Tosatti, Zachary Amsden

In recent stress tests, it was found that pvclock-based systems
could seriously warp in smp systems. Using ingo's time-warp-test.c,
I could trigger a scenario as bad as 1.5mi warps a minute in some systems.
(to be fair, it wasn't that bad in most of them). Investigating further, I
found out that such warps were caused by the very offset-based calculation
pvclock is based on.

This happens even on some machines that report constant_tsc in its tsc flags,
specially on multi-socket ones.

Two reads of the same kernel timestamp at approx the same time, will likely
have tsc timestamped in different occasions too. This means the delta we
calculate is unpredictable at best, and can probably be smaller in a cpu
that is legitimately reading clock in a forward ocasion.

Some adjustments on the host could make this window less likely to happen,
but still, it pretty much poses as an intrinsic problem of the mechanism.

A while ago, I though about using a shared variable anyway, to hold clock
last state, but gave up due to the high contention locking was likely
to introduce, possibly rendering the thing useless on big machines. I argue,
however, that locking is not necessary.

We do a read-and-return sequence in pvclock, and between read and return,
the global value can have changed. However, it can only have changed
by means of an addition of a positive value. So if we detected that our
clock timestamp is less than the current global, we know that we need to
return a higher one, even though it is not exactly the one we compared to.

OTOH, if we detect we're greater than the current time source, we atomically
replace the value with our new readings. This do causes contention on big
boxes (but big here means *BIG*), but it seems like a good trade off, since
it provide us with a time source guaranteed to be stable wrt time warps.

After this patch is applied, I don't see a single warp in time during 5 days
of execution, in any of the machines I saw them before.

Signed-off-by: Glauber Costa <glommer@redhat.com>
CC: Jeremy Fitzhardinge <jeremy@goop.org>
CC: Avi Kivity <avi@redhat.com>
CC: Marcelo Tosatti <mtosatti@redhat.com>
CC: Zachary Amsden <zamsden@redhat.com>
---
 arch/x86/kernel/pvclock.c |   24 ++++++++++++++++++++++++
 1 files changed, 24 insertions(+), 0 deletions(-)

diff --git a/arch/x86/kernel/pvclock.c b/arch/x86/kernel/pvclock.c
index 8f4af7b..6cf6dec 100644
--- a/arch/x86/kernel/pvclock.c
+++ b/arch/x86/kernel/pvclock.c
@@ -118,11 +118,14 @@ unsigned long pvclock_tsc_khz(struct pvclock_vcpu_time_info *src)
 	return pv_tsc_khz;
 }
 
+static atomic64_t last_value = ATOMIC64_INIT(0);
+
 cycle_t pvclock_clocksource_read(struct pvclock_vcpu_time_info *src)
 {
 	struct pvclock_shadow_time shadow;
 	unsigned version;
 	cycle_t ret, offset;
+	u64 last;
 
 	do {
 		version = pvclock_get_time_values(&shadow, src);
@@ -132,6 +135,27 @@ cycle_t pvclock_clocksource_read(struct pvclock_vcpu_time_info *src)
 		barrier();
 	} while (version != src->version);
 
+	/*
+	 * Assumption here is that last_value, a global accumulator, always goes
+	 * forward. If we are less than that, we should not be much smaller.
+	 * We assume there is an error marging we're inside, and then the correction
+	 * does not sacrifice accuracy.
+	 *
+	 * For reads: global may have changed between test and return,
+	 * but this means someone else updated poked the clock at a later time.
+	 * We just need to make sure we are not seeing a backwards event.
+	 *
+	 * For updates: last_value = ret is not enough, since two vcpus could be
+	 * updating at the same time, and one of them could be slightly behind,
+	 * making the assumption that last_value always go forward fail to hold.
+	 */
+	last = atomic64_read(&last_value);
+	do {
+		if (ret < last)
+			return last;
+		last = atomic64_cmpxchg(&last_value, last, ret);
+	} while (unlikely(last != ret));
+
 	return ret;
 }
 
-- 
1.6.2.2


^ permalink raw reply related	[flat|nested] 25+ messages in thread

* [PATCH 3/6] change msr numbers for kvmclock
  2010-04-26 17:46   ` [PATCH 2/6] Add a global synchronization point for pvclock Glauber Costa
@ 2010-04-26 17:46     ` Glauber Costa
  2010-04-26 17:46       ` [PATCH 4/6] export new cpuid KVM_CAP Glauber Costa
  2010-04-27 13:28     ` [PATCH 2/6] Add a global synchronization point for pvclock Marcelo Tosatti
  1 sibling, 1 reply; 25+ messages in thread
From: Glauber Costa @ 2010-04-26 17:46 UTC (permalink / raw)
  To: kvm; +Cc: linux-kernel, avi

Avi pointed out a while ago that those MSRs falls into the pentium
PMU range. So the idea here is to add new ones, and after a while,
deprecate the old ones.

Signed-off-by: Glauber Costa <glommer@redhat.com>
---
 arch/x86/include/asm/kvm_para.h |    4 ++++
 arch/x86/kvm/x86.c              |    7 ++++++-
 2 files changed, 10 insertions(+), 1 deletions(-)

diff --git a/arch/x86/include/asm/kvm_para.h b/arch/x86/include/asm/kvm_para.h
index ffae142..9734808 100644
--- a/arch/x86/include/asm/kvm_para.h
+++ b/arch/x86/include/asm/kvm_para.h
@@ -20,6 +20,10 @@
 #define MSR_KVM_WALL_CLOCK  0x11
 #define MSR_KVM_SYSTEM_TIME 0x12
 
+/* Custom MSRs falls in the range 0x4b564d00-0x4b564dff */
+#define MSR_KVM_WALL_CLOCK_NEW  0x4b564d00
+#define MSR_KVM_SYSTEM_TIME_NEW 0x4b564d01
+
 #define KVM_MAX_MMU_OP_BATCH           32
 
 /* Operations for KVM_HC_MMU_OP */
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 8824b73..a2ead7f 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -575,9 +575,10 @@ static inline u32 bit(int bitno)
  * kvm-specific. Those are put in the beginning of the list.
  */
 
-#define KVM_SAVE_MSRS_BEGIN	5
+#define KVM_SAVE_MSRS_BEGIN	7
 static u32 msrs_to_save[] = {
 	MSR_KVM_SYSTEM_TIME, MSR_KVM_WALL_CLOCK,
+	MSR_KVM_SYSTEM_TIME_NEW, MSR_KVM_WALL_CLOCK_NEW,
 	HV_X64_MSR_GUEST_OS_ID, HV_X64_MSR_HYPERCALL,
 	HV_X64_MSR_APIC_ASSIST_PAGE,
 	MSR_IA32_SYSENTER_CS, MSR_IA32_SYSENTER_ESP, MSR_IA32_SYSENTER_EIP,
@@ -1099,10 +1100,12 @@ int kvm_set_msr_common(struct kvm_vcpu *vcpu, u32 msr, u64 data)
 	case MSR_IA32_MISC_ENABLE:
 		vcpu->arch.ia32_misc_enable_msr = data;
 		break;
+	case MSR_KVM_WALL_CLOCK_NEW:
 	case MSR_KVM_WALL_CLOCK:
 		vcpu->kvm->arch.wall_clock = data;
 		kvm_write_wall_clock(vcpu->kvm, data);
 		break;
+	case MSR_KVM_SYSTEM_TIME_NEW:
 	case MSR_KVM_SYSTEM_TIME: {
 		if (vcpu->arch.time_page) {
 			kvm_release_page_dirty(vcpu->arch.time_page);
@@ -1374,9 +1377,11 @@ int kvm_get_msr_common(struct kvm_vcpu *vcpu, u32 msr, u64 *pdata)
 		data = vcpu->arch.efer;
 		break;
 	case MSR_KVM_WALL_CLOCK:
+	case MSR_KVM_WALL_CLOCK_NEW:
 		data = vcpu->kvm->arch.wall_clock;
 		break;
 	case MSR_KVM_SYSTEM_TIME:
+	case MSR_KVM_SYSTEM_TIME_NEW:
 		data = vcpu->arch.time;
 		break;
 	case MSR_IA32_P5_MC_ADDR:
-- 
1.6.2.2


^ permalink raw reply related	[flat|nested] 25+ messages in thread

* [PATCH 4/6] export new cpuid KVM_CAP
  2010-04-26 17:46     ` [PATCH 3/6] change msr numbers for kvmclock Glauber Costa
@ 2010-04-26 17:46       ` Glauber Costa
  2010-04-26 17:46         ` [PATCH 5/6] Try using new kvm clock msrs Glauber Costa
                           ` (2 more replies)
  0 siblings, 3 replies; 25+ messages in thread
From: Glauber Costa @ 2010-04-26 17:46 UTC (permalink / raw)
  To: kvm; +Cc: linux-kernel, avi

Since we're changing the msrs kvmclock uses, we have to communicate
that to the guest, through cpuid. We can add a new KVM_CAP to the
hypervisor, and then patch userspace to recognize it.

And if we ever add a new cpuid bit in the future, we have to do that again,
which create some complexity and delay in feature adoption.

Instead, what I'm proposing in this patch is a new capability, called
KVM_CAP_X86_CPUID_FEATURE_LIST, that returns the current feature list
currently supported by the hypervisor. If we ever want to add or remove
some feature, we only need to tweak into the HV, leaving userspace untouched.

Signed-off-by: Glauber Costa <glommer@redhat.com>
---
 arch/x86/include/asm/kvm_para.h |    4 ++++
 arch/x86/kvm/x86.c              |    6 ++++++
 include/linux/kvm.h             |    1 +
 3 files changed, 11 insertions(+), 0 deletions(-)

diff --git a/arch/x86/include/asm/kvm_para.h b/arch/x86/include/asm/kvm_para.h
index 9734808..f019f8c 100644
--- a/arch/x86/include/asm/kvm_para.h
+++ b/arch/x86/include/asm/kvm_para.h
@@ -16,6 +16,10 @@
 #define KVM_FEATURE_CLOCKSOURCE		0
 #define KVM_FEATURE_NOP_IO_DELAY	1
 #define KVM_FEATURE_MMU_OP		2
+/* This indicates that the new set of kvmclock msrs
+ * are available. The use of 0x11 and 0x12 is deprecated
+ */
+#define KVM_FEATURE_CLOCKSOURCE2        3
 
 #define MSR_KVM_WALL_CLOCK  0x11
 #define MSR_KVM_SYSTEM_TIME 0x12
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index a2ead7f..04f04aa 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -1545,6 +1545,12 @@ int kvm_dev_ioctl_check_extension(long ext)
 	case KVM_CAP_MCE:
 		r = KVM_MAX_MCE_BANKS;
 		break;
+	case KVM_CAP_X86_CPUID_FEATURE_LIST:
+		r = (1 << KVM_FEATURE_CLOCKSOURCE) |
+		(1 << KVM_FEATURE_NOP_IO_DELAY) |
+		(1 << KVM_FEATURE_MMU_OP) |
+		(1 << KVM_FEATURE_CLOCKSOURCE2);
+		break;
 	default:
 		r = 0;
 		break;
diff --git a/include/linux/kvm.h b/include/linux/kvm.h
index ce28767..1ce124f 100644
--- a/include/linux/kvm.h
+++ b/include/linux/kvm.h
@@ -507,6 +507,7 @@ struct kvm_ioeventfd {
 #define KVM_CAP_DEBUGREGS 50
 #endif
 #define KVM_CAP_X86_ROBUST_SINGLESTEP 51
+#define KVM_CAP_X86_CPUID_FEATURE_LIST 52
 
 #ifdef KVM_CAP_IRQ_ROUTING
 
-- 
1.6.2.2


^ permalink raw reply related	[flat|nested] 25+ messages in thread

* [PATCH 5/6] Try using new kvm clock msrs
  2010-04-26 17:46       ` [PATCH 4/6] export new cpuid KVM_CAP Glauber Costa
@ 2010-04-26 17:46         ` Glauber Costa
  2010-04-26 17:46           ` [PATCH 6/6] don't compute pvclock adjustments if we trust the tsc Glauber Costa
  2010-04-27 13:35           ` [PATCH 5/6] Try using new kvm clock msrs Marcelo Tosatti
  2010-04-27 13:30         ` [PATCH 4/6] export new cpuid KVM_CAP Marcelo Tosatti
  2010-04-27 18:12         ` Avi Kivity
  2 siblings, 2 replies; 25+ messages in thread
From: Glauber Costa @ 2010-04-26 17:46 UTC (permalink / raw)
  To: kvm; +Cc: linux-kernel, avi

We now added a new set of clock-related msrs in replacement of the old
ones. In theory, we could just try to use them and get a return value
indicating they do not exist, due to our use of kvm_write_msr_save.

However, kvm clock registration happens very early, and if we ever
try to write to a non-existant MSR, we raise a lethal #GP, since our
idt handlers are not in place yet.

So this patch tests for a cpuid feature exported by the host to
decide which set of msrs are supported.

Signed-off-by: Glauber Costa <glommer@redhat.com>
---
 arch/x86/kernel/kvmclock.c |   56 +++++++++++++++++++++++++++----------------
 1 files changed, 35 insertions(+), 21 deletions(-)

diff --git a/arch/x86/kernel/kvmclock.c b/arch/x86/kernel/kvmclock.c
index feaeb0d..f2f6aee 100644
--- a/arch/x86/kernel/kvmclock.c
+++ b/arch/x86/kernel/kvmclock.c
@@ -29,6 +29,8 @@
 #define KVM_SCALE 22
 
 static int kvmclock = 1;
+static int msr_kvm_system_time = MSR_KVM_SYSTEM_TIME;
+static int msr_kvm_wall_clock = MSR_KVM_WALL_CLOCK;
 
 static int parse_no_kvmclock(char *arg)
 {
@@ -41,6 +43,7 @@ early_param("no-kvmclock", parse_no_kvmclock);
 static DEFINE_PER_CPU_SHARED_ALIGNED(struct pvclock_vcpu_time_info, hv_clock);
 static struct pvclock_wall_clock wall_clock;
 
+
 /*
  * The wallclock is the time of day when we booted. Since then, some time may
  * have elapsed since the hypervisor wrote the data. So we try to account for
@@ -54,7 +57,8 @@ static unsigned long kvm_get_wallclock(void)
 
 	low = (int)__pa_symbol(&wall_clock);
 	high = ((u64)__pa_symbol(&wall_clock) >> 32);
-	native_write_msr(MSR_KVM_WALL_CLOCK, low, high);
+
+	native_write_msr_safe(msr_kvm_wall_clock, low, high);
 
 	vcpu_time = &get_cpu_var(hv_clock);
 	pvclock_read_wallclock(&wall_clock, vcpu_time, &ts);
@@ -130,7 +134,8 @@ static int kvm_register_clock(char *txt)
 	high = ((u64)__pa(&per_cpu(hv_clock, cpu)) >> 32);
 	printk(KERN_INFO "kvm-clock: cpu %d, msr %x:%x, %s\n",
 	       cpu, high, low, txt);
-	return native_write_msr_safe(MSR_KVM_SYSTEM_TIME, low, high);
+
+	return native_write_msr_safe(msr_kvm_system_time, low, high);
 }
 
 #ifdef CONFIG_X86_LOCAL_APIC
@@ -165,14 +170,15 @@ static void __init kvm_smp_prepare_boot_cpu(void)
 #ifdef CONFIG_KEXEC
 static void kvm_crash_shutdown(struct pt_regs *regs)
 {
-	native_write_msr_safe(MSR_KVM_SYSTEM_TIME, 0, 0);
+
+	native_write_msr(msr_kvm_system_time, 0, 0);
 	native_machine_crash_shutdown(regs);
 }
 #endif
 
 static void kvm_shutdown(void)
 {
-	native_write_msr_safe(MSR_KVM_SYSTEM_TIME, 0, 0);
+	native_write_msr(msr_kvm_system_time, 0, 0);
 	native_machine_shutdown();
 }
 
@@ -181,27 +187,35 @@ void __init kvmclock_init(void)
 	if (!kvm_para_available())
 		return;
 
-	if (kvmclock && kvm_para_has_feature(KVM_FEATURE_CLOCKSOURCE)) {
-		if (kvm_register_clock("boot clock"))
-			return;
-		pv_time_ops.sched_clock = kvm_clock_read;
-		x86_platform.calibrate_tsc = kvm_get_tsc_khz;
-		x86_platform.get_wallclock = kvm_get_wallclock;
-		x86_platform.set_wallclock = kvm_set_wallclock;
+	if (kvmclock && kvm_para_has_feature(KVM_FEATURE_CLOCKSOURCE2)) {
+		msr_kvm_system_time = MSR_KVM_SYSTEM_TIME_NEW;
+		msr_kvm_wall_clock = MSR_KVM_WALL_CLOCK_NEW;
+	}
+	else if (!(kvmclock && kvm_para_has_feature(KVM_FEATURE_CLOCKSOURCE)))
+		return;
+
+	printk(KERN_INFO "kvm-clock: Using msrs %x and %x",
+		msr_kvm_system_time, msr_kvm_wall_clock);
+
+	if (kvm_register_clock("boot clock"))
+		return;
+	pv_time_ops.sched_clock = kvm_clock_read;
+	x86_platform.calibrate_tsc = kvm_get_tsc_khz;
+	x86_platform.get_wallclock = kvm_get_wallclock;
+	x86_platform.set_wallclock = kvm_set_wallclock;
 #ifdef CONFIG_X86_LOCAL_APIC
-		x86_cpuinit.setup_percpu_clockev =
-			kvm_setup_secondary_clock;
+	x86_cpuinit.setup_percpu_clockev =
+		kvm_setup_secondary_clock;
 #endif
 #ifdef CONFIG_SMP
-		smp_ops.smp_prepare_boot_cpu = kvm_smp_prepare_boot_cpu;
+	smp_ops.smp_prepare_boot_cpu = kvm_smp_prepare_boot_cpu;
 #endif
-		machine_ops.shutdown  = kvm_shutdown;
+	machine_ops.shutdown  = kvm_shutdown;
 #ifdef CONFIG_KEXEC
-		machine_ops.crash_shutdown  = kvm_crash_shutdown;
+	machine_ops.crash_shutdown  = kvm_crash_shutdown;
 #endif
-		kvm_get_preset_lpj();
-		clocksource_register(&kvm_clock);
-		pv_info.paravirt_enabled = 1;
-		pv_info.name = "KVM";
-	}
+	kvm_get_preset_lpj();
+	clocksource_register(&kvm_clock);
+	pv_info.paravirt_enabled = 1;
+	pv_info.name = "KVM";
 }
-- 
1.6.2.2


^ permalink raw reply related	[flat|nested] 25+ messages in thread

* [PATCH 6/6] don't compute pvclock adjustments if we trust the tsc
  2010-04-26 17:46         ` [PATCH 5/6] Try using new kvm clock msrs Glauber Costa
@ 2010-04-26 17:46           ` Glauber Costa
  2010-04-27 13:40             ` Marcelo Tosatti
  2010-04-27 13:35           ` [PATCH 5/6] Try using new kvm clock msrs Marcelo Tosatti
  1 sibling, 1 reply; 25+ messages in thread
From: Glauber Costa @ 2010-04-26 17:46 UTC (permalink / raw)
  To: kvm; +Cc: linux-kernel, avi

If the HV told us we can fully trust the TSC, skip any
correction

Signed-off-by: Glauber Costa <glommer@redhat.com>
---
 arch/x86/include/asm/kvm_para.h    |    5 +++++
 arch/x86/include/asm/pvclock-abi.h |    1 +
 arch/x86/kernel/kvmclock.c         |    3 +++
 arch/x86/kernel/pvclock.c          |    4 ++++
 4 files changed, 13 insertions(+), 0 deletions(-)

diff --git a/arch/x86/include/asm/kvm_para.h b/arch/x86/include/asm/kvm_para.h
index f019f8c..615ebb1 100644
--- a/arch/x86/include/asm/kvm_para.h
+++ b/arch/x86/include/asm/kvm_para.h
@@ -21,6 +21,11 @@
  */
 #define KVM_FEATURE_CLOCKSOURCE2        3
 
+/* The last 8 bits are used to indicate how to interpret the flags field
+ * in pvclock structure. If no bits are set, all flags are ignored.
+ */
+#define KVM_FEATURE_CLOCKSOURCE_STABLE_TSC	0xf8
+
 #define MSR_KVM_WALL_CLOCK  0x11
 #define MSR_KVM_SYSTEM_TIME 0x12
 
diff --git a/arch/x86/include/asm/pvclock-abi.h b/arch/x86/include/asm/pvclock-abi.h
index ec5c41a..b123bd7 100644
--- a/arch/x86/include/asm/pvclock-abi.h
+++ b/arch/x86/include/asm/pvclock-abi.h
@@ -39,5 +39,6 @@ struct pvclock_wall_clock {
 	u32   nsec;
 } __attribute__((__packed__));
 
+#define PVCLOCK_STABLE_TSC	(1 << 0)
 #endif /* __ASSEMBLY__ */
 #endif /* _ASM_X86_PVCLOCK_ABI_H */
diff --git a/arch/x86/kernel/kvmclock.c b/arch/x86/kernel/kvmclock.c
index f2f6aee..aca2d3c 100644
--- a/arch/x86/kernel/kvmclock.c
+++ b/arch/x86/kernel/kvmclock.c
@@ -218,4 +218,7 @@ void __init kvmclock_init(void)
 	clocksource_register(&kvm_clock);
 	pv_info.paravirt_enabled = 1;
 	pv_info.name = "KVM";
+
+	if (kvm_para_has_feature(KVM_FEATURE_CLOCKSOURCE_STABLE_TSC))
+		pvclock_valid_flags(PVCLOCK_STABLE_TSC);
 }
diff --git a/arch/x86/kernel/pvclock.c b/arch/x86/kernel/pvclock.c
index 6cf6dec..43ae8d5 100644
--- a/arch/x86/kernel/pvclock.c
+++ b/arch/x86/kernel/pvclock.c
@@ -135,6 +135,10 @@ cycle_t pvclock_clocksource_read(struct pvclock_vcpu_time_info *src)
 		barrier();
 	} while (version != src->version);
 
+	if ((valid_flags & PVCLOCK_STABLE_TSC) &&
+		(shadow->flags & PVCLOCK_STABLE_TSC))
+		return ret;
+
 	/*
 	 * Assumption here is that last_value, a global accumulator, always goes
 	 * forward. If we are less than that, we should not be much smaller.
-- 
1.6.2.2


^ permalink raw reply related	[flat|nested] 25+ messages in thread

* Re: [PATCH 1/6] Enable pvclock flags in vcpu_time_info structure
  2010-04-26 17:46 ` [PATCH 1/6] Enable pvclock flags in vcpu_time_info structure Glauber Costa
  2010-04-26 17:46   ` [PATCH 2/6] Add a global synchronization point for pvclock Glauber Costa
@ 2010-04-26 18:11   ` Jeremy Fitzhardinge
  2010-04-26 18:45     ` Glauber Costa
  2010-04-27 18:07   ` Avi Kivity
  2 siblings, 1 reply; 25+ messages in thread
From: Jeremy Fitzhardinge @ 2010-04-26 18:11 UTC (permalink / raw)
  To: Glauber Costa; +Cc: kvm, linux-kernel, avi

On 04/26/2010 10:46 AM, Glauber Costa wrote:
> This patch removes one padding byte and transform it into a flags
> field. New versions of guests using pvclock will query these flags
> upon each read.
>   

Is this necessary?  Why not just make the pvclock driver maintain a
local flag set, and have the HV backend call into it to update it.  Why
does it need to be part of the pvclock structure?

    J

> Flags, however, will only be interpreted when the guest decides to.
> It uses the pvclock_valid_flags function to signal that a specific
> set of flags should be taken into consideration. Which flags are valid
> are usually devised via HV negotiation.
>
> Signed-off-by: Glauber Costa <glommer@redhat.com>
> CC: Jeremy Fitzhardinge <jeremy@goop.org>
> ---
>  arch/x86/include/asm/pvclock-abi.h |    3 ++-
>  arch/x86/include/asm/pvclock.h     |    1 +
>  arch/x86/kernel/pvclock.c          |    9 +++++++++
>  3 files changed, 12 insertions(+), 1 deletions(-)
>
> diff --git a/arch/x86/include/asm/pvclock-abi.h b/arch/x86/include/asm/pvclock-abi.h
> index 6d93508..ec5c41a 100644
> --- a/arch/x86/include/asm/pvclock-abi.h
> +++ b/arch/x86/include/asm/pvclock-abi.h
> @@ -29,7 +29,8 @@ struct pvclock_vcpu_time_info {
>  	u64   system_time;
>  	u32   tsc_to_system_mul;
>  	s8    tsc_shift;
> -	u8    pad[3];
> +	u8    flags;
> +	u8    pad[2];
>  } __attribute__((__packed__)); /* 32 bytes */
>  
>  struct pvclock_wall_clock {
> diff --git a/arch/x86/include/asm/pvclock.h b/arch/x86/include/asm/pvclock.h
> index 53235fd..c50823f 100644
> --- a/arch/x86/include/asm/pvclock.h
> +++ b/arch/x86/include/asm/pvclock.h
> @@ -6,6 +6,7 @@
>  
>  /* some helper functions for xen and kvm pv clock sources */
>  cycle_t pvclock_clocksource_read(struct pvclock_vcpu_time_info *src);
> +void pvclock_valid_flags(u8 flags);
>  unsigned long pvclock_tsc_khz(struct pvclock_vcpu_time_info *src);
>  void pvclock_read_wallclock(struct pvclock_wall_clock *wall,
>  			    struct pvclock_vcpu_time_info *vcpu,
> diff --git a/arch/x86/kernel/pvclock.c b/arch/x86/kernel/pvclock.c
> index 03801f2..8f4af7b 100644
> --- a/arch/x86/kernel/pvclock.c
> +++ b/arch/x86/kernel/pvclock.c
> @@ -31,8 +31,16 @@ struct pvclock_shadow_time {
>  	u32 tsc_to_nsec_mul;
>  	int tsc_shift;
>  	u32 version;
> +	u8  flags;
>  };
>  
> +static u8 valid_flags = 0;
> +
> +void pvclock_valid_flags(u8 flags)
> +{
> +	valid_flags = flags;
> +}
> +
>  /*
>   * Scale a 64-bit delta by scaling and multiplying by a 32-bit fraction,
>   * yielding a 64-bit result.
> @@ -91,6 +99,7 @@ static unsigned pvclock_get_time_values(struct pvclock_shadow_time *dst,
>  		dst->system_timestamp  = src->system_time;
>  		dst->tsc_to_nsec_mul   = src->tsc_to_system_mul;
>  		dst->tsc_shift         = src->tsc_shift;
> +		dst->flags             = src->flags;
>  		rmb();		/* test version after fetching data */
>  	} while ((src->version & 1) || (dst->version != src->version));
>  
>   


^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH 1/6] Enable pvclock flags in vcpu_time_info structure
  2010-04-26 18:11   ` [PATCH 1/6] Enable pvclock flags in vcpu_time_info structure Jeremy Fitzhardinge
@ 2010-04-26 18:45     ` Glauber Costa
  0 siblings, 0 replies; 25+ messages in thread
From: Glauber Costa @ 2010-04-26 18:45 UTC (permalink / raw)
  To: Jeremy Fitzhardinge; +Cc: kvm, linux-kernel, avi

On Mon, Apr 26, 2010 at 11:11:57AM -0700, Jeremy Fitzhardinge wrote:
> On 04/26/2010 10:46 AM, Glauber Costa wrote:
> > This patch removes one padding byte and transform it into a flags
> > field. New versions of guests using pvclock will query these flags
> > upon each read.
> >   
> 
> Is this necessary?  Why not just make the pvclock driver maintain a
> local flag set, and have the HV backend call into it to update it.  Why
> does it need to be part of the pvclock structure?
Because it is already there, and we have plenty of space left?

There are obvious other ways, but I don't see any of them being simpler.
If we go by the method you suggested, we'd have, for instance, to register
the memory area where this flags lives. Which is a duplication of the
infrastructure already present in kvmclock.



^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH 0/6] pvclock fixes
  2010-04-26 17:46 [PATCH 0/6] pvclock fixes Glauber Costa
  2010-04-26 17:46 ` [PATCH 1/6] Enable pvclock flags in vcpu_time_info structure Glauber Costa
@ 2010-04-27  2:21 ` Zachary Amsden
  1 sibling, 0 replies; 25+ messages in thread
From: Zachary Amsden @ 2010-04-27  2:21 UTC (permalink / raw)
  To: Glauber Costa; +Cc: kvm, linux-kernel, avi

On 04/26/2010 07:46 AM, Glauber Costa wrote:
> Hi,
>
> This is the last series I've sent, with comments from you merged.
> The first 5 patches are the same, only with the suggested fixes.
> I am leaving documentation out, since the basics won't change, and
> we're still discussing the details.
>
> Patch 6 is new, and is the guest side of the skipping updates
> avi asked for. I haven't yet done any HV work on this
> (specially because I am not convinced exactly where it is safe to
> do).
>
> Let me know what you think.
>    

I'm rebasing my patches on top of this series to address the host side 
issues.  I noticed a couple issues patching against Avi's tree, not sure 
if those issues are also present in trunk.

Can you keep me CC'd on any updates to this series?

Thanks,

Zach

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH 2/6] Add a global synchronization point for pvclock
  2010-04-26 17:46   ` [PATCH 2/6] Add a global synchronization point for pvclock Glauber Costa
  2010-04-26 17:46     ` [PATCH 3/6] change msr numbers for kvmclock Glauber Costa
@ 2010-04-27 13:28     ` Marcelo Tosatti
  2010-04-27 18:00       ` Jeremy Fitzhardinge
  1 sibling, 1 reply; 25+ messages in thread
From: Marcelo Tosatti @ 2010-04-27 13:28 UTC (permalink / raw)
  To: Glauber Costa; +Cc: kvm, linux-kernel, avi, Jeremy Fitzhardinge, Zachary Amsden

On Mon, Apr 26, 2010 at 01:46:24PM -0400, Glauber Costa wrote:
> In recent stress tests, it was found that pvclock-based systems
> could seriously warp in smp systems. Using ingo's time-warp-test.c,
> I could trigger a scenario as bad as 1.5mi warps a minute in some systems.
> (to be fair, it wasn't that bad in most of them). Investigating further, I
> found out that such warps were caused by the very offset-based calculation
> pvclock is based on.
> 
> This happens even on some machines that report constant_tsc in its tsc flags,
> specially on multi-socket ones.
> 
> Two reads of the same kernel timestamp at approx the same time, will likely
> have tsc timestamped in different occasions too. This means the delta we
> calculate is unpredictable at best, and can probably be smaller in a cpu
> that is legitimately reading clock in a forward ocasion.
> 
> Some adjustments on the host could make this window less likely to happen,
> but still, it pretty much poses as an intrinsic problem of the mechanism.
> 
> A while ago, I though about using a shared variable anyway, to hold clock
> last state, but gave up due to the high contention locking was likely
> to introduce, possibly rendering the thing useless on big machines. I argue,
> however, that locking is not necessary.
> 
> We do a read-and-return sequence in pvclock, and between read and return,
> the global value can have changed. However, it can only have changed
> by means of an addition of a positive value. So if we detected that our
> clock timestamp is less than the current global, we know that we need to
> return a higher one, even though it is not exactly the one we compared to.
> 
> OTOH, if we detect we're greater than the current time source, we atomically
> replace the value with our new readings. This do causes contention on big
> boxes (but big here means *BIG*), but it seems like a good trade off, since
> it provide us with a time source guaranteed to be stable wrt time warps.
> 
> After this patch is applied, I don't see a single warp in time during 5 days
> of execution, in any of the machines I saw them before.
> 
> Signed-off-by: Glauber Costa <glommer@redhat.com>
> CC: Jeremy Fitzhardinge <jeremy@goop.org>
> CC: Avi Kivity <avi@redhat.com>
> CC: Marcelo Tosatti <mtosatti@redhat.com>
> CC: Zachary Amsden <zamsden@redhat.com>
> ---
>  arch/x86/kernel/pvclock.c |   24 ++++++++++++++++++++++++
>  1 files changed, 24 insertions(+), 0 deletions(-)
> 
> diff --git a/arch/x86/kernel/pvclock.c b/arch/x86/kernel/pvclock.c
> index 8f4af7b..6cf6dec 100644
> --- a/arch/x86/kernel/pvclock.c
> +++ b/arch/x86/kernel/pvclock.c
> @@ -118,11 +118,14 @@ unsigned long pvclock_tsc_khz(struct pvclock_vcpu_time_info *src)
>  	return pv_tsc_khz;
>  }
>  
> +static atomic64_t last_value = ATOMIC64_INIT(0);
> +
>  cycle_t pvclock_clocksource_read(struct pvclock_vcpu_time_info *src)
>  {
>  	struct pvclock_shadow_time shadow;
>  	unsigned version;
>  	cycle_t ret, offset;
> +	u64 last;
>  
>  	do {
>  		version = pvclock_get_time_values(&shadow, src);
> @@ -132,6 +135,27 @@ cycle_t pvclock_clocksource_read(struct pvclock_vcpu_time_info *src)
>  		barrier();
>  	} while (version != src->version);
>  
> +	/*
> +	 * Assumption here is that last_value, a global accumulator, always goes
> +	 * forward. If we are less than that, we should not be much smaller.
> +	 * We assume there is an error marging we're inside, and then the correction
> +	 * does not sacrifice accuracy.
> +	 *
> +	 * For reads: global may have changed between test and return,
> +	 * but this means someone else updated poked the clock at a later time.
> +	 * We just need to make sure we are not seeing a backwards event.
> +	 *
> +	 * For updates: last_value = ret is not enough, since two vcpus could be
> +	 * updating at the same time, and one of them could be slightly behind,
> +	 * making the assumption that last_value always go forward fail to hold.
> +	 */
> +	last = atomic64_read(&last_value);
> +	do {
> +		if (ret < last)
> +			return last;
> +		last = atomic64_cmpxchg(&last_value, last, ret);
> +	} while (unlikely(last != ret));

Wraparound?


^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH 4/6] export new cpuid KVM_CAP
  2010-04-26 17:46       ` [PATCH 4/6] export new cpuid KVM_CAP Glauber Costa
  2010-04-26 17:46         ` [PATCH 5/6] Try using new kvm clock msrs Glauber Costa
@ 2010-04-27 13:30         ` Marcelo Tosatti
  2010-04-27 15:09           ` Glauber Costa
  2010-04-27 16:55           ` Glauber Costa
  2010-04-27 18:12         ` Avi Kivity
  2 siblings, 2 replies; 25+ messages in thread
From: Marcelo Tosatti @ 2010-04-27 13:30 UTC (permalink / raw)
  To: Glauber Costa; +Cc: kvm, linux-kernel, avi

On Mon, Apr 26, 2010 at 01:46:26PM -0400, Glauber Costa wrote:
> Since we're changing the msrs kvmclock uses, we have to communicate
> that to the guest, through cpuid. We can add a new KVM_CAP to the
> hypervisor, and then patch userspace to recognize it.
> 
> And if we ever add a new cpuid bit in the future, we have to do that again,
> which create some complexity and delay in feature adoption.
> 
> Instead, what I'm proposing in this patch is a new capability, called
> KVM_CAP_X86_CPUID_FEATURE_LIST, that returns the current feature list
> currently supported by the hypervisor. If we ever want to add or remove
> some feature, we only need to tweak into the HV, leaving userspace untouched.
> 
> Signed-off-by: Glauber Costa <glommer@redhat.com>
> ---
>  arch/x86/include/asm/kvm_para.h |    4 ++++
>  arch/x86/kvm/x86.c              |    6 ++++++
>  include/linux/kvm.h             |    1 +
>  3 files changed, 11 insertions(+), 0 deletions(-)
> 
> diff --git a/arch/x86/include/asm/kvm_para.h b/arch/x86/include/asm/kvm_para.h
> index 9734808..f019f8c 100644
> --- a/arch/x86/include/asm/kvm_para.h
> +++ b/arch/x86/include/asm/kvm_para.h
> @@ -16,6 +16,10 @@
>  #define KVM_FEATURE_CLOCKSOURCE		0
>  #define KVM_FEATURE_NOP_IO_DELAY	1
>  #define KVM_FEATURE_MMU_OP		2
> +/* This indicates that the new set of kvmclock msrs
> + * are available. The use of 0x11 and 0x12 is deprecated
> + */
> +#define KVM_FEATURE_CLOCKSOURCE2        3
>  
>  #define MSR_KVM_WALL_CLOCK  0x11
>  #define MSR_KVM_SYSTEM_TIME 0x12
> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> index a2ead7f..04f04aa 100644
> --- a/arch/x86/kvm/x86.c
> +++ b/arch/x86/kvm/x86.c
> @@ -1545,6 +1545,12 @@ int kvm_dev_ioctl_check_extension(long ext)
>  	case KVM_CAP_MCE:
>  		r = KVM_MAX_MCE_BANKS;
>  		break;
> +	case KVM_CAP_X86_CPUID_FEATURE_LIST:
> +		r = (1 << KVM_FEATURE_CLOCKSOURCE) |
> +		(1 << KVM_FEATURE_NOP_IO_DELAY) |
> +		(1 << KVM_FEATURE_MMU_OP) |

Remove this flag, it has been deprecated.

> +		(1 << KVM_FEATURE_CLOCKSOURCE2);
> +		break;
>  	default:

Also missing qemu-kvm/qemu patches.


^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH 5/6] Try using new kvm clock msrs
  2010-04-26 17:46         ` [PATCH 5/6] Try using new kvm clock msrs Glauber Costa
  2010-04-26 17:46           ` [PATCH 6/6] don't compute pvclock adjustments if we trust the tsc Glauber Costa
@ 2010-04-27 13:35           ` Marcelo Tosatti
  1 sibling, 0 replies; 25+ messages in thread
From: Marcelo Tosatti @ 2010-04-27 13:35 UTC (permalink / raw)
  To: Glauber Costa; +Cc: kvm, linux-kernel, avi

On Mon, Apr 26, 2010 at 01:46:27PM -0400, Glauber Costa wrote:
> We now added a new set of clock-related msrs in replacement of the old
> ones. In theory, we could just try to use them and get a return value
> indicating they do not exist, due to our use of kvm_write_msr_save.
> 
> However, kvm clock registration happens very early, and if we ever
> try to write to a non-existant MSR, we raise a lethal #GP, since our
> idt handlers are not in place yet.
> 
> So this patch tests for a cpuid feature exported by the host to
> decide which set of msrs are supported.
> 
> Signed-off-by: Glauber Costa <glommer@redhat.com>
> ---
>  arch/x86/kernel/kvmclock.c |   56 +++++++++++++++++++++++++++----------------
>  1 files changed, 35 insertions(+), 21 deletions(-)
> 
> diff --git a/arch/x86/kernel/kvmclock.c b/arch/x86/kernel/kvmclock.c
> index feaeb0d..f2f6aee 100644
> --- a/arch/x86/kernel/kvmclock.c
> +++ b/arch/x86/kernel/kvmclock.c
> @@ -29,6 +29,8 @@
>  #define KVM_SCALE 22
>  
>  static int kvmclock = 1;
> +static int msr_kvm_system_time = MSR_KVM_SYSTEM_TIME;
> +static int msr_kvm_wall_clock = MSR_KVM_WALL_CLOCK;
>  
>  static int parse_no_kvmclock(char *arg)
>  {
> @@ -41,6 +43,7 @@ early_param("no-kvmclock", parse_no_kvmclock);
>  static DEFINE_PER_CPU_SHARED_ALIGNED(struct pvclock_vcpu_time_info, hv_clock);
>  static struct pvclock_wall_clock wall_clock;
>  
> +

Extra newline.

>  /*
>   * The wallclock is the time of day when we booted. Since then, some time may
>   * have elapsed since the hypervisor wrote the data. So we try to account for
> @@ -54,7 +57,8 @@ static unsigned long kvm_get_wallclock(void)
>  
>  	low = (int)__pa_symbol(&wall_clock);
>  	high = ((u64)__pa_symbol(&wall_clock) >> 32);
> -	native_write_msr(MSR_KVM_WALL_CLOCK, low, high);
> +
> +	native_write_msr_safe(msr_kvm_wall_clock, low, high);
>  
>  	vcpu_time = &get_cpu_var(hv_clock);
>  	pvclock_read_wallclock(&wall_clock, vcpu_time, &ts);
> @@ -130,7 +134,8 @@ static int kvm_register_clock(char *txt)
>  	high = ((u64)__pa(&per_cpu(hv_clock, cpu)) >> 32);
>  	printk(KERN_INFO "kvm-clock: cpu %d, msr %x:%x, %s\n",
>  	       cpu, high, low, txt);
> -	return native_write_msr_safe(MSR_KVM_SYSTEM_TIME, low, high);
> +
> +	return native_write_msr_safe(msr_kvm_system_time, low, high);
>  }
>  
>  #ifdef CONFIG_X86_LOCAL_APIC
> @@ -165,14 +170,15 @@ static void __init kvm_smp_prepare_boot_cpu(void)
>  #ifdef CONFIG_KEXEC
>  static void kvm_crash_shutdown(struct pt_regs *regs)
>  {
> -	native_write_msr_safe(MSR_KVM_SYSTEM_TIME, 0, 0);
> +
> +	native_write_msr(msr_kvm_system_time, 0, 0);
>  	native_machine_crash_shutdown(regs);
>  }
>  #endif
>  
>  static void kvm_shutdown(void)
>  {
> -	native_write_msr_safe(MSR_KVM_SYSTEM_TIME, 0, 0);
> +	native_write_msr(msr_kvm_system_time, 0, 0);
>  	native_machine_shutdown();
>  }
>  
> @@ -181,27 +187,35 @@ void __init kvmclock_init(void)
>  	if (!kvm_para_available())
>  		return;
>  
> -	if (kvmclock && kvm_para_has_feature(KVM_FEATURE_CLOCKSOURCE)) {
> -		if (kvm_register_clock("boot clock"))
> -			return;
> -		pv_time_ops.sched_clock = kvm_clock_read;
> -		x86_platform.calibrate_tsc = kvm_get_tsc_khz;
> -		x86_platform.get_wallclock = kvm_get_wallclock;
> -		x86_platform.set_wallclock = kvm_set_wallclock;
> +	if (kvmclock && kvm_para_has_feature(KVM_FEATURE_CLOCKSOURCE2)) {
> +		msr_kvm_system_time = MSR_KVM_SYSTEM_TIME_NEW;
> +		msr_kvm_wall_clock = MSR_KVM_WALL_CLOCK_NEW;
> +	}
> +	else if (!(kvmclock && kvm_para_has_feature(KVM_FEATURE_CLOCKSOURCE)))
> +		return;

Non conformant indentation.


^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH 6/6] don't compute pvclock adjustments if we trust the tsc
  2010-04-26 17:46           ` [PATCH 6/6] don't compute pvclock adjustments if we trust the tsc Glauber Costa
@ 2010-04-27 13:40             ` Marcelo Tosatti
  2010-04-27 15:11               ` Glauber Costa
  2010-04-27 18:03               ` Avi Kivity
  0 siblings, 2 replies; 25+ messages in thread
From: Marcelo Tosatti @ 2010-04-27 13:40 UTC (permalink / raw)
  To: Glauber Costa; +Cc: kvm, linux-kernel, avi

On Mon, Apr 26, 2010 at 01:46:28PM -0400, Glauber Costa wrote:
> If the HV told us we can fully trust the TSC, skip any
> correction
> 
> Signed-off-by: Glauber Costa <glommer@redhat.com>
> ---
>  arch/x86/include/asm/kvm_para.h    |    5 +++++
>  arch/x86/include/asm/pvclock-abi.h |    1 +
>  arch/x86/kernel/kvmclock.c         |    3 +++
>  arch/x86/kernel/pvclock.c          |    4 ++++
>  4 files changed, 13 insertions(+), 0 deletions(-)
> 
> diff --git a/arch/x86/include/asm/kvm_para.h b/arch/x86/include/asm/kvm_para.h
> index f019f8c..615ebb1 100644
> --- a/arch/x86/include/asm/kvm_para.h
> +++ b/arch/x86/include/asm/kvm_para.h
> @@ -21,6 +21,11 @@
>   */
>  #define KVM_FEATURE_CLOCKSOURCE2        3
>  
> +/* The last 8 bits are used to indicate how to interpret the flags field
> + * in pvclock structure. If no bits are set, all flags are ignored.
> + */
> +#define KVM_FEATURE_CLOCKSOURCE_STABLE_TSC	0xf8
> +
>  #define MSR_KVM_WALL_CLOCK  0x11
>  #define MSR_KVM_SYSTEM_TIME 0x12

Please drop this. Its not certain what is the best method to reduce
contention on the global variable. Whatever it is, can be done later.


^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH 4/6] export new cpuid KVM_CAP
  2010-04-27 13:30         ` [PATCH 4/6] export new cpuid KVM_CAP Marcelo Tosatti
@ 2010-04-27 15:09           ` Glauber Costa
  2010-04-27 16:55           ` Glauber Costa
  1 sibling, 0 replies; 25+ messages in thread
From: Glauber Costa @ 2010-04-27 15:09 UTC (permalink / raw)
  To: Marcelo Tosatti; +Cc: kvm, linux-kernel, avi

On Tue, Apr 27, 2010 at 10:30:48AM -0300, Marcelo Tosatti wrote:
> On Mon, Apr 26, 2010 at 01:46:26PM -0400, Glauber Costa wrote:
> > Since we're changing the msrs kvmclock uses, we have to communicate
> > that to the guest, through cpuid. We can add a new KVM_CAP to the
> > hypervisor, and then patch userspace to recognize it.
> > 
> > And if we ever add a new cpuid bit in the future, we have to do that again,
> > which create some complexity and delay in feature adoption.
> > 
> > Instead, what I'm proposing in this patch is a new capability, called
> > KVM_CAP_X86_CPUID_FEATURE_LIST, that returns the current feature list
> > currently supported by the hypervisor. If we ever want to add or remove
> > some feature, we only need to tweak into the HV, leaving userspace untouched.
> > 
> > Signed-off-by: Glauber Costa <glommer@redhat.com>
> > ---
> >  arch/x86/include/asm/kvm_para.h |    4 ++++
> >  arch/x86/kvm/x86.c              |    6 ++++++
> >  include/linux/kvm.h             |    1 +
> >  3 files changed, 11 insertions(+), 0 deletions(-)
> > 
> > diff --git a/arch/x86/include/asm/kvm_para.h b/arch/x86/include/asm/kvm_para.h
> > index 9734808..f019f8c 100644
> > --- a/arch/x86/include/asm/kvm_para.h
> > +++ b/arch/x86/include/asm/kvm_para.h
> > @@ -16,6 +16,10 @@
> >  #define KVM_FEATURE_CLOCKSOURCE		0
> >  #define KVM_FEATURE_NOP_IO_DELAY	1
> >  #define KVM_FEATURE_MMU_OP		2
> > +/* This indicates that the new set of kvmclock msrs
> > + * are available. The use of 0x11 and 0x12 is deprecated
> > + */
> > +#define KVM_FEATURE_CLOCKSOURCE2        3
> >  
> >  #define MSR_KVM_WALL_CLOCK  0x11
> >  #define MSR_KVM_SYSTEM_TIME 0x12
> > diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> > index a2ead7f..04f04aa 100644
> > --- a/arch/x86/kvm/x86.c
> > +++ b/arch/x86/kvm/x86.c
> > @@ -1545,6 +1545,12 @@ int kvm_dev_ioctl_check_extension(long ext)
> >  	case KVM_CAP_MCE:
> >  		r = KVM_MAX_MCE_BANKS;
> >  		break;
> > +	case KVM_CAP_X86_CPUID_FEATURE_LIST:
> > +		r = (1 << KVM_FEATURE_CLOCKSOURCE) |
> > +		(1 << KVM_FEATURE_NOP_IO_DELAY) |
> > +		(1 << KVM_FEATURE_MMU_OP) |
> 
> Remove this flag, it has been deprecated.

The deprecation of this flag has nothing to do with this series.
Anyway, we can't pick its number, even if it is really deprecated,
since we can still have old hvs around, so, don't really see the point here

> 
> > +		(1 << KVM_FEATURE_CLOCKSOURCE2);
> > +		break;
> >  	default:
> 
> Also missing qemu-kvm/qemu patches.
> 
Of course it is missing. It is a totally separate story.


^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH 6/6] don't compute pvclock adjustments if we trust the tsc
  2010-04-27 13:40             ` Marcelo Tosatti
@ 2010-04-27 15:11               ` Glauber Costa
  2010-04-27 18:03               ` Avi Kivity
  1 sibling, 0 replies; 25+ messages in thread
From: Glauber Costa @ 2010-04-27 15:11 UTC (permalink / raw)
  To: Marcelo Tosatti; +Cc: kvm, linux-kernel, avi

On Tue, Apr 27, 2010 at 10:40:13AM -0300, Marcelo Tosatti wrote:
> On Mon, Apr 26, 2010 at 01:46:28PM -0400, Glauber Costa wrote:
> > If the HV told us we can fully trust the TSC, skip any
> > correction
> > 
> > Signed-off-by: Glauber Costa <glommer@redhat.com>
> > ---
> >  arch/x86/include/asm/kvm_para.h    |    5 +++++
> >  arch/x86/include/asm/pvclock-abi.h |    1 +
> >  arch/x86/kernel/kvmclock.c         |    3 +++
> >  arch/x86/kernel/pvclock.c          |    4 ++++
> >  4 files changed, 13 insertions(+), 0 deletions(-)
> > 
> > diff --git a/arch/x86/include/asm/kvm_para.h b/arch/x86/include/asm/kvm_para.h
> > index f019f8c..615ebb1 100644
> > --- a/arch/x86/include/asm/kvm_para.h
> > +++ b/arch/x86/include/asm/kvm_para.h
> > @@ -21,6 +21,11 @@
> >   */
> >  #define KVM_FEATURE_CLOCKSOURCE2        3
> >  
> > +/* The last 8 bits are used to indicate how to interpret the flags field
> > + * in pvclock structure. If no bits are set, all flags are ignored.
> > + */
> > +#define KVM_FEATURE_CLOCKSOURCE_STABLE_TSC	0xf8
> > +
> >  #define MSR_KVM_WALL_CLOCK  0x11
> >  #define MSR_KVM_SYSTEM_TIME 0x12
> 
> Please drop this. Its not certain what is the best method to reduce
> contention on the global variable. Whatever it is, can be done later.
You tell me to drop it, avi tells me to write this, I am happy to pick
any one of the two options. Just don't want to get ping-ponged between
haveit-dropit.

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH 4/6] export new cpuid KVM_CAP
  2010-04-27 13:30         ` [PATCH 4/6] export new cpuid KVM_CAP Marcelo Tosatti
  2010-04-27 15:09           ` Glauber Costa
@ 2010-04-27 16:55           ` Glauber Costa
  1 sibling, 0 replies; 25+ messages in thread
From: Glauber Costa @ 2010-04-27 16:55 UTC (permalink / raw)
  To: Marcelo Tosatti; +Cc: kvm, linux-kernel, avi

On Tue, Apr 27, 2010 at 10:30:48AM -0300, Marcelo Tosatti wrote:
> On Mon, Apr 26, 2010 at 01:46:26PM -0400, Glauber Costa wrote:
> > Since we're changing the msrs kvmclock uses, we have to communicate
> > that to the guest, through cpuid. We can add a new KVM_CAP to the
> > hypervisor, and then patch userspace to recognize it.
> > 
> > And if we ever add a new cpuid bit in the future, we have to do that again,
> > which create some complexity and delay in feature adoption.
> > 
> > Instead, what I'm proposing in this patch is a new capability, called
> > KVM_CAP_X86_CPUID_FEATURE_LIST, that returns the current feature list
> > currently supported by the hypervisor. If we ever want to add or remove
> > some feature, we only need to tweak into the HV, leaving userspace untouched.
> > 
> > Signed-off-by: Glauber Costa <glommer@redhat.com>
> > ---
> >  arch/x86/include/asm/kvm_para.h |    4 ++++
> >  arch/x86/kvm/x86.c              |    6 ++++++
> >  include/linux/kvm.h             |    1 +
> >  3 files changed, 11 insertions(+), 0 deletions(-)
> > 
> > diff --git a/arch/x86/include/asm/kvm_para.h b/arch/x86/include/asm/kvm_para.h
> > index 9734808..f019f8c 100644
> > --- a/arch/x86/include/asm/kvm_para.h
> > +++ b/arch/x86/include/asm/kvm_para.h
> > @@ -16,6 +16,10 @@
> >  #define KVM_FEATURE_CLOCKSOURCE		0
> >  #define KVM_FEATURE_NOP_IO_DELAY	1
> >  #define KVM_FEATURE_MMU_OP		2
> > +/* This indicates that the new set of kvmclock msrs
> > + * are available. The use of 0x11 and 0x12 is deprecated
> > + */
> > +#define KVM_FEATURE_CLOCKSOURCE2        3
> >  
> >  #define MSR_KVM_WALL_CLOCK  0x11
> >  #define MSR_KVM_SYSTEM_TIME 0x12
> > diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> > index a2ead7f..04f04aa 100644
> > --- a/arch/x86/kvm/x86.c
> > +++ b/arch/x86/kvm/x86.c
> > @@ -1545,6 +1545,12 @@ int kvm_dev_ioctl_check_extension(long ext)
> >  	case KVM_CAP_MCE:
> >  		r = KVM_MAX_MCE_BANKS;
> >  		break;
> > +	case KVM_CAP_X86_CPUID_FEATURE_LIST:
> > +		r = (1 << KVM_FEATURE_CLOCKSOURCE) |
> > +		(1 << KVM_FEATURE_NOP_IO_DELAY) |
> > +		(1 << KVM_FEATURE_MMU_OP) |
> 
> Remove this flag, it has been deprecated.

Ok, sorry. My bad here.

at a quick glance, I did not realize you were commenting on this
patch specifically. KVM_CAP_X86_CPUID_FEATURE_LIST is a new
entity, so there is no need to return a deprecated feature.

I will remove that.


^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH 2/6] Add a global synchronization point for pvclock
  2010-04-27 13:28     ` [PATCH 2/6] Add a global synchronization point for pvclock Marcelo Tosatti
@ 2010-04-27 18:00       ` Jeremy Fitzhardinge
  0 siblings, 0 replies; 25+ messages in thread
From: Jeremy Fitzhardinge @ 2010-04-27 18:00 UTC (permalink / raw)
  To: Marcelo Tosatti; +Cc: Glauber Costa, kvm, linux-kernel, avi, Zachary Amsden

On 04/27/2010 06:28 AM, Marcelo Tosatti wrote:
>> +	last = atomic64_read(&last_value);
>> +	do {
>> +		if (ret < last)
>> +			return last;
>> +		last = atomic64_cmpxchg(&last_value, last, ret);
>> +	} while (unlikely(last != ret));
>>     
> Wraparound?
>   

If the units nanoseconds, it's good for ~500,000 years of uptime.

    J


^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH 6/6] don't compute pvclock adjustments if we trust the tsc
  2010-04-27 13:40             ` Marcelo Tosatti
  2010-04-27 15:11               ` Glauber Costa
@ 2010-04-27 18:03               ` Avi Kivity
  2010-04-27 18:57                 ` Glauber Costa
  1 sibling, 1 reply; 25+ messages in thread
From: Avi Kivity @ 2010-04-27 18:03 UTC (permalink / raw)
  To: Marcelo Tosatti; +Cc: Glauber Costa, kvm, linux-kernel

On 04/27/2010 04:40 PM, Marcelo Tosatti wrote:
> On Mon, Apr 26, 2010 at 01:46:28PM -0400, Glauber Costa wrote:
>    
>> If the HV told us we can fully trust the TSC, skip any
>> correction
>>
>>      
> Please drop this. Its not certain what is the best method to reduce
> contention on the global variable. Whatever it is, can be done later.
>
>    

The problem is lead time.  If we (or the cpu vendors) fix this in the 
hypervisor/processors in a year, then we want existing guests to take 
advantage of it immediately.

-- 
Do not meddle in the internals of kernels, for they are subtle and quick to panic.


^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH 1/6] Enable pvclock flags in vcpu_time_info structure
  2010-04-26 17:46 ` [PATCH 1/6] Enable pvclock flags in vcpu_time_info structure Glauber Costa
  2010-04-26 17:46   ` [PATCH 2/6] Add a global synchronization point for pvclock Glauber Costa
  2010-04-26 18:11   ` [PATCH 1/6] Enable pvclock flags in vcpu_time_info structure Jeremy Fitzhardinge
@ 2010-04-27 18:07   ` Avi Kivity
  2010-04-27 19:09     ` Glauber Costa
  2 siblings, 1 reply; 25+ messages in thread
From: Avi Kivity @ 2010-04-27 18:07 UTC (permalink / raw)
  To: Glauber Costa; +Cc: kvm, linux-kernel, Jeremy Fitzhardinge

On 04/26/2010 08:46 PM, Glauber Costa wrote:
> This patch removes one padding byte and transform it into a flags
> field. New versions of guests using pvclock will query these flags
> upon each read.
>
> Flags, however, will only be interpreted when the guest decides to.
> It uses the pvclock_valid_flags function to signal that a specific
> set of flags should be taken into consideration. Which flags are valid
> are usually devised via HV negotiation.
>
>
> +void pvclock_valid_flags(u8 flags);
>    

set_pvclock_valid_flags() (or just set_pvclock_flags? 
pvclock_set_flags()?).  The original name looks like a getter.

-- 
Do not meddle in the internals of kernels, for they are subtle and quick to panic.


^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH 4/6] export new cpuid KVM_CAP
  2010-04-26 17:46       ` [PATCH 4/6] export new cpuid KVM_CAP Glauber Costa
  2010-04-26 17:46         ` [PATCH 5/6] Try using new kvm clock msrs Glauber Costa
  2010-04-27 13:30         ` [PATCH 4/6] export new cpuid KVM_CAP Marcelo Tosatti
@ 2010-04-27 18:12         ` Avi Kivity
  2010-04-27 19:09           ` Glauber Costa
  2 siblings, 1 reply; 25+ messages in thread
From: Avi Kivity @ 2010-04-27 18:12 UTC (permalink / raw)
  To: Glauber Costa; +Cc: kvm, linux-kernel

On 04/26/2010 08:46 PM, Glauber Costa wrote:
> Since we're changing the msrs kvmclock uses, we have to communicate
> that to the guest, through cpuid. We can add a new KVM_CAP to the
> hypervisor, and then patch userspace to recognize it.
>
> And if we ever add a new cpuid bit in the future, we have to do that again,
> which create some complexity and delay in feature adoption.
>
> Instead, what I'm proposing in this patch is a new capability, called
> KVM_CAP_X86_CPUID_FEATURE_LIST, that returns the current feature list
> currently supported by the hypervisor. If we ever want to add or remove
> some feature, we only need to tweak into the HV, leaving userspace untouched.
>
> Signed-off-by: Glauber Costa<glommer@redhat.com>
> ---
>   arch/x86/include/asm/kvm_para.h |    4 ++++
>   arch/x86/kvm/x86.c              |    6 ++++++
>   include/linux/kvm.h             |    1 +
>   3 files changed, 11 insertions(+), 0 deletions(-)
>
> diff --git a/arch/x86/include/asm/kvm_para.h b/arch/x86/include/asm/kvm_para.h
> index 9734808..f019f8c 100644
> --- a/arch/x86/include/asm/kvm_para.h
> +++ b/arch/x86/include/asm/kvm_para.h
> @@ -16,6 +16,10 @@
>   #define KVM_FEATURE_CLOCKSOURCE		0
>   #define KVM_FEATURE_NOP_IO_DELAY	1
>   #define KVM_FEATURE_MMU_OP		2
> +/* This indicates that the new set of kvmclock msrs
> + * are available. The use of 0x11 and 0x12 is deprecated
> + */
> +#define KVM_FEATURE_CLOCKSOURCE2        3
>    

Please document this in Documentation/kvm/cpuid.txt (also /msr.txt).

>
>   #define MSR_KVM_WALL_CLOCK  0x11
>   #define MSR_KVM_SYSTEM_TIME 0x12
> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> index a2ead7f..04f04aa 100644
> --- a/arch/x86/kvm/x86.c
> +++ b/arch/x86/kvm/x86.c
> @@ -1545,6 +1545,12 @@ int kvm_dev_ioctl_check_extension(long ext)
>   	case KVM_CAP_MCE:
>   		r = KVM_MAX_MCE_BANKS;
>   		break;
> +	case KVM_CAP_X86_CPUID_FEATURE_LIST:
> +		r = (1<<  KVM_FEATURE_CLOCKSOURCE) |
> +		(1<<  KVM_FEATURE_NOP_IO_DELAY) |
> +		(1<<  KVM_FEATURE_MMU_OP) |
> +		(1<<  KVM_FEATURE_CLOCKSOURCE2);
> +		break;
>   	default:
>   		r = 0;
>   		break;
>    

Hmm.  We already have an API to get cpuid bits: 
KVM_GET_SUPPORTED_CPUID2.  The nice thing about it is that it will mean 
-cpu host will work out of the box.

-- 
Do not meddle in the internals of kernels, for they are subtle and quick to panic.


^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH 6/6] don't compute pvclock adjustments if we trust the tsc
  2010-04-27 18:03               ` Avi Kivity
@ 2010-04-27 18:57                 ` Glauber Costa
  0 siblings, 0 replies; 25+ messages in thread
From: Glauber Costa @ 2010-04-27 18:57 UTC (permalink / raw)
  To: Avi Kivity; +Cc: Marcelo Tosatti, kvm, linux-kernel

On Tue, Apr 27, 2010 at 09:03:55PM +0300, Avi Kivity wrote:
> On 04/27/2010 04:40 PM, Marcelo Tosatti wrote:
> >On Mon, Apr 26, 2010 at 01:46:28PM -0400, Glauber Costa wrote:
> >>If the HV told us we can fully trust the TSC, skip any
> >>correction
> >>
> >Please drop this. Its not certain what is the best method to reduce
> >contention on the global variable. Whatever it is, can be done later.
> >
> 
> The problem is lead time.  If we (or the cpu vendors) fix this in
> the hypervisor/processors in a year, then we want existing guests to
> take advantage of it immediately.
100 % Agreed.

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH 4/6] export new cpuid KVM_CAP
  2010-04-27 18:12         ` Avi Kivity
@ 2010-04-27 19:09           ` Glauber Costa
  2010-04-27 19:20             ` Avi Kivity
  0 siblings, 1 reply; 25+ messages in thread
From: Glauber Costa @ 2010-04-27 19:09 UTC (permalink / raw)
  To: Avi Kivity; +Cc: kvm, linux-kernel

On Tue, Apr 27, 2010 at 09:12:47PM +0300, Avi Kivity wrote:
> On 04/26/2010 08:46 PM, Glauber Costa wrote:
> >Since we're changing the msrs kvmclock uses, we have to communicate
> >that to the guest, through cpuid. We can add a new KVM_CAP to the
> >hypervisor, and then patch userspace to recognize it.
> >
> >And if we ever add a new cpuid bit in the future, we have to do that again,
> >which create some complexity and delay in feature adoption.
> >
> >Instead, what I'm proposing in this patch is a new capability, called
> >KVM_CAP_X86_CPUID_FEATURE_LIST, that returns the current feature list
> >currently supported by the hypervisor. If we ever want to add or remove
> >some feature, we only need to tweak into the HV, leaving userspace untouched.
> >
> >Signed-off-by: Glauber Costa<glommer@redhat.com>
> >---
> >  arch/x86/include/asm/kvm_para.h |    4 ++++
> >  arch/x86/kvm/x86.c              |    6 ++++++
> >  include/linux/kvm.h             |    1 +
> >  3 files changed, 11 insertions(+), 0 deletions(-)
> >
> >diff --git a/arch/x86/include/asm/kvm_para.h b/arch/x86/include/asm/kvm_para.h
> >index 9734808..f019f8c 100644
> >--- a/arch/x86/include/asm/kvm_para.h
> >+++ b/arch/x86/include/asm/kvm_para.h
> >@@ -16,6 +16,10 @@
> >  #define KVM_FEATURE_CLOCKSOURCE		0
> >  #define KVM_FEATURE_NOP_IO_DELAY	1
> >  #define KVM_FEATURE_MMU_OP		2
> >+/* This indicates that the new set of kvmclock msrs
> >+ * are available. The use of 0x11 and 0x12 is deprecated
> >+ */
> >+#define KVM_FEATURE_CLOCKSOURCE2        3
> 
> Please document this in Documentation/kvm/cpuid.txt (also /msr.txt).
> 
> >
> >  #define MSR_KVM_WALL_CLOCK  0x11
> >  #define MSR_KVM_SYSTEM_TIME 0x12
> >diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> >index a2ead7f..04f04aa 100644
> >--- a/arch/x86/kvm/x86.c
> >+++ b/arch/x86/kvm/x86.c
> >@@ -1545,6 +1545,12 @@ int kvm_dev_ioctl_check_extension(long ext)
> >  	case KVM_CAP_MCE:
> >  		r = KVM_MAX_MCE_BANKS;
> >  		break;
> >+	case KVM_CAP_X86_CPUID_FEATURE_LIST:
> >+		r = (1<<  KVM_FEATURE_CLOCKSOURCE) |
> >+		(1<<  KVM_FEATURE_NOP_IO_DELAY) |
> >+		(1<<  KVM_FEATURE_MMU_OP) |
> >+		(1<<  KVM_FEATURE_CLOCKSOURCE2);
> >+		break;
> >  	default:
> >  		r = 0;
> >  		break;
> 
> Hmm.  We already have an API to get cpuid bits:
> KVM_GET_SUPPORTED_CPUID2.  The nice thing about it is that it will
> mean -cpu host will work out of the box.

Ok, from what I understand, KVM_GET_CPUID2 gets a set of features, and tells
userspace which of them are available. Right?

If this is the case, userspace could ask for 0xffffffff, and then we tell them
which of them are present.

Does that make sense?


^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH 1/6] Enable pvclock flags in vcpu_time_info structure
  2010-04-27 18:07   ` Avi Kivity
@ 2010-04-27 19:09     ` Glauber Costa
  0 siblings, 0 replies; 25+ messages in thread
From: Glauber Costa @ 2010-04-27 19:09 UTC (permalink / raw)
  To: Avi Kivity; +Cc: kvm, linux-kernel, Jeremy Fitzhardinge

On Tue, Apr 27, 2010 at 09:07:18PM +0300, Avi Kivity wrote:
> On 04/26/2010 08:46 PM, Glauber Costa wrote:
> >This patch removes one padding byte and transform it into a flags
> >field. New versions of guests using pvclock will query these flags
> >upon each read.
> >
> >Flags, however, will only be interpreted when the guest decides to.
> >It uses the pvclock_valid_flags function to signal that a specific
> >set of flags should be taken into consideration. Which flags are valid
> >are usually devised via HV negotiation.
> >
> >
> >+void pvclock_valid_flags(u8 flags);
> 
> set_pvclock_valid_flags() (or just set_pvclock_flags?
> pvclock_set_flags()?).  The original name looks like a getter.
Ok, will change it.

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH 4/6] export new cpuid KVM_CAP
  2010-04-27 19:09           ` Glauber Costa
@ 2010-04-27 19:20             ` Avi Kivity
  0 siblings, 0 replies; 25+ messages in thread
From: Avi Kivity @ 2010-04-27 19:20 UTC (permalink / raw)
  To: Glauber Costa; +Cc: kvm, linux-kernel

On 04/27/2010 10:09 PM, Glauber Costa wrote:
>
>> Hmm.  We already have an API to get cpuid bits:
>> KVM_GET_SUPPORTED_CPUID2.  The nice thing about it is that it will
>> mean -cpu host will work out of the box.
>>      
> Ok, from what I understand, KVM_GET_CPUID2 gets a set of features, and tells
> userspace which of them are available. Right?
>    

No.  KVM_GET_CPUID2 reads what was set by KVM_SET_CPUID, as modified by 
the guest executing the cpuid instruction.  KVM_GET_SUPPORTED_CPUID 
tells userspace which bits are supported by the host cpu and kvm.

> If this is the case, userspace could ask for 0xffffffff, and then we tell them
> which of them are present.
>
> Does that make sense?
>    

The API for KVM_GET_SUPPORTED_CPUID returns all cpuid leaves supported 
in one go, IIRC.

-- 
Do not meddle in the internals of kernels, for they are subtle and quick to panic.


^ permalink raw reply	[flat|nested] 25+ messages in thread

end of thread, other threads:[~2010-04-27 19:20 UTC | newest]

Thread overview: 25+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2010-04-26 17:46 [PATCH 0/6] pvclock fixes Glauber Costa
2010-04-26 17:46 ` [PATCH 1/6] Enable pvclock flags in vcpu_time_info structure Glauber Costa
2010-04-26 17:46   ` [PATCH 2/6] Add a global synchronization point for pvclock Glauber Costa
2010-04-26 17:46     ` [PATCH 3/6] change msr numbers for kvmclock Glauber Costa
2010-04-26 17:46       ` [PATCH 4/6] export new cpuid KVM_CAP Glauber Costa
2010-04-26 17:46         ` [PATCH 5/6] Try using new kvm clock msrs Glauber Costa
2010-04-26 17:46           ` [PATCH 6/6] don't compute pvclock adjustments if we trust the tsc Glauber Costa
2010-04-27 13:40             ` Marcelo Tosatti
2010-04-27 15:11               ` Glauber Costa
2010-04-27 18:03               ` Avi Kivity
2010-04-27 18:57                 ` Glauber Costa
2010-04-27 13:35           ` [PATCH 5/6] Try using new kvm clock msrs Marcelo Tosatti
2010-04-27 13:30         ` [PATCH 4/6] export new cpuid KVM_CAP Marcelo Tosatti
2010-04-27 15:09           ` Glauber Costa
2010-04-27 16:55           ` Glauber Costa
2010-04-27 18:12         ` Avi Kivity
2010-04-27 19:09           ` Glauber Costa
2010-04-27 19:20             ` Avi Kivity
2010-04-27 13:28     ` [PATCH 2/6] Add a global synchronization point for pvclock Marcelo Tosatti
2010-04-27 18:00       ` Jeremy Fitzhardinge
2010-04-26 18:11   ` [PATCH 1/6] Enable pvclock flags in vcpu_time_info structure Jeremy Fitzhardinge
2010-04-26 18:45     ` Glauber Costa
2010-04-27 18:07   ` Avi Kivity
2010-04-27 19:09     ` Glauber Costa
2010-04-27  2:21 ` [PATCH 0/6] pvclock fixes Zachary Amsden

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.