* [PATCH 0/2] Automatically grab wallclock time updates from hypervisor
@ 2009-09-01 11:50 Glauber Costa
  2009-09-01 11:50 ` [PATCH 1/2] keep guest wallclock in sync with host clock Glauber Costa
  0 siblings, 1 reply; 11+ messages in thread
From: Glauber Costa @ 2009-09-01 11:50 UTC (permalink / raw)
  To: kvm; +Cc: linux-kernel, avi

Hi folks,

This series introduces a worker, fired by kvmclock, that periodically updates the
guest wallclock to reflect changes in the host's wallclock. With it, a large pool
of VMs no longer needs to run NTP in every guest.

The worker runs at a configurable interval, with a minimum granularity of 1
second. So although the MSR write needed to fetch an updated wallclock value is
not exactly cheap, it won't place a heavy burden on the system.

The worker can also be disabled completely when this behaviour is undesirable
in a given scenario.

diffstat follows:

 arch/x86/include/asm/kvm_para.h |    6 +++
 arch/x86/kernel/kvmclock.c      |   77 ++++++++++++++++++++++++++++++++++-----
 kernel/sysctl.c                 |   13 +++++++
 3 files changed, 87 insertions(+), 9 deletions(-)
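
The interval semantics described above (whole seconds, 0 meaning disabled) can be
modelled in userspace roughly as follows. This is a sketch, not the patch's code:
next_update_delay() is a hypothetical helper, and HZ, a kernel build option, is
passed in as a parameter. The kernel's timespec_to_jiffies() may round slightly
differently from the plain multiplication used here.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Hypothetical userspace model of schedule_next_update() from patch 1/2.
 * Returns false when the worker is disabled (interval 0); otherwise stores
 * the delay until the next run, in scheduler ticks, in *delay_ticks.
 */
static bool next_update_delay(unsigned int interval_sec, unsigned int hz,
                              unsigned long *delay_ticks)
{
	if (interval_sec == 0)
		return false;		/* disabled: nothing gets queued */

	/* Whole seconds only: the granularity is 1 second. */
	*delay_ticks = (unsigned long)interval_sec * hz;
	return true;
}
```

With HZ=250 and the default interval of 5 seconds, the next update would be queued
about 1250 ticks out.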



* [PATCH 1/2] keep guest wallclock in sync with host clock
  2009-09-01 11:50 [PATCH 0/2] Automatically grab wallclock time updates from hypervisor Glauber Costa
@ 2009-09-01 11:50 ` Glauber Costa
  2009-09-01 11:50   ` [PATCH 2/2] add sysctl for kvm wallclock sync Glauber Costa
  2009-09-02 11:44   ` [PATCH 1/2] keep guest wallclock in sync with host clock Avi Kivity
  0 siblings, 2 replies; 11+ messages in thread
From: Glauber Costa @ 2009-09-01 11:50 UTC (permalink / raw)
  To: kvm; +Cc: linux-kernel, avi

kvmclock is great for avoiding drift in guest VMs running on top of KVM.
However, the current mechanism does not propagate changes in the host's
wallclock value to the guest. In practice, this means that in a large pool of
VMs that need accurate timing, every guest has to run NTP, instead of just the
host.

Since the host updates the information in the shared memory area upon MSR
writes, this patch introduces a worker that writes to that MSR and calls
do_settimeofday() at fixed intervals, with one-second resolution. An interval
of 0 means we are not interested in this behaviour. A later patch will make
this optional at runtime.

Signed-off-by: Glauber Costa <glommer@redhat.com>
---
 arch/x86/kernel/kvmclock.c |   62 +++++++++++++++++++++++++++++++++++++------
 1 files changed, 53 insertions(+), 9 deletions(-)
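
kvm_get_wall_ts() in the diff hands the physical address of wall_clock to the
host by writing MSR_KVM_WALL_CLOCK, and wrmsr takes its 64-bit operand as two
32-bit halves (EAX holds the low half, EDX the high half). A minimal userspace
sketch of that split; split_msr_value() and join_msr_value() are hypothetical
helper names, not kernel functions.

```c
#include <assert.h>
#include <stdint.h>

/* Split a 64-bit MSR value into the EAX/EDX halves wrmsr expects. */
static void split_msr_value(uint64_t val, uint32_t *low, uint32_t *high)
{
	*low  = (uint32_t)val;		/* bits 31..0  -> EAX */
	*high = (uint32_t)(val >> 32);	/* bits 63..32 -> EDX */
}

/* Inverse operation, as the host side would reassemble the address. */
static uint64_t join_msr_value(uint32_t low, uint32_t high)
{
	return ((uint64_t)high << 32) | low;
}
```

The patch stores the low half in an int rather than a uint32_t; the bit pattern
passed to native_write_msr() is the same either way.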

diff --git a/arch/x86/kernel/kvmclock.c b/arch/x86/kernel/kvmclock.c
index e5efcdc..fc409e9 100644
--- a/arch/x86/kernel/kvmclock.c
+++ b/arch/x86/kernel/kvmclock.c
@@ -27,6 +27,7 @@
 #define KVM_SCALE 22
 
 static int kvmclock = 1;
+static unsigned int kvm_wall_update_interval = 5;
 
 static int parse_no_kvmclock(char *arg)
 {
@@ -39,24 +40,67 @@ early_param("no-kvmclock", parse_no_kvmclock);
 static DEFINE_PER_CPU_SHARED_ALIGNED(struct pvclock_vcpu_time_info, hv_clock);
 static struct pvclock_wall_clock wall_clock;
 
-/*
- * The wallclock is the time of day when we booted. Since then, some time may
- * have elapsed since the hypervisor wrote the data. So we try to account for
- * that with system time
- */
-static unsigned long kvm_get_wallclock(void)
+static void kvm_get_wall_ts(struct timespec *ts)
 {
-	struct pvclock_vcpu_time_info *vcpu_time;
-	struct timespec ts;
 	int low, high;
+	struct pvclock_vcpu_time_info *vcpu_time;
 
 	low = (int)__pa_symbol(&wall_clock);
 	high = ((u64)__pa_symbol(&wall_clock) >> 32);
 	native_write_msr(MSR_KVM_WALL_CLOCK, low, high);
 
 	vcpu_time = &get_cpu_var(hv_clock);
-	pvclock_read_wallclock(&wall_clock, vcpu_time, &ts);
+	pvclock_read_wallclock(&wall_clock, vcpu_time, ts);
 	put_cpu_var(hv_clock);
+}
+
+static void kvm_sync_wall_clock(struct work_struct *work);
+static DECLARE_DELAYED_WORK(kvm_sync_wall_work, kvm_sync_wall_clock);
+
+static void schedule_next_update(void)
+{
+	struct timespec next;
+
+	if (kvm_wall_update_interval == 0)
+		return;
+
+	next.tv_sec = kvm_wall_update_interval;
+	next.tv_nsec = 0;
+
+	schedule_delayed_work(&kvm_sync_wall_work, timespec_to_jiffies(&next));
+}
+
+static void kvm_sync_wall_clock(struct work_struct *work)
+{
+	struct timespec now;
+
+	kvm_get_wall_ts(&now);
+
+	do_settimeofday(&now);
+	schedule_next_update();
+}
+
+static __init int init_updates(void)
+{
+	schedule_next_update();
+	return 0;
+}
+/*
+ * It has to be run after workqueues are initialized, since we call
+ * schedule_delayed_work. Other than that, we have no specific requirements
+ */
+late_initcall(init_updates);
+
+/*
+ * The wallclock is the time of day when we booted. Since then, some time may
+ * have elapsed since the hypervisor wrote the data. So we try to account for
+ * that with system time
+ */
+static unsigned long kvm_get_wallclock(void)
+{
+	struct timespec ts;
+
+	kvm_get_wall_ts(&ts);
 
 	return ts.tv_sec;
 }
-- 
1.6.2.2
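
The comment on kvm_get_wallclock() describes the guest's wall time as the wall
time at boot plus the system time elapsed since boot. A userspace sketch of that
arithmetic follows; the struct and function names are hypothetical stand-ins,
and the version-field retry loop that the real pvclock_read_wallclock() uses to
get a consistent snapshot of the shared structure is left out.

```c
#include <assert.h>
#include <stdint.h>

#define NSEC_PER_SEC 1000000000ULL

/* Hypothetical stand-ins for struct pvclock_wall_clock and struct timespec. */
struct boot_wallclock { uint32_t sec; uint32_t nsec; };
struct wall_ts { uint64_t sec; uint32_t nsec; };

/* Wall time now = wall time at boot + system time elapsed since boot. */
static struct wall_ts wallclock_now(struct boot_wallclock boot,
                                    uint64_t ns_since_boot)
{
	uint64_t total = (uint64_t)boot.sec * NSEC_PER_SEC + boot.nsec
	                 + ns_since_boot;
	struct wall_ts ts = {
		.sec  = total / NSEC_PER_SEC,			/* whole seconds */
		.nsec = (uint32_t)(total % NSEC_PER_SEC),	/* normalized remainder */
	};
	return ts;
}
```

For example, a boot time of 1251800000.9 s plus 2.5 s of system time yields
1251800003 s and 400000000 ns.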



* [PATCH 2/2] add sysctl for kvm wallclock sync
  2009-09-01 11:50 ` [PATCH 1/2] keep guest wallclock in sync with host clock Glauber Costa
@ 2009-09-01 11:50   ` Glauber Costa
  2009-09-02  6:54     ` Chris Lalancette
  2009-09-02 11:44   ` [PATCH 1/2] keep guest wallclock in sync with host clock Avi Kivity
  1 sibling, 1 reply; 11+ messages in thread
From: Glauber Costa @ 2009-09-01 11:50 UTC (permalink / raw)
  To: kvm; +Cc: linux-kernel, avi

This patch introduces a new sysctl called kvm_sync_wallclock.

It controls the behaviour of the worker that updates the guest wallclock time.
If the value is greater than zero, the worker fires periodically, using the
value as its period in seconds; if it is zero, the worker does not fire at all.

Signed-off-by: Glauber Costa <glommer@redhat.com>
---
 arch/x86/include/asm/kvm_para.h |    6 ++++++
 arch/x86/kernel/kvmclock.c      |   17 ++++++++++++++++-
 kernel/sysctl.c                 |   13 +++++++++++++
 3 files changed, 35 insertions(+), 1 deletions(-)
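
With this patch applied, the interval should be tunable from userspace through
the kern_table entry added to kernel/sysctl.c, which would appear as
/proc/sys/kernel/kvm_sync_wallclock (only on a kernel carrying this patch and
built with CONFIG_KVM_CLOCK). A small C sketch of reading and writing it; the
helper names are hypothetical, and the path is a parameter so the helpers can be
tried against an ordinary file.

```c
#include <assert.h>
#include <stdio.h>

/* Path implied by the kern_table entry below (assumes this patch is applied). */
#define KVM_SYNC_SYSCTL "/proc/sys/kernel/kvm_sync_wallclock"

/* Write a new interval in seconds (0 disables the worker). Returns 0 on success. */
static int set_sync_interval(const char *path, unsigned int seconds)
{
	FILE *f = fopen(path, "w");
	if (!f)
		return -1;
	int ok = fprintf(f, "%u\n", seconds) > 0;
	if (fclose(f) != 0)	/* sysctl handlers report errors at write/close time */
		ok = 0;
	return ok ? 0 : -1;
}

/* Read the current interval. Returns 0 on success. */
static int get_sync_interval(const char *path, unsigned int *seconds)
{
	FILE *f = fopen(path, "r");
	if (!f)
		return -1;
	int ok = fscanf(f, "%u", seconds) == 1;
	fclose(f);
	return ok ? 0 : -1;
}
```

On a patched guest, set_sync_interval(KVM_SYNC_SYSCTL, 0) would stop the worker
(writing requires root, given the 0644 mode).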

diff --git a/arch/x86/include/asm/kvm_para.h b/arch/x86/include/asm/kvm_para.h
index b8a3305..3a3f38f 100644
--- a/arch/x86/include/asm/kvm_para.h
+++ b/arch/x86/include/asm/kvm_para.h
@@ -47,8 +47,14 @@ struct kvm_mmu_op_release_pt {
 
 #ifdef __KERNEL__
 #include <asm/processor.h>
+#include <linux/sysctl.h>
 
 extern void kvmclock_init(void);
+extern unsigned int kvm_wall_update_interval;
+extern int kvm_sync_wall_handler(struct ctl_table *table, int write,
+                          struct file *filp, void __user *buffer,
+                          size_t *lenp, loff_t *ppos);
+
 
 
 /* This instruction is vmcall.  On non-VT architectures, it will generate a
diff --git a/arch/x86/kernel/kvmclock.c b/arch/x86/kernel/kvmclock.c
index fc409e9..3a4e1bd 100644
--- a/arch/x86/kernel/kvmclock.c
+++ b/arch/x86/kernel/kvmclock.c
@@ -27,7 +27,7 @@
 #define KVM_SCALE 22
 
 static int kvmclock = 1;
-static unsigned int kvm_wall_update_interval = 5;
+unsigned int kvm_wall_update_interval = 5;
 
 static int parse_no_kvmclock(char *arg)
 {
@@ -91,6 +91,21 @@ static __init int init_updates(void)
  */
 late_initcall(init_updates);
 
+int kvm_sync_wall_handler(struct ctl_table *table, int write,
+	                  struct file *filp, void __user *buffer,
+			  size_t *lenp, loff_t *ppos)
+{
+	int ret  = proc_dointvec_minmax(table, write, filp, buffer, lenp, ppos);
+
+	if (ret || !write)
+		return ret;
+
+	cancel_delayed_work_sync(&kvm_sync_wall_work);
+
+	schedule_next_update();
+	return 0;
+}
+
 /*
  * The wallclock is the time of day when we booted. Since then, some time may
  * have elapsed since the hypervisor wrote the data. So we try to account for
diff --git a/kernel/sysctl.c b/kernel/sysctl.c
index 98e0232..b787c81 100644
--- a/kernel/sysctl.c
+++ b/kernel/sysctl.c
@@ -51,6 +51,7 @@
 #include <linux/ftrace.h>
 #include <linux/slow-work.h>
 #include <linux/perf_counter.h>
+#include <linux/kvm_para.h>
 
 #include <asm/uaccess.h>
 #include <asm/processor.h>
@@ -989,6 +990,18 @@ static struct ctl_table kern_table[] = {
 		.proc_handler	= &proc_dointvec,
 	},
 #endif
+#ifdef CONFIG_KVM_CLOCK
+	{
+		.ctl_name	= CTL_UNNUMBERED,
+		.procname	= "kvm_sync_wallclock",
+		.data		= &kvm_wall_update_interval,
+		.maxlen		= sizeof(kvm_wall_update_interval),
+		.mode		= 0644,
+		.proc_handler	= &kvm_sync_wall_handler,
+		.strategy	= &sysctl_intvec,
+		.extra1		= &zero,
+	},
+#endif
 
 /*
  * NOTE: do not add new entries to this table unless you have read
-- 
1.6.2.2
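
kvm_sync_wall_handler() validates the new value (.extra1 = &zero makes
proc_dointvec_minmax() reject negative input), cancels any pending update and
reschedules with the new interval. A userspace model of that sequence, assuming
a simplified work item with a due time in seconds; none of these names exist in
the kernel.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical model of the delayed work item. */
struct sync_work {
	unsigned int interval;	/* kvm_wall_update_interval, in seconds */
	bool pending;		/* is an update queued? */
	unsigned long due;	/* when the queued update would fire */
};

/* Mirrors schedule_next_update(): queue nothing when the interval is 0. */
static void schedule_next(struct sync_work *w, unsigned long now)
{
	if (w->interval == 0)
		return;
	w->pending = true;
	w->due = now + w->interval;
}

/* Mirrors kvm_sync_wall_handler() on a write: validate, cancel, reschedule. */
static int sysctl_write(struct sync_work *w, int value, unsigned long now)
{
	if (value < 0)
		return -EINVAL;		/* rejected before the value is stored */
	w->interval = (unsigned int)value;
	w->pending = false;		/* cancel_delayed_work_sync() */
	schedule_next(w, now);
	return 0;
}
```

Cancelling before rescheduling means a shorter interval takes effect right away
instead of after the previously queued (possibly longer) delay expires.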



* Re: [PATCH 2/2] add sysctl for kvm wallclock sync
  2009-09-01 11:50   ` [PATCH 2/2] add sysctl for kvm wallclock sync Glauber Costa
@ 2009-09-02  6:54     ` Chris Lalancette
  2009-09-02 11:31       ` Glauber Costa
  0 siblings, 1 reply; 11+ messages in thread
From: Chris Lalancette @ 2009-09-02  6:54 UTC (permalink / raw)
  To: Glauber Costa; +Cc: kvm, linux-kernel, avi

Glauber Costa wrote:
> diff --git a/arch/x86/kernel/kvmclock.c b/arch/x86/kernel/kvmclock.c
> index fc409e9..3a4e1bd 100644
> --- a/arch/x86/kernel/kvmclock.c
> +++ b/arch/x86/kernel/kvmclock.c
> @@ -27,7 +27,7 @@
>  #define KVM_SCALE 22
>  
>  static int kvmclock = 1;
> -static unsigned int kvm_wall_update_interval = 5;
> +unsigned int kvm_wall_update_interval = 5;

I think the overall idea is very interesting, but I also think that it should be
disabled by default.  Because of the problems with time in virtualization,
people are already conditioned to run ntpd inside their guests, and this
kvmclock change will "fight" with ntpd.  Also, the command "# date 09091323" (or
whatever) ceases to work like it does on bare-metal, so I think it has to be an
opt-in feature.

-- 
Chris Lalancette


* Re: [PATCH 2/2] add sysctl for kvm wallclock sync
  2009-09-02  6:54     ` Chris Lalancette
@ 2009-09-02 11:31       ` Glauber Costa
  2009-09-02 11:40         ` Avi Kivity
  0 siblings, 1 reply; 11+ messages in thread
From: Glauber Costa @ 2009-09-02 11:31 UTC (permalink / raw)
  To: Chris Lalancette; +Cc: kvm, linux-kernel, avi

On Wed, Sep 02, 2009 at 08:54:37AM +0200, Chris Lalancette wrote:
> Glauber Costa wrote:
> > diff --git a/arch/x86/kernel/kvmclock.c b/arch/x86/kernel/kvmclock.c
> > index fc409e9..3a4e1bd 100644
> > --- a/arch/x86/kernel/kvmclock.c
> > +++ b/arch/x86/kernel/kvmclock.c
> > @@ -27,7 +27,7 @@
> >  #define KVM_SCALE 22
> >  
> >  static int kvmclock = 1;
> > -static unsigned int kvm_wall_update_interval = 5;
> > +unsigned int kvm_wall_update_interval = 5;
> 
> I think the overall idea is very interesting, but I also think that it should be
> disabled by default.  Because of the problems with time in virtualization,
> people are already conditioned to run ntpd inside their guests, and this
> kvmclock change will "fight" with ntpd.  Also, the command "# date 09091323" (or
> whatever) ceases to work like it does on bare-metal, so I think it has to be an
> opt-in feature.
I don't disagree.

Actually, I thought about that myself a few hours after I sent the patch. 
Avi, do you have a word on that?


* Re: [PATCH 2/2] add sysctl for kvm wallclock sync
  2009-09-02 11:31       ` Glauber Costa
@ 2009-09-02 11:40         ` Avi Kivity
  0 siblings, 0 replies; 11+ messages in thread
From: Avi Kivity @ 2009-09-02 11:40 UTC (permalink / raw)
  To: Glauber Costa; +Cc: Chris Lalancette, kvm, linux-kernel

On 09/02/2009 02:31 PM, Glauber Costa wrote:
>> I think the overall idea is very interesting, but I also think that it should be
>> disabled by default.  Because of the problems with time in virtualization,
>> people are already conditioned to run ntpd inside their guests, and this
>> kvmclock change will "fight" with ntpd.  Also, the command "# date 09091323" (or
>> whatever) ceases to work like it does on bare-metal, so I think it has to be an
>> opt-in feature.
>>      
> I don't disagree.
>
> Actually, I thought about that myself a few hours after I sent the patch.
> Avi, do you have a word on that?
>    

Chris' arguments are compelling IMO.

-- 
error compiling committee.c: too many arguments to function



* Re: [PATCH 1/2] keep guest wallclock in sync with host clock
  2009-09-01 11:50 ` [PATCH 1/2] keep guest wallclock in sync with host clock Glauber Costa
  2009-09-01 11:50   ` [PATCH 2/2] add sysctl for kvm wallclock sync Glauber Costa
@ 2009-09-02 11:44   ` Avi Kivity
  2009-09-02 12:21     ` Glauber Costa
  1 sibling, 1 reply; 11+ messages in thread
From: Avi Kivity @ 2009-09-02 11:44 UTC (permalink / raw)
  To: Glauber Costa; +Cc: kvm, linux-kernel

On 09/01/2009 02:50 PM, Glauber Costa wrote:
> +
> +static void kvm_sync_wall_clock(struct work_struct *work)
> +{
> +	struct timespec now;
> +
> +	kvm_get_wall_ts(&now);
>    

What happens if we schedule here?

> +
> +	do_settimeofday(&now);
> +	schedule_next_update();
> +}
> +
> +static __init int init_updates(void)
> +{
> +	schedule_next_update();
> +	return 0;
> +}
> +/*
> + * It has to be run after workqueues are initialized, since we call
> + * schedule_delayed_work. Other than that, we have no specific requirements
> + */
> +late_initcall(init_updates);
>    

Should this run on bare metal too?

-- 
error compiling committee.c: too many arguments to function
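
On the bare-metal question: init_updates() is registered unconditionally, so on
real hardware the worker would still try to write MSR_KVM_WALL_CLOCK, which
bare-metal CPUs do not implement. One possible guard is to bail out unless the
CPU reports the KVM hypervisor signature at CPUID leaf 0x40000000, which is
essentially what kvm_para_available() checks in kernels of this era (a real fix
would presumably also honour the clocksource feature bit and the no-kvmclock
parameter). A userspace sketch of that detection, assuming GCC or Clang's
<cpuid.h> on x86; signature_is_kvm() and running_on_kvm() are hypothetical names.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>
#if defined(__i386__) || defined(__x86_64__)
#include <cpuid.h>
#endif

#define KVM_CPUID_SIGNATURE 0x40000000u

/* The hypervisor leaf returns a 12-byte vendor signature in EBX, ECX, EDX. */
static int signature_is_kvm(uint32_t ebx, uint32_t ecx, uint32_t edx)
{
	char sig[13];
	memcpy(sig + 0, &ebx, 4);
	memcpy(sig + 4, &ecx, 4);
	memcpy(sig + 8, &edx, 4);
	sig[12] = '\0';
	return strcmp(sig, "KVMKVMKVM") == 0;	/* KVM pads with three NULs */
}

/* Returns 1 when running as a KVM guest, 0 otherwise (always 0 off x86). */
static int running_on_kvm(void)
{
#if defined(__i386__) || defined(__x86_64__)
	unsigned int eax, ebx, ecx, edx;
	__cpuid(KVM_CPUID_SIGNATURE, eax, ebx, ecx, edx);
	(void)eax;
	return signature_is_kvm(ebx, ecx, edx);
#else
	return 0;
#endif
}
```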



* Re: [PATCH 1/2] keep guest wallclock in sync with host clock
  2009-09-02 11:44   ` [PATCH 1/2] keep guest wallclock in sync with host clock Avi Kivity
@ 2009-09-02 12:21     ` Glauber Costa
  2009-09-02 12:24       ` Avi Kivity
  0 siblings, 1 reply; 11+ messages in thread
From: Glauber Costa @ 2009-09-02 12:21 UTC (permalink / raw)
  To: Avi Kivity; +Cc: kvm, linux-kernel

On Wed, Sep 02, 2009 at 02:44:11PM +0300, Avi Kivity wrote:
> On 09/01/2009 02:50 PM, Glauber Costa wrote:
>> +
>> +static void kvm_sync_wall_clock(struct work_struct *work)
>> +{
>> +	struct timespec now;
>> +
>> +	kvm_get_wall_ts(&now);
>>    
>
> What happens if we schedule here?
hummm, I guess disabling preemption would be enough to make us safe here?

>
>> +
>> +	do_settimeofday(&now);
>> +	schedule_next_update();
>> +}
>> +
>> +static __init int init_updates(void)
>> +{
>> +	schedule_next_update();
>> +	return 0;
>> +}
>> +/*
>> + * It has to be run after workqueues are initialized, since we call
>> + * schedule_delayed_work. Other than that, we have no specific requirements
>> + */
>> +late_initcall(init_updates);
>>    
>
> Should this run on bare metal too?


* Re: [PATCH 1/2] keep guest wallclock in sync with host clock
  2009-09-02 12:21     ` Glauber Costa
@ 2009-09-02 12:24       ` Avi Kivity
  2009-09-02 12:48         ` Glauber Costa
  0 siblings, 1 reply; 11+ messages in thread
From: Avi Kivity @ 2009-09-02 12:24 UTC (permalink / raw)
  To: Glauber Costa; +Cc: kvm, linux-kernel

On 09/02/2009 03:21 PM, Glauber Costa wrote:
>
>>> +static void kvm_sync_wall_clock(struct work_struct *work)
>>> +{
>>> +	struct timespec now;
>>> +
>>> +	kvm_get_wall_ts(&now);
>>>
>>>        
>> What happens if we schedule here?
>>      
> hummm, I guess disabling preemption would be enough to make us safe here?
>    

You can't prevent host preemption.  You might read kvmclock again and 
repeat if too much time has passed.


-- 
error compiling committee.c: too many arguments to function



* Re: [PATCH 1/2] keep guest wallclock in sync with host clock
  2009-09-02 12:24       ` Avi Kivity
@ 2009-09-02 12:48         ` Glauber Costa
  2009-09-02 12:56           ` Avi Kivity
  0 siblings, 1 reply; 11+ messages in thread
From: Glauber Costa @ 2009-09-02 12:48 UTC (permalink / raw)
  To: Avi Kivity; +Cc: kvm, linux-kernel

On Wed, Sep 02, 2009 at 03:24:26PM +0300, Avi Kivity wrote:
> On 09/02/2009 03:21 PM, Glauber Costa wrote:
>>
>>>> +static void kvm_sync_wall_clock(struct work_struct *work)
>>>> +{
>>>> +	struct timespec now;
>>>> +
>>>> +	kvm_get_wall_ts(&now);
>>>>
>>>>        
>>> What happens if we schedule here?
>>>      
>> hummm, I guess disabling preemption would be enough to make us safe here?
>>    
>
> You can't prevent host preemption.  You might read kvmclock again and  
> repeat if too much time has passed.
But then you can be scheduled after you did settimeofday, but before reading
kvmclock again. Since we're aiming for periodic adjustments here,
any discrepancies should not last long, so we can maybe live with it.



* Re: [PATCH 1/2] keep guest wallclock in sync with host clock
  2009-09-02 12:48         ` Glauber Costa
@ 2009-09-02 12:56           ` Avi Kivity
  0 siblings, 0 replies; 11+ messages in thread
From: Avi Kivity @ 2009-09-02 12:56 UTC (permalink / raw)
  To: Glauber Costa; +Cc: kvm, linux-kernel

On 09/02/2009 03:48 PM, Glauber Costa wrote:
>
>> You can't prevent host preemption.  You might read kvmclock again and
>> repeat if too much time has passed.
>>      
> But then you can be scheduled after you did settimeofday, but before reading
> kvmclock again. Since we're aiming for periodic adjustments here,
> any discrepancies should not last long, so we can maybe live with it.
>    

do {
     read_kvmclock
     settimeofday
     read_kvmclock
} while the_difference_between_the_two_reads_is_too_large


-- 
error compiling committee.c: too many arguments to function
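
Avi's loop above can be made concrete as follows. This is a sketch under stated
assumptions: read_clock and set_clock stand in for kvm_get_wall_ts() and
do_settimeofday(), max_skew_ns and max_tries are hypothetical tuning knobs, and
the fake clock exists only to exercise the loop. Because the second read happens
after the clock is set, it also catches the case Glauber raised, where the vCPU
is descheduled between settimeofday and the re-read.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical hooks; times are in nanoseconds. */
typedef uint64_t (*read_clock_fn)(void *ctx);
typedef void (*set_clock_fn)(void *ctx, uint64_t ns);

/* Returns the attempt that succeeded, or 0 if every attempt was disturbed. */
static unsigned int sync_with_retry(read_clock_fn read_clock,
                                    set_clock_fn set_clock, void *ctx,
                                    uint64_t max_skew_ns,
                                    unsigned int max_tries)
{
	for (unsigned int i = 1; i <= max_tries; i++) {
		uint64_t before = read_clock(ctx);
		set_clock(ctx, before);
		uint64_t after = read_clock(ctx);
		/* A large gap means we were preempted around the update, so
		 * the value just set is stale: try again. */
		if (after - before <= max_skew_ns)
			return i;
	}
	return 0;
}

/* Fake clock: each read advances time by the next entry of 'steps'. */
struct fake_clock {
	uint64_t now;
	const uint64_t *steps;
	unsigned int n, next;
	uint64_t last_set;
};

static uint64_t fake_read(void *ctx)
{
	struct fake_clock *c = ctx;
	if (c->next < c->n)
		c->now += c->steps[c->next++];
	return c->now;
}

static void fake_set(void *ctx, uint64_t ns)
{
	((struct fake_clock *)ctx)->last_set = ns;
}
```

Bounding the number of attempts keeps a heavily overcommitted host from pinning
the worker in the loop; a failed round simply waits for the next periodic update.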



Thread overview: 11+ messages
2009-09-01 11:50 [PATCH 0/2] Automatically grab wallclock time updates from hypervisor Glauber Costa
2009-09-01 11:50 ` [PATCH 1/2] keep guest wallclock in sync with host clock Glauber Costa
2009-09-01 11:50   ` [PATCH 2/2] add sysctl for kvm wallclock sync Glauber Costa
2009-09-02  6:54     ` Chris Lalancette
2009-09-02 11:31       ` Glauber Costa
2009-09-02 11:40         ` Avi Kivity
2009-09-02 11:44   ` [PATCH 1/2] keep guest wallclock in sync with host clock Avi Kivity
2009-09-02 12:21     ` Glauber Costa
2009-09-02 12:24       ` Avi Kivity
2009-09-02 12:48         ` Glauber Costa
2009-09-02 12:56           ` Avi Kivity
