* [RFC 5/7] kvm steal time implementation
2010-08-25 21:43 ` [RFC 4/7] change kernel accounting to include steal time Glauber Costa
@ 2010-08-25 21:43 ` Glauber Costa
2010-08-25 21:43 ` [RFC 6/7] touch softlockup watchdog Glauber Costa
2010-08-26 22:13 ` [RFC 5/7] kvm steal time implementation Rik van Riel
2010-08-26 17:23 ` [RFC 4/7] change kernel accounting to include steal time Marcelo Tosatti
` (2 subsequent siblings)
3 siblings, 2 replies; 40+ messages in thread
From: Glauber Costa @ 2010-08-25 21:43 UTC (permalink / raw)
To: kvm; +Cc: avi, zamsden, mtosatti, riel
This is the proposed kvm-side steal time implementation.
It is migration safe, as it checks flags at every read.
Signed-off-by: Glauber Costa <glommer@redhat.com>
---
arch/x86/kernel/kvmclock.c | 35 +++++++++++++++++++++++++++++++++++
1 files changed, 35 insertions(+), 0 deletions(-)
diff --git a/arch/x86/kernel/kvmclock.c b/arch/x86/kernel/kvmclock.c
index eb9b76c..a1f4852 100644
--- a/arch/x86/kernel/kvmclock.c
+++ b/arch/x86/kernel/kvmclock.c
@@ -18,6 +18,8 @@
#include <linux/clocksource.h>
#include <linux/kvm_para.h>
+#include <linux/kernel_stat.h>
+#include <linux/sched.h>
#include <asm/pvclock.h>
#include <asm/msr.h>
#include <asm/apic.h>
@@ -41,6 +43,7 @@ early_param("no-kvmclock", parse_no_kvmclock);
/* The hypervisor will put information about time periodically here */
static DEFINE_PER_CPU_SHARED_ALIGNED(struct pvclock_vcpu_time_info, hv_clock);
+static DEFINE_PER_CPU(u64, steal_info);
static struct pvclock_wall_clock wall_clock;
/*
@@ -82,6 +85,32 @@ static cycle_t kvm_clock_read(void)
return ret;
}
+static DEFINE_PER_CPU(u64, steal_info);
+
+cputime_t kvm_get_steal_time(void)
+{
+ u64 delta = 0;
+ u64 *last_steal_info, this_steal_info;
+ struct pvclock_vcpu_time_info *src;
+
+ src = &get_cpu_var(hv_clock);
+ if (!(src->flags & PVCLOCK_STEAL_BIT))
+ goto out;
+
+ this_steal_info = src->steal_time;
+ put_cpu_var(hv_clock);
+
+ last_steal_info = &get_cpu_var(steal_info);
+
+ delta = this_steal_info - *last_steal_info;
+
+ *last_steal_info = this_steal_info;
+ put_cpu_var(steal_info);
+
+out:
+ return msecs_to_cputime(delta);
+}
+
static cycle_t kvm_clock_get_cycles(struct clocksource *cs)
{
return kvm_clock_read();
@@ -134,6 +163,8 @@ static int kvm_register_clock(char *txt)
printk(KERN_INFO "kvm-clock: cpu %d, msr %x:%x, %s\n",
cpu, high, low, txt);
+ per_cpu(steal_info, cpu) = 0;
+
return native_write_msr_safe(msr_kvm_system_time, low, high);
}
@@ -218,4 +249,8 @@ void __init kvmclock_init(void)
if (kvm_para_has_feature(KVM_FEATURE_CLOCKSOURCE_STABLE_BIT))
pvclock_set_flags(PVCLOCK_TSC_STABLE_BIT);
+
+
+ if (kvm_para_has_feature(KVM_FEATURE_CLOCKSOURCE_STEAL_BIT))
+ hypervisor_steal_time = kvm_get_steal_time;
}
--
1.6.2.2
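The core of kvm_get_steal_time() above is an accumulator pattern: read the hypervisor's cumulative steal counter, report only the delta since the last read, and remember the new value. A minimal user-space sketch of that pattern follows; the names are illustrative stand-ins, not the kernel's per-cpu machinery.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hedged sketch of the delta-accumulator pattern used by
 * kvm_get_steal_time(): the hypervisor exports a monotonically
 * increasing steal-time counter, and the guest reports only the
 * increase since its last read. "last_steal" stands in for the
 * per-cpu steal_info variable.
 */
static uint64_t last_steal;

uint64_t steal_delta(uint64_t hv_counter)
{
    /* unsigned subtraction: well-defined even across wraparound */
    uint64_t delta = hv_counter - last_steal;

    last_steal = hv_counter;
    return delta;
}
```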
^ permalink raw reply related [flat|nested] 40+ messages in thread
* [RFC 6/7] touch softlockup watchdog
2010-08-25 21:43 ` [RFC 5/7] kvm steal time implementation Glauber Costa
@ 2010-08-25 21:43 ` Glauber Costa
2010-08-25 21:43 ` [RFC 7/7] tell guest about steal time feature Glauber Costa
2010-08-26 22:13 ` [RFC 5/7] kvm steal time implementation Rik van Riel
1 sibling, 1 reply; 40+ messages in thread
From: Glauber Costa @ 2010-08-25 21:43 UTC (permalink / raw)
To: kvm; +Cc: avi, zamsden, mtosatti, riel
With a reliable steal time mechanism, we can tell whether we were
off the cpu for very long, differentiating that from the case
where we simply hit a real softlockup.
When we were off the cpu, the watchdog is fed, making
bogus softlockups disappear.
Signed-off-by: Glauber Costa <glommer@redhat.com>
---
arch/x86/kernel/kvmclock.c | 3 +++
1 files changed, 3 insertions(+), 0 deletions(-)
diff --git a/arch/x86/kernel/kvmclock.c b/arch/x86/kernel/kvmclock.c
index a1f4852..1c496c8 100644
--- a/arch/x86/kernel/kvmclock.c
+++ b/arch/x86/kernel/kvmclock.c
@@ -104,6 +104,9 @@ cputime_t kvm_get_steal_time(void)
delta = this_steal_info - *last_steal_info;
+ if (delta > 1000UL)
+ touch_softlockup_watchdog();
+
*last_steal_info = this_steal_info;
put_cpu_var(steal_info);
--
1.6.2.2
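The review later in the thread points out that the hardcoded 1000UL threshold here is wrong. A sketch of the decision this hunk makes, with the threshold kept runtime-configurable as the reviewers suggest; all names are hypothetical, not the kernel's.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Illustrative sketch of the watchdog-feed decision discussed in
 * the thread: feed the softlockup watchdog only when the steal
 * delta exceeds a runtime-configurable threshold, rather than the
 * hardcoded 1000UL from the RFC. Hypothetical names throughout.
 */
static uint64_t softlockup_thresh_ms = 10000;  /* default 10s, settable at runtime */

int should_touch_watchdog(uint64_t steal_delta_ms)
{
    return steal_delta_ms > softlockup_thresh_ms;
}
```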
* [RFC 7/7] tell guest about steal time feature
2010-08-25 21:43 ` [RFC 6/7] touch softlockup watchdog Glauber Costa
@ 2010-08-25 21:43 ` Glauber Costa
0 siblings, 0 replies; 40+ messages in thread
From: Glauber Costa @ 2010-08-25 21:43 UTC (permalink / raw)
To: kvm; +Cc: avi, zamsden, mtosatti, riel
The guest kernel will only activate steal time if the host exports it.
Tell the guest about it.
Signed-off-by: Glauber Costa <glommer@redhat.com>
---
arch/x86/kvm/x86.c | 3 ++-
1 files changed, 2 insertions(+), 1 deletions(-)
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 680feaa..9a20cd8 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -2101,7 +2101,8 @@ static void do_cpuid_ent(struct kvm_cpuid_entry2 *entry, u32 function,
entry->eax = (1 << KVM_FEATURE_CLOCKSOURCE) |
(1 << KVM_FEATURE_NOP_IO_DELAY) |
(1 << KVM_FEATURE_CLOCKSOURCE2) |
- (1 << KVM_FEATURE_CLOCKSOURCE_STABLE_BIT);
+ (1 << KVM_FEATURE_CLOCKSOURCE_STABLE_BIT) |
+ (1 << KVM_FEATURE_CLOCKSOURCE_STEAL_BIT);
entry->ebx = 0;
entry->ecx = 0;
entry->edx = 0;
--
1.6.2.2
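On the guest side, a feature advertised this way is discovered by testing the corresponding bit of the KVM paravirt CPUID leaf's EAX word. A hedged sketch of that bit test; the STEAL bit number is an assumption, since this RFC does not show the header that defines it.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Illustrative KVM paravirt feature bit numbers. CLOCKSOURCE,
 * NOP_IO_DELAY, CLOCKSOURCE2 and CLOCKSOURCE_STABLE_BIT mirror the
 * RFC's CPUID leaf layout; the STEAL bit value is a hypothetical
 * placeholder.
 */
#define KVM_FEATURE_CLOCKSOURCE            0
#define KVM_FEATURE_NOP_IO_DELAY           1
#define KVM_FEATURE_CLOCKSOURCE2           3
#define KVM_FEATURE_CLOCKSOURCE_STABLE_BIT 24
#define KVM_FEATURE_CLOCKSOURCE_STEAL_BIT  25  /* assumed value */

/* Test one feature bit in the EAX word of the paravirt CPUID leaf. */
int feature_present(uint32_t eax, int feature)
{
    return (eax >> feature) & 1;
}
```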
* Re: [RFC 5/7] kvm steal time implementation
2010-08-25 21:43 ` [RFC 5/7] kvm steal time implementation Glauber Costa
2010-08-25 21:43 ` [RFC 6/7] touch softlockup watchdog Glauber Costa
@ 2010-08-26 22:13 ` Rik van Riel
2010-08-26 22:35 ` Glauber Costa
1 sibling, 1 reply; 40+ messages in thread
From: Rik van Riel @ 2010-08-26 22:13 UTC (permalink / raw)
To: Glauber Costa; +Cc: kvm, avi, zamsden, mtosatti
On 08/25/2010 05:43 PM, Glauber Costa wrote:
> This is the proposed kvm-side steal time implementation.
> It is migration safe, as it checks flags at every read.
>
> Signed-off-by: Glauber Costa <glommer@redhat.com>
> ---
> arch/x86/kernel/kvmclock.c | 35 +++++++++++++++++++++++++++++++++++
> 1 files changed, 35 insertions(+), 0 deletions(-)
>
> diff --git a/arch/x86/kernel/kvmclock.c b/arch/x86/kernel/kvmclock.c
> index eb9b76c..a1f4852 100644
> --- a/arch/x86/kernel/kvmclock.c
> +++ b/arch/x86/kernel/kvmclock.c
> @@ -18,6 +18,8 @@
>
> #include <linux/clocksource.h>
> #include <linux/kvm_para.h>
> +#include <linux/kernel_stat.h>
> +#include <linux/sched.h>
> #include <asm/pvclock.h>
> #include <asm/msr.h>
> #include <asm/apic.h>
> @@ -41,6 +43,7 @@ early_param("no-kvmclock", parse_no_kvmclock);
>
> /* The hypervisor will put information about time periodically here */
> static DEFINE_PER_CPU_SHARED_ALIGNED(struct pvclock_vcpu_time_info, hv_clock);
> +static DEFINE_PER_CPU(u64, steal_info);
> static struct pvclock_wall_clock wall_clock;
>
> /*
> @@ -82,6 +85,32 @@ static cycle_t kvm_clock_read(void)
> return ret;
> }
>
> +static DEFINE_PER_CPU(u64, steal_info);
> +
> +cputime_t kvm_get_steal_time(void)
> +{
> + u64 delta = 0;
> + u64 *last_steal_info, this_steal_info;
> + struct pvclock_vcpu_time_info *src;
> +
> + src = &get_cpu_var(hv_clock);
> + if (!(src->flags & PVCLOCK_STEAL_BIT))
> + goto out;
> +
> + this_steal_info = src->steal_time;
> + put_cpu_var(hv_clock);
> +
> + last_steal_info = &get_cpu_var(steal_info);
> +
> + delta = this_steal_info - *last_steal_info;
> +
> + *last_steal_info = this_steal_info;
> + put_cpu_var(steal_info);
> +
> +out:
> + return msecs_to_cputime(delta);
> +}
Can this be changed to properly deal with overflow in
src->steal_time, the same way we deal with (eg jiffie)
overflow elsewhere in the kernel?
--
All rights reversed
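For an unsigned counter, the usual kernel answer to Rik's question is that two's-complement subtraction already yields the correct distance across a wrap, which is exactly how the jiffies time_after() macros work. A sketch using a 32-bit counter so the wrap is easy to exercise:

```c
#include <assert.h>
#include <stdint.h>

/*
 * Wrap-safe delta computation, in the spirit of the jiffies
 * time_after() handling mentioned in the review: with unsigned
 * arithmetic, (now - last) is computed modulo 2^32 and remains
 * correct even after the counter wraps past UINT32_MAX.
 */
uint32_t counter_delta(uint32_t now, uint32_t last)
{
    return now - last;  /* well-defined modulo 2^32 */
}
```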
* Re: [RFC 5/7] kvm steal time implementation
2010-08-26 22:13 ` [RFC 5/7] kvm steal time implementation Rik van Riel
@ 2010-08-26 22:35 ` Glauber Costa
0 siblings, 0 replies; 40+ messages in thread
From: Glauber Costa @ 2010-08-26 22:35 UTC (permalink / raw)
To: Rik van Riel; +Cc: kvm, avi, zamsden, mtosatti
On Thu, Aug 26, 2010 at 06:13:07PM -0400, Rik van Riel wrote:
> On 08/25/2010 05:43 PM, Glauber Costa wrote:
> >This is the proposed kvm-side steal time implementation.
> >It is migration safe, as it checks flags at every read.
> >
> >Signed-off-by: Glauber Costa <glommer@redhat.com>
> >---
> > arch/x86/kernel/kvmclock.c | 35 +++++++++++++++++++++++++++++++++++
> > 1 files changed, 35 insertions(+), 0 deletions(-)
> >
> >diff --git a/arch/x86/kernel/kvmclock.c b/arch/x86/kernel/kvmclock.c
> >index eb9b76c..a1f4852 100644
> >--- a/arch/x86/kernel/kvmclock.c
> >+++ b/arch/x86/kernel/kvmclock.c
> >@@ -18,6 +18,8 @@
> >
> > #include <linux/clocksource.h>
> > #include <linux/kvm_para.h>
> >+#include <linux/kernel_stat.h>
> >+#include <linux/sched.h>
> > #include <asm/pvclock.h>
> > #include <asm/msr.h>
> > #include <asm/apic.h>
> >@@ -41,6 +43,7 @@ early_param("no-kvmclock", parse_no_kvmclock);
> >
> > /* The hypervisor will put information about time periodically here */
> > static DEFINE_PER_CPU_SHARED_ALIGNED(struct pvclock_vcpu_time_info, hv_clock);
> >+static DEFINE_PER_CPU(u64, steal_info);
> > static struct pvclock_wall_clock wall_clock;
> >
> > /*
> >@@ -82,6 +85,32 @@ static cycle_t kvm_clock_read(void)
> > return ret;
> > }
> >
> >+static DEFINE_PER_CPU(u64, steal_info);
> >+
> >+cputime_t kvm_get_steal_time(void)
> >+{
> >+ u64 delta = 0;
> >+ u64 *last_steal_info, this_steal_info;
> >+ struct pvclock_vcpu_time_info *src;
> >+
> >+ src = &get_cpu_var(hv_clock);
> >+ if (!(src->flags & PVCLOCK_STEAL_BIT))
> >+ goto out;
> >+
> >+ this_steal_info = src->steal_time;
> >+ put_cpu_var(hv_clock);
> >+
> >+ last_steal_info = &get_cpu_var(steal_info);
> >+
> >+ delta = this_steal_info - *last_steal_info;
> >+
> >+ *last_steal_info = this_steal_info;
> >+ put_cpu_var(steal_info);
> >+
> >+out:
> >+ return msecs_to_cputime(delta);
> >+}
>
> Can this be changed to properly deal with overflow in
> src->steal_time, the same way we deal with (eg jiffie)
> overflow elsewhere in the kernel?
I believe so.
* Re: [RFC 4/7] change kernel accounting to include steal time
2010-08-25 21:43 ` [RFC 4/7] change kernel accounting to include steal time Glauber Costa
2010-08-25 21:43 ` [RFC 5/7] kvm steal time implementation Glauber Costa
@ 2010-08-26 17:23 ` Marcelo Tosatti
2010-08-26 20:28 ` Glauber Costa
2010-08-26 21:19 ` Rik van Riel
2010-08-29 9:59 ` Avi Kivity
3 siblings, 1 reply; 40+ messages in thread
From: Marcelo Tosatti @ 2010-08-26 17:23 UTC (permalink / raw)
To: Glauber Costa; +Cc: kvm, avi, zamsden, riel
On Wed, Aug 25, 2010 at 05:43:14PM -0400, Glauber Costa wrote:
> This patch proposes a common steal time implementation. When no
> steal time is accounted, we just add a branch to the current
> accounting code, that shouldn't add much overhead.
>
> When we do want to register steal time, we proceed as following:
> - if we would account user or system time in this tick, and there is
> out-of-cpu time registered, we skip it altogether, and account steal
> time only.
> - if we would account user or system time in this tick, and we got the
> cpu for the whole slice, we proceed normally.
> - if we are idle in this tick, we flush out-of-cpu time to give it the
> chance to update whatever last-measure internal variable it may have.
Problem of using sched notifiers is that you don't differentiate whether
the vcpu scheduled out by its own (via hlt emulation) or not.
Skipping accounting of user/system time whenever there's any stolen
time detected probably breaks u/s accounting on non-cpu-hog loads.
I suppose steal time should be accounted separately from u/s ticks, as
Xen does.
+ if (delta > 1000UL)
+ touch_softlockup_watchdog();
+
This will break authentic soft lockup detection whenever qemu processing
takes more than 1s.
>
> This approach is simple, but proved to work well for my test scenarios.
> in a UP guest on UP host, with a cpu-hog in both guest and host shows
> ~ 50 % steal time. steal time is also accounted proportionally, if
> nice values are given to the host cpu-hog.
>
> A cpu-hog in the host with no load in the guest, produces 0 % steal time,
> with 100 % idle, as one would expect.
>
> Signed-off-by: Glauber Costa <glommer@redhat.com>
> ---
> include/linux/sched.h | 1 +
> kernel/sched.c | 29 +++++++++++++++++++++++++++++
> 2 files changed, 30 insertions(+), 0 deletions(-)
>
> diff --git a/include/linux/sched.h b/include/linux/sched.h
> index 0478888..e571ddd 100644
> --- a/include/linux/sched.h
> +++ b/include/linux/sched.h
> @@ -312,6 +312,7 @@ long io_schedule_timeout(long timeout);
> extern void cpu_init (void);
> extern void trap_init(void);
> extern void update_process_times(int user);
> +extern cputime_t (*hypervisor_steal_time)(void);
> extern void scheduler_tick(void);
>
> extern void sched_show_task(struct task_struct *p);
> diff --git a/kernel/sched.c b/kernel/sched.c
> index f52a880..9695c92 100644
> --- a/kernel/sched.c
> +++ b/kernel/sched.c
> @@ -3157,6 +3157,16 @@ unsigned long long thread_group_sched_runtime(struct task_struct *p)
> return ns;
> }
>
> +cputime_t (*hypervisor_steal_time)(void) = NULL;
> +
> +static inline cputime_t get_steal_time_from_hypervisor(void)
> +{
> + if (!hypervisor_steal_time)
> + return 0;
> + return hypervisor_steal_time();
> +}
> +
> +
> /*
> * Account user cpu time to a process.
> * @p: the process that the cpu time gets accounted to
> @@ -3169,6 +3179,12 @@ void account_user_time(struct task_struct *p, cputime_t cputime,
> struct cpu_usage_stat *cpustat = &kstat_this_cpu.cpustat;
> cputime64_t tmp;
>
> + tmp = get_steal_time_from_hypervisor();
> + if (tmp) {
> + account_steal_time(tmp);
> + return;
> + }
> +
> /* Add user time to process. */
> p->utime = cputime_add(p->utime, cputime);
> p->utimescaled = cputime_add(p->utimescaled, cputime_scaled);
> @@ -3234,6 +3250,12 @@ void account_system_time(struct task_struct *p, int hardirq_offset,
> return;
> }
>
> + tmp = get_steal_time_from_hypervisor();
> + if (tmp) {
> + account_steal_time(tmp);
> + return;
> + }
> +
> /* Add system time to process. */
> p->stime = cputime_add(p->stime, cputime);
> p->stimescaled = cputime_add(p->stimescaled, cputime_scaled);
> @@ -3276,6 +3298,13 @@ void account_idle_time(cputime_t cputime)
> cputime64_t cputime64 = cputime_to_cputime64(cputime);
> struct rq *rq = this_rq();
>
> + /*
> + * if we're idle, we don't account it as steal time, since we did
> + * not want to run anyway. We do call the steal function, however, to
> + * give the guest the chance to flush its internal buffers
> + */
> + get_steal_time_from_hypervisor();
> +
> if (atomic_read(&rq->nr_iowait) > 0)
> cpustat->iowait = cputime64_add(cpustat->iowait, cputime64);
> else
> --
> 1.6.2.2
>
> --
> To unsubscribe from this list: send the line "unsubscribe kvm" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at http://vger.kernel.org/majordomo-info.html
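The hypervisor_steal_time hook quoted above is a plain NULL-checked function pointer, so the common accounting path pays only one branch when no hypervisor registers a backend. A self-contained sketch of the pattern, with simulated types and names rather than the kernel's:

```c
#include <assert.h>
#include <stdint.h>

/*
 * Sketch of the hook pattern patch 4/7 adds to kernel/sched.c: a
 * NULL-initialized function pointer that a hypervisor backend may
 * fill in, plus a guarded accessor. "cputime_sim" stands in for
 * cputime_t; "fake_backend" stands in for kvm_get_steal_time().
 */
typedef uint64_t cputime_sim;

static cputime_sim (*hypervisor_steal_time_hook)(void);

static cputime_sim fake_backend(void)
{
    return 4;  /* pretend 4 units of steal time were observed */
}

cputime_sim get_steal_time(void)
{
    if (!hypervisor_steal_time_hook)  /* no hypervisor registered */
        return 0;
    return hypervisor_steal_time_hook();
}

void register_backend(void)
{
    hypervisor_steal_time_hook = fake_backend;
}
```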
* Re: [RFC 4/7] change kernel accounting to include steal time
2010-08-26 17:23 ` [RFC 4/7] change kernel accounting to include steal time Marcelo Tosatti
@ 2010-08-26 20:28 ` Glauber Costa
2010-08-26 20:47 ` Marcelo Tosatti
2010-08-26 21:14 ` Anthony Liguori
0 siblings, 2 replies; 40+ messages in thread
From: Glauber Costa @ 2010-08-26 20:28 UTC (permalink / raw)
To: Marcelo Tosatti; +Cc: kvm, avi, zamsden, riel
On Thu, Aug 26, 2010 at 02:23:03PM -0300, Marcelo Tosatti wrote:
> On Wed, Aug 25, 2010 at 05:43:14PM -0400, Glauber Costa wrote:
> > This patch proposes a common steal time implementation. When no
> > steal time is accounted, we just add a branch to the current
> > accounting code, that shouldn't add much overhead.
> >
> > When we do want to register steal time, we proceed as following:
> > - if we would account user or system time in this tick, and there is
> > out-of-cpu time registered, we skip it altogether, and account steal
> > time only.
> > - if we would account user or system time in this tick, and we got the
> > cpu for the whole slice, we proceed normally.
> > - if we are idle in this tick, we flush out-of-cpu time to give it the
> > chance to update whatever last-measure internal variable it may have.
>
> Problem of using sched notifiers is that you don't differentiate whether
> the vcpu scheduled out by its own (via hlt emulation) or not.
And we don't need to. If we're out because we want to, we're idle.
And so, we don't account steal time.
> Skipping accounting of user/system time whenever there's any stolen
> time detected probably breaks u/s accounting on non-cpu-hog loads.
I am willing to test some workloads you can suggest, but right now,
(yeah, I mostly used cpu-hogs), this scheme worked better.
Linux does statistical sampling for accounting anyway, so I don't see
it getting much worse.
>
> I suppose steal time should be accounted separately from u/s ticks, as
> Xen does.
It requires us to hook somewhere else, which I deem as overcomplicated.
Do you have any suggestion on how to make it simple?
Furthermore, "doing separate", is equivalent of not skipping user/system,
if we really prefer to.
> + if (delta > 1000UL)
> + touch_softlockup_watchdog();
> +
>
> This will break authentic soft lockup detection whenever qemu processing
> takes more than 1s.
This should be 10s. 1000UL is a typo.
>
> >
> > This approach is simple, but proved to work well for my test scenarios.
> > in a UP guest on UP host, with a cpu-hog in both guest and host shows
> > ~ 50 % steal time. steal time is also accounted proportionally, if
> > nice values are given to the host cpu-hog.
> >
> > A cpu-hog in the host with no load in the guest, produces 0 % steal time,
> > with 100 % idle, as one would expect.
> >
> > Signed-off-by: Glauber Costa <glommer@redhat.com>
> > ---
> > include/linux/sched.h | 1 +
> > kernel/sched.c | 29 +++++++++++++++++++++++++++++
> > 2 files changed, 30 insertions(+), 0 deletions(-)
> >
> > diff --git a/include/linux/sched.h b/include/linux/sched.h
> > index 0478888..e571ddd 100644
> > --- a/include/linux/sched.h
> > +++ b/include/linux/sched.h
> > @@ -312,6 +312,7 @@ long io_schedule_timeout(long timeout);
> > extern void cpu_init (void);
> > extern void trap_init(void);
> > extern void update_process_times(int user);
> > +extern cputime_t (*hypervisor_steal_time)(void);
> > extern void scheduler_tick(void);
> >
> > extern void sched_show_task(struct task_struct *p);
> > diff --git a/kernel/sched.c b/kernel/sched.c
> > index f52a880..9695c92 100644
> > --- a/kernel/sched.c
> > +++ b/kernel/sched.c
> > @@ -3157,6 +3157,16 @@ unsigned long long thread_group_sched_runtime(struct task_struct *p)
> > return ns;
> > }
> >
> > +cputime_t (*hypervisor_steal_time)(void) = NULL;
> > +
> > +static inline cputime_t get_steal_time_from_hypervisor(void)
> > +{
> > + if (!hypervisor_steal_time)
> > + return 0;
> > + return hypervisor_steal_time();
> > +}
> > +
> > +
> > /*
> > * Account user cpu time to a process.
> > * @p: the process that the cpu time gets accounted to
> > @@ -3169,6 +3179,12 @@ void account_user_time(struct task_struct *p, cputime_t cputime,
> > struct cpu_usage_stat *cpustat = &kstat_this_cpu.cpustat;
> > cputime64_t tmp;
> >
> > + tmp = get_steal_time_from_hypervisor();
> > + if (tmp) {
> > + account_steal_time(tmp);
> > + return;
> > + }
> > +
> > /* Add user time to process. */
> > p->utime = cputime_add(p->utime, cputime);
> > p->utimescaled = cputime_add(p->utimescaled, cputime_scaled);
> > @@ -3234,6 +3250,12 @@ void account_system_time(struct task_struct *p, int hardirq_offset,
> > return;
> > }
> >
> > + tmp = get_steal_time_from_hypervisor();
> > + if (tmp) {
> > + account_steal_time(tmp);
> > + return;
> > + }
> > +
> > /* Add system time to process. */
> > p->stime = cputime_add(p->stime, cputime);
> > p->stimescaled = cputime_add(p->stimescaled, cputime_scaled);
> > @@ -3276,6 +3298,13 @@ void account_idle_time(cputime_t cputime)
> > cputime64_t cputime64 = cputime_to_cputime64(cputime);
> > struct rq *rq = this_rq();
> >
> > + /*
> > + * if we're idle, we don't account it as steal time, since we did
> > + * not want to run anyway. We do call the steal function, however, to
> > + * give the guest the chance to flush its internal buffers
> > + */
> > + get_steal_time_from_hypervisor();
> > +
> > if (atomic_read(&rq->nr_iowait) > 0)
> > cpustat->iowait = cputime64_add(cpustat->iowait, cputime64);
> > else
> > --
> > 1.6.2.2
> >
* Re: [RFC 4/7] change kernel accounting to include steal time
2010-08-26 20:28 ` Glauber Costa
@ 2010-08-26 20:47 ` Marcelo Tosatti
2010-08-26 21:05 ` Rik van Riel
2010-08-26 21:13 ` Glauber Costa
2010-08-26 21:14 ` Anthony Liguori
1 sibling, 2 replies; 40+ messages in thread
From: Marcelo Tosatti @ 2010-08-26 20:47 UTC (permalink / raw)
To: Glauber Costa; +Cc: kvm, avi, zamsden, riel
On Thu, Aug 26, 2010 at 05:28:56PM -0300, Glauber Costa wrote:
> On Thu, Aug 26, 2010 at 02:23:03PM -0300, Marcelo Tosatti wrote:
> > On Wed, Aug 25, 2010 at 05:43:14PM -0400, Glauber Costa wrote:
> > > This patch proposes a common steal time implementation. When no
> > > steal time is accounted, we just add a branch to the current
> > > accounting code, that shouldn't add much overhead.
> > >
> > > When we do want to register steal time, we proceed as following:
> > > - if we would account user or system time in this tick, and there is
> > > out-of-cpu time registered, we skip it altogether, and account steal
> > > time only.
> > > - if we would account user or system time in this tick, and we got the
> > > cpu for the whole slice, we proceed normally.
> > > - if we are idle in this tick, we flush out-of-cpu time to give it the
> > > chance to update whatever last-measure internal variable it may have.
> >
> > Problem of using sched notifiers is that you don't differentiate whether
> > the vcpu scheduled out by its own (via hlt emulation) or not.
> And we don't need to. If we're out because we want to, we're idle.
> And so, we don't account steal time.
Think of the program below.
> > Skipping accounting of user/system time whenever there's any stolen
> > time detected probably breaks u/s accounting on non-cpu-hog loads.
> I am willing to test some workloads you can suggest, but right now,
> (yeah, I mostly used cpu-hogs), this scheme worked better.
>
> Linux does statistical sampling for accounting anyway, so I don't see
> it getting much worse.
A "cpu hog" that sleeps 1us every 1ms.
> > I suppose steal time should be accounted separately from u/s ticks, as
> > Xen does.
> It requires us to hook somewhere else, which I deem as overcomplicated.
> Do you have any suggestion on how to make it simple?
Unfortunately no.
> Furthermore, "doing separate", is equivalent of not skipping user/system,
> if we really prefer to.
>
> > + if (delta > 1000UL)
> > + touch_softlockup_watchdog();
> > +
> >
> > This will break authentic soft lockup detection whenever qemu processing
> > takes more than 1s.
>
> This should be 10s. 1000UL is a typo.
Comment is still valid.
* Re: [RFC 4/7] change kernel accounting to include steal time
2010-08-26 20:47 ` Marcelo Tosatti
@ 2010-08-26 21:05 ` Rik van Riel
2010-08-26 21:13 ` Glauber Costa
1 sibling, 0 replies; 40+ messages in thread
From: Rik van Riel @ 2010-08-26 21:05 UTC (permalink / raw)
To: Marcelo Tosatti; +Cc: Glauber Costa, kvm, avi, zamsden
On 08/26/2010 04:47 PM, Marcelo Tosatti wrote:
> On Thu, Aug 26, 2010 at 05:28:56PM -0300, Glauber Costa wrote:
>> On Thu, Aug 26, 2010 at 02:23:03PM -0300, Marcelo Tosatti wrote:
>>> Skipping accounting of user/system time whenever there's any stolen
>>> time detected probably breaks u/s accounting on non-cpu-hog loads.
Steal time does not completely skip accounting of user/system
time. Say that it has been 10ms since the last timer tick, we
are currently in user mode, and 4ms have been accounted as steal
time.
The steal time accounting code will then account 4ms as steal
time, and 6ms as user time.
It does not "skip accounting of user/system time" at all.
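Rik's arithmetic above can be sketched as a simple split of the elapsed interval. This is a hedged illustration of the accounting he describes, not the RFC's actual code, which accounts whole ticks.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Split an interval since the last tick into steal time and the
 * remainder (user/system) time, per Rik's example: 10ms elapsed
 * with 4ms of reported steal accounts 4ms as steal and 6ms as
 * user time. Steal is clamped so it never exceeds the interval.
 */
typedef struct {
    uint64_t steal_ms;
    uint64_t user_ms;
} tick_split;

tick_split split_tick(uint64_t interval_ms, uint64_t steal_ms)
{
    tick_split s;

    s.steal_ms = steal_ms < interval_ms ? steal_ms : interval_ms;
    s.user_ms  = interval_ms - s.steal_ms;
    return s;
}
```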
>> I am willing to test some workloads you can suggest, but right now,
>> (yeah, I mostly used cpu-hogs), this scheme worked better.
>>
>> Linux does statistical sampling for accounting anyway, so I don't see
>> it getting much worse.
>
> A "cpu hog" that sleeps 1us every 1ms.
This kind of program can be an issue with or without
steal time.
I don't see steal time make the situation any worse.
--
All rights reversed
* Re: [RFC 4/7] change kernel accounting to include steal time
2010-08-26 20:47 ` Marcelo Tosatti
2010-08-26 21:05 ` Rik van Riel
@ 2010-08-26 21:13 ` Glauber Costa
1 sibling, 0 replies; 40+ messages in thread
From: Glauber Costa @ 2010-08-26 21:13 UTC (permalink / raw)
To: Marcelo Tosatti; +Cc: kvm, avi, zamsden, riel
On Thu, Aug 26, 2010 at 05:47:12PM -0300, Marcelo Tosatti wrote:
> > Linux does statistical sampling for accounting anyway, so I don't see
> > it getting much worse.
>
> A "cpu hog" that sleeps 1us every 1ms.
>
Imagine a user program, that at periodic intervals, goes to the system.
Under the current scheme, it will be accounted as 100 % system.
This is all statistical anyway.
* Re: [RFC 4/7] change kernel accounting to include steal time
2010-08-26 20:28 ` Glauber Costa
2010-08-26 20:47 ` Marcelo Tosatti
@ 2010-08-26 21:14 ` Anthony Liguori
2010-08-26 21:40 ` Glauber Costa
1 sibling, 1 reply; 40+ messages in thread
From: Anthony Liguori @ 2010-08-26 21:14 UTC (permalink / raw)
To: Glauber Costa; +Cc: Marcelo Tosatti, kvm, avi, zamsden, riel
On 08/26/2010 03:28 PM, Glauber Costa wrote:
>
>> + if (delta > 1000UL)
>> + touch_softlockup_watchdog();
>> +
>>
>> This will break authentic soft lockup detection whenever qemu processing
>> takes more than 1s.
>>
> This should be 10s. 1000UL is a typo.
>
I was wondering that when I first saw the patch.. 10s is the default
detection time but it's actually run time configurable so hard coding
10s is not correct.
Regards,
Anthony Liguori
* Re: [RFC 4/7] change kernel accounting to include steal time
2010-08-26 21:14 ` Anthony Liguori
@ 2010-08-26 21:40 ` Glauber Costa
2010-08-26 23:12 ` Marcelo Tosatti
0 siblings, 1 reply; 40+ messages in thread
From: Glauber Costa @ 2010-08-26 21:40 UTC (permalink / raw)
To: Anthony Liguori; +Cc: Marcelo Tosatti, kvm, avi, zamsden, riel
On Thu, Aug 26, 2010 at 04:14:47PM -0500, Anthony Liguori wrote:
> On 08/26/2010 03:28 PM, Glauber Costa wrote:
> >
> >>+ if (delta > 1000UL)
> >>+ touch_softlockup_watchdog();
> >>+
> >>
> >>This will break authentic soft lockup detection whenever qemu processing
> >>takes more than 1s.
> >This should be 10s. 1000UL is a typo.
>
> I was wondering that when I first saw the patch.. 10s is the
> default detection time but it's actually run time configurable so
> hard coding 10s is not correct.
Indeed, you are right.
Thanks
* Re: [RFC 4/7] change kernel accounting to include steal time
2010-08-26 21:40 ` Glauber Costa
@ 2010-08-26 23:12 ` Marcelo Tosatti
2010-08-27 0:33 ` Glauber Costa
0 siblings, 1 reply; 40+ messages in thread
From: Marcelo Tosatti @ 2010-08-26 23:12 UTC (permalink / raw)
To: Glauber Costa; +Cc: Anthony Liguori, kvm, avi, zamsden, riel
On Thu, Aug 26, 2010 at 06:40:36PM -0300, Glauber Costa wrote:
> On Thu, Aug 26, 2010 at 04:14:47PM -0500, Anthony Liguori wrote:
> > On 08/26/2010 03:28 PM, Glauber Costa wrote:
> > >
> > >>+ if (delta > 1000UL)
> > >>+ touch_softlockup_watchdog();
> > >>+
> > >>
> > >>This will break authentic soft lockup detection whenever qemu processing
> > >>takes more than 1s.
> > >This should be 10s. 1000UL is a typo.
> >
> > I was wondering that when I first saw the patch.. 10s is the
> > default detection time but it's actually run time configurable so
> > hard coding 10s is not correct.
This is not what i'm referring to. The code above will disable
softlockup detection in case of a vcpu blocked in qemu for longer than
softlockup threshold, which is a legitimate case.
>
> Indeed, you are right.
>
> Thanks
* Re: [RFC 4/7] change kernel accounting to include steal time
2010-08-26 23:12 ` Marcelo Tosatti
@ 2010-08-27 0:33 ` Glauber Costa
2010-08-27 15:25 ` Marcelo Tosatti
0 siblings, 1 reply; 40+ messages in thread
From: Glauber Costa @ 2010-08-27 0:33 UTC (permalink / raw)
To: Marcelo Tosatti; +Cc: Anthony Liguori, kvm, avi, zamsden, riel
On Thu, Aug 26, 2010 at 08:12:40PM -0300, Marcelo Tosatti wrote:
> On Thu, Aug 26, 2010 at 06:40:36PM -0300, Glauber Costa wrote:
> > On Thu, Aug 26, 2010 at 04:14:47PM -0500, Anthony Liguori wrote:
> > > On 08/26/2010 03:28 PM, Glauber Costa wrote:
> > > >
> > > >>+ if (delta > 1000UL)
> > > >>+ touch_softlockup_watchdog();
> > > >>+
> > > >>
> > > >>This will break authentic soft lockup detection whenever qemu processing
> > > >>takes more than 1s.
> > > >This should be 10s. 1000UL is a typo.
> > >
> > > I was wondering that when I first saw the patch.. 10s is the
> > > default detection time but it's actually run time configurable so
> > > hard coding 10s is not correct.
>
> This is not what i'm referring to. The code above will disable
> softlockup detection in case of a vcpu blocked in qemu for longer than
> softlockup threshold, which is a legitimate case.
This is equivalent to hardware so broken it can't even send
an NMI. Not sure we should worry too much about it.
* Re: [RFC 4/7] change kernel accounting to include steal time
2010-08-27 0:33 ` Glauber Costa
@ 2010-08-27 15:25 ` Marcelo Tosatti
0 siblings, 0 replies; 40+ messages in thread
From: Marcelo Tosatti @ 2010-08-27 15:25 UTC (permalink / raw)
To: Glauber Costa; +Cc: Anthony Liguori, kvm, avi, zamsden, riel
On Thu, Aug 26, 2010 at 09:33:02PM -0300, Glauber Costa wrote:
> On Thu, Aug 26, 2010 at 08:12:40PM -0300, Marcelo Tosatti wrote:
> > On Thu, Aug 26, 2010 at 06:40:36PM -0300, Glauber Costa wrote:
> > > On Thu, Aug 26, 2010 at 04:14:47PM -0500, Anthony Liguori wrote:
> > > > On 08/26/2010 03:28 PM, Glauber Costa wrote:
> > > > >
> > > > >>+ if (delta > 1000UL)
> > > > >>+ touch_softlockup_watchdog();
> > > > >>+
> > > > >>
> > > > >>This will break authentic soft lockup detection whenever qemu processing
> > > > >>takes more than 1s.
> > > > >This should be 10s. 1000UL is a typo.
> > > >
> > > > I was wondering that when I first saw the patch.. 10s is the
> > > > default detection time but it's actually run time configurable so
> > > > hard coding 10s is not correct.
> >
> > This is not what i'm referring to. The code above will disable
> > softlockup detection in case of a vcpu blocked in qemu for longer than
> > softlockup threshold, which is a legitimate case.
> This is equivalent to a hardware so broken it can't even send
> an NMI. Not sure we should worry too much about it.
Well, take the virtio-net issues for example. You'd disable reporting
for those.
* Re: [RFC 4/7] change kernel accounting to include steal time
2010-08-25 21:43 ` [RFC 4/7] change kernel accounting to include steal time Glauber Costa
2010-08-25 21:43 ` [RFC 5/7] kvm steal time implementation Glauber Costa
2010-08-26 17:23 ` [RFC 4/7] change kernel accounting to include steal time Marcelo Tosatti
@ 2010-08-26 21:19 ` Rik van Riel
2010-08-26 21:39 ` Glauber Costa
2010-08-29 9:59 ` Avi Kivity
3 siblings, 1 reply; 40+ messages in thread
From: Rik van Riel @ 2010-08-26 21:19 UTC (permalink / raw)
To: Glauber Costa; +Cc: kvm, avi, zamsden, mtosatti
On 08/25/2010 05:43 PM, Glauber Costa wrote:
> This patch proposes a common steal time implementation. When no
> steal time is accounted, we just add a branch to the current
> accounting code, that shouldn't add much overhead.
>
> When we do want to register steal time, we proceed as follows:
> - if we would account user or system time in this tick, and there is
> out-of-cpu time registered, we skip it altogether, and account steal
> time only.
> - if we would account user or system time in this tick, and we got the
> cpu for the whole slice, we proceed normally.
> - if we are idle in this tick, we flush out-of-cpu time to give it the
> chance to update whatever last-measured internal variable it may have.
>
> This approach is simple, but proved to work well for my test scenarios.
> In a UP guest on a UP host, a cpu-hog in both guest and host shows
> ~50% steal time. Steal time is also accounted proportionally if
> nice values are given to the host cpu-hog.
>
> A cpu-hog in the host with no load in the guest produces 0% steal time,
> with 100% idle, as one would expect.
>
> Signed-off-by: Glauber Costa <glommer@redhat.com>
> ---
> include/linux/sched.h | 1 +
> kernel/sched.c | 29 +++++++++++++++++++++++++++++
> 2 files changed, 30 insertions(+), 0 deletions(-)
>
> diff --git a/include/linux/sched.h b/include/linux/sched.h
> index 0478888..e571ddd 100644
> --- a/include/linux/sched.h
> +++ b/include/linux/sched.h
> @@ -312,6 +312,7 @@ long io_schedule_timeout(long timeout);
> extern void cpu_init (void);
> extern void trap_init(void);
> extern void update_process_times(int user);
> +extern cputime_t (*hypervisor_steal_time)(void);
> extern void scheduler_tick(void);
>
> extern void sched_show_task(struct task_struct *p);
> diff --git a/kernel/sched.c b/kernel/sched.c
> index f52a880..9695c92 100644
> --- a/kernel/sched.c
> +++ b/kernel/sched.c
> @@ -3157,6 +3157,16 @@ unsigned long long thread_group_sched_runtime(struct task_struct *p)
> return ns;
> }
>
> +cputime_t (*hypervisor_steal_time)(void) = NULL;
> +
> +static inline cputime_t get_steal_time_from_hypervisor(void)
> +{
> + if (!hypervisor_steal_time)
> + return 0;
> + return hypervisor_steal_time();
> +}
> +
> +
> /*
> * Account user cpu time to a process.
> * @p: the process that the cpu time gets accounted to
> @@ -3169,6 +3179,12 @@ void account_user_time(struct task_struct *p, cputime_t cputime,
> struct cpu_usage_stat *cpustat =&kstat_this_cpu.cpustat;
> cputime64_t tmp;
>
> + tmp = get_steal_time_from_hypervisor();
> + if (tmp) {
> + account_steal_time(tmp);
> + return;
> + }
> +
> /* Add user time to process. */
> p->utime = cputime_add(p->utime, cputime);
> p->utimescaled = cputime_add(p->utimescaled, cputime_scaled);
I see one problem here.
What if get_steal_time_from_hypervisor() returns a smaller
amount of time than "cputime"?
Would it be better to account tmp as stealtime, and the
difference (cputime - tmp) as user/sys/... time?
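Rik's suggested split can be sketched as a small helper: account at most one tick's worth of time as steal, and hand any remainder back to the normal user/system path. This is a hypothetical userspace illustration of the suggestion, not code from the posted series; the helper name and types are assumptions.

```c
#include <assert.h>

typedef unsigned long long cputime_t;

/* Hypothetical split of one tick: at most `cputime` is accounted as
 * steal; whatever is left over goes back to user/system accounting
 * instead of being dropped, per the suggestion above. Returns the
 * steal portion and stores the remainder in *user_remainder. */
static cputime_t split_tick(cputime_t cputime, cputime_t steal,
                            cputime_t *user_remainder)
{
    cputime_t accounted_steal = steal < cputime ? steal : cputime;

    *user_remainder = cputime - accounted_steal;
    return accounted_steal;
}
```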
--
All rights reversed
^ permalink raw reply [flat|nested] 40+ messages in thread
* Re: [RFC 4/7] change kernel accounting to include steal time
2010-08-26 21:19 ` Rik van Riel
@ 2010-08-26 21:39 ` Glauber Costa
0 siblings, 0 replies; 40+ messages in thread
From: Glauber Costa @ 2010-08-26 21:39 UTC (permalink / raw)
To: Rik van Riel; +Cc: kvm, avi, zamsden, mtosatti
On Thu, Aug 26, 2010 at 05:19:23PM -0400, Rik van Riel wrote:
> On 08/25/2010 05:43 PM, Glauber Costa wrote:
> >This patch proposes a common steal time implementation. When no
> >steal time is accounted, we just add a branch to the current
> >accounting code, that shouldn't add much overhead.
> >
> >When we do want to register steal time, we proceed as follows:
> >- if we would account user or system time in this tick, and there is
> > out-of-cpu time registered, we skip it altogether, and account steal
> > time only.
> >- if we would account user or system time in this tick, and we got the
> > cpu for the whole slice, we proceed normally.
> >- if we are idle in this tick, we flush out-of-cpu time to give it the
> > chance to update whatever last-measured internal variable it may have.
> >
> >This approach is simple, but proved to work well for my test scenarios.
> >In a UP guest on a UP host, a cpu-hog in both guest and host shows
> >~50% steal time. Steal time is also accounted proportionally if
> >nice values are given to the host cpu-hog.
> >
> >A cpu-hog in the host with no load in the guest produces 0% steal time,
> >with 100% idle, as one would expect.
> >
> >Signed-off-by: Glauber Costa <glommer@redhat.com>
> >---
> > include/linux/sched.h | 1 +
> > kernel/sched.c | 29 +++++++++++++++++++++++++++++
> > 2 files changed, 30 insertions(+), 0 deletions(-)
> >
> >diff --git a/include/linux/sched.h b/include/linux/sched.h
> >index 0478888..e571ddd 100644
> >--- a/include/linux/sched.h
> >+++ b/include/linux/sched.h
> >@@ -312,6 +312,7 @@ long io_schedule_timeout(long timeout);
> > extern void cpu_init (void);
> > extern void trap_init(void);
> > extern void update_process_times(int user);
> >+extern cputime_t (*hypervisor_steal_time)(void);
> > extern void scheduler_tick(void);
> >
> > extern void sched_show_task(struct task_struct *p);
> >diff --git a/kernel/sched.c b/kernel/sched.c
> >index f52a880..9695c92 100644
> >--- a/kernel/sched.c
> >+++ b/kernel/sched.c
> >@@ -3157,6 +3157,16 @@ unsigned long long thread_group_sched_runtime(struct task_struct *p)
> > return ns;
> > }
> >
> >+cputime_t (*hypervisor_steal_time)(void) = NULL;
> >+
> >+static inline cputime_t get_steal_time_from_hypervisor(void)
> >+{
> >+ if (!hypervisor_steal_time)
> >+ return 0;
> >+ return hypervisor_steal_time();
> >+}
> >+
> >+
> > /*
> > * Account user cpu time to a process.
> > * @p: the process that the cpu time gets accounted to
> >@@ -3169,6 +3179,12 @@ void account_user_time(struct task_struct *p, cputime_t cputime,
> > struct cpu_usage_stat *cpustat =&kstat_this_cpu.cpustat;
> > cputime64_t tmp;
> >
> >+ tmp = get_steal_time_from_hypervisor();
> >+ if (tmp) {
> >+ account_steal_time(tmp);
> >+ return;
> >+ }
> >+
> > /* Add user time to process. */
> > p->utime = cputime_add(p->utime, cputime);
> > p->utimescaled = cputime_add(p->utimescaled, cputime_scaled);
>
> I see one problem here.
>
> What if get_steal_time_from_hypervisor() returns a smaller
> amount of time than "cputime"?
>
> Would it be better to account tmp as stealtime, and the
> difference (cputime - tmp) as user/sys/... time?
There is also the case in which tmp is greater than cputime,
but not a multiple of it. In which case, I believe we should
account cputime - (tmp % cputime) as user/sys too.
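Combining both cases from this exchange, the leftover-time rule being proposed can be sketched as follows. This is a hypothetical userspace illustration of the discussion, not the posted code; the function name and the exact treatment of the non-multiple case are assumptions based on the wording above.

```c
#include <assert.h>

typedef unsigned long long cputime_t;

/* Hypothetical: how much of the current tick should still be
 * accounted as user/system time, given accumulated steal `steal`.
 * Covers Rik's case (steal smaller than the tick) and the case
 * raised above (steal larger than the tick but not a multiple). */
static cputime_t leftover_user_time(cputime_t cputime, cputime_t steal)
{
    if (steal < cputime)            /* partial steal: rest is user */
        return cputime - steal;
    if (steal % cputime)            /* non-multiple: fractional rest */
        return cputime - (steal % cputime);
    return 0;                       /* steal covers whole ticks */
}
```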
^ permalink raw reply [flat|nested] 40+ messages in thread
* Re: [RFC 4/7] change kernel accounting to include steal time
2010-08-25 21:43 ` [RFC 4/7] change kernel accounting to include steal time Glauber Costa
` (2 preceding siblings ...)
2010-08-26 21:19 ` Rik van Riel
@ 2010-08-29 9:59 ` Avi Kivity
2010-08-29 15:13 ` Rik van Riel
2010-08-30 12:42 ` Glauber Costa
3 siblings, 2 replies; 40+ messages in thread
From: Avi Kivity @ 2010-08-29 9:59 UTC (permalink / raw)
To: Glauber Costa; +Cc: kvm, zamsden, mtosatti, riel
On 08/26/2010 12:43 AM, Glauber Costa wrote:
> This patch proposes a common steal time implementation. When no
> steal time is accounted, we just add a branch to the current
> accounting code, that shouldn't add much overhead.
>
> When we do want to register steal time, we proceed as follows:
> - if we would account user or system time in this tick, and there is
> out-of-cpu time registered, we skip it altogether, and account steal
> time only.
> - if we would account user or system time in this tick, and we got the
> cpu for the whole slice, we proceed normally.
> - if we are idle in this tick, we flush out-of-cpu time to give it the
> chance to update whatever last-measured internal variable it may have.
>
> This approach is simple, but proved to work well for my test scenarios.
> In a UP guest on a UP host, a cpu-hog in both guest and host shows
> ~50% steal time. Steal time is also accounted proportionally if
> nice values are given to the host cpu-hog.
>
> A cpu-hog in the host with no load in the guest produces 0% steal time,
> with 100% idle, as one would expect.
>
The scheduler people and lkml need to be copied on this patch.
Since s390 does steal time (I think?), can this code be shared?
--
error compiling committee.c: too many arguments to function
^ permalink raw reply [flat|nested] 40+ messages in thread
* Re: [RFC 4/7] change kernel accounting to include steal time
2010-08-29 9:59 ` Avi Kivity
@ 2010-08-29 15:13 ` Rik van Riel
2010-08-29 15:25 ` Avi Kivity
2010-08-30 12:42 ` Glauber Costa
1 sibling, 1 reply; 40+ messages in thread
From: Rik van Riel @ 2010-08-29 15:13 UTC (permalink / raw)
To: Avi Kivity; +Cc: Glauber Costa, kvm, zamsden, mtosatti
On 08/29/2010 05:59 AM, Avi Kivity wrote:
> The scheduler people and lkml need to be copied on this patch.
Good idea for the second version of the series.
> Since s390 does steal time (I think?), can this code be shared?
That part already is shared. Glauber's patches reuse the same
code that s390 and Xen use.
--
All rights reversed
^ permalink raw reply [flat|nested] 40+ messages in thread
* Re: [RFC 4/7] change kernel accounting to include steal time
2010-08-29 15:13 ` Rik van Riel
@ 2010-08-29 15:25 ` Avi Kivity
2010-08-29 15:42 ` Rik van Riel
0 siblings, 1 reply; 40+ messages in thread
From: Avi Kivity @ 2010-08-29 15:25 UTC (permalink / raw)
To: Rik van Riel; +Cc: Glauber Costa, kvm, zamsden, mtosatti
On 08/29/2010 06:13 PM, Rik van Riel wrote:
>
>> Since s390 does steal time (I think?), can this code be shared?
>
> That part already is shared. Glauber's patches reuse the same
> code that s390 and Xen use.
>
Why can't we use the same approach as s390 and ppc (calling
account_steal_time from the timer interrupt)? Or perhaps those should
be moved into the scheduler proper?
Didn't find the Xen hooks.
--
error compiling committee.c: too many arguments to function
^ permalink raw reply [flat|nested] 40+ messages in thread
* Re: [RFC 4/7] change kernel accounting to include steal time
2010-08-29 15:25 ` Avi Kivity
@ 2010-08-29 15:42 ` Rik van Riel
2010-08-29 15:47 ` Avi Kivity
0 siblings, 1 reply; 40+ messages in thread
From: Rik van Riel @ 2010-08-29 15:42 UTC (permalink / raw)
To: Avi Kivity; +Cc: Glauber Costa, kvm, zamsden, mtosatti
On 08/29/2010 11:25 AM, Avi Kivity wrote:
> On 08/29/2010 06:13 PM, Rik van Riel wrote:
>>
>>> Since s390 does steal time (I think?), can this code be shared?
>>
>> That part already is shared. Glauber's patches reuse the same
>> code that s390 and Xen use.
>>
>
> Why can't we use the same approach as s390 and ppc (calling
> account_steal_time from the timer interrupt)? Or perhaps those should be
> moved into the scheduler proper?
Now that is a good question.
> Didn't find the Xen hooks.
I suspect Xen lost them in the paravirt ops port...
--
All rights reversed
^ permalink raw reply [flat|nested] 40+ messages in thread
* Re: [RFC 4/7] change kernel accounting to include steal time
2010-08-29 15:42 ` Rik van Riel
@ 2010-08-29 15:47 ` Avi Kivity
0 siblings, 0 replies; 40+ messages in thread
From: Avi Kivity @ 2010-08-29 15:47 UTC (permalink / raw)
To: Rik van Riel; +Cc: Glauber Costa, kvm, zamsden, mtosatti
On 08/29/2010 06:42 PM, Rik van Riel wrote:
>> Didn't find the Xen hooks.
>
> I suspect Xen lost them in the paravirt ops port...
>
Glauber: please copy Jeremy on this patchset so we can coordinate things.
--
error compiling committee.c: too many arguments to function
^ permalink raw reply [flat|nested] 40+ messages in thread
* Re: [RFC 4/7] change kernel accounting to include steal time
2010-08-29 9:59 ` Avi Kivity
2010-08-29 15:13 ` Rik van Riel
@ 2010-08-30 12:42 ` Glauber Costa
2010-08-30 13:15 ` Avi Kivity
1 sibling, 1 reply; 40+ messages in thread
From: Glauber Costa @ 2010-08-30 12:42 UTC (permalink / raw)
To: Avi Kivity; +Cc: kvm, zamsden, mtosatti, riel
On Sun, Aug 29, 2010 at 12:59:36PM +0300, Avi Kivity wrote:
> On 08/26/2010 12:43 AM, Glauber Costa wrote:
> >This patch proposes a common steal time implementation. When no
> >steal time is accounted, we just add a branch to the current
> >accounting code, that shouldn't add much overhead.
> >
> >When we do want to register steal time, we proceed as follows:
> >- if we would account user or system time in this tick, and there is
> > out-of-cpu time registered, we skip it altogether, and account steal
> > time only.
> >- if we would account user or system time in this tick, and we got the
> > cpu for the whole slice, we proceed normally.
> >- if we are idle in this tick, we flush out-of-cpu time to give it the
> > chance to update whatever last-measured internal variable it may have.
> >
> >This approach is simple, but proved to work well for my test scenarios.
> >In a UP guest on a UP host, a cpu-hog in both guest and host shows
> >~50% steal time. Steal time is also accounted proportionally if
> >nice values are given to the host cpu-hog.
> >
> >A cpu-hog in the host with no load in the guest produces 0% steal time,
> >with 100% idle, as one would expect.
> >
>
> The scheduler people and lkml need to be copied on this patch.
>
> Since s390 does steal time (I think?), can this code be shared?
AFAIK, s390 enables CONFIG_VIRT_CPU_ACCOUNTING, so all timings
come from the hypervisor, and statistical sampling is not involved.
We could do that, if our hardware had any method to say precisely
how much time we spent in each state, which I don't think we do.
So in summary, s390 is on a totally different side of the ifdef.
Who should we copy on the scheduler side?
^ permalink raw reply [flat|nested] 40+ messages in thread
* Re: [RFC 4/7] change kernel accounting to include steal time
2010-08-30 12:42 ` Glauber Costa
@ 2010-08-30 13:15 ` Avi Kivity
0 siblings, 0 replies; 40+ messages in thread
From: Avi Kivity @ 2010-08-30 13:15 UTC (permalink / raw)
To: Glauber Costa; +Cc: kvm, zamsden, Marcelo Tosatti, riel
On 08/30/2010 03:42 PM, Glauber Costa wrote:
> On Sun, Aug 29, 2010 at 12:59:36PM +0300, Avi Kivity wrote:
>> On 08/26/2010 12:43 AM, Glauber Costa wrote:
>>> This patch proposes a common steal time implementation. When no
>>> steal time is accounted, we just add a branch to the current
>>> accounting code, that shouldn't add much overhead.
>>>
>>> When we do want to register steal time, we proceed as follows:
>>> - if we would account user or system time in this tick, and there is
>>> out-of-cpu time registered, we skip it altogether, and account steal
>>> time only.
>>> - if we would account user or system time in this tick, and we got the
>>> cpu for the whole slice, we proceed normally.
>>> - if we are idle in this tick, we flush out-of-cpu time to give it the
>>> chance to update whatever last-measured internal variable it may have.
>>>
>>> This approach is simple, but proved to work well for my test scenarios.
>>> In a UP guest on a UP host, a cpu-hog in both guest and host shows
>>> ~50% steal time. Steal time is also accounted proportionally if
>>> nice values are given to the host cpu-hog.
>>>
>>> A cpu-hog in the host with no load in the guest produces 0% steal time,
>>> with 100% idle, as one would expect.
>>>
>> The scheduler people and lkml need to be copied on this patch.
>>
>> Since s390 does steal time (I think?), can this code be shared?
> AFAIK, s390 enables CONFIG_VIRT_CPU_ACCOUNTING, so all timings
> come from the hypervisor, and statistical sampling is not involved.
Ok. I see ppc does something similar as well (taking care of
user/kernel transitions itself).
> We could do that, if our hardware had any method to say precisely
> how much time we spent in each state, which I don't think we do.
We don't, though I'm sure everyone is wondering why we can't have cheap
accurate global clocks on x86.
> So in summary, s390 is on a totally different side of the ifdef.
Yes.
> Who should we copy on the scheduler side?
From MAINTAINERS:
Ingo Molnar <mingo@elte.hu>
Peter Zijlstra <peterz@infradead.org>
--
error compiling committee.c: too many arguments to function
^ permalink raw reply [flat|nested] 40+ messages in thread