From mboxrd@z Thu Jan 1 00:00:00 1970 From: Marcelo Tosatti Subject: Re: [RFC 4/7] change kernel accounting to include steal time Date: Thu, 26 Aug 2010 14:23:03 -0300 Message-ID: <20100826172303.GB21273@amt.cnet> References: <1282772597-4183-1-git-send-email-glommer@redhat.com> <1282772597-4183-2-git-send-email-glommer@redhat.com> <1282772597-4183-3-git-send-email-glommer@redhat.com> <1282772597-4183-4-git-send-email-glommer@redhat.com> <1282772597-4183-5-git-send-email-glommer@redhat.com> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Cc: kvm@vger.kernel.org, avi@redhat.com, zamsden@redhat.com, riel@redhat.com To: Glauber Costa Return-path: Received: from mx1.redhat.com ([209.132.183.28]:46239 "EHLO mx1.redhat.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1752168Ab0HZRXa (ORCPT ); Thu, 26 Aug 2010 13:23:30 -0400 Received: from int-mx08.intmail.prod.int.phx2.redhat.com (int-mx08.intmail.prod.int.phx2.redhat.com [10.5.11.21]) by mx1.redhat.com (8.13.8/8.13.8) with ESMTP id o7QHNTj8018928 (version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-SHA bits=256 verify=OK) for ; Thu, 26 Aug 2010 13:23:30 -0400 Content-Disposition: inline In-Reply-To: <1282772597-4183-5-git-send-email-glommer@redhat.com> Sender: kvm-owner@vger.kernel.org List-ID: On Wed, Aug 25, 2010 at 05:43:14PM -0400, Glauber Costa wrote: > This patch proposes a common steal time implementation. When no > steal time is accounted, we just add a branch to the current > accounting code, that shouldn't add much overhead. > > When we do want to register steal time, we proceed as following: > - if we would account user or system time in this tick, and there is > out-of-cpu time registered, we skip it altogether, and account steal > time only. > - if we would account user or system time in this tick, and we got the > cpu for the whole slice, we proceed normaly. > - if we are idle in this tick, we flush out-of-cpu time to give it the > chance to update whatever last-measure internal variable it may have. Problem of using sched notifiers is that you don't differentiate whether the vcpu scheduled out by its own (via hlt emulation) or not. Skipping accounting of user/system time whenever there's any stolen time detected probably breaks u/s accounting on non-cpu-hog loads. I suppose steal time should be accounted separately from u/s ticks, as Xen does. + if (delta > 1000UL) + touch_softlockup_watchdog(); + This will break authentic soft lockup detection whenever qemu processing takes more than 1s. > > This approach is simple, but proved to work well for my test scenarios. > in a UP guest on UP host, with a cpu-hog in both guest and host shows > ~ 50 % steal time. steal time is also accounted proportionally, if > nice values are given to the host cpu-hog. > > A cpu-hog in the host with no load in the guest, produces 0 % steal time, > with 100 % idle, as one would expect. > > Signed-off-by: Glauber Costa > --- > include/linux/sched.h | 1 + > kernel/sched.c | 29 +++++++++++++++++++++++++++++ > 2 files changed, 30 insertions(+), 0 deletions(-) > > diff --git a/include/linux/sched.h b/include/linux/sched.h > index 0478888..e571ddd 100644 > --- a/include/linux/sched.h > +++ b/include/linux/sched.h > @@ -312,6 +312,7 @@ long io_schedule_timeout(long timeout); > extern void cpu_init (void); > extern void trap_init(void); > extern void update_process_times(int user); > +extern cputime_t (*hypervisor_steal_time)(void); > extern void scheduler_tick(void); > > extern void sched_show_task(struct task_struct *p); > diff --git a/kernel/sched.c b/kernel/sched.c > index f52a880..9695c92 100644 > --- a/kernel/sched.c > +++ b/kernel/sched.c > @@ -3157,6 +3157,16 @@ unsigned long long thread_group_sched_runtime(struct task_struct *p) > return ns; > } > > +cputime_t (*hypervisor_steal_time)(void) = NULL; > + > +static inline cputime_t get_steal_time_from_hypervisor(void) > +{ > + if (!hypervisor_steal_time) > + return 0; > + return hypervisor_steal_time(); > +} > + > + > /* > * Account user cpu time to a process. > * @p: the process that the cpu time gets accounted to > @@ -3169,6 +3179,12 @@ void account_user_time(struct task_struct *p, cputime_t cputime, > struct cpu_usage_stat *cpustat = &kstat_this_cpu.cpustat; > cputime64_t tmp; > > + tmp = get_steal_time_from_hypervisor(); > + if (tmp) { > + account_steal_time(tmp); > + return; > + } > + > /* Add user time to process. */ > p->utime = cputime_add(p->utime, cputime); > p->utimescaled = cputime_add(p->utimescaled, cputime_scaled); > @@ -3234,6 +3250,12 @@ void account_system_time(struct task_struct *p, int hardirq_offset, > return; > } > > + tmp = get_steal_time_from_hypervisor(); > + if (tmp) { > + account_steal_time(tmp); > + return; > + } > + > /* Add system time to process. */ > p->stime = cputime_add(p->stime, cputime); > p->stimescaled = cputime_add(p->stimescaled, cputime_scaled); > @@ -3276,6 +3298,13 @@ void account_idle_time(cputime_t cputime) > cputime64_t cputime64 = cputime_to_cputime64(cputime); > struct rq *rq = this_rq(); > > + /* > + * if we're idle, we don't account it as steal time, since we did > + * not want to run anyway. We do call the steal function, however, to > + * give the guest the chance to flush its internal buffers > + */ > + get_steal_time_from_hypervisor(); > + > if (atomic_read(&rq->nr_iowait) > 0) > cpustat->iowait = cputime64_add(cpustat->iowait, cputime64); > else > -- > 1.6.2.2 > > -- > To unsubscribe from this list: send the line "unsubscribe kvm" in > the body of a message to majordomo@vger.kernel.org > More majordomo info at http://vger.kernel.org/majordomo-info.html