From: Glauber Costa
Subject: [RFC v2 4/7] change kernel accounting to include steal time
Date: Mon, 30 Aug 2010 12:06:23 -0400
Message-ID: <1283184391-7785-8-git-send-email-glommer@redhat.com>
References: <1283184391-7785-1-git-send-email-glommer@redhat.com>
 <1283184391-7785-2-git-send-email-glommer@redhat.com>
 <1283184391-7785-3-git-send-email-glommer@redhat.com>
 <1283184391-7785-4-git-send-email-glommer@redhat.com>
 <1283184391-7785-5-git-send-email-glommer@redhat.com>
 <1283184391-7785-6-git-send-email-glommer@redhat.com>
 <1283184391-7785-7-git-send-email-glommer@redhat.com>
Cc: avi@redhat.com, zamsden@redhat.com, mtosatti@redhat.com, riel@redhat.com,
 peterz@infradead.org, mingo@elte.hu, jeremy@goop.org
To: kvm@vger.kernel.org
In-Reply-To: <1283184391-7785-7-git-send-email-glommer@redhat.com>

This patch proposes a common steal time implementation. When no steal time
source is registered, we only add a branch to the current accounting code,
which shouldn't add much overhead.

When we do want to account steal time, we proceed as follows:

- If we would account user or system time in this tick and out-of-CPU time
  has been registered, we skip user/system accounting altogether and account
  only the reported steal time.
- If we would account user or system time in this tick and we had the CPU
  for the whole slice, we proceed normally.
- If we are idle in this tick, we still flush the out-of-CPU time, to give
  the steal time source the chance to update whatever last-measurement
  internal variable it may keep. This time is not accounted as steal.

This approach is simple, but it proved to work well in my test scenarios.
In a UP guest on a UP host, with a cpu-hog in both guest and host, the guest
shows ~50% steal time. Steal time is also accounted proportionally if nice
values are given to the host cpu-hog.
A cpu-hog in the host with no load in the guest produces 0% steal time, with
100% idle, as one would expect.

Signed-off-by: Glauber Costa
---
 include/linux/sched.h |    1 +
 kernel/sched.c        |   29 +++++++++++++++++++++++++++++
 2 files changed, 30 insertions(+), 0 deletions(-)

diff --git a/include/linux/sched.h b/include/linux/sched.h
index 0478888..e571ddd 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -312,6 +312,7 @@ long io_schedule_timeout(long timeout);
 extern void cpu_init (void);
 extern void trap_init(void);
 extern void update_process_times(int user);
+extern cputime_t (*hypervisor_steal_time)(void);
 extern void scheduler_tick(void);
 
 extern void sched_show_task(struct task_struct *p);
diff --git a/kernel/sched.c b/kernel/sched.c
index f52a880..9695c92 100644
--- a/kernel/sched.c
+++ b/kernel/sched.c
@@ -3157,6 +3157,16 @@ unsigned long long thread_group_sched_runtime(struct task_struct *p)
 	return ns;
 }
 
+cputime_t (*hypervisor_steal_time)(void) = NULL;
+
+static inline cputime_t get_steal_time_from_hypervisor(void)
+{
+	if (!hypervisor_steal_time)
+		return 0;
+	return hypervisor_steal_time();
+}
+
+
 /*
  * Account user cpu time to a process.
  * @p: the process that the cpu time gets accounted to
@@ -3169,6 +3179,12 @@ void account_user_time(struct task_struct *p, cputime_t cputime,
 	struct cpu_usage_stat *cpustat = &kstat_this_cpu.cpustat;
 	cputime64_t tmp;
 
+	tmp = get_steal_time_from_hypervisor();
+	if (tmp) {
+		account_steal_time(tmp);
+		return;
+	}
+
 	/* Add user time to process. */
 	p->utime = cputime_add(p->utime, cputime);
 	p->utimescaled = cputime_add(p->utimescaled, cputime_scaled);
@@ -3234,6 +3250,12 @@ void account_system_time(struct task_struct *p, int hardirq_offset,
 		return;
 	}
 
+	tmp = get_steal_time_from_hypervisor();
+	if (tmp) {
+		account_steal_time(tmp);
+		return;
+	}
+
 	/* Add system time to process. */
 	p->stime = cputime_add(p->stime, cputime);
 	p->stimescaled = cputime_add(p->stimescaled, cputime_scaled);
@@ -3276,6 +3298,13 @@ void account_idle_time(cputime_t cputime)
 	cputime64_t cputime64 = cputime_to_cputime64(cputime);
 	struct rq *rq = this_rq();
 
+	/*
+	 * if we're idle, we don't account it as steal time, since we did
+	 * not want to run anyway. We do call the steal function, however, to
+	 * give the guest the chance to flush its internal buffers
+	 */
+	get_steal_time_from_hypervisor();
+
 	if (atomic_read(&rq->nr_iowait) > 0)
 		cpustat->iowait = cputime64_add(cpustat->iowait, cputime64);
 	else
-- 
1.6.2.2