kvm.vger.kernel.org archive mirror
From: Marcelo Tosatti <mtosatti@redhat.com>
To: Glauber Costa <glommer@redhat.com>
Cc: kvm@vger.kernel.org, avi@redhat.com, zamsden@redhat.com, riel@redhat.com
Subject: Re: [RFC 4/7] change kernel accounting to include steal time
Date: Thu, 26 Aug 2010 14:23:03 -0300
Message-ID: <20100826172303.GB21273@amt.cnet>
In-Reply-To: <1282772597-4183-5-git-send-email-glommer@redhat.com>

On Wed, Aug 25, 2010 at 05:43:14PM -0400, Glauber Costa wrote:
> This patch proposes a common steal time implementation. When no
> steal time is accounted, we just add a branch to the current
> accounting code, which shouldn't add much overhead.
> 
> When we do want to register steal time, we proceed as follows:
> - if we would account user or system time in this tick, and there is
>   out-of-cpu time registered, we skip it altogether, and account steal
>   time only.
> - if we would account user or system time in this tick, and we got the
>   cpu for the whole slice, we proceed normally.
> - if we are idle in this tick, we flush out-of-cpu time to give it the
>   chance to update whatever last-measure internal variable it may have.

The problem with using sched notifiers is that you can't differentiate
whether the vcpu scheduled out on its own (via hlt emulation) or not.
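
To illustrate the distinction, here is a minimal sketch, not code from this
series. It assumes the sched-out path can be told whether the vcpu blocked in
hlt emulation. The names struct vcpu_steal, vcpu_sched_out() and
vcpu_sched_in() are hypothetical and exist neither in KVM nor in these
patches.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical per-vcpu bookkeeping; not a KVM structure. */
struct vcpu_steal {
	bool halted;            /* vcpu gave up the cpu on its own (hlt) */
	uint64_t sched_out_ns;  /* timestamp of the last sched-out */
	uint64_t steal_ns;      /* accumulated involuntary out-of-cpu time */
};

/* Record when, and why, the vcpu left the cpu. */
static void vcpu_sched_out(struct vcpu_steal *v, uint64_t now_ns, bool halted)
{
	v->halted = halted;
	v->sched_out_ns = now_ns;
}

/* Only time spent off the cpu while the vcpu wanted to run counts as steal. */
static void vcpu_sched_in(struct vcpu_steal *v, uint64_t now_ns)
{
	assert(now_ns >= v->sched_out_ns);
	if (!v->halted)
		v->steal_ns += now_ns - v->sched_out_ns;
	v->halted = false;
}
```

With this split, a vcpu that halts and is woken later accumulates no steal
time, while one that is preempted while runnable accumulates the full
interval.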

Skipping accounting of user/system time whenever there's any stolen
time detected probably breaks u/s accounting on non-cpu-hog loads.

I suppose steal time should be accounted separately from u/s ticks, as
Xen does.
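
A rough sketch of what accounting steal time separately could look like,
assuming stolen time is reported in the same unit as the tick. struct cpu_acct
and account_tick() are made-up names for illustration, not the kernel's
cpustat or any existing API, and only user time is modelled.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical counters in nanoseconds; not the kernel's cpu_usage_stat. */
struct cpu_acct {
	uint64_t user;
	uint64_t steal;
};

/*
 * Account one tick of tick_ns during which stolen_ns was spent out of
 * the cpu. Steal is charged first (capped at the tick length) and the
 * remainder of the tick is still charged as user time, instead of the
 * whole tick being skipped.
 */
static void account_tick(struct cpu_acct *a, uint64_t tick_ns,
			 uint64_t stolen_ns)
{
	uint64_t steal = stolen_ns < tick_ns ? stolen_ns : tick_ns;

	assert(a != NULL);
	a->steal += steal;
	a->user += tick_ns - steal;
}
```

For a 10 ms tick with 1 ms stolen, this charges 1 ms to steal and 9 ms to
user, whereas skipping u/s accounting whenever any steal is seen would lose
the 9 ms.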

+	if (delta > 1000UL)
+		touch_softlockup_watchdog();
+

This will break authentic soft lockup detection whenever qemu processing
takes more than 1s.

> 
> This approach is simple, but proved to work well for my test scenarios.
> A UP guest on a UP host, with a cpu-hog in both guest and host, shows
> ~50% steal time. Steal time is also accounted proportionally if
> nice values are given to the host cpu-hog.
> 
> A cpu-hog in the host with no load in the guest produces 0% steal time,
> with 100% idle, as one would expect.
> 
> Signed-off-by: Glauber Costa <glommer@redhat.com>
> ---
>  include/linux/sched.h |    1 +
>  kernel/sched.c        |   29 +++++++++++++++++++++++++++++
>  2 files changed, 30 insertions(+), 0 deletions(-)
> 
> diff --git a/include/linux/sched.h b/include/linux/sched.h
> index 0478888..e571ddd 100644
> --- a/include/linux/sched.h
> +++ b/include/linux/sched.h
> @@ -312,6 +312,7 @@ long io_schedule_timeout(long timeout);
>  extern void cpu_init (void);
>  extern void trap_init(void);
>  extern void update_process_times(int user);
> +extern cputime_t (*hypervisor_steal_time)(void);
>  extern void scheduler_tick(void);
>  
>  extern void sched_show_task(struct task_struct *p);
> diff --git a/kernel/sched.c b/kernel/sched.c
> index f52a880..9695c92 100644
> --- a/kernel/sched.c
> +++ b/kernel/sched.c
> @@ -3157,6 +3157,16 @@ unsigned long long thread_group_sched_runtime(struct task_struct *p)
>  	return ns;
>  }
>  
> +cputime_t (*hypervisor_steal_time)(void) = NULL;
> +
> +static inline cputime_t get_steal_time_from_hypervisor(void)
> +{
> +	if (!hypervisor_steal_time)
> +		return 0;
> +	return hypervisor_steal_time();
> +}
> +
> +
>  /*
>   * Account user cpu time to a process.
>   * @p: the process that the cpu time gets accounted to
> @@ -3169,6 +3179,12 @@ void account_user_time(struct task_struct *p, cputime_t cputime,
>  	struct cpu_usage_stat *cpustat = &kstat_this_cpu.cpustat;
>  	cputime64_t tmp;
>  
> +	tmp = get_steal_time_from_hypervisor();
> +	if (tmp) {
> +		account_steal_time(tmp);
> +		return;
> +	}
> +
>  	/* Add user time to process. */
>  	p->utime = cputime_add(p->utime, cputime);
>  	p->utimescaled = cputime_add(p->utimescaled, cputime_scaled);
> @@ -3234,6 +3250,12 @@ void account_system_time(struct task_struct *p, int hardirq_offset,
>  		return;
>  	}
>  
> +	tmp = get_steal_time_from_hypervisor();
> +	if (tmp) {
> +		account_steal_time(tmp);
> +		return;
> +	}
> +
>  	/* Add system time to process. */
>  	p->stime = cputime_add(p->stime, cputime);
>  	p->stimescaled = cputime_add(p->stimescaled, cputime_scaled);
> @@ -3276,6 +3298,13 @@ void account_idle_time(cputime_t cputime)
>  	cputime64_t cputime64 = cputime_to_cputime64(cputime);
>  	struct rq *rq = this_rq();
>  
> +	/*
> +	 * if we're idle, we don't account it as steal time, since we did
> +	 * not want to run anyway. We do call the steal function, however, to
> +	 * give the guest the chance to flush its internal buffers
> +	 */
> +	get_steal_time_from_hypervisor();
> +
>  	if (atomic_read(&rq->nr_iowait) > 0)
>  		cpustat->iowait = cputime64_add(cpustat->iowait, cputime64);
>  	else
> -- 
> 1.6.2.2
