From mboxrd@z Thu Jan 1 00:00:00 1970
From: Peter Zijlstra
Subject: Re: [RFC v2 4/7] change kernel accounting to include steal time
Date: Tue, 31 Aug 2010 10:11:49 +0200
Message-ID: <1283242309.1820.1471.camel@laptop>
References: <1283184391-7785-1-git-send-email-glommer@redhat.com>
 <1283184391-7785-2-git-send-email-glommer@redhat.com>
 <1283184391-7785-3-git-send-email-glommer@redhat.com>
 <1283184391-7785-4-git-send-email-glommer@redhat.com>
 <1283184391-7785-5-git-send-email-glommer@redhat.com>
 <1283184391-7785-6-git-send-email-glommer@redhat.com>
 <1283184391-7785-7-git-send-email-glommer@redhat.com>
 <1283184391-7785-8-git-send-email-glommer@redhat.com>
 <4C7BEA9C.1060605@goop.org> <4C7BFACD.4030409@redhat.com>
 <4C7C0187.7040401@goop.org> <4C7C03CB.1060700@redhat.com>
 <1283196005.1820.1340.camel@laptop> <4C7C0A57.2010906@redhat.com>
 <4C7C3709.3040706@goop.org> <4C7C38BC.1090907@redhat.com>
Mime-Version: 1.0
Content-Type: text/plain; charset="UTF-8"
Content-Transfer-Encoding: 8BIT
Cc: Jeremy Fitzhardinge, Glauber Costa, kvm@vger.kernel.org,
 avi@redhat.com, zamsden@redhat.com, mtosatti@redhat.com, mingo@elte.hu
To: Rik van Riel
Return-path:
Received: from casper.infradead.org ([85.118.1.10]:35917 "EHLO
 casper.infradead.org" rhost-flags-OK-OK-OK-OK) by vger.kernel.org
 with ESMTP id S1756945Ab0HaIME convert rfc822-to-8bit (ORCPT );
 Tue, 31 Aug 2010 04:12:04 -0400
In-Reply-To: <4C7C38BC.1090907@redhat.com>
Sender: kvm-owner@vger.kernel.org
List-ID:

On Mon, 2010-08-30 at 19:03 -0400, Rik van Riel wrote:
> >
> > I think it basically comes down to adding "sched_clock_unstolen()" which
> > the scheduler can use to measure time a process spends running, and
> > sched_clock() for measuring sleep times. In the normal case,
> > sched_clock_unstolen() would be the same as sched_clock().
>
> That requires the host to export (any time the guest is scheduled
> in), the amount of CPU time the VCPU thread has used, and the time
> the VCPU was scheduled in.
>
> Since the VCPU must be running when it is examining these variables,
> it can calculate the additional time (since it was last scheduled)
> to account to the task, and remember the currently calculated time
> in its own per-vcpu variable, so next time it can get a delta again.

I think it's easier (and sufficient) for the host to tell the guest how
long it was _not_ running. That can simply be passed in when you start
the vcpu again and doesn't need a fancy communication channel.

The guest's sched_clock() will measure wall time; the guest's
sched_clock_stolen() will report the accumulation of these stolen times.
Then you can make sched_clock_unstolen() be
sched_clock() - sched_clock_stolen().

And like Jeremy said, if you make the sched_fair stuff use
sched_clock_unstolen(), things should more or less work.

The problem with all that is that you'll start to schedule on unstolen
time instead of wall time, which might not give the best results for
things like latencies etc., but I guess that's one of the prices you pay
for using virt.

Also, as I said yesterday, you need some factor in update_cpu_power(); a
quick hack might be to add all stolen time to sched_rt_avg_update().
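
To make the sched_clock() - sched_clock_stolen() arithmetic above concrete,
here is a rough userspace sketch. It is not kernel code and not taken from
any patch in this series: struct vcpu_clock, vcpu_run() and vcpu_sched_in()
are made-up names for illustration, and the clock functions take an explicit
state pointer only so the example is self-contained. The one assumption it
encodes is the one described above, namely that the host hands the guest the
"not running" duration each time it starts the vcpu again.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical per-vcpu guest state; names are illustrative only. */
struct vcpu_clock {
	uint64_t wall_ns;   /* what the guest's sched_clock() reports */
	uint64_t stolen_ns; /* accumulated "not running" time from the host */
};

/* The vcpu actually executes for ran_ns; wall time advances. */
void vcpu_run(struct vcpu_clock *c, uint64_t ran_ns)
{
	c->wall_ns += ran_ns;
}

/*
 * The host starts the vcpu again and passes in how long it was _not_
 * running. Wall time kept ticking during that gap, so both counters
 * advance by the same amount.
 */
void vcpu_sched_in(struct vcpu_clock *c, uint64_t not_running_ns)
{
	c->wall_ns += not_running_ns;
	c->stolen_ns += not_running_ns;
}

/* Wall time as seen by the guest. */
uint64_t sched_clock(const struct vcpu_clock *c)
{
	return c->wall_ns;
}

/* Accumulation of all stolen intervals reported so far. */
uint64_t sched_clock_stolen(const struct vcpu_clock *c)
{
	return c->stolen_ns;
}

/* Time the vcpu really ran: wall time minus stolen time. */
uint64_t sched_clock_unstolen(const struct vcpu_clock *c)
{
	/* Stolen time is a subset of wall time, so this cannot underflow. */
	assert(c->wall_ns >= c->stolen_ns);
	return c->wall_ns - c->stolen_ns;
}
```

With a vcpu that runs 300ns, is descheduled for 200ns, then runs another
100ns, sched_clock() reads 600, sched_clock_stolen() reads 200, and
sched_clock_unstolen() reads 400. If nothing is ever stolen, the unstolen
clock is identical to sched_clock(), which matches the "normal case" Jeremy
described.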