From: Jeremy Fitzhardinge
Date: Wed, 14 Mar 2007 12:44:15 -0700
To: Daniel Walker
Cc: john stultz, Andi Kleen, Ingo Molnar, Thomas Gleixner, Con Kolivas,
    Rusty Russell, Zachary Amsden, James Morris, Chris Wright,
    Linux Kernel Mailing List, cpufreq@lists.linux.org.uk,
    Virtualization Mailing List, Peter Chubb
Subject: Re: Stolen and degraded time and schedulers

Daniel Walker wrote:
> sched_clock is used to bank real time against some specific states
> inside the scheduler, and no, it doesn't _just_ measure a process's
> executing time.

Could you point those places out?  All the uses of sched_clock() I
could see in kernel/sched.c seemed to be related to working out how
long something spent executing, either in the scheduler proper or in
benchmarking cache characteristics.

>> 1. If the cpu is stolen by the hypervisor, the kernel will get no
>>    state transition notification.  It can generally find out that
>>    some time was stolen after the fact, but there's no specific
>>    event at the time it happens.
>
> The hypervisor would need to do its own accounting then, I'd
> imagine, and provide that to the scheduler.

Yes.  Xen, at least, provides nanosecond-resolution information about
how long each vcpu spent in each of its states.  But the question is
how this information should be exposed to the scheduler.  I could
provide a raw dump of the info, but in general the scheduler doesn't
care, and other hypervisors might not be able to produce the same
information.

The essential information is "how long did process X actually run on
a real CPU?"  And that, as far as I can tell, is the question
sched_clock() is already designed to answer.
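In other words, something like the sketch below.  (Only a sketch: the
vcpu_runstate_info layout is Xen's, but xen_read_runstate() and
xen_system_time() are placeholders for whatever glue ends up reading
the shared runstate area.)

	#include <xen/interface/vcpu.h>

	/*
	 * A sched_clock() that counts only the time this vcpu spent
	 * actually running on a real cpu, so stolen time is invisible
	 * to the scheduler's interval measurements.
	 */
	unsigned long long sched_clock(void)
	{
		struct vcpu_runstate_info rs;
		unsigned long long now, run;

		xen_read_runstate(&rs);    /* snapshot of this vcpu's state times */
		now = xen_system_time();   /* Xen system time, in ns */

		/* time already banked while in the running state... */
		run = rs.time[RUNSTATE_running];

		/* ...plus the interval we've been running since the
		   last state transition, if we're running right now */
		if (rs.state == RUNSTATE_running)
			run += now - rs.state_entry_time;

		return run;
	}

The point being that the scheduler keeps calling plain sched_clock()
and never needs to know a hypervisor is there at all.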
>> 2. It doesn't map particularly well to a cpu changing speed.  In
>>    particular, if a cpu has continuously varying execution speed
>>    (Transmeta?), then the best you can hope for is the integration
>>    of cpu work done over a time period rather than discrete cpu
>>    speed-change events.
>
> True, but as I said in my original email, it's not trivial to follow
> physical cpu speed changes, since the changes are free-form and
> potentially differ per system.  You're better off doing it just with
> the hypervisor, since you can control it.

No, I'm talking about cpu speed changes as a completely separate
case, one which is primarily an issue when running a kernel on bare
hardware.  But it is, in some ways, more complex than running on a
hypervisor.  There are numerous mechanisms for cpu speed control:
some kernel-driven, some autonomous, some stepwise, some continuous.
I'm arguing that it's the cpufreq subsystem's job to keep track of
all that detail; the only information it needs to provide to the
scheduler is, again, "how much work did my process get done on the
CPU?"
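Concretely, cpufreq could drive something like the sketch below,
which scales elapsed time by the current frequency, so a half-speed
cpu banks half as much "work".  (The transition-notifier hooks are
the real cpufreq API; bank_work(), ns_since_last_update() and ref_khz
are made up for illustration, and ref_khz is assumed to be set at
boot.)

	#include <linux/cpufreq.h>
	#include <linux/percpu.h>
	#include <asm/div64.h>

	static DEFINE_PER_CPU(u64, work_ns);          /* accumulated scaled ns */
	static DEFINE_PER_CPU(unsigned int, cur_khz); /* current frequency */
	static unsigned int ref_khz;                  /* frequency counted as 100% */

	/* fold the elapsed interval into work_ns, scaled by cpu speed */
	static void bank_work(int cpu, u64 delta_ns)
	{
		u64 scaled = delta_ns * per_cpu(cur_khz, cpu);

		do_div(scaled, ref_khz);
		per_cpu(work_ns, cpu) += scaled;
	}

	static int work_cpufreq_notifier(struct notifier_block *nb,
					 unsigned long event, void *data)
	{
		struct cpufreq_freqs *freq = data;

		if (event == CPUFREQ_POSTCHANGE) {
			/* close out the interval run at the old speed,
			   then account at the new speed from here on */
			bank_work(freq->cpu, ns_since_last_update(freq->cpu));
			per_cpu(cur_khz, freq->cpu) = freq->new;
		}
		return 0;
	}

	static struct notifier_block work_nb = {
		.notifier_call = work_cpufreq_notifier,
	};

	/* registered at init with:
	   cpufreq_register_notifier(&work_nb, CPUFREQ_TRANSITION_NOTIFIER); */

Autonomous, hardware-driven speed changes obviously can't be caught
this way, but nothing short of measuring the work done directly will
handle those anyway.

J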