Re: A question on the credit scheduler

From: gavin  <gbtux@126.com>
To: George Dunlap <George.Dunlap@eu.citrix.com>
Cc: "xen-devel@lists.xensource.com" <xen-devel@lists.xensource.com>
Subject: Re: A question on the credit scheduler
Date: Tue, 20 Dec 2011 15:30:04 +0800 (CST)
Message-ID: <5097e419.1e4d1.1345a608e69.Coremail.gbtux@126.com>
In-Reply-To: <CAFLBxZZP=K504L86WDJ-F+9SuTf7aKU_C=i8nLZwqDuejoVGag@mail.gmail.com>

1) My original goal is to calculate CPU usage in a different way from existing tools such as xentrace.

Because the aim of scheduling is to map vCPUs onto pCPUs, we can obtain a sequence of (pCPU, vCPU) mappings by sampling the pCPUs at a fixed interval.

For example, if SCHEDULE_SOFTIRQ were raised only when a timeslice finishes, the vCPUs would run as shown in figure 1 (two pCPUs and three vCPUs). We sample all the pCPUs once every interval t, which here equals the timeslice. Over ten timeslices (about 0.3 seconds in credit1), we get a mapping sequence containing 20 (pCPU, vCPU) pairs, as follows.

(pCPU1, vCPU1), (pCPU2, vCPU2), (pCPU1, vCPU1), (pCPU2, non), (pCPU1, vCPU1), (pCPU2, vCPU3), (pCPU1, vCPU2), (pCPU2, non), (pCPU1, non), (pCPU2, vCPU2), (pCPU1, vCPU3), (pCPU2, non), (pCPU1, vCPU3), (pCPU2, vCPU1), (pCPU1, non), (pCPU2, vCPU1), (pCPU1, vCPU1), (pCPU2, non), (pCPU1, non), (pCPU2, vCPU2).

If no vCPU is mapped on a pCPU, we record a (pCPU*, non) pair, which means the pCPU is idle. The sequence above contains 7 idle pairs in total.

So, from the above mapping sequence, we can calculate the CPU usage over ten timeslices (0.3 seconds).

Usage Percentage = (20-7)/20 = 65% 
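
To make the calculation concrete, here is a small C sketch that counts the idle pairs in the sequence above and derives the usage percentage. The names (struct sample, VCPU_NONE, count_idle, usage_percent, example_trace) are made up for this illustration and are not part of Xen.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical marker for "no vCPU mapped" (the "non" entries above). */
#define VCPU_NONE (-1)

/* One sample: which vCPU, if any, was seen on a pCPU at a sampling instant. */
struct sample {
    int pcpu;
    int vcpu;   /* VCPU_NONE when the pCPU was idle */
};

/* Count the samples in which the pCPU had no vCPU mapped. */
size_t count_idle(const struct sample *s, size_t n)
{
    size_t idle = 0;

    assert(s != NULL || n == 0);
    for (size_t i = 0; i < n; i++)
        if (s[i].vcpu == VCPU_NONE)
            idle++;
    return idle;
}

/* Usage in whole percent: (total - idle) / total, or 0 for an empty trace. */
unsigned int usage_percent(const struct sample *s, size_t n)
{
    if (n == 0)
        return 0;
    return (unsigned int)(((n - count_idle(s, n)) * 100) / n);
}

/* The 20 (pCPU, vCPU) pairs listed above, in the same order. */
const struct sample example_trace[] = {
    {1, 1}, {2, 2}, {1, 1}, {2, VCPU_NONE}, {1, 1},
    {2, 3}, {1, 2}, {2, VCPU_NONE}, {1, VCPU_NONE}, {2, 2},
    {1, 3}, {2, VCPU_NONE}, {1, 3}, {2, 1}, {1, VCPU_NONE},
    {2, 1}, {1, 1}, {2, VCPU_NONE}, {1, VCPU_NONE}, {2, 2},
};
```

Calling usage_percent(example_trace, 20) on this data gives the same 65% as the hand calculation. Note that this treats every sample as representing one full interval, which only holds if the vCPU on a pCPU never changes between two samples.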

2) If we can get such a mapping sequence of pCPUs and vCPUs, then besides calculating CPU usage, we may also be able to find some patterns in the sequence and use them to infer which tasks are CPU-bound and which are I/O-bound.

However, this idea may not work. Because SCHEDULE_SOFTIRQ is raised not only when a timeslice finishes but also in many other situations, we cannot get a regular mapping of vCPUs to pCPUs like the one in figure 1. The mapping is irregular, possibly as shown in figure 2, and in that case we cannot find a suitable interval t at which to sample the CPUs.
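
The problem can be sketched with a made-up trace for a single pCPU (the numbers below are invented for illustration and are not read from figure 2). The code compares the exact, time-weighted busy share with the share obtained by sampling at a fixed period. All names (struct run_interval, exact_busy_percent, sampled_busy_percent, example_iv) are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* One scheduling interval on a single pCPU, times in microseconds. */
struct run_interval {
    unsigned long start_us;
    unsigned long end_us;   /* exclusive */
    int busy;               /* 1 if a vCPU was running, 0 if idle */
};

/* Exact busy share in percent over [0, total_us), weighted by interval length. */
unsigned int exact_busy_percent(const struct run_interval *iv, size_t n,
                                unsigned long total_us)
{
    unsigned long busy_us = 0;

    assert(total_us > 0);
    for (size_t i = 0; i < n; i++)
        if (iv[i].busy)
            busy_us += iv[i].end_us - iv[i].start_us;
    return (unsigned int)(busy_us * 100 / total_us);
}

/* Sampled busy share: inspect the pCPU once every period_us, starting at 0. */
unsigned int sampled_busy_percent(const struct run_interval *iv, size_t n,
                                  unsigned long total_us,
                                  unsigned long period_us)
{
    unsigned long samples = 0, busy = 0;

    assert(total_us > 0 && period_us > 0);
    for (unsigned long t = 0; t < total_us; t += period_us) {
        samples++;
        /* Find the interval that covers time t and record its state. */
        for (size_t i = 0; i < n; i++) {
            if (t >= iv[i].start_us && t < iv[i].end_us) {
                busy += (iv[i].busy != 0);
                break;
            }
        }
    }
    return (unsigned int)(busy * 100 / samples);
}

/* Invented 30 ms trace: short busy bursts that end long before 10 ms. */
const struct run_interval example_iv[] = {
    {0,     2000,  1}, {2000,  10000, 0},
    {10000, 13000, 1}, {13000, 20000, 0},
    {20000, 21000, 1}, {21000, 30000, 0},
};
```

With this trace the pCPU is busy for 6 ms out of 30 ms, so the exact share is 20%. Sampling every 10 ms happens to hit a busy burst each time and reports 100%, while sampling every 1 ms reports 20% only because every context switch in the invented data falls on a 1 ms boundary. When switches can happen at arbitrary times, no fixed period gives that guarantee.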

--
Best Regards,
Gavin

At 2011-12-19 18:37:14,"George Dunlap" <George.Dunlap@eu.citrix.com> wrote:
>2011/12/17 gavin <gbtux@126.com>:
>>
>> At 2011-12-16 23:58:26,"George Dunlap" <George.Dunlap@eu.citrix.com> wrote:
>>>2011/12/16 gavin <gbtux@126.com>:
>>>> At 2011-12-16 19:04:19,"George Dunlap" <George.Dunlap@eu.citrix.com> wrote:
>>>>
>>>>>2011/12/16 zhikai <gbtux@126.com>:
>>>>>> Hi All,
>>>>>>
>>>>>> In the credit scheduler, the scheduling decision function csched_schedule()
>>>>>> is called in the schedule function in scheduler.c, such as the following.
>>>>>> next_slice = sched->do_schedule(sched, now, tasklet_work_scheduled);
>>>>>>
>>>>>> But how often is csched_schedule() called and run? Does this
>>>>>> frequency have something to do with the slice of credit scheduler that is
>>>>>> 30ms?
>>>>>
>>>>>The scheduler runs whenever the SCHEDULE_SOFTIRQ is raised.  If you
>>>>>grep through the source code for that string, you can find all the
>>>>>places where it's raised.
>>>>>
>>>>>Some examples include:
>>>>>* When the 30ms timeslice is finished
>>>>>* When a sleeping vcpu of higher priority than what's currently running wakes up
>>>>>* When a vcpu blocks
>>>>>* When a vcpu is migrated from one cpu to another
>>>>>
>>>>>30ms is actually a pretty long time; in typical workloads, vcpus block
>>>>>or are preempted by other waking vcpus without using up their full
>>>>>timeslice.
>>>>
>>>> Thank you very much for your reply.
>>>>
>>>> So, the vcpu is very likely to be preempted whenever the SCHEDULE_SOFTIRQ is
>>>> raised.
>>>
>>>It depends; if you have a cpu-burning vcpu running on a cpu all by
>>>itself, then after its 30ms timeslice, Xen will have no one else to
>>>run, and so will let it run again.
>>>
>>>But yes, if there are other vcpus on the runqueue, or the host is
>>>moderately busy, it's likely that SCHEDULE_SOFTIRQ will cause a
>>>context-switch.
>>>
>>>> And we cannot find a small timeslice, such as a(ms), which makes the
>>>> time any vcpu spending on running phase is k*a(ms), k is integer here. There
>>>> is no such a small timeslice. Is it right?
>>>
>>>I'm sorry, I don't really understand your question.  Perhaps if you
>>>told me what you're trying to accomplish?
>>
>> I will try to describe my idea clearly below. But I really don't know
>> if it will work. Please give me some advice if possible.
>>
>> According to the credit scheduler in Xen, a vCPU can run a 30ms timeslice
>> when it is scheduled on the physical CPU. And, a vCPU with the BOOST
>> priority will preempt the running one and run for an additional 10ms. So,
>> what I think is that if we monitor the physical CPU every 10ms, we can get
>> the mapping information of a physical CPU and a vCPU. We can also get the
>> un-mapping information, where a physical CPU isn’t mapped to any vCPU.
>> Thus, we can get the CPU usage by calculating the proportion of mapped
>> samples to the total number of samples we took.
>>
>> For example, if we monitor the physical CPUs every 10ms, we get 100
>> (pCPU_id, vCPU_id) pairs per second. If 60 of them are mapping pairs, in
>> which the pCPU is mapped to a valid vCPU, and 40 are un-mapping pairs, in
>> which the pCPU is not mapped to any valid vCPU, then the usage of the
>> physical CPUs is 60%.
>>
>> Here, we monitor the physical CPUs every 10ms. We could also monitor them
>> at a shorter interval, such as 1ms. Whatever interval we choose, we must
>> make sure that no context switch occurs inside an interval, or that
>> context switches always occur at the edge of an interval. Only under this
>> condition can this idea work.
>>
>> So, I am not sure whether we can find a time interval that meets this
>> condition. In other words, can we find a time interval that ensures
>> all CPU context switches occur at the edge of an interval?
>
>You still haven't described exactly what it is you're trying to
>accomplish: what is your end goal?  It seems to be related somehow to
>measuring how busy the system is (i.e., the number of active pcpus and
>idle pcpus); but as I don't know what you want to do with that
>information, I can't tell you the best way to get it.
>
>Regarding a map of pcpus to vcpus, that already exists.  The
>scheduling code will keep track of the currently running vcpu here:
>  per_cpu(schedule_data, pcpu_id).curr
>
>You can see examples of the above structure used in
>xen/common/sched_credit2.c.  If "is_idle(per_cpu(schedule_data,
>pcpu).curr)" is false, then the cpu is running a vcpu; if it is true,
>then the pcpu is idle (although it may be running a tasklet).
>
>Additionally, if all you want is the number of non-idle cpus, the
>credit1 scheduler keeps track of the idle and non-idle cpus in
>prv->idlers.  You could easily use "cpumask_weight(&prv->idlers)" to
>find out how many idle cpus there are at any given time.  If you know
>how many online cpus there are, that will give you the busy-ness of
>the system.
>
>So now that you have this instantaneous percentage, what do you want
>to do with it?
>
> -George
>
>_______________________________________________
>Xen-devel mailing list
>Xen-devel@lists.xensource.com
>http://lists.xensource.com/xen-devel

[-- Attachment #2: fig1.jpg --]
[-- Type: image/jpeg, Size: 26669 bytes --]

[-- Attachment #3: fig2.jpg --]
[-- Type: image/jpeg, Size: 32119 bytes --]
