Question about the ability of credit scheduler to handle I/O and CPU intensive VMs

* Question about the ability of credit scheduler to handle I/O and CPU intensive VMs
@ 2010-09-13 21:37 Yuehai Xu
  2010-09-13 23:29 ` Jeremy Fitzhardinge
       [not found] ` <AANLkTin9E1m_jFcj4Ak7nB9OxcQynrznpQ_nNPi_U7hN@mail.gmail.com>
  0 siblings, 2 replies; 21+ messages in thread
From: Yuehai Xu @ 2010-09-13 21:37 UTC (permalink / raw)
  To: xen-devel; +Cc: George.Dunlap, Jeremy Fitzhardinge, yhxu, Keir.Fraser

Hi all,

Even though the credit scheduler is the default VM scheduler in Xen, I
don't think it works well for workloads that mix I/O and CPU-intensive
work. The results below are from a simple test. Suppose there are two
VMs, each with a single VCPU, and the two VMs share a single PCPU.

1. If I run a CPU-intensive program in each VM, the PCPU share of the
two VCPUs is 50% vs. 50%. This makes sense.

2. However, if I run a CPU-intensive program in one VM while the other
VM runs an I/O-intensive program, the split is 83% vs. 17%. The VM that
runs only the I/O-intensive program actually needs little CPU, yet
almost 17% of the PCPU is occupied by this VM. The I/O throughput is
104MB/s, which is the peak throughput of my hard disk.

3. Now one VM runs a CPU-intensive program while the other VM runs a
CPU + I/O intensive program; the CPU split is 50% vs. 50%. However, the
I/O throughput is just 53MB/s. This doesn't make sense: only about 50%
of the I/O bandwidth is used by the VM.

4. Last case: both VMs run only a single I/O-intensive program, and the
I/O throughput reaches almost 100MB/s.

The document http://www.xen.org/files/xensummit_intel09/George_Dunlap.pdf
has explained why the credit scheduler doesn't work well for CPU+I/O
intensive workloads. However, nothing seems to have happened since this
short paper; at least, I/O performance remains poor. Is that because of
some technical issue?

I know that when a domU has an I/O event, it first needs to be woken up
(put into the BOOST queue). However, as long as its VCPU is not idle, it
is put back into the UNDER queue, which prevents the domU from being
scheduled immediately. Would it be possible, when a domU has an I/O
event, for the scheduler to give it a very short period of time, say
50us, so that it can be scheduled at once? That way, I/O latency should
go down.

Any ideas?

Thanks,
Yuehai

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: Question about the ability of credit scheduler to handle I/O and CPU intensive VMs
  2010-09-13 21:37 Question about the ability of credit scheduler to handle I/O and CPU intensive VMs Yuehai Xu
@ 2010-09-13 23:29 ` Jeremy Fitzhardinge
  2010-09-14  1:38   ` Yuehai Xu
       [not found] ` <AANLkTin9E1m_jFcj4Ak7nB9OxcQynrznpQ_nNPi_U7hN@mail.gmail.com>
  1 sibling, 1 reply; 21+ messages in thread
From: Jeremy Fitzhardinge @ 2010-09-13 23:29 UTC (permalink / raw)
  To: Yuehai Xu; +Cc: George.Dunlap, xen-devel, yhxu, Keir.Fraser

 On 09/13/2010 02:37 PM, Yuehai Xu wrote:
> The document http://www.xen.org/files/xensummit_intel09/George_Dunlap.pdf
> has explained why credit scheduler doesn't work well for CPU+I/O
> intensive workload, however, since nothing seems happened after this
> short paper, at least the performance of I/O remains poor. Is it
> because of some technical issues?

Have you tried the credit2 scheduler?

    J

* Re: Question about the ability of credit scheduler to handle I/O and CPU intensive VMs
  2010-09-13 23:29 ` Jeremy Fitzhardinge
@ 2010-09-14  1:38   ` Yuehai Xu
  0 siblings, 0 replies; 21+ messages in thread
From: Yuehai Xu @ 2010-09-14  1:38 UTC (permalink / raw)
  To: Jeremy Fitzhardinge; +Cc: George.Dunlap, xen-devel, yhxu, Keir.Fraser

On Mon, Sep 13, 2010 at 7:29 PM, Jeremy Fitzhardinge <jeremy@goop.org> wrote:
>  On 09/13/2010 02:37 PM, Yuehai Xu wrote:
>> The document http://www.xen.org/files/xensummit_intel09/George_Dunlap.pdf
>> has explained why credit scheduler doesn't work well for CPU+I/O
>> intensive workload, however, since nothing seems happened after this
>> short paper, at least the performance of I/O remains poor. Is it
>> because of some technical issues?
>
> Have you tried the credit2 scheduler?
>
>    J
>
I am afraid that when I set the scheduler to credit2, my computer can't
boot successfully; something seems to be wrong with credit2.

Yuehai

* Re: Question about the ability of credit scheduler to handle I/O and CPU intensive VMs
       [not found] ` <AANLkTin9E1m_jFcj4Ak7nB9OxcQynrznpQ_nNPi_U7hN@mail.gmail.com>
@ 2010-09-14 14:58   ` Yuehai Xu
  2010-09-30 12:28   ` Yuehai Xu
  1 sibling, 0 replies; 21+ messages in thread
From: Yuehai Xu @ 2010-09-14 14:58 UTC (permalink / raw)
  To: George Dunlap; +Cc: xen-devel, yhxu

> I agree, letting a VM with an interrupt run for a short period of time
> makes sense.  The challenge is to make sure that it can't simply send
> itself interrupts every 50us and get to run 100% of the time. :-)

Do you mean that the actual time for which a VM holds the PCPU is quite
uncertain in the 50us case? I am not familiar with the code structure,
but as I remember, the guest kernel should schedule the I/O-related
process as soon as its request completes. For example, CFS uses
vruntime (a measure of the CPU time a process has received) to select
the most suitable process to run, and it is highly likely that the I/O
process has less vruntime, which means it should preempt the current
process and be scheduled immediately.

So if we give the VM a very short period of time, 50us for example,
then even if only 1/10 of it is used, the guest kernel should schedule
the I/O process in the VM, which can then keep dispatching requests. In
that way, the problem of I/O latency might be solved.

I am afraid I don't quite understand "..get to run 100% of the time".
I have tried credit2, but my computer can't boot successfully. I know
it is very hard to debug the code, since capturing the error log is
difficult. Perhaps through a remote serial console? I haven't tried
that yet.

Once we give a very short period of time to a VM that is woken up by an
I/O event, is it possible that the problem of I/O latency would be
solved? Or is the overhead of such frequent interrupts too high? Or has
no one tested this idea because debugging is difficult?

I appreciate your reply.

Thanks,
Yuehai

* Re: Question about the ability of credit scheduler to handle I/O and CPU intensive VMs
       [not found] ` <AANLkTin9E1m_jFcj4Ak7nB9OxcQynrznpQ_nNPi_U7hN@mail.gmail.com>
  2010-09-14 14:58   ` Yuehai Xu
@ 2010-09-30 12:28   ` Yuehai Xu
  2010-09-30 13:27     ` George Dunlap
  1 sibling, 1 reply; 21+ messages in thread
From: Yuehai Xu @ 2010-09-30 12:28 UTC (permalink / raw)
  To: George Dunlap; +Cc: xen-devel, yhxu

On Tue, Sep 14, 2010 at 5:22 AM, George Dunlap
<George.Dunlap@eu.citrix.com> wrote:
> Credit2 development is mostly stalled; I've just got too many other
> things to do at the moment.  If you know someone good at hypervisor
> development that wants to move to Cambridge to help me out, I think we
> have some open positions... :-)
>
> The problem you describe, which I call the "mixed workload" problem,
> is something that I'd like to try to solve with credit2.  The actual
> problem with credit1, at the moment, is that when a vcpu is scheduled
> to run, it can always run for 30ms if it wants to.  So if it's a CPU
> burner, in order to give it 50%, you have to keep it from running for
> 30ms before letting it run for 30ms again.
>
> I agree, letting a VM with an interrupt run for a short period of time
> makes sense.  The challenge is to make sure that it can't simply send
> itself interrupts every 50us and get to run 100% of the time. :-)

I am afraid I don't really understand what the challenge is. In other
words, is this method good in principle but hard to implement in
practice? As far as I know, the OS should always schedule I/O-related
processes once they are on the run queue, so as long as we give even a
very short period of time to the woken-up guest VM, the I/O process in
it should be scheduled at once. In that case, the problem should be
solved. Of course, I haven't done any experiments; saying is always
much easier than doing.

Thanks,
Yuehai
>
> I don't have time to work on this right now, but if you work up some
> patches, I can give you feedback.  Be advised, that getting this stuff
> to work right is not easy.
>
>  -George

* Re: Question about the ability of credit scheduler to handle I/O and CPU intensive VMs
  2010-09-30 12:28   ` Yuehai Xu
@ 2010-09-30 13:27     ` George Dunlap
  2010-10-05  2:52       ` Yuehai Xu
  2010-10-05  4:30       ` question about lineat pagetable and mfn_x strongerwill
  0 siblings, 2 replies; 21+ messages in thread
From: George Dunlap @ 2010-09-30 13:27 UTC (permalink / raw)
  To: Yuehai Xu; +Cc: xen-devel, yhxu

On Thu, Sep 30, 2010 at 1:28 PM, Yuehai Xu <yuehaixu@gmail.com> wrote:
>> I agree, letting a VM with an interrupt run for a short period of time
>> makes sense.  The challenge is to make sure that it can't simply send
>> itself interrupts every 50us and get to run 100% of the time. :-)
>
> I am afraid I don't really understand the challenge is, or, in another
> word, this method is good principally, but in practice, it is hard to
> implement? As I know, the OS should always schedules I/O related
> processes once they are in runnable queue, so, as long as we give even
> a very short period of time to the waken up guest VM, the I/O process
> in it should be scheduled at once. In that case, this problem should
> be solved. Of course, I don't do experiments, saying is always much
> easier than doing.

What I mean is that you have to be careful when implementing it.  A
very simple implementation would look like this:
* Normally, let the VM with the highest credits run.  However, if a VM
is sent an interrupt, give it priority to run for 50us.

Now, suppose, however, that a rogue VM sets up a periodic timer to
send itself an interrupt every 55us.  Then it will get an interrupt,
get priority for 50us, be preempted for 5us, and then get another
interrupt, allowing it to run for another 50us.    Thus it runs 90% of
the time, even though it should only run (for example) 50% of the
time.
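The arithmetic in this example can be checked with a few lines of
Python. This is only a sketch of the reasoning, not Xen code; the
function name cpu_share and its parameters are made up for
illustration.

```python
def cpu_share(boost_us: float, period_us: float) -> float:
    """Fraction of a PCPU a VCPU obtains if every interrupt buys it
    boost_us of priority and it receives one interrupt per period_us.

    Hypothetical model: the VCPU runs only on boosts and is never
    delayed once it has priority.
    """
    if period_us <= 0:
        raise ValueError("period_us must be positive")
    # A boost cannot last longer than the gap between two interrupts.
    return min(boost_us, period_us) / period_us


# The rogue VM above: a 50us boost, one self-sent interrupt every 55us.
share = cpu_share(boost_us=50, period_us=55)
print(f"rogue VM share: {share:.1%}")  # 50/55, i.e. roughly 90%
```

Shrinking the timer period towards 50us pushes the share towards 100%,
which is the case the smiley in the earlier mail refers to.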

We need a way to balance interrupt latency (how long after an
interrupt is raised before a VM can run) and cpu scheduling fairness.
That means that if we let a VM run for 50us, and then preempt it, and
it gets an interrupt 5us later, we need a way to know not to schedule
it until it's been off the cpu for a reasonable amount of time.  It's
possible, but it will take some experimentation to see what the best
option is.
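One possible shape for such a rule, purely as a toy model: grant the
boost only if the VCPU has been off the cpu for some minimum time since
its last boosted run. The sketch below is not a proposal for the actual
scheduler; boosted_share, min_off_us and the chosen numbers are
assumptions for illustration, and the model ignores the time the VCPU
would also get through normal credit-based scheduling.

```python
def boosted_share(period_us, boost_us, min_off_us, horizon_us):
    """Fraction of [0, horizon_us) a VCPU runs on boosts alone.

    The VCPU receives one interrupt every period_us. An interrupt is
    honoured with a boost_us run only if at least min_off_us have
    passed since the previous boosted run ended.
    """
    ran = 0
    last_end = None                  # end time of the previous boosted run
    t = 0
    while t < horizon_us:
        eligible = last_end is None or t >= last_end + min_off_us
        if eligible:
            end = min(t + boost_us, horizon_us)
            ran += end - t
            last_end = end
        t += period_us               # next self-sent interrupt
    return ran / horizon_us


HORIZON_US = 1_100_000
naive = boosted_share(55, 50, min_off_us=0, horizon_us=HORIZON_US)
limited = boosted_share(55, 50, min_off_us=50, horizon_us=HORIZON_US)
print(f"no off-cpu requirement: {naive:.1%}")
print(f"50us off-cpu required:  {limited:.1%}")
```

With no off-cpu requirement the rogue VM gets back to roughly 90%; with
a 50us requirement every second interrupt is ignored and its boosted
share drops below 50%. The price is that a well-behaved VM's interrupt
arriving inside the off-cpu window is delayed as well, which is the
latency side of the trade-off.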

 -George

>
> Thanks,
> Yuehai
>>
>> I don't have time to work on this right now, but if you work up some
>> patches, I can give you feedback.  Be advised, that getting this stuff
>> to work right is not easy.
>>
>>  -George
>
> _______________________________________________
> Xen-devel mailing list
> Xen-devel@lists.xensource.com
> http://lists.xensource.com/xen-devel
>

* Re: Question about the ability of credit scheduler to handle I/O and CPU intensive VMs
  2010-09-30 13:27     ` George Dunlap
@ 2010-10-05  2:52       ` Yuehai Xu
  2010-10-05 14:16         ` George Dunlap
  2010-10-05  4:30       ` question about lineat pagetable and mfn_x strongerwill
  1 sibling, 1 reply; 21+ messages in thread
From: Yuehai Xu @ 2010-10-05  2:52 UTC (permalink / raw)
  To: George Dunlap; +Cc: xen-devel, yhxu

On Thu, Sep 30, 2010 at 9:27 AM, George Dunlap
<George.Dunlap@eu.citrix.com> wrote:
> On Thu, Sep 30, 2010 at 1:28 PM, Yuehai Xu <yuehaixu@gmail.com> wrote:
>>> I agree, letting a VM with an interrupt run for a short period of time
>>> makes sense.  The challenge is to make sure that it can't simply send
>>> itself interrupts every 50us and get to run 100% of the time. :-)
>>
>> I am afraid I don't really understand the challenge is, or, in another
>> word, this method is good principally, but in practice, it is hard to
>> implement? As I know, the OS should always schedules I/O related
>> processes once they are in runnable queue, so, as long as we give even
>> a very short period of time to the waken up guest VM, the I/O process
>> in it should be scheduled at once. In that case, this problem should
>> be solved. Of course, I don't do experiments, saying is always much
>> easier than doing.
>
> What I mean is that you have to be careful when implementing it.  A
> very simple implementation would look like this:
> * Normally, let the VM with the highest credits run.  However, if a VM
> is sent an interrupt, give it priority to run for 50us.
>
> Now, suppose, however, that a rogue VM sets up a periodic timer to
> send itself an interrupt every 55us.  Then it will get an interrupt,
> get priority for 50us, be preempted for 5us, and then get another
> interrupt, allowing it to run for another 50us.    Thus it runs 90% of
> the time, even though it should only run (for example) 50% of the
> time.
>
> We need a way to balance interrupt latency (how long after an
> interrupt is raised before a VM can run) and cpu scheduling fairness.
> That means that if we let a VM run for 50us, and then preempt it, and
> it gets an interrupt 5us later, we need a way to know not to schedule
> it until it's been off the cpu for a reasonable amount of time.  It's
> possible, but it will take some experimentation to see what the best
> option is.
>
>  -George
>

I'd like to try to implement this idea in Xen, even though I am not
sure whether I can do it, since I am not an expert. :-D

The first step for me is to write a very simple scheduler that ignores
CPU fairness, I/O performance, etc. Its mechanism is very simple: the
next VCPU is selected round-robin. The current VCPU is always inserted
at the tail of the list, and the VCPU at the head of the list is
selected to run next. The current test code is based on the credit
scheduler of Xen 4.0.1-rc6-pre, except that I deleted everything
related to credit calculation; the 10ms and 30ms ticks are also
deleted. The time slice for the selected VCPU is set to 30ms.
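The selection rule described above can be sketched in a few lines of
Python. This is a model of the idea only, not the modified Xen code;
the class name RoundRobin and the method pick_next are made up for
illustration.

```python
from collections import deque

TIMESLICE_MS = 30  # fixed slice handed to whichever VCPU is selected


class RoundRobin:
    """Toy run queue: current VCPU to the tail, head VCPU runs next."""

    def __init__(self, vcpus):
        self.runq = deque(vcpus)

    def pick_next(self, current=None):
        # The VCPU that just ran goes to the tail of the list.
        if current is not None:
            self.runq.append(current)
        if not self.runq:
            return None, TIMESLICE_MS    # nothing runnable
        # The VCPU at the head of the list is selected.
        return self.runq.popleft(), TIMESLICE_MS


sched = RoundRobin(["d1v0", "d2v0"])
current = None
order = []
for _ in range(4):
    current, _slice_ms = sched.pick_next(current)
    order.append(current)
print(order)
```

With two always-runnable VCPUs the selection simply alternates between
them, each being handed a 30ms slice.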

Here, my assumption is that Dom0 is pinned to PCPU0, while the DomUs
are pinned to PCPU1, for simplicity.

However, some problems puzzle me a lot. When I start two DomUs that
share PCPU1 and run a CPU-intensive program in both of them, the trace
log from xenalyze is as below (I modified some code, so the format
differs from the original):
...
<  0.399300204 -x d1v0> (dom: 1) --> (dom: 2) vruntime : 30000802)
<  0.424239058 -x d2v0> (dom: 2) --> (dom: 1) vruntime : 30000582)
<  0.449177708 -x d1v0> (dom: 1) --> (dom: 2) vruntime : 30000336)
<  0.474116762 -x d2v0> (dom: 2) --> (dom: 1) vruntime : 30000827)
<  0.499055641 -x d1v0> (dom: 1) --> (dom: 2) vruntime : 30000596)
<  0.523972987 |x d2v0> (dom: 2) --> (dom: 1) vruntime : 30001301)
<  0.548911095 -x d1v0> (dom: 1) --> (dom: 2) vruntime : 29999684)
...
I think these results make sense, since each domU is using almost 30ms
of PCPU1.

However, when I stop the CPU-intensive program in one DomU while
keeping the other running, the results are:
.....
<  0.327815345 |x d1v0> (dom: 1) --> (dom: 2) vruntime : 1542607)
<  0.327906620 -x d2v0> (dom: 2) --> (dom: 1) vruntime : 109521)
<  0.344349033 |x d1v0> (dom: 1) --> (dom: 2) vruntime : 19779544)
<  0.344377129 -x d2v0> (dom: 2) --> (dom: 1) vruntime : 33528)
<  0.344570662 |x d1v0> (dom: 1) --> (dom: 2) vruntime : 232540)
<  0.344643933 -x d2v0> (dom: 2) --> (dom: 1) vruntime : 87857)
<  0.345009170 |x d1v0> (dom: 1) --> (dom: 2) vruntime : 439081)
<  0.345034387 -x d2v0> (dom: 2) --> (dom: 1) vruntime : 30059)
<  0.369973183 -x d1v0> (dom: 1) --> (dom: 1) vruntime : 30000506)
<  0.392423279 |x d1v0> (dom: 1) --> (dom: 2) vruntime : 27006658)
....

Here I am confused: since my scheduling algorithm is very simple, every
VM should get 30ms of PCPU. However, the results show that the time
each VCPU holds the PCPU is quite unstable. I think schedule() must be
getting invoked frequently from somewhere, and xentop shows the VM with
the CPU-intensive program occupying the PCPU at almost 97%.
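To see how lopsided the second excerpt is, the per-domain time can be
summed with a short Python script. It assumes that the "vruntime" field
of each line is the number of nanoseconds the outgoing domain (the one
on the left of "-->") has just run; that reading of the customised
trace format is a guess, and the names TRACE, LINE_RE and
runtime_per_domain are made up for illustration.

```python
import re
from collections import defaultdict

# The ten lines of the second excerpt above, copied verbatim.
TRACE = """\
<  0.327815345 |x d1v0> (dom: 1) --> (dom: 2) vruntime : 1542607)
<  0.327906620 -x d2v0> (dom: 2) --> (dom: 1) vruntime : 109521)
<  0.344349033 |x d1v0> (dom: 1) --> (dom: 2) vruntime : 19779544)
<  0.344377129 -x d2v0> (dom: 2) --> (dom: 1) vruntime : 33528)
<  0.344570662 |x d1v0> (dom: 1) --> (dom: 2) vruntime : 232540)
<  0.344643933 -x d2v0> (dom: 2) --> (dom: 1) vruntime : 87857)
<  0.345009170 |x d1v0> (dom: 1) --> (dom: 2) vruntime : 439081)
<  0.345034387 -x d2v0> (dom: 2) --> (dom: 1) vruntime : 30059)
<  0.369973183 -x d1v0> (dom: 1) --> (dom: 1) vruntime : 30000506)
<  0.392423279 |x d1v0> (dom: 1) --> (dom: 2) vruntime : 27006658)
"""

LINE_RE = re.compile(
    r"\(dom:\s*(\d+)\)\s*-->\s*\(dom:\s*(\d+)\)\s*vruntime\s*:\s*(\d+)\)"
)


def runtime_per_domain(trace):
    """Sum the vruntime field per outgoing domain (assumed to be ns run)."""
    totals = defaultdict(int)
    for line in trace.splitlines():
        match = LINE_RE.search(line)
        if match:
            prev_dom, _next_dom, ran_ns = map(int, match.groups())
            totals[prev_dom] += ran_ns
    return dict(totals)


totals = runtime_per_domain(TRACE)
all_ns = sum(totals.values())
for dom, ns in sorted(totals.items()):
    print(f"dom {dom}: {ns} ns ({ns / all_ns:.1%})")
```

Under that assumption, domain 2 never runs for more than about 110us at
a time in this excerpt, while domain 1 accounts for nearly all of the
recorded time.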


Thanks,
Yuehai

* question about lineat pagetable and mfn_x
  2010-09-30 13:27     ` George Dunlap
  2010-10-05  2:52       ` Yuehai Xu
@ 2010-10-05  4:30       ` strongerwill
  1 sibling, 0 replies; 21+ messages in thread
From: strongerwill @ 2010-10-05  4:30 UTC (permalink / raw)
  To: xen-devel


Hi fellow Xen developers,
   When I read the source code of Xen, I find some data structures hard to understand, such as the linear pagetable and mfn_x.
    Consider Xen 4.0.0 with a 32-bit, PAE-enabled PV domain: how does the linear pagetable share the same machine address with the current process's pagetable? I have read some documents about the implementation of the linear pagetable, but I still do not understand it well.
    Another question is about mfn_x. I cannot even find its definition :(. I would still like to know which structures are involved in mfn_x. Is it a table or...?
 
Cheers,

Yueqiang


2010-10-05 



strongerwill 

* Re: Question about the ability of credit scheduler to handle I/O and CPU intensive VMs
  2010-10-05  2:52       ` Yuehai Xu
@ 2010-10-05 14:16         ` George Dunlap
  2010-10-05 14:56           ` Yuehai Xu
  0 siblings, 1 reply; 21+ messages in thread
From: George Dunlap @ 2010-10-05 14:16 UTC (permalink / raw)
  To: Yuehai Xu; +Cc: xen-devel, yhxu

On Tue, Oct 5, 2010 at 3:52 AM, Yuehai Xu <yuehaixu@gmail.com> wrote:
> However, I stop one of the CPU intensive program in a DomU while keep
> the other running, the results are:
> .....
> <  0.327815345 |x d1v0> (dom: 1) --> (dom: 2) vruntime : 1542607)
> <  0.327906620 -x d2v0> (dom: 2) --> (dom: 1) vruntime : 109521)
> <  0.344349033 |x d1v0> (dom: 1) --> (dom: 2) vruntime : 19779544)
> <  0.344377129 -x d2v0> (dom: 2) --> (dom: 1) vruntime : 33528)
> <  0.344570662 |x d1v0> (dom: 1) --> (dom: 2) vruntime : 232540)
> <  0.344643933 -x d2v0> (dom: 2) --> (dom: 1) vruntime : 87857)
> <  0.345009170 |x d1v0> (dom: 1) --> (dom: 2) vruntime : 439081)
> <  0.345034387 -x d2v0> (dom: 2) --> (dom: 1) vruntime : 30059)
> <  0.369973183 -x d1v0> (dom: 1) --> (dom: 1) vruntime : 30000506)
> <  0.392423279 |x d1v0> (dom: 1) --> (dom: 2) vruntime : 27006658)
> ....
>
> Here I am gotten confusing, since my algorithm of scheduling is very
> simple, every VM should have 30ms of PCPU, however, from the results,
> the time for
> each VCPU to have PCPU is quite unstable. I think somewhere, the
> routine of schedule() should be invoked frequently, and from xentop,
> the VM with CPU
> intensive occupies PCPU almost at 97%.

Idle VMs are never 100% idle; there are a lot of "maintenance" tasks
still to be done.

Your mostly-idle domain (domain 2) seems to be running for pretty
short periods of time -- less than 100us.  That's pretty reasonable.

The question to ask is, what happens when domain 2 wakes up -- is it
put on the runqueue, waiting for the running domain to finish its
timeslice?  Or is it run immediately?  Given the trace you give here,
I would guess that it's run immediately.  If that's not what you
expect, you need to figure out why it's doing that and make it do
something else. :-)

 -George

* Re: Question about the ability of credit scheduler to handle I/O and CPU intensive VMs
  2010-10-05 14:16         ` George Dunlap
@ 2010-10-05 14:56           ` Yuehai Xu
  2010-10-05 15:02             ` George Dunlap
  0 siblings, 1 reply; 21+ messages in thread
From: Yuehai Xu @ 2010-10-05 14:56 UTC (permalink / raw)
  To: George Dunlap; +Cc: xen-devel, yhxu

On Tue, Oct 5, 2010 at 10:16 AM, George Dunlap
<George.Dunlap@eu.citrix.com> wrote:
> On Tue, Oct 5, 2010 at 3:52 AM, Yuehai Xu <yuehaixu@gmail.com> wrote:
>> However, I stop one of the CPU intensive program in a DomU while keep
>> the other running, the results are:
>> .....
>> <  0.327815345 |x d1v0> (dom: 1) --> (dom: 2) vruntime : 1542607)
>> <  0.327906620 -x d2v0> (dom: 2) --> (dom: 1) vruntime : 109521)
>> <  0.344349033 |x d1v0> (dom: 1) --> (dom: 2) vruntime : 19779544)
>> <  0.344377129 -x d2v0> (dom: 2) --> (dom: 1) vruntime : 33528)
>> <  0.344570662 |x d1v0> (dom: 1) --> (dom: 2) vruntime : 232540)
>> <  0.344643933 -x d2v0> (dom: 2) --> (dom: 1) vruntime : 87857)
>> <  0.345009170 |x d1v0> (dom: 1) --> (dom: 2) vruntime : 439081)
>> <  0.345034387 -x d2v0> (dom: 2) --> (dom: 1) vruntime : 30059)
>> <  0.369973183 -x d1v0> (dom: 1) --> (dom: 1) vruntime : 30000506)
>> <  0.392423279 |x d1v0> (dom: 1) --> (dom: 2) vruntime : 27006658)
>> ....
>>
>> Here I am gotten confusing, since my algorithm of scheduling is very
>> simple, every VM should have 30ms of PCPU, however, from the results,
>> the time for
>> each VCPU to have PCPU is quite unstable. I think somewhere, the
>> routine of schedule() should be invoked frequently, and from xentop,
>> the VM with CPU
>> intensive occupies PCPU almost at 97%.
>
> Idle VMs are never 100% idle; there are a lot of "maintenance" tasks
> still to be done.
>
> Your mostly-idle domain (domain 2) seems to be running for pretty
> short periods of time -- less than 100us.  That's pretty reasonable.
>
> The question to ask is, what happens when domain 2 wakes up -- is it
> put on the runqueue, waiting for the running domain to finish its
> timeslice?  Or is it run immediately?  Given the trace you give here,
> I would guess that it's run immediately.  If that's not what you
> expect, you need to figure out why it's doing that and make it do
> something else. :-)

Yes, you are right: when domain 2 is woken up, it is inserted at the
head of the scheduling list and then run immediately. This is what I
expect. I am sorry that I didn't express the question clearly.

You said that the mostly-idle domain should run for pretty short
periods of time. However, according to my understanding, it is the
scheduler that decides how long a domain runs, and that time is a fixed
value, 30ms in my scheduler. What puzzles me is why the running time of
this mostly-idle domain is so short. Is it because the domain is idle?
Or is the domain actually put on some list other than the runnable one?
Are there places other than csched_credit.c that affect how the VMs are
scheduled?

I know that in the Linux kernel, when a process is blocked on I/O, it
is removed from the run queue, so that the CPU scheduler can select the
next runnable process immediately. However, I thought that scheduling
in Xen was different: since the scheduler doesn't really know whether
the VCPU is consuming PCPU, it just gives the VM a certain period of
time. I might be wrong. If that were true, even a mostly idle VM would
always consume the same PCPU time as the busy one in my scheduler. But
the result is the opposite: the idle VM consumes much less PCPU than
the busy one. This cannot be determined by the scheduling itself;
otherwise, the idle one would also have 50% of the PCPU. So what
mechanism causes this result?

I really appreciate your help.

Thanks,
Yuehai

* Re: Question about the ability of credit scheduler to handle I/O and CPU intensive VMs
  2010-10-05 14:56           ` Yuehai Xu
@ 2010-10-05 15:02             ` George Dunlap
  2010-10-07 22:18               ` Yuehai Xu
  0 siblings, 1 reply; 21+ messages in thread
From: George Dunlap @ 2010-10-05 15:02 UTC (permalink / raw)
  To: Yuehai Xu; +Cc: xen-devel, yhxu

On Tue, Oct 5, 2010 at 3:56 PM, Yuehai Xu <yuehaixu@gmail.com> wrote:
> I know in the kernel of Linux, when a process is stocked because of
> I/O, it will be deleted from runnable queue, so that the scheduler of
> CPU can select next runnable process immediately. However, I thought
> this was different from the scheduling of XEN. Since the scheduler
> didn't really know whether the VCPU was consuming PCPU, it just
> provided a certain period of time to the VM. I might be wrong. If it
> is true, even a most idle VM should always consumes as the same PCPU
> time as the busy one  in my scheduler. But the result is opposite. The
> idle VM consumes much less PCPU then the busy one. This should not be
> determined by the scheduling itself, otherwise, the idle one should
> also have 50% PCPU. Then, what mechanism cause this result?

Your understanding of Xen is not correct.  In Xen, the VM itself will
initiate blocking if there is nothing for it to do.  PV domains call
SCHED_OP_block(), which will cause the VM to block until it is woken
by an event channel; and HVM domains will execute the HLT instruction,
which will cause the VM to block until it is woken by an interrupt.

If you do a more complete trace (i.e., "xentrace -e all") and look at
the results with xenalyze, you'll see dom2 making a sched_op
hypercall, then transitioning from RUNSTATE_running to
RUNSTATE_blocked.
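A toy simulation may help show why voluntary blocking leads to the
lopsided split seen in the trace. It is only a sketch, not Xen code:
the function simulate, the 10ms wake-up period and the 100us of work
per wake-up are assumptions chosen for illustration, and the woken VCPU
is assumed to run immediately, as described for the test scheduler.

```python
def simulate(horizon_us, slice_us, wake_period_us, work_us):
    """Share one PCPU between a CPU burner and a mostly idle VCPU.

    The idle VCPU is woken every wake_period_us, runs at once, does
    work_us of work and then blocks voluntarily instead of using up
    its slice. The burner runs whenever the idle VCPU is blocked.
    """
    used = {"busy": 0, "idle": 0}
    t = 0
    next_wake = 0
    while t < horizon_us:
        if t >= next_wake:
            # Woken VCPU runs immediately, then gives the PCPU back.
            run = min(work_us, slice_us, horizon_us - t)
            used["idle"] += run
            next_wake += wake_period_us
        else:
            # Only the burner is runnable; it runs until its slice
            # ends or the idle VCPU wakes up again.
            run = min(slice_us, next_wake - t, horizon_us - t)
            used["busy"] += run
        t += run
    return used


used = simulate(horizon_us=1_000_000, slice_us=30_000,
                wake_period_us=10_000, work_us=100)
for name, us in used.items():
    print(f"{name}: {us / 1_000_000:.1%}")
```

Even though both VCPUs are offered the same 30ms slice, the idle one
hands the PCPU back after each short burst, so the split ends up far
from 50/50.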

 -George

* Re: Question about the ability of credit scheduler to handle I/O and CPU intensive VMs
  2010-10-05 15:02             ` George Dunlap
@ 2010-10-07 22:18               ` Yuehai Xu
  2010-10-08  0:25                 ` Yuehai Xu
  0 siblings, 1 reply; 21+ messages in thread
From: Yuehai Xu @ 2010-10-07 22:18 UTC (permalink / raw)
  To: George Dunlap; +Cc: xen-devel, yhxu

> Your understanding of Xen is not correct.  In Xen, the VM itself will
> initiate blocking if there is nothing for it to do.  PV domains call
> SCHED_OP_block(), which will cause the VM to block until it is woken
> by an event channel; and HVM domains will execute the HLT instruction,
> which will cause the VM to block until it is woken by an interrupt.
>
> If you do a more complete trace (i.e., "xentrace -e all") and look at
> the results with xenalyze, you'll see dom2 making a sched_op
> hypercall, then transitioning from RUNSTATE_running to
> RUNSTATE_blocked.
>

I originally thought that when a Dom has an I/O event, its VCPU would
be woken up; in other words, csched_vcpu_wake(struct vcpu *vc) would be
invoked. However, I find I was definitely wrong. As long as there is a
CPU-intensive program running in a Dom, that Dom should never be in a
"sleep" state? In other words, it never needs to be woken up?

In that case, suppose there is an I/O event for a VM: how can I insert
this VM's scheduling entity at the head of the linked list and make
schedule() run immediately? I tried doing it in csched_vcpu_wake, but
since this VM also runs a CPU-intensive program, the I/O throughput
made no difference.

I feel that in order to get a VM that has I/O scheduled as soon as
possible (without considering CPU fairness), other files besides
csched_credit.c have to be modified. But which ones?

I really appreciate your help.

Thanks,
Yuehai

* Re: Question about the ability of credit scheduler to handle I/O and CPU intensive VMs
  2010-10-07 22:18               ` Yuehai Xu
@ 2010-10-08  0:25                 ` Yuehai Xu
  2010-10-08  9:57                   ` George Dunlap
  0 siblings, 1 reply; 21+ messages in thread
From: Yuehai Xu @ 2010-10-08  0:25 UTC (permalink / raw)
  To: George Dunlap; +Cc: xen-devel, yhxu

> I originally considered that when a Dom has an I/O event, its VCPU
> would be waken up, in another word, csched_vcpu_wake(struct vcpu *vc)
> should be invoked. However, I find I am definitely wrong. As long as
> there is a CPU intensive program running in a Dom, this Dom should
> never be in a state of "sleep"? In another word, it should never be
> waken up?
>

The trace result from xenalyze confirms that when a VM is running a
CPU-intensive program, it never needs to be woken up. So my question
is: how can I schedule a VM that has an I/O event immediately, even if
this VM is CPU intensive? I think it is impossible to implement this in
the function csched_vcpu_wake.

Also, is it possible to trace the I/O path with xenalyze? I notice that
even though the macro TRC_HVM_IO_READ is defined, I can't find it used
anywhere.

Thanks,
Yuehai

* Re: Question about the ability of credit scheduler to handle I/O and CPU intensive VMs
  2010-10-08  0:25                 ` Yuehai Xu
@ 2010-10-08  9:57                   ` George Dunlap
  2010-10-08 10:03                     ` George Dunlap
  2010-10-10  4:08                     ` Yuehai Xu
  0 siblings, 2 replies; 21+ messages in thread
From: George Dunlap @ 2010-10-08  9:57 UTC (permalink / raw)
  To: Yuehai Xu; +Cc: xen-devel, yhxu

The file you're looking for re getting an event channel is
xen/common/schedule.c, the common scheduling code.

In schedule.c, vcpu_wake() will always call the scheduler wake()
function.  However, event channels and other "you have an event"
functions call vcpu_unblock() instead, which will check the vcpu's
VPF_blocked flag and only call vcpu_wake() if it was set.

You could try changing vcpu_unblock() to always call vcpu_wake() for
your experimental development; I'm not sure what side-effects this
would have.
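The difference between the two paths can be modelled in Python. This
is only a model of the behaviour described above, not the code in
schedule.c; the Vcpu class, its fields and the always_wake switch are
made up for illustration, with always_wake standing in for the
experimental change.

```python
class Vcpu:
    def __init__(self, name, blocked):
        self.name = name
        self.blocked = blocked   # stands in for the VPF_blocked flag
        self.wake_calls = 0      # times the scheduler wake() hook ran


def vcpu_wake(v):
    # In this model, "calling the scheduler's wake() function" is
    # just counted.
    v.wake_calls += 1


def vcpu_unblock(v, always_wake=False):
    """Event delivery path: wake the vcpu only if it was blocked,
    unless the hypothetical always_wake experiment is switched on."""
    was_blocked = v.blocked
    v.blocked = False
    if was_blocked or always_wake:
        vcpu_wake(v)


io_only = Vcpu("io-only", blocked=True)   # blocked, waiting for I/O
burner = Vcpu("cpu+io", blocked=False)    # never blocks, always runnable

vcpu_unblock(io_only)                     # scheduler hears about it
vcpu_unblock(burner)                      # scheduler is not told
print(io_only.wake_calls, burner.wake_calls)

vcpu_unblock(burner, always_wake=True)    # the experimental variant
print(burner.wake_calls)
```

In this model the scheduler's wake hook never sees events for a vcpu
that is already runnable, which matches the observation that changes
inside csched_vcpu_wake made no difference for the CPU + I/O VM.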

Regarding tracing IO_READs: The code that did IO was refactored some
time back; I remember trying to figure out how to cleanly do the
tracing, but finding it difficult.  I can't remember now, though, what
the issue was.  Let me take a look again.

 -George

On Fri, Oct 8, 2010 at 1:25 AM, Yuehai Xu <yuehaixu@gmail.com> wrote:
>> I originally considered that when a Dom has an I/O event, its VCPU
>> would be waken up, in another word, csched_vcpu_wake(struct vcpu *vc)
>> should be invoked. However, I find I am definitely wrong. As long as
>> there is a CPU intensive program running in a Dom, this Dom should
>> never be in a state of "sleep"? In another word, it should never be
>> waken up?
>>
>
> The trace result from xenalyze confirms that when a VM has a running
> CPU intensive program, it never needs to be waken up. So, my question
> is, how can I schedule a VM that has I/O event immediately even this
> VM is CPU intensive? I think it is impossible to implement it in the
> function csched_vcpu_wake.
>
> Also, is it possible to trace the I/O procedure by xenalyze? I notice
> even the macro TRC_HVM_IO_READ is defined, I don't find it is used in
> anywhere.
>
> Thanks,
> Yuehai
>

* Re: Question about the ability of credit scheduler to handle I/O and CPU intensive VMs
  2010-10-08  9:57                   ` George Dunlap
@ 2010-10-08 10:03                     ` George Dunlap
  2010-10-08 10:11                       ` George Dunlap
  2010-10-10  4:08                     ` Yuehai Xu
  1 sibling, 1 reply; 21+ messages in thread
From: George Dunlap @ 2010-10-08 10:03 UTC (permalink / raw)
  To: Yuehai Xu; +Cc: xen-devel, yhxu

On Fri, Oct 8, 2010 at 10:57 AM, George Dunlap
<George.Dunlap@eu.citrix.com> wrote:
> Regarding tracing IO_READs: The code that did IO was refactored some
> time back; I remember trying to figure out how to cleanly do the
> tracing, but finding it difficult.  I can't remember now, though, what
> the issue was.  Let me take a look again.

Ah, that's right -- IO_READ was replaced with
IO{PORT,MEM}_{READ,WRITE}.  It's traced in
xen/arch/x86/hvm/emulate.c:hvmtrace_io_assist().  xenalyze simply
hasn't been updated to understand those traces yet.

 -George

* Re: Question about the ability of credit scheduler to handle I/O and CPU intensive VMs
  2010-10-08 10:03                     ` George Dunlap
@ 2010-10-08 10:11                       ` George Dunlap
  0 siblings, 0 replies; 21+ messages in thread
From: George Dunlap @ 2010-10-08 10:11 UTC (permalink / raw)
  To: Yuehai Xu; +Cc: xen-devel, yhxu

OK, scratch that... the #defines were renamed, but the actual values
remained the same; so the binaries are compatible.
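
Why a rename alone does not break an existing trace consumer can be shown with a small sketch. The macro names, the numeric value, and the record layout below are all hypothetical stand-ins, not copied from Xen's trace headers; the only point is that a trace record carries the number, never the name.

```c
#include <stdint.h>

/* Hypothetical event numbers, NOT taken from Xen's trace.h. */
#define TOY_TRC_IO_READ     0x0016u  /* old macro name */
#define TOY_TRC_IOPORT_READ 0x0016u  /* new macro name, same value */

/* Hypothetical trace record: only the event number is stored. */
struct toy_trace_rec {
    uint32_t event;
};

/* Producer compiled against the new header. */
static struct toy_trace_rec toy_new_producer_emit(void)
{
    struct toy_trace_rec r = { TOY_TRC_IOPORT_READ };
    return r;
}

/* Consumer compiled against the old header. */
static int toy_old_tool_is_io_read(const struct toy_trace_rec *r)
{
    return r->event == TOY_TRC_IO_READ;
}
```

A record emitted by the new producer is still recognised by the old consumer, because both macros expand to the same number.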

The results show up in the xenalyze output as "mmio_assist" (for MMIO)
and "io [read|write]" (for PIO).

 -George

On Fri, Oct 8, 2010 at 11:03 AM, George Dunlap
<George.Dunlap@eu.citrix.com> wrote:
> On Fri, Oct 8, 2010 at 10:57 AM, George Dunlap
> <George.Dunlap@eu.citrix.com> wrote:
>> Regarding tracing IO_READs: The code that did IO was refactored some
>> time back; I remember trying to figure out how to cleanly do the
>> tracing, but finding it difficult.  I can't remember now, though, what
>> the issue was.  Let me take a look again.
>
> Ah, that's right -- IO_READ was replaced with
> IO{PORT,MEM}_{READ,WRITE}.  It's traced in
> xen/arch/x86/hvm/emulate.c:hvmtrace_io_assist().  xenalyze simply
> hasn't been updated to understand those traces yet.
>
>  -George
>

* Re: Question about the ability of credit scheduler to handle I/O and CPU intensive VMs
  2010-10-08  9:57                   ` George Dunlap
  2010-10-08 10:03                     ` George Dunlap
@ 2010-10-10  4:08                     ` Yuehai Xu
  2010-10-10  8:30                       ` cendhu
  2010-10-11 11:05                       ` George Dunlap
  1 sibling, 2 replies; 21+ messages in thread
From: Yuehai Xu @ 2010-10-10  4:08 UTC (permalink / raw)
  To: George Dunlap; +Cc: xen-devel, yhxu

> The file you're looking for re getting an event channel is
> xen/common/schedule.c, the common scheduling code.
>
> In schedule.c, vcpu_wake() will always call the scheduler wake()
> function.  However, event channels and other "you have an event"
> functions call vcpu_unblock() instead, which will check the
> vcpu's VPF_blocked flag and only call vcpu_wake if it was set.
>
> You could try changing vcpu_unblock() to always call vcpu_wake() for
> your experimental development; I'm not sure what side-effects this
> would have.
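
The difference between the two paths described in the quoted text can be sketched with a toy model. This is not Xen source: the struct, its fields, and the toy_* function names are made up for illustration, with a boolean standing in for the VPF_blocked flag.

```c
#include <stdbool.h>

/* Toy model, not Xen source: only what is needed to show the control flow. */
struct toy_vcpu {
    bool blocked;        /* stands in for the VPF_blocked flag */
    unsigned wake_calls; /* how often the scheduler's wake hook ran */
};

/* Stands in for vcpu_wake(): always reaches the scheduler's wake() hook. */
static void toy_vcpu_wake(struct toy_vcpu *v)
{
    v->wake_calls++;
}

/* Stands in for vcpu_unblock(): wakes only if the vcpu was blocked. */
static void toy_vcpu_unblock(struct toy_vcpu *v)
{
    if (!v->blocked)
        return;          /* vcpu is already running or runnable */
    v->blocked = false;
    toy_vcpu_wake(v);
}

/* The experimental variant suggested above: always call wake. */
static void toy_vcpu_unblock_always(struct toy_vcpu *v)
{
    v->blocked = false;
    toy_vcpu_wake(v);
}
```

In this model a CPU-bound vcpu (never blocked) receives events through toy_vcpu_unblock() without the scheduler's wake hook ever running, which matches the behaviour seen in the traces; the "always" variant reaches the hook on every event.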

After changing vcpu_unblock() to always call vcpu_wake(), the VM that
has an event can be scheduled immediately, no matter whether it is CPU
intensive. However, I have another question. Besides I/O events, it
seems there are many other events too. Our design is to give a VM a
very short period of time when it has an "I/O event", but right now
vcpu_wake() is invoked whenever an event comes, even if it is not an
"I/O event". This causes the VM to be scheduled much more frequently
than I expect.

For example, suppose there are 2 VMs, one CPU intensive and the other
CPU + I/O intensive. At the scheduler level, almost the same number of
events is received for the two VMs. Even though I/O itself creates
events, there are so many other events that the totals are almost the
same. In such a case, I think we need to differentiate the I/O events
from other events.

I added a trace point to __run_tickle() and found that the number of
events is almost the same for the two VMs, one of which is CPU
intensive and the other CPU + I/O intensive. I have not completely
confirmed this yet, though; I need to do more experiments.

Is it possible for me to distinguish an "I/O event" from other events,
so that I can give a VM that has an "I/O event" the priority to be
scheduled immediately?

Thanks,
Yuehai

* Re: Question about the ability of credit scheduler to handle I/O and CPU intensive VMs
  2010-10-10  4:08                     ` Yuehai Xu
@ 2010-10-10  8:30                       ` cendhu
  2010-10-11 11:05                       ` George Dunlap
  1 sibling, 0 replies; 21+ messages in thread
From: cendhu @ 2010-10-10  8:30 UTC (permalink / raw)
  To: Yuehai Xu; +Cc: George Dunlap, xen-devel, yhxu


Similar work has been done already (but not for the credit scheduler)...

Have a look at this paper
http://csl.cse.psu.edu/publications/vee07.pdf



On Sun, Oct 10, 2010 at 9:38 AM, Yuehai Xu <yuehaixu@gmail.com> wrote:

> > The file you're looking for re getting an event channel is
> > xen/common/schedule.c, the common scheduling code.
> >
> > In schedule.c, vcpu_wake() will always call the scheduler wake()
> > function;  However, event channels and other "you have an event"
> > functions call vcpu_unblock() instead, which will check the
> > vcpu's VPF_blocked flag and only call vcpu_wake if it was set.
> >
> > You could try changing vcpu_unblock() to always call vcpu_wake() for
> > your experimental development; I'm not sure what side-effects this
> > would have.
>
> After changing vcpu_unblock() to always call vcpu_wake(), the VM that
> has event can be scheduled immediately no matter whether it is CPU
> intensive. However, I have another question. Except the I/O event, it
> seems there are many other events too. Our design is to give a VM a
> very short period of time when it has "I/O event", and right now,
> vcpu_wake() is invoked when an event comes, even it is not "I/O
> event", this will cause the VM to be scheduled much more frequently
> than I expect.
>
> For example, suppose 2 VMs, one is CPU intensive and another is CPU +
> I/O intensive, from the level of scheduler, almost the same number of
> events are received from the two VMs. Even I/O itself creates event,
> since there are other events, the total number of events are almost
> the same. In such case, I think we need to differentiate the I/O
> events from other events.
>
> I add trace point to __run_tickle() and notice the result that the
> number of events are almost the same from two VMs, one of which is CPU
> intensive and the other is CPU + I/O intensive. Although I do not
> completely confirm what I have said currently, I need do more
> experiments.
>
> Is it possible for me to detect the "I/O event" from "event" so that I
> can give VM that has "I/O event" the priority to be scheduled
> immediately?
>
> Thanks,
> Yuehai

* Re: Question about the ability of credit scheduler to handle I/O and CPU intensive VMs
  2010-10-10  4:08                     ` Yuehai Xu
  2010-10-10  8:30                       ` cendhu
@ 2010-10-11 11:05                       ` George Dunlap
  2010-10-12 12:42                         ` Yuehai Xu
  1 sibling, 1 reply; 21+ messages in thread
From: George Dunlap @ 2010-10-11 11:05 UTC (permalink / raw)
  To: Yuehai Xu; +Cc: xen-devel, yhxu

On Sun, Oct 10, 2010 at 5:08 AM, Yuehai Xu <yuehaixu@gmail.com> wrote:
> After changing vcpu_unblock() to always call vcpu_wake(), the VM that
> has event can be scheduled immediately no matter whether it is CPU
> intensive. However, I have another question. Except the I/O event, it
> seems there are many other events too. Our design is to give a VM a
> very short period of time when it has "I/O event", and right now,
> vcpu_wake() is invoked when an event comes, even it is not "I/O
> event", this will cause the VM to be scheduled much more frequently
> than I expect.
>
> For example, suppose 2 VMs, one is CPU intensive and another is CPU +
> I/O intensive, from the level of scheduler, almost the same number of
> events are received from the two VMs. Even I/O itself creates event,
> since there are other events, the total number of events are almost
> the same. In such case, I think we need to differentiate the I/O
> events from other events.
>
> I add trace point to __run_tickle() and notice the result that the
> number of events are almost the same from two VMs, one of which is CPU
> intensive and the other is CPU + I/O intensive. Although I do not
> completely confirm what I have said currently, I need do more
> experiments.

Remind me, are you running in HVM mode, or PV mode?

That sounds unusual.  Is it the number of events delivered, or the
number of times the guest woke up?  NB they're not the same -- an HVM
guest will block and then wake up on the completion of an I/O
instruction which is handled by qemu.

If you're running in HVM mode, "xenalyze -s" will give you
a summary of the trace.  In the summary you can see not only how many
times a VM woke up, but which interrupt was delivered how many times.

At the moment, from Xen's perspective, an event delivery is an event
delivery.  You'd have to manually add some way of classifying an event
as "I/O".
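
One way such a manual classification might look is sketched below. Everything here is hypothetical: the per-domain table, the port limit, and the toy_* names do not exist in Xen, and which ports count as "I/O" (for example, ones bound to a block backend) is an assumption the experimenter would have to supply.

```c
#include <stdbool.h>
#include <stddef.h>

#define TOY_MAX_PORTS 64  /* hypothetical limit, for the sketch only */

/* Hypothetical per-domain table marking which event ports carry I/O. */
struct toy_domain {
    bool port_is_io[TOY_MAX_PORTS];
};

/* Mark a port as an I/O port; returns false if the port is out of range. */
static bool toy_tag_io_port(struct toy_domain *d, size_t port)
{
    if (port >= TOY_MAX_PORTS)
        return false;
    d->port_is_io[port] = true;
    return true;
}

/* Decide at delivery time whether this event should trigger a boost. */
static bool toy_event_should_boost(const struct toy_domain *d, size_t port)
{
    return port < TOY_MAX_PORTS && d->port_is_io[port];
}
```

With a check like this on the delivery path, only events on tagged ports would reach the "schedule immediately" logic; timer and other events would fall through to the normal path.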

 -George

* Re: Question about the ability of credit scheduler to handle I/O and CPU intensive VMs
  2010-10-11 11:05                       ` George Dunlap
@ 2010-10-12 12:42                         ` Yuehai Xu
  2010-10-18 10:25                           ` George Dunlap
  0 siblings, 1 reply; 21+ messages in thread
From: Yuehai Xu @ 2010-10-12 12:42 UTC (permalink / raw)
  To: George Dunlap; +Cc: xen-devel, yhxu

Sorry for the noise; the mode is PV. The numbers I calculated were
almost the same because my scheduler was set to CPU fairness only. Once
I set it to favor I/O, the numbers are different.

Here is another question: since we always talk about "a short period
of time", how long should it be? 500us? 50us? 1ms? Is there any hint
that I can follow?

Thanks,
Yuehai


>
> Remind me, are you running in HVM mode, or PV mode?
>
> That sounds unusual.  Is it the number of events delivered, or the
> number of times the guest woke up?  NB they're not the same -- an HVM
> guest will block and then wake up on the completion of an I/O
> instruction which is handled by qemu.
>
> If you're running in HVM mode, "xenalyze -s" will give you
> a summary of the trace.  In the summary you can see not only how many
> times a VM woke up, but which interrupt was delivered how many times.
>
> At the moment, from Xen's perspective, an event delivery is an event
> delivery.  You'd have to manually add some way of classifying an event
> as "I/O".
>
>  -George
>

* Re: Question about the ability of credit scheduler to handle I/O and CPU intensive VMs
  2010-10-12 12:42                         ` Yuehai Xu
@ 2010-10-18 10:25                           ` George Dunlap
  0 siblings, 0 replies; 21+ messages in thread
From: George Dunlap @ 2010-10-18 10:25 UTC (permalink / raw)
  To: Yuehai Xu; +Cc: xen-devel, yhxu

I think this is probably an area open for research. :-)

I would think 500us or 1ms would be decent options, but that's mostly a 
guess.
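
A rough way to compare candidate values is to bound how much of one PCPU the boosts could consume. The sketch below assumes the worst case, where every I/O event earns a full slice and the guest uses all of it; the event rate is an input the experimenter would measure, and the function name is made up.

```c
#include <stdint.h>

/*
 * Worst-case share of one PCPU spent in boost slices, in parts per
 * million. events_per_sec * slice_us is microseconds of boost per
 * second, which is numerically the same as ppm; cap at 100%.
 */
static uint64_t boost_share_ppm(uint64_t events_per_sec, uint64_t slice_us)
{
    uint64_t busy_us = events_per_sec * slice_us;
    return busy_us > 1000000u ? 1000000u : busy_us;
}
```

For a purely illustrative rate of 1000 I/O events per second, a 50us slice bounds the boost share at 5% of the PCPU, 500us at 50%, and 1ms at 100%, so the acceptable slice length depends heavily on the measured event rate.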

  -George

On 12/10/10 13:42, Yuehai Xu wrote:
> Sorry for the noise; the mode is PV. The numbers I calculated were
> almost the same because my scheduler was set to CPU fairness only.
> Once I set it to favor I/O, the numbers are different.
>
> Here is another question: since we always talk about "a short period
> of time", how long should it be? 500us? 50us? 1ms? Is there any hint
> that I can follow?
>
> Thanks,
> Yuehai
>
>
>>
>> Remind me, are you running in HVM mode, or PV mode?
>>
>> That sounds unusual.  Is it the number of events delivered, or the
>> number of times the guest woke up?  NB they're not the same -- an HVM
>> guest will block and then wake up on the completion of an I/O
>> instruction which is handled by qemu.
>>
>> If you're running in HVM mode, "xenalyze -s" will give you
>> a summary of the trace.  In the summary you can see not only how many
>> times a VM woke up, but which interrupt was delivered how many times.
>>
>> At the moment, from Xen's perspective, an event delivery is an event
>> delivery.  You'd have to manually add some way of classifying an event
>> as "I/O".
>>
>>   -George
>>

Thread overview: 21+ messages
2010-09-13 21:37 Question about the ability of credit scheduler to handle I/O and CPU intensive VMs Yuehai Xu
2010-09-13 23:29 ` Jeremy Fitzhardinge
2010-09-14  1:38   ` Yuehai Xu
     [not found] ` <AANLkTin9E1m_jFcj4Ak7nB9OxcQynrznpQ_nNPi_U7hN@mail.gmail.com>
2010-09-14 14:58   ` Yuehai Xu
2010-09-30 12:28   ` Yuehai Xu
2010-09-30 13:27     ` George Dunlap
2010-10-05  2:52       ` Yuehai Xu
2010-10-05 14:16         ` George Dunlap
2010-10-05 14:56           ` Yuehai Xu
2010-10-05 15:02             ` George Dunlap
2010-10-07 22:18               ` Yuehai Xu
2010-10-08  0:25                 ` Yuehai Xu
2010-10-08  9:57                   ` George Dunlap
2010-10-08 10:03                     ` George Dunlap
2010-10-08 10:11                       ` George Dunlap
2010-10-10  4:08                     ` Yuehai Xu
2010-10-10  8:30                       ` cendhu
2010-10-11 11:05                       ` George Dunlap
2010-10-12 12:42                         ` Yuehai Xu
2010-10-18 10:25                           ` George Dunlap
2010-10-05  4:30       ` question about lineat pagetable and mfn_x strongerwill
