xen-devel.lists.xenproject.org archive mirror
* [BUG] Bugs existing Xen's credit scheduler cause long tail latency issues
@ 2016-05-15  4:11 Tony S
  2016-05-16 11:30 ` Dario Faggioli
  2016-05-17  9:27 ` George Dunlap
  0 siblings, 2 replies; 5+ messages in thread
From: Tony S @ 2016-05-15  4:11 UTC (permalink / raw)
  To: xen-devel

Hi all,

When I was running latency-sensitive applications in VMs on Xen, I
found some bugs in the credit scheduler that cause long tail
latencies in I/O-intensive VMs.


(1) Problem description

------------Description------------
My test environment is as follows: Hypervisor (Xen 4.5.0), Dom 0 (Linux
3.18.21), Dom U (Linux 3.18.21).

Environment setup:
We created two 1-vCPU, 4GB-memory VMs and pinned them onto one
physical CPU core. One VM (denoted as the I/O-VM) ran the Sockperf
server program; the other VM (denoted as the CPU-VM) ran a
compute-bound task, e.g., SPEC CPU 2006 or simply a busy loop. A
client on another physical machine sent UDP requests to the I/O-VM.

Here are my tail latency results (microseconds):
Case   Avg      90%       99%        99.9%      99.99%
#1     108   &  114    &  128     &  129     &  130
#2     7811  &  13892  &  14874   &  15315   &  16383
#3     943   &  131    &  21755   &  26453   &  26553
#4     116   &  96     &  105     &  8217    &  13472
#5     116   &  117    &  129     &  131     &  132

Bugs 1, 2, and 3 are discussed below.

Case #1:
I/O-VM was processing Sockperf requests from clients; CPU-VM was
idling (no processes running).

Case #2:
I/O-VM was processing Sockperf requests from clients; CPU-VM was
running a compute-bound task.
Hypervisor is the native Xen 4.5.0

Case #3:
I/O-VM was processing Sockperf requests from clients; CPU-VM was
running a compute-bound task.
Hypervisor is the native Xen 4.5.0 with bug 1 fixed

Case #4:
I/O-VM was processing Sockperf requests from clients; CPU-VM was
running a compute-bound task.
Hypervisor is the native Xen 4.5.0 with bug 1 & 2 fixed

Case #5:
I/O-VM was processing Sockperf requests from clients; CPU-VM was
running a compute-bound task.
Hypervisor is the native Xen 4.5.0 with bug 1 & 2 & 3 fixed

---------------------------------------


(2) Problem analysis

------------Analysis----------------

[Bug1]: The VCPU that ran the CPU-intensive workload could be
mistakenly boosted due to CPU affinity.

http://lists.xenproject.org/archives/html/xen-devel/2015-10/msg02853.html

We have already discussed this bug and a potential patch in the above
link. Although the discussed patch improved the tail latency, i.e.,
reducing the 90th percentile latency, the long tail latency is still
not bounded. Next, we discuss two new bugs that inflict latency hikes
at the very far end of the tail.



[Bug2]: In csched_acct() (by default every 30ms), a VCPU stops earning
credits and is removed from the active VCPU list (in
__csched_vcpu_acct_stop_locked) if its credit is larger than the upper
bound. Because the domain has only one VCPU, the VM will also be
removed from the active domain list.

Every 10ms, csched_tick() --> csched_vcpu_acct() -->
__csched_vcpu_acct_start() runs and tries to put inactive VCPUs back
on the active list. However, __csched_vcpu_acct_start() only puts the
current VCPU back on the active list. If an I/O-bound VCPU is not the
current VCPU when csched_tick() fires, it is not put back on the
active VCPU list. It will then likely miss the next credit refill in
csched_acct() and can easily enter the OVER state. As a result, the
I/O-bound VM cannot be boosted and sees very long latency. It takes at
least one time slice (e.g., 30ms) before the I/O VM is reactivated and
starts to receive credits again.

[Possible Solution] Reactivate all inactive VCPUs before the next
credit refill, instead of just the current VCPU.
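
To make this concrete, below is a small stand-alone C model of the
behaviour described above. It is not Xen code: the tick/accounting
periods, credit values and field names are simplified assumptions.
It only illustrates how a vCPU that is dropped from the active list,
and is rarely the current vCPU at tick time, keeps missing refills
and drifts toward negative credit; the proposed fix corresponds to
re-activating every inactive vCPU in that step, not just the current
one.

/*
 * Toy model of the bug 2 scenario -- NOT Xen code.
 * All constants and names are illustrative assumptions.
 */
#include <stdio.h>
#include <stdbool.h>

#define TICK_MS        10   /* csched_tick() period in the model       */
#define ACCT_MS        30   /* csched_acct() period in the model       */
#define CREDIT_PER_MS  10   /* credit burned per ms of pCPU time       */
#define REFILL         (ACCT_MS * CREDIT_PER_MS) /* credit per refill  */

struct vcpu {
    const char *name;
    int  credit;
    bool active;   /* on the accounting ("active") list                */
    int  ran_ms;   /* pCPU time consumed since the last accounting     */
};

static void account(struct vcpu *v)
{
    v->credit -= v->ran_ms * CREDIT_PER_MS;  /* burn for time used     */
    v->ran_ms  = 0;
    if (v->active) {
        v->credit += REFILL;                 /* refill only if active  */
        if (v->credit > REFILL) {            /* over the upper bound:  */
            v->credit /= 2;                  /* halve the credit and   */
            v->active  = false;              /* stop accounting        */
        }
    }
}

int main(void)
{
    struct vcpu cpu_vm = { "CPU-VM", 0, true, 0 };
    struct vcpu io_vm  = { "IO-VM",  0, true, 0 };

    for (int t = TICK_MS; t <= 600; t += TICK_MS) {
        /* The CPU-bound vCPU hogs the pCPU; the I/O vCPU only runs
         * briefly per request, so it is almost never "current".       */
        struct vcpu *current = &cpu_vm;
        cpu_vm.ran_ms += TICK_MS - 1;
        io_vm.ran_ms  += 1;

        /* Model of the tick-time re-activation: only the *current*
         * vCPU is put back on the active list.                        */
        current->active = true;
        /* Proposed fix (commented out): also re-activate io_vm here.  */
        /* io_vm.active = true; */

        if (t % ACCT_MS == 0) {
            account(&cpu_vm);
            account(&io_vm);
            printf("t=%3dms  IO-VM credit=%5d active=%d\n",
                   t, io_vm.credit, io_vm.active);
        }
    }
    return 0;
}

With the assumed numbers, IO-VM leaves the active list after the
second accounting period and then loses roughly 30 credits per period
with no refill, so its credit slowly drifts negative; uncommenting the
"proposed fix" line keeps it on the active list at every refill, so
the drift disappears.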



[Bug 3]: The BOOST priority might be changed back to UNDER before the
boosted VCPU preempts the currently running VCPU. If so, the boost
cannot take effect.

If a VCPU is in the UNDER state and wakes up from sleep, it will be
boosted in csched_vcpu_wake(). However, the boost only takes effect
when __runq_tickle() preempts the current VCPU. It is possible for
csched_acct() to run between csched_vcpu_wake() and __runq_tickle(),
which sometimes changes the BOOST state back to UNDER if the credit is
positive. If so, __runq_tickle() can fail, as an UNDER VCPU cannot
preempt another UNDER VCPU. This also contributes to the far end of
the long tail latency.

[Possible Solution]
1. Add a lock to prevent csched_acct() from interleaving with
csched_vcpu_wake();
2. Separate the BOOST state from the UNDER and OVER states.
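
For illustration, here is a minimal stand-alone C sketch of the race
(again not Xen code; the priority values and function bodies are
simplified assumptions modelled on the description above). If
csched_acct() runs between the wake-time boost and the runqueue
tickle, the priority comparison in the tickle no longer favours the
waking vCPU, so no preemption happens:

/* Toy model of the bug 3 race -- NOT Xen code. */
#include <stdio.h>

enum prio { PRIO_OVER = -2, PRIO_UNDER = -1, PRIO_BOOST = 0 };

struct vcpu { const char *name; enum prio pri; int credit; };

/* Model of csched_vcpu_wake(): an UNDER vCPU waking up is boosted. */
static void wake(struct vcpu *v)
{
    if (v->pri == PRIO_UNDER)
        v->pri = PRIO_BOOST;
}

/* Model of csched_acct(): a vCPU with positive credit goes to UNDER. */
static void acct(struct vcpu *v)
{
    if (v->credit > 0)
        v->pri = PRIO_UNDER;
}

/* Model of __runq_tickle(): preempt only if the waking vCPU outranks
 * the currently running one. */
static const char *tickle(const struct vcpu *woken, const struct vcpu *cur)
{
    return (woken->pri > cur->pri) ? "preempt" : "no preemption";
}

int main(void)
{
    struct vcpu io  = { "IO-VM",  PRIO_UNDER, 50 };
    struct vcpu cpu = { "CPU-VM", PRIO_UNDER, 10 };

    /* Normal case: wake -> tickle with no accounting in between. */
    wake(&io);
    printf("wake then tickle:     %s\n", tickle(&io, &cpu));

    /* Racy case: accounting runs between the wake and the tickle and
     * demotes the freshly boosted vCPU, so the boost is lost. */
    io.pri = PRIO_UNDER;
    wake(&io);
    acct(&io);
    printf("acct between the two: %s\n", tickle(&io, &cpu));
    return 0;
}

Either of the two possible solutions removes this lost wakeup: with a
lock, acct() cannot run in the window between wake() and tickle();
with a separate BOOST flag, acct() resetting the credit-based state
would not clear the boost that tickle() checks.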
---------------------------------------


Please confirm these bugs.
Thanks.

--
Tony. S
Ph.D. student, University of Colorado, Colorado Springs



* Re: [BUG] Bugs existing Xen's credit scheduler cause long tail latency issues
  2016-05-15  4:11 [BUG] Bugs existing Xen's credit scheduler cause long tail latency issues Tony S
@ 2016-05-16 11:30 ` Dario Faggioli
  2016-05-16 18:22   ` Tony S
  2016-05-17  9:27 ` George Dunlap
  1 sibling, 1 reply; 5+ messages in thread
From: Dario Faggioli @ 2016-05-16 11:30 UTC (permalink / raw)
  To: Tony S, xen-devel; +Cc: George Dunlap



[Adding George, and avoiding trimming, for his benefit]

On Sat, 2016-05-14 at 22:11 -0600, Tony S wrote:
> Hi all,
> 
Hi Tony,

> When I was running latency-sensitive applications in VMs on Xen, I
> found some bugs in the credit scheduler which will cause long tail
> latency in I/O-intensive VMs.
> 
Ok, first of all, thanks for looking into and reporting this.

This is certainly something we need to think about... For now, just a
couple of questions.

> (1) Problem description
> 
> ------------Description------------
> My test environment is as follows: Hypervisor(Xen 4.5.0), Dom 0(Linux
> 3.18.21), Dom U(Linux 3.18.21).
> 
> Environment setup:
> We created two 1-vCPU, 4GB-memory VMs and pinned them onto one
> physical CPU core. One VM(denoted as I/O-VM) ran Sockperf server
> program; the other VM ran a compute-bound task, e.g., SPECCPU 2006 or
> simply a loop(denoted as CPU-VM). A client on another physical
> machine
> sent UDP requests to the I/O-VM.
> 
So, just to be sure I've understood, you have 2 VMs, each with 1 vCPU,
*both* pinned on the *same* pCPU, is this the case?

> Here are my tail latency results (micro-second):
> Case   Avg      90%       99%        99.9%      99.99%
> #1     108   &  114    &  128     &  129     &  130
> #2     7811  &  13892  &  14874   &  15315   &  16383
> #3     943   &  131    &  21755   &  26453   &  26553
> #4     116   &  96     &  105     &  8217    &  13472
> #5     116   &  117    &  129     &  131     &  132
> 
> Bug 1, 2, and 3 will be discussed below.
> 
> Case #1:
> I/O-VM was processing Sockperf requests from clients; CPU-VM was
> idling (no processes running).
> 
> Case #2:
> I/O-VM was processing Sockperf requests from clients; CPU-VM was
> running a compute-bound task.
> Hypervisor is the native Xen 4.5.0
> 
> Case #3:
> I/O-VM was processing Sockperf requests from clients; CPU-VM was
> running a compute-bound task.
> Hypervisor is the native Xen 4.5.0 with bug 1 fixed
> 
> Case #4:
> I/O-VM was processing Sockperf requests from clients; CPU-VM was
> running a compute-bound task.
> Hypervisor is the native Xen 4.5.0 with bug 1 & 2 fixed
> 
> Case #5:
> I/O-VM was processing Sockperf requests from clients; CPU-VM was
> running a compute-bound task.
> Hypervisor is the native Xen 4.5.0 with bug 1 & 2 & 3 fixed
> 
> ---------------------------------------
> 
> 
> (2) Problem analysis
> 
> ------------Analysis----------------
> 
> [Bug1]: The VCPU that ran CPU-intensive workload could be mistakenly
> boosted due to CPU affinity.
> 
> http://lists.xenproject.org/archives/html/xen-devel/2015-10/msg02853.
> html
> 
> We have already discussed this bug and a potential patch in the above
> link. Although the discussed patch improved the tail latency, i.e.,
> reducing the 90th percentile latency, the long tail latency is still
> not bounded. Next, we discuss two new bugs that inflict latency hikes
> at the very far end of the tail.
> 
Right, and there is a fix upstream for this. It's not the patch you
proposed in the thread linked above, but it should have had the same
effect.

Can you perhaps try something more recent than 4.5 (4.7-rc would be
great) and confirm that the numbers still look similar?

About this below here, I'll read carefully and think about it. Thanks
again.

> [Bug2]: In csched_acct() (by default every 30ms), a VCPU stops
> earning credits and is removed from the active VCPU list (in
> __csched_vcpu_acct_stop_locked) if its credit is larger than the
> upper bound. Because the domain has only one VCPU, the VM will also
> be removed from the active domain list.
> 
> Every 10ms, csched_tick() --> csched_vcpu_acct() -->
> __csched_vcpu_acct_start() will be executed and tries to put inactive
> VCPUs back to the active list. However, __csched_vcpu_acct_start()
> will only put the current VCPU back to the active list. If an
> I/O-bound VCPU is not the current VCPU at the csched_tick(), it will
> not be put back to the active VCPU list. If so, the I/O-bound VCPU
> will likely miss the next credit refill in csched_acct() and can
> easily enter the OVER state. As such, the I/O-bound VM will be unable
> to be boosted and have very long latency. It takes at least one time
> slice (e.g., 30ms) before the I/O VM is activated and starts to
> receive credits.
> 
> [Possible Solution] Try to activate any inactive VCPUs back to active
> before next credit refill, instead of just the current VCPU.
> 
> 
> 
> [Bug 3]: The BOOST priority might be changed to UNDER before the
> boosted VCPU preempts the current running VCPU. If so, VCPU boosting
> can not take effect.
> 
> If a VCPU is in UNDER state and wakes up from sleep, it will be
> boosted in csched_vcpu_wake(). However, the boosting is successful
> only when __runq_tickle() preempts the current VCPU. It is possible
> that csched_acct() can run between csched_vcpu_wake() and
> __runq_tickle(), which will sometimes change the BOOST state back to
> UNDER if credit >0. If so, __runq_tickle() can fail as VCPUs in UNDER
> cannot preempt another UNDER VCPU. This also contributes to the far
> end of the long tail latency.
> 
> [Possible Solution]
> 1. add a lock to prevent csched_acct() from interleaving with
> csched_vcpu_wake();
> 2. separate the BOOST state from UNDER and OVER states.
> ---------------------------------------
> 
> 
> Please confirm these bugs.
> Thanks.
> 
> --
> Tony. S
> Ph. D student of University of Colorado, Colorado Springs
> 
-- 
<<This happens because I choose it to happen!>> (Raistlin Majere)
-----------------------------------------------------------------
Dario Faggioli, Ph.D, http://about.me/dario.faggioli
Senior Software Engineer, Citrix Systems R&D Ltd., Cambridge (UK)




* Re: [BUG] Bugs existing Xen's credit scheduler cause long tail latency issues
  2016-05-16 11:30 ` Dario Faggioli
@ 2016-05-16 18:22   ` Tony S
  0 siblings, 0 replies; 5+ messages in thread
From: Tony S @ 2016-05-16 18:22 UTC (permalink / raw)
  To: Dario Faggioli, xen-devel; +Cc: George Dunlap

On Mon, May 16, 2016 at 5:30 AM, Dario Faggioli
<dario.faggioli@citrix.com> wrote:
> [Adding George, and avoiding trimming, for his benefit]
>
> On Sat, 2016-05-14 at 22:11 -0600, Tony S wrote:
>> Hi all,
>>
> Hi Tony,
>
>> When I was running latency-sensitive applications in VMs on Xen, I
>> found some bugs in the credit scheduler which will cause long tail
>> latency in I/O-intensive VMs.
>>
> Ok, first of all, thanks for looking into and reporting this.
>
> This is certainly something we need to think about... For now, just a
> couple of questions.

Hi Dario,

Thank you for your reply. :-)

>
>> (1) Problem description
>>
>> ------------Description------------
>> My test environment is as follows: Hypervisor(Xen 4.5.0), Dom 0(Linux
>> 3.18.21), Dom U(Linux 3.18.21).
>>
>> Environment setup:
>> We created two 1-vCPU, 4GB-memory VMs and pinned them onto one
>> physical CPU core. One VM(denoted as I/O-VM) ran Sockperf server
>> program; the other VM ran a compute-bound task, e.g., SPECCPU 2006 or
>> simply a loop(denoted as CPU-VM). A client on another physical
>> machine
>> sent UDP requests to the I/O-VM.
>>
> So, just to be sure I've understood, you have 2 VMs, each with 1 vCPU,
> *both* pinned on the *same* pCPU, is this the case?
>

Yes.

>> Here are my tail latency results (micro-second):
>> Case   Avg      90%       99%        99.9%      99.99%
>> #1     108   &  114    &  128     &  129     &  130
>> #2     7811  &  13892  &  14874   &  15315   &  16383
>> #3     943   &  131    &  21755   &  26453   &  26553
>> #4     116   &  96     &  105     &  8217    &  13472
>> #5     116   &  117    &  129     &  131     &  132
>>
>> Bug 1, 2, and 3 will be discussed below.
>>
>> Case #1:
>> I/O-VM was processing Sockperf requests from clients; CPU-VM was
>> idling (no processes running).
>>
>> Case #2:
>> I/O-VM was processing Sockperf requests from clients; CPU-VM was
>> running a compute-bound task.
>> Hypervisor is the native Xen 4.5.0
>>
>> Case #3:
>> I/O-VM was processing Sockperf requests from clients; CPU-VM was
>> running a compute-bound task.
>> Hypervisor is the native Xen 4.5.0 with bug 1 fixed
>>
>> Case #4:
>> I/O-VM was processing Sockperf requests from clients; CPU-VM was
>> running a compute-bound task.
>> Hypervisor is the native Xen 4.5.0 with bug 1 & 2 fixed
>>
>> Case #5:
>> I/O-VM was processing Sockperf requests from clients; CPU-VM was
>> running a compute-bound task.
>> Hypervisor is the native Xen 4.5.0 with bug 1 & 2 & 3 fixed
>>
>> ---------------------------------------
>>
>>
>> (2) Problem analysis
>>
>> ------------Analysis----------------
>>
>> [Bug1]: The VCPU that ran CPU-intensive workload could be mistakenly
>> boosted due to CPU affinity.
>>
>> http://lists.xenproject.org/archives/html/xen-devel/2015-10/msg02853.
>> html
>>
>> We have already discussed this bug and a potential patch in the above
>> link. Although the discussed patch improved the tail latency, i.e.,
>> reducing the 90th percentile latency, the long tail latency is still
>> not bounded. Next, we discuss two new bugs that inflict latency hikes
>> at the very far end of the tail.
>>
> Right, and there is a fix upstream for this. It's not the patch you
> proposed in the thread linked above, but it should have had the same
> effect.
>
> Can you perhaps try something more recent than 4.5 (4.7-rc would be
> great) and confirm that the numbers still look similar?

I have tried the latest stable version, Xen 4.6, today. Here are my results:

Case   Avg        90%        99%        99.9%      99.99%
#1     91      &  93      &  101     &  105     &  110
#2     22506   &  43011   &  231946  &  259501  &  265561
#3     917     &  95      &  25257   &  30048   &  30756
#4     110     &  95      &  102     &  12448   &  13255
#5     114     &  118     &  130     &  134     &  136

It seems that case #2 is much worse. The other cases are similar. My
raw latency data is pasted below.

For Xen 4.7-rc, I have some installation issues on my machine, so I
have not tried it yet.


The raw data is as follows. Hope this helps you understand the issues
better. :-)
# case 1:
sockperf: ====> avg-lat= 91.688 (std-dev=2.950)
sockperf: ---> <MAX> observation =  110.647
sockperf: ---> percentile  99.99 =  110.647
sockperf: ---> percentile  99.90 =  105.242
sockperf: ---> percentile  99.50 =  101.531
sockperf: ---> percentile  99.00 =  101.066
sockperf: ---> percentile  95.00 =   97.016
sockperf: ---> percentile  90.00 =   93.294
sockperf: ---> percentile  75.00 =   92.157
sockperf: ---> percentile  50.00 =   91.437
sockperf: ---> percentile  25.00 =   90.786
sockperf: ---> <MIN> observation =   73.071


# case 2:
sockperf: ====> avg-lat=90019.931 (std-dev=136620.722)
sockperf: ---> <MAX> observation = 637712.152
sockperf: ---> percentile  99.99 = 637712.152
sockperf: ---> percentile  99.90 = 632901.547
sockperf: ---> percentile  99.50 = 615972.778
sockperf: ---> percentile  99.00 = 599698.318
sockperf: ---> percentile  95.00 = 428857.020
sockperf: ---> percentile  90.00 = 259316.760
sockperf: ---> percentile  75.00 = 114029.044
sockperf: ---> percentile  50.00 = 24629.429
sockperf: ---> percentile  25.00 = 10368.731
sockperf: ---> <MIN> observation =   81.046


#case 3:
sockperf: ====> avg-lat=917.394 (std-dev=3943.142)
sockperf: ---> <MAX> observation = 30756.289
sockperf: ---> percentile  99.99 = 30756.289
sockperf: ---> percentile  99.90 = 30048.372
sockperf: ---> percentile  99.50 = 25962.687
sockperf: ---> percentile  99.00 = 25257.746
sockperf: ---> percentile  95.00 = 5615.028
sockperf: ---> percentile  90.00 =   95.726
sockperf: ---> percentile  75.00 =   92.916
sockperf: ---> percentile  50.00 =   90.387
sockperf: ---> percentile  25.00 =   89.162
sockperf: ---> <MIN> observation =   67.762


#case 4:
sockperf: ====> avg-lat=110.159 (std-dev=555.153)
sockperf: ---> <MAX> observation = 13255.732
sockperf: ---> percentile  99.99 = 13255.732
sockperf: ---> percentile  99.90 = 12448.629
sockperf: ---> percentile  99.50 =  104.799
sockperf: ---> percentile  99.00 =  101.954
sockperf: ---> percentile  95.00 =   97.295
sockperf: ---> percentile  90.00 =   95.995
sockperf: ---> percentile  75.00 =   91.866
sockperf: ---> percentile  50.00 =   88.803
sockperf: ---> percentile  25.00 =   71.088
sockperf: ---> <MIN> observation =   65.826


#case 5:
sockperf: ====> avg-lat=114.984 (std-dev=3.782)
sockperf: ---> <MAX> observation =  136.748
sockperf: ---> percentile  99.99 =  136.748
sockperf: ---> percentile  99.90 =  134.192
sockperf: ---> percentile  99.50 =  131.467
sockperf: ---> percentile  99.00 =  130.200
sockperf: ---> percentile  95.00 =  121.575
sockperf: ---> percentile  90.00 =  118.518
sockperf: ---> percentile  75.00 =  116.343
sockperf: ---> percentile  50.00 =  114.356
sockperf: ---> percentile  25.00 =  112.479
sockperf: ---> <MIN> observation =   94.932


>
> About this below here, I'll read carefully and think about it. Thanks
> again.

Thank you, Dario.

As for bug 2 and bug 3, although they do not affect throughput, they
cause a big latency problem, especially long tail latency.

>
>> [Bug2]: In csched_acct() (by default every 30ms), a VCPU stops
>> earning credits and is removed from the active VCPU list (in
>> __csched_vcpu_acct_stop_locked) if its credit is larger than the
>> upper bound. Because the domain has only one VCPU, the VM will also
>> be removed from the active domain list.
>>
>> Every 10ms, csched_tick() --> csched_vcpu_acct() -->
>> __csched_vcpu_acct_start() will be executed and tries to put inactive
>> VCPUs back to the active list. However, __csched_vcpu_acct_start()
>> will only put the current VCPU back to the active list. If an
>> I/O-bound VCPU is not the current VCPU at the csched_tick(), it will
>> not be put back to the active VCPU list. If so, the I/O-bound VCPU
>> will likely miss the next credit refill in csched_acct() and can
>> easily enter the OVER state. As such, the I/O-bound VM will be unable
>> to be boosted and have very long latency. It takes at least one time
>> slice (e.g., 30ms) before the I/O VM is activated and starts to
>> receive credits.
>>
>> [Possible Solution] Try to activate any inactive VCPUs back to active
>> before next credit refill, instead of just the current VCPU.
>>
>>
>>
>> [Bug 3]: The BOOST priority might be changed to UNDER before the
>> boosted VCPU preempts the current running VCPU. If so, VCPU boosting
>> can not take effect.
>>
>> If a VCPU is in UNDER state and wakes up from sleep, it will be
>> boosted in csched_vcpu_wake(). However, the boosting is successful
>> only when __runq_tickle() preempts the current VCPU. It is possible
>> that csched_acct() can run between csched_vcpu_wake() and
>> __runq_tickle(), which will sometimes change the BOOST state back to
>> UNDER if credit >0. If so, __runq_tickle() can fail as VCPUs in UNDER
>> cannot preempt another UNDER VCPU. This also contributes to the far
>> end of the long tail latency.
>>
>> [Possible Solution]
>> 1. add a lock to prevent csched_acct() from interleaving with
>> csched_vcpu_wake();
>> 2. separate the BOOST state from UNDER and OVER states.
>> ---------------------------------------
>>
>>
>> Please confirm these bugs.
>> Thanks.
>>
>> --
>> Tony. S
>> Ph. D student of University of Colorado, Colorado Springs
>>
> --
> <<This happens because I choose it to happen!>> (Raistlin Majere)
> -----------------------------------------------------------------
> Dario Faggioli, Ph.D, http://about.me/dario.faggioli
> Senior Software Engineer, Citrix Systems R&D Ltd., Cambridge (UK)
>

-- 

Tony S.
Ph.D. student, University of Colorado, Colorado Springs



* Re: [BUG] Bugs existing Xen's credit scheduler cause long tail latency issues
  2016-05-15  4:11 [BUG] Bugs existing Xen's credit scheduler cause long tail latency issues Tony S
  2016-05-16 11:30 ` Dario Faggioli
@ 2016-05-17  9:27 ` George Dunlap
  2016-05-17 16:11   ` Tony S
  1 sibling, 1 reply; 5+ messages in thread
From: George Dunlap @ 2016-05-17  9:27 UTC (permalink / raw)
  To: Tony S; +Cc: xen-devel

On Sun, May 15, 2016 at 5:11 AM, Tony S <suokunstar@gmail.com> wrote:
> Hi all,
>
> When I was running latency-sensitive applications in VMs on Xen, I
> found some bugs in the credit scheduler which will cause long tail
> latency in I/O-intensive VMs.
>
>
> (1) Problem description
>
> ------------Description------------
> My test environment is as follows: Hypervisor(Xen 4.5.0), Dom 0(Linux
> 3.18.21), Dom U(Linux 3.18.21).
>
> Environment setup:
> We created two 1-vCPU, 4GB-memory VMs and pinned them onto one
> physical CPU core. One VM(denoted as I/O-VM) ran Sockperf server
> program; the other VM ran a compute-bound task, e.g., SPECCPU 2006 or
> simply a loop(denoted as CPU-VM). A client on another physical machine
> sent UDP requests to the I/O-VM.
>
> Here are my tail latency results (micro-second):
> Case   Avg      90%       99%        99.9%      99.99%
> #1     108   &  114    &  128     &  129     &  130
> #2     7811  &  13892  &  14874   &  15315   &  16383
> #3     943   &  131    &  21755   &  26453   &  26553
> #4     116   &  96     &  105     &  8217    &  13472
> #5     116   &  117    &  129     &  131     &  132
>
> Bug 1, 2, and 3 will be discussed below.
>
> Case #1:
> I/O-VM was processing Sockperf requests from clients; CPU-VM was
> idling (no processes running).
>
> Case #2:
> I/O-VM was processing Sockperf requests from clients; CPU-VM was
> running a compute-bound task.
> Hypervisor is the native Xen 4.5.0
>
> Case #3:
> I/O-VM was processing Sockperf requests from clients; CPU-VM was
> running a compute-bound task.
> Hypervisor is the native Xen 4.5.0 with bug 1 fixed
>
> Case #4:
> I/O-VM was processing Sockperf requests from clients; CPU-VM was
> running a compute-bound task.
> Hypervisor is the native Xen 4.5.0 with bug 1 & 2 fixed
>
> Case #5:
> I/O-VM was processing Sockperf requests from clients; CPU-VM was
> running a compute-bound task.
> Hypervisor is the native Xen 4.5.0 with bug 1 & 2 & 3 fixed
>
> ---------------------------------------
>
>
> (2) Problem analysis

Hey Tony,

Thanks for looking at this.  These issues in the credit1 algorithm are
essentially exactly the reason that I started work on the credit2
scheduler several years ago.  We meant credit2 to have replaced
credit1 by now, but we ran out of time to test it properly; we're in
the process of doing that right now, and are hoping it will be the
default scheduler for the 4.8 release.

So let me make two suggestions that would make your effort more
helpful to us:

1. Use cpupools for testing rather than pinning. A lot of the
algorithms are designed with the assumption that they have all the
cpus to run on, and the credit allocation / priority algorithms fail
to work properly when vcpus are merely pinned.  Cpupools were
specifically designed to allow the scheduler algorithms to work as
designed with a smaller number of cpus than the system has.

2. Test credit2. :-)

One comment about your analysis here...

> [Bug2]: In csched_acct() (by default every 30ms), a VCPU stops earning
> credits and is removed from the active VCPU list (in
> __csched_vcpu_acct_stop_locked) if its credit is larger than the upper
> bound. Because the domain has only one VCPU, the VM will also be
> removed from the active domain list.
>
> Every 10ms, csched_tick() --> csched_vcpu_acct() -->
> __csched_vcpu_acct_start() will be executed and tries to put inactive
> VCPUs back to the active list. However, __csched_vcpu_acct_start()
> will only put the current VCPU back to the active list. If an
> I/O-bound VCPU is not the current VCPU at the csched_tick(), it will
> not be put back to the active VCPU list. If so, the I/O-bound VCPU
> will likely miss the next credit refill in csched_acct() and can
> easily enter the OVER state. As such, the I/O-bound VM will be unable
> to be boosted and have very long latency. It takes at least one time
> slice (e.g., 30ms) before the I/O VM is activated and starts to
> receive credits.
>
> [Possible Solution] Try to activate any inactive VCPUs back to active
> before next credit refill, instead of just the current VCPU.

When we stop accounting, we divide the credits in half, so that when
the vCPU starts up again, it should have a reasonable amount of credit
(15ms worth).  Is this not taking effect for some reason?

 -George



* Re: [BUG] Bugs existing Xen's credit scheduler cause long tail latency issues
  2016-05-17  9:27 ` George Dunlap
@ 2016-05-17 16:11   ` Tony S
  0 siblings, 0 replies; 5+ messages in thread
From: Tony S @ 2016-05-17 16:11 UTC (permalink / raw)
  To: George Dunlap, xen-devel; +Cc: Dario Faggioli

On Tue, May 17, 2016 at 3:27 AM, George Dunlap <dunlapg@umich.edu> wrote:
> On Sun, May 15, 2016 at 5:11 AM, Tony S <suokunstar@gmail.com> wrote:
>> Hi all,
>>
>> When I was running latency-sensitive applications in VMs on Xen, I
>> found some bugs in the credit scheduler which will cause long tail
>> latency in I/O-intensive VMs.
>>
>>
>> (1) Problem description
>>
>> ------------Description------------
>> My test environment is as follows: Hypervisor(Xen 4.5.0), Dom 0(Linux
>> 3.18.21), Dom U(Linux 3.18.21).
>>
>> Environment setup:
>> We created two 1-vCPU, 4GB-memory VMs and pinned them onto one
>> physical CPU core. One VM(denoted as I/O-VM) ran Sockperf server
>> program; the other VM ran a compute-bound task, e.g., SPECCPU 2006 or
>> simply a loop(denoted as CPU-VM). A client on another physical machine
>> sent UDP requests to the I/O-VM.
>>
>> Here are my tail latency results (micro-second):
>> Case   Avg      90%       99%        99.9%      99.99%
>> #1     108   &  114    &  128     &  129     &  130
>> #2     7811  &  13892  &  14874   &  15315   &  16383
>> #3     943   &  131    &  21755   &  26453   &  26553
>> #4     116   &  96     &  105     &  8217    &  13472
>> #5     116   &  117    &  129     &  131     &  132
>>
>> Bug 1, 2, and 3 will be discussed below.
>>
>> Case #1:
>> I/O-VM was processing Sockperf requests from clients; CPU-VM was
>> idling (no processes running).
>>
>> Case #2:
>> I/O-VM was processing Sockperf requests from clients; CPU-VM was
>> running a compute-bound task.
>> Hypervisor is the native Xen 4.5.0
>>
>> Case #3:
>> I/O-VM was processing Sockperf requests from clients; CPU-VM was
>> running a compute-bound task.
>> Hypervisor is the native Xen 4.5.0 with bug 1 fixed
>>
>> Case #4:
>> I/O-VM was processing Sockperf requests from clients; CPU-VM was
>> running a compute-bound task.
>> Hypervisor is the native Xen 4.5.0 with bug 1 & 2 fixed
>>
>> Case #5:
>> I/O-VM was processing Sockperf requests from clients; CPU-VM was
>> running a compute-bound task.
>> Hypervisor is the native Xen 4.5.0 with bug 1 & 2 & 3 fixed
>>
>> ---------------------------------------
>>
>>
>> (2) Problem analysis
>
> Hey Tony,
>
> Thanks for looking at this.  These issues in the credit1 algorithm are
> essentially exactly the reason that I started work on the credit2
> scheduler several years ago.  We meant credit2 to have replaced
> credit1 by now, but we ran out of time to test it properly; we're in
> the process of doing that right now, and are hoping it will be the
> default scheduler for the 4.8 release.
>
> So if I could make two suggestions that would help your effort be more
> helpful to us:
>
> 1. Use cpupools for testing rather than pinning. A lot of the
> algorithms are designed with the assumption that they have all the
> cpus to run on, and the credit allocation / priority algorithms fail
> to work properly when they are only pinned.  Cpupools was specifically
> designed to allow the scheduler algorithms to work as designed with a
> smaller number of cpus than the system had.
>
> 2. Test credit2. :-)
>

Hi George,

Thank you for your reply. I will try cpupools and credit2 later. :-)


> One comment about your analysis here...
>
>> [Bug2]: In csched_acct() (by default every 30ms), a VCPU stops earning
>> credits and is removed from the active VCPU list (in
>> __csched_vcpu_acct_stop_locked) if its credit is larger than the upper
>> bound. Because the domain has only one VCPU, the VM will also be
>> removed from the active domain list.
>>
>> Every 10ms, csched_tick() --> csched_vcpu_acct() -->
>> __csched_vcpu_acct_start() will be executed and tries to put inactive
>> VCPUs back to the active list. However, __csched_vcpu_acct_start()
>> will only put the current VCPU back to the active list. If an
>> I/O-bound VCPU is not the current VCPU at the csched_tick(), it will
>> not be put back to the active VCPU list. If so, the I/O-bound VCPU
>> will likely miss the next credit refill in csched_acct() and can
>> easily enter the OVER state. As such, the I/O-bound VM will be unable
>> to be boosted and have very long latency. It takes at least one time
>> slice (e.g., 30ms) before the I/O VM is activated and starts to
>> receive credits.
>>
>> [Possible Solution] Try to activate any inactive VCPUs back to active
>> before next credit refill, instead of just the current VCPU.
>
> When we stop accounting, we divide the credits in half, so that when
> it starts out, it should have a reasonable amount of credit (15ms
> worth).  Is this not taking effect for some reason?
>

Actually, for bug 2, dividing the credits in half to leave a
reasonable amount of credit is not the issue. The problem is that the
VCPU is removed from the active VCPU list (in
__csched_vcpu_acct_stop_locked) and sometimes is not put back on the
active list in time (as I explained in my first message). If the VCPU
is not active, csched_acct() will not allocate new credits to it at
the next refill. After many such rounds, the credit of this VCPU
becomes a small negative number (e.g., -1000) and the VCPU won't be
scheduled. The I/O-intensive applications on it, especially
latency-sensitive workloads, will suffer long tail latency as a
result.
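
As a rough back-of-the-envelope illustration (all numbers here are
assumptions, e.g. roughly half a refill's worth of credit left after
the accounting stop, a few milliseconds of CPU per 30ms accounting
period, and no refill while off the active list), reaching a value
like -1000 only takes on the order of a second:

#include <stdio.h>

int main(void)
{
    /* Assumed model values, not Xen constants. */
    int credit = 150;                 /* ~half of a full refill        */
    const int burn_per_period = 30;   /* ~3ms of CPU at 10 credits/ms  */
    int periods = 0;

    while (credit > -1000) {          /* no refill while inactive      */
        credit -= burn_per_period;
        periods++;
    }
    /* ~39 periods x 30ms ~= 1.2 seconds to reach about -1000.         */
    printf("%d periods (%d ms) to reach credit %d\n",
           periods, periods * 30, credit);
    return 0;
}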

>  -George



-- 
Tony




Thread overview: 5+ messages
2016-05-15  4:11 [BUG] Bugs existing Xen's credit scheduler cause long tail latency issues Tony S
2016-05-16 11:30 ` Dario Faggioli
2016-05-16 18:22   ` Tony S
2016-05-17  9:27 ` George Dunlap
2016-05-17 16:11   ` Tony S
