xen-devel.lists.xenproject.org archive mirror
* [BUG] Bugs existing Xen's credit scheduler cause long tail latency issues
@ 2016-05-15  4:11 Tony S
  2016-05-16 11:30 ` Dario Faggioli
  2016-05-17  9:27 ` George Dunlap
  0 siblings, 2 replies; 5+ messages in thread
From: Tony S @ 2016-05-15  4:11 UTC (permalink / raw)
  To: xen-devel

Hi all,

When I was running latency-sensitive applications in VMs on Xen, I
found some bugs in the credit scheduler that cause long tail
latencies in I/O-intensive VMs.


(1) Problem description

------------Description------------
My test environment is as follows: Hypervisor (Xen 4.5.0), Dom 0 (Linux
3.18.21), Dom U (Linux 3.18.21).

Environment setup:
We created two 1-vCPU, 4GB-memory VMs and pinned them onto one
physical CPU core. One VM (denoted as the I/O-VM) ran the Sockperf
server program; the other VM (denoted as the CPU-VM) ran a
compute-bound task, e.g., SPEC CPU 2006 or simply a busy loop. A
client on another physical machine sent UDP requests to the I/O-VM.

Here are my tail latency results (microseconds):
Case   Avg      90%       99%        99.9%      99.99%
#1     108   &  114    &  128     &  129     &  130
#2     7811  &  13892  &  14874   &  15315   &  16383
#3     943   &  131    &  21755   &  26453   &  26553
#4     116   &  96     &  105     &  8217    &  13472
#5     116   &  117    &  129     &  131     &  132

Bugs 1, 2, and 3 are discussed below.

Case #1:
I/O-VM was processing Sockperf requests from clients; CPU-VM was
idling (no processes running).

Case #2:
I/O-VM was processing Sockperf requests from clients; CPU-VM was
running a compute-bound task.
Hypervisor is the native Xen 4.5.0

Case #3:
I/O-VM was processing Sockperf requests from clients; CPU-VM was
running a compute-bound task.
Hypervisor is the native Xen 4.5.0 with bug 1 fixed

Case #4:
I/O-VM was processing Sockperf requests from clients; CPU-VM was
running a compute-bound task.
Hypervisor is the native Xen 4.5.0 with bug 1 & 2 fixed

Case #5:
I/O-VM was processing Sockperf requests from clients; CPU-VM was
running a compute-bound task.
Hypervisor is the native Xen 4.5.0 with bug 1 & 2 & 3 fixed

---------------------------------------


(2) Problem analysis

------------Analysis----------------

[Bug1]: The VCPU that ran the CPU-intensive workload could be
mistakenly boosted due to CPU affinity.

http://lists.xenproject.org/archives/html/xen-devel/2015-10/msg02853.html

We have already discussed this bug and a potential patch in the above
link. Although the discussed patch improved the tail latency, i.e.,
reducing the 90th percentile latency, the long tail latency is still
not bounded. Next, we discuss two new bugs that inflict latency hikes
at the very far end of the tail.



[Bug2]: In csched_acct() (by default every 30ms), a VCPU stops earning
credits and is removed from the active VCPU list (in
__csched_vcpu_acct_stop_locked) if its credit is larger than the upper
bound. Because the domain has only one VCPU, the VM will also be
removed from the active domain list.

Every 10ms, csched_tick() --> csched_vcpu_acct() -->
__csched_vcpu_acct_start() runs and tries to put inactive VCPUs back
on the active list. However, __csched_vcpu_acct_start() only puts the
current VCPU back on the active list. If an I/O-bound VCPU is not the
current VCPU when csched_tick() fires, it is not put back on the
active VCPU list. It will then likely miss the next credit refill in
csched_acct() and can easily enter the OVER state. As a result, the
I/O-bound VM cannot be boosted and sees very long latency. It takes at
least one time slice (e.g., 30ms) before the I/O VM is reactivated and
starts to receive credits again.

[Possible Solution] Reactivate all inactive VCPUs before the next
credit refill, instead of just the current VCPU.
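
To make this concrete, below is a small stand-alone C model of the
behaviour described above. It is not Xen code: the tick/accounting
periods, credit values and field names are simplified assumptions.
It only illustrates how a vCPU that is dropped from the active list,
and is rarely the current vCPU at tick time, keeps missing refills
and drifts toward negative credit; the proposed fix corresponds to
re-activating every inactive vCPU in that step, not just the current
one.

/*
 * Toy model of the bug 2 scenario -- NOT Xen code.
 * All constants and names are illustrative assumptions.
 */
#include <stdio.h>
#include <stdbool.h>

#define TICK_MS        10   /* csched_tick() period in the model       */
#define ACCT_MS        30   /* csched_acct() period in the model       */
#define CREDIT_PER_MS  10   /* credit burned per ms of pCPU time       */
#define REFILL         (ACCT_MS * CREDIT_PER_MS) /* credit per refill  */

struct vcpu {
    const char *name;
    int  credit;
    bool active;   /* on the accounting ("active") list                */
    int  ran_ms;   /* pCPU time consumed since the last accounting     */
};

static void account(struct vcpu *v)
{
    v->credit -= v->ran_ms * CREDIT_PER_MS;  /* burn for time used     */
    v->ran_ms  = 0;
    if (v->active) {
        v->credit += REFILL;                 /* refill only if active  */
        if (v->credit > REFILL) {            /* over the upper bound:  */
            v->credit /= 2;                  /* halve the credit and   */
            v->active  = false;              /* stop accounting        */
        }
    }
}

int main(void)
{
    struct vcpu cpu_vm = { "CPU-VM", 0, true, 0 };
    struct vcpu io_vm  = { "IO-VM",  0, true, 0 };

    for (int t = TICK_MS; t <= 600; t += TICK_MS) {
        /* The CPU-bound vCPU hogs the pCPU; the I/O vCPU only runs
         * briefly per request, so it is almost never "current".       */
        struct vcpu *current = &cpu_vm;
        cpu_vm.ran_ms += TICK_MS - 1;
        io_vm.ran_ms  += 1;

        /* Model of the tick-time re-activation: only the *current*
         * vCPU is put back on the active list.                        */
        current->active = true;
        /* Proposed fix (commented out): also re-activate io_vm here.  */
        /* io_vm.active = true; */

        if (t % ACCT_MS == 0) {
            account(&cpu_vm);
            account(&io_vm);
            printf("t=%3dms  IO-VM credit=%5d active=%d\n",
                   t, io_vm.credit, io_vm.active);
        }
    }
    return 0;
}

With the assumed numbers, IO-VM leaves the active list after the
second accounting period and then loses roughly 30 credits per period
with no refill, so its credit slowly drifts negative; uncommenting the
"proposed fix" line keeps it on the active list at every refill, so
the drift disappears.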



[Bug 3]: The BOOST priority might be changed back to UNDER before the
boosted VCPU preempts the currently running VCPU. If so, the boost
cannot take effect.

If a VCPU is in the UNDER state and wakes up from sleep, it will be
boosted in csched_vcpu_wake(). However, the boost only takes effect
when __runq_tickle() preempts the current VCPU. It is possible for
csched_acct() to run between csched_vcpu_wake() and __runq_tickle(),
which sometimes changes the BOOST state back to UNDER if the credit is
positive. If so, __runq_tickle() can fail, as an UNDER VCPU cannot
preempt another UNDER VCPU. This also contributes to the far end of
the long tail latency.

[Possible Solution]
1. Add a lock to prevent csched_acct() from interleaving with
csched_vcpu_wake();
2. Separate the BOOST state from the UNDER and OVER states.
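
For illustration, here is a minimal stand-alone C sketch of the race
(again not Xen code; the priority values and function bodies are
simplified assumptions modelled on the description above). If
csched_acct() runs between the wake-time boost and the runqueue
tickle, the priority comparison in the tickle no longer favours the
waking vCPU, so no preemption happens:

/* Toy model of the bug 3 race -- NOT Xen code. */
#include <stdio.h>

enum prio { PRIO_OVER = -2, PRIO_UNDER = -1, PRIO_BOOST = 0 };

struct vcpu { const char *name; enum prio pri; int credit; };

/* Model of csched_vcpu_wake(): an UNDER vCPU waking up is boosted. */
static void wake(struct vcpu *v)
{
    if (v->pri == PRIO_UNDER)
        v->pri = PRIO_BOOST;
}

/* Model of csched_acct(): a vCPU with positive credit goes to UNDER. */
static void acct(struct vcpu *v)
{
    if (v->credit > 0)
        v->pri = PRIO_UNDER;
}

/* Model of __runq_tickle(): preempt only if the waking vCPU outranks
 * the currently running one. */
static const char *tickle(const struct vcpu *woken, const struct vcpu *cur)
{
    return (woken->pri > cur->pri) ? "preempt" : "no preemption";
}

int main(void)
{
    struct vcpu io  = { "IO-VM",  PRIO_UNDER, 50 };
    struct vcpu cpu = { "CPU-VM", PRIO_UNDER, 10 };

    /* Normal case: wake -> tickle with no accounting in between. */
    wake(&io);
    printf("wake then tickle:     %s\n", tickle(&io, &cpu));

    /* Racy case: accounting runs between the wake and the tickle and
     * demotes the freshly boosted vCPU, so the boost is lost. */
    io.pri = PRIO_UNDER;
    wake(&io);
    acct(&io);
    printf("acct between the two: %s\n", tickle(&io, &cpu));
    return 0;
}

Either of the two possible solutions removes this lost wakeup: with a
lock, acct() cannot run in the window between wake() and tickle();
with a separate BOOST flag, acct() resetting the credit-based state
would not clear the boost that tickle() checks.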
---------------------------------------


Please confirm these bugs.
Thanks.

--
Tony. S
Ph.D. student, University of Colorado, Colorado Springs



* Re: [BUG] Bugs existing Xen's credit scheduler cause long tail latency issues
  2016-05-15  4:11 [BUG] Bugs existing Xen's credit scheduler cause long tail latency issues Tony S
@ 2016-05-16 11:30 ` Dario Faggioli
  2016-05-16 18:22   ` Tony S
  2016-05-17  9:27 ` George Dunlap
  1 sibling, 1 reply; 5+ messages in thread
From: Dario Faggioli @ 2016-05-16 11:30 UTC (permalink / raw)
  To: Tony S, xen-devel; +Cc: George Dunlap



[Adding George, and avoiding trimming, for his benefit]

On Sat, 2016-05-14 at 22:11 -0600, Tony S wrote:
> Hi all,
> 
Hi Tony,

> When I was running latency-sensitive applications in VMs on Xen, I
> found some bugs in the credit scheduler which will cause long tail
> latency in I/O-intensive VMs.
> 
Ok, first of all, thanks for looking into and reporting this.

This is certainly something we need to think about... For now, just a
couple of questions.

> (1) Problem description
> 
> ------------Description------------
> My test environment is as follows: Hypervisor(Xen 4.5.0), Dom 0(Linux
> 3.18.21), Dom U(Linux 3.18.21).
> 
> Environment setup:
> We created two 1-vCPU, 4GB-memory VMs and pinned them onto one
> physical CPU core. One VM(denoted as I/O-VM) ran Sockperf server
> program; the other VM ran a compute-bound task, e.g., SPECCPU 2006 or
> simply a loop(denoted as CPU-VM). A client on another physical
> machine
> sent UDP requests to the I/O-VM.
> 
So, just to be sure I've understood, you have 2 VMs, each with 1 vCPU,
*both* pinned on the *same* pCPU, is this the case?

> Here are my tail latency results (micro-second):
> Case   Avg      90%       99%        99.9%      99.99%
> #1     108   &  114    &  128     &  129     &  130
> #2     7811  &  13892  &  14874   &  15315   &  16383
> #3     943   &  131    &  21755   &  26453   &  26553
> #4     116   &  96     &  105     &  8217    &  13472
> #5     116   &  117    &  129     &  131     &  132
> 
> Bug 1, 2, and 3 will be discussed below.
> 
> Case #1:
> I/O-VM was processing Sockperf requests from clients; CPU-VM was
> idling (no processes running).
> 
> Case #2:
> I/O-VM was processing Sockperf requests from clients; CPU-VM was
> running a compute-bound task.
> Hypervisor is the native Xen 4.5.0
> 
> Case #3:
> I/O-VM was processing Sockperf requests from clients; CPU-VM was
> running a compute-bound task.
> Hypervisor is the native Xen 4.5.0 with bug 1 fixed
> 
> Case #4:
> I/O-VM was processing Sockperf requests from clients; CPU-VM was
> running a compute-bound task.
> Hypervisor is the native Xen 4.5.0 with bug 1 & 2 fixed
> 
> Case #5:
> I/O-VM was processing Sockperf requests from clients; CPU-VM was
> running a compute-bound task.
> Hypervisor is the native Xen 4.5.0 with bug 1 & 2 & 3 fixed
> 
> ---------------------------------------
> 
> 
> (2) Problem analysis
> 
> ------------Analysis----------------
> 
> [Bug1]: The VCPU that ran CPU-intensive workload could be mistakenly
> boosted due to CPU affinity.
> 
> http://lists.xenproject.org/archives/html/xen-devel/2015-10/msg02853.
> html
> 
> We have already discussed this bug and a potential patch in the above
> link. Although the discussed patch improved the tail latency, i.e.,
> reducing the 90th percentile latency, the long tail latency is still
> not bounded. Next, we discuss two new bugs that inflict latency hikes
> at the very far end of the tail.
> 
Right, and there is a fix upstream for this. It's not the patch you
proposed in the thread linked above, but it should have had the same
effect.

Can you perhaps try something more recent than 4.5 (4.7-rc would be
great) and confirm that the numbers still look similar?

About this below here, I'll read carefully and think about it. Thanks
again.

> [Bug2]: In csched_acct() (by default every 30ms), a VCPU stops
> earning credits and is removed from the active VCPU list (in
> __csched_vcpu_acct_stop_locked) if its credit is larger than the
> upper bound. Because the domain has only one VCPU, the VM will also
> be removed from the active domain list.
> 
> Every 10ms, csched_tick() --> csched_vcpu_acct() -->
> __csched_vcpu_acct_start() will be executed and tries to put inactive
> VCPUs back to the active list. However, __csched_vcpu_acct_start()
> will only put the current VCPU back to the active list. If an
> I/O-bound VCPU is not the current VCPU at the csched_tick(), it will
> not be put back to the active VCPU list. If so, the I/O-bound VCPU
> will likely miss the next credit refill in csched_acct() and can
> easily enter the OVER state. As such, the I/O-bound VM will be unable
> to be boosted and have very long latency. It takes at least one time
> slice (e.g., 30ms) before the I/O VM is activated and starts to
> receive credits.
> 
> [Possible Solution] Try to activate any inactive VCPUs back to active
> before next credit refill, instead of just the current VCPU.
> 
> 
> 
> [Bug 3]: The BOOST priority might be changed to UNDER before the
> boosted VCPU preempts the current running VCPU. If so, VCPU boosting
> can not take effect.
> 
> If a VCPU is in UNDER state and wakes up from sleep, it will be
> boosted in csched_vcpu_wake(). However, the boosting is successful
> only when __runq_tickle() preempts the current VCPU. It is possible
> that csched_acct() can run between csched_vcpu_wake() and
> __runq_tickle(), which will sometimes change the BOOST state back to
> UNDER if credit >0. If so, __runq_tickle() can fail as VCPUs in UNDER
> cannot preempt another UNDER VCPU. This also contributes to the far
> end of the long tail latency.
> 
> [Possible Solution]
> 1. add a lock to prevent csched_acct() from interleaving with
> csched_vcpu_wake();
> 2. separate the BOOST state from UNDER and OVER states.
> ---------------------------------------
> 
> 
> Please confirm these bugs.
> Thanks.
> 
> --
> Tony. S
> Ph. D student of University of Colorado, Colorado Springs
> 
-- 
<<This happens because I choose it to happen!>> (Raistlin Majere)
-----------------------------------------------------------------
Dario Faggioli, Ph.D, http://about.me/dario.faggioli
Senior Software Engineer, Citrix Systems R&D Ltd., Cambridge (UK)




* Re: [BUG] Bugs existing Xen's credit scheduler cause long tail latency issues
  2016-05-16 11:30 ` Dario Faggioli
@ 2016-05-16 18:22   ` Tony S
  0 siblings, 0 replies; 5+ messages in thread
From: Tony S @ 2016-05-16 18:22 UTC (permalink / raw)
  To: Dario Faggioli, xen-devel; +Cc: George Dunlap

On Mon, May 16, 2016 at 5:30 AM, Dario Faggioli
<dario.faggioli@citrix.com> wrote:
> [Adding George, and avoiding trimming, for his benefit]
>
> On Sat, 2016-05-14 at 22:11 -0600, Tony S wrote:
>> Hi all,
>>
> Hi Tony,
>
>> When I was running latency-sensitive applications in VMs on Xen, I
>> found some bugs in the credit scheduler which will cause long tail
>> latency in I/O-intensive VMs.
>>
> Ok, first of all, thanks for looking into and reporting this.
>
> This is certainly something we need to think about... For now, just a
> couple of questions.

Hi Dario,

Thank you for your reply. :-)

>
>> (1) Problem description
>>
>> ------------Description------------
>> My test environment is as follows: Hypervisor(Xen 4.5.0), Dom 0(Linux
>> 3.18.21), Dom U(Linux 3.18.21).
>>
>> Environment setup:
>> We created two 1-vCPU, 4GB-memory VMs and pinned them onto one
>> physical CPU core. One VM(denoted as I/O-VM) ran Sockperf server
>> program; the other VM ran a compute-bound task, e.g., SPECCPU 2006 or
>> simply a loop(denoted as CPU-VM). A client on another physical
>> machine
>> sent UDP requests to the I/O-VM.
>>
> So, just to be sure I've understood, you have 2 VMs, each with 1 vCPU,
> *both* pinned on the *same* pCPU, is this the case?
>

Yes.

>> Here are my tail latency results (micro-second):
>> Case   Avg      90%       99%        99.9%      99.99%
>> #1     108   &  114    &  128     &  129     &  130
>> #2     7811  &  13892  &  14874   &  15315   &  16383
>> #3     943   &  131    &  21755   &  26453   &  26553
>> #4     116   &  96     &  105     &  8217    &  13472
>> #5     116   &  117    &  129     &  131     &  132
>>
>> Bug 1, 2, and 3 will be discussed below.
>>
>> Case #1:
>> I/O-VM was processing Sockperf requests from clients; CPU-VM was
>> idling (no processes running).
>>
>> Case #2:
>> I/O-VM was processing Sockperf requests from clients; CPU-VM was
>> running a compute-bound task.
>> Hypervisor is the native Xen 4.5.0
>>
>> Case #3:
>> I/O-VM was processing Sockperf requests from clients; CPU-VM was
>> running a compute-bound task.
>> Hypervisor is the native Xen 4.5.0 with bug 1 fixed
>>
>> Case #4:
>> I/O-VM was processing Sockperf requests from clients; CPU-VM was
>> running a compute-bound task.
>> Hypervisor is the native Xen 4.5.0 with bug 1 & 2 fixed
>>
>> Case #5:
>> I/O-VM was processing Sockperf requests from clients; CPU-VM was
>> running a compute-bound task.
>> Hypervisor is the native Xen 4.5.0 with bug 1 & 2 & 3 fixed
>>
>> ---------------------------------------
>>
>>
>> (2) Problem analysis
>>
>> ------------Analysis----------------
>>
>> [Bug1]: The VCPU that ran CPU-intensive workload could be mistakenly
>> boosted due to CPU affinity.
>>
>> http://lists.xenproject.org/archives/html/xen-devel/2015-10/msg02853.
>> html
>>
>> We have already discussed this bug and a potential patch in the above
>> link. Although the discussed patch improved the tail latency, i.e.,
>> reducing the 90th percentile latency, the long tail latency is still
>> not bounded. Next, we discuss two new bugs that inflict latency hikes
>> at the very far end of the tail.
>>
> Right, and there is a fix upstream for this. It's not the patch you
> proposed in the thread linked above, but it should have had the same
> effect.
>
> Can you perhaps try something more recent than 4.5 (4.7-rc would be
> great) and confirm that the numbers still look similar?

I have tried the latest stable version, Xen 4.6, today. Here are my results:

Case   Avg        90%        99%        99.9%      99.99%
#1     91      &  93      &  101     &  105     &  110
#2     22506   &  43011   &  231946  &  259501  &  265561
#3     917     &  95      &  25257   &  30048   &  30756
#4     110     &  95      &  102     &  12448   &  13255
#5     114     &  118     &  130     &  134     &  136

It seems that case #2 is much worse. The other cases are similar. My
raw latency data is pasted below.

For Xen 4.7-rc, I have some installation issues on my machine, so I
have not tried it yet.


The raw data is as follows. Hope this helps you understand the issues
better. :-)
# case 1:
sockperf: ====> avg-lat= 91.688 (std-dev=2.950)
sockperf: ---> <MAX> observation =  110.647
sockperf: ---> percentile  99.99 =  110.647
sockperf: ---> percentile  99.90 =  105.242
sockperf: ---> percentile  99.50 =  101.531
sockperf: ---> percentile  99.00 =  101.066
sockperf: ---> percentile  95.00 =   97.016
sockperf: ---> percentile  90.00 =   93.294
sockperf: ---> percentile  75.00 =   92.157
sockperf: ---> percentile  50.00 =   91.437
sockperf: ---> percentile  25.00 =   90.786
sockperf: ---> <MIN> observation =   73.071


# case 2:
sockperf: ====> avg-lat=90019.931 (std-dev=136620.722)
sockperf: ---> <MAX> observation = 637712.152
sockperf: ---> percentile  99.99 = 637712.152
sockperf: ---> percentile  99.90 = 632901.547
sockperf: ---> percentile  99.50 = 615972.778
sockperf: ---> percentile  99.00 = 599698.318
sockperf: ---> percentile  95.00 = 428857.020
sockperf: ---> percentile  90.00 = 259316.760
sockperf: ---> percentile  75.00 = 114029.044
sockperf: ---> percentile  50.00 = 24629.429
sockperf: ---> percentile  25.00 = 10368.731
sockperf: ---> <MIN> observation =   81.046


#case 3:
sockperf: ====> avg-lat=917.394 (std-dev=3943.142)
sockperf: ---> <MAX> observation = 30756.289
sockperf: ---> percentile  99.99 = 30756.289
sockperf: ---> percentile  99.90 = 30048.372
sockperf: ---> percentile  99.50 = 25962.687
sockperf: ---> percentile  99.00 = 25257.746
sockperf: ---> percentile  95.00 = 5615.028
sockperf: ---> percentile  90.00 =   95.726
sockperf: ---> percentile  75.00 =   92.916
sockperf: ---> percentile  50.00 =   90.387
sockperf: ---> percentile  25.00 =   89.162
sockperf: ---> <MIN> observation =   67.762


#case 4:
sockperf: ====> avg-lat=110.159 (std-dev=555.153)
sockperf: ---> <MAX> observation = 13255.732
sockperf: ---> percentile  99.99 = 13255.732
sockperf: ---> percentile  99.90 = 12448.629
sockperf: ---> percentile  99.50 =  104.799
sockperf: ---> percentile  99.00 =  101.954
sockperf: ---> percentile  95.00 =   97.295
sockperf: ---> percentile  90.00 =   95.995
sockperf: ---> percentile  75.00 =   91.866
sockperf: ---> percentile  50.00 =   88.803
sockperf: ---> percentile  25.00 =   71.088
sockperf: ---> <MIN> observation =   65.826


#case 5:
sockperf: ====> avg-lat=114.984 (std-dev=3.782)
sockperf: ---> <MAX> observation =  136.748
sockperf: ---> percentile  99.99 =  136.748
sockperf: ---> percentile  99.90 =  134.192
sockperf: ---> percentile  99.50 =  131.467
sockperf: ---> percentile  99.00 =  130.200
sockperf: ---> percentile  95.00 =  121.575
sockperf: ---> percentile  90.00 =  118.518
sockperf: ---> percentile  75.00 =  116.343
sockperf: ---> percentile  50.00 =  114.356
sockperf: ---> percentile  25.00 =  112.479
sockperf: ---> <MIN> observation =   94.932


>
> About this below here, I'll read carefully and think about it. Thanks
> again.

Thank you, Dario.

As for bug 2 and bug 3, although they do not affect throughput, they
cause a big latency problem, especially long tail latency.

>
>> [Bug2]: In csched_acct() (by default every 30ms), a VCPU stops
>> earning credits and is removed from the active VCPU list (in
>> __csched_vcpu_acct_stop_locked) if its credit is larger than the
>> upper bound. Because the domain has only one VCPU, the VM will also
>> be removed from the active domain list.
>>
>> Every 10ms, csched_tick() --> csched_vcpu_acct() -->
>> __csched_vcpu_acct_start() will be executed and tries to put inactive
>> VCPUs back to the active list. However, __csched_vcpu_acct_start()
>> will only put the current VCPU back to the active list. If an
>> I/O-bound VCPU is not the current VCPU at the csched_tick(), it will
>> not be put back to the active VCPU list. If so, the I/O-bound VCPU
>> will likely miss the next credit refill in csched_acct() and can
>> easily enter the OVER state. As such, the I/O-bound VM will be unable
>> to be boosted and have very long latency. It takes at least one time
>> slice (e.g., 30ms) before the I/O VM is activated and starts to
>> receive credits.
>>
>> [Possible Solution] Try to activate any inactive VCPUs back to active
>> before next credit refill, instead of just the current VCPU.
>>
>>
>>
>> [Bug 3]: The BOOST priority might be changed to UNDER before the
>> boosted VCPU preempts the current running VCPU. If so, VCPU boosting
>> can not take effect.
>>
>> If a VCPU is in UNDER state and wakes up from sleep, it will be
>> boosted in csched_vcpu_wake(). However, the boosting is successful
>> only when __runq_tickle() preempts the current VCPU. It is possible
>> that csched_acct() can run between csched_vcpu_wake() and
>> __runq_tickle(), which will sometimes change the BOOST state back to
>> UNDER if credit >0. If so, __runq_tickle() can fail as VCPUs in UNDER
>> cannot preempt another UNDER VCPU. This also contributes to the far
>> end of the long tail latency.
>>
>> [Possible Solution]
>> 1. add a lock to prevent csched_acct() from interleaving with
>> csched_vcpu_wake();
>> 2. separate the BOOST state from UNDER and OVER states.
>> ---------------------------------------
>>
>>
>> Please confirm these bugs.
>> Thanks.
>>
>> --
>> Tony. S
>> Ph. D student of University of Colorado, Colorado Springs
>>
> --
> <<This happens because I choose it to happen!>> (Raistlin Majere)
> -----------------------------------------------------------------
> Dario Faggioli, Ph.D, http://about.me/dario.faggioli
> Senior Software Engineer, Citrix Systems R&D Ltd., Cambridge (UK)
>

-- 

Tony S.
Ph.D. student, University of Colorado, Colorado Springs



* Re: [BUG] Bugs existing Xen's credit scheduler cause long tail latency issues
  2016-05-15  4:11 [BUG] Bugs existing Xen's credit scheduler cause long tail latency issues Tony S
  2016-05-16 11:30 ` Dario Faggioli
@ 2016-05-17  9:27 ` George Dunlap
  2016-05-17 16:11   ` Tony S
  1 sibling, 1 reply; 5+ messages in thread
From: George Dunlap @ 2016-05-17  9:27 UTC (permalink / raw)
  To: Tony S; +Cc: xen-devel

On Sun, May 15, 2016 at 5:11 AM, Tony S <suokunstar@gmail.com> wrote:
> Hi all,
>
> When I was running latency-sensitive applications in VMs on Xen, I
> found some bugs in the credit scheduler which will cause long tail
> latency in I/O-intensive VMs.
>
>
> (1) Problem description
>
> ------------Description------------
> My test environment is as follows: Hypervisor(Xen 4.5.0), Dom 0(Linux
> 3.18.21), Dom U(Linux 3.18.21).
>
> Environment setup:
> We created two 1-vCPU, 4GB-memory VMs and pinned them onto one
> physical CPU core. One VM(denoted as I/O-VM) ran Sockperf server
> program; the other VM ran a compute-bound task, e.g., SPECCPU 2006 or
> simply a loop(denoted as CPU-VM). A client on another physical machine
> sent UDP requests to the I/O-VM.
>
> Here are my tail latency results (micro-second):
> Case   Avg      90%       99%        99.9%      99.99%
> #1     108   &  114    &  128     &  129     &  130
> #2     7811  &  13892  &  14874   &  15315   &  16383
> #3     943   &  131    &  21755   &  26453   &  26553
> #4     116   &  96     &  105     &  8217    &  13472
> #5     116   &  117    &  129     &  131     &  132
>
> Bug 1, 2, and 3 will be discussed below.
>
> Case #1:
> I/O-VM was processing Sockperf requests from clients; CPU-VM was
> idling (no processes running).
>
> Case #2:
> I/O-VM was processing Sockperf requests from clients; CPU-VM was
> running a compute-bound task.
> Hypervisor is the native Xen 4.5.0
>
> Case #3:
> I/O-VM was processing Sockperf requests from clients; CPU-VM was
> running a compute-bound task.
> Hypervisor is the native Xen 4.5.0 with bug 1 fixed
>
> Case #4:
> I/O-VM was processing Sockperf requests from clients; CPU-VM was
> running a compute-bound task.
> Hypervisor is the native Xen 4.5.0 with bug 1 & 2 fixed
>
> Case #5:
> I/O-VM was processing Sockperf requests from clients; CPU-VM was
> running a compute-bound task.
> Hypervisor is the native Xen 4.5.0 with bug 1 & 2 & 3 fixed
>
> ---------------------------------------
>
>
> (2) Problem analysis

Hey Tony,

Thanks for looking at this.  These issues in the credit1 algorithm are
essentially exactly the reason that I started work on the credit2
scheduler several years ago.  We meant credit2 to have replaced
credit1 by now, but we ran out of time to test it properly; we're in
the process of doing that right now, and are hoping it will be the
default scheduler for the 4.8 release.

So let me make two suggestions that would make your effort more
helpful to us:

1. Use cpupools for testing rather than pinning. A lot of the
algorithms are designed with the assumption that they have all the
cpus to run on, and the credit allocation / priority algorithms fail
to work properly when vcpus are merely pinned.  Cpupools were
specifically designed to allow the scheduler algorithms to work as
designed with a smaller number of cpus than the system has.

2. Test credit2. :-)

One comment about your analysis here...

> [Bug2]: In csched_acct() (by default every 30ms), a VCPU stops earning
> credits and is removed from the active VCPU list (in
> __csched_vcpu_acct_stop_locked) if its credit is larger than the upper
> bound. Because the domain has only one VCPU, the VM will also be
> removed from the active domain list.
>
> Every 10ms, csched_tick() --> csched_vcpu_acct() -->
> __csched_vcpu_acct_start() will be executed and tries to put inactive
> VCPUs back to the active list. However, __csched_vcpu_acct_start()
> will only put the current VCPU back to the active list. If an
> I/O-bound VCPU is not the current VCPU at the csched_tick(), it will
> not be put back to the active VCPU list. If so, the I/O-bound VCPU
> will likely miss the next credit refill in csched_acct() and can
> easily enter the OVER state. As such, the I/O-bound VM will be unable
> to be boosted and have very long latency. It takes at least one time
> slice (e.g., 30ms) before the I/O VM is activated and starts to
> receive credits.
>
> [Possible Solution] Try to activate any inactive VCPUs back to active
> before next credit refill, instead of just the current VCPU.

When we stop accounting, we divide the credits in half, so that when
the vCPU starts up again, it should have a reasonable amount of credit
(15ms worth).  Is this not taking effect for some reason?

 -George



* Re: [BUG] Bugs existing Xen's credit scheduler cause long tail latency issues
  2016-05-17  9:27 ` George Dunlap
@ 2016-05-17 16:11   ` Tony S
  0 siblings, 0 replies; 5+ messages in thread
From: Tony S @ 2016-05-17 16:11 UTC (permalink / raw)
  To: George Dunlap, xen-devel; +Cc: Dario Faggioli

On Tue, May 17, 2016 at 3:27 AM, George Dunlap <dunlapg@umich.edu> wrote:
> On Sun, May 15, 2016 at 5:11 AM, Tony S <suokunstar@gmail.com> wrote:
>> Hi all,
>>
>> When I was running latency-sensitive applications in VMs on Xen, I
>> found some bugs in the credit scheduler which will cause long tail
>> latency in I/O-intensive VMs.
>>
>>
>> (1) Problem description
>>
>> ------------Description------------
>> My test environment is as follows: Hypervisor(Xen 4.5.0), Dom 0(Linux
>> 3.18.21), Dom U(Linux 3.18.21).
>>
>> Environment setup:
>> We created two 1-vCPU, 4GB-memory VMs and pinned them onto one
>> physical CPU core. One VM(denoted as I/O-VM) ran Sockperf server
>> program; the other VM ran a compute-bound task, e.g., SPECCPU 2006 or
>> simply a loop(denoted as CPU-VM). A client on another physical machine
>> sent UDP requests to the I/O-VM.
>>
>> Here are my tail latency results (micro-second):
>> Case   Avg      90%       99%        99.9%      99.99%
>> #1     108   &  114    &  128     &  129     &  130
>> #2     7811  &  13892  &  14874   &  15315   &  16383
>> #3     943   &  131    &  21755   &  26453   &  26553
>> #4     116   &  96     &  105     &  8217    &  13472
>> #5     116   &  117    &  129     &  131     &  132
>>
>> Bug 1, 2, and 3 will be discussed below.
>>
>> Case #1:
>> I/O-VM was processing Sockperf requests from clients; CPU-VM was
>> idling (no processes running).
>>
>> Case #2:
>> I/O-VM was processing Sockperf requests from clients; CPU-VM was
>> running a compute-bound task.
>> Hypervisor is the native Xen 4.5.0
>>
>> Case #3:
>> I/O-VM was processing Sockperf requests from clients; CPU-VM was
>> running a compute-bound task.
>> Hypervisor is the native Xen 4.5.0 with bug 1 fixed
>>
>> Case #4:
>> I/O-VM was processing Sockperf requests from clients; CPU-VM was
>> running a compute-bound task.
>> Hypervisor is the native Xen 4.5.0 with bug 1 & 2 fixed
>>
>> Case #5:
>> I/O-VM was processing Sockperf requests from clients; CPU-VM was
>> running a compute-bound task.
>> Hypervisor is the native Xen 4.5.0 with bug 1 & 2 & 3 fixed
>>
>> ---------------------------------------
>>
>>
>> (2) Problem analysis
>
> Hey Tony,
>
> Thanks for looking at this.  These issues in the credit1 algorithm are
> essentially exactly the reason that I started work on the credit2
> scheduler several years ago.  We meant credit2 to have replaced
> credit1 by now, but we ran out of time to test it properly; we're in
> the process of doing that right now, and are hoping it will be the
> default scheduler for the 4.8 release.
>
> So if I could make two suggestions that would help your effort be more
> helpful to us:
>
> 1. Use cpupools for testing rather than pinning. A lot of the
> algorithms are designed with the assumption that they have all the
> cpus to run on, and the credit allocation / priority algorithms fail
> to work properly when they are only pinned.  Cpupools was specifically
> designed to allow the scheduler algorithms to work as designed with a
> smaller number of cpus than the system had.
>
> 2. Test credit2. :-)
>

Hi George,

Thank you for your reply. I will try cpupools and credit2 later. :-)


> One comment about your analysis here...
>
>> [Bug2]: In csched_acct() (by default every 30ms), a VCPU stops earning
>> credits and is removed from the active VCPU list (in
>> __csched_vcpu_acct_stop_locked) if its credit is larger than the upper
>> bound. Because the domain has only one VCPU, the VM will also be
>> removed from the active domain list.
>>
>> Every 10ms, csched_tick() --> csched_vcpu_acct() -->
>> __csched_vcpu_acct_start() will be executed and tries to put inactive
>> VCPUs back to the active list. However, __csched_vcpu_acct_start()
>> will only put the current VCPU back to the active list. If an
>> I/O-bound VCPU is not the current VCPU at the csched_tick(), it will
>> not be put back to the active VCPU list. If so, the I/O-bound VCPU
>> will likely miss the next credit refill in csched_acct() and can
>> easily enter the OVER state. As such, the I/O-bound VM will be unable
>> to be boosted and have very long latency. It takes at least one time
>> slice (e.g., 30ms) before the I/O VM is activated and starts to
>> receive credits.
>>
>> [Possible Solution] Try to activate any inactive VCPUs back to active
>> before next credit refill, instead of just the current VCPU.
>
> When we stop accounting, we divide the credits in half, so that when
> it starts out, it should have a reasonable amount of credit (15ms
> worth).  Is this not taking effect for some reason?
>

Actually, for bug 2, dividing the credits in half to leave a
reasonable amount of credit is not the issue. The problem is that the
VCPU is removed from the active VCPU list (in
__csched_vcpu_acct_stop_locked) and sometimes is not put back on the
active list in time (as I explained in my first message). If the VCPU
is not active, csched_acct() will not allocate new credits to it at
the next refill. After many such rounds, the credit of this VCPU
becomes a small negative number (e.g., -1000) and the VCPU won't be
scheduled. The I/O-intensive applications on it, especially
latency-sensitive workloads, will suffer long tail latency as a
result.
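
As a rough back-of-the-envelope illustration (all numbers here are
assumptions, e.g. roughly half a refill's worth of credit left after
the accounting stop, a few milliseconds of CPU per 30ms accounting
period, and no refill while off the active list), reaching a value
like -1000 only takes on the order of a second:

#include <stdio.h>

int main(void)
{
    /* Assumed model values, not Xen constants. */
    int credit = 150;                 /* ~half of a full refill        */
    const int burn_per_period = 30;   /* ~3ms of CPU at 10 credits/ms  */
    int periods = 0;

    while (credit > -1000) {          /* no refill while inactive      */
        credit -= burn_per_period;
        periods++;
    }
    /* ~39 periods x 30ms ~= 1.2 seconds to reach about -1000.         */
    printf("%d periods (%d ms) to reach credit %d\n",
           periods, periods * 30, credit);
    return 0;
}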

>  -George



-- 
Tony




Thread overview: 5+ messages
2016-05-15  4:11 [BUG] Bugs existing Xen's credit scheduler cause long tail latency issues Tony S
2016-05-16 11:30 ` Dario Faggioli
2016-05-16 18:22   ` Tony S
2016-05-17  9:27 ` George Dunlap
2016-05-17 16:11   ` Tony S
