* Introduce rt real-time scheduler for Xen
@ 2014-07-29  1:52 Meng Xu
  2014-07-29  1:52 ` [PATCH RFC v2 1/4] xen: add real time scheduler rt Meng Xu
                   ` (3 more replies)
  0 siblings, 4 replies; 13+ messages in thread
From: Meng Xu @ 2014-07-29  1:52 UTC (permalink / raw)
  To: xen-devel
  Cc: ian.campbell, xisisu, stefano.stabellini, george.dunlap,
	ian.jackson, xumengpanda, JBeulich, lichong659, dgolomb

Hi all,

This series of patches adds an rt real-time scheduler to Xen.

In summary, it supports:
1) Preemptive Global Earliest Deadline First scheduling, using a global RunQ for the scheduler;
2) Assigning/displaying each VCPU's parameters for each domain;
3) CPU pools.

Based on the review comments/suggestions on version 1, version 2 has the following improvements:
    a) Changed the interface for getting/setting a VCPU's parameters from statically allocating a large array to dynamically allocating memory based on the number of VCPUs of a domain. (This is a major change from v1 to v2, because many comments on v1 were about the code related to this functionality.)
    b) Changed the time unit at the user interface from 1ms to 1us; changed the type of a VCPU's period and budget from uint16 to s_time_t.
    c) Cleaned up the code style, rearranged the patch order, and added comments to better explain the code.
    d) Domain 0 is no longer treated as a special domain. Its VCPUs are now handled the same way as domUs' VCPUs: they have the same default parameters and are scheduled just like domUs' VCPUs.
    e) Added more ASSERT()s, e.g., in __runq_insert() in sched_rt.c.

-----------------------------------------------------------------------------------------------------------------------------
TODO:
    a) Add TRACE() in sched_rt.c functions. [easy]
       We will add a few xentrace tracepoints, like TRC_CSCHED2_RUNQ_POS in the credit2 scheduler, to the rt scheduler, to support debugging via tracing.
    b) Split the runnable and depleted (i.e., no budget left) VCPU queues. [easy]
    c) Deal with budget overrun in the algorithm. [medium]
    d) Try using timers for replenishment, instead of scanning the full runqueue every now and then. [medium]
    e) Reconsider rt_vcpu_insert() and rt_vcpu_remove() for cpu pool support.
    f) Improve the performance of the rt scheduler. [future work]
       VCPUs of the same domain may preempt each other under the preemptive global EDF scheduling policy. Such self-switching brings no benefit to the domain but introduces extra overhead. When this situation happens, we can simply promote the priority of the currently running lower-priority VCPU and let it borrow budget from higher-priority VCPUs to avoid the self-switch.

Plan: 
    TODO a) and b) are expected in RFC v3; (2 weeks)
    TODO c), d) and e) are expected in RFC v4, v5; (3-4 weeks)
    TODO f) will be deferred until after this scheduler is upstreamed, because the improvement would make the scheduler no longer a pure global EDF scheduler.

-----------------------------------------------------------------------------------------------------------------------------
The design of this rt scheduler is as follows:
This rt scheduler follows the Preemptive Global Earliest Deadline First (GEDF) theory from the real-time field.
Each VCPU can have a dedicated period and budget. While scheduled, a VCPU burns its budget. Each VCPU has its budget replenished at the beginning of each of its periods, and discards any unused budget at the end of each period. If a VCPU runs out of budget within a period, it has to wait until the next period.
The mechanism of how a VCPU's budget is burned depends on the server mechanism implemented for each VCPU.
The priority of VCPUs at each scheduling point is decided by the Preemptive Global Earliest Deadline First scheduling scheme.

Server mechanism: a VCPU is implemented as a deferrable server.
When a VCPU has a task running on it, its budget is continuously burned;
When a VCPU has no task to run but still has budget left, its budget is preserved.

Priority scheme: Global Earliest Deadline First (EDF).
At any scheduling point, the VCPU with the earliest deadline has the highest priority.

Queue scheme: A global runqueue for each CPU pool.
The runqueue holds all runnable VCPUs.
VCPUs in the runqueue are divided into two parts: those with and those without remaining budget.
Within each part, VCPUs are sorted by the GEDF priority scheme.

Scheduling quantum: 1 ms.
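
To make the replenishment rule concrete, here is a rough C sketch of what happens at each scheduling point (a simplified model of __repl_update()/burn_budgets() in patch 1/4; the type and function names below are illustrative only, not the actual Xen ones):

    typedef long long s_time_t;     /* time in nanoseconds, as in Xen */

    struct sketch_vcpu {
        s_time_t period;            /* replenishment period */
        s_time_t budget;            /* budget per period */
        s_time_t cur_deadline;      /* end of the current period */
        s_time_t cur_budget;        /* budget left in the current period */
    };

    /* If the current deadline has passed, advance it by a whole number of
     * periods and refill the budget; unused budget is discarded. */
    static void repl_update(struct sketch_vcpu *v, s_time_t now)
    {
        if ( now >= v->cur_deadline )
        {
            s_time_t missed = (now - v->cur_deadline) / v->period + 1;
            v->cur_deadline += missed * v->period;
            v->cur_budget = v->budget;
        }
    }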

If you are interested in the details of the design and evaluation of this rt scheduler, please refer to our paper "Real-Time Multi-Core Virtual Machine Scheduling in Xen" (http://www.cis.upenn.edu/~mengxu/emsoft14/emsoft14.pdf) at EMSOFT'14. The paper covers:
    a) the design of this scheduler;
    b) measurements of the implementation overheads, e.g., scheduler overhead, context-switch overhead, etc.;
    c) a comparison of this rt scheduler and the credit scheduler in terms of real-time performance.
-----------------------------------------------------------------------------------------------------------------------------
One scenario to show the functionality of this rt scheduler is as follows:
// list each VCPU's parameters of each domain in cpupools using the rt scheduler
# xl sched-rt
Cpupool Pool-0: sched=EDF
Name                                ID VCPU Period Budget
Domain-0                             0    0  10000  10000
Domain-0                             0    1  20000  20000
Domain-0                             0    2  30000  30000
Domain-0                             0    3  10000  10000
litmus1                              1    0  10000   4000
litmus1                              1    1  10000   4000

// set the parameters of vcpu 1 of domain litmus1:
# xl sched-rt -d litmus1 -v 1 -p 20000 -b 10000

// domain litmus1's vcpu 1's parameters are changed; display this domain's VCPU parameters separately:
# xl sched-rt -d litmus1
Name                                ID VCPU Period Budget
litmus1                              1    0  10000   4000
litmus1                              1    1  20000  10000
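// vcpu 1 of litmus1 now reserves budget/period = 10000/20000 = 50% of one PCPU in each 20000us period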

// list cpupool information
# xl cpupool-list
Name               CPUs   Sched     Active   Domain count
Pool-0              12        rt       y          2

// create a cpupool test
# xl cpupool-cpu-remove Pool-0 11
# xl cpupool-cpu-remove Pool-0 10
# xl cpupool-create name=\"test\" sched=\"credit\"
# xl cpupool-cpu-add test 11
# xl cpupool-cpu-add test 10
# xl cpupool-list
Name               CPUs   Sched     Active   Domain count
Pool-0              10        rt       y          2
test                 2    credit       y          0

// migrate litmus1 from cpupool Pool-0 to cpupool test
# xl cpupool-migrate litmus1 test

// now litmus1 is in cpupool test
# xl sched-credit
Cpupool test: tslice=30ms ratelimit=1000us
Name                                ID Weight  Cap
litmus1                              1    256    0

-----------------------------------------------------------------------------------------------------------------------------
[PATCH RFC v2 1/4] xen: add real time scheduler rt
[PATCH RFC v2 2/4] libxc: add rt scheduler
[PATCH RFC v2 3/4] libxl: add rt scheduler
[PATCH RFC v2 4/4] xl: introduce rt scheduler
-----------------------------------------------------------------------------------------------------------------------------
Thanks to Dario, Wei, Ian, Andrew, George, and Konrad for your valuable comments and suggestions!

Any comments, questions, and concerns are more than welcome! :-)

Thank you very much!

Best,

Meng

---
Meng Xu
PhD Student in Computer and Information Science
University of Pennsylvania

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


* [PATCH RFC v2 1/4] xen: add real time scheduler rt
  2014-07-29  1:52 Introduce rt real-time scheduler for Xen Meng Xu
@ 2014-07-29  1:52 ` Meng Xu
  2014-07-29  6:52   ` Jan Beulich
  2014-07-29 10:36   ` [PATCH RFC " Andrew Cooper
  2014-07-29  1:53 ` [PATCH RFC v2 2/4] libxc: add rt scheduler Meng Xu
                   ` (2 subsequent siblings)
  3 siblings, 2 replies; 13+ messages in thread
From: Meng Xu @ 2014-07-29  1:52 UTC (permalink / raw)
  To: xen-devel
  Cc: ian.campbell, xisisu, stefano.stabellini, george.dunlap,
	ian.jackson, xumengpanda, Meng Xu, JBeulich, lichong659, dgolomb

This scheduler follows the preemptive Global EDF theory from the real-time field.
Each VCPU can have a dedicated period and budget.
While scheduled, a VCPU burns its budget.
A VCPU has its budget replenished at the beginning of each of its periods;
the VCPU discards its unused budget at the end of each of its periods.
If a VCPU runs out of budget in a period, it has to wait until the next period.
The mechanism of how a VCPU's budget is burned depends on the server mechanism
implemented for each VCPU.

Server mechanism: a VCPU is implemented as a deferrable server.
When a VCPU is scheduled to execute on a PCPU, its budget is continuously
burned.

Priority scheme: Preemptive Global Earliest Deadline First (gEDF).
At any scheduling point, the VCPU with the earliest deadline has the highest
priority.

Queue scheme: A global Runqueue for each CPU pool.
The Runqueue holds all runnable VCPUs.
VCPUs in the Runqueue are divided into two parts: those with and without budget.
Within each part, VCPUs are sorted by the gEDF priority scheme.

Scheduling quantum: 1 ms.

Note: cpumask and cpupool are supported.

This is still in the development phase.

Signed-off-by: Sisu Xi <xisisu@gmail.com>
Signed-off-by: Meng Xu <mengxu@cis.upenn.edu>
---
 xen/common/Makefile         |    1 +
 xen/common/sched_rt.c       | 1058 +++++++++++++++++++++++++++++++++++++++++++
 xen/common/schedule.c       |    4 +-
 xen/include/public/domctl.h |   28 +-
 xen/include/xen/sched-if.h  |    1 +
 5 files changed, 1089 insertions(+), 3 deletions(-)
 create mode 100644 xen/common/sched_rt.c

diff --git a/xen/common/Makefile b/xen/common/Makefile
index 3683ae3..5a23aa4 100644
--- a/xen/common/Makefile
+++ b/xen/common/Makefile
@@ -26,6 +26,7 @@ obj-y += sched_credit.o
 obj-y += sched_credit2.o
 obj-y += sched_sedf.o
 obj-y += sched_arinc653.o
+obj-y += sched_rt.o
 obj-y += schedule.o
 obj-y += shutdown.o
 obj-y += softirq.o
diff --git a/xen/common/sched_rt.c b/xen/common/sched_rt.c
new file mode 100644
index 0000000..6cfbb8a
--- /dev/null
+++ b/xen/common/sched_rt.c
@@ -0,0 +1,1058 @@
+/******************************************************************************
+ * Preemptive Global Earliest Deadline First (EDF) scheduler for Xen
+ * EDF scheduling is one of the most popular real-time scheduling
+ * algorithms used in the embedded field.
+ *
+ * by Sisu Xi, 2013, Washington University in Saint Louis
+ * and Meng Xu, 2014, University of Pennsylvania
+ *
+ * based on the code of the credit scheduler
+ */
+
+#include <xen/config.h>
+#include <xen/init.h>
+#include <xen/lib.h>
+#include <xen/sched.h>
+#include <xen/domain.h>
+#include <xen/delay.h>
+#include <xen/event.h>
+#include <xen/time.h>
+#include <xen/perfc.h>
+#include <xen/sched-if.h>
+#include <xen/softirq.h>
+#include <asm/atomic.h>
+#include <xen/errno.h>
+#include <xen/trace.h>
+#include <xen/cpu.h>
+#include <xen/keyhandler.h>
+#include <xen/guest_access.h>
+
+/*
+ * TODO:
+ *
+ * Migration compensation and resistance, like credit2, to make better use of the cache;
+ * Lock holder problem, using yield?
+ * Self-switch problem: VCPUs of the same domain may preempt each other;
+ */
+
+/*
+ * Design:
+ *
+ * This scheduler follows the Preemptive Global EDF theory from the real-time field.
+ * Each VCPU can have a dedicated period and budget.
+ * While scheduled, a VCPU burns its budget.
+ * A VCPU has its budget replenished at the beginning of each of its periods;
+ * the VCPU discards its unused budget at the end of each of its periods.
+ * If a VCPU runs out of budget in a period, it has to wait until the next period.
+ * The mechanism of how a VCPU's budget is burned depends on the server mechanism
+ * implemented for each VCPU.
+ *
+ * Server mechanism: a VCPU is implemented as a deferrable server.
+ * When a VCPU has a task running on it, its budget is continuously burned;
+ * When a VCPU has no task but with budget left, its budget is preserved.
+ *
+ * Priority scheme: Preemptive Global Earliest Deadline First (gEDF).
+ * At any scheduling point, the VCPU with the earliest deadline has the highest priority.
+ *
+ * Queue scheme: A global runqueue for each CPU pool. 
+ * The runqueue holds all runnable VCPUs. 
+ * VCPUs in the runqueue are divided into two parts: those with and without remaining budget.
+ * Within each part, VCPUs are sorted by the EDF priority scheme.
+ *
+ * Scheduling quantum: 1 ms; budget accounting, however, is done at microsecond granularity.
+ *
+ * Note: cpumask and cpupool are supported.
+ */
+
+/*
+ * Locking:
+ * Just like credit2, a global system lock is used to protect the RunQ.
+ * The global lock is referenced by schedule_data.schedule_lock from all physical cpus.
+ *
+ * The lock is already grabbed when calling the wake/sleep/schedule functions in schedule.c
+ *
+ * The functions that involve the RunQ and need to grab the lock are:
+ *    dump, vcpu_insert, vcpu_remove, context_saved
+ */
+
+
+/*
+ * Default parameters: period and budget default to 10 and 4 ms, respectively
+ */
+#define RT_DEFAULT_PERIOD     (MICROSECS(10))
+#define RT_DEFAULT_BUDGET     (MICROSECS(4))
+
+/*
+ * Useful macros
+ */
+#define RT_PRIV(_ops)     ((struct rt_private *)((_ops)->sched_data))
+#define RT_VCPU(_vcpu)    ((struct rt_vcpu *)(_vcpu)->sched_priv)
+#define RT_DOM(_dom)      ((struct rt_dom *)(_dom)->sched_priv)
+#define RUNQ(_ops)              (&RT_PRIV(_ops)->runq)
+
+/*
+ * Flags
+ */
+/* RT_scheduled: Is this vcpu either running on, or context-switching off,
+ * a physical cpu?
+ * + Accessed only with Runqueue lock held.
+ * + Set when chosen as next in rt_schedule().
+ * + Cleared after context switch has been saved in rt_context_saved()
+ * + Checked in vcpu_wake to see if we can add to the Runqueue, or if we should
+ *   set RT_delayed_runq_add
+ * + Checked to be false in runq_insert.
+ */
+#define __RT_scheduled            1
+#define RT_scheduled (1<<__RT_scheduled)
+/* RT_delayed_runq_add: Do we need to add this to the Runqueue once it is done
+ * being context switched out?
+ * + Set when scheduling out in rt_schedule() if prev is runnable
+ * + Set in rt_vcpu_wake if it finds RT_scheduled set
+ * + Read in rt_context_saved(). If set, it adds prev to the Runqueue and
+ *   clears the bit.
+ *
+ */
+#define __RT_delayed_runq_add     2
+#define RT_delayed_runq_add (1<<__RT_delayed_runq_add)
+
+/*
+ * Debug only. Used to print out debug information
+ */
+#define printtime()\
+        ({s_time_t now = NOW(); \
+          printk("%u : %3ld.%3ldus : %-19s ",\
+          smp_processor_id(), now/MICROSECS(1), now%MICROSECS(1)/1000, __func__);} )
+
+/*
+ * System-wide private data, including a global RunQueue
+ * The global lock is referenced by schedule_data.schedule_lock from all physical cpus.
+ * It can be grabbed via vcpu_schedule_lock_irq()
+ */
+struct rt_private {
+    spinlock_t lock;        /* The global coarse-grained lock */
+    struct list_head sdom;  /* list of available domains, used for dump */
+    struct list_head runq;  /* Ordered list of runnable VMs */
+    cpumask_t cpus;         /* cpumask_t of available physical cpus */
+    cpumask_t tickled;      /* another cpu in the queue already tickled this one */
+};
+
+/*
+ * Virtual CPU
+ */
+struct rt_vcpu {
+    struct list_head runq_elem; /* On the runqueue list */
+    struct list_head sdom_elem; /* On the domain VCPU list */
+
+    /* Up-pointers */
+    struct rt_dom *sdom;
+    struct vcpu *vcpu;
+
+    /* VCPU parameters, in microseconds */
+    s_time_t period;
+    s_time_t budget;
+
+    /* Current VCPU information, in nanoseconds */
+    long cur_budget;             /* current budget */
+    s_time_t last_start;        /* last start time */
+    s_time_t cur_deadline;      /* current deadline for EDF */
+
+    unsigned flags;             /* marks __RT_scheduled, etc. */
+};
+
+/*
+ * Domain
+ */
+struct rt_dom {
+    struct list_head vcpu;      /* link its VCPUs */
+    struct list_head sdom_elem; /* link list on rt_priv */
+    struct domain *dom;         /* pointer to upper domain */
+};
+
+/*
+ * RunQueue helper functions
+ */
+static int
+__vcpu_on_runq(struct rt_vcpu *svc)
+{
+   return !list_empty(&svc->runq_elem);
+}
+
+static struct rt_vcpu *
+__runq_elem(struct list_head *elem)
+{
+    return list_entry(elem, struct rt_vcpu, runq_elem);
+}
+
+/*
+ * Debug related code, dump vcpu/cpu information
+ */
+static void
+rt_dump_vcpu(struct rt_vcpu *svc)
+{
+    char *cpustr = keyhandler_scratch;
+    if ( svc == NULL ) 
+    {
+        printk("NULL!\n");
+        return;
+    }
+    cpumask_scnprintf(cpustr, sizeof(keyhandler_scratch), svc->vcpu->cpu_hard_affinity);
+    printk("[%5d.%-2d] cpu %d, (%"PRId64", %"PRId64"), cur_b=%"PRId64" cur_d=%"PRId64" last_start=%"PRId64" onR=%d runnable=%d cpu_hard_affinity=%s ",
+            svc->vcpu->domain->domain_id,
+            svc->vcpu->vcpu_id,
+            svc->vcpu->processor,
+            svc->period,
+            svc->budget,
+            svc->cur_budget,
+            svc->cur_deadline,
+            svc->last_start,
+            __vcpu_on_runq(svc),
+            vcpu_runnable(svc->vcpu),
+            cpustr);
+    memset(cpustr, 0, sizeof(keyhandler_scratch));
+    cpumask_scnprintf(cpustr, sizeof(keyhandler_scratch), cpupool_scheduler_cpumask(svc->vcpu->domain->cpupool));
+    printk("cpupool=%s\n", cpustr);
+}
+
+static void
+rt_dump_pcpu(const struct scheduler *ops, int cpu)
+{
+    struct rt_vcpu *svc = RT_VCPU(curr_on_cpu(cpu));
+
+    printtime();
+    rt_dump_vcpu(svc);
+}
+
+/*
+ * Should not need the lock here; we are only showing stuff
+ */
+static void
+rt_dump(const struct scheduler *ops)
+{
+    struct list_head *iter_sdom, *iter_svc, *runq, *iter;
+    struct rt_private *prv = RT_PRIV(ops);
+    struct rt_vcpu *svc;
+    int cpu = 0;
+    int loop = 0;
+
+    printtime();
+    printk("Priority Scheme: EDF\n");
+
+    printk("PCPU info: \n");
+    for_each_cpu(cpu, &prv->cpus) 
+        rt_dump_pcpu(ops, cpu);
+
+    printk("Global RunQueue info: \n");
+    loop = 0;
+    runq = RUNQ(ops);
+    list_for_each( iter, runq ) 
+    {
+        svc = __runq_elem(iter);
+        printk("\t%3d: ", ++loop);
+        rt_dump_vcpu(svc);
+    }
+
+    printk("Domain info: \n");
+    loop = 0;
+    list_for_each( iter_sdom, &prv->sdom ) 
+    {
+        struct rt_dom *sdom;
+        sdom = list_entry(iter_sdom, struct rt_dom, sdom_elem);
+        printk("\tdomain: %d\n", sdom->dom->domain_id);
+
+        list_for_each( iter_svc, &sdom->vcpu ) 
+        {
+            svc = list_entry(iter_svc, struct rt_vcpu, sdom_elem);
+            printk("\t\t%3d: ", ++loop);
+            rt_dump_vcpu(svc);
+        }
+    }
+
+    printk("\n");
+}
+
+static inline void
+__runq_remove(struct rt_vcpu *svc)
+{
+    if ( __vcpu_on_runq(svc) )
+        list_del_init(&svc->runq_elem);
+}
+
+/*
+ * Insert a vcpu into the RunQ based on the vcpus' deadlines:
+ * EDF scheduling policy: a vcpu with a smaller deadline has higher priority;
+ * The vcpu svc to be inserted will be inserted just before the very first
+ * vcpu iter_svc in the Runqueue whose deadline is equal to or larger than
+ * svc's deadline.
+ */
+static void
+__runq_insert(const struct scheduler *ops, struct rt_vcpu *svc)
+{
+    struct list_head *runq = RUNQ(ops);
+    struct list_head *iter;
+    spinlock_t *schedule_lock;
+    
+    schedule_lock = per_cpu(schedule_data, svc->vcpu->processor).schedule_lock;
+    ASSERT( spin_is_locked(schedule_lock) );
+    
+    /* Debug only */
+    if ( __vcpu_on_runq(svc) )
+    {
+        rt_dump(ops);
+    }
+    ASSERT( !__vcpu_on_runq(svc) );
+
+    list_for_each(iter, runq) 
+    {
+        struct rt_vcpu * iter_svc = __runq_elem(iter);
+
+        /* svc still has budget */
+        if ( svc->cur_budget > 0 ) 
+        {
+            if ( iter_svc->cur_budget == 0 ||
+                 svc->cur_deadline <= iter_svc->cur_deadline )
+                    break;
+        } 
+        else 
+        { /* svc has no budget */
+            if ( iter_svc->cur_budget == 0 &&
+                 svc->cur_deadline <= iter_svc->cur_deadline )
+                    break;
+        }
+    }
+
+    list_add_tail(&svc->runq_elem, iter);
+}
+
+/*
+ * Init/Free related code
+ */
+static int
+rt_init(struct scheduler *ops)
+{
+    struct rt_private *prv = xzalloc(struct rt_private);
+
+    if ( prv == NULL )
+        return -ENOMEM;
+    ops->sched_data = prv;
+    spin_lock_init(&prv->lock);
+    INIT_LIST_HEAD(&prv->sdom);
+    INIT_LIST_HEAD(&prv->runq);
+
+    printtime();
+    printk("\n");
+
+    return 0;
+}
+
+static void
+rt_deinit(const struct scheduler *ops)
+{
+    struct rt_private *prv = RT_PRIV(ops);
+
+    printtime();
+    printk("\n");
+    xfree(prv);
+}
+
+/* 
+ * point the per_cpu spinlock to the global system lock; all cpus share the same global system lock
+ */
+static void *
+rt_alloc_pdata(const struct scheduler *ops, int cpu)
+{
+    struct rt_private *prv = RT_PRIV(ops);
+
+    cpumask_set_cpu(cpu, &prv->cpus);
+
+    per_cpu(schedule_data, cpu).schedule_lock = &prv->lock;
+
+    printtime();
+    printk("%s total cpus: %d", __FUNCTION__, cpumask_weight(&prv->cpus));
+    /* same as credit2, not a bogus pointer */
+    return (void *)1;
+}
+
+static void
+rt_free_pdata(const struct scheduler *ops, void *pcpu, int cpu)
+{
+    struct rt_private * prv = RT_PRIV(ops);
+    cpumask_clear_cpu(cpu, &prv->cpus);
+    printtime();
+    printk("%s cpu=%d\n", __FUNCTION__, cpu);
+}
+
+static void *
+rt_alloc_domdata(const struct scheduler *ops, struct domain *dom)
+{
+    unsigned long flags;
+    struct rt_dom *sdom;
+    struct rt_private * prv = RT_PRIV(ops);
+
+    printtime();
+    printk("dom=%d\n", dom->domain_id);
+
+    sdom = xzalloc(struct rt_dom);
+    if ( sdom == NULL ) 
+    {
+        printk("%s, xzalloc failed\n", __func__);
+        return NULL;
+    }
+
+    INIT_LIST_HEAD(&sdom->vcpu);
+    INIT_LIST_HEAD(&sdom->sdom_elem);
+    sdom->dom = dom;
+
+    /* spinlock here to insert the dom */
+    spin_lock_irqsave(&prv->lock, flags);
+    list_add_tail(&sdom->sdom_elem, &(prv->sdom));
+    spin_unlock_irqrestore(&prv->lock, flags);
+
+    return (void *)sdom;
+}
+
+static void
+rt_free_domdata(const struct scheduler *ops, void *data)
+{
+    unsigned long flags;
+    struct rt_dom *sdom = data;
+    struct rt_private *prv = RT_PRIV(ops);
+
+    printtime();
+    printk("dom=%d\n", sdom->dom->domain_id);
+
+    spin_lock_irqsave(&prv->lock, flags);
+    list_del_init(&sdom->sdom_elem);
+    spin_unlock_irqrestore(&prv->lock, flags);
+    xfree(data);
+}
+
+static int
+rt_dom_init(const struct scheduler *ops, struct domain *dom)
+{
+    struct rt_dom *sdom;
+
+    printtime();
+    printk("dom=%d\n", dom->domain_id);
+
+    /* IDLE Domain does not link on rt_private */
+    if ( is_idle_domain(dom) ) 
+        return 0;
+
+    sdom = rt_alloc_domdata(ops, dom);
+    if ( sdom == NULL ) 
+    {
+        printk("%s, failed\n", __func__);
+        return -ENOMEM;
+    }
+    dom->sched_priv = sdom;
+
+    return 0;
+}
+
+static void
+rt_dom_destroy(const struct scheduler *ops, struct domain *dom)
+{
+    printtime();
+    printk("dom=%d\n", dom->domain_id);
+
+    rt_free_domdata(ops, RT_DOM(dom));
+}
+
+static void *
+rt_alloc_vdata(const struct scheduler *ops, struct vcpu *vc, void *dd)
+{
+    struct rt_vcpu *svc;
+    s_time_t now = NOW();
+    long count;
+
+    /* Allocate per-VCPU info */
+    svc = xzalloc(struct rt_vcpu);
+    if ( svc == NULL ) 
+    {
+        printk("%s, xzalloc failed\n", __func__);
+        return NULL;
+    }
+
+    INIT_LIST_HEAD(&svc->runq_elem);
+    INIT_LIST_HEAD(&svc->sdom_elem);
+    svc->flags = 0U;
+    svc->sdom = dd;
+    svc->vcpu = vc;
+    svc->last_start = 0;            /* init last_start is 0 */
+
+    svc->period = RT_DEFAULT_PERIOD;
+    if ( !is_idle_vcpu(vc) )
+        svc->budget = RT_DEFAULT_BUDGET;
+
+    count = (now/MICROSECS(svc->period)) + 1;
+    /* sync all VCPU's start time to 0 */
+    svc->cur_deadline += count * MICROSECS(svc->period);
+
+    svc->cur_budget = svc->budget*1000; /* budget is in microseconds; cur_budget in nanoseconds */
+    /* Debug only: dump new vcpu's info */
+    printtime();
+    rt_dump_vcpu(svc);
+
+    return svc;
+}
+
+static void
+rt_free_vdata(const struct scheduler *ops, void *priv)
+{
+    struct rt_vcpu *svc = priv;
+
+    /* Debug only: dump freed vcpu's info */
+    printtime();
+    rt_dump_vcpu(svc);
+    xfree(svc);
+}
+
+/*
+ * TODO: Do we need to add vc to the new Runqueue?
+ * This function is called in sched_move_domain() in schedule.c
+ * When moving a domain to a new cpupool,
+ * we may have to add vc to the Runqueue of the new cpupool
+ */
+static void
+rt_vcpu_insert(const struct scheduler *ops, struct vcpu *vc)
+{
+    struct rt_vcpu *svc = RT_VCPU(vc);
+
+    /* Debug only: dump info of vcpu to insert */
+    printtime();
+    rt_dump_vcpu(svc);
+
+    /* do not add the idle vcpu to the dom vcpu list */
+    if ( is_idle_vcpu(vc) )
+        return;
+
+    list_add_tail(&svc->sdom_elem, &svc->sdom->vcpu);   /* add to dom vcpu list */
+}
+
+/*
+ * TODO: same as rt_vcpu_insert()
+ */
+static void
+rt_vcpu_remove(const struct scheduler *ops, struct vcpu *vc)
+{
+    struct rt_vcpu * const svc = RT_VCPU(vc);
+    struct rt_dom * const sdom = svc->sdom;
+
+    printtime();
+    rt_dump_vcpu(svc);
+
+    BUG_ON( sdom == NULL );
+    BUG_ON( __vcpu_on_runq(svc) );
+
+    if ( !is_idle_vcpu(vc) ) 
+        list_del_init(&svc->sdom_elem);
+}
+
+/* 
+ * Pick a valid CPU for the vcpu vc
+ * The valid CPUs of a vcpu are the intersection of the vcpu's affinity and the available cpus
+ */
+static int
+rt_cpu_pick(const struct scheduler *ops, struct vcpu *vc)
+{
+    cpumask_t cpus;
+    cpumask_t *online;
+    int cpu;
+    struct rt_private * prv = RT_PRIV(ops);
+
+    online = cpupool_scheduler_cpumask(vc->domain->cpupool);
+    cpumask_and(&cpus, &prv->cpus, online);
+    cpumask_and(&cpus, &cpus, vc->cpu_hard_affinity);
+
+    cpu = cpumask_test_cpu(vc->processor, &cpus)
+            ? vc->processor 
+            : cpumask_cycle(vc->processor, &cpus);
+    ASSERT( !cpumask_empty(&cpus) && cpumask_test_cpu(cpu, &cpus) );
+
+    return cpu;
+}
+
+/*
+ * Burn budget at microsecond level. 
+ */
+static void
+burn_budgets(const struct scheduler *ops, struct rt_vcpu *svc, s_time_t now) 
+{
+    s_time_t delta;
+    long count = 0;
+
+    /* don't burn budget for idle VCPU */
+    if ( is_idle_vcpu(svc->vcpu) ) 
+    {
+        return;
+    }
+
+    /* first time called for this svc, update last_start */
+    if ( svc->last_start == 0 ) 
+    {
+        svc->last_start = now;
+        return;
+    }
+
+    /*
+     * update deadline info: when the deadline is in the past,
+     * it needs to be updated to the deadline of the current period,
+     * and the budget replenished
+     */
+    delta = now - svc->cur_deadline;
+    if ( delta >= 0 ) 
+    {
+        count = ( delta/MICROSECS(svc->period) ) + 1;
+        svc->cur_deadline += count * MICROSECS(svc->period);
+        svc->cur_budget = svc->budget * 1000;
+        return;
+    }
+
+    /* burn at nanoseconds level */
+    delta = now - svc->last_start;
+    /* 
+     * delta < 0 only happens in nested virtualization;
+     * TODO: how should we handle delta < 0 in a better way? */
+    if ( delta < 0 ) 
+    {
+        printk("%s, ATTENTION: now is behind last_start! delta = %ld for ",
+                __func__, delta);
+        rt_dump_vcpu(svc);
+        svc->last_start = now;  /* update last_start */
+        svc->cur_budget = 0;   /* FIXME: should we recover like this? */
+        return;
+    }
+
+    if ( svc->cur_budget == 0 ) 
+        return;
+
+    svc->cur_budget -= delta;
+    if ( svc->cur_budget < 0 ) 
+        svc->cur_budget = 0;
+}
+
+/* 
+ * The RunQ is sorted. Pick the first vcpu within the cpumask. If there is none, return NULL.
+ * The lock is grabbed before calling this function
+ */
+static struct rt_vcpu *
+__runq_pick(const struct scheduler *ops, cpumask_t mask)
+{
+    struct list_head *runq = RUNQ(ops);
+    struct list_head *iter;
+    struct rt_vcpu *svc = NULL;
+    struct rt_vcpu *iter_svc = NULL;
+    cpumask_t cpu_common;
+    cpumask_t *online;
+    struct rt_private * prv = RT_PRIV(ops);
+
+    list_for_each(iter, runq) 
+    {
+        iter_svc = __runq_elem(iter);
+
+        /* mask is intersection of cpu_hard_affinity and cpupool and priv->cpus */
+        online = cpupool_scheduler_cpumask(iter_svc->vcpu->domain->cpupool);
+        cpumask_and(&cpu_common, online, &prv->cpus);
+        cpumask_and(&cpu_common, &cpu_common, iter_svc->vcpu->cpu_hard_affinity);
+        cpumask_and(&cpu_common, &mask, &cpu_common);
+        if ( cpumask_empty(&cpu_common) )
+            continue;
+
+        if ( iter_svc->cur_budget <= 0 )
+            continue;
+
+        svc = iter_svc;
+        break;
+    }
+
+    return svc;
+}
+
+/*
+ * Update vcpus' budgets and keep the runq sorted by inserting each modified vcpu back into the runq
+ * The lock is grabbed before calling this function
+ */
+static void
+__repl_update(const struct scheduler *ops, s_time_t now)
+{
+    struct list_head *runq = RUNQ(ops);
+    struct list_head *iter;
+    struct list_head *tmp;
+    struct rt_vcpu *svc = NULL;
+
+    s_time_t diff;
+    long count;
+
+    list_for_each_safe(iter, tmp, runq) 
+    {
+        svc = __runq_elem(iter);
+
+        diff = now - svc->cur_deadline;
+        if ( diff > 0 ) 
+        {
+            count = (diff/MICROSECS(svc->period)) + 1;
+            svc->cur_deadline += count * MICROSECS(svc->period);
+            svc->cur_budget = svc->budget * 1000;
+            __runq_remove(svc);
+            __runq_insert(ops, svc);
+        }
+    }
+}
+
+/* 
+ * schedule function for rt scheduler.
+ * The lock is already grabbed in schedule.c, no need to lock here 
+ */
+static struct task_slice
+rt_schedule(const struct scheduler *ops, s_time_t now, bool_t tasklet_work_scheduled)
+{
+    const int cpu = smp_processor_id();
+    struct rt_private * prv = RT_PRIV(ops);
+    struct rt_vcpu * const scurr = RT_VCPU(current);
+    struct rt_vcpu * snext = NULL;
+    struct task_slice ret = { .migrated = 0 };
+
+    /* clear ticked bit now that we've been scheduled */
+    if ( cpumask_test_cpu(cpu, &prv->tickled) )
+        cpumask_clear_cpu(cpu, &prv->tickled);
+
+    /* burn_budget would return for IDLE VCPU */
+    burn_budgets(ops, scurr, now);
+
+    __repl_update(ops, now);
+
+    if ( tasklet_work_scheduled ) 
+    {
+        snext = RT_VCPU(idle_vcpu[cpu]);
+    } 
+    else 
+    {
+        cpumask_t cur_cpu;
+        cpumask_clear(&cur_cpu);
+        cpumask_set_cpu(cpu, &cur_cpu);
+        snext = __runq_pick(ops, cur_cpu);
+        if ( snext == NULL )
+            snext = RT_VCPU(idle_vcpu[cpu]);
+
+        /* if scurr has higher priority and budget, still pick scurr */
+        if ( !is_idle_vcpu(current) &&
+             vcpu_runnable(current) &&
+             scurr->cur_budget > 0 &&
+             ( is_idle_vcpu(snext->vcpu) ||
+               scurr->cur_deadline <= snext->cur_deadline ) ) 
+            snext = scurr;
+    }
+
+    if ( snext != scurr &&
+         !is_idle_vcpu(current) &&
+         vcpu_runnable(current) )
+        set_bit(__RT_delayed_runq_add, &scurr->flags);
+    
+
+    snext->last_start = now;
+    if ( !is_idle_vcpu(snext->vcpu) ) 
+    {
+        if ( snext != scurr ) 
+        {
+            __runq_remove(snext);
+            set_bit(__RT_scheduled, &snext->flags);
+        }
+        if ( snext->vcpu->processor != cpu ) 
+        {
+            snext->vcpu->processor = cpu;
+            ret.migrated = 1;
+        }
+    }
+
+    ret.time = MILLISECS(1); /* sched quantum */
+    ret.task = snext->vcpu;
+
+    return ret;
+}
+
+/*
+ * Remove VCPU from RunQ
+ * The lock is already grabbed in schedule.c, no need to lock here 
+ */
+static void
+rt_vcpu_sleep(const struct scheduler *ops, struct vcpu *vc)
+{
+    struct rt_vcpu * const svc = RT_VCPU(vc);
+
+    BUG_ON( is_idle_vcpu(vc) );
+
+    if ( curr_on_cpu(vc->processor) == vc ) 
+    {
+        cpu_raise_softirq(vc->processor, SCHEDULE_SOFTIRQ);
+        return;
+    }
+
+    if ( __vcpu_on_runq(svc) ) 
+    {
+        __runq_remove(svc);
+        printk("%s: vcpu should not on runq in vcpu_sleep()\n", __FUNCTION__);
+        BUG();
+    }
+
+    clear_bit(__RT_delayed_runq_add, &svc->flags);
+}
+
+/*
+ * Pick a cpu for the waking candidate, kicking out the vcpu running there if needed.
+ * Called by wake() and context_saved()
+ * We have a runnable candidate here; the kick logic is:
+ * Among all the cpus that are within the candidate's cpu affinity:
+ * 1) if the candidate's previous cpu is idle, kick it. This could benefit cache hits
+ * 2) if there is any idle pcpu, kick it
+ * 3) now all pcpus are busy; among all the running vcpus, pick the one with the
+ *    lowest priority; if the candidate has higher priority, kick that cpu.
+ *
+ * TODO:
+ * 1) what if these two vcpus belong to the same domain?
+ *    replacing a vcpu of the same domain introduces more overhead
+ *
+ * lock is grabbed before calling this function 
+ */
+static void
+runq_tickle(const struct scheduler *ops, struct rt_vcpu *new)
+{
+    struct rt_private * prv = RT_PRIV(ops);
+    struct rt_vcpu * latest_deadline_vcpu = NULL;    /* lowest priority scheduled */
+    struct rt_vcpu * iter_svc;
+    struct vcpu * iter_vc;
+    int cpu = 0;
+    cpumask_t not_tickled;
+    cpumask_t *online;
+
+    if ( new == NULL || is_idle_vcpu(new->vcpu) ) 
+        return;
+
+    online = cpupool_scheduler_cpumask(new->vcpu->domain->cpupool);
+    cpumask_and(&not_tickled, online, &prv->cpus);
+    cpumask_and(&not_tickled, &not_tickled, new->vcpu->cpu_hard_affinity);
+    cpumask_andnot(&not_tickled, &not_tickled, &prv->tickled);
+
+    /* 1) if new's previous cpu is idle, kick it for cache benefit */
+    if ( is_idle_vcpu(curr_on_cpu(new->vcpu->processor)) ) 
+    {
+        cpumask_set_cpu(new->vcpu->processor, &prv->tickled);
+        cpu_raise_softirq(new->vcpu->processor, SCHEDULE_SOFTIRQ);
+        return;
+    }
+
+    /* 2) if there are any idle pcpu, kick it */
+    /* The same loop also find the one with lowest priority */
+    for_each_cpu(cpu, &not_tickled) 
+    {
+        iter_vc = curr_on_cpu(cpu);
+        if ( is_idle_vcpu(iter_vc) ) 
+        {
+            cpumask_set_cpu(cpu, &prv->tickled);
+            cpu_raise_softirq(cpu, SCHEDULE_SOFTIRQ);
+            return;
+        }
+        iter_svc = RT_VCPU(iter_vc);
+        if ( latest_deadline_vcpu == NULL || 
+             iter_svc->cur_deadline > latest_deadline_vcpu->cur_deadline )
+            latest_deadline_vcpu = iter_svc;
+    }
+
+    /* 3) candidate has higher priority, kick out the lowest-priority vcpu */
+    if ( latest_deadline_vcpu != NULL && new->cur_deadline < latest_deadline_vcpu->cur_deadline ) 
+    {
+        cpumask_set_cpu(latest_deadline_vcpu->vcpu->processor, &prv->tickled);
+        cpu_raise_softirq(latest_deadline_vcpu->vcpu->processor, SCHEDULE_SOFTIRQ);
+    }
+    return;
+}
+
+/* 
+ * Should always wake up a runnable vcpu and put it back on the RunQ.
+ * Check priority to decide whether to raise an interrupt
+ * The lock is already grabbed in schedule.c, no need to lock here
+ * TODO: what if these two vcpus belong to the same domain?
+ */
+static void
+rt_vcpu_wake(const struct scheduler *ops, struct vcpu *vc)
+{
+    struct rt_vcpu * const svc = RT_VCPU(vc);
+    s_time_t diff;
+    s_time_t now = NOW();
+    long count = 0;
+    struct rt_private * prv = RT_PRIV(ops);
+    struct rt_vcpu * snext = NULL;        /* highest priority on RunQ */
+
+    BUG_ON( is_idle_vcpu(vc) );
+
+    if ( unlikely(curr_on_cpu(vc->processor) == vc) ) 
+        return;
+
+    /* on RunQ, just update info is ok */
+    if ( unlikely(__vcpu_on_runq(svc)) ) 
+        return;
+
+    /* If the context hasn't been saved for this vcpu yet, we can't put it on
+     * the Runqueue. Instead, we set a flag so that it will be put on the Runqueue
+     * after the context has been saved. */
+    if ( unlikely(test_bit(__RT_scheduled, &svc->flags)) ) 
+    {
+        set_bit(__RT_delayed_runq_add, &svc->flags);
+        return;
+    }
+
+    /* update deadline info */
+    diff = now - svc->cur_deadline;
+    if ( diff >= 0 ) 
+    {
+        count = ( diff/MICROSECS(svc->period) ) + 1;
+        svc->cur_deadline += count * MICROSECS(svc->period);
+        svc->cur_budget = svc->budget * 1000;
+    }
+
+    __runq_insert(ops, svc);
+    __repl_update(ops, now);
+    snext = __runq_pick(ops, prv->cpus);    /* pick snext from ALL valid cpus */
+    runq_tickle(ops, snext);
+
+    return;
+}
+
+/* 
+ * scurr has finished context switch, insert it back to the RunQ,
+ * and then pick the highest priority vcpu from runq to run 
+ */
+static void
+rt_context_saved(const struct scheduler *ops, struct vcpu *vc)
+{
+    struct rt_vcpu * svc = RT_VCPU(vc);
+    struct rt_vcpu * snext = NULL;
+    struct rt_private * prv = RT_PRIV(ops);
+    spinlock_t *lock = vcpu_schedule_lock_irq(vc);
+
+    clear_bit(__RT_scheduled, &svc->flags);
+    /* do not insert the idle vcpu into the runq */
+    if ( is_idle_vcpu(vc) ) 
+        goto out;
+
+    if ( test_and_clear_bit(__RT_delayed_runq_add, &svc->flags) && 
+         likely(vcpu_runnable(vc)) ) 
+    {
+        __runq_insert(ops, svc);
+        __repl_update(ops, NOW());
+        snext = __runq_pick(ops, prv->cpus);    /* pick snext from ALL cpus */
+        runq_tickle(ops, snext);
+    }
+out:
+    vcpu_schedule_unlock_irq(lock, vc);
+}
+
+/*
+ * set/get each vcpu info of each domain
+ */
+static int
+rt_dom_cntl(
+    const struct scheduler *ops, 
+    struct domain *d, 
+    struct xen_domctl_scheduler_op *op)
+{
+    xen_domctl_sched_rt_params_t *local_sched;
+    struct rt_dom * const sdom = RT_DOM(d);
+    struct list_head *iter;
+    int vcpu_index = 0;
+    int rc = -EINVAL;
+
+    switch ( op->cmd )
+    {
+    case XEN_DOMCTL_SCHEDOP_getnumvcpus:
+        op->u.rt.nr_vcpus = 0;
+        list_for_each( iter, &sdom->vcpu ) 
+            vcpu_index++;
+        op->u.rt.nr_vcpus = vcpu_index;
+        rc = 0;
+        break;
+    case XEN_DOMCTL_SCHEDOP_getinfo:
+        /* for debug use: whenever Dom0's parameters are adjusted, do a global dump */
+        if ( d->domain_id == 0 ) 
+            rt_dump(ops);
+
+        op->u.rt.nr_vcpus = 0;
+        list_for_each( iter, &sdom->vcpu ) 
+            vcpu_index++;
+        op->u.rt.nr_vcpus = vcpu_index;
+        local_sched = xzalloc_array(xen_domctl_sched_rt_params_t, vcpu_index);
+        if ( local_sched == NULL )
+        {
+            rc = -ENOMEM;
+            break;
+        }
+        vcpu_index = 0;
+        list_for_each( iter, &sdom->vcpu ) 
+        {
+            struct rt_vcpu * svc = list_entry(iter, struct rt_vcpu, sdom_elem);
+
+            local_sched[vcpu_index].budget = svc->budget;
+            local_sched[vcpu_index].period = svc->period;
+            local_sched[vcpu_index].index = vcpu_index;
+            vcpu_index++;
+        }
+        if ( copy_to_guest(op->u.rt.vcpu, local_sched, vcpu_index) != 0 )
+        {
+            xfree(local_sched);
+            rc = -EFAULT;
+            break;
+        }
+        xfree(local_sched);
+        rc = 0;
+        break;
+    case XEN_DOMCTL_SCHEDOP_putinfo:
+        list_for_each( iter, &sdom->vcpu ) 
+        {
+            struct rt_vcpu * svc = list_entry(iter, struct rt_vcpu, sdom_elem);
+
+            /* adjust per VCPU parameter */
+            if ( op->u.rt.vcpu_index == svc->vcpu->vcpu_id ) 
+            { 
+                vcpu_index = op->u.rt.vcpu_index;
+
+                if ( vcpu_index < 0 ) 
+                    printk("XEN_DOMCTL_SCHEDOP_putinfo: vcpu_index=%d\n",
+                            vcpu_index);
+                else
+                    printk("XEN_DOMCTL_SCHEDOP_putinfo: "
+                            "vcpu_index=%d, period=%"PRId64", budget=%"PRId64"\n",
+                            vcpu_index, op->u.rt.period, op->u.rt.budget);
+
+                svc->period = op->u.rt.period;
+                svc->budget = op->u.rt.budget;
+
+                break;
+            }
+        }
+        rc = 0;
+        break;
+    }
+
+    return rc;
+}
+
+static struct rt_private _rt_priv;
+
+const struct scheduler sched_rt_def = {
+    .name           = "SMP RT Scheduler",
+    .opt_name       = "rt",
+    .sched_id       = XEN_SCHEDULER_RT_DS,
+    .sched_data     = &_rt_priv,
+
+    .dump_cpu_state = rt_dump_pcpu,
+    .dump_settings  = rt_dump,
+    .init           = rt_init,
+    .deinit         = rt_deinit,
+    .alloc_pdata    = rt_alloc_pdata,
+    .free_pdata     = rt_free_pdata,
+    .alloc_domdata  = rt_alloc_domdata,
+    .free_domdata   = rt_free_domdata,
+    .init_domain    = rt_dom_init,
+    .destroy_domain = rt_dom_destroy,
+    .alloc_vdata    = rt_alloc_vdata,
+    .free_vdata     = rt_free_vdata,
+    .insert_vcpu    = rt_vcpu_insert,
+    .remove_vcpu    = rt_vcpu_remove,
+
+    .adjust         = rt_dom_cntl,
+
+    .pick_cpu       = rt_cpu_pick,
+    .do_schedule    = rt_schedule,
+    .sleep          = rt_vcpu_sleep,
+    .wake           = rt_vcpu_wake,
+    .context_saved  = rt_context_saved,
+};
diff --git a/xen/common/schedule.c b/xen/common/schedule.c
index e9eb0bc..f2df400 100644
--- a/xen/common/schedule.c
+++ b/xen/common/schedule.c
@@ -68,6 +68,7 @@ static const struct scheduler *schedulers[] = {
     &sched_sedf_def,
     &sched_credit_def,
     &sched_credit2_def,
+    &sched_rt_def,
     &sched_arinc653_def,
 };
 
@@ -1092,7 +1093,8 @@ long sched_adjust(struct domain *d, struct xen_domctl_scheduler_op *op)
 
     if ( (op->sched_id != DOM2OP(d)->sched_id) ||
          ((op->cmd != XEN_DOMCTL_SCHEDOP_putinfo) &&
-          (op->cmd != XEN_DOMCTL_SCHEDOP_getinfo)) )
+          (op->cmd != XEN_DOMCTL_SCHEDOP_getinfo) &&
+          (op->cmd != XEN_DOMCTL_SCHEDOP_getnumvcpus)) )
         return -EINVAL;
 
     /* NB: the pluggable scheduler code needs to take care
diff --git a/xen/include/public/domctl.h b/xen/include/public/domctl.h
index 5b11bbf..8d4b973 100644
--- a/xen/include/public/domctl.h
+++ b/xen/include/public/domctl.h
@@ -339,6 +339,18 @@ struct xen_domctl_max_vcpus {
 typedef struct xen_domctl_max_vcpus xen_domctl_max_vcpus_t;
 DEFINE_XEN_GUEST_HANDLE(xen_domctl_max_vcpus_t);
 
+/*
+ * This structure is used to pass per-VCPU parameters of the rt scheduler
+ * from a privileged domain to Xen
+ */
+struct xen_domctl_sched_rt_params {
+    /* get vcpus' info */
+    int64_t period; /* s_time_t type */
+    int64_t budget;
+    int     index;
+};
+typedef struct xen_domctl_sched_rt_params xen_domctl_sched_rt_params_t;
+DEFINE_XEN_GUEST_HANDLE(xen_domctl_sched_rt_params_t);
 
 /* XEN_DOMCTL_scheduler_op */
 /* Scheduler types. */
@@ -346,9 +358,12 @@ DEFINE_XEN_GUEST_HANDLE(xen_domctl_max_vcpus_t);
 #define XEN_SCHEDULER_CREDIT   5
 #define XEN_SCHEDULER_CREDIT2  6
 #define XEN_SCHEDULER_ARINC653 7
+#define XEN_SCHEDULER_RT_DS    8
+
 /* Set or get info? */
-#define XEN_DOMCTL_SCHEDOP_putinfo 0
-#define XEN_DOMCTL_SCHEDOP_getinfo 1
+#define XEN_DOMCTL_SCHEDOP_putinfo      0
+#define XEN_DOMCTL_SCHEDOP_getinfo      1
+#define XEN_DOMCTL_SCHEDOP_getnumvcpus   2
 struct xen_domctl_scheduler_op {
     uint32_t sched_id;  /* XEN_SCHEDULER_* */
     uint32_t cmd;       /* XEN_DOMCTL_SCHEDOP_* */
@@ -367,6 +382,15 @@ struct xen_domctl_scheduler_op {
         struct xen_domctl_sched_credit2 {
             uint16_t weight;
         } credit2;
+        struct xen_domctl_sched_rt{
+            /* get vcpus' params */
+            XEN_GUEST_HANDLE_64(xen_domctl_sched_rt_params_t) vcpu;
+            uint16_t nr_vcpus;
+            /* set one vcpu's params */
+            uint16_t vcpu_index;
+            int64_t period;
+            int64_t budget;
+        } rt;
     } u;
 };
 typedef struct xen_domctl_scheduler_op xen_domctl_scheduler_op_t;
diff --git a/xen/include/xen/sched-if.h b/xen/include/xen/sched-if.h
index 4164dff..bcbe234 100644
--- a/xen/include/xen/sched-if.h
+++ b/xen/include/xen/sched-if.h
@@ -169,6 +169,7 @@ extern const struct scheduler sched_sedf_def;
 extern const struct scheduler sched_credit_def;
 extern const struct scheduler sched_credit2_def;
 extern const struct scheduler sched_arinc653_def;
+extern const struct scheduler sched_rt_def;
 
 
 struct cpupool
-- 
1.7.9.5


* [PATCH RFC v2 2/4] libxc: add rt scheduler
  2014-07-29  1:52 Introduce rt real-time scheduler for Xen Meng Xu
  2014-07-29  1:52 ` [PATCH RFC v2 1/4] xen: add real time scheduler rt Meng Xu
@ 2014-07-29  1:53 ` Meng Xu
  2014-07-29  1:53 ` [PATCH RFC v2 3/4] libxl: " Meng Xu
  2014-07-29  1:53 ` [PATCH RFC v2 4/4] xl: introduce " Meng Xu
  3 siblings, 0 replies; 13+ messages in thread
From: Meng Xu @ 2014-07-29  1:53 UTC (permalink / raw)
  To: xen-devel
  Cc: ian.campbell, xisisu, stefano.stabellini, george.dunlap,
	ian.jackson, xumengpanda, Meng Xu, JBeulich, lichong659, dgolomb

Add xc_sched_rt_* functions to interact with Xen to set/get a domain's
parameters for the rt scheduler.
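
For illustration, a toolstack caller might use these functions roughly as follows (a sketch only: error handling is elided, xch/domid are assumed to be an open xc_interface handle and a valid domain id, and period/budget are in the 1us unit of this interface):

    uint16_t num_vcpus;
    struct xen_domctl_sched_rt_params *sdom;

    /* how many VCPUs does the domain have? */
    xc_sched_rt_domain_get_num_vcpus(xch, domid, &num_vcpus);

    /* fetch the per-VCPU parameters */
    sdom = calloc(num_vcpus, sizeof(*sdom));
    xc_sched_rt_domain_get(xch, domid, sdom, num_vcpus);

    /* set VCPU 1 to period=20000us, budget=10000us */
    sdom[1].index  = 1;
    sdom[1].period = 20000;
    sdom[1].budget = 10000;
    xc_sched_rt_domain_set(xch, domid, &sdom[1]);

    free(sdom);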

Signed-off-by: Sisu Xi <xisisu@gmail.com>
Signed-off-by: Meng Xu <mengxu@cis.upenn.edu>
---
 tools/libxc/Makefile  |    1 +
 tools/libxc/xc_rt.c   |   89 +++++++++++++++++++++++++++++++++++++++++++++++++
 tools/libxc/xenctrl.h |   12 +++++++
 3 files changed, 102 insertions(+)
 create mode 100644 tools/libxc/xc_rt.c

diff --git a/tools/libxc/Makefile b/tools/libxc/Makefile
index 22eef8e..c2b02a4 100644
--- a/tools/libxc/Makefile
+++ b/tools/libxc/Makefile
@@ -20,6 +20,7 @@ CTRL_SRCS-y       += xc_sedf.c
 CTRL_SRCS-y       += xc_csched.c
 CTRL_SRCS-y       += xc_csched2.c
 CTRL_SRCS-y       += xc_arinc653.c
+CTRL_SRCS-y       += xc_rt.c
 CTRL_SRCS-y       += xc_tbuf.c
 CTRL_SRCS-y       += xc_pm.c
 CTRL_SRCS-y       += xc_cpu_hotplug.c
diff --git a/tools/libxc/xc_rt.c b/tools/libxc/xc_rt.c
new file mode 100644
index 0000000..74a1c3e
--- /dev/null
+++ b/tools/libxc/xc_rt.c
@@ -0,0 +1,89 @@
+/****************************************************************************
+ *
+ *        File: xc_rt.c
+ *      Author: Sisu Xi 
+ *              Meng Xu
+ *
+ * Description: XC Interface to the rt scheduler
+ *
+ * This library is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU Lesser General Public
+ * License as published by the Free Software Foundation;
+ * version 2.1 of the License.
+ *
+ * This library is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+ * Lesser General Public License for more details.
+ *
+ * You should have received a copy of the GNU Lesser General Public
+ * License along with this library; if not, write to the Free Software
+ * Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA  02110-1301  USA
+ */
+
+#include "xc_private.h"
+
+int xc_sched_rt_domain_set(xc_interface *xch,
+                           uint32_t domid,
+                           struct xen_domctl_sched_rt_params *sdom)
+{
+    int rc;
+    DECLARE_DOMCTL;
+
+    domctl.cmd = XEN_DOMCTL_scheduler_op;
+    domctl.domain = (domid_t) domid;
+    domctl.u.scheduler_op.sched_id = XEN_SCHEDULER_RT_DS;
+    domctl.u.scheduler_op.cmd = XEN_DOMCTL_SCHEDOP_putinfo;
+    domctl.u.scheduler_op.u.rt.vcpu_index = sdom->index;
+    domctl.u.scheduler_op.u.rt.period = sdom->period;
+    domctl.u.scheduler_op.u.rt.budget = sdom->budget;
+
+    rc = do_domctl(xch, &domctl);
+
+    return rc;
+}
+
+int xc_sched_rt_domain_get(xc_interface *xch,
+                           uint32_t domid,
+                           struct xen_domctl_sched_rt_params *sdom,
+                           uint16_t num_vcpus)
+{
+    int rc;
+    DECLARE_DOMCTL;
+    DECLARE_HYPERCALL_BOUNCE(sdom, 
+        sizeof(*sdom) * num_vcpus, 
+        XC_HYPERCALL_BUFFER_BOUNCE_OUT);
+
+    if ( xc_hypercall_bounce_pre(xch, sdom) )
+        return -1;
+
+    domctl.cmd = XEN_DOMCTL_scheduler_op;
+    domctl.domain = (domid_t) domid;
+    domctl.u.scheduler_op.sched_id = XEN_SCHEDULER_RT_DS;
+    domctl.u.scheduler_op.cmd = XEN_DOMCTL_SCHEDOP_getinfo;
+    set_xen_guest_handle(domctl.u.scheduler_op.u.rt.vcpu, sdom);
+
+    rc = do_domctl(xch, &domctl);
+
+    xc_hypercall_bounce_post(xch, sdom);
+
+    return rc;
+}
+
+int xc_sched_rt_domain_get_num_vcpus(xc_interface *xch,
+                                     uint32_t domid,
+                                     uint16_t *num_vcpus)
+{
+    int rc;
+    DECLARE_DOMCTL;
+
+    domctl.cmd = XEN_DOMCTL_scheduler_op;
+    domctl.domain = (domid_t) domid;
+    domctl.u.scheduler_op.sched_id = XEN_SCHEDULER_RT_DS;
+    domctl.u.scheduler_op.cmd = XEN_DOMCTL_SCHEDOP_getnumvcpus;
+
+    rc = do_domctl(xch, &domctl);
+
+    *num_vcpus = domctl.u.scheduler_op.u.rt.nr_vcpus;
+    return rc;
+}
diff --git a/tools/libxc/xenctrl.h b/tools/libxc/xenctrl.h
index 5beb846..ca38a4f 100644
--- a/tools/libxc/xenctrl.h
+++ b/tools/libxc/xenctrl.h
@@ -875,6 +875,18 @@ int xc_sched_credit2_domain_get(xc_interface *xch,
                                uint32_t domid,
                                struct xen_domctl_sched_credit2 *sdom);
 
+int xc_sched_rt_domain_set(xc_interface *xch,
+                          uint32_t domid,
+                          struct xen_domctl_sched_rt_params *sdom);
+int xc_sched_rt_domain_get(xc_interface *xch,
+                          uint32_t domid,
+                          struct xen_domctl_sched_rt_params *sdom,
+                          uint16_t num_vcpus);
+
+int xc_sched_rt_domain_get_num_vcpus(xc_interface *xch,
+                                    uint32_t domid,
+                                    uint16_t *num_vcpus);
+
 int
 xc_sched_arinc653_schedule_set(
     xc_interface *xch,
-- 
1.7.9.5


* [PATCH RFC v2 3/4] libxl: add rt scheduler
  2014-07-29  1:52 Introduce rt real-time scheduler for Xen Meng Xu
  2014-07-29  1:52 ` [PATCH RFC v2 1/4] xen: add real time scheduler rt Meng Xu
  2014-07-29  1:53 ` [PATCH RFC v2 2/4] libxc: add rt scheduler Meng Xu
@ 2014-07-29  1:53 ` Meng Xu
  2014-07-29  1:53 ` [PATCH RFC v2 4/4] xl: introduce " Meng Xu
  3 siblings, 0 replies; 13+ messages in thread
From: Meng Xu @ 2014-07-29  1:53 UTC (permalink / raw)
  To: xen-devel
  Cc: ian.campbell, xisisu, stefano.stabellini, george.dunlap,
	ian.jackson, xumengpanda, Meng Xu, JBeulich, lichong659, dgolomb

Add libxl functions to set/get a domain's parameters for the rt scheduler
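
For illustration, the resulting interface can be used roughly as follows (a sketch only: error handling is elided, and an initialized libxl_ctx *ctx and valid domid are assumed):

    libxl_domain_sched_params scinfo;

    /* fetch the per-VCPU parameters of all VCPUs of the domain */
    libxl_domain_sched_params_get(ctx, domid, &scinfo);

    /* change VCPU 1 to period=20000us, budget=10000us and push it back */
    scinfo.sched         = LIBXL_SCHEDULER_RT;
    scinfo.rt.vcpu_index = 1;
    scinfo.rt.period     = 20000;
    scinfo.rt.budget     = 10000;
    libxl_domain_sched_params_set(ctx, domid, &scinfo);

    libxl_domain_sched_params_dispose(&scinfo);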

Signed-off-by: Sisu Xi <xisisu@gmail.com>
Signed-off-by: Meng Xu <mengxu@cis.upenn.edu>
---
 tools/libxl/libxl.c         |  139 +++++++++++++++++++++++++++++++++++++++++++
 tools/libxl/libxl.h         |    7 +++
 tools/libxl/libxl_types.idl |   15 +++++
 3 files changed, 161 insertions(+)

diff --git a/tools/libxl/libxl.c b/tools/libxl/libxl.c
index 3526539..440e8df31 100644
--- a/tools/libxl/libxl.c
+++ b/tools/libxl/libxl.c
@@ -5154,6 +5154,139 @@ static int sched_sedf_domain_set(libxl__gc *gc, uint32_t domid,
     return 0;
 }
 
+static int sched_rt_domain_get(libxl__gc *gc, uint32_t domid,
+                               libxl_domain_sched_params *scinfo)
+{
+    struct xen_domctl_sched_rt_params* sdom;
+    uint16_t num_vcpus;
+    int rc, i;
+
+    rc = xc_sched_rt_domain_get_num_vcpus(CTX->xch, domid, &num_vcpus);
+    if (rc != 0) {
+        LOGE(ERROR, "getting num_vcpus of domain sched rt");
+        return ERROR_FAIL;
+    }
+    
+    /* FIXME: can malloc be used in libxl? seems it was used in the file */
+    sdom = (struct xen_domctl_sched_rt_params *)
+            malloc( sizeof(struct xen_domctl_sched_rt_params) * num_vcpus );
+    if ( !sdom ){
+        LOGE(ERROR, "Allocate sdom array fails\n");
+        return ERROR_INVAL;
+    }
+
+    rc = xc_sched_rt_domain_get(CTX->xch, domid, sdom, num_vcpus);
+    if (rc != 0) {
+        LOGE(ERROR, "getting domain sched rt");
+        free(sdom);
+        return ERROR_FAIL;
+    }
+
+    /* FIXME: how to guarantee libxl_*_dispose be called exactly once? */
+    libxl_domain_sched_params_init(scinfo);
+    
+    scinfo->rt.num_vcpus = num_vcpus;
+    scinfo->sched = LIBXL_SCHEDULER_RT;
+    /* FIXME: can malloc be used in libxl? seems it was used in the file */
+    scinfo->rt.vcpus = (libxl_vcpu *)
+                       malloc( sizeof(libxl_vcpu) * scinfo->rt.num_vcpus );
+    if ( !scinfo->rt.vcpus ){
+        LOGE(ERROR, "Failed to allocate the libxl_vcpu array\n");
+        free(sdom);
+        return ERROR_INVAL;
+    }
+    for( i = 0; i < num_vcpus; ++i)
+    {
+        scinfo->rt.vcpus[i].period = sdom[i].period;
+        scinfo->rt.vcpus[i].budget = sdom[i].budget;
+        scinfo->rt.vcpus[i].index = sdom[i].index;
+    }
+    
+    free(sdom);
+    return 0;
+}
+
+#define SCHED_RT_VCPU_PERIOD_MAX    31536000000000 /* one year in microseconds */
+#define SCHED_RT_VCPU_BUDGET_MAX    SCHED_RT_VCPU_PERIOD_MAX
+
+/*
+ * Sanity check of the scinfo parameters
+ * return 0 if all values are valid
+ * return 1 if one param is default value
+ * return 2 if the target vcpu's index, period or budget is out of range
+ */
+static int sched_rt_domain_set_validate_params(libxl__gc *gc,
+                                               const libxl_domain_sched_params *scinfo,
+                                               const uint16_t num_vcpus)
+{
+    int vcpu_index = scinfo->rt.vcpu_index;
+
+    if (vcpu_index == LIBXL_DOMAIN_SCHED_PARAM_VCPU_DEFAULT ||
+        scinfo->rt.period == LIBXL_DOMAIN_SCHED_PARAM_PERIOD_DEFAULT ||
+        scinfo->rt.budget == LIBXL_DOMAIN_SCHED_PARAM_BUDGET_DEFAULT)
+    {
+        return 1;
+    }
+
+    if (vcpu_index < 0 || vcpu_index >= num_vcpus)
+    {
+        LOG(ERROR, "VCPU index is not set or is out of range, "
+                    "valid values are within the range from 0 to %d", num_vcpus - 1);
+        return 2;
+    }
+
+    if (scinfo->rt.period < 1 ||
+        scinfo->rt.period > SCHED_RT_VCPU_PERIOD_MAX)
+    {
+        LOG(ERROR, "VCPU period is not set or is out of range, "
+                    "valid values are within the range from 1 to %lu", SCHED_RT_VCPU_PERIOD_MAX);
+        return 2;
+    }
+
+    if (scinfo->rt.budget < 1 ||
+        scinfo->rt.budget > SCHED_RT_VCPU_BUDGET_MAX)
+    {
+        LOG(ERROR, "VCPU budget is not set or is out of range, "
+                    "valid values are within the range from 1 to %lu", SCHED_RT_VCPU_BUDGET_MAX);
+        return 2;
+    }
+
+    return 0;
+
+}
+
+static int sched_rt_domain_set(libxl__gc *gc, uint32_t domid,
+                               const libxl_domain_sched_params *scinfo)
+{
+    struct xen_domctl_sched_rt_params sdom;
+    uint16_t num_vcpus;
+    int rc;
+ 
+    rc = xc_sched_rt_domain_get_num_vcpus(CTX->xch, domid, &num_vcpus);
+    if (rc != 0) {
+        LOGE(ERROR, "getting domain sched rt");
+        return ERROR_FAIL;
+    }
+    
+    rc = sched_rt_domain_set_validate_params(gc, scinfo, num_vcpus);
+    if (rc == 2)
+        return ERROR_INVAL;
+    if (rc == 1)
+        return 0;
+    if (rc == 0)
+    {
+        sdom.index = scinfo->rt.vcpu_index;
+        sdom.period = scinfo->rt.period;
+        sdom.budget = scinfo->rt.budget;
+    }
+
+    rc = xc_sched_rt_domain_set(CTX->xch, domid, &sdom);
+    if ( rc < 0 ) {
+        LOGE(ERROR, "setting domain sched rt");
+        return ERROR_FAIL;
+    }
+
+    return 0;
+}
+
 int libxl_domain_sched_params_set(libxl_ctx *ctx, uint32_t domid,
                                   const libxl_domain_sched_params *scinfo)
 {
@@ -5177,6 +5310,9 @@ int libxl_domain_sched_params_set(libxl_ctx *ctx, uint32_t domid,
     case LIBXL_SCHEDULER_ARINC653:
         ret=sched_arinc653_domain_set(gc, domid, scinfo);
         break;
+    case LIBXL_SCHEDULER_RT:
+        ret=sched_rt_domain_set(gc, domid, scinfo);
+        break;
     default:
         LOG(ERROR, "Unknown scheduler");
         ret=ERROR_INVAL;
@@ -5207,6 +5343,9 @@ int libxl_domain_sched_params_get(libxl_ctx *ctx, uint32_t domid,
     case LIBXL_SCHEDULER_CREDIT2:
         ret=sched_credit2_domain_get(gc, domid, scinfo);
         break;
+    case LIBXL_SCHEDULER_RT:
+        ret=sched_rt_domain_get(gc, domid, scinfo);
+        break;
     default:
         LOG(ERROR, "Unknown scheduler");
         ret=ERROR_INVAL;
diff --git a/tools/libxl/libxl.h b/tools/libxl/libxl.h
index 5ae6532..366eaf0 100644
--- a/tools/libxl/libxl.h
+++ b/tools/libxl/libxl.h
@@ -1241,6 +1241,13 @@ int libxl_sched_credit_params_set(libxl_ctx *ctx, uint32_t poolid,
 #define LIBXL_DOMAIN_SCHED_PARAM_LATENCY_DEFAULT   -1
 #define LIBXL_DOMAIN_SCHED_PARAM_EXTRATIME_DEFAULT -1
 
+#define LIBXL_DOMAIN_SCHED_PARAM_BUDGET_DEFAULT     -1
+#define LIBXL_DOMAIN_SCHED_PARAM_VCPU_DEFAULT       -1
+#define LIBXL_DOMAIN_SCHED_PARAM_NUM_VCPUS_DEFAULT  -1
+#define LIBXL_DOMAIN_SCHED_PARAM_VCPU_INDEX_DEFAULT -1
+/* Consistent with XEN_LEGACY_MAX_VCPUS in xen/arch-x86/xen.h */
+#define LIBXL_XEN_LEGACY_MAX_VCPUS                  32
+
 int libxl_domain_sched_params_get(libxl_ctx *ctx, uint32_t domid,
                                   libxl_domain_sched_params *params);
 int libxl_domain_sched_params_set(libxl_ctx *ctx, uint32_t domid,
diff --git a/tools/libxl/libxl_types.idl b/tools/libxl/libxl_types.idl
index a412f9c..340eb4c 100644
--- a/tools/libxl/libxl_types.idl
+++ b/tools/libxl/libxl_types.idl
@@ -153,6 +153,7 @@ libxl_scheduler = Enumeration("scheduler", [
     (5, "credit"),
     (6, "credit2"),
     (7, "arinc653"),
+    (8, "rt"),
     ])
 
 # Consistent with SHUTDOWN_* in sched.h (apart from UNKNOWN)
@@ -303,6 +304,19 @@ libxl_domain_restore_params = Struct("domain_restore_params", [
     ("checkpointed_stream", integer),
     ])
 
+libxl_rt_vcpu = Struct("vcpu",[
+    ("period",       uint64, {'init_val': 'LIBXL_DOMAIN_SCHED_PARAM_PERIOD_DEFAULT'}),
+    ("budget",       uint64, {'init_val': 'LIBXL_DOMAIN_SCHED_PARAM_BUDGET_DEFAULT'}),
+    ("index",        integer, {'init_val': 'LIBXL_DOMAIN_SCHED_PARAM_VCPU_INDEX_DEFAULT'}),
+    ])
+
+libxl_domain_sched_rt_params = Struct("domain_sched_rt_params",[
+    ("vcpus",        Array(libxl_rt_vcpu, "num_vcpus")),
+    ("period",       uint64, {'init_val': 'LIBXL_DOMAIN_SCHED_PARAM_PERIOD_DEFAULT'}),
+    ("budget",       uint64, {'init_val': 'LIBXL_DOMAIN_SCHED_PARAM_BUDGET_DEFAULT'}),
+    ("vcpu_index",   integer, {'init_val': 'LIBXL_DOMAIN_SCHED_PARAM_VCPU_INDEX_DEFAULT'}),
+    ])
+
 libxl_domain_sched_params = Struct("domain_sched_params",[
     ("sched",        libxl_scheduler),
     ("weight",       integer, {'init_val': 'LIBXL_DOMAIN_SCHED_PARAM_WEIGHT_DEFAULT'}),
@@ -311,6 +325,7 @@ libxl_domain_sched_params = Struct("domain_sched_params",[
     ("slice",        integer, {'init_val': 'LIBXL_DOMAIN_SCHED_PARAM_SLICE_DEFAULT'}),
     ("latency",      integer, {'init_val': 'LIBXL_DOMAIN_SCHED_PARAM_LATENCY_DEFAULT'}),
     ("extratime",    integer, {'init_val': 'LIBXL_DOMAIN_SCHED_PARAM_EXTRATIME_DEFAULT'}),
+    ("rt",           libxl_domain_sched_rt_params),
     ])
 
 libxl_domain_build_info = Struct("domain_build_info",[
-- 
1.7.9.5

^ permalink raw reply related	[flat|nested] 13+ messages in thread

* [PATCH RFC v2 4/4] xl: introduce rt scheduler
  2014-07-29  1:52 Introduce rt real-time scheduler for Xen Meng Xu
                   ` (2 preceding siblings ...)
  2014-07-29  1:53 ` [PATCH RFC v2 3/4] libxl: " Meng Xu
@ 2014-07-29  1:53 ` Meng Xu
  3 siblings, 0 replies; 13+ messages in thread
From: Meng Xu @ 2014-07-29  1:53 UTC (permalink / raw)
  To: xen-devel
  Cc: ian.campbell, xisisu, stefano.stabellini, george.dunlap,
	ian.jackson, xumengpanda, Meng Xu, JBeulich, lichong659, dgolomb

Add xl command for rt scheduler

Signed-off-by: Sisu Xi <xisisu@gmail.com>
Signed-off-by: Meng Xu <mengxu@cis.upenn.edu>
---
 docs/man/xl.pod.1         |   40 ++++++++++++++
 tools/libxl/xl.h          |    1 +
 tools/libxl/xl_cmdimpl.c  |  131 +++++++++++++++++++++++++++++++++++++++++++++
 tools/libxl/xl_cmdtable.c |    9 ++++
 4 files changed, 181 insertions(+)

diff --git a/docs/man/xl.pod.1 b/docs/man/xl.pod.1
index 30bd4bf..42aeedc 100644
--- a/docs/man/xl.pod.1
+++ b/docs/man/xl.pod.1
@@ -1019,6 +1019,46 @@ Restrict output to domains in the specified cpupool.
 
 =back
 
+=item B<sched-rt> [I<OPTIONS>]
+
+Set or get rt (Real Time) scheduler parameters. This rt scheduler applies the
+Preemptive Global Earliest Deadline First real-time scheduling algorithm to
+schedule VCPUs in the system. Each VCPU has a dedicated period and budget.
+While scheduled, a VCPU burns its budget.
+A VCPU has its budget replenished at the beginning of each of its periods;
+the VCPU discards its unused budget at the end of each of its periods.
+
+B<OPTIONS>
+
+=over 4
+
+=item B<-d DOMAIN>, B<--domain=DOMAIN>
+
+Specify domain for which scheduler parameters are to be modified or retrieved.
+Mandatory for modifying scheduler parameters.
+
+=item B<-v VCPU>, B<--vcpu=VCPU>
+
+Specify the index of the VCPU whose parameters will be set.
+A domain can have multiple VCPUs; each VCPU has a unique index within its
+domain. When setting a domain's parameters, the parameters of each of its
+VCPUs need to be set individually.
+
+=item B<-p PERIOD>, B<--period=PERIOD>
+
+A VCPU has its budget replenished at the beginning of every period.
+The time unit is the microsecond.
+
+=item B<-b BUDGET>, B<--budget=BUDGET>
+
+A VCPU is allowed to run for at most BUDGET amount of time in each period.
+The time unit is the microsecond.
+
+=item B<-c CPUPOOL>, B<--cpupool=CPUPOOL>
+
+Restrict output to domains in the specified cpupool.
+
+=back
+
 =back
 
 =head1 CPUPOOLS COMMANDS
diff --git a/tools/libxl/xl.h b/tools/libxl/xl.h
index 10a2e66..51b634a 100644
--- a/tools/libxl/xl.h
+++ b/tools/libxl/xl.h
@@ -67,6 +67,7 @@ int main_memset(int argc, char **argv);
 int main_sched_credit(int argc, char **argv);
 int main_sched_credit2(int argc, char **argv);
 int main_sched_sedf(int argc, char **argv);
+int main_sched_rt(int argc, char **argv);
 int main_domid(int argc, char **argv);
 int main_domname(int argc, char **argv);
 int main_rename(int argc, char **argv);
diff --git a/tools/libxl/xl_cmdimpl.c b/tools/libxl/xl_cmdimpl.c
index 01bce2f..eb0039f 100644
--- a/tools/libxl/xl_cmdimpl.c
+++ b/tools/libxl/xl_cmdimpl.c
@@ -5100,6 +5100,52 @@ static int sched_sedf_domain_output(
     return 0;
 }
 
+
+static int sched_rt_domain_output(
+    int domid)
+{
+    char *domname;
+    libxl_domain_sched_params scinfo;
+    int rc = 0, i;
+
+    if (domid < 0) {
+        printf("%-33s %4s %4s %9s %9s\n", "Name", "ID", "VCPU", "Period", "Budget");
+        return 0;
+    }
+
+    libxl_domain_sched_params_init(&scinfo);
+    rc = sched_domain_get(LIBXL_SCHEDULER_RT, domid, &scinfo);
+    if (rc)
+        goto out;
+
+    domname = libxl_domid_to_name(ctx, domid);
+    for( i = 0; i < scinfo.rt.num_vcpus; i++ )
+    {
+        printf("%-33s %4d %4d %9"PRIu64" %9"PRIu64"\n",
+            domname,
+            domid,
+            scinfo.rt.vcpus[i].index,
+            scinfo.rt.vcpus[i].period,
+            scinfo.rt.vcpus[i].budget);
+    }
+    free(domname);
+
+out:
+    libxl_domain_sched_params_dispose(&scinfo);
+    return rc;
+}
+
+static int sched_rt_pool_output(uint32_t poolid)
+{
+    char *poolname;
+
+    poolname = libxl_cpupoolid_to_name(ctx, poolid);
+    printf("Cpupool %s: sched=EDF\n", poolname);
+
+    free(poolname);
+    return 0;
+}
+
 static int sched_default_pool_output(uint32_t poolid)
 {
     char *poolname;
@@ -5467,6 +5513,91 @@ int main_sched_sedf(int argc, char **argv)
     return 0;
 }
 
+/*
+ * <nothing>            : List all domain parameters and sched params
+ * -d [domid]           : List domain params for domain
+ * -d [domid] [params]  : Set domain params for domain
+ */
+int main_sched_rt(int argc, char **argv)
+{
+    const char *dom = NULL;
+    const char *cpupool = NULL;
+    int period = 10, opt_p = 0;
+    int budget = 4, opt_b = 0;
+    int vcpu_index = 0, opt_v = 0;
+    int opt, rc;
+    static struct option opts[] = {
+        {"domain", 1, 0, 'd'},
+        {"period", 1, 0, 'p'},
+        {"budget", 1, 0, 'b'},
+        {"vcpu", 1, 0, 'v'},
+        {"cpupool", 1, 0, 'c'},
+        COMMON_LONG_OPTS,
+        {0, 0, 0, 0}
+    };
+
+    SWITCH_FOREACH_OPT(opt, "d:p:b:v:c:h", opts, "sched-rt", 0) {
+    case 'd':
+        dom = optarg;
+        break;
+    case 'p':
+        period = strtol(optarg, NULL, 10);
+        opt_p = 1;
+        break;
+    case 'b':
+        budget = strtol(optarg, NULL, 10);
+        opt_b = 1;
+        break;
+    case 'v':
+        vcpu_index = strtol(optarg, NULL, 10);
+        opt_v = 1;
+        break;
+    case 'c':
+        cpupool = optarg;
+        break;
+    }
+
+    if (cpupool && (dom || opt_p || opt_b || opt_v)) {
+        fprintf(stderr, "Specifying a cpupool is not allowed with other options.\n");
+        return 1;
+    }
+    if (!dom && (opt_p || opt_b || opt_v)) {
+        fprintf(stderr, "Must specify a domain.\n");
+        return 1;
+    }
+    if ( (opt_v || opt_p || opt_b) && (opt_p + opt_b + opt_v != 3) ) {
+        fprintf(stderr, "Must specify vcpu, period, budget\n");
+        return 1;
+    }
+    
+    if (!dom) { /* list all domain's rt scheduler info */
+        return -sched_domain_output(LIBXL_SCHEDULER_RT,
+                                    sched_rt_domain_output,
+                                    sched_rt_pool_output,
+                                    cpupool);
+    } else {
+        uint32_t domid = find_domain(dom);
+        if (!opt_p && !opt_b && !opt_v) { /* output rt scheduler info */
+            sched_rt_domain_output(-1);
+            return -sched_rt_domain_output(domid);
+        } else { /* set rt scheduler parameters */
+            libxl_domain_sched_params scinfo;
+            libxl_domain_sched_params_init(&scinfo);
+            scinfo.sched = LIBXL_SCHEDULER_RT;
+            scinfo.rt.vcpu_index = vcpu_index;
+            scinfo.rt.period = period;
+            scinfo.rt.budget = budget;
+
+            rc = sched_domain_set(domid, &scinfo);
+            libxl_domain_sched_params_dispose(&scinfo);
+            if (rc)
+                return -rc;
+        }
+    }
+
+    return 0;
+}
+
 int main_domid(int argc, char **argv)
 {
     uint32_t domid;
diff --git a/tools/libxl/xl_cmdtable.c b/tools/libxl/xl_cmdtable.c
index 4279b9f..00500cb 100644
--- a/tools/libxl/xl_cmdtable.c
+++ b/tools/libxl/xl_cmdtable.c
@@ -277,6 +277,15 @@ struct cmd_spec cmd_table[] = {
       "                               --period/--slice)\n"
       "-c CPUPOOL, --cpupool=CPUPOOL  Restrict output to CPUPOOL"
     },
+    { "sched-rt",
+      &main_sched_rt, 0, 1,
+      "Get/set rt scheduler parameters",
+      "[-d <Domain> [-v[=VCPU]] [-p[=PERIOD]] [-b[=BUDGET]]]",
+      "-d DOMAIN, --domain=DOMAIN     Domain to modify\n"
+      "-v VCPU,   --vcpu=VCPU         VCPU\n"
+      "-p PERIOD, --period=PERIOD     Period (us)\n"
+      "-b BUDGET, --budget=BUDGET     Budget (us)\n"
+    },
     { "domid",
       &main_domid, 0, 0,
       "Convert a domain name to domain id",
-- 
1.7.9.5

^ permalink raw reply related	[flat|nested] 13+ messages in thread

* Re: [PATCH RFC v2 1/4] xen: add real time scheduler rt
  2014-07-29  1:52 ` [PATCH RFC v2 1/4] xen: add real time scheduler rt Meng Xu
@ 2014-07-29  6:52   ` Jan Beulich
  2014-07-29  9:42     ` Dario Faggioli
  2014-07-30 15:55     ` [RFC " Meng Xu
  2014-07-29 10:36   ` [PATCH RFC " Andrew Cooper
  1 sibling, 2 replies; 13+ messages in thread
From: Jan Beulich @ 2014-07-29  6:52 UTC (permalink / raw)
  To: Meng Xu
  Cc: ian.campbell, xisisu, stefano.stabellini, george.dunlap,
	ian.jackson, xen-devel, xumengpanda, lichong659, dgolomb

>>> On 29.07.14 at 03:52, <mengxu@cis.upenn.edu> wrote:
> --- a/xen/common/schedule.c
> +++ b/xen/common/schedule.c
> @@ -68,6 +68,7 @@ static const struct scheduler *schedulers[] = {
>      &sched_sedf_def,
>      &sched_credit_def,
>      &sched_credit2_def,
> +    &sched_rt_def,
>      &sched_arinc653_def,
>  };

Is the insertion as other than last item (as one would expect for a
new addition) intentional? Not that I think this matters much, but
I'm still curious.

> --- a/xen/include/public/domctl.h
> +++ b/xen/include/public/domctl.h
> @@ -339,6 +339,18 @@ struct xen_domctl_max_vcpus {
>  typedef struct xen_domctl_max_vcpus xen_domctl_max_vcpus_t;
>  DEFINE_XEN_GUEST_HANDLE(xen_domctl_max_vcpus_t);
>  
> +/*
> + * This structure is used to pass to rt scheduler from a 
> + * privileged domain to Xen
> + */
> +struct xen_domctl_sched_rt_params {
> +    /* get vcpus' info */
> +    int64_t period; /* s_time_t type */
> +    int64_t budget;
> +    int     index;

Are all these really meaningfully signed quantities?

Also, you need to pad the structure to a multiple of 8 bytes, or
its layout will differ between 32- and 64-bit (tool stack) callers.
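
One possible layout, assuming the fields can indeed become unsigned
(illustrative only, the exact field order is up to you):

    struct xen_domctl_sched_rt_params {
        uint64_t period;   /* converted to/from s_time_t inside Xen */
        uint64_t budget;
        uint32_t index;
        uint32_t pad;      /* explicit padding, so sizeof() is the same
                              for 32- and 64-bit callers */
    };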

> @@ -367,6 +382,15 @@ struct xen_domctl_scheduler_op {
>          struct xen_domctl_sched_credit2 {
>              uint16_t weight;
>          } credit2;
> +        struct xen_domctl_sched_rt{
> +            /* get vcpus' params */
> +            XEN_GUEST_HANDLE_64(xen_domctl_sched_rt_params_t) vcpu;
> +            uint16_t nr_vcpus;
> +            /* set one vcpu's params */
> +            uint16_t vcpu_index;
> +            int64_t period;
> +            int64_t budget;
> +        } rt;

Mostly the same comments here, just that the padding here needs to
go in the middle.
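
I.e. something along these lines (sketch):

    struct xen_domctl_sched_rt {
        /* get vcpus' params */
        XEN_GUEST_HANDLE_64(xen_domctl_sched_rt_params_t) vcpu;
        uint16_t nr_vcpus;
        /* set one vcpu's params */
        uint16_t vcpu_index;
        uint32_t pad;       /* explicit padding before the 64-bit members */
        int64_t period;
        int64_t budget;
    } rt;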

Jan

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH RFC v2 1/4] xen: add real time scheduler rt
  2014-07-29  6:52   ` Jan Beulich
@ 2014-07-29  9:42     ` Dario Faggioli
  2014-07-30 15:55     ` [RFC " Meng Xu
  1 sibling, 0 replies; 13+ messages in thread
From: Dario Faggioli @ 2014-07-29  9:42 UTC (permalink / raw)
  To: Jan Beulich
  Cc: ian.campbell, xisisu, stefano.stabellini, george.dunlap,
	ian.jackson, xen-devel, xumengpanda, Meng Xu, lichong659,
	dgolomb


[-- Attachment #1.1: Type: text/plain, Size: 1395 bytes --]

On mar, 2014-07-29 at 07:52 +0100, Jan Beulich wrote:
> >>> On 29.07.14 at 03:52, <mengxu@cis.upenn.edu> wrote:

> > --- a/xen/include/public/domctl.h
> > +++ b/xen/include/public/domctl.h
> > @@ -339,6 +339,18 @@ struct xen_domctl_max_vcpus {
> >  typedef struct xen_domctl_max_vcpus xen_domctl_max_vcpus_t;
> >  DEFINE_XEN_GUEST_HANDLE(xen_domctl_max_vcpus_t);
> >  
> > +/*
> > + * This structure is used to pass to rt scheduler from a 
> > + * privileged domain to Xen
> > + */
> > +struct xen_domctl_sched_rt_params {
> > +    /* get vcpus' info */
> > +    int64_t period; /* s_time_t type */
> > +    int64_t budget;
> > +    int     index;
> 
> Are all these really meaningfully signed quantities?
> 
The scheduler internal variable where the actual budget is kept updated
needs, IMO, to be signed (s_time_t is the best choice). However, as far
as the interface is concerned, I think Jan is right, being able to
specify a negative budget and period is not that useful! :-P

I think you can go for unsigned quantities here, while s_time_t is
certainly ok inside the scheduler.
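
I.e., roughly (sketch):

    /* public interface (domctl): negative values are meaningless here */
    uint64_t period;
    uint64_t budget;

    /* scheduler internals: signed, since budget accounting may
       transiently go negative before being clamped */
    s_time_t cur_budget;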

Regards,
Dario

-- 
<<This happens because I choose it to happen!>> (Raistlin Majere)
-----------------------------------------------------------------
Dario Faggioli, Ph.D, http://about.me/dario.faggioli
Senior Software Engineer, Citrix Systems R&D Ltd., Cambridge (UK)


[-- Attachment #1.2: This is a digitally signed message part --]
[-- Type: application/pgp-signature, Size: 181 bytes --]

[-- Attachment #2: Type: text/plain, Size: 126 bytes --]

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH RFC v2 1/4] xen: add real time scheduler rt
  2014-07-29  1:52 ` [PATCH RFC v2 1/4] xen: add real time scheduler rt Meng Xu
  2014-07-29  6:52   ` Jan Beulich
@ 2014-07-29 10:36   ` Andrew Cooper
  2014-08-02 15:31     ` Meng Xu
  1 sibling, 1 reply; 13+ messages in thread
From: Andrew Cooper @ 2014-07-29 10:36 UTC (permalink / raw)
  To: Meng Xu, xen-devel
  Cc: ian.campbell, xisisu, stefano.stabellini, george.dunlap,
	ian.jackson, xumengpanda, JBeulich, lichong659, dgolomb

On 29/07/14 02:52, Meng Xu wrote:
> This scheduler follows the pre-emptive Global EDF theory in real-time field.
> Each VCPU can have a dedicated period and budget.
> While scheduled, a VCPU burns its budget.
> A VCPU has its budget replenished at the beginning of each of its periods;
> The VCPU discards its unused budget at the end of each of its periods.
> If a VCPU runs out of budget in a period, it has to wait until next period.
> The mechanism of how to burn a VCPU's budget depends on the server mechanism
> implemented for each VCPU.
>
> Server mechanism: a VCPU is implemented as a deferable server.
> When a VCPU is scheduled to execute on a PCPU, its budget is continuously
> burned.
>
> Priority scheme: Preemptive Global Earliest Deadline First (gEDF).
> At any scheduling point, the VCPU with earliest deadline has highest
> priority.
>
> Queue scheme: A global Runqueue for each CPU pool.
> The Runqueue holds all runnable VCPUs.
> VCPUs in the Runqueue are divided into two parts: with and without budget.
> At each part, VCPUs are sorted based on gEDF priority scheme.
>
> Scheduling quantum: 1 ms;
>
> Note: cpumask and cpupool is supported.
>
> This is still in the development phase.
>
> Signed-off-by: Sisu Xi <xisisu@gmail.com>
> Signed-off-by: Meng Xu <mengxu@cis.upenn.edu>
> ---
>  xen/common/Makefile         |    1 +
>  xen/common/sched_rt.c       | 1058 +++++++++++++++++++++++++++++++++++++++++++
>  xen/common/schedule.c       |    4 +-
>  xen/include/public/domctl.h |   28 +-
>  xen/include/xen/sched-if.h  |    1 +
>  5 files changed, 1089 insertions(+), 3 deletions(-)
>  create mode 100644 xen/common/sched_rt.c
>
> diff --git a/xen/common/Makefile b/xen/common/Makefile
> index 3683ae3..5a23aa4 100644
> --- a/xen/common/Makefile
> +++ b/xen/common/Makefile
> @@ -26,6 +26,7 @@ obj-y += sched_credit.o
>  obj-y += sched_credit2.o
>  obj-y += sched_sedf.o
>  obj-y += sched_arinc653.o
> +obj-y += sched_rt.o
>  obj-y += schedule.o
>  obj-y += shutdown.o
>  obj-y += softirq.o
> diff --git a/xen/common/sched_rt.c b/xen/common/sched_rt.c
> new file mode 100644
> index 0000000..6cfbb8a
> --- /dev/null
> +++ b/xen/common/sched_rt.c
> @@ -0,0 +1,1058 @@
> +/******************************************************************************
> + * Preemptive Global Earliest Deadline First  (EDF) scheduler for Xen
> + * EDF scheduling is one of the most popular real-time scheduling algorithms
> + * used in the embedded field.
> + *
> + * by Sisu Xi, 2013, Washington University in Saint Louis
> + * and Meng Xu, 2014, University of Pennsylvania
> + *
> + * based on the code of credit Scheduler
> + */
> +
> +#include <xen/config.h>
> +#include <xen/init.h>
> +#include <xen/lib.h>
> +#include <xen/sched.h>
> +#include <xen/domain.h>
> +#include <xen/delay.h>
> +#include <xen/event.h>
> +#include <xen/time.h>
> +#include <xen/perfc.h>
> +#include <xen/sched-if.h>
> +#include <xen/softirq.h>
> +#include <asm/atomic.h>
> +#include <xen/errno.h>
> +#include <xen/trace.h>
> +#include <xen/cpu.h>
> +#include <xen/keyhandler.h>
> +#include <xen/trace.h>
> +#include <xen/guest_access.h>
> +
> +/*
> + * TODO:
> + *
> + * Migration compensation and resist like credit2 to better use cache;
> + * Lock Holder Problem, using yield?
> + * Self switch problem: VCPUs of the same domain may preempt each other;
> + */
> +
> +/*
> + * Design:
> + *
> + * This scheduler follows the Preemptive Global EDF theory in real-time field.
> + * Each VCPU can have a dedicated period and budget. 
> + * While scheduled, a VCPU burns its budget.
> + * A VCPU has its budget replenished at the beginning of each of its periods;
> + * The VCPU discards its unused budget at the end of each of its periods.
> + * If a VCPU runs out of budget in a period, it has to wait until next period.
> + * The mechanism of how to burn a VCPU's budget depends on the server mechanism
> + * implemented for each VCPU.
> + *
> + * Server mechanism: a VCPU is implemented as a deferable server.
> + * When a VCPU has a task running on it, its budget is continuously burned;
> + * When a VCPU has no task but with budget left, its budget is preserved.
> + *
> + * Priority scheme: Preemptive Global Earliest Deadline First (gEDF).
> + * At any scheduling point, the VCPU with earliest deadline has highest priority.
> + *
> + * Queue scheme: A global runqueue for each CPU pool. 
> + * The runqueue holds all runnable VCPUs. 
> + * VCPUs in the runqueue are divided into two parts: with and without remaining budget. 
> + * At each part, VCPUs are sorted based on EDF priority scheme.
> + *
> + * Scheduling quantum: 1 ms; but budget accounting is done at microsecond granularity.
> + *
> + * Note: cpumask and cpupool are supported.
> + */
> +
> +/*
> + * Locking:
> + * Just like credit2, a global system lock is used to protect the RunQ.
> + * The global lock is referenced by schedule_data.schedule_lock from all physical cpus.
> + *
> + * The lock is already grabbed when calling wake/sleep/schedule/ functions in schedule.c
> + *
> + * The functions that involve the RunQ and need to grab locks are:
> + *    dump, vcpu_insert, vcpu_remove, context_saved,
> + */
> +
> +
> +/*
> + * Default parameters: the default period and budget are 10 and 4 ms, respectively
> + */
> +#define RT_DEFAULT_PERIOD     (MICROSECS(10))
> +#define RT_DEFAULT_BUDGET     (MICROSECS(4))
> +
> +/*
> + * Useful macros
> + */
> +#define RT_PRIV(_ops)     ((struct rt_private *)((_ops)->sched_data))
> +#define RT_VCPU(_vcpu)    ((struct rt_vcpu *)(_vcpu)->sched_priv)
> +#define RT_DOM(_dom)      ((struct rt_dom *)(_dom)->sched_priv)
> +#define RUNQ(_ops)              (&RT_PRIV(_ops)->runq)

I know you are copying the prevailing style, but these macros are nasty.

They would be perfectly fine as static inlines with some real types...

static inline struct rt_private *RT_PRIV(const struct scheduler *ops)
{
    return ops->sched_data;
}

... which allow for rather more useful compiler errors in the case that
they get accidentally misused.
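
The same pattern applies to the other macros, e.g. (sketch):

    static inline struct rt_vcpu *RT_VCPU(const struct vcpu *vcpu)
    {
        return vcpu->sched_priv;
    }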

> +
> +/*
> + * Flags
> + */
> +/* RT_scheduled: Is this vcpu either running on, or context-switching off,
> + * a physical cpu?
> + * + Accessed only with Runqueue lock held.
> + * + Set when chosen as next in rt_schedule().
> + * + Cleared after context switch has been saved in rt_context_saved()
> + * + Checked in vcpu_wake to see if we can add to the Runqueue, or if we should
> + *   set RT_delayed_runq_add
> + * + Checked to be false in runq_insert.
> + */
> +#define __RT_scheduled            1
> +#define RT_scheduled (1<<__RT_scheduled)
> +/* RT_delayed_runq_add: Do we need to add this to the Runqueue once it's done
> + * being context switching out?
> + * + Set when scheduling out in rt_schedule() if prev is runnable
> + * + Set in rt_vcpu_wake if it finds RT_scheduled set
> + * + Read in rt_context_saved(). If set, it adds prev to the Runqueue and
> + *   clears the bit.
> + *
> + */
> +#define __RT_delayed_runq_add     2
> +#define RT_delayed_runq_add (1<<__RT_delayed_runq_add)
> +
> +/*
> + * Debug only. Used to printout debug information
> + */
> +#define printtime()\
> +        ({s_time_t now = NOW(); \
> +          printk("%u : %3ld.%3ldus : %-19s ",\
> +          smp_processor_id(), now/MICROSECS(1), now%MICROSECS(1)/1000, __func__);} )
> +
> +/*
> + * System-wide private data, including a global RunQueue
> + * The global lock is referenced by schedule_data.schedule_lock from all physical cpus.
> + * It can be grabbed via vcpu_schedule_lock_irq()
> + */
> +struct rt_private {
> +    spinlock_t lock;        /* The global coarse-grained lock */
> +    struct list_head sdom;  /* list of available domains, used for dump */
> +    struct list_head runq;  /* Ordered list of runnable VMs */
> +    cpumask_t cpus;         /* cpumask_t of available physical cpus */
> +    cpumask_t tickled;      /* another cpu in the queue already ticked this one */
> +};
> +
> +/*
> + * Virtual CPU
> + */
> +struct rt_vcpu {
> +    struct list_head runq_elem; /* On the runqueue list */
> +    struct list_head sdom_elem; /* On the domain VCPU list */
> +
> +    /* Up-pointers */
> +    struct rt_dom *sdom;
> +    struct vcpu *vcpu;
> +
> +    /* VCPU parameters, in milliseconds */
> +    s_time_t period;
> +    s_time_t budget;
> +
> +    /* VCPU current information in nanoseconds */
> +    long cur_budget;             /* current budget */
> +    s_time_t last_start;        /* last start time */
> +    s_time_t cur_deadline;      /* current deadline for EDF */
> +
> +    unsigned flags;             /* mark __RT_scheduled, etc.. */
> +};
> +
> +/*
> + * Domain
> + */
> +struct rt_dom {
> +    struct list_head vcpu;      /* link its VCPUs */
> +    struct list_head sdom_elem; /* link list on rt_priv */
> +    struct domain *dom;         /* pointer to upper domain */
> +};
> +
> +/*
> + * RunQueue helper functions
> + */
> +static int
> +__vcpu_on_runq(struct rt_vcpu *svc)
> +{
> +   return !list_empty(&svc->runq_elem);
> +}
> +
> +static struct rt_vcpu *
> +__runq_elem(struct list_head *elem)
> +{
> +    return list_entry(elem, struct rt_vcpu, runq_elem);
> +}
> +
> +/*
> + * Debug related code, dump vcpu/cpu information
> + */
> +static void
> +rt_dump_vcpu(struct rt_vcpu *svc)

const struct rt_vcpu *svc, for added safety.  (In the past we have had a
dump function which accidentally clobbered some of the state it was
supposed to be reading)

> +{
> +    char *cpustr = keyhandler_scratch;

Xen style - newline between declarations and code.

The keyhandler_scratch is only safe to use in keyhandlers, yet your dump
functions are based on scheduling operations.  You risk concurrent
access with other dump functions and with keyhandlers.

> +    if ( svc == NULL ) 
> +    {
> +        printk("NULL!\n");
> +        return;
> +    }

This is quite a useless message on its own.  Is it reasonable for svc to
ever be NULL here?

> +    cpumask_scnprintf(cpustr, sizeof(cpustr), svc->vcpu->cpu_hard_affinity);

Buggy sizeof.  sizeof(cpustr) is the size of a pointer, where I suspect you
mean sizeof(keyhandler_scratch), which is 1024.
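
A local buffer sidesteps both issues (sketch):

    char cpustr[1024];  /* private to this function, no clash with keyhandlers */

    cpumask_scnprintf(cpustr, sizeof(cpustr), svc->vcpu->cpu_hard_affinity);

where sizeof(cpustr) now really is the buffer size.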

> +    printk("[%5d.%-2d] cpu %d, (%"PRId64", %"PRId64"), cur_b=%"PRId64" cur_d=%"PRId64" last_start=%"PRId64" onR=%d runnable=%d cpu_hard_affinity=%s ",
> +            svc->vcpu->domain->domain_id,
> +            svc->vcpu->vcpu_id,
> +            svc->vcpu->processor,

vcpu_id and processor are both unsigned quantities, so should be
formatted with %u rather than %d.

> +            svc->period,
> +            svc->budget,
> +            svc->cur_budget,
> +            svc->cur_deadline,
> +            svc->last_start,
> +            __vcpu_on_runq(svc),
> +            vcpu_runnable(svc->vcpu),
> +            cpustr);
> +    memset(cpustr, 0, sizeof(char)*1024);

This memset is quite pointless, and buggy if keyhandler_scratch changes
length.  cpumask_scnprintf() properly NUL-terminates its string.

> +    cpumask_scnprintf(cpustr, sizeof(cpustr), cpupool_scheduler_cpumask(svc->vcpu->domain->cpupool));

Same buggy sizeof.

> +    printk("cpupool=%s\n", cpustr);
> +}
> +
> +static void
> +rt_dump_pcpu(const struct scheduler *ops, int cpu)
> +{
> +    struct rt_vcpu *svc = RT_VCPU(curr_on_cpu(cpu));
> +
> +    printtime();
> +    rt_dump_vcpu(svc);
> +}
> +
> +/*
> + * should not need a lock here; only showing stuff
> + */
> +static void
> +rt_dump(const struct scheduler *ops)
> +{
> +    struct list_head *iter_sdom, *iter_svc, *runq, *iter;
> +    struct rt_private *prv = RT_PRIV(ops);
> +    struct rt_vcpu *svc;
> +    int cpu = 0;
> +    int loop = 0;

Both of these should be unsigned.

> +
> +    printtime();
> +    printk("Priority Scheme: EDF\n");
> +
> +    printk("PCPU info: \n");

Please remove these trailing spaces before newlines.  All they do is
waste space in the console ring.

> +    for_each_cpu(cpu, &prv->cpus) 
> +        rt_dump_pcpu(ops, cpu);
> +
> +    printk("Global RunQueue info: \n");
> +    loop = 0;
> +    runq = RUNQ(ops);
> +    list_for_each( iter, runq ) 
> +    {
> +        svc = __runq_elem(iter);
> +        printk("\t%3d: ", ++loop);
> +        rt_dump_vcpu(svc);
> +    }
> +
> +    printk("Domain info: \n");
> +    loop = 0;
> +    list_for_each( iter_sdom, &prv->sdom ) 
> +    {
> +        struct rt_dom *sdom;
> +        sdom = list_entry(iter_sdom, struct rt_dom, sdom_elem);
> +        printk("\tdomain: %d\n", sdom->dom->domain_id);
> +
> +        list_for_each( iter_svc, &sdom->vcpu ) 
> +        {
> +            svc = list_entry(iter_svc, struct rt_vcpu, sdom_elem);
> +            printk("\t\t%3d: ", ++loop);
> +            rt_dump_vcpu(svc);
> +        }
> +    }
> +
> +    printk("\n");
> +}
> +
> +static inline void
> +__runq_remove(struct rt_vcpu *svc)
> +{
> +    if ( __vcpu_on_runq(svc) )
> +        list_del_init(&svc->runq_elem);
> +}
> +
> +/*
> + * Insert a vcpu in the RunQ based on vcpus' deadline: 
> + * EDF schedule policy: vcpu with smaller deadline has higher priority;
> + * The vcpu svc will be inserted just before the very first
> + * vcpu iter_svc in the Runqueue whose deadline is equal to or larger than
> + * svc's deadline.
> + */
> +static void
> +__runq_insert(const struct scheduler *ops, struct rt_vcpu *svc)
> +{
> +    struct list_head *runq = RUNQ(ops);
> +    struct list_head *iter;
> +    spinlock_t *schedule_lock;
> +    
> +    schedule_lock = per_cpu(schedule_data, svc->vcpu->processor).schedule_lock;
> +    ASSERT( spin_is_locked(schedule_lock) );
> +    
> +    /* Debug only */
> +    if ( __vcpu_on_runq(svc) )
> +    {
> +        rt_dump(ops);
> +    }
> +    ASSERT( !__vcpu_on_runq(svc) );
> +
> +    list_for_each(iter, runq) 
> +    {
> +        struct rt_vcpu * iter_svc = __runq_elem(iter);
> +
> +        /* svc still has budget */
> +        if ( svc->cur_budget > 0 ) 
> +        {
> +            if ( iter_svc->cur_budget == 0 ||
> +                 svc->cur_deadline <= iter_svc->cur_deadline )
> +                    break;
> +        } 
> +        else 
> +        { /* svc has no budget */
> +            if ( iter_svc->cur_budget == 0 &&
> +                 svc->cur_deadline <= iter_svc->cur_deadline )
> +                    break;
> +        }
> +    }
> +
> +    list_add_tail(&svc->runq_elem, iter);
> +}
> +
> +/*
> + * Init/Free related code
> + */
> +static int
> +rt_init(struct scheduler *ops)
> +{
> +    struct rt_private *prv = xzalloc(struct rt_private);
> +
> +    if ( prv == NULL )
> +        return -ENOMEM;

Newline in here.

> +    ops->sched_data = prv;

Is it safe to set ops->sched_data with a half constructed rt_private?  I
suspect this wants to be the very last (non-debug) action in this function.
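
I.e. (sketch, debug printing left out):

    static int
    rt_init(struct scheduler *ops)
    {
        struct rt_private *prv = xzalloc(struct rt_private);

        if ( prv == NULL )
            return -ENOMEM;

        spin_lock_init(&prv->lock);
        INIT_LIST_HEAD(&prv->sdom);
        INIT_LIST_HEAD(&prv->runq);

        /* Publish only once fully constructed. */
        ops->sched_data = prv;

        return 0;
    }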

> +    spin_lock_init(&prv->lock);
> +    INIT_LIST_HEAD(&prv->sdom);
> +    INIT_LIST_HEAD(&prv->runq);
> +
> +    printtime();
> +    printk("\n");
> +
> +    return 0;
> +}
> +
> +static void
> +rt_deinit(const struct scheduler *ops)
> +{
> +    struct rt_private *prv = RT_PRIV(ops);
> +
> +    printtime();
> +    printk("\n");
> +    xfree(prv);
> +}
> +
> +/* 
> + * point per_cpu spinlock to the global system lock; all cpus share the same global system lock
> + */
> +static void *
> +rt_alloc_pdata(const struct scheduler *ops, int cpu)
> +{
> +    struct rt_private *prv = RT_PRIV(ops);
> +
> +    cpumask_set_cpu(cpu, &prv->cpus);
> +
> +    per_cpu(schedule_data, cpu).schedule_lock = &prv->lock;
> +
> +    printtime();
> +    printk("%s total cpus: %d", __FUNCTION__, cpumask_weight(&prv->cpus));

__FUNCTION__ is a gccism.  __func__ is a standard way of doing the same.

> +    /* same as credit2, not a bogus pointer */
> +    return (void *)1;
> +}
> +
> +static void
> +rt_free_pdata(const struct scheduler *ops, void *pcpu, int cpu)
> +{
> +    struct rt_private * prv = RT_PRIV(ops);
> +    cpumask_clear_cpu(cpu, &prv->cpus);
> +    printtime();
> +    printk("%s cpu=%d\n", __FUNCTION__, cpu);
> +}
> +
> +static void *
> +rt_alloc_domdata(const struct scheduler *ops, struct domain *dom)
> +{
> +    unsigned long flags;
> +    struct rt_dom *sdom;
> +    struct rt_private * prv = RT_PRIV(ops);
> +
> +    printtime();
> +    printk("dom=%d\n", dom->domain_id);
> +
> +    sdom = xzalloc(struct rt_dom);
> +    if ( sdom == NULL ) 
> +    {
> +        printk("%s, xzalloc failed\n", __func__);
> +        return NULL;
> +    }
> +
> +    INIT_LIST_HEAD(&sdom->vcpu);
> +    INIT_LIST_HEAD(&sdom->sdom_elem);
> +    sdom->dom = dom;
> +
> +    /* spinlock here to insert the dom */
> +    spin_lock_irqsave(&prv->lock, flags);
> +    list_add_tail(&sdom->sdom_elem, &(prv->sdom));
> +    spin_unlock_irqrestore(&prv->lock, flags);
> +
> +    return (void *)sdom;

Bogus cast.
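
I.e. simply

    return sdom;

as the conversion from an object pointer to void * is implicit in C.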

> +}
> +
> +static void
> +rt_free_domdata(const struct scheduler *ops, void *data)
> +{
> +    unsigned long flags;
> +    struct rt_dom *sdom = data;
> +    struct rt_private *prv = RT_PRIV(ops);
> +
> +    printtime();
> +    printk("dom=%d\n", sdom->dom->domain_id);
> +
> +    spin_lock_irqsave(&prv->lock, flags);
> +    list_del_init(&sdom->sdom_elem);
> +    spin_unlock_irqrestore(&prv->lock, flags);
> +    xfree(data);
> +}
> +
> +static int
> +rt_dom_init(const struct scheduler *ops, struct domain *dom)
> +{
> +    struct rt_dom *sdom;
> +
> +    printtime();
> +    printk("dom=%d\n", dom->domain_id);
> +
> +    /* IDLE Domain does not link on rt_private */
> +    if ( is_idle_domain(dom) ) 
> +        return 0;
> +
> +    sdom = rt_alloc_domdata(ops, dom);
> +    if ( sdom == NULL ) 
> +    {
> +        printk("%s, failed\n", __func__);
> +        return -ENOMEM;
> +    }
> +    dom->sched_priv = sdom;
> +
> +    return 0;
> +}
> +
> +static void
> +rt_dom_destroy(const struct scheduler *ops, struct domain *dom)
> +{
> +    printtime();
> +    printk("dom=%d\n", dom->domain_id);
> +
> +    rt_free_domdata(ops, RT_DOM(dom));
> +}
> +
> +static void *
> +rt_alloc_vdata(const struct scheduler *ops, struct vcpu *vc, void *dd)
> +{
> +    struct rt_vcpu *svc;
> +    s_time_t now = NOW();
> +    long count;
> +
> +    /* Allocate per-VCPU info */
> +    svc = xzalloc(struct rt_vcpu);
> +    if ( svc == NULL ) 
> +    {
> +        printk("%s, xzalloc failed\n", __func__);
> +        return NULL;
> +    }
> +
> +    INIT_LIST_HEAD(&svc->runq_elem);
> +    INIT_LIST_HEAD(&svc->sdom_elem);
> +    svc->flags = 0U;
> +    svc->sdom = dd;
> +    svc->vcpu = vc;
> +    svc->last_start = 0;            /* init last_start is 0 */
> +
> +    svc->period = RT_DEFAULT_PERIOD;
> +    if ( !is_idle_vcpu(vc) )
> +        svc->budget = RT_DEFAULT_BUDGET;
> +
> +    count = (now/MICROSECS(svc->period)) + 1;
> +    /* sync all VCPU's start time to 0 */
> +    svc->cur_deadline += count * MICROSECS(svc->period);
> +
> +    svc->cur_budget = svc->budget*1000; /* counting in microseconds level */
> +    /* Debug only: dump new vcpu's info */
> +    printtime();
> +    rt_dump_vcpu(svc);
> +
> +    return svc;
> +}
> +
> +static void
> +rt_free_vdata(const struct scheduler *ops, void *priv)
> +{
> +    struct rt_vcpu *svc = priv;
> +
> +    /* Debug only: dump freed vcpu's info */
> +    printtime();
> +    rt_dump_vcpu(svc);
> +    xfree(svc);
> +}
> +
> +/*
> + * TODO: Do we need to add vc to the new Runqueue?
> + * This function is called in sched_move_domain() in schedule.c
> + * When move a domain to a new cpupool, 
> + * may have to add vc to the Runqueue of the new cpupool
> + */
> +static void
> +rt_vcpu_insert(const struct scheduler *ops, struct vcpu *vc)
> +{
> +    struct rt_vcpu *svc = RT_VCPU(vc);
> +
> +    /* Debug only: dump info of vcpu to insert */
> +    printtime();
> +    rt_dump_vcpu(svc);
> +
> +    /* do not add idle vcpus to the dom vcpu list */
> +    if ( is_idle_vcpu(vc) )
> +        return;
> +
> +    list_add_tail(&svc->sdom_elem, &svc->sdom->vcpu);   /* add to dom vcpu list */
> +}
> +
> +/*
> + * TODO: same as rt_vcpu_insert()
> + */
> +static void
> +rt_vcpu_remove(const struct scheduler *ops, struct vcpu *vc)
> +{
> +    struct rt_vcpu * const svc = RT_VCPU(vc);
> +    struct rt_dom * const sdom = svc->sdom;
> +
> +    printtime();
> +    rt_dump_vcpu(svc);
> +
> +    BUG_ON( sdom == NULL );
> +    BUG_ON( __vcpu_on_runq(svc) );
> +
> +    if ( !is_idle_vcpu(vc) ) 
> +        list_del_init(&svc->sdom_elem);
> +}
> +
> +/* 
> + * Pick a valid CPU for the vcpu vc
> + * A vcpu's valid CPUs are the intersection of its affinity and the available cpus
> + */
> +static int
> +rt_cpu_pick(const struct scheduler *ops, struct vcpu *vc)
> +{
> +    cpumask_t cpus;
> +    cpumask_t *online;
> +    int cpu;
> +    struct rt_private * prv = RT_PRIV(ops);
> +
> +    online = cpupool_scheduler_cpumask(vc->domain->cpupool);
> +    cpumask_and(&cpus, &prv->cpus, online);
> +    cpumask_and(&cpus, &cpus, vc->cpu_hard_affinity);
> +
> +    cpu = cpumask_test_cpu(vc->processor, &cpus)
> +            ? vc->processor 
> +            : cpumask_cycle(vc->processor, &cpus);
> +    ASSERT( !cpumask_empty(&cpus) && cpumask_test_cpu(cpu, &cpus) );
> +
> +    return cpu;
> +}
> +
> +/*
> + * Burn budget at microsecond level. 
> + */
> +static void
> +burn_budgets(const struct scheduler *ops, struct rt_vcpu *svc, s_time_t now) 
> +{
> +    s_time_t delta;
> +    long count = 0;
> +
> +    /* don't burn budget for idle VCPU */
> +    if ( is_idle_vcpu(svc->vcpu) ) 
> +    {
> +        return;
> +    }
> +
> +    /* first time called for this svc, update last_start */
> +    if ( svc->last_start == 0 ) 
> +    {
> +        svc->last_start = now;
> +        return;
> +    }
> +
> +    /*
> +     * update deadline info: When deadline is in the past,
> +     * it need to be updated to the deadline of the current period,
> +     * and replenish the budget 
> +     */
> +    delta = now - svc->cur_deadline;
> +    if ( delta >= 0 ) 
> +    {
> +        count = ( delta/MICROSECS(svc->period) ) + 1;
> +        svc->cur_deadline += count * MICROSECS(svc->period);
> +        svc->cur_budget = svc->budget * 1000;
> +        return;
> +    }
> +
> +    /* burn at nanoseconds level */
> +    delta = now - svc->last_start;
> +    /* 
> +     * delta < 0 only happens in nested virtualization;
> +     * TODO: how should we handle delta < 0 in a better way? */
> +    if ( delta < 0 ) 
> +    {
> +        printk("%s, ATTENTION: now is behind last_start! delta = %ld for ",
> +                __func__, delta);
> +        rt_dump_vcpu(svc);
> +        svc->last_start = now;  /* update last_start */
> +        svc->cur_budget = 0;   /* FIXME: should we recover like this? */
> +        return;
> +    }
> +
> +    if ( svc->cur_budget == 0 ) 
> +        return;
> +
> +    svc->cur_budget -= delta;
> +    if ( svc->cur_budget < 0 ) 
> +        svc->cur_budget = 0;
> +}
> +
> +/* 
> + * RunQ is sorted. Pick the first vcpu within the cpumask; if none, return NULL
> + * lock is grabbed before calling this function 
> + */
> +static struct rt_vcpu *
> +__runq_pick(const struct scheduler *ops, cpumask_t mask)
> +{
> +    struct list_head *runq = RUNQ(ops);
> +    struct list_head *iter;
> +    struct rt_vcpu *svc = NULL;
> +    struct rt_vcpu *iter_svc = NULL;
> +    cpumask_t cpu_common;
> +    cpumask_t *online;
> +    struct rt_private * prv = RT_PRIV(ops);
> +
> +    list_for_each(iter, runq) 
> +    {
> +        iter_svc = __runq_elem(iter);
> +
> +        /* mask is intersection of cpu_hard_affinity and cpupool and priv->cpus */
> +        online = cpupool_scheduler_cpumask(iter_svc->vcpu->domain->cpupool);
> +        cpumask_and(&cpu_common, online, &prv->cpus);
> +        cpumask_and(&cpu_common, &cpu_common, iter_svc->vcpu->cpu_hard_affinity);
> +        cpumask_and(&cpu_common, &mask, &cpu_common);
> +        if ( cpumask_empty(&cpu_common) )
> +            continue;
> +
> +        if ( iter_svc->cur_budget <= 0 )
> +            continue;
> +
> +        svc = iter_svc;
> +        break;
> +    }
> +
> +    return svc;
> +}
> +
> +/*
> + * Update vcpus' budgets and keep the runq sorted by re-inserting modified vcpus
> + * lock is grabbed before calling this function 
> + */
> +static void
> +__repl_update(const struct scheduler *ops, s_time_t now)
> +{
> +    struct list_head *runq = RUNQ(ops);
> +    struct list_head *iter;
> +    struct list_head *tmp;
> +    struct rt_vcpu *svc = NULL;
> +
> +    s_time_t diff;
> +    long count;
> +
> +    list_for_each_safe(iter, tmp, runq) 
> +    {
> +        svc = __runq_elem(iter);
> +
> +        diff = now - svc->cur_deadline;
> +        if ( diff > 0 ) 
> +        {
> +            count = (diff/MICROSECS(svc->period)) + 1;
> +            svc->cur_deadline += count * MICROSECS(svc->period);
> +            svc->cur_budget = svc->budget * 1000;
> +            __runq_remove(svc);
> +            __runq_insert(ops, svc);
> +        }
> +    }
> +}
> +
> +/* 
> + * schedule function for rt scheduler.
> + * The lock is already grabbed in schedule.c, no need to lock here 
> + */
> +static struct task_slice
> +rt_schedule(const struct scheduler *ops, s_time_t now, bool_t tasklet_work_scheduled)
> +{
> +    const int cpu = smp_processor_id();
> +    struct rt_private * prv = RT_PRIV(ops);
> +    struct rt_vcpu * const scurr = RT_VCPU(current);
> +    struct rt_vcpu * snext = NULL;
> +    struct task_slice ret = { .migrated = 0 };
> +
> +    /* clear ticked bit now that we've been scheduled */
> +    if ( cpumask_test_cpu(cpu, &prv->tickled) )
> +        cpumask_clear_cpu(cpu, &prv->tickled);
> +
> +    /* burn_budget would return for IDLE VCPU */
> +    burn_budgets(ops, scurr, now);
> +
> +    __repl_update(ops, now);
> +
> +    if ( tasklet_work_scheduled ) 
> +    {
> +        snext = RT_VCPU(idle_vcpu[cpu]);
> +    } 
> +    else 
> +    {
> +        cpumask_t cur_cpu;
> +        cpumask_clear(&cur_cpu);
> +        cpumask_set_cpu(cpu, &cur_cpu);
> +        snext = __runq_pick(ops, cur_cpu);
> +        if ( snext == NULL )
> +            snext = RT_VCPU(idle_vcpu[cpu]);
> +
> +        /* if scurr has higher priority and budget, still pick scurr */
> +        if ( !is_idle_vcpu(current) &&
> +             vcpu_runnable(current) &&
> +             scurr->cur_budget > 0 &&
> +             ( is_idle_vcpu(snext->vcpu) ||
> +               scurr->cur_deadline <= snext->cur_deadline ) ) 
> +            snext = scurr;
> +    }
> +
> +    if ( snext != scurr &&
> +         !is_idle_vcpu(current) &&
> +         vcpu_runnable(current) )
> +        set_bit(__RT_delayed_runq_add, &scurr->flags);
> +    
> +
> +    snext->last_start = now;
> +    if ( !is_idle_vcpu(snext->vcpu) ) 
> +    {
> +        if ( snext != scurr ) 
> +        {
> +            __runq_remove(snext);
> +            set_bit(__RT_scheduled, &snext->flags);
> +        }
> +        if ( snext->vcpu->processor != cpu ) 
> +        {
> +            snext->vcpu->processor = cpu;
> +            ret.migrated = 1;
> +        }
> +    }
> +
> +    ret.time = MILLISECS(1); /* sched quantum */
> +    ret.task = snext->vcpu;
> +
> +    return ret;
> +}
> +
> +/*
> + * Remove VCPU from RunQ
> + * The lock is already grabbed in schedule.c, no need to lock here 
> + */
> +static void
> +rt_vcpu_sleep(const struct scheduler *ops, struct vcpu *vc)
> +{
> +    struct rt_vcpu * const svc = RT_VCPU(vc);
> +
> +    BUG_ON( is_idle_vcpu(vc) );
> +
> +    if ( curr_on_cpu(vc->processor) == vc ) 
> +    {
> +        cpu_raise_softirq(vc->processor, SCHEDULE_SOFTIRQ);
> +        return;
> +    }
> +
> +    if ( __vcpu_on_runq(svc) ) 
> +    {
> +        __runq_remove(svc);
> +        printk("%s: vcpu should not on runq in vcpu_sleep()\n", __FUNCTION__);
> +        BUG();
> +    }
> +
> +    clear_bit(__RT_delayed_runq_add, &svc->flags);
> +}
> +
> +/*
> + * Pick a vcpu on a cpu to kick out, to make room for the running candidate
> + * Called by wake() and context_saved()
> + * We have a running candidate here, the kick logic is:
> + * Among all the cpus that are within the cpu affinity
> + * 1) if the new->cpu is idle, kick it. This could benefit cache hit
> + * 2) if there are any idle vcpu, kick it.
> + * 3) now all pcpus are busy, among all the running vcpus, pick lowest priority one
> + *    if snext has higher priority, kick it.
> + *
> + * TODO:
> + * 1) what if these two vcpus belong to the same domain?
> + *    replacing a vcpu of the same domain introduces more overhead
> + *
> + * lock is grabbed before calling this function 
> + */
> +static void
> +runq_tickle(const struct scheduler *ops, struct rt_vcpu *new)
> +{
> +    struct rt_private * prv = RT_PRIV(ops);
> +    struct rt_vcpu * latest_deadline_vcpu = NULL;    /* lowest priority scheduled */
> +    struct rt_vcpu * iter_svc;
> +    struct vcpu * iter_vc;
> +    int cpu = 0;
> +    cpumask_t not_tickled;
> +    cpumask_t *online;
> +
> +    if ( new == NULL || is_idle_vcpu(new->vcpu) ) 
> +        return;
> +
> +    online = cpupool_scheduler_cpumask(new->vcpu->domain->cpupool);
> +    cpumask_and(&not_tickled, online, &prv->cpus);
> +    cpumask_and(&not_tickled, &not_tickled, new->vcpu->cpu_hard_affinity);
> +    cpumask_andnot(&not_tickled, &not_tickled, &prv->tickled);
> +
> +    /* 1) if new's previous cpu is idle, kick it for cache benefit */
> +    if ( is_idle_vcpu(curr_on_cpu(new->vcpu->processor)) ) 
> +    {
> +        cpumask_set_cpu(new->vcpu->processor, &prv->tickled);
> +        cpu_raise_softirq(new->vcpu->processor, SCHEDULE_SOFTIRQ);
> +        return;
> +    }
> +
> +    /* 2) if there are any idle pcpu, kick it */
> +    /* The same loop also find the one with lowest priority */
> +    for_each_cpu(cpu, &not_tickled) 
> +    {
> +        iter_vc = curr_on_cpu(cpu);
> +        if ( is_idle_vcpu(iter_vc) ) 
> +        {
> +            cpumask_set_cpu(cpu, &prv->tickled);
> +            cpu_raise_softirq(cpu, SCHEDULE_SOFTIRQ);
> +            return;
> +        }
> +        iter_svc = RT_VCPU(iter_vc);
> +        if ( latest_deadline_vcpu == NULL || 
> +             iter_svc->cur_deadline > latest_deadline_vcpu->cur_deadline )
> +            latest_deadline_vcpu = iter_svc;
> +    }
> +
> +    /* 3) candidate has higher priority, kick out the lowest priority vcpu */
> +    if ( latest_deadline_vcpu != NULL && new->cur_deadline < latest_deadline_vcpu->cur_deadline ) 
> +    {
> +        cpumask_set_cpu(latest_deadline_vcpu->vcpu->processor, &prv->tickled);
> +        cpu_raise_softirq(latest_deadline_vcpu->vcpu->processor, SCHEDULE_SOFTIRQ);
> +    }
> +    return;
> +}
> +
> +/* 
> + * Should always wake up a runnable vcpu and put it back on the RunQ.
> + * Check priority to decide whether to raise a scheduling softirq
> + * The lock is already grabbed in schedule.c, no need to lock here 
> + * TODO: what if these two vcpus belong to the same domain?
> + */
> +static void
> +rt_vcpu_wake(const struct scheduler *ops, struct vcpu *vc)
> +{
> +    struct rt_vcpu * const svc = RT_VCPU(vc);
> +    s_time_t diff;
> +    s_time_t now = NOW();
> +    long count = 0;
> +    struct rt_private * prv = RT_PRIV(ops);
> +    struct rt_vcpu * snext = NULL;        /* highest priority on RunQ */
> +
> +    BUG_ON( is_idle_vcpu(vc) );
> +
> +    if ( unlikely(curr_on_cpu(vc->processor) == vc) ) 
> +        return;
> +
> +    /* on RunQ, just update info is ok */
> +    if ( unlikely(__vcpu_on_runq(svc)) ) 
> +        return;
> +
> +    /* If context hasn't been saved for this vcpu yet, we can't put it on
> +     * the Runqueue. Instead, we set a flag so that it will be put on the Runqueue
> +     * after the context has been saved. */
> +    if ( unlikely(test_bit(__RT_scheduled, &svc->flags)) ) 
> +    {
> +        set_bit(__RT_delayed_runq_add, &svc->flags);
> +        return;
> +    }
> +
> +    /* update deadline info */
> +    diff = now - svc->cur_deadline;
> +    if ( diff >= 0 ) 
> +    {
> +        count = ( diff/MICROSECS(svc->period) ) + 1;
> +        svc->cur_deadline += count * MICROSECS(svc->period);
> +        svc->cur_budget = svc->budget * 1000;
> +    }
> +
> +    __runq_insert(ops, svc);
> +    __repl_update(ops, now);
> +    snext = __runq_pick(ops, prv->cpus);    /* pick snext from ALL valid cpus */
> +    runq_tickle(ops, snext);
> +
> +    return;
> +}
> +
> +/* 
> + * scurr has finished context switch, insert it back to the RunQ,
> + * and then pick the highest priority vcpu from runq to run 
> + */
> +static void
> +rt_context_saved(const struct scheduler *ops, struct vcpu *vc)
> +{
> +    struct rt_vcpu * svc = RT_VCPU(vc);
> +    struct rt_vcpu * snext = NULL;
> +    struct rt_private * prv = RT_PRIV(ops);
> +    spinlock_t *lock = vcpu_schedule_lock_irq(vc);
> +
> +    clear_bit(__RT_scheduled, &svc->flags);
> +    /* do not insert idle vcpus into the runq */
> +    if ( is_idle_vcpu(vc) ) 
> +        goto out;
> +
> +    if ( test_and_clear_bit(__RT_delayed_runq_add, &svc->flags) && 
> +         likely(vcpu_runnable(vc)) ) 
> +    {
> +        __runq_insert(ops, svc);
> +        __repl_update(ops, NOW());
> +        snext = __runq_pick(ops, prv->cpus);    /* pick snext from ALL cpus */
> +        runq_tickle(ops, snext);
> +    }
> +out:
> +    vcpu_schedule_unlock_irq(lock, vc);
> +}
> +
> +/*
> + * set/get each vcpu info of each domain
> + */
> +static int
> +rt_dom_cntl(
> +    const struct scheduler *ops, 
> +    struct domain *d, 
> +    struct xen_domctl_scheduler_op *op)
> +{
> +    xen_domctl_sched_rt_params_t *local_sched;
> +    struct rt_dom * const sdom = RT_DOM(d);
> +    struct list_head *iter;
> +    int vcpu_index = 0;
> +    int rc = -EINVAL;
> +
> +    switch ( op->cmd )
> +    {
> +    case XEN_DOMCTL_SCHEDOP_getnumvcpus:
> +        op->u.rt.nr_vcpus = 0;
> +        list_for_each( iter, &sdom->vcpu ) 
> +            vcpu_index++;
> +        op->u.rt.nr_vcpus = vcpu_index;
> +        rc = 0;
> +        break;
> +    case XEN_DOMCTL_SCHEDOP_getinfo:
> +        /* for debug use, whenever adjust Dom0 parameter, do global dump */
> +        if ( d->domain_id == 0 ) 
> +            rt_dump(ops);
> +
> +        op->u.rt.nr_vcpus = 0;
> +        list_for_each( iter, &sdom->vcpu ) 
> +            vcpu_index++;
> +        op->u.rt.nr_vcpus = vcpu_index;
> +        local_sched = xzalloc_array(xen_domctl_sched_rt_params_t, vcpu_index);
> +        vcpu_index = 0;
> +        list_for_each( iter, &sdom->vcpu ) 
> +        {
> +            struct rt_vcpu * svc = list_entry(iter, struct rt_vcpu, sdom_elem);
> +
> +            local_sched[vcpu_index].budget = svc->budget;
> +            local_sched[vcpu_index].period = svc->period;
> +            local_sched[vcpu_index].index = vcpu_index;
> +            vcpu_index++;
> +        }
> +        copy_to_guest(op->u.rt.vcpu, local_sched, vcpu_index);

This will clobber guest heap if vcpu_index is greater than the allocated
space.  You also unconditionally leak local_sched, but there is no need
for an allocation in the first place.
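
Copying each element as you walk the list avoids the allocation entirely,
and lets you bound the copy by the space the caller advertised
(untested sketch; error handling illustrative):

    vcpu_index = 0;
    list_for_each( iter, &sdom->vcpu )
    {
        struct rt_vcpu *svc = list_entry(iter, struct rt_vcpu, sdom_elem);
        xen_domctl_sched_rt_params_t local = {
            .period = svc->period,
            .budget = svc->budget,
            .index  = vcpu_index,
        };

        if ( vcpu_index >= op->u.rt.nr_vcpus )
            return -EINVAL;  /* guest buffer too small */
        if ( copy_to_guest_offset(op->u.rt.vcpu, vcpu_index, &local, 1) )
            return -EFAULT;

        vcpu_index++;
    }
    op->u.rt.nr_vcpus = vcpu_index;

Note this assumes op->u.rt.nr_vcpus holds, on input, the number of elements
the caller allocated; the hypercall interface would need to guarantee that.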

> +        rc = 0;
> +        break;
> +    case XEN_DOMCTL_SCHEDOP_putinfo:
> +        list_for_each( iter, &sdom->vcpu ) 
> +        {
> +            struct rt_vcpu * svc = list_entry(iter, struct rt_vcpu, sdom_elem);
> +
> +            /* adjust per VCPU parameter */
> +            if ( op->u.rt.vcpu_index == svc->vcpu->vcpu_id ) 
> +            { 
> +                vcpu_index = op->u.rt.vcpu_index;
> +
> +                if ( vcpu_index < 0 ) 
> +                    printk("XEN_DOMCTL_SCHEDOP_putinfo: vcpu_index=%d\n",
> +                            vcpu_index);
> +                else
> +                    printk("XEN_DOMCTL_SCHEDOP_putinfo: "
> +                            "vcpu_index=%d, period=%"PRId64", budget=%"PRId64"\n",
> +                            vcpu_index, op->u.rt.period, op->u.rt.budget);
> +
> +                svc->period = op->u.rt.period;
> +                svc->budget = op->u.rt.budget;
> +
> +                break;
> +            }
> +        }
> +        rc = 0;
> +        break;
> +    }
> +
> +    return rc;
> +}
> +
> +static struct rt_private _rt_priv;
> +
> +const struct scheduler sched_rt_def = {
> +    .name           = "SMP RT Scheduler",
> +    .opt_name       = "rt",

Should these names reflect RT_DS as opposed to simply RT?

> +    .sched_id       = XEN_SCHEDULER_RT_DS,
> +    .sched_data     = &_rt_priv,
> +
> +    .dump_cpu_state = rt_dump_pcpu,
> +    .dump_settings  = rt_dump,
> +    .init           = rt_init,
> +    .deinit         = rt_deinit,
> +    .alloc_pdata    = rt_alloc_pdata,
> +    .free_pdata     = rt_free_pdata,
> +    .alloc_domdata  = rt_alloc_domdata,
> +    .free_domdata   = rt_free_domdata,
> +    .init_domain    = rt_dom_init,
> +    .destroy_domain = rt_dom_destroy,
> +    .alloc_vdata    = rt_alloc_vdata,
> +    .free_vdata     = rt_free_vdata,
> +    .insert_vcpu    = rt_vcpu_insert,
> +    .remove_vcpu    = rt_vcpu_remove,
> +
> +    .adjust         = rt_dom_cntl,
> +
> +    .pick_cpu       = rt_cpu_pick,
> +    .do_schedule    = rt_schedule,
> +    .sleep          = rt_vcpu_sleep,
> +    .wake           = rt_vcpu_wake,
> +    .context_saved  = rt_context_saved,
> +};
> diff --git a/xen/common/schedule.c b/xen/common/schedule.c
> index e9eb0bc..f2df400 100644
> --- a/xen/common/schedule.c
> +++ b/xen/common/schedule.c
> @@ -68,6 +68,7 @@ static const struct scheduler *schedulers[] = {
>      &sched_sedf_def,
>      &sched_credit_def,
>      &sched_credit2_def,
> +    &sched_rt_def,
>      &sched_arinc653_def,
>  };
>  
> @@ -1092,7 +1093,8 @@ long sched_adjust(struct domain *d, struct xen_domctl_scheduler_op *op)
>  
>      if ( (op->sched_id != DOM2OP(d)->sched_id) ||
>           ((op->cmd != XEN_DOMCTL_SCHEDOP_putinfo) &&
> -          (op->cmd != XEN_DOMCTL_SCHEDOP_getinfo)) )
> +          (op->cmd != XEN_DOMCTL_SCHEDOP_getinfo) &&
> +          (op->cmd != XEN_DOMCTL_SCHEDOP_getnumvcpus)) )
>          return -EINVAL;
>  
>      /* NB: the pluggable scheduler code needs to take care
> diff --git a/xen/include/public/domctl.h b/xen/include/public/domctl.h
> index 5b11bbf..8d4b973 100644
> --- a/xen/include/public/domctl.h
> +++ b/xen/include/public/domctl.h
> @@ -339,6 +339,18 @@ struct xen_domctl_max_vcpus {
>  typedef struct xen_domctl_max_vcpus xen_domctl_max_vcpus_t;
>  DEFINE_XEN_GUEST_HANDLE(xen_domctl_max_vcpus_t);
>  
> +/*
> + * This structure is used to pass to rt scheduler from a 
> + * privileged domain to Xen
> + */
> +struct xen_domctl_sched_rt_params {
> +    /* get vcpus' info */
> +    int64_t period; /* s_time_t type */
> +    int64_t budget;
> +    int     index;

Index is clearly an unsigned quantity.  For alignment and compatibility,
uint64_t would make the most sense.  Alternatively, uint32_t and an
explicit uint32_t pad field.

~Andrew

> +};
> +typedef struct xen_domctl_sched_rt_params xen_domctl_sched_rt_params_t;
> +DEFINE_XEN_GUEST_HANDLE(xen_domctl_sched_rt_params_t);
>  
>  /* XEN_DOMCTL_scheduler_op */
>  /* Scheduler types. */
> @@ -346,9 +358,12 @@ DEFINE_XEN_GUEST_HANDLE(xen_domctl_max_vcpus_t);
>  #define XEN_SCHEDULER_CREDIT   5
>  #define XEN_SCHEDULER_CREDIT2  6
>  #define XEN_SCHEDULER_ARINC653 7
> +#define XEN_SCHEDULER_RT_DS    8
> +
>  /* Set or get info? */
> -#define XEN_DOMCTL_SCHEDOP_putinfo 0
> -#define XEN_DOMCTL_SCHEDOP_getinfo 1
> +#define XEN_DOMCTL_SCHEDOP_putinfo      0
> +#define XEN_DOMCTL_SCHEDOP_getinfo      1
> +#define XEN_DOMCTL_SCHEDOP_getnumvcpus   2
>  struct xen_domctl_scheduler_op {
>      uint32_t sched_id;  /* XEN_SCHEDULER_* */
>      uint32_t cmd;       /* XEN_DOMCTL_SCHEDOP_* */
> @@ -367,6 +382,15 @@ struct xen_domctl_scheduler_op {
>          struct xen_domctl_sched_credit2 {
>              uint16_t weight;
>          } credit2;
> +        struct xen_domctl_sched_rt{
> +            /* get vcpus' params */
> +            XEN_GUEST_HANDLE_64(xen_domctl_sched_rt_params_t) vcpu;
> +            uint16_t nr_vcpus;
> +            /* set one vcpu's params */
> +            uint16_t vcpu_index;
> +            int64_t period;
> +            int64_t budget;
> +        } rt;
>      } u;
>  };
>  typedef struct xen_domctl_scheduler_op xen_domctl_scheduler_op_t;
> diff --git a/xen/include/xen/sched-if.h b/xen/include/xen/sched-if.h
> index 4164dff..bcbe234 100644
> --- a/xen/include/xen/sched-if.h
> +++ b/xen/include/xen/sched-if.h
> @@ -169,6 +169,7 @@ extern const struct scheduler sched_sedf_def;
>  extern const struct scheduler sched_credit_def;
>  extern const struct scheduler sched_credit2_def;
>  extern const struct scheduler sched_arinc653_def;
> +extern const struct scheduler sched_rt_def;
>  
>  
>  struct cpupool

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [RFC v2 1/4] xen: add real time scheduler rt
  2014-07-29  6:52   ` Jan Beulich
  2014-07-29  9:42     ` Dario Faggioli
@ 2014-07-30 15:55     ` Meng Xu
  1 sibling, 0 replies; 13+ messages in thread
From: Meng Xu @ 2014-07-30 15:55 UTC (permalink / raw)
  To: Jan Beulich
  Cc: ian.campbell, xisisu, stefano.stabellini, george.dunlap,
	ian.jackson, xen-devel, Meng Xu, lichong659, dgolomb


[-- Attachment #1.1: Type: text/plain, Size: 2386 bytes --]

Hi Jan,

Jan Beulich <JBeulich@suse.com> wrote on Tuesday, 29 July 2014:

> >>> On 29.07.14 at 03:52, <mengxu@cis.upenn.edu> wrote:
> > --- a/xen/common/schedule.c
> > +++ b/xen/common/schedule.c
> > @@ -68,6 +68,7 @@ static const struct scheduler *schedulers[] = {
> >      &sched_sedf_def,
> >      &sched_credit_def,
> >      &sched_credit2_def,
> > +    &sched_rt_def,
> >      &sched_arinc653_def,
> >  };
>
> Is the insertion as other than last item (as one would expect for a
> new addition) intentional? Not that I think this matters much, but
> I'm still curious.
>
>
It's not intentional. I can move it to the last item.


> > --- a/xen/include/public/domctl.h
> > +++ b/xen/include/public/domctl.h
> > @@ -339,6 +339,18 @@ struct xen_domctl_max_vcpus {
> >  typedef struct xen_domctl_max_vcpus xen_domctl_max_vcpus_t;
> >  DEFINE_XEN_GUEST_HANDLE(xen_domctl_max_vcpus_t);
> >
> > +/*
> > + * This structure is used to pass to rt scheduler from a
> > + * privileged domain to Xen
> > + */
> > +struct xen_domctl_sched_rt_params {
> > +    /* get vcpus' info */
> > +    int64_t period; /* s_time_t type */
> > +    int64_t budget;
> > +    int     index;
>
> Are all these really meaningfully signed quantities?
>
>
Dario explained this. I will change it to an unsigned 64-bit type. We need
signed values inside Xen, but not in the tool stack.



> Also, you need to pad the structure to a multiple of 8 bytes, or
> its layout will differ between 32- and 64-bit (tool stack) callers.


> > @@ -367,6 +382,15 @@ struct xen_domctl_scheduler_op {
> >          struct xen_domctl_sched_credit2 {
> >              uint16_t weight;
> >          } credit2;
> > +        struct xen_domctl_sched_rt{
> > +            /* get vcpus' params */
> > +            XEN_GUEST_HANDLE_64(xen_domctl_sched_rt_params_t) vcpu;
> > +            uint16_t nr_vcpus;
> > +            /* set one vcpu's params */
> > +            uint16_t vcpu_index;
> > +            int64_t period;
> > +            int64_t budget;
> > +        } rt;
>
> Mostly the same comments here, just that the padding here needs to
> go in the middle.
>

Roger. Will pad it.  Thank you very much for your comments and suggestions!

Meng


-- 


-----------
Meng Xu
PhD Student in Computer and Information Science
University of Pennsylvania


^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH RFC v2 1/4] xen: add real time scheduler rt
  2014-07-29 10:36   ` [PATCH RFC " Andrew Cooper
@ 2014-08-02 15:31     ` Meng Xu
  2014-08-04 10:23       ` Andrew Cooper
  0 siblings, 1 reply; 13+ messages in thread
From: Meng Xu @ 2014-08-02 15:31 UTC (permalink / raw)
  To: Andrew Cooper
  Cc: Ian Campbell, Sisu Xi, Stefano Stabellini, George Dunlap,
	Ian Jackson, xen-devel, Meng Xu, Jan Beulich, Chong Li,
	Dagaen Golomb


Hi Andrew,

Thank you very much for your comments!

2014-07-29 18:36 GMT+08:00 Andrew Cooper <andrew.cooper3@citrix.com>:

> On 29/07/14 02:52, Meng Xu wrote:
> > This scheduler follows the pre-emptive Global EDF theory in real-time
> field.
> > Each VCPU can have a dedicated period and budget.
> > While scheduled, a VCPU burns its budget.
> > A VCPU has its budget replenished at the beginning of each of its
> periods;
> > The VCPU discards its unused budget at the end of each of its periods.
> > If a VCPU runs out of budget in a period, it has to wait until next
> period.
> > The mechanism of how to burn a VCPU's budget depends on the server
> mechanism
> > implemented for each VCPU.
> >
> > Server mechanism: a VCPU is implemented as a deferable server.
> > When a VCPU is scheduled to execute on a PCPU, its budget is continuously
> > burned.
> >
> > Priority scheme: Preemptive Global Earliest Deadline First (gEDF).
> > At any scheduling point, the VCPU with earliest deadline has highest
> > priority.
> >
> > Queue scheme: A global Runqueue for each CPU pool.
> > The Runqueue holds all runnable VCPUs.
> > VCPUs in the Runqueue are divided into two parts: with and without
> budget.
> > At each part, VCPUs are sorted based on gEDF priority scheme.
> >
> > Scheduling quantum: 1 ms;
> >
> > Note: cpumask and cpupool is supported.
> >
> > This is still in the development phase.
> >
> > Signed-off-by: Sisu Xi <xisisu@gmail.com>
> > Signed-off-by: Meng Xu <mengxu@cis.upenn.edu>
> > ---
> >  xen/common/Makefile         |    1 +
> >  xen/common/sched_rt.c       | 1058
> +++++++++++++++++++++++++++++++++++++++++++
> >  xen/common/schedule.c       |    4 +-
> >  xen/include/public/domctl.h |   28 +-
> >  xen/include/xen/sched-if.h  |    1 +
> >  5 files changed, 1089 insertions(+), 3 deletions(-)
> >  create mode 100644 xen/common/sched_rt.c
> >
> > diff --git a/xen/common/Makefile b/xen/common/Makefile
> > index 3683ae3..5a23aa4 100644
> > --- a/xen/common/Makefile
> > +++ b/xen/common/Makefile
> > @@ -26,6 +26,7 @@ obj-y += sched_credit.o
> >  obj-y += sched_credit2.o
> >  obj-y += sched_sedf.o
> >  obj-y += sched_arinc653.o
> > +obj-y += sched_rt.o
> >  obj-y += schedule.o
> >  obj-y += shutdown.o
> >  obj-y += softirq.o
> > diff --git a/xen/common/sched_rt.c b/xen/common/sched_rt.c
> > new file mode 100644
> > index 0000000..6cfbb8a
> > --- /dev/null
> > +++ b/xen/common/sched_rt.c
> > @@ -0,0 +1,1058 @@
> >
> +/******************************************************************************
> > + * Preemptive Global Earliest Deadline First  (EDF) scheduler for Xen
> > + * EDF scheduling is one of most popular real-time scheduling algorithm
> used in
> > + * embedded field.
> > + *
> > + * by Sisu Xi, 2013, Washington University in Saint Louis
> > + * and Meng Xu, 2014, University of Pennsylvania
> > + *
> > + * based on the code of credit Scheduler
> > + */
> > +
> > +#include <xen/config.h>
> > +#include <xen/init.h>
> > +#include <xen/lib.h>
> > +#include <xen/sched.h>
> > +#include <xen/domain.h>
> > +#include <xen/delay.h>
> > +#include <xen/event.h>
> > +#include <xen/time.h>
> > +#include <xen/perfc.h>
> > +#include <xen/sched-if.h>
> > +#include <xen/softirq.h>
> > +#include <asm/atomic.h>
> > +#include <xen/errno.h>
> > +#include <xen/trace.h>
> > +#include <xen/cpu.h>
> > +#include <xen/keyhandler.h>
> > +#include <xen/trace.h>
> > +#include <xen/guest_access.h>
> > +
> > +/*
> > + * TODO:
> > + *
> > + * Migration compensation and resist like credit2 to better use cache;
> > + * Lock Holder Problem, using yield?
> > + * Self switch problem: VCPUs of the same domain may preempt each other;
> > + */
> > +
> > +/*
> > + * Design:
> > + *
> > + * This scheduler follows the Preemptive Global EDF theory in real-time
> field.
> > + * Each VCPU can have a dedicated period and budget.
> > + * While scheduled, a VCPU burns its budget.
> > + * A VCPU has its budget replenished at the beginning of each of its
> periods;
> > + * The VCPU discards its unused budget at the end of each of its
> periods.
> > + * If a VCPU runs out of budget in a period, it has to wait until next
> period.
> > + * The mechanism of how to burn a VCPU's budget depends on the server
> mechanism
> > + * implemented for each VCPU.
> > + *
> > + * Server mechanism: a VCPU is implemented as a deferable server.
> > + * When a VCPU has a task running on it, its budget is continuously
> burned;
> > + * When a VCPU has no task but with budget left, its budget is
> preserved.
> > + *
> > + * Priority scheme: Preemptive Global Earliest Deadline First (gEDF).
> > + * At any scheduling point, the VCPU with earliest deadline has highest
> priority.
> > + *
> > + * Queue scheme: A global runqueue for each CPU pool.
> > + * The runqueue holds all runnable VCPUs.
> > + * VCPUs in the runqueue are divided into two parts: with and without
> remaining budget.
> > + * At each part, VCPUs are sorted based on EDF priority scheme.
> > + *
> > + * Scheduling quanta: 1 ms; but accounting the budget is in microsecond.
> > + *
> > + * Note: cpumask and cpupool is supported.
> > + */
> > +
> > +/*
> > + * Locking:
> > + * Just like credit2, a global system lock is used to protect the RunQ.
> > + * The global lock is referenced by schedule_data.schedule_lock from
> all physical cpus.
> > + *
> > + * The lock is already grabbed when calling wake/sleep/schedule/
> functions in schedule.c
> > + *
> > + * The functions involes RunQ and needs to grab locks are:
> > + *    dump, vcpu_insert, vcpu_remove, context_saved,
> > + */
> > +
> > +
> > +/*
> > + * Default parameters: Period and budget in default is 10 and 4 ms,
> respectively
> > + */
> > +#define RT_DEFAULT_PERIOD     (MICROSECS(10))
> > +#define RT_DEFAULT_BUDGET     (MICROSECS(4))
> > +
> > +/*
> > + * Useful macros
> > + */
> > +#define RT_PRIV(_ops)     ((struct rt_private *)((_ops)->sched_data))
> > +#define RT_VCPU(_vcpu)    ((struct rt_vcpu *)(_vcpu)->sched_priv)
> > +#define RT_DOM(_dom)      ((struct rt_dom *)(_dom)->sched_priv)
> > +#define RUNQ(_ops)              (&RT_PRIV(_ops)->runq)
>
> I know you are copying the prevailing style, but these macros are nasty.
>
> They would be perfectly fine as static inlines with some real types...
>
> static inline struct rt_private *RT_PRIV(const struct scheduler *ops)
> {
>     return ops->sched_data;
> }
>
> ... which allow for rather more useful compiler errors in the case that
> they get accidentally misused.
>

This is a good suggestion and I have modified it for the next version.

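For instance, the remaining helpers could follow the same pattern (a
sketch using the types from the patch):

    static inline struct rt_vcpu *RT_VCPU(const struct vcpu *vcpu)
    {
        return vcpu->sched_priv;
    }

    static inline struct rt_dom *RT_DOM(const struct domain *dom)
    {
        return dom->sched_priv;
    }

    static inline struct list_head *RUNQ(const struct scheduler *ops)
    {
        return &RT_PRIV(ops)->runq;
    }
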
>
> > +
> > +/*
> > + * Flags
> > + */
> > +/* RT_scheduled: Is this vcpu either running on, or context-switching
> off,
> > + * a phyiscal cpu?
> > + * + Accessed only with Runqueue lock held.
> > + * + Set when chosen as next in rt_schedule().
> > + * + Cleared after context switch has been saved in rt_context_saved()
> > + * + Checked in vcpu_wake to see if we can add to the Runqueue, or if
> we should
> > + *   set RT_delayed_runq_add
> > + * + Checked to be false in runq_insert.
> > + */
> > +#define __RT_scheduled            1
> > +#define RT_scheduled (1<<__RT_scheduled)
> > +/* RT_delayed_runq_add: Do we need to add this to the Runqueueu once
> it'd done
> > + * being context switching out?
> > + * + Set when scheduling out in rt_schedule() if prev is runable
> > + * + Set in rt_vcpu_wake if it finds RT_scheduled set
> > + * + Read in rt_context_saved(). If set, it adds prev to the Runqueue
> and
> > + *   clears the bit.
> > + *
> > + */
> > +#define __RT_delayed_runq_add     2
> > +#define RT_delayed_runq_add (1<<__RT_delayed_runq_add)
> > +
> > +/*
> > + * Debug only. Used to printout debug information
> > + */
> > +#define printtime()\
> > +        ({s_time_t now = NOW(); \
> > +          printk("%u : %3ld.%3ldus : %-19s ",\
> > +          smp_processor_id(), now/MICROSECS(1), now%MICROSECS(1)/1000,
> __func__);} )
> > +
> > +/*
> > + * Systme-wide private data, include a global RunQueue
> > + * The global lock is referenced by schedule_data.schedule_lock from
> all physical cpus.
> > + * It can be grabbed via vcpu_schedule_lock_irq()
> > + */
> > +struct rt_private {
> > +    spinlock_t lock;        /* The global coarse grand lock */
> > +    struct list_head sdom;  /* list of availalbe domains, used for dump
> */
> > +    struct list_head runq;  /* Ordered list of runnable VMs */
> > +    cpumask_t cpus;         /* cpumask_t of available physical cpus */
> > +    cpumask_t tickled;      /* another cpu in the queue already ticked
> this one */
> > +};
> > +
> > +/*
> > + * Virtual CPU
> > + */
> > +struct rt_vcpu {
> > +    struct list_head runq_elem; /* On the runqueue list */
> > +    struct list_head sdom_elem; /* On the domain VCPU list */
> > +
> > +    /* Up-pointers */
> > +    struct rt_dom *sdom;
> > +    struct vcpu *vcpu;
> > +
> > +    /* VCPU parameters, in milliseconds */
> > +    s_time_t period;
> > +    s_time_t budget;
> > +
> > +    /* VCPU current infomation in nanosecond */
> > +    long cur_budget;             /* current budget */
> > +    s_time_t last_start;        /* last start time */
> > +    s_time_t cur_deadline;      /* current deadline for EDF */
> > +
> > +    unsigned flags;             /* mark __RT_scheduled, etc.. */
> > +};
> > +
> > +/*
> > + * Domain
> > + */
> > +struct rt_dom {
> > +    struct list_head vcpu;      /* link its VCPUs */
> > +    struct list_head sdom_elem; /* link list on rt_priv */
> > +    struct domain *dom;         /* pointer to upper domain */
> > +};
> > +
> > +/*
> > + * RunQueue helper functions
> > + */
> > +static int
> > +__vcpu_on_runq(struct rt_vcpu *svc)
> > +{
> > +   return !list_empty(&svc->runq_elem);
> > +}
> > +
> > +static struct rt_vcpu *
> > +__runq_elem(struct list_head *elem)
> > +{
> > +    return list_entry(elem, struct rt_vcpu, runq_elem);
> > +}
> > +
> > +/*
> > + * Debug related code, dump vcpu/cpu information
> > + */
> > +static void
> > +rt_dump_vcpu(struct rt_vcpu *svc)
>
> const struct rc_vcpu *svc, for added safety.  (In the past we have had a
> dump function which accidentally clobbered some of the state it was
> supposed to be reading)
>
> > +{
> > +    char *cpustr = keyhandler_scratch;
>
> Xen style - newline between declarations and code.
>
> The keyhandler_scratch is only safe to use in keyhandlers, yet your dump
> functions are based on scheduling operations.  You risk concurrent
> access with other dump functions and with keyhandlers.
>

I see. I changed cpustr to a char[1024] array. This should solve the issue.



>
> > +    if ( svc == NULL )
> > +    {
> > +        printk("NULL!\n");
> > +        return;
> > +    }
>
> This is quite a useless message on its own.  Is it reasonable for svc to
> ever be NULL here?
>
>
No, it should not be NULL. I changed it to an ASSERT() for the next version.



> > +    cpumask_scnprintf(cpustr, sizeof(cpustr),
> svc->vcpu->cpu_hard_affinity);
>
> Buggy sizeof.  sizeof(cpustr) is 4, where I suspect you mean
> sizeof(keyscratch_handler) which is 1024.
>

Sorry! :-( Corrected it.
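
Taken together, the beginning of the dump helper would then look
something like this (a sketch combining the changes above):

    static void
    rt_dump_vcpu(const struct rt_vcpu *svc)
    {
        char cpustr[1024];   /* private buffer, safe outside keyhandlers */

        ASSERT( svc != NULL );
        cpumask_scnprintf(cpustr, sizeof(cpustr), svc->vcpu->cpu_hard_affinity);
        /* ... print the vcpu state ... */
    }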

>
>
> > +    for_each_cpu(cpu, &prv->cpus)
> > +        rt_dump_pcpu(ops, cpu);
> > +
> > +    printk("Global RunQueue info: \n");
> > +    loop = 0;
> > +    runq = RUNQ(ops);
> > +    list_for_each( iter, runq )
> > +    {
> > +        svc = __runq_elem(iter);
> > +        printk("\t%3d: ", ++loop);
> > +        rt_dump_vcpu(svc);
> > +    }
> > +
> > +    printk("Domain info: \n");
> > +    loop = 0;
> > +    list_for_each( iter_sdom, &prv->sdom )
> > +    {
> > +        struct rt_dom *sdom;
> > +        sdom = list_entry(iter_sdom, struct rt_dom, sdom_elem);
> > +        printk("\tdomain: %d\n", sdom->dom->domain_id);
> > +
> > +        list_for_each( iter_svc, &sdom->vcpu )
> > +        {
> > +            svc = list_entry(iter_svc, struct rt_vcpu, sdom_elem);
> > +            printk("\t\t%3d: ", ++loop);
> > +            rt_dump_vcpu(svc);
> > +        }
> > +    }
> > +
> > +    printk("\n");
> > +}
> > +
> > +static inline void
> > +__runq_remove(struct rt_vcpu *svc)
> > +{
> > +    if ( __vcpu_on_runq(svc) )
> > +        list_del_init(&svc->runq_elem);
> > +}
> > +
> > +/*
> > + * Insert a vcpu in the RunQ based on vcpus' deadline:
> > + * EDF schedule policy: vcpu with smaller deadline has higher priority;
> > + * The vcpu svc to be inserted will be inserted just before the very
> first
> > + * vcpu iter_svc in the Runqueue whose deadline is equal or larger than
> > + * svc's deadline.
> > + */
> > +static void
> > +__runq_insert(const struct scheduler *ops, struct rt_vcpu *svc)
> > +{
> > +    struct list_head *runq = RUNQ(ops);
> > +    struct list_head *iter;
> > +    spinlock_t *schedule_lock;
> > +
> > +    schedule_lock = per_cpu(schedule_data,
> svc->vcpu->processor).schedule_lock;
> > +    ASSERT( spin_is_locked(schedule_lock) );
> > +
> > +    /* Debug only */
> > +    if ( __vcpu_on_runq(svc) )
> > +    {
> > +        rt_dump(ops);
> > +    }
> > +    ASSERT( !__vcpu_on_runq(svc) );
> > +
> > +    list_for_each(iter, runq)
> > +    {
> > +        struct rt_vcpu * iter_svc = __runq_elem(iter);
> > +
> > +        /* svc still has budget */
> > +        if ( svc->cur_budget > 0 )
> > +        {
> > +            if ( iter_svc->cur_budget == 0 ||
> > +                 svc->cur_deadline <= iter_svc->cur_deadline )
> > +                    break;
> > +        }
> > +        else
> > +        { /* svc has no budget */
> > +            if ( iter_svc->cur_budget == 0 &&
> > +                 svc->cur_deadline <= iter_svc->cur_deadline )
> > +                    break;
> > +        }
> > +    }
> > +
> > +    list_add_tail(&svc->runq_elem, iter);
> > +}
> > +
> > +/*
> > + * Init/Free related code
> > + */
> > +static int
> > +rt_init(struct scheduler *ops)
> > +{
> > +    struct rt_private *prv = xzalloc(struct rt_private);
> > +
> > +    if ( prv == NULL )
> > +        return -ENOMEM;
>
> Newline in here.
>

Changed, thanks!

>
> > +    ops->sched_data = prv;
>
> Is it safe to set ops->sched_data with a half constructed rt_private?  I
> suspect this wants to be the very last (non-debug) action in this function.
>

I think it should be fine. I double-checked the _init function in
sched_credit2.c. It has a similar operation: it first assigns prv to
ops->sched_data and then sets the values of prv.
Of course, I can switch them. But I'm not sure that really matters. :-)



>
> > +    spin_lock_init(&prv->lock);
> > +    INIT_LIST_HEAD(&prv->sdom);
> > +    INIT_LIST_HEAD(&prv->runq);
> > +
> > +    printtime();
> > +    printk("\n");
> > +
> > +    return 0;
> > +}
> > +
> > +static void
> > +rt_deinit(const struct scheduler *ops)
> > +{
> > +    struct rt_private *prv = RT_PRIV(ops);
> > +
> > +    printtime();
> > +    printk("\n");
> > +    xfree(prv);
> > +}
> > +
> > +/*
> > + * point per_cpu spinlock to the global system lock; all cpu have same
> global system lock
> > + */
> > +static void *
> > +rt_alloc_pdata(const struct scheduler *ops, int cpu)
> > +{
> > +    struct rt_private *prv = RT_PRIV(ops);
> > +
> > +    cpumask_set_cpu(cpu, &prv->cpus);
> > +
> > +    per_cpu(schedule_data, cpu).schedule_lock = &prv->lock;
> > +
> > +    printtime();
> > +    printk("%s total cpus: %d", __FUNCTION__,
> cpumask_weight(&prv->cpus));
>
> __FUNCTION__ is a gccism.  __func__ is a standard way of doing the same.
>

Changed. Thanks!

>
> > +    /* same as credit2, not a bogus pointer */
> > +    return (void *)1;
> > +}
> > +
> > +static void
> > +rt_free_pdata(const struct scheduler *ops, void *pcpu, int cpu)
> > +{
> > +    struct rt_private * prv = RT_PRIV(ops);
> > +    cpumask_clear_cpu(cpu, &prv->cpus);
> > +    printtime();
> > +    printk("%s cpu=%d\n", __FUNCTION__, cpu);
> > +}
> > +
> > +static void *
> > +rt_alloc_domdata(const struct scheduler *ops, struct domain *dom)
> > +{
> > +    unsigned long flags;
> > +    struct rt_dom *sdom;
> > +    struct rt_private * prv = RT_PRIV(ops);
> > +
> > +    printtime();
> > +    printk("dom=%d\n", dom->domain_id);
> > +
> > +    sdom = xzalloc(struct rt_dom);
> > +    if ( sdom == NULL )
> > +    {
> > +        printk("%s, xzalloc failed\n", __func__);
> > +        return NULL;
> > +    }
> > +
> > +    INIT_LIST_HEAD(&sdom->vcpu);
> > +    INIT_LIST_HEAD(&sdom->sdom_elem);
> > +    sdom->dom = dom;
> > +
> > +    /* spinlock here to insert the dom */
> > +    spin_lock_irqsave(&prv->lock, flags);
> > +    list_add_tail(&sdom->sdom_elem, &(prv->sdom));
> > +    spin_unlock_irqrestore(&prv->lock, flags);
> > +
> > +    return (void *)sdom;
>
> Bogus cast.
>

I think we have to cast it to void * because the function's definition
requires the return type to be void *. In addition, the credit2 scheduler
does the same cast in its _alloc_domdata function. So I guess this should
be fine?



>
> > +}
> > +
> > +static void
> > +rt_free_domdata(const struct scheduler *ops, void *data)
> > +{
> > +    unsigned long flags;
> > +    struct rt_dom *sdom = data;
> > +    struct rt_private *prv = RT_PRIV(ops);
> > +
> > +    printtime();
> > +    printk("dom=%d\n", sdom->dom->domain_id);
> > +
> > +    spin_lock_irqsave(&prv->lock, flags);
> > +    list_del_init(&sdom->sdom_elem);
> > +    spin_unlock_irqrestore(&prv->lock, flags);
> > +    xfree(data);
> > +}
> > +
> > +static int
> > +rt_dom_init(const struct scheduler *ops, struct domain *dom)
> > +{
> > +    struct rt_dom *sdom;
> > +
> > +    printtime();
> > +    printk("dom=%d\n", dom->domain_id);
> > +
> > +    /* IDLE Domain does not link on rt_private */
> > +    if ( is_idle_domain(dom) )
> > +        return 0;
> > +
> > +    sdom = rt_alloc_domdata(ops, dom);
> > +    if ( sdom == NULL )
> > +    {
> > +        printk("%s, failed\n", __func__);
> > +        return -ENOMEM;
> > +    }
> > +    dom->sched_priv = sdom;
> > +
> > +    return 0;
> > +}
> > +
> > +static void
> > +rt_dom_destroy(const struct scheduler *ops, struct domain *dom)
> > +{
> > +    printtime();
> > +    printk("dom=%d\n", dom->domain_id);
> > +
> > +    rt_free_domdata(ops, RT_DOM(dom));
> > +}
> > +
> > +static void *
> > +rt_alloc_vdata(const struct scheduler *ops, struct vcpu *vc, void *dd)
> > +{
> > +    struct rt_vcpu *svc;
> > +    s_time_t now = NOW();
> > +    long count;
> > +
> > +    /* Allocate per-VCPU info */
> > +    svc = xzalloc(struct rt_vcpu);
> > +    if ( svc == NULL )
> > +    {
> > +        printk("%s, xzalloc failed\n", __func__);
> > +        return NULL;
> > +    }
> > +
> > +    INIT_LIST_HEAD(&svc->runq_elem);
> > +    INIT_LIST_HEAD(&svc->sdom_elem);
> > +    svc->flags = 0U;
> > +    svc->sdom = dd;
> > +    svc->vcpu = vc;
> > +    svc->last_start = 0;            /* init last_start is 0 */
> > +
> > +    svc->period = RT_DEFAULT_PERIOD;
> > +    if ( !is_idle_vcpu(vc) )
> > +        svc->budget = RT_DEFAULT_BUDGET;
> > +
> > +    count = (now/MICROSECS(svc->period)) + 1;
> > +    /* sync all VCPU's start time to 0 */
> > +    svc->cur_deadline += count * MICROSECS(svc->period);
> > +
> > +    svc->cur_budget = svc->budget*1000; /* counting in microseconds
> level */
> > +    /* Debug only: dump new vcpu's info */
> > +    printtime();
> > +    rt_dump_vcpu(svc);
> > +
> > +    return svc;
> > +}
> > +
> > +static void
> > +rt_free_vdata(const struct scheduler *ops, void *priv)
> > +{
> > +    struct rt_vcpu *svc = priv;
> > +
> > +    /* Debug only: dump freed vcpu's info */
> > +    printtime();
> > +    rt_dump_vcpu(svc);
> > +    xfree(svc);
> > +}
> > +
> > +/*
> > + * TODO: Do we need to add vc to the new Runqueue?
> > + * This function is called in sched_move_domain() in schedule.c
> > + * When move a domain to a new cpupool,
> > + * may have to add vc to the Runqueue of the new cpupool
> > + */
> > +static void
> > +rt_vcpu_insert(const struct scheduler *ops, struct vcpu *vc)
> > +{
> > +    struct rt_vcpu *svc = RT_VCPU(vc);
> > +
> > +    /* Debug only: dump info of vcpu to insert */
> > +    printtime();
> > +    rt_dump_vcpu(svc);
> > +
> > +    /* not addlocate idle vcpu to dom vcpu list */
> > +    if ( is_idle_vcpu(vc) )
> > +        return;
> > +
> > +    list_add_tail(&svc->sdom_elem, &svc->sdom->vcpu);   /* add to dom
> vcpu list */
> > +}
> > +
> > +/*
> > + * TODO: same as rt_vcpu_insert()
> > + */
> > +static void
> > +rt_vcpu_remove(const struct scheduler *ops, struct vcpu *vc)
> > +{
> > +    struct rt_vcpu * const svc = RT_VCPU(vc);
> > +    struct rt_dom * const sdom = svc->sdom;
> > +
> > +    printtime();
> > +    rt_dump_vcpu(svc);
> > +
> > +    BUG_ON( sdom == NULL );
> > +    BUG_ON( __vcpu_on_runq(svc) );
> > +
> > +    if ( !is_idle_vcpu(vc) )
> > +        list_del_init(&svc->sdom_elem);
> > +}
> > +
> > +/*
> > + * Pick a valid CPU for the vcpu vc
> > + * Valid CPU of a vcpu is intesection of vcpu's affinity and available
> cpus
> > + */
> > +static int
> > +rt_cpu_pick(const struct scheduler *ops, struct vcpu *vc)
> > +{
> > +    cpumask_t cpus;
> > +    cpumask_t *online;
> > +    int cpu;
> > +    struct rt_private * prv = RT_PRIV(ops);
> > +
> > +    online = cpupool_scheduler_cpumask(vc->domain->cpupool);
> > +    cpumask_and(&cpus, &prv->cpus, online);
> > +    cpumask_and(&cpus, &cpus, vc->cpu_hard_affinity);
> > +
> > +    cpu = cpumask_test_cpu(vc->processor, &cpus)
> > +            ? vc->processor
> > +            : cpumask_cycle(vc->processor, &cpus);
> > +    ASSERT( !cpumask_empty(&cpus) && cpumask_test_cpu(cpu, &cpus) );
> > +
> > +    return cpu;
> > +}
> > +
> > +/*
> > + * Burn budget at microsecond level.
> > + */
> > +static void
> > +burn_budgets(const struct scheduler *ops, struct rt_vcpu *svc, s_time_t
> now)
> > +{
> > +    s_time_t delta;
> > +    long count = 0;
> > +
> > +    /* don't burn budget for idle VCPU */
> > +    if ( is_idle_vcpu(svc->vcpu) )
> > +    {
> > +        return;
> > +    }
> > +
> > +    /* first time called for this svc, update last_start */
> > +    if ( svc->last_start == 0 )
> > +    {
> > +        svc->last_start = now;
> > +        return;
> > +    }
> > +
> > +    /*
> > +     * update deadline info: When deadline is in the past,
> > +     * it need to be updated to the deadline of the current period,
> > +     * and replenish the budget
> > +     */
> > +    delta = now - svc->cur_deadline;
> > +    if ( delta >= 0 )
> > +    {
> > +        count = ( delta/MICROSECS(svc->period) ) + 1;
> > +        svc->cur_deadline += count * MICROSECS(svc->period);
> > +        svc->cur_budget = svc->budget * 1000;
> > +        return;
> > +    }
> > +
> > +    /* burn at nanoseconds level */
> > +    delta = now - svc->last_start;
> > +    /*
> > +     * delta < 0 only happens in nested virtualization;
> > +     * TODO: how should we handle delta < 0 in a better way? */
> > +    if ( delta < 0 )
> > +    {
> > +        printk("%s, ATTENTION: now is behind last_start! delta = %ld
> for ",
> > +                __func__, delta);
> > +        rt_dump_vcpu(svc);
> > +        svc->last_start = now;  /* update last_start */
> > +        svc->cur_budget = 0;   /* FIXME: should we recover like this? */
> > +        return;
> > +    }
> > +
> > +    if ( svc->cur_budget == 0 )
> > +        return;
> > +
> > +    svc->cur_budget -= delta;
> > +    if ( svc->cur_budget < 0 )
> > +        svc->cur_budget = 0;
> > +}
> > +
> > +/*
> > + * RunQ is sorted. Pick first one within cpumask. If no one, return NULL
> > + * lock is grabbed before calling this function
> > + */
> > +static struct rt_vcpu *
> > +__runq_pick(const struct scheduler *ops, cpumask_t mask)
> > +{
> > +    struct list_head *runq = RUNQ(ops);
> > +    struct list_head *iter;
> > +    struct rt_vcpu *svc = NULL;
> > +    struct rt_vcpu *iter_svc = NULL;
> > +    cpumask_t cpu_common;
> > +    cpumask_t *online;
> > +    struct rt_private * prv = RT_PRIV(ops);
> > +
> > +    list_for_each(iter, runq)
> > +    {
> > +        iter_svc = __runq_elem(iter);
> > +
> > +        /* mask is intersection of cpu_hard_affinity and cpupool and
> priv->cpus */
> > +        online =
> cpupool_scheduler_cpumask(iter_svc->vcpu->domain->cpupool);
> > +        cpumask_and(&cpu_common, online, &prv->cpus);
> > +        cpumask_and(&cpu_common, &cpu_common,
> iter_svc->vcpu->cpu_hard_affinity);
> > +        cpumask_and(&cpu_common, &mask, &cpu_common);
> > +        if ( cpumask_empty(&cpu_common) )
> > +            continue;
> > +
> > +        if ( iter_svc->cur_budget <= 0 )
> > +            continue;
> > +
> > +        svc = iter_svc;
> > +        break;
> > +    }
> > +
> > +    return svc;
> > +}
> > +
> > +/*
> > + * Update vcpu's budget and sort runq by insert the modifed vcpu back
> to runq
> > + * lock is grabbed before calling this function
> > + */
> > +static void
> > +__repl_update(const struct scheduler *ops, s_time_t now)
> > +{
> > +    struct list_head *runq = RUNQ(ops);
> > +    struct list_head *iter;
> > +    struct list_head *tmp;
> > +    struct rt_vcpu *svc = NULL;
> > +
> > +    s_time_t diff;
> > +    long count;
> > +
> > +    list_for_each_safe(iter, tmp, runq)
> > +    {
> > +        svc = __runq_elem(iter);
> > +
> > +        diff = now - svc->cur_deadline;
> > +        if ( diff > 0 )
> > +        {
> > +            count = (diff/MICROSECS(svc->period)) + 1;
> > +            svc->cur_deadline += count * MICROSECS(svc->period);
> > +            svc->cur_budget = svc->budget * 1000;
> > +            __runq_remove(svc);
> > +            __runq_insert(ops, svc);
> > +        }
> > +    }
> > +}
> > +
> > +/*
> > + * schedule function for rt scheduler.
> > + * The lock is already grabbed in schedule.c, no need to lock here
> > + */
> > +static struct task_slice
> > +rt_schedule(const struct scheduler *ops, s_time_t now, bool_t
> tasklet_work_scheduled)
> > +{
> > +    const int cpu = smp_processor_id();
> > +    struct rt_private * prv = RT_PRIV(ops);
> > +    struct rt_vcpu * const scurr = RT_VCPU(current);
> > +    struct rt_vcpu * snext = NULL;
> > +    struct task_slice ret = { .migrated = 0 };
> > +
> > +    /* clear ticked bit now that we've been scheduled */
> > +    if ( cpumask_test_cpu(cpu, &prv->tickled) )
> > +        cpumask_clear_cpu(cpu, &prv->tickled);
> > +
> > +    /* burn_budget would return for IDLE VCPU */
> > +    burn_budgets(ops, scurr, now);
> > +
> > +    __repl_update(ops, now);
> > +
> > +    if ( tasklet_work_scheduled )
> > +    {
> > +        snext = RT_VCPU(idle_vcpu[cpu]);
> > +    }
> > +    else
> > +    {
> > +        cpumask_t cur_cpu;
> > +        cpumask_clear(&cur_cpu);
> > +        cpumask_set_cpu(cpu, &cur_cpu);
> > +        snext = __runq_pick(ops, cur_cpu);
> > +        if ( snext == NULL )
> > +            snext = RT_VCPU(idle_vcpu[cpu]);
> > +
> > +        /* if scurr has higher priority and budget, still pick scurr */
> > +        if ( !is_idle_vcpu(current) &&
> > +             vcpu_runnable(current) &&
> > +             scurr->cur_budget > 0 &&
> > +             ( is_idle_vcpu(snext->vcpu) ||
> > +               scurr->cur_deadline <= snext->cur_deadline ) )
> > +            snext = scurr;
> > +    }
> > +
> > +    if ( snext != scurr &&
> > +         !is_idle_vcpu(current) &&
> > +         vcpu_runnable(current) )
> > +        set_bit(__RT_delayed_runq_add, &scurr->flags);
> > +
> > +
> > +    snext->last_start = now;
> > +    if ( !is_idle_vcpu(snext->vcpu) )
> > +    {
> > +        if ( snext != scurr )
> > +        {
> > +            __runq_remove(snext);
> > +            set_bit(__RT_scheduled, &snext->flags);
> > +        }
> > +        if ( snext->vcpu->processor != cpu )
> > +        {
> > +            snext->vcpu->processor = cpu;
> > +            ret.migrated = 1;
> > +        }
> > +    }
> > +
> > +    ret.time = MILLISECS(1); /* sched quantum */
> > +    ret.task = snext->vcpu;
> > +
> > +    return ret;
> > +}
> > +
> > +/*
> > + * Remove VCPU from RunQ
> > + * The lock is already grabbed in schedule.c, no need to lock here
> > + */
> > +static void
> > +rt_vcpu_sleep(const struct scheduler *ops, struct vcpu *vc)
> > +{
> > +    struct rt_vcpu * const svc = RT_VCPU(vc);
> > +
> > +    BUG_ON( is_idle_vcpu(vc) );
> > +
> > +    if ( curr_on_cpu(vc->processor) == vc )
> > +    {
> > +        cpu_raise_softirq(vc->processor, SCHEDULE_SOFTIRQ);
> > +        return;
> > +    }
> > +
> > +    if ( __vcpu_on_runq(svc) )
> > +    {
> > +        __runq_remove(svc);
> > +        printk("%s: vcpu should not on runq in vcpu_sleep()\n",
> __FUNCTION__);
> > +        BUG();
> > +    }
> > +
> > +    clear_bit(__RT_delayed_runq_add, &svc->flags);
> > +}
> > +
> > +/*
> > + * Pick a vcpu on a cpu to kick out to place the running candidate
> > + * Called by wake() and context_saved()
> > + * We have a running candidate here, the kick logic is:
> > + * Among all the cpus that are within the cpu affinity
> > + * 1) if the new->cpu is idle, kick it. This could benefit cache hit
> > + * 2) if there are any idle vcpu, kick it.
> > + * 3) now all pcpus are busy, among all the running vcpus, pick lowest
> priority one
> > + *    if snext has higher priority, kick it.
> > + *
> > + * TODO:
> > + * 1) what if these two vcpus belongs to the same domain?
> > + *    replace a vcpu belonging to the same domain introduces more
> overhead
> > + *
> > + * lock is grabbed before calling this function
> > + */
> > +static void
> > +runq_tickle(const struct scheduler *ops, struct rt_vcpu *new)
> > +{
> > +    struct rt_private * prv = RT_PRIV(ops);
> > +    struct rt_vcpu * latest_deadline_vcpu = NULL;    /* lowest priority
> scheduled */
> > +    struct rt_vcpu * iter_svc;
> > +    struct vcpu * iter_vc;
> > +    int cpu = 0;
> > +    cpumask_t not_tickled;
> > +    cpumask_t *online;
> > +
> > +    if ( new == NULL || is_idle_vcpu(new->vcpu) )
> > +        return;
> > +
> > +    online = cpupool_scheduler_cpumask(new->vcpu->domain->cpupool);
> > +    cpumask_and(&not_tickled, online, &prv->cpus);
> > +    cpumask_and(&not_tickled, &not_tickled,
> new->vcpu->cpu_hard_affinity);
> > +    cpumask_andnot(&not_tickled, &not_tickled, &prv->tickled);
> > +
> > +    /* 1) if new's previous cpu is idle, kick it for cache benefit */
> > +    if ( is_idle_vcpu(curr_on_cpu(new->vcpu->processor)) )
> > +    {
> > +        cpumask_set_cpu(new->vcpu->processor, &prv->tickled);
> > +        cpu_raise_softirq(new->vcpu->processor, SCHEDULE_SOFTIRQ);
> > +        return;
> > +    }
> > +
> > +    /* 2) if there are any idle pcpu, kick it */
> > +    /* The same loop also find the one with lowest priority */
> > +    for_each_cpu(cpu, &not_tickled)
> > +    {
> > +        iter_vc = curr_on_cpu(cpu);
> > +        if ( is_idle_vcpu(iter_vc) )
> > +        {
> > +            cpumask_set_cpu(cpu, &prv->tickled);
> > +            cpu_raise_softirq(cpu, SCHEDULE_SOFTIRQ);
> > +            return;
> > +        }
> > +        iter_svc = RT_VCPU(iter_vc);
> > +        if ( latest_deadline_vcpu == NULL ||
> > +             iter_svc->cur_deadline >
> latest_deadline_vcpu->cur_deadline )
> > +            latest_deadline_vcpu = iter_svc;
> > +    }
> > +
> > +    /* 3) candicate has higher priority, kick out the lowest priority
> vcpu */
> > +    if ( latest_deadline_vcpu != NULL && new->cur_deadline <
> latest_deadline_vcpu->cur_deadline )
> > +    {
> > +        cpumask_set_cpu(latest_deadline_vcpu->vcpu->processor,
> &prv->tickled);
> > +        cpu_raise_softirq(latest_deadline_vcpu->vcpu->processor,
> SCHEDULE_SOFTIRQ);
> > +    }
> > +    return;
> > +}
> > +
> > +/*
> > + * Should always wake up runnable vcpu, put it back to RunQ.
> > + * Check priority to raise interrupt
> > + * The lock is already grabbed in schedule.c, no need to lock here
> > + * TODO: what if these two vcpus belongs to the same domain?
> > + */
> > +static void
> > +rt_vcpu_wake(const struct scheduler *ops, struct vcpu *vc)
> > +{
> > +    struct rt_vcpu * const svc = RT_VCPU(vc);
> > +    s_time_t diff;
> > +    s_time_t now = NOW();
> > +    long count = 0;
> > +    struct rt_private * prv = RT_PRIV(ops);
> > +    struct rt_vcpu * snext = NULL;        /* highest priority on RunQ */
> > +
> > +    BUG_ON( is_idle_vcpu(vc) );
> > +
> > +    if ( unlikely(curr_on_cpu(vc->processor) == vc) )
> > +        return;
> > +
> > +    /* on RunQ, just update info is ok */
> > +    if ( unlikely(__vcpu_on_runq(svc)) )
> > +        return;
> > +
> > +    /* If context hasn't been saved for this vcpu yet, we can't put it
> on
> > +     * the Runqueue. Instead, we set a flag so that it will be put on
> the Runqueue
> > +     * After the context has been saved. */
> > +    if ( unlikely(test_bit(__RT_scheduled, &svc->flags)) )
> > +    {
> > +        set_bit(__RT_delayed_runq_add, &svc->flags);
> > +        return;
> > +    }
> > +
> > +    /* update deadline info */
> > +    diff = now - svc->cur_deadline;
> > +    if ( diff >= 0 )
> > +    {
> > +        count = ( diff/MICROSECS(svc->period) ) + 1;
> > +        svc->cur_deadline += count * MICROSECS(svc->period);
> > +        svc->cur_budget = svc->budget * 1000;
> > +    }
> > +
> > +    __runq_insert(ops, svc);
> > +    __repl_update(ops, now);
> > +    snext = __runq_pick(ops, prv->cpus);    /* pick snext from ALL
> valid cpus */
> > +    runq_tickle(ops, snext);
> > +
> > +    return;
> > +}
> > +
> > +/*
> > + * scurr has finished context switch, insert it back to the RunQ,
> > + * and then pick the highest priority vcpu from runq to run
> > + */
> > +static void
> > +rt_context_saved(const struct scheduler *ops, struct vcpu *vc)
> > +{
> > +    struct rt_vcpu * svc = RT_VCPU(vc);
> > +    struct rt_vcpu * snext = NULL;
> > +    struct rt_private * prv = RT_PRIV(ops);
> > +    spinlock_t *lock = vcpu_schedule_lock_irq(vc);
> > +
> > +    clear_bit(__RT_scheduled, &svc->flags);
> > +    /* not insert idle vcpu to runq */
> > +    if ( is_idle_vcpu(vc) )
> > +        goto out;
> > +
> > +    if ( test_and_clear_bit(__RT_delayed_runq_add, &svc->flags) &&
> > +         likely(vcpu_runnable(vc)) )
> > +    {
> > +        __runq_insert(ops, svc);
> > +        __repl_update(ops, NOW());
> > +        snext = __runq_pick(ops, prv->cpus);    /* pick snext from ALL
> cpus */
> > +        runq_tickle(ops, snext);
> > +    }
> > +out:
> > +    vcpu_schedule_unlock_irq(lock, vc);
> > +}
> > +
> > +/*
> > + * set/get each vcpu info of each domain
> > + */
> > +static int
> > +rt_dom_cntl(
> > +    const struct scheduler *ops,
> > +    struct domain *d,
> > +    struct xen_domctl_scheduler_op *op)
> > +{
> > +    xen_domctl_sched_rt_params_t *local_sched;
> > +    struct rt_dom * const sdom = RT_DOM(d);
> > +    struct list_head *iter;
> > +    int vcpu_index = 0;
> > +    int rc = -EINVAL;
> > +
> > +    switch ( op->cmd )
> > +    {
> > +    case XEN_DOMCTL_SCHEDOP_getnumvcpus:
> > +        op->u.rt.nr_vcpus = 0;
> > +        list_for_each( iter, &sdom->vcpu )
> > +            vcpu_index++;
> > +        op->u.rt.nr_vcpus = vcpu_index;
> > +        rc = 0;
> > +        break;
> > +    case XEN_DOMCTL_SCHEDOP_getinfo:
> > +        /* for debug use, whenever adjust Dom0 parameter, do global
> dump */
> > +        if ( d->domain_id == 0 )
> > +            rt_dump(ops);
> > +
> > +        op->u.rt.nr_vcpus = 0;
> > +        list_for_each( iter, &sdom->vcpu )
> > +            vcpu_index++;
> > +        op->u.rt.nr_vcpus = vcpu_index;
> > +        local_sched = xzalloc_array(xen_domctl_sched_rt_params_t,
> vcpu_index);
> > +        vcpu_index = 0;
> > +        list_for_each( iter, &sdom->vcpu )
> > +        {
> > +            struct rt_vcpu * svc = list_entry(iter, struct rt_vcpu,
> sdom_elem);
> > +
> > +            local_sched[vcpu_index].budget = svc->budget;
> > +            local_sched[vcpu_index].period = svc->period;
> > +            local_sched[vcpu_index].index = vcpu_index;
> > +            vcpu_index++;
> > +        }
> > +        copy_to_guest(op->u.rt.vcpu, local_sched, vcpu_index);
>
> This will clobber guest heap if vcpu_index is greater than the allocated
> space.


This is a good point! I will pass the size of the array to the hypervisor
and check that the number of elements in the array is not smaller than the
number of vcpus.


>  You also unconditionally leak local_sched, but there is no need
> for an allocation in the first place.
>
I will add the xfree() after copy_to_guest().
I have a question: how can I avoid allocating local_sched?



> > +        rc = 0;
> > +        break;
> > +    case XEN_DOMCTL_SCHEDOP_putinfo:
> > +        list_for_each( iter, &sdom->vcpu )
> > +        {
> > +            struct rt_vcpu * svc = list_entry(iter, struct rt_vcpu,
> sdom_elem);
> > +
> > +            /* adjust per VCPU parameter */
> > +            if ( op->u.rt.vcpu_index == svc->vcpu->vcpu_id )
> > +            {
> > +                vcpu_index = op->u.rt.vcpu_index;
> > +
> > +                if ( vcpu_index < 0 )
> > +                    printk("XEN_DOMCTL_SCHEDOP_putinfo:
> vcpu_index=%d\n",
> > +                            vcpu_index);
> > +                else
> > +                    printk("XEN_DOMCTL_SCHEDOP_putinfo: "
> > +                            "vcpu_index=%d, period=%"PRId64",
> budget=%"PRId64"\n",
> > +                            vcpu_index, op->u.rt.period,
> op->u.rt.budget);
> > +
> > +                svc->period = op->u.rt.period;
> > +                svc->budget = op->u.rt.budget;
> > +
> > +                break;
> > +            }
> > +        }
> > +        rc = 0;
> > +        break;
> > +    }
> > +
> > +    return rc;
> > +}
> > +
> > +static struct rt_private _rt_priv;
> > +
> > +const struct scheduler sched_rt_def = {
> > +    .name           = "SMP RT Scheduler",
> > +    .opt_name       = "rt",
>
> Should these names reflect RT_DS as opposed to simply RT?
>

DS (Deferrable Server) is just one kind of server mechanism for global
Earliest Deadline First scheduling. We can add other server mechanisms in
the same file, sched_rt.c, to extend this real-time scheduler, but we don't
want to change the user's interface when we add them.

The .opt_name is part of the user's interface for choosing the rt
scheduler. If we change it to rt_ds now, we will have to change it back to
rt once more server mechanisms are implemented, and users will then have to
change their configuration (i.e., the command-line value sched=) to the new
name. Because this could affect the users' interface, I think it's better
to use rt here. What do you think?



>
> > +    .sched_id       = XEN_SCHEDULER_RT_DS,
> > +    .sched_data     = &_rt_priv,
> > +
> > +    .dump_cpu_state = rt_dump_pcpu,
> > +    .dump_settings  = rt_dump,
> > +    .init           = rt_init,
> > +    .deinit         = rt_deinit,
> > +    .alloc_pdata    = rt_alloc_pdata,
> > +    .free_pdata     = rt_free_pdata,
> > +    .alloc_domdata  = rt_alloc_domdata,
> > +    .free_domdata   = rt_free_domdata,
> > +    .init_domain    = rt_dom_init,
> > +    .destroy_domain = rt_dom_destroy,
> > +    .alloc_vdata    = rt_alloc_vdata,
> > +    .free_vdata     = rt_free_vdata,
> > +    .insert_vcpu    = rt_vcpu_insert,
> > +    .remove_vcpu    = rt_vcpu_remove,
> > +
> > +    .adjust         = rt_dom_cntl,
> > +
> > +    .pick_cpu       = rt_cpu_pick,
> > +    .do_schedule    = rt_schedule,
> > +    .sleep          = rt_vcpu_sleep,
> > +    .wake           = rt_vcpu_wake,
> > +    .context_saved  = rt_context_saved,
> > +};
> > diff --git a/xen/common/schedule.c b/xen/common/schedule.c
> > index e9eb0bc..f2df400 100644
> > --- a/xen/common/schedule.c
> > +++ b/xen/common/schedule.c
> > @@ -68,6 +68,7 @@ static const struct scheduler *schedulers[] = {
> >      &sched_sedf_def,
> >      &sched_credit_def,
> >      &sched_credit2_def,
> > +    &sched_rt_def,
> >      &sched_arinc653_def,
> >  };
> >
> > @@ -1092,7 +1093,8 @@ long sched_adjust(struct domain *d, struct
> xen_domctl_scheduler_op *op)
> >
> >      if ( (op->sched_id != DOM2OP(d)->sched_id) ||
> >           ((op->cmd != XEN_DOMCTL_SCHEDOP_putinfo) &&
> > -          (op->cmd != XEN_DOMCTL_SCHEDOP_getinfo)) )
> > +          (op->cmd != XEN_DOMCTL_SCHEDOP_getinfo) &&
> > +          (op->cmd != XEN_DOMCTL_SCHEDOP_getnumvcpus)) )
> >          return -EINVAL;
> >
> >      /* NB: the pluggable scheduler code needs to take care
> > diff --git a/xen/include/public/domctl.h b/xen/include/public/domctl.h
> > index 5b11bbf..8d4b973 100644
> > --- a/xen/include/public/domctl.h
> > +++ b/xen/include/public/domctl.h
> > @@ -339,6 +339,18 @@ struct xen_domctl_max_vcpus {
> >  typedef struct xen_domctl_max_vcpus xen_domctl_max_vcpus_t;
> >  DEFINE_XEN_GUEST_HANDLE(xen_domctl_max_vcpus_t);
> >
> > +/*
> > + * This structure is used to pass to rt scheduler from a
> > + * privileged domain to Xen
> > + */
> > +struct xen_domctl_sched_rt_params {
> > +    /* get vcpus' info */
> > +    int64_t period; /* s_time_t type */
> > +    int64_t budget;
> > +    int     index;
>
> Index is clearly an unsigned quantity.  For alignment and compatibility,
> uint64_t would make the most sense.  Alternatively, uint32_t and an
> explicit uint32_t pad field.
>
Agreed. I have changed it to uint16_t because vcpu_index is uint16_t in the
tool stack.

Thank you very much for your comments and suggestions! Looking forward to
your reply! ;-)

Best,

Meng

-- 


-----------
Meng Xu
PhD Student in Computer and Information Science
University of Pennsylvania


^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH RFC v2 1/4] xen: add real time scheduler rt
  2014-08-02 15:31     ` Meng Xu
@ 2014-08-04 10:23       ` Andrew Cooper
  2014-08-07 11:26         ` Meng Xu
  0 siblings, 1 reply; 13+ messages in thread
From: Andrew Cooper @ 2014-08-04 10:23 UTC (permalink / raw)
  To: Meng Xu, George Dunlap
  Cc: Ian Campbell, Sisu Xi, Stefano Stabellini, Ian Jackson,
	xen-devel, Meng Xu, Jan Beulich, Chong Li, Dagaen Golomb


On 02/08/14 16:31, Meng Xu wrote:
>
>     > +/*
>     > + * Init/Free related code
>     > + */
>     > +static int
>     > +rt_init(struct scheduler *ops)
>     > +{
>     > +    struct rt_private *prv = xzalloc(struct rt_private);
>     > +
>     > +    if ( prv == NULL )
>     > +        return -ENOMEM;
>
>     Newline in here.
>
>
> ​Changed, thanks!
> ​
>
>
>     > +    ops->sched_data = prv;
>
>     Is it safe to set ops->sched_data with a half constructed
>     rt_private?  I
>     suspect this wants to be the very last (non-debug) action in this
>     function.
>
>
> ​I think it should be fine. I double checked the _init function in
> sched_credit2.c. It has the similar operation: first assign prv to
> ops->sched_data and then set the value of prv.
> Of course, I can switch ​them. But I'm not sure if that really matter. :-)

It means that anyone else who peeks at ops->sched_data will see a
partially constructed rt_private object.  This can be a source of subtle
bugs, so it is better to code appropriately, rather than relying on an
assumption that no one would go peeking before this function returns.
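
Concretely, the suggestion is just to move the assignment to the end (a
sketch based on the rt_init() from the patch, with the debug printing
omitted):

    static int
    rt_init(struct scheduler *ops)
    {
        struct rt_private *prv = xzalloc(struct rt_private);

        if ( prv == NULL )
            return -ENOMEM;

        spin_lock_init(&prv->lock);
        INIT_LIST_HEAD(&prv->sdom);
        INIT_LIST_HEAD(&prv->runq);

        /* Publish sched_data only once rt_private is fully constructed. */
        ops->sched_data = prv;

        return 0;
    }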

>
> > +static void *
> > +rt_alloc_domdata(const struct scheduler *ops, struct domain *dom)
> > +{
> > +    unsigned long flags;
> > +    struct rt_dom *sdom;
> > +    struct rt_private * prv = RT_PRIV(ops);
> > +
> > +    printtime();
> > +    printk("dom=%d\n", dom->domain_id);
> > +
> > +    sdom = xzalloc(struct rt_dom);
> > +    if ( sdom == NULL )
> > +    {
> > +        printk("%s, xzalloc failed\n", __func__);
> > +        return NULL;
> > +    }
> > +
> > +    INIT_LIST_HEAD(&sdom->vcpu);
> > +    INIT_LIST_HEAD(&sdom->sdom_elem);
> > +    sdom->dom = dom;
> > +
> > +    /* spinlock here to insert the dom */
> > +    spin_lock_irqsave(&prv->lock, flags);
> > +    list_add_tail(&sdom->sdom_elem, &(prv->sdom));
> > +    spin_unlock_irqrestore(&prv->lock, flags);
> > +
> > +    return (void *)sdom;
>
>     Bogus cast.
>
>
> ​I think we have to cast it to void * ​because the definition of this
> function asks the return type to be void *. In addition, the credit2
> scheduler also did the same cast in  this _alloc_domdata function. So
> I guess this should be fine?
>

In C, all pointers are implicitly castable to/from void*.  This is not
the case in C++.  (which suggests to me that George learnt C++ before C,
or is at least more familiar with C++?)
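
A minimal illustration of the difference (hypothetical variables):

    void *opaque;
    struct rt_dom *sdom = xzalloc(struct rt_dom);

    opaque = sdom;   /* fine in C: implicit conversion to void *   */
    sdom = opaque;   /* fine in C: implicit conversion from void * */
                     /* a C++ compiler would reject the second line */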

The act of using type punning means that you are trying to do something
which the C type system doesn't like.  This in turn requires more
thought from people reading the code to work out whether it is actually
safe or not.  As a result, we strive for as few type overrides as possible.

Please don't fall into the trap of assuming the credit2 code, or indeed
any of the other code in Xen, is perfect.  None of it is, although most
of it is good.

>
> > +/*
> > + * set/get each vcpu info of each domain
> > + */
> > +static int
> > +rt_dom_cntl(
> > +    const struct scheduler *ops,
> > +    struct domain *d,
> > +    struct xen_domctl_scheduler_op *op)
> > +{
> > +    xen_domctl_sched_rt_params_t *local_sched;
> > +    struct rt_dom * const sdom = RT_DOM(d);
> > +    struct list_head *iter;
> > +    int vcpu_index = 0;
> > +    int rc = -EINVAL;
> > +
> > +    switch ( op->cmd )
> > +    {
> > +    case XEN_DOMCTL_SCHEDOP_getnumvcpus:
> > +        op->u.rt.nr_vcpus = 0;
> > +        list_for_each( iter, &sdom->vcpu )
> > +            vcpu_index++;
> > +        op->u.rt.nr_vcpus = vcpu_index;
> > +        rc = 0;
> > +        break;
> > +    case XEN_DOMCTL_SCHEDOP_getinfo:
> > +        /* for debug use, whenever adjust Dom0 parameter, do global
> dump */
> > +        if ( d->domain_id == 0 )
> > +            rt_dump(ops);
> > +
> > +        op->u.rt.nr_vcpus = 0;
> > +        list_for_each( iter, &sdom->vcpu )
> > +            vcpu_index++;
> > +        op->u.rt.nr_vcpus = vcpu_index;
> > +        local_sched = xzalloc_array(xen_domctl_sched_rt_params_t,
> vcpu_index);
> > +        vcpu_index = 0;
> > +        list_for_each( iter, &sdom->vcpu )
> > +        {
> > +            struct rt_vcpu * svc = list_entry(iter, struct rt_vcpu,
> sdom_elem);
> > +
> > +            local_sched[vcpu_index].budget = svc->budget;
> > +            local_sched[vcpu_index].period = svc->period;
> > +            local_sched[vcpu_index].index = vcpu_index;
> > +            vcpu_index++;
> > +        }
> > +        copy_to_guest(op->u.rt.vcpu, local_sched, vcpu_index);
>
>     This will clobber guest heap if vcpu_index is greater than the
>     allocated
>     space.
>
>
> ​This is a good point! I will pass the size of the array to the kernel
> and check that the number of the array's elements is not smaller than
> the number of vcpus.
>
>
>      You also unconditionally leak local_sched, but there is no need
>     for an allocation in the first place.
>
> ​I will add the xfree() after copy_to_guest().
> I have a question: how can I avoid allocating the local_sched?

unsigned i = 0;

list_for_each( iter, &sdom->vcpu )
{
    xen_domctl_sched_rt_params_t local_sched;

    if ( i >= op->u.rt.nr_vcpus )
        break;

    /* Fill local_sched from the vcpu at this position in the list */

    if ( copy_to_guest_offset(op->u.rt.vcpu, i, &local_sched, 1) )
    {
        rc = -EFAULT;
        break;
    }

    i++;
}
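
Copying one element at a time like this bounds the writes by
op->u.rt.nr_vcpus and avoids the temporary allocation entirely, which
addresses both of the earlier points at once.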


>
>
>
>     > +        rc = 0;
>     > +        break;
>     > +    case XEN_DOMCTL_SCHEDOP_putinfo:
>     > +        list_for_each( iter, &sdom->vcpu )
>     > +        {
>     > +            struct rt_vcpu * svc = list_entry(iter, struct
>     rt_vcpu, sdom_elem);
>     > +
>     > +            /* adjust per VCPU parameter */
>     > +            if ( op->u.rt.vcpu_index == svc->vcpu->vcpu_id )
>     > +            {
>     > +                vcpu_index = op->u.rt.vcpu_index;
>     > +
>     > +                if ( vcpu_index < 0 )
>     > +                    printk("XEN_DOMCTL_SCHEDOP_putinfo:
>     vcpu_index=%d\n",
>     > +                            vcpu_index);
>     > +                else
>     > +                    printk("XEN_DOMCTL_SCHEDOP_putinfo: "
>     > +                            "vcpu_index=%d, period=%"PRId64",
>     budget=%"PRId64"\n",
>     > +                            vcpu_index, op->u.rt.period,
>     op->u.rt.budget);
>     > +
>     > +                svc->period = op->u.rt.period;
>     > +                svc->budget = op->u.rt.budget;
>     > +
>     > +                break;
>     > +            }
>     > +        }
>     > +        rc = 0;
>     > +        break;
>     > +    }
>     > +
>     > +    return rc;
>     > +}
>     > +
>     > +static struct rt_private _rt_priv;
>     > +
>     > +const struct scheduler sched_rt_def = {
>     > +    .name           = "SMP RT Scheduler",
>     > +    .opt_name       = "rt",
>
>     Should these names reflect RT_DS as opposed to simply RT?
>
>
> ​DS (Deferrable Server) is just one kind of server mechanisms for
> global Earliest Deadline First scheduling. ​We can add other server
> mechanisms in the same file sched_rt.c to extend this Real Time
> scheduler. But we don't want to change/affect user's interface when we
> add more server mechanisms.
> The .opt_name will affect the user's interface when user choose the rt
> scheduler, If we change it to rt_ds, we will have to change it to rt
> again when we have more server mechanisms implemented. Then users will
> have to change their configuration (i.e., the command line value
> sched=) to the new name rt. Because this could potentially affect
> users' interface, I think it's better to use rt here. What do you think?
>

Reusing the rt.c infrastructure is fine, but something other than DS
will necessarily be a different scheduler as far as the tools are
concerned.  The user should expect to have to change their parameters if
they want to use a new scheduler.  The other point of view is that the
user should expect their scheduler not to change under their feet if
they continue using the same parameters as before.

>
>
>
>     > +    .sched_id       = XEN_SCHEDULER_RT_DS,
>     > +    .sched_data     = &_rt_priv,
>     > +
>     > +    .dump_cpu_state = rt_dump_pcpu,
>     > +    .dump_settings  = rt_dump,
>     > +    .init           = rt_init,
>     > +    .deinit         = rt_deinit,
>     > +    .alloc_pdata    = rt_alloc_pdata,
>     > +    .free_pdata     = rt_free_pdata,
>     > +    .alloc_domdata  = rt_alloc_domdata,
>     > +    .free_domdata   = rt_free_domdata,
>     > +    .init_domain    = rt_dom_init,
>     > +    .destroy_domain = rt_dom_destroy,
>     > +    .alloc_vdata    = rt_alloc_vdata,
>     > +    .free_vdata     = rt_free_vdata,
>     > +    .insert_vcpu    = rt_vcpu_insert,
>     > +    .remove_vcpu    = rt_vcpu_remove,
>     > +
>     > +    .adjust         = rt_dom_cntl,
>     > +
>     > +    .pick_cpu       = rt_cpu_pick,
>     > +    .do_schedule    = rt_schedule,
>     > +    .sleep          = rt_vcpu_sleep,
>     > +    .wake           = rt_vcpu_wake,
>     > +    .context_saved  = rt_context_saved,
>     > +};
>     > diff --git a/xen/common/schedule.c b/xen/common/schedule.c
>     > index e9eb0bc..f2df400 100644
>     > --- a/xen/common/schedule.c
>     > +++ b/xen/common/schedule.c
>     > @@ -68,6 +68,7 @@ static const struct scheduler *schedulers[] = {
>     >      &sched_sedf_def,
>     >      &sched_credit_def,
>     >      &sched_credit2_def,
>     > +    &sched_rt_def,
>     >      &sched_arinc653_def,
>     >  };
>     >
>     > @@ -1092,7 +1093,8 @@ long sched_adjust(struct domain *d, struct
>     xen_domctl_scheduler_op *op)
>     >
>     >      if ( (op->sched_id != DOM2OP(d)->sched_id) ||
>     >           ((op->cmd != XEN_DOMCTL_SCHEDOP_putinfo) &&
>     > -          (op->cmd != XEN_DOMCTL_SCHEDOP_getinfo)) )
>     > +          (op->cmd != XEN_DOMCTL_SCHEDOP_getinfo) &&
>     > +          (op->cmd != XEN_DOMCTL_SCHEDOP_getnumvcpus)) )
>     >          return -EINVAL;
>     >
>     >      /* NB: the pluggable scheduler code needs to take care
>     > diff --git a/xen/include/public/domctl.h
>     b/xen/include/public/domctl.h
>     > index 5b11bbf..8d4b973 100644
>     > --- a/xen/include/public/domctl.h
>     > +++ b/xen/include/public/domctl.h
>     > @@ -339,6 +339,18 @@ struct xen_domctl_max_vcpus {
>     >  typedef struct xen_domctl_max_vcpus xen_domctl_max_vcpus_t;
>     >  DEFINE_XEN_GUEST_HANDLE(xen_domctl_max_vcpus_t);
>     >
>     > +/*
>     > + * This structure is used to pass to rt scheduler from a
>     > + * privileged domain to Xen
>     > + */
>     > +struct xen_domctl_sched_rt_params {
>     > +    /* get vcpus' info */
>     > +    int64_t period; /* s_time_t type */
>     > +    int64_t budget;
>     > +    int     index;
>
>     Index is clearly an unsigned quantity.  For alignment and
>     compatibility,
>     uint64_t would make the most sense.  Alternatively, uint32_t and an
>     explicit uint32_t pad field.
>
> ​Agree. I have changed it to uint16_t because the vcpu_index is
> uint16_t in the tool stack.
>
> Thank you very much for your comments and suggestions! Looking forward
> to your reply! ;-)
>
> Best,
>
> Meng​

If you make its size a multiple of 8, then Xen doesn't need a compat
shim for converting hypercalls from 32-bit guests.
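
For example (a sketch; the padding field is illustrative):

    struct xen_domctl_sched_rt_params {
        uint64_t period;
        uint64_t budget;
        uint16_t index;
        uint16_t pad[3];   /* 24 bytes total, a multiple of 8 */
    };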

~Andrew


^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH RFC v2 1/4] xen: add real time scheduler rt
  2014-08-04 10:23       ` Andrew Cooper
@ 2014-08-07 11:26         ` Meng Xu
  2014-08-07 11:48           ` Andrew Cooper
  0 siblings, 1 reply; 13+ messages in thread
From: Meng Xu @ 2014-08-07 11:26 UTC (permalink / raw)
  To: Andrew Cooper
  Cc: Ian Campbell, Sisu Xi, Stefano Stabellini, George Dunlap,
	Ian Jackson, xen-devel, Meng Xu, Jan Beulich, Chong Li,
	Dagaen Golomb


Hi Andrew,

Thank you so much for your advice! It's very useful! :-)

I have one simple question about your comments:

>
> > > > +static void *
> > > > +rt_alloc_domdata(const struct scheduler *ops, struct domain *dom)
> > > > +{
> > > > +    unsigned long flags;
> > > > +    struct rt_dom *sdom;
> > > > +
> > > > +    struct rt_private * prv = RT_PRIV(ops);
> > > > +
> > > > +    printtime();
> > > > +    printk("dom=%d\n", dom->domain_id);
> > > > +
> > > > +    sdom = xzalloc(struct rt_dom);
> > > > +    if ( sdom == NULL )
> > > > +    {
> > > > +        printk("%s, xzalloc failed\n", __func__);
> > > > +        return NULL;
> > > > +    }
> > > > +
> > > > +    INIT_LIST_HEAD(&sdom->vcpu);
> > > > +    INIT_LIST_HEAD(&sdom->sdom_elem);
> > > > +    sdom->dom = dom;
> > > > +
> > > > +    /* spinlock here to insert the dom */
> > > > +    spin_lock_irqsave(&prv->lock, flags);
> > > > +    list_add_tail(&sdom->sdom_elem, &(prv->sdom));
> > > > +    spin_unlock_irqrestore(&prv->lock, flags);
> > > > +
> > > > +    return (void *)sdom;
> > >
> > > Bogus cast.
> >
> > I think we have to cast it to void * because the definition of this
> > function asks the return type to be void *. In addition, the credit2
> > scheduler also did the same cast in this _alloc_domdata function. So I
> > guess this should be fine?
>
> In C, all pointers are implicitly castable to/from void*.  This is not the
> case in C++.  (which suggests to me that George learnt C++ before C, or is
> at least more familiar with C++?)
>
> The act of using type punning means that you are trying to do something
> which the C type system doesn't like.  This in turn requires more thought
> from people reading the code to work out whether it is actually safe or
> not.  As a result, we strive for as few type overrides as possible.
>
> Please don't fall into the trap of assuming the credit2 code, or indeed
> any of the other code in Xen, is perfect.  None of it is, although most of
> it is good.
>

I can change "return (void *)sdom;" to "return sdom;".
But I'm not sure what else I should modify.
In sched-if.h, the hook is declared as "void *       (*alloc_domdata)  (const
struct scheduler *, struct domain *);". I think I should not change this
declaration; otherwise the modification would affect the compilation of the
other schedulers.

Thank you so much!

Best,

Meng



-- 


-----------
Meng Xu
PhD Student in Computer and Information Science
University of Pennsylvania


* Re: [PATCH RFC v2 1/4] xen: add real time scheduler rt
  2014-08-07 11:26         ` Meng Xu
@ 2014-08-07 11:48           ` Andrew Cooper
  0 siblings, 0 replies; 13+ messages in thread
From: Andrew Cooper @ 2014-08-07 11:48 UTC (permalink / raw)
  To: Meng Xu
  Cc: Ian Campbell, Sisu Xi, Stefano Stabellini, George Dunlap,
	Ian Jackson, xen-devel, Meng Xu, Jan Beulich, Chong Li,
	Dagaen Golomb



On 07/08/14 12:26, Meng Xu wrote:
> [...]
>
> I can change "return (void *)sdom;" to "return sdom;".
> But I'm not sure what else I should modify.
> In sched-if.h, the hook is declared as "void *       (*alloc_domdata)  (const
> struct scheduler *, struct domain *);". I think I should not change this
> declaration; otherwise the modification would affect the compilation of the
> other schedulers.
>
> Thank you so much!
>
> Best,
>
> Meng

return sdom; is perfectly fine with the existing function prototype.  It
is an implicit conversion of a pointer to void *, which is perfectly legal
in C.
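
For reference, the tail of the quoted rt_alloc_domdata() would then read
(a sketch; the earlier lines of the function are elided, unchanged):

    /* spinlock here to insert the dom */
    spin_lock_irqsave(&prv->lock, flags);
    list_add_tail(&sdom->sdom_elem, &(prv->sdom));
    spin_unlock_irqrestore(&prv->lock, flags);

    /* struct rt_dom * converts to void * implicitly; no cast needed */
    return sdom;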

~Andrew


end of thread

Thread overview: 13+ messages
2014-07-29  1:52 Introduce rt real-time scheduler for Xen Meng Xu
2014-07-29  1:52 ` [PATCH RFC v2 1/4] xen: add real time scheduler rt Meng Xu
2014-07-29  6:52   ` Jan Beulich
2014-07-29  9:42     ` Dario Faggioli
2014-07-30 15:55     ` [RFC " Meng Xu
2014-07-29 10:36   ` [PATCH RFC " Andrew Cooper
2014-08-02 15:31     ` Meng Xu
2014-08-04 10:23       ` Andrew Cooper
2014-08-07 11:26         ` Meng Xu
2014-08-07 11:48           ` Andrew Cooper
2014-07-29  1:53 ` [PATCH RFC v2 2/4] libxc: add rt scheduler Meng Xu
2014-07-29  1:53 ` [PATCH RFC v2 3/4] libxl: " Meng Xu
2014-07-29  1:53 ` [PATCH RFC v2 4/4] xl: introduce " Meng Xu
