Introduce rt real-time scheduler for Xen

* Introduce rt real-time scheduler for Xen
@ 2014-09-07 19:40 Meng Xu
  2014-09-07 19:40 ` [PATCH v2 1/4] xen: add real time scheduler rt Meng Xu
                   ` (3 more replies)
  0 siblings, 4 replies; 31+ messages in thread
From: Meng Xu @ 2014-09-07 19:40 UTC (permalink / raw)
  To: xen-devel
  Cc: ian.campbell, xisisu, stefano.stabellini, george.dunlap, lu,
	dario.faggioli, ian.jackson, ptxlinh, xumengpanda, JBeulich,
	chaowang, lichong659, dgolomb

This serie of patches adds rt real-time scheduler to Xen.

In summary, It supports:
1) Preemptive Global Earliest Deadline First scheduling policy by using a global RunQ for the scheduler;
2) Assign/display VCPUs' parameters of each domain (All VCPUs of each domain have the same period and budget);
3) Supports CPU Pool
Note: 
a) Although the toolstack only allows users to set the paramters of all VCPUs of the same domain to the same number, the scheduler supports to schedule VCPUs with different parameters of the same domain. In Xen 4.6, we plan to support assign/display each VCPU's parameters of each domain. 
b) Parameters of a domain at tool stack is in microsecond, instead of millisecond.

Compared with the PATCH v1, this set of patch has the following modifications:
    a) Toolstack only allows users to set the parameters of all VCPUs of the same domain to be the same value; Toostack only displays the VCPUs' period and budget of a domain. (In PATCH v1, toolstack can assign/display each VCPU's parameters of each domain, but because it is hard to reach an agreement with the libxl interface for this functionality, we decide to delay this functionality to Xen 4.6 after the scheduler is merged into Xen 4.5.)
    b) Miscellous modification of the scheduler in sched_rt.c based on Dario's detailed comments.
    c) Code style correction in libxl. 

-----------------------------------------------------------------------------------------------------------------------------
TODO after Xen 4.5:
    a) Burn budget in finer granularity instead of 1ms; [medium]
    b) Use separate timer per vcpu for each vcpu's budget replenishment, instead of scanning the full runqueue every now and then [medium]
    c) Handle time stolen from domU by hypervisor. When it runs on a machine with many sockets and lots of cores, the spin-lock for global RunQ used in rt scheduler could eat up time from domU, which could make domU have less budget than it requires. [not sure about difficulty right now] (Thank Konrad Rzeszutek to point this out in the XenSummit. :-)) 
    d) Toolstack supports assiging/display each VCPU's parameters of each domain.

-----------------------------------------------------------------------------------------------------------------------------
The design of this rt scheduler is as follows:
This scheduler follows the Preemptive Global Earliest Deadline First (EDF) theory in real-time field.
At any scheduling point, the VCPU with earlier deadline has higher priority. The scheduler always picks the highest priority VCPU to run on a
feasible PCPU.
A PCPU is feasible if the VCPU can run on this PCPU and (the PCPU is idle or has a lower-priority VCPU running on it.)

Each VCPU has a dedicated period and budget.
The deadline of a VCPU is at the end of each of its periods;
A VCPU has its budget replenished at the beginning of each of its periods;
While scheduled, a VCPU burns its budget.
The VCPU needs to finish its budget before its deadline in each period;
The VCPU discards its unused budget at the end of each of its periods.
If a VCPU runs out of budget in a period, it has to wait until next period.

Each VCPU is implemented as a deferable server.
When a VCPU has a task running on it, its budget is continuously burned;
When a VCPU has no task but with budget left, its budget is preserved.

Queue scheme: A global runqueue for each CPU pool.
The runqueue holds all runnable VCPUs.
VCPUs in the runqueue are divided into two parts: with and without budget.
At the first part, VCPUs are sorted based on EDF priority scheme.

Note: cpumask and cpupool is supported.

If you are intersted in the details of the design and evaluation of this rt scheduler, please refer to our paper "Real-Time Multi-Core Virtual Machine Scheduling in Xen" (http://www.cis.upenn.edu/~mengxu/emsoft14/emsoft14.pdf), which will be published in EMSOFT14. This paper has the following details:
    a) Desgin of this scheduler;
    b) Measurement of the implementation overhead, e.g., scheduler overhead, context switch overhead, etc.
    c) Comparison of this rt scheduler and credit scheduler in terms of the real-time performance.

If you are interested in other real-time schedulers in Xen, please refer to the RT-Xen project's website (https://sites.google.com/site/realtimexen/). It also supports Preemptive Global Rate Monotonic schedulers.
-----------------------------------------------------------------------------------------------------------------------------
One scenario to show the functionality of this rt scheduler is as follows:
//list the domains
#xl list
Name                                        ID   Mem VCPUs  State   Time(s)
Domain-0                                     0  3344     4     r-----     146.1
vm1                                          1   512     2     r-----     155.1

//list VCPUs' parameters of each domain in cpu pools using rt scheduler
#xl sched-rt
Cpupool Pool-0: sched=EDF
Name                                ID    Period    Budget
Domain-0                             0     10000      4000
vm1                                  1     10000      4000

//set VCPUs' parameters of each domain to new value
xl sched-rt -d Domain-0 -p 20000 -b 10000
//Now all vcpus of Domain-0 have period 20000us and budget 10000us.
#xl sched-rt
Cpupool Pool-0: sched=EDF
Name                                ID    Period    Budget
Domain-0                             0     20000     10000
vm1                                  1     10000      4000

// list cpupool information              
#xl cpupool-list
Name               CPUs   Sched     Active   Domain count
Pool-0               4     rt_ds       y          2
#xl cpupool-list -c    
Name               CPU list    
Pool-0             0,1,2,3

//create a cpupool test
#xl cpupool-cpu-remove Pool-0 3
#xl cpupool-cpu-remove Pool-0 2
#xl cpupool-create name=\"test\" sched=\"rt_ds\"
#xl cpupool-cpu-add test 3 
#xl cpupool-cpu-add test 2
#xl cpupool-list
Name               CPUs   Sched     Active   Domain count
Pool-0               2     rt_ds       y          2
test                 2     rt_ds       y          0

//migrate vm1 from cpupool Pool-0 to cpupool test.    
#xl cpupool-migrate vm1 test

//now vm1 is in cpupool test
# xl sched-rt
pupool Pool-0: sched=EDF
Name                                ID    Period    Budget
Domain-0                             0     20000     10000
Cpupool test: sched=EDF                 
Name                                ID    Period    Budget
vm1                                  1     10000      4000

-----------------------------------------------------------------------------------------------------------------------------
Any comment, question, and concerns are more than welcome! :-)

Thank you very much!

Best,

Meng

[PATCH v2 1/4] xen: add real time scheduler rt
[PATCH v2 2/4] libxc: add rt scheduler
[PATCH v2 3/4] libxl: add rt scheduler
[PATCH v2 4/4] xl: introduce rt scheduler

---
Meng Xu
PhD Student in Computer and Information Science
University of Pennsylvania
http://www.cis.upenn.edu/~mengxu/

^ permalink raw reply	[flat|nested] 31+ messages in thread