* Introduce rt real-time scheduler for Xen
@ 2014-07-11  4:49 Meng Xu
  2014-07-11  4:49 ` [PATCH RFC v1 1/4] rt: Add rt scheduler to hypervisor Meng Xu
                   ` (5 more replies)
  0 siblings, 6 replies; 50+ messages in thread
From: Meng Xu @ 2014-07-11  4:49 UTC (permalink / raw)
  To: xen-devel
  Cc: ian.campbell, xisisu, stefano.stabellini, george.dunlap,
	dario.faggioli, ian.jackson, xumengpanda, lichong659, dgolomb

This series of patches adds the rt real-time scheduler to Xen.

In summary, it supports:
1) a preemptive Global Earliest Deadline First (GEDF) scheduling policy, using a single global RunQ in the scheduler;
2) setting and displaying the scheduling parameters of each VCPU of each domain;
3) CPU pools.

The design of this rt scheduler is as follows:
This rt scheduler follows the preemptive Global Earliest Deadline First (GEDF) theory from the real-time systems field.
Each VCPU has its own period and budget. While scheduled, a VCPU burns its budget. A VCPU has its budget replenished at the beginning of each of its periods and discards any unused budget at the end of each period. If a VCPU runs out of budget within a period, it has to wait until the next period.
How a VCPU's budget is burned depends on the server mechanism implemented for each VCPU.
The priority of VCPUs at each scheduling point is decided according to the preemptive Global Earliest Deadline First scheme.

Server mechanism: a VCPU is implemented as a deferrable server.
When a VCPU has a task running on it, its budget is continuously burned;
When a VCPU has no task to run but still has budget left, its budget is preserved.

Priority scheme: Global Earliest Deadline First (EDF).
At any scheduling point, the VCPU with the earliest deadline has the highest priority.

Queue scheme: A global runqueue for each CPU pool.
The runqueue holds all runnable VCPUs.
VCPUs in the runqueue are divided into two sections: those with remaining budget and those without.
Within each section, VCPUs are sorted according to the GEDF priority scheme.

Scheduling quantum: 1 ms; budget accounting is done at microsecond granularity.
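
As a rough illustration of the bookkeeping described above (this is only a sketch with made-up names, not the code in the patch; times are assumed to be in microseconds), the deferrable-server rules boil down to a burn operation and a replenish operation on each VCPU:

    /* Sketch only: hypothetical per-VCPU deferrable-server bookkeeping. */
    typedef long long us_t;   /* time in microseconds */

    struct vcpu_budget {
        us_t period;          /* replenishment period */
        us_t budget;          /* budget granted per period */
        us_t cur_budget;      /* budget left in the current period */
        us_t cur_deadline;    /* end of the current period, the EDF key */
    };

    /* At (or after) a period boundary: discard leftover budget, recharge
     * to the full budget and move the deadline to the end of the new period. */
    static void replenish(struct vcpu_budget *v, us_t now)
    {
        if ( now >= v->cur_deadline )
        {
            us_t missed = (now - v->cur_deadline) / v->period + 1;
            v->cur_deadline += missed * v->period;
            v->cur_budget = v->budget;
        }
    }

    /* While the VCPU actually runs, its budget is burned; once it reaches
     * zero, the VCPU has to wait for its next period. A deferrable server
     * preserves unused budget only until the end of the current period. */
    static void burn(struct vcpu_budget *v, us_t ran_for)
    {
        v->cur_budget -= ran_for;
        if ( v->cur_budget < 0 )
            v->cur_budget = 0;
    }

At every scheduling point, the runnable VCPU that still has budget and has the smallest cur_deadline is picked (GEDF). For example, a VCPU with period 10ms and budget 4ms can consume at most 40% of one physical CPU in every 10ms window.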

-----------------------------------------------------------------------------------------------------------------------------
The following scenario demonstrates the functionality of this rt scheduler:
//list the parameters of each VCPU of each domain in the cpupools that use the rt scheduler
#xl sched-rt
Cpupool Pool-0: sched=EDF
Name                                ID VCPU Period Budget
Domain-0                             0    0     10     10
Domain-0                             0    1     20     20
Domain-0                             0    2     30     30
Domain-0                             0    3     10     10
litmus1                              1    0     10      4
litmus1                              1    1     10      4



//set the parameters of vcpu 1 of domain litmus1:
# xl sched-rt -d litmus1 -v 1 -p 20 -b 10

//vcpu 1 of domain litmus1 now has the new parameters; display the VCPU parameters of a single domain:
# xl sched-rt -d litmus1
Name                                ID VCPU Period Budget
litmus1                              1    0     10      4
litmus1                              1    1     20     10

// list cpupool information
# xl cpupool-list
Name               CPUs   Sched     Active   Domain count
Pool-0              12        rt       y          2

//create a cpupool test
#xl cpupool-cpu-remove Pool-0 11
#xl cpupool-cpu-remove Pool-0 10
#xl cpupool-create name="test" sched="credit"
#xl cpupool-cpu-add test 11
#xl cpupool-cpu-add test 10
#xl cpupool-list
Name               CPUs   Sched     Active   Domain count
Pool-0              10        rt       y          2
test                 2    credit       y          0

//migrate litmus1 from cpupool Pool-0 to cpupool test.
#xl cpupool-migrate litmus1 test

//now litmus1 is in cpupool test
# xl sched-credit
Cpupool test: tslice=30ms ratelimit=1000us
Name                                ID Weight  Cap
litmus1                              1    256    0

-----------------------------------------------------------------------------------------------------------------------------
The differences between this new rt real-time scheduler and the sedf scheduler are as follows:
1) The rt scheduler supports global EDF scheduling, while sedf only supports partitioned scheduling. Thanks to VCPU cpumask (hard affinity) support, the rt scheduler can also be used as a partitioned scheduler by pinning each VCPU to a specific physical CPU.
2) The rt scheduler supports setting and getting the parameters of each individual VCPU of a domain. A domain can have multiple VCPUs with different parameters, and the rt scheduler lets the user get/set the parameters of each VCPU of a specific domain (the sedf scheduler does not support this at the moment).
3) The rt scheduler supports cpupools.
4) The rt scheduler uses a deferrable server to burn/replenish a VCPU's budget, while sedf uses a constant bandwidth server (CBS). These are just two options for implementing a global EDF real-time scheduler, and the real-time performance of both has already been proven in the academic literature.

(Briefly speaking, the functionality that the *sedf* scheduler plans to implement and improve in a future release is already supported by this rt scheduler.)
(Although there is no need to implement two server mechanisms, the CBS server mechanism, or any other server mechanism, could be incorporated into this rt scheduler simply by modifying the two functions that burn and replenish a VCPU's budget; see the sketch below.)
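
As an illustration (a hypothetical sketch only, reusing the struct from the sketch earlier in this mail; this is not code from the series), a CBS-style rule would differ from the deferrable-server rule only in what happens when the budget runs out: instead of idling until the next period, the VCPU's deadline is postponed by one period and its budget is recharged immediately, which keeps its long-run bandwidth at budget/period:

    /* Sketch only: hypothetical CBS-style replenishment on budget exhaustion. */
    static void cbs_on_budget_exhausted(struct vcpu_budget *v)
    {
        v->cur_deadline += v->period;   /* postpone the deadline by one period */
        v->cur_budget    = v->budget;   /* recharge the budget right away */
    }

Plugging such a rule into the scheduler would only require changing the budget burning/replenishment functions mentioned above; the GEDF queue and pick logic would stay the same.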

-----------------------------------------------------------------------------------------------------------------------------
TODO:
1) Improve the code of getting/setting each VCPU’s parameters. [easy]
    Right now, the code creates an array with LIBXL_XEN_LEGACY_MAX_VCPUS (i.e., 32) elements to bounce all VCPUs' parameters of a domain between the Xen toolstack and Xen. It is unnecessary for this array to have LIBXL_XEN_LEGACY_MAX_VCPUS elements.
    The plan is to first get the exact number of VCPUs of a domain and then create an array with exactly that many elements to bounce between the toolstack and Xen, as sketched after this TODO list.
2) Provide microsecond time precision in the xl interface instead of millisecond precision. [easy]
    Right now, the rt scheduler lets the user specify each VCPU's parameters (period, budget) in milliseconds (ms). In some real-time applications, users may want to specify VCPU parameters in microseconds (us). The next step is to let users specify VCPU parameters in microseconds and to account time in microseconds (or nanoseconds) inside the Xen rt scheduler as well.
3) Add Xen trace into the rt scheduler. [easy]
    We will add a few xentrace tracepoints to the rt scheduler, similar to TRC_CSCHED2_RUNQ_POS in the credit2 scheduler, so that it can be debugged via tracing.
4) Improve the performance of the rt scheduler. [future work]
    Under the preemptive global EDF policy, VCPUs of the same domain may preempt each other. This self-switching brings no benefit to the domain but introduces extra overhead. When this situation arises, we could simply raise the priority of the currently running lower-priority VCPU and let it borrow budget from the higher-priority VCPU, avoiding such self-switches.
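
Regarding TODO 1), one possible shape for the fix (a hypothetical sketch; the exact libxl plumbing may end up different) is to query the domain's VCPU count first and size the bounce array from it:

    /* Sketch only (assumes libxl_domain_info() and the libxl_dominfo type are
     * used for this; not part of this series): learn a domain's exact VCPU count. */
    #include <libxl.h>

    static int rt_get_num_vcpus(libxl_ctx *ctx, uint32_t domid, int *num_vcpus)
    {
        libxl_dominfo info;
        int rc;

        libxl_dominfo_init(&info);
        rc = libxl_domain_info(ctx, &info, domid);
        if (!rc)
            *num_vcpus = info.vcpu_max_id + 1;  /* exact number of VCPUs */
        libxl_dominfo_dispose(&info);
        return rc;
    }

The parameter array bounced between the toolstack and Xen would then be allocated with exactly that many elements instead of LIBXL_XEN_LEGACY_MAX_VCPUS.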

Timeline of implementing the TODOs:
We plan to finish TODOs 1), 2) and 3) within 3-4 weeks (or earlier).
Because TODO 4) would make the scheduling policy no longer pure GEDF (people who want real GEDF may not be happy with this), we look forward to hearing people's opinions.

-----------------------------------------------------------------------------------------------------------------------------
Huge thanks to Dario Faggioli for his helpful and detailed comments on the preview version of this rt scheduler. :-)

Any comments, questions and concerns are more than welcome! :-)

Thank you very much!

Meng

[PATCH RFC v1 1/4] rt: Add rt scheduler to hypervisor
[PATCH RFC v1 2/4] xl for rt scheduler
[PATCH RFC v1 3/4] libxl for rt scheduler
[PATCH RFC v1 4/4] libxc for rt scheduler

-----------
Meng Xu
PhD Student in Computer and Information Science
University of Pennsylvania

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


* [PATCH RFC v1 1/4] rt: Add rt scheduler to hypervisor
  2014-07-11  4:49 Introduce rt real-time scheduler for Xen Meng Xu
@ 2014-07-11  4:49 ` Meng Xu
  2014-07-11 14:27   ` Dario Faggioli
  2014-07-11 14:37   ` Andrew Cooper
  2014-07-11  4:49 ` [PATCH RFC v1 2/4] xl for rt scheduler Meng Xu
                   ` (4 subsequent siblings)
  5 siblings, 2 replies; 50+ messages in thread
From: Meng Xu @ 2014-07-11  4:49 UTC (permalink / raw)
  To: xen-devel
  Cc: ian.campbell, xisisu, stefano.stabellini, george.dunlap,
	dario.faggioli, ian.jackson, xumengpanda, Meng Xu, lichong659,
	dgolomb

This is the core rt scheduler patch. It adds the new real-time scheduler to
the hypervisor, as a non-default scheduler.

This scheduler follows the preemptive Global EDF theory from the real-time systems field.
Each VCPU can have a dedicated period and budget.
While scheduled, a VCPU burns its budget.
A VCPU has its budget replenished at the beginning of each of its periods;
The VCPU discards its unused budget at the end of each of its periods.
If a VCPU runs out of budget in a period, it has to wait until the next period.
The mechanism of how to burn a VCPU's budget depends on the server mechanism
implemented for each VCPU.

Server mechanism: a VCPU is implemented as a deferrable server.
When a VCPU has a task running on it, its budget is continuously
burned;
When a VCPU has no task to run but still has budget left, its budget is
preserved.

Priority scheme: Preemptive Global Earliest Deadline First (gEDF).
At any scheduling point, the VCPU with earliest deadline has highest
priority.

Queue scheme: A global runqueue for each CPU pool.
The runqueue holds all runnable VCPUs.
VCPUs in the runqueue are divided into two parts: those with and those without
remaining budget. Within each part, VCPUs are sorted by the EDF priority scheme.

Scheduling quantum: 1 ms; budget accounting is done at microsecond granularity.

Note: cpumask and cpupool are supported.

This is still in the development phase.

Signed-off-by: Sisu Xi <xisisu@gmail.com>
Signed-off-by: Meng Xu <mengxu@cis.upenn.edu>
---
 xen/common/Makefile         |    1 +
 xen/common/sched_rt.c       |  984 +++++++++++++++++++++++++++++++++++++++++++
 xen/common/schedule.c       |    1 +
 xen/include/public/domctl.h |   19 +
 xen/include/xen/sched-if.h  |    2 +-
 5 files changed, 1006 insertions(+), 1 deletion(-)
 create mode 100644 xen/common/sched_rt.c

diff --git a/xen/common/Makefile b/xen/common/Makefile
index 3683ae3..5a23aa4 100644
--- a/xen/common/Makefile
+++ b/xen/common/Makefile
@@ -26,6 +26,7 @@ obj-y += sched_credit.o
 obj-y += sched_credit2.o
 obj-y += sched_sedf.o
 obj-y += sched_arinc653.o
+obj-y += sched_rt.o
 obj-y += schedule.o
 obj-y += shutdown.o
 obj-y += softirq.o
diff --git a/xen/common/sched_rt.c b/xen/common/sched_rt.c
new file mode 100644
index 0000000..41543a2
--- /dev/null
+++ b/xen/common/sched_rt.c
@@ -0,0 +1,984 @@
+/******************************************************************************
+ * Preemptive Global Earliest Deadline First (EDF) scheduler for Xen
+ * EDF scheduling is one of the most popular real-time scheduling algorithms,
+ * widely used in the embedded field.
+ *
+ * by Sisu Xi, 2013, Washington University in Saint Louis
+ * and Meng Xu, 2014, University of Pennsylvania
+ *
+ * based on the code of credit Scheduler
+ */
+
+#include <xen/config.h>
+#include <xen/init.h>
+#include <xen/lib.h>
+#include <xen/sched.h>
+#include <xen/domain.h>
+#include <xen/delay.h>
+#include <xen/event.h>
+#include <xen/time.h>
+#include <xen/perfc.h>
+#include <xen/sched-if.h>
+#include <xen/softirq.h>
+#include <asm/atomic.h>
+#include <xen/errno.h>
+#include <xen/trace.h>
+#include <xen/cpu.h>
+#include <xen/keyhandler.h>
+#include <xen/trace.h>
+#include <xen/guest_access.h>
+
+/*
+ * TODO:
+ *
+ * Migration compensation and resist like credit2 to better use cache;
+ * Lock Holder Problem, using yield?
+ * Self switch problem: VCPUs of the same domain may preempt each other;
+ */
+
+/*
+ * Design:
+ *
+ * This scheduler follows the Preemptive Global EDF theory from the real-time systems field.
+ * Each VCPU can have a dedicated period and budget.
+ * While scheduled, a VCPU burns its budget.
+ * A VCPU has its budget replenished at the beginning of each of its periods;
+ * The VCPU discards its unused budget at the end of each of its periods.
+ * If a VCPU runs out of budget in a period, it has to wait until the next period.
+ * The mechanism of how to burn a VCPU's budget depends on the server mechanism
+ * implemented for each VCPU.
+ *
+ * Server mechanism: a VCPU is implemented as a deferrable server.
+ * When a VCPU has a task running on it, its budget is continuously burned;
+ * When a VCPU has no task to run but still has budget left, its budget is preserved.
+ *
+ * Priority scheme: Preemptive Global Earliest Deadline First (gEDF).
+ * At any scheduling point, the VCPU with earliest deadline has highest priority.
+ *
+ * Queue scheme: A global runqueue for each CPU pool. 
+ * The runqueue holds all runnable VCPUs. 
+ * VCPUs in the runqueue are divided into two parts: with and without remaining budget. 
+ * At each part, VCPUs are sorted based on EDF priority scheme.
+ *
+ * Scheduling quantum: 1 ms; budget accounting is done at microsecond granularity.
+ *
+ * Note: cpumask and cpupool are supported.
+ */
+
+/*
+ * Locking:
+ * Just like credit2, a global system lock is used to protect the RunQ.
+ * The global lock is referenced by schedule_data.schedule_lock from all physical cpus.
+ *
+ * The lock is already grabbed when the wake/sleep/schedule functions in schedule.c are called.
+ *
+ * The functions that involve the RunQ and need to grab the lock are:
+ *    dump, vcpu_insert, vcpu_remove, context_saved.
+ */
+
+
+/*
+ * Default parameters in ms
+ */
+#define RT_DEFAULT_PERIOD     10
+#define RT_DEFAULT_BUDGET      4
+
+/*
+ * Useful macros
+ */
+#define RT_PRIV(_ops)     ((struct rt_private *)((_ops)->sched_data))
+#define RT_VCPU(_vcpu)    ((struct rt_vcpu *)(_vcpu)->sched_priv)
+#define RT_DOM(_dom)      ((struct rt_dom *)(_dom)->sched_priv)
+#define RUNQ(_ops)              (&RT_PRIV(_ops)->runq)
+
+/*
+ * Flags
+ */
+#define __RT_scheduled            1
+#define RT_scheduled (1<<__RT_scheduled)
+#define __RT_delayed_runq_add     2
+#define RT_delayed_runq_add (1<<__RT_delayed_runq_add)
+
+/*
+ * Used to printout debug information
+ */
+#define printtime()     ( printk("%d : %3ld.%3ld : %-19s ", smp_processor_id(), NOW()/MILLISECS(1), NOW()%MILLISECS(1)/1000, __func__) )
+
+/*
+ * System-wide private data, including a global RunQueue.
+ * The global lock is referenced by schedule_data.schedule_lock from all physical cpus.
+ * It can be grabbed via vcpu_schedule_lock_irq()
+ */
+struct rt_private {
+    spinlock_t lock;        /* the global coarse-grained lock */
+    struct list_head sdom;  /* list of available domains, used for dump */
+    struct list_head runq;  /* ordered list of runnable VCPUs */
+    cpumask_t cpus;         /* cpumask_t of available physical cpus */
+    cpumask_t tickled;      /* another cpu in the queue has already tickled this one */
+};
+
+/*
+ * Virtual CPU
+ */
+struct rt_vcpu {
+    struct list_head runq_elem; /* On the runqueue list */
+    struct list_head sdom_elem; /* On the domain VCPU list */
+
+    /* Up-pointers */
+    struct rt_dom *sdom;
+    struct vcpu *vcpu;
+
+    /* VCPU parameters, in milliseconds */
+    s_time_t period;
+    s_time_t budget;
+
+    /* VCPU current information */
+    long cur_budget;             /* current budget in microseconds */
+    s_time_t last_start;        /* last start time, used to calculate budget */
+    s_time_t cur_deadline;      /* current deadline, used to do EDF */
+    unsigned flags;             /* mark __RT_scheduled, etc.. */
+};
+
+/*
+ * Domain
+ */
+struct rt_dom {
+    struct list_head vcpu;      /* link its VCPUs */
+    struct list_head sdom_elem; /* link list on rt_priv */
+    struct domain *dom;         /* pointer to upper domain */
+};
+
+/*
+ * RunQueue helper functions
+ */
+static int
+__vcpu_on_runq(struct rt_vcpu *svc)
+{
+   return !list_empty(&svc->runq_elem);
+}
+
+static struct rt_vcpu *
+__runq_elem(struct list_head *elem)
+{
+    return list_entry(elem, struct rt_vcpu, runq_elem);
+}
+
+static inline void
+__runq_remove(struct rt_vcpu *svc)
+{
+    if ( __vcpu_on_runq(svc) )
+        list_del_init(&svc->runq_elem);
+}
+
+/*
+ * Insert a vcpu into the RunQ based on the vcpu's deadline:
+ * vcpus with shorter deadline are inserted first.
+ * EDF schedule policy: vcpu with smaller deadline has higher priority;
+ * When vcpus have the same deadline, insert the current one at the head of these vcpus.
+ */
+static void
+__runq_insert(const struct scheduler *ops, struct rt_vcpu *svc)
+{
+    struct list_head *runq = RUNQ(ops);
+    struct list_head *iter;
+   
+    ASSERT( spin_is_locked(per_cpu(schedule_data, svc->vcpu->processor).schedule_lock) );
+
+    if ( __vcpu_on_runq(svc) )
+        return;
+
+    list_for_each(iter, runq) {
+        struct rt_vcpu * iter_svc = __runq_elem(iter);
+
+        if ( svc->cur_budget > 0 ) { /* svc still has budget */
+            if ( iter_svc->cur_budget == 0 ||
+                 svc->cur_deadline <= iter_svc->cur_deadline )
+                    break;
+        } else { /* svc has no budget */
+            if ( iter_svc->cur_budget == 0 &&
+                 svc->cur_deadline <= iter_svc->cur_deadline )
+                    break;
+        }
+    }
+
+    list_add_tail(&svc->runq_elem, iter);
+}
+
+
+/*
+ * Debug related code, dump vcpu/cpu information
+ */
+static void
+rt_dump_vcpu(struct rt_vcpu *svc)
+{
+    if ( svc == NULL ) {
+        printk("NULL!\n");
+        return;
+    }
+#define cpustr keyhandler_scratch
+    cpumask_scnprintf(cpustr, sizeof(cpustr), svc->vcpu->cpu_hard_affinity);
+    printk("[%5d.%-2d] cpu %d, (%"PRId64", %"PRId64"), cur_b=%"PRId64" cur_d=%"PRId64" last_start=%"PRId64" onR=%d runnable=%d cpu_hard_affinity=%s ",
+            svc->vcpu->domain->domain_id,
+            svc->vcpu->vcpu_id,
+            svc->vcpu->processor,
+            svc->period,
+            svc->budget,
+            svc->cur_budget,
+            svc->cur_deadline,
+            svc->last_start,
+            __vcpu_on_runq(svc),
+            vcpu_runnable(svc->vcpu),
+            cpustr);
+    memset(cpustr, 0, sizeof(char)*1024);
+    cpumask_scnprintf(cpustr, sizeof(cpustr), cpupool_scheduler_cpumask(svc->vcpu->domain->cpupool));
+    printk("cpupool=%s\n", cpustr);
+#undef cpustr
+}
+
+static void
+rt_dump_pcpu(const struct scheduler *ops, int cpu)
+{
+    struct rt_vcpu *svc = RT_VCPU(curr_on_cpu(cpu));
+
+    printtime();
+    rt_dump_vcpu(svc);
+}
+
+/*
+ * Should not need the lock here; we are only printing information.
+ */
+static void
+rt_dump(const struct scheduler *ops)
+{
+    struct list_head *iter_sdom, *iter_svc, *runq, *iter;
+    struct rt_private *prv = RT_PRIV(ops);
+    struct rt_vcpu *svc;
+    int cpu = 0;
+    int loop = 0;
+
+    printtime();
+    printk("Priority Scheme: EDF\n");
+
+    printk("PCPU info: \n");
+    for_each_cpu(cpu, &prv->cpus) {
+        rt_dump_pcpu(ops, cpu);
+    }
+
+    printk("Global RunQueue info: \n");
+    loop = 0;
+    runq = RUNQ(ops);
+    list_for_each( iter, runq ) {
+        svc = __runq_elem(iter);
+        printk("\t%3d: ", ++loop);
+        rt_dump_vcpu(svc);
+    }
+
+    printk("Domain info: \n");
+    loop = 0;
+    list_for_each( iter_sdom, &prv->sdom ) {
+        struct rt_dom *sdom;
+        sdom = list_entry(iter_sdom, struct rt_dom, sdom_elem);
+        printk("\tdomain: %d\n", sdom->dom->domain_id);
+
+        list_for_each( iter_svc, &sdom->vcpu ) {
+            svc = list_entry(iter_svc, struct rt_vcpu, sdom_elem);
+            printk("\t\t%3d: ", ++loop);
+            rt_dump_vcpu(svc);
+        }
+    }
+
+    printk("\n");
+}
+
+/*
+ * Init/Free related code
+ */
+static int
+rt_init(struct scheduler *ops)
+{
+    struct rt_private *prv;
+
+    prv = xzalloc(struct rt_private);
+    if ( prv == NULL ) {
+        printk("xzalloc failed at rt_private\n");
+        return -ENOMEM;
+    }
+
+    ops->sched_data = prv;
+    spin_lock_init(&prv->lock);
+    INIT_LIST_HEAD(&prv->sdom);
+    INIT_LIST_HEAD(&prv->runq);
+    cpumask_clear(&prv->tickled);
+    cpumask_clear(&prv->cpus);
+
+    printtime();
+    printk("\n");
+
+    return 0;
+}
+
+static void
+rt_deinit(const struct scheduler *ops)
+{
+    struct rt_private *prv;
+
+    printtime();
+    printk("\n");
+
+    prv = RT_PRIV(ops);
+    if ( prv )
+        xfree(prv);
+}
+
+/* 
+ * Point the per_cpu spinlock to the global system lock; all cpus share the same global lock.
+ */
+static void *
+rt_alloc_pdata(const struct scheduler *ops, int cpu)
+{
+    struct rt_private *prv = RT_PRIV(ops);
+
+    cpumask_set_cpu(cpu, &prv->cpus);
+
+    per_cpu(schedule_data, cpu).schedule_lock = &prv->lock;
+
+    printtime();
+    printk("%s total cpus: %d", __FUNCTION__, cpumask_weight(&prv->cpus));
+    return (void *)1;
+}
+
+static void
+rt_free_pdata(const struct scheduler *ops, void *pcpu, int cpu)
+{
+    struct rt_private * prv = RT_PRIV(ops);
+    cpumask_clear_cpu(cpu, &prv->cpus);
+    printtime();
+    printk("%s cpu=%d\n", __FUNCTION__, cpu);
+}
+
+static void *
+rt_alloc_domdata(const struct scheduler *ops, struct domain *dom)
+{
+    unsigned long flags;
+    struct rt_dom *sdom;
+    struct rt_private * prv = RT_PRIV(ops);
+
+    printtime();
+    printk("dom=%d\n", dom->domain_id);
+
+    sdom = xzalloc(struct rt_dom);
+    if ( sdom == NULL ) {
+        printk("%s, xzalloc failed\n", __func__);
+        return NULL;
+    }
+
+    INIT_LIST_HEAD(&sdom->vcpu);
+    INIT_LIST_HEAD(&sdom->sdom_elem);
+    sdom->dom = dom;
+
+    /* spinlock here to insert the dom */
+    spin_lock_irqsave(&prv->lock, flags);
+    list_add_tail(&sdom->sdom_elem, &(prv->sdom));
+    spin_unlock_irqrestore(&prv->lock, flags);
+
+    return (void *)sdom;
+}
+
+static void
+rt_free_domdata(const struct scheduler *ops, void *data)
+{
+    unsigned long flags;
+    struct rt_dom *sdom = data;
+    struct rt_private * prv = RT_PRIV(ops);
+
+    printtime();
+    printk("dom=%d\n", sdom->dom->domain_id);
+
+    spin_lock_irqsave(&prv->lock, flags);
+    list_del_init(&sdom->sdom_elem);
+    spin_unlock_irqrestore(&prv->lock, flags);
+    xfree(data);
+}
+
+static int
+rt_dom_init(const struct scheduler *ops, struct domain *dom)
+{
+    struct rt_dom *sdom;
+
+    printtime();
+    printk("dom=%d\n", dom->domain_id);
+
+    /* IDLE Domain does not link on rt_private */
+    if ( is_idle_domain(dom) ) { return 0; }
+
+    sdom = rt_alloc_domdata(ops, dom);
+    if ( sdom == NULL ) {
+        printk("%s, failed\n", __func__);
+        return -ENOMEM;
+    }
+    dom->sched_priv = sdom;
+
+    return 0;
+}
+
+static void
+rt_dom_destroy(const struct scheduler *ops, struct domain *dom)
+{
+    printtime();
+    printk("dom=%d\n", dom->domain_id);
+
+    rt_free_domdata(ops, RT_DOM(dom));
+}
+
+static void *
+rt_alloc_vdata(const struct scheduler *ops, struct vcpu *vc, void *dd)
+{
+    struct rt_vcpu *svc;
+    s_time_t now = NOW();
+    long count;
+
+    /* Allocate per-VCPU info */
+    svc = xzalloc(struct rt_vcpu);
+    if ( svc == NULL ) {
+        printk("%s, xzalloc failed\n", __func__);
+        return NULL;
+    }
+
+    INIT_LIST_HEAD(&svc->runq_elem);
+    INIT_LIST_HEAD(&svc->sdom_elem);
+    svc->flags = 0U;
+    svc->sdom = dd;
+    svc->vcpu = vc;
+    svc->last_start = 0;            /* init last_start is 0 */
+
+    svc->period = RT_DEFAULT_PERIOD;
+    if ( !is_idle_vcpu(vc) && vc->domain->domain_id != 0 ) {
+        svc->budget = RT_DEFAULT_BUDGET;
+    } else {
+        svc->budget = RT_DEFAULT_PERIOD; /* give vcpus of dom0 100% utilization */
+    }
+
+    count = (now/MILLISECS(svc->period)) + 1;
+    /* sync all VCPU's start time to 0 */
+    svc->cur_deadline += count*MILLISECS(svc->period);
+
+    svc->cur_budget = svc->budget*1000; /* counting in microseconds level */
+    /* Debug only: dump new vcpu's info */
+    printtime();
+    rt_dump_vcpu(svc);
+
+    return svc;
+}
+
+static void
+rt_free_vdata(const struct scheduler *ops, void *priv)
+{
+    struct rt_vcpu *svc = priv;
+
+    /* Debug only: dump freed vcpu's info */
+    printtime();
+    rt_dump_vcpu(svc);
+    xfree(svc);
+}
+
+static void
+rt_vcpu_insert(const struct scheduler *ops, struct vcpu *vc)
+{
+    struct rt_vcpu *svc = RT_VCPU(vc);
+
+    /* Debug only: dump info of vcpu to insert */
+    printtime();
+    rt_dump_vcpu(svc);
+
+    /* IDLE VCPU not allowed on RunQ */
+    if ( is_idle_vcpu(vc) )
+        return;
+
+    list_add_tail(&svc->sdom_elem, &svc->sdom->vcpu);   /* add to dom vcpu list */
+}
+
+static void
+rt_vcpu_remove(const struct scheduler *ops, struct vcpu *vc)
+{
+    struct rt_vcpu * const svc = RT_VCPU(vc);
+    struct rt_dom * const sdom = svc->sdom;
+
+    printtime();
+    rt_dump_vcpu(svc);
+
+    BUG_ON( sdom == NULL );
+    BUG_ON( __vcpu_on_runq(svc) );
+
+    if ( !is_idle_vcpu(vc) ) {
+        list_del_init(&svc->sdom_elem);
+    }
+}
+
+/* 
+ * Pick a valid CPU for the vcpu vc
+ * The valid CPUs of a vcpu are the intersection of the vcpu's affinity and the available cpus.
+ */
+static int
+rt_cpu_pick(const struct scheduler *ops, struct vcpu *vc)
+{
+    cpumask_t cpus;
+    cpumask_t *online;
+    int cpu;
+    struct rt_private * prv = RT_PRIV(ops);
+
+    online = cpupool_scheduler_cpumask(vc->domain->cpupool);
+    cpumask_and(&cpus, &prv->cpus, online);
+    cpumask_and(&cpus, &cpus, vc->cpu_hard_affinity);
+
+    cpu = cpumask_test_cpu(vc->processor, &cpus)
+            ? vc->processor 
+            : cpumask_cycle(vc->processor, &cpus);
+    ASSERT( !cpumask_empty(&cpus) && cpumask_test_cpu(cpu, &cpus) );
+
+    return cpu;
+}
+
+/*
+ * Implemented as a deferrable server.
+ * A different server mechanism would have a different implementation.
+ * Burn budget at microsecond granularity.
+ */
+static void
+burn_budgets(const struct scheduler *ops, struct rt_vcpu *svc, s_time_t now) {
+    s_time_t delta;
+    unsigned int consume;
+    long count = 0;
+
+    /* first time called for this svc, update last_start */
+    if ( svc->last_start == 0 ) {
+        svc->last_start = now;
+        return;
+    }
+
+    /* don't burn budget for idle VCPU */
+    if ( is_idle_vcpu(svc->vcpu) ) {
+        return;
+    }
+
+    /* don't burn budget for Domain-0, RT-Xen use only */
+    if ( svc->sdom->dom->domain_id == 0 ) {
+        return;
+    }
+
+    /* update deadline info */
+    delta = now - svc->cur_deadline;
+    if ( delta >= 0 ) {
+        count = ( delta/MILLISECS(svc->period) ) + 1;
+        svc->cur_deadline += count * MILLISECS(svc->period);
+        svc->cur_budget = svc->budget * 1000;
+        return;
+    }
+
+    delta = now - svc->last_start;
+    if ( delta < 0 ) {
+        printk("%s, delta = %ld for ", __func__, delta);
+        rt_dump_vcpu(svc);
+        svc->last_start = now;  /* update last_start */
+        svc->cur_budget = 0;
+        return;
+    }
+
+    if ( svc->cur_budget == 0 ) return;
+
+    /* burn at microseconds level */
+    consume = ( delta/MICROSECS(1) );
+    if ( delta%MICROSECS(1) > MICROSECS(1)/2 ) consume++;
+
+    svc->cur_budget -= consume;
+    if ( svc->cur_budget < 0 ) svc->cur_budget = 0;
+}
+
+/* 
+ * RunQ is sorted. Pick first one within cpumask. If no one, return NULL
+ * lock is grabbed before calling this function 
+ */
+static struct rt_vcpu *
+__runq_pick(const struct scheduler *ops, cpumask_t mask)
+{
+    struct list_head *runq = RUNQ(ops);
+    struct list_head *iter;
+    struct rt_vcpu *svc = NULL;
+    struct rt_vcpu *iter_svc = NULL;
+    cpumask_t cpu_common;
+    cpumask_t *online;
+    struct rt_private * prv = RT_PRIV(ops);
+
+    list_for_each(iter, runq) {
+        iter_svc = __runq_elem(iter);
+
+        /* mask is intersection of cpu_hard_affinity and cpupool and priv->cpus */
+        online = cpupool_scheduler_cpumask(iter_svc->vcpu->domain->cpupool);
+        cpumask_and(&cpu_common, online, &prv->cpus);
+        cpumask_and(&cpu_common, &cpu_common, iter_svc->vcpu->cpu_hard_affinity);
+        cpumask_and(&cpu_common, &mask, &cpu_common);
+        if ( cpumask_empty(&cpu_common) )
+            continue;
+
+        if ( iter_svc->cur_budget <= 0 )
+            continue;
+
+        svc = iter_svc;
+        break;
+    }
+
+    return svc;
+}
+
+/*
+ * Update vcpus' budgets and keep the runq sorted by re-inserting the modified vcpus into the runq.
+ * lock is grabbed before calling this function 
+ */
+static void
+__repl_update(const struct scheduler *ops, s_time_t now)
+{
+    struct list_head *runq = RUNQ(ops);
+    struct list_head *iter;
+    struct list_head *tmp;
+    struct rt_vcpu *svc = NULL;
+
+    s_time_t diff;
+    long count;
+
+    list_for_each_safe(iter, tmp, runq) {
+        svc = __runq_elem(iter);
+
+        diff = now - svc->cur_deadline;
+        if ( diff > 0 ) {
+            count = (diff/MILLISECS(svc->period)) + 1;
+            svc->cur_deadline += count * MILLISECS(svc->period);
+            svc->cur_budget = svc->budget * 1000;
+            __runq_remove(svc);
+            __runq_insert(ops, svc);
+        }
+    }
+}
+
+/* 
+ * schedule function for rt scheduler.
+ * The lock is already grabbed in schedule.c, no need to lock here 
+ */
+static struct task_slice
+rt_schedule(const struct scheduler *ops, s_time_t now, bool_t tasklet_work_scheduled)
+{
+    const int cpu = smp_processor_id();
+    struct rt_private * prv = RT_PRIV(ops);
+    struct rt_vcpu * const scurr = RT_VCPU(current);
+    struct rt_vcpu * snext = NULL;
+    struct task_slice ret;
+
+    /* clear ticked bit now that we've been scheduled */
+    if ( cpumask_test_cpu(cpu, &prv->tickled) )
+        cpumask_clear_cpu(cpu, &prv->tickled);
+
+    /* burn_budget would return for IDLE VCPU */
+    burn_budgets(ops, scurr, now);
+
+    __repl_update(ops, now);
+
+    if ( tasklet_work_scheduled ) {
+        snext = RT_VCPU(idle_vcpu[cpu]);
+    } else {
+        cpumask_t cur_cpu;
+        cpumask_clear(&cur_cpu);
+        cpumask_set_cpu(cpu, &cur_cpu);
+        snext = __runq_pick(ops, cur_cpu);
+        if ( snext == NULL )
+            snext = RT_VCPU(idle_vcpu[cpu]);
+
+        /* if scurr has higher priority and budget, still pick scurr */
+        if ( !is_idle_vcpu(current) &&
+             vcpu_runnable(current) &&
+             scurr->cur_budget > 0 &&
+             ( is_idle_vcpu(snext->vcpu) ||
+               scurr->cur_deadline <= snext->cur_deadline ) ) {
+            snext = scurr;
+        }
+    }
+
+    if ( snext != scurr &&
+         !is_idle_vcpu(current) &&
+         vcpu_runnable(current) ) {
+        set_bit(__RT_delayed_runq_add, &scurr->flags);
+    }
+
+    snext->last_start = now;
+    ret.migrated = 0;
+    if ( !is_idle_vcpu(snext->vcpu) ) {
+        if ( snext != scurr ) {
+            __runq_remove(snext);
+            set_bit(__RT_scheduled, &snext->flags);
+        }
+        if ( snext->vcpu->processor != cpu ) {
+            snext->vcpu->processor = cpu;
+            ret.migrated = 1;
+        }
+    }
+
+    ret.time = MILLISECS(1);
+    ret.task = snext->vcpu;
+
+    return ret;
+}
+
+/*
+ * Remove VCPU from RunQ
+ * The lock is already grabbed in schedule.c, no need to lock here 
+ */
+static void
+rt_vcpu_sleep(const struct scheduler *ops, struct vcpu *vc)
+{
+    struct rt_vcpu * const svc = RT_VCPU(vc);
+
+    BUG_ON( is_idle_vcpu(vc) );
+
+    if ( curr_on_cpu(vc->processor) == vc ) {
+        cpu_raise_softirq(vc->processor, SCHEDULE_SOFTIRQ);
+        return;
+    }
+
+    if ( __vcpu_on_runq(svc) ) {
+        __runq_remove(svc);
+    }
+
+    clear_bit(__RT_delayed_runq_add, &svc->flags);
+}
+
+/*
+ * Pick a vcpu on a cpu to kick out to place the running candidate
+ * Called by wake() and context_saved()
+ * We have a running candidate here, the kick logic is:
+ * Among all the cpus that are within the cpu affinity
+ * 1) if the new->cpu is idle, kick it. This could benefit cache hit
+ * 2) if there are any idle vcpu, kick it.
+ * 3) now all pcpus are busy, among all the running vcpus, pick lowest priority one
+ *    if snext has higher priority, kick it.
+ *
+ * TODO:
+ * 1) What if the two vcpus belong to the same domain?
+ *    Replacing a vcpu of the same domain introduces more overhead.
+ *
+ * lock is grabbed before calling this function 
+ */
+static void
+runq_tickle(const struct scheduler *ops, struct rt_vcpu *new)
+{
+    struct rt_private * prv = RT_PRIV(ops);
+    struct rt_vcpu * scheduled = NULL;    /* lowest priority scheduled */
+    struct rt_vcpu * iter_svc;
+    struct vcpu * iter_vc;
+    int cpu = 0;
+    cpumask_t not_tickled;                      /* not tickled cpus */
+    cpumask_t *online;
+
+    if ( new == NULL || is_idle_vcpu(new->vcpu) ) return;
+
+    online = cpupool_scheduler_cpumask(new->vcpu->domain->cpupool);
+    cpumask_and(&not_tickled, online, &prv->cpus);
+    cpumask_and(&not_tickled, &not_tickled, new->vcpu->cpu_hard_affinity);
+    cpumask_andnot(&not_tickled, &not_tickled, &prv->tickled);
+
+    /* 1) if new's previous cpu is idle, kick it for cache benefit */
+    if ( is_idle_vcpu(curr_on_cpu(new->vcpu->processor)) ) {
+        cpumask_set_cpu(new->vcpu->processor, &prv->tickled);
+        cpu_raise_softirq(new->vcpu->processor, SCHEDULE_SOFTIRQ);
+        return;
+    }
+
+    /* 2) if there are any idle pcpu, kick it */
+    /* The same loop also find the one with lowest priority */
+    for_each_cpu(cpu, &not_tickled) {
+        iter_vc = curr_on_cpu(cpu);
+        if ( is_idle_vcpu(iter_vc) ) {
+            cpumask_set_cpu(cpu, &prv->tickled);
+            cpu_raise_softirq(cpu, SCHEDULE_SOFTIRQ);
+            return;
+        }
+        iter_svc = RT_VCPU(iter_vc);
+        if ( scheduled == NULL || 
+             iter_svc->cur_deadline > scheduled->cur_deadline ) {
+            scheduled = iter_svc;
+        }
+    }
+
+    /* 3) candidate has higher priority, kick out the lowest-priority vcpu */
+    if ( scheduled != NULL && new->cur_deadline < scheduled->cur_deadline ) {
+        cpumask_set_cpu(scheduled->vcpu->processor, &prv->tickled);
+        cpu_raise_softirq(scheduled->vcpu->processor, SCHEDULE_SOFTIRQ);
+    }
+    return;
+}
+
+/* 
+ * Should always wake up runnable vcpu, put it back to RunQ. 
+ * Check priority to raise interrupt 
+ * The lock is already grabbed in schedule.c, no need to lock here 
+ * TODO: what if the two vcpus belong to the same domain?
+ */
+static void
+rt_vcpu_wake(const struct scheduler *ops, struct vcpu *vc)
+{
+    struct rt_vcpu * const svc = RT_VCPU(vc);
+    s_time_t diff;
+    s_time_t now = NOW();
+    long count = 0;
+    struct rt_private * prv = RT_PRIV(ops);
+    struct rt_vcpu * snext = NULL;        /* highest priority on RunQ */
+
+    BUG_ON( is_idle_vcpu(vc) );
+
+    if ( unlikely(curr_on_cpu(vc->processor) == vc) ) return;
+
+    /* on RunQ, just update info is ok */
+    if ( unlikely(__vcpu_on_runq(svc)) ) return;
+
+    /* If context hasn't been saved yet, set flag so it will add later */
+    if ( unlikely(test_bit(__RT_scheduled, &svc->flags)) ) {
+        set_bit(__RT_delayed_runq_add, &svc->flags);
+        return;
+    }
+
+    /* update deadline info */
+    diff = now - svc->cur_deadline;
+    if ( diff >= 0 ) {
+        count = ( diff/MILLISECS(svc->period) ) + 1;
+        svc->cur_deadline += count * MILLISECS(svc->period);
+        svc->cur_budget = svc->budget * 1000;
+    }
+
+    __runq_insert(ops, svc);
+    __repl_update(ops, now);
+    snext = __runq_pick(ops, prv->cpus);    /* pick snext from ALL valid cpus */
+    runq_tickle(ops, snext);
+
+    return;
+}
+
+/* 
+ * scurr has finished context switch, insert it back to the RunQ,
+ * and then pick the highest priority vcpu from runq to run 
+ */
+static void
+rt_context_saved(const struct scheduler *ops, struct vcpu *vc)
+{
+    struct rt_vcpu * svc = RT_VCPU(vc);
+    struct rt_vcpu * snext = NULL;
+    struct rt_private * prv = RT_PRIV(ops);
+    spinlock_t *lock;
+
+    clear_bit(__RT_scheduled, &svc->flags);
+    if ( is_idle_vcpu(vc) ) return;
+
+    lock = vcpu_schedule_lock_irq(vc);
+    if ( test_and_clear_bit(__RT_delayed_runq_add, &svc->flags) && 
+         likely(vcpu_runnable(vc)) ) {
+        __runq_insert(ops, svc);
+        __repl_update(ops, NOW());
+        snext = __runq_pick(ops, prv->cpus);    /* pick snext from ALL cpus */
+        runq_tickle(ops, snext);
+    }
+    vcpu_schedule_unlock_irq(lock, vc);
+}
+
+/*
+ * set/get each vcpu info of each domain
+ */
+static int
+rt_dom_cntl(const struct scheduler *ops, struct domain *d, struct xen_domctl_scheduler_op *op)
+{
+    xen_domctl_sched_rt_params_t local_sched;
+    struct rt_dom * const sdom = RT_DOM(d);
+    struct list_head *iter;
+    int vcpu_index = 0;
+    int rc = -EINVAL;
+
+    switch ( op->cmd )
+    {
+    case XEN_DOMCTL_SCHEDOP_getinfo:
+        /* for debug use, whenever adjust Dom0 parameter, do global dump */
+        if ( d->domain_id == 0 ) {
+            rt_dump(ops);
+        }
+
+        local_sched.num_vcpus = 0;
+        list_for_each( iter, &sdom->vcpu ) {
+            struct rt_vcpu * svc = list_entry(iter, struct rt_vcpu, sdom_elem);
+
+            ASSERT(vcpu_index < XEN_LEGACY_MAX_VCPUS);
+            local_sched.vcpus[vcpu_index].budget = svc->budget;
+            local_sched.vcpus[vcpu_index].period = svc->period;
+
+            vcpu_index++;
+        }
+        local_sched.num_vcpus = vcpu_index;
+        copy_to_guest(op->u.rt.schedule, &local_sched, 1);
+        rc = 0;
+        break;
+    case XEN_DOMCTL_SCHEDOP_putinfo:
+        copy_from_guest(&local_sched, op->u.rt.schedule, 1);
+        list_for_each( iter, &sdom->vcpu ) {
+            struct rt_vcpu * svc = list_entry(iter, struct rt_vcpu, sdom_elem);
+
+            if ( local_sched.vcpu_index == svc->vcpu->vcpu_id ) { /* adjust per VCPU parameter */
+                vcpu_index = local_sched.vcpu_index;
+
+                if ( vcpu_index < 0 || vcpu_index > XEN_LEGACY_MAX_VCPUS) {
+                    printk("XEN_DOMCTL_SCHEDOP_putinfo: vcpu_index=%d\n",
+                            vcpu_index);
+                }else{
+                    printk("XEN_DOMCTL_SCHEDOP_putinfo: vcpu_index=%d, period=%"PRId64", budget=%"PRId64"\n",
+                            vcpu_index, local_sched.vcpus[vcpu_index].period, 
+                            local_sched.vcpus[vcpu_index].budget);
+                }
+
+                svc->period = local_sched.vcpus[vcpu_index].period;
+                svc->budget = local_sched.vcpus[vcpu_index].budget;
+
+                break;
+            }
+        }
+        rc = 0;
+        break;
+    }
+
+    return rc;
+}
+
+static struct rt_private _rt_priv;
+
+const struct scheduler sched_rt_def = {
+    .name           = "SMP RT Scheduler",
+    .opt_name       = "rt",
+    .sched_id       = XEN_SCHEDULER_RT,
+    .sched_data     = &_rt_priv,
+
+    .dump_cpu_state = rt_dump_pcpu,
+    .dump_settings  = rt_dump,
+    .init           = rt_init,
+    .deinit         = rt_deinit,
+    .alloc_pdata    = rt_alloc_pdata,
+    .free_pdata     = rt_free_pdata,
+    .alloc_domdata  = rt_alloc_domdata,
+    .free_domdata   = rt_free_domdata,
+    .init_domain    = rt_dom_init,
+    .destroy_domain = rt_dom_destroy,
+    .alloc_vdata    = rt_alloc_vdata,
+    .free_vdata     = rt_free_vdata,
+    .insert_vcpu    = rt_vcpu_insert,
+    .remove_vcpu    = rt_vcpu_remove,
+
+    .adjust         = rt_dom_cntl,
+
+    .pick_cpu       = rt_cpu_pick,
+    .do_schedule    = rt_schedule,
+    .sleep          = rt_vcpu_sleep,
+    .wake           = rt_vcpu_wake,
+    .context_saved  = rt_context_saved,
+
+    .yield          = NULL,
+    .migrate        = NULL,
+};
diff --git a/xen/common/schedule.c b/xen/common/schedule.c
index e9eb0bc..2d13966 100644
--- a/xen/common/schedule.c
+++ b/xen/common/schedule.c
@@ -68,6 +68,7 @@ static const struct scheduler *schedulers[] = {
     &sched_sedf_def,
     &sched_credit_def,
     &sched_credit2_def,
+    &sched_rt_def,
     &sched_arinc653_def,
 };
 
diff --git a/xen/include/public/domctl.h b/xen/include/public/domctl.h
index 5b11bbf..d1a8201 100644
--- a/xen/include/public/domctl.h
+++ b/xen/include/public/domctl.h
@@ -339,6 +339,20 @@ struct xen_domctl_max_vcpus {
 typedef struct xen_domctl_max_vcpus xen_domctl_max_vcpus_t;
 DEFINE_XEN_GUEST_HANDLE(xen_domctl_max_vcpus_t);
 
+/*
+ * This structure is used to pass rt scheduler parameters from a
+ * privileged domain to Xen.
+ */
+struct xen_domctl_sched_rt_params {
+    struct {
+        signed long period; /* s_time_t type */
+        signed long budget; 
+    } vcpus[XEN_LEGACY_MAX_VCPUS];
+    uint16_t num_vcpus;
+    uint16_t vcpu_index;
+};
+typedef struct xen_domctl_sched_rt_params xen_domctl_sched_rt_params_t;
+DEFINE_XEN_GUEST_HANDLE(xen_domctl_sched_rt_params_t);
 
 /* XEN_DOMCTL_scheduler_op */
 /* Scheduler types. */
@@ -346,6 +360,8 @@ DEFINE_XEN_GUEST_HANDLE(xen_domctl_max_vcpus_t);
 #define XEN_SCHEDULER_CREDIT   5
 #define XEN_SCHEDULER_CREDIT2  6
 #define XEN_SCHEDULER_ARINC653 7
+#define XEN_SCHEDULER_RT       8
+
 /* Set or get info? */
 #define XEN_DOMCTL_SCHEDOP_putinfo 0
 #define XEN_DOMCTL_SCHEDOP_getinfo 1
@@ -367,6 +383,9 @@ struct xen_domctl_scheduler_op {
         struct xen_domctl_sched_credit2 {
             uint16_t weight;
         } credit2;
+        struct xen_domctl_sched_rt{
+            XEN_GUEST_HANDLE_64(xen_domctl_sched_rt_params_t) schedule;
+        } rt;
     } u;
 };
 typedef struct xen_domctl_scheduler_op xen_domctl_scheduler_op_t;
diff --git a/xen/include/xen/sched-if.h b/xen/include/xen/sched-if.h
index 4164dff..e452d32 100644
--- a/xen/include/xen/sched-if.h
+++ b/xen/include/xen/sched-if.h
@@ -169,7 +169,7 @@ extern const struct scheduler sched_sedf_def;
 extern const struct scheduler sched_credit_def;
 extern const struct scheduler sched_credit2_def;
 extern const struct scheduler sched_arinc653_def;
-
+extern const struct scheduler sched_rt_def;
 
 struct cpupool
 {
-- 
1.7.9.5


* [PATCH RFC v1 2/4] xl for rt scheduler
  2014-07-11  4:49 Introduce rt real-time scheduler for Xen Meng Xu
  2014-07-11  4:49 ` [PATCH RFC v1 1/4] rt: Add rt scheduler to hypervisor Meng Xu
@ 2014-07-11  4:49 ` Meng Xu
  2014-07-11 11:02   ` Wei Liu
  2014-07-11  4:49 ` [PATCH RFC v1 3/4] libxl " Meng Xu
                   ` (3 subsequent siblings)
  5 siblings, 1 reply; 50+ messages in thread
From: Meng Xu @ 2014-07-11  4:49 UTC (permalink / raw)
  To: xen-devel
  Cc: ian.campbell, xisisu, stefano.stabellini, george.dunlap,
	dario.faggioli, ian.jackson, xumengpanda, Meng Xu, lichong659,
	dgolomb

Add xl command for rt scheduler

Signed-off-by: Sisu Xi <xisisu@gmail.com>
Signed-off-by: Meng Xu <mengxu@cis.upenn.edu>
---
 docs/man/xl.pod.1         |   40 +++++++++++++
 tools/libxl/xl.h          |    1 +
 tools/libxl/xl_cmdimpl.c  |  141 +++++++++++++++++++++++++++++++++++++++++++++
 tools/libxl/xl_cmdtable.c |    9 +++
 4 files changed, 191 insertions(+)

diff --git a/docs/man/xl.pod.1 b/docs/man/xl.pod.1
index 30bd4bf..42aeedc 100644
--- a/docs/man/xl.pod.1
+++ b/docs/man/xl.pod.1
@@ -1019,6 +1019,46 @@ Restrict output to domains in the specified cpupool.
 
 =back
 
+=item B<sched-rt> [I<OPTIONS>]
+
+Set or get rt (Real Time) scheduler parameters. This rt scheduler applies the
+Preemptive Global Earliest Deadline First real-time scheduling algorithm to
+schedule VCPUs in the system. Each VCPU has a dedicated period and budget.
+While scheduled, a VCPU burns its budget.
+A VCPU has its budget replenished at the beginning of each of its periods;
+the VCPU discards its unused budget at the end of each of its periods.
+
+B<OPTIONS>
+
+=over 4
+
+=item B<-d DOMAIN>, B<--domain=DOMAIN>
+
+Specify domain for which scheduler parameters are to be modified or retrieved.
+Mandatory for modifying scheduler parameters.
+
+=item B<-v VCPU>, B<--vcpu=VCPU>
+
+Specify the index of the VCPU whose parameters will be set.
+A domain can have multiple VCPUs, and each VCPU has a unique index within the
+domain; when setting a domain's parameters, the parameters of each VCPU of the
+domain need to be set.
+
+=item B<-p PERIOD>, B<--period=PERIOD>
+
+A VCPU replenishes its budget every period. The time unit is milliseconds.
+
+=item B<-b BUDGET>, B<--budget=BUDGET>
+
+A VCPU has BUDGET amount of time to run in each period.
+The time unit is milliseconds.
+
+=item B<-c CPUPOOL>, B<--cpupool=CPUPOOL>
+
+Restrict output to domains in the specified cpupool.
+
+=back
+
 =back
 
 =head1 CPUPOOLS COMMANDS
diff --git a/tools/libxl/xl.h b/tools/libxl/xl.h
index 10a2e66..51b634a 100644
--- a/tools/libxl/xl.h
+++ b/tools/libxl/xl.h
@@ -67,6 +67,7 @@ int main_memset(int argc, char **argv);
 int main_sched_credit(int argc, char **argv);
 int main_sched_credit2(int argc, char **argv);
 int main_sched_sedf(int argc, char **argv);
+int main_sched_rt(int argc, char **argv);
 int main_domid(int argc, char **argv);
 int main_domname(int argc, char **argv);
 int main_rename(int argc, char **argv);
diff --git a/tools/libxl/xl_cmdimpl.c b/tools/libxl/xl_cmdimpl.c
index 68df548..c043f88 100644
--- a/tools/libxl/xl_cmdimpl.c
+++ b/tools/libxl/xl_cmdimpl.c
@@ -4947,6 +4947,7 @@ int main_sharing(int argc, char **argv)
     return 0;
 }
 
+
 static int sched_domain_get(libxl_scheduler sched, int domid,
                             libxl_domain_sched_params *scinfo)
 {
@@ -5098,6 +5099,52 @@ static int sched_sedf_domain_output(
     return 0;
 }
 
+
+static int sched_rt_domain_output(
+    int domid)
+{
+    char *domname;
+    libxl_domain_sched_params scinfo;
+    int rc, i;
+
+    if (domid < 0) {
+        printf("%-33s %4s %4s %6s %6s\n", "Name", "ID", "VCPU", "Period", "Budget");
+        return 0;
+    }
+
+    libxl_domain_sched_params_init(&scinfo);
+    rc = sched_domain_get(LIBXL_SCHEDULER_RT, domid, &scinfo);
+    if (rc)
+        return rc;
+
+    domname = libxl_domid_to_name(ctx, domid);
+    for( i = 0; i < scinfo.rt.num_vcpus; i++ )
+    {
+        printf("%-33s %4d %4d %6d %6d\n",
+            domname,
+            domid,
+            i,
+            scinfo.rt.vcpus[i].period,
+            scinfo.rt.vcpus[i].budget);
+    }
+    free(domname);
+
+    libxl_domain_sched_params_dispose(&scinfo);
+
+    return 0;
+}
+
+static int sched_rt_pool_output(uint32_t poolid)
+{
+    char *poolname;
+
+    poolname = libxl_cpupoolid_to_name(ctx, poolid);
+    printf("Cpupool %s: sched=EDF\n", poolname);
+
+    free(poolname);
+    return 0;
+}
+
 static int sched_default_pool_output(uint32_t poolid)
 {
     char *poolname;
@@ -5465,6 +5512,100 @@ int main_sched_sedf(int argc, char **argv)
     return 0;
 }
 
+/*
+ * <nothing>            : List all domain parameters and sched params
+ * -d [domid]           : List domain params for domain
+ * -d [domid] [params]  : Set domain params for domain 
+ */
+int main_sched_rt(int argc, char **argv)
+{
+    const char *dom = NULL;
+    const char *cpupool = NULL;
+    int period = 10, opt_p = 0;
+    int budget = 4, opt_b = 0;
+    int vcpu_index = 0, opt_v = 0;
+    int opt, rc;
+    static struct option opts[] = {
+        {"domain", 1, 0, 'd'},
+        {"period", 1, 0, 'p'},
+        {"budget", 1, 0, 'b'},
+        {"vcpu", 1, 0, 'v'},
+        {"cpupool", 1, 0, 'c'},
+        COMMON_LONG_OPTS,
+        {0, 0, 0, 0}
+    };
+
+    SWITCH_FOREACH_OPT(opt, "d:p:b:v:c:h", opts, "sched-rt", 0) {
+    case 'd':
+        dom = optarg;
+        break;
+    case 'p':
+        period = strtol(optarg, NULL, 10);
+        opt_p = 1;
+        break;
+    case 'b':
+        budget = strtol(optarg, NULL, 10);
+        opt_b = 1;
+        break;
+    case 'v':
+        vcpu_index = strtol(optarg, NULL, 10);
+        opt_v = 1;
+        break;
+    case 'c':
+        cpupool = optarg;
+        break;
+    }
+
+    if (cpupool && (dom || opt_p || opt_b || opt_v)) {
+        fprintf(stderr, "Specifying a cpupool is not allowed with other options.\n");
+        return 1;
+    }
+    if (!dom && (opt_p || opt_b || opt_v)) {
+        fprintf(stderr, "Must specify a domain.\n");
+        return 1;
+    }
+    if ( (opt_v || opt_p || opt_b) && (opt_p + opt_b + opt_v != 3) ) {
+        fprintf(stderr, "Must specify vcpu, period, budget\n");
+        return 1;
+    }
+    
+    if (!dom) { /* list all domain's rt scheduler info */
+        return -sched_domain_output(LIBXL_SCHEDULER_RT,
+                                    sched_rt_domain_output,
+                                    sched_rt_pool_output,
+                                    cpupool);
+    } else {
+        uint32_t domid = find_domain(dom);
+        if (!opt_p && !opt_b && !opt_v) { /* output rt scheduler info */
+            sched_rt_domain_output(-1);
+            return -sched_rt_domain_output(domid);
+        } else { /* set rt scheduler parameters */
+            libxl_domain_sched_params scinfo;
+            libxl_domain_sched_params_init(&scinfo);
+            scinfo.rt.max_vcpus = LIBXL_XEN_LEGACY_MAX_VCPUS;
+             /* TODO: only alloc an array with the same num of dom's vcpus*/
+            scinfo.rt.num_vcpus = LIBXL_XEN_LEGACY_MAX_VCPUS;
+            scinfo.rt.vcpus = 
+                (libxl_vcpu*) malloc( sizeof(libxl_vcpu) * scinfo.rt.max_vcpus );
+            if ( scinfo.rt.vcpus == NULL ) {
+                fprintf(stderr, "Alloc memory for scinfo.rt.vcpus fails\n");
+                return 1;
+            }
+            scinfo.sched = LIBXL_SCHEDULER_RT;
+            scinfo.rt.vcpu_index = vcpu_index;
+            scinfo.rt.vcpus[vcpu_index].period = period;
+            scinfo.rt.vcpus[vcpu_index].budget = budget;
+
+            rc = sched_domain_set(domid, &scinfo);
+            libxl_domain_sched_params_dispose(&scinfo);
+            if (rc)
+                return -rc;
+        }
+    }
+
+    return 0;
+}
+
 int main_domid(int argc, char **argv)
 {
     uint32_t domid;
diff --git a/tools/libxl/xl_cmdtable.c b/tools/libxl/xl_cmdtable.c
index 4279b9f..70e2585 100644
--- a/tools/libxl/xl_cmdtable.c
+++ b/tools/libxl/xl_cmdtable.c
@@ -277,6 +277,15 @@ struct cmd_spec cmd_table[] = {
       "                               --period/--slice)\n"
       "-c CPUPOOL, --cpupool=CPUPOOL  Restrict output to CPUPOOL"
     },
+    { "sched-rt",
+      &main_sched_rt, 0, 1,
+      "Get/set rt scheduler parameters",
+      "[-d <Domain> [-v[=VCPU]] [-p[=PERIOD]] [-b[=BUDGET]]]",
+      "-d DOMAIN, --domain=DOMAIN     Domain to modify\n"
+      "-v VCPU,   --vcpu=VCPU         VCPU\n"
+      "-p PERIOD, --period=PERIOD     Period (ms)\n"
+      "-b BUDGET, --budget=BUDGET     Budget (ms)\n"
+    },
     { "domid",
       &main_domid, 0, 0,
       "Convert a domain name to domain id",
-- 
1.7.9.5


* [PATCH RFC v1 3/4] libxl for rt scheduler
  2014-07-11  4:49 Introduce rt real-time scheduler for Xen Meng Xu
  2014-07-11  4:49 ` [PATCH RFC v1 1/4] rt: Add rt scheduler to hypervisor Meng Xu
  2014-07-11  4:49 ` [PATCH RFC v1 2/4] xl for rt scheduler Meng Xu
@ 2014-07-11  4:49 ` Meng Xu
  2014-07-11 11:05   ` Wei Liu
                     ` (2 more replies)
  2014-07-11  4:49 ` [PATCH RFC v1 4/4] libxc " Meng Xu
                   ` (2 subsequent siblings)
  5 siblings, 3 replies; 50+ messages in thread
From: Meng Xu @ 2014-07-11  4:49 UTC (permalink / raw)
  To: xen-devel
  Cc: ian.campbell, xisisu, stefano.stabellini, george.dunlap,
	dario.faggioli, ian.jackson, xumengpanda, Meng Xu, lichong659,
	dgolomb

Add libxl functions to set/get domain's parameters for rt scheduler

Signed-off-by: Sisu Xi <xisisu@gmail.com>
Signed-off-by: Meng Xu <mengxu@cis.upenn.edu>
---
 tools/libxl/libxl.c         |  121 +++++++++++++++++++++++++++++++++++++++++++
 tools/libxl/libxl.h         |    7 +++
 tools/libxl/libxl_types.idl |   13 +++++
 3 files changed, 141 insertions(+)

diff --git a/tools/libxl/libxl.c b/tools/libxl/libxl.c
index 39f1c28..670420e 100644
--- a/tools/libxl/libxl.c
+++ b/tools/libxl/libxl.c
@@ -5100,6 +5100,121 @@ static int sched_sedf_domain_set(libxl__gc *gc, uint32_t domid,
     return 0;
 }
 
+static int sched_rt_domain_get(libxl__gc *gc, uint32_t domid,
+                               libxl_domain_sched_params *scinfo)
+{
+    struct xen_domctl_sched_rt_params sdom;
+    int rc, i;
+
+    rc = xc_sched_rt_domain_get(CTX->xch, domid, &sdom);
+    if (rc != 0) {
+        LOGE(ERROR, "getting domain sched rt");
+        return ERROR_FAIL;
+    }
+
+    libxl_domain_sched_params_init(scinfo);
+    scinfo->sched = LIBXL_SCHEDULER_RT;
+    scinfo->rt.max_vcpus = LIBXL_XEN_LEGACY_MAX_VCPUS;
+    /*TODO: alloc array with exact num of dom's vcpus; and use GCNEW_ARRAY()*/
+    scinfo->rt.vcpus = (libxl_vcpu *)
+                    malloc( sizeof(libxl_vcpu) * scinfo->rt.max_vcpus );
+    if ( !scinfo->rt.vcpus ){
+        LOGE(ERROR, "Allocate lib_vcpu array fails\n");
+        return ERROR_INVAL;
+    }
+    scinfo->rt.num_vcpus = sdom.num_vcpus;
+    for( i = 0; i < sdom.num_vcpus; ++i)
+    {
+        scinfo->rt.vcpus[i].period = sdom.vcpus[i].period;
+        scinfo->rt.vcpus[i].budget = sdom.vcpus[i].budget;
+    }
+
+    return 0;
+}
+
+#define SCHED_RT_VCPU_PARAMS_MAX    4294967295 /* 2^32-1 */
+
+/*
+ * Sanity check of the scinfo parameters
+ * return 0 if all values are valid
+ * return 1 if one param is default value
+ * return 2 if the target vcpu's index, period or budget is out of range
+ */
+static int sched_rt_domain_set_validate_params(libxl__gc *gc,
+                                               const libxl_domain_sched_params *scinfo)
+{
+    int vcpu_index = scinfo->rt.vcpu_index;
+
+    if (scinfo->rt.vcpus == NULL ||
+        vcpu_index == LIBXL_DOMAIN_SCHED_PARAM_VCPU_DEFAULT ||
+        scinfo->rt.vcpus[vcpu_index].period == LIBXL_DOMAIN_SCHED_PARAM_PERIOD_DEFAULT ||
+        scinfo->rt.vcpus[vcpu_index].budget == LIBXL_DOMAIN_SCHED_PARAM_BUDGET_DEFAULT)
+    {
+        return 1;
+    }
+
+    if (vcpu_index < 0 || vcpu_index > scinfo->rt.num_vcpus)
+    {
+        LOG(ERROR, "VCPU index is not set or out of range, "
+                    "valid values are within range from 0 to %d", scinfo->rt.num_vcpus);
+        return 2;
+    }
+
+    if (scinfo->rt.vcpus[vcpu_index].period < 1 ||
+        scinfo->rt.vcpus[vcpu_index].period > SCHED_RT_VCPU_PARAMS_MAX)
+    {
+        LOG(ERROR, "VCPU period is not set or out of range, "
+                    "valid values are within range from 0 to %lu", SCHED_RT_VCPU_PARAMS_MAX);
+        return 2;
+    }
+
+    if (scinfo->rt.vcpus[vcpu_index].budget < 1 ||
+        scinfo->rt.vcpus[vcpu_index].budget > SCHED_RT_VCPU_PARAMS_MAX)
+    {
+        LOG(ERROR, "VCPU budget is not set or out of range, "
+                    "valid values are within range from 0 to %lu", SCHED_RT_VCPU_PARAMS_MAX);
+        return 2;
+    }
+
+    return 0;
+
+}
+
+static int sched_rt_domain_set(libxl__gc *gc, uint32_t domid,
+                               const libxl_domain_sched_params *scinfo)
+{
+    struct xen_domctl_sched_rt_params sdom;
+    int vcpu_index;
+    int rc;
+ 
+    rc = xc_sched_rt_domain_get(CTX->xch, domid, &sdom);
+    if (rc != 0) {
+        LOGE(ERROR, "getting domain sched rt");
+        return ERROR_FAIL;
+    }
+    
+    rc = sched_rt_domain_set_validate_params(gc, scinfo);
+    if (rc == 2)
+        return ERROR_INVAL;
+    if (rc == 1)
+        return 0;
+    if (rc == 0)
+    {
+        vcpu_index = scinfo->rt.vcpu_index;
+        sdom.vcpu_index = scinfo->rt.vcpu_index;
+        sdom.vcpus[vcpu_index].period = scinfo->rt.vcpus[vcpu_index].period;
+        sdom.vcpus[vcpu_index].budget = scinfo->rt.vcpus[vcpu_index].budget;
+    }
+
+    rc = xc_sched_rt_domain_set(CTX->xch, domid, &sdom);
+    if ( rc < 0 ) {
+        LOGE(ERROR, "setting domain sched rt");
+        return ERROR_FAIL;
+    }
+
+    return 0;
+}
+
 int libxl_domain_sched_params_set(libxl_ctx *ctx, uint32_t domid,
                                   const libxl_domain_sched_params *scinfo)
 {
@@ -5123,6 +5238,9 @@ int libxl_domain_sched_params_set(libxl_ctx *ctx, uint32_t domid,
     case LIBXL_SCHEDULER_ARINC653:
         ret=sched_arinc653_domain_set(gc, domid, scinfo);
         break;
+    case LIBXL_SCHEDULER_RT:
+        ret=sched_rt_domain_set(gc, domid, scinfo);
+        break;
     default:
         LOG(ERROR, "Unknown scheduler");
         ret=ERROR_INVAL;
@@ -5153,6 +5271,9 @@ int libxl_domain_sched_params_get(libxl_ctx *ctx, uint32_t domid,
     case LIBXL_SCHEDULER_CREDIT2:
         ret=sched_credit2_domain_get(gc, domid, scinfo);
         break;
+    case LIBXL_SCHEDULER_RT:
+        ret=sched_rt_domain_get(gc, domid, scinfo);
+        break;
     default:
         LOG(ERROR, "Unknown scheduler");
         ret=ERROR_INVAL;
diff --git a/tools/libxl/libxl.h b/tools/libxl/libxl.h
index 459557d..42c8ede 100644
--- a/tools/libxl/libxl.h
+++ b/tools/libxl/libxl.h
@@ -1217,6 +1217,13 @@ int libxl_sched_credit_params_set(libxl_ctx *ctx, uint32_t poolid,
 #define LIBXL_DOMAIN_SCHED_PARAM_LATENCY_DEFAULT   -1
 #define LIBXL_DOMAIN_SCHED_PARAM_EXTRATIME_DEFAULT -1
 
+#define LIBXL_DOMAIN_SCHED_PARAM_BUDGET_DEFAULT     -1
+#define LIBXL_DOMAIN_SCHED_PARAM_VCPU_DEFAULT       -1
+#define LIBXL_DOMAIN_SCHED_PARAM_NUM_VCPUS_DEFAULT  -1
+#define LIBXL_DOMAIN_SCHED_PARAM_VCPU_INDEX_DEFAULT -1
+/* Consistent with XEN_LEGACY_MAX_VCPUS xen/arch-x86/xen.h*/
+#define LIBXL_XEN_LEGACY_MAX_VCPUS                  32 
+
 int libxl_domain_sched_params_get(libxl_ctx *ctx, uint32_t domid,
                                   libxl_domain_sched_params *params);
 int libxl_domain_sched_params_set(libxl_ctx *ctx, uint32_t domid,
diff --git a/tools/libxl/libxl_types.idl b/tools/libxl/libxl_types.idl
index de25f42..00e00d8 100644
--- a/tools/libxl/libxl_types.idl
+++ b/tools/libxl/libxl_types.idl
@@ -139,6 +139,7 @@ libxl_scheduler = Enumeration("scheduler", [
     (5, "credit"),
     (6, "credit2"),
     (7, "arinc653"),
+    (8, "rt"),
     ])
 
 # Consistent with SHUTDOWN_* in sched.h (apart from UNKNOWN)
@@ -289,6 +290,17 @@ libxl_domain_restore_params = Struct("domain_restore_params", [
     ("checkpointed_stream", integer),
     ])
 
+libxl_rt_vcpu = Struct("vcpu",[
+    ("period",       integer, {'init_val': 'LIBXL_DOMAIN_SCHED_PARAM_PERIOD_DEFAULT'}),
+    ("budget",       integer, {'init_val': 'LIBXL_DOMAIN_SCHED_PARAM_BUDGET_DEFAULT'}),
+    ])
+
+libxl_domain_sched_rt_params = Struct("domain_sched_rt_params",[
+    ("vcpus",        Array(libxl_rt_vcpu, "max_vcpus")),
+    ("num_vcpus",    integer, {'init_val': 'LIBXL_DOMAIN_SCHED_PARAM_NUM_VCPUS_DEFAULT'}),
+    ("vcpu_index",   integer, {'init_val': 'LIBXL_DOMAIN_SCHED_PARAM_VCPU_INDEX_DEFAULT'}),
+    ])
+
 libxl_domain_sched_params = Struct("domain_sched_params",[
     ("sched",        libxl_scheduler),
     ("weight",       integer, {'init_val': 'LIBXL_DOMAIN_SCHED_PARAM_WEIGHT_DEFAULT'}),
@@ -297,6 +309,7 @@ libxl_domain_sched_params = Struct("domain_sched_params",[
     ("slice",        integer, {'init_val': 'LIBXL_DOMAIN_SCHED_PARAM_SLICE_DEFAULT'}),
     ("latency",      integer, {'init_val': 'LIBXL_DOMAIN_SCHED_PARAM_LATENCY_DEFAULT'}),
     ("extratime",    integer, {'init_val': 'LIBXL_DOMAIN_SCHED_PARAM_EXTRATIME_DEFAULT'}),
+    ("rt",     libxl_domain_sched_rt_params),
     ])
 
 libxl_domain_build_info = Struct("domain_build_info",[
-- 
1.7.9.5

^ permalink raw reply related	[flat|nested] 50+ messages in thread

* [PATCH RFC v1 4/4] libxc for rt scheduler
  2014-07-11  4:49 Introduce rt real-time scheduler for Xen Meng Xu
                   ` (2 preceding siblings ...)
  2014-07-11  4:49 ` [PATCH RFC v1 3/4] libxl " Meng Xu
@ 2014-07-11  4:49 ` Meng Xu
  2014-07-11 14:49   ` Dario Faggioli
  2014-07-11 10:50 ` Introduce rt real-time scheduler for Xen Wei Liu
  2014-07-11 16:19 ` Dario Faggioli
  5 siblings, 1 reply; 50+ messages in thread
From: Meng Xu @ 2014-07-11  4:49 UTC (permalink / raw)
  To: xen-devel
  Cc: ian.campbell, xisisu, stefano.stabellini, george.dunlap,
	dario.faggioli, ian.jackson, xumengpanda, Meng Xu, lichong659,
	dgolomb

Add xc_sched_rt_* functions to interact with Xen to set/get domain's
parameters for rt scheduler.

Signed-off-by: Sisu Xi <xisisu@gmail.com>
Signed-off-by: Meng Xu <mengxu@cis.upenn.edu>
---
 tools/libxc/Makefile  |    1 +
 tools/libxc/xc_rt.c   |   81 +++++++++++++++++++++++++++++++++++++++++++++++++
 tools/libxc/xenctrl.h |    7 +++++
 3 files changed, 89 insertions(+)
 create mode 100644 tools/libxc/xc_rt.c

diff --git a/tools/libxc/Makefile b/tools/libxc/Makefile
index f77677c..7ad6dea 100644
--- a/tools/libxc/Makefile
+++ b/tools/libxc/Makefile
@@ -20,6 +20,7 @@ CTRL_SRCS-y       += xc_sedf.c
 CTRL_SRCS-y       += xc_csched.c
 CTRL_SRCS-y       += xc_csched2.c
 CTRL_SRCS-y       += xc_arinc653.c
+CTRL_SRCS-y       += xc_rt.c
 CTRL_SRCS-y       += xc_tbuf.c
 CTRL_SRCS-y       += xc_pm.c
 CTRL_SRCS-y       += xc_cpu_hotplug.c
diff --git a/tools/libxc/xc_rt.c b/tools/libxc/xc_rt.c
new file mode 100644
index 0000000..a3e76a1
--- /dev/null
+++ b/tools/libxc/xc_rt.c
@@ -0,0 +1,81 @@
+/****************************************************************************
+ *
+ *        File: xc_rt.c
+ *      Author: Sisu Xi
+ *              Meng Xu
+ *
+ * Description: XC Interface to the rt scheduler
+ *
+ * This library is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU Lesser General Public
+ * License as published by the Free Software Foundation;
+ * version 2.1 of the License.
+ *
+ * This library is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+ * Lesser General Public License for more details.
+ *
+ * You should have received a copy of the GNU Lesser General Public
+ * License along with this library; if not, write to the Free Software
+ * Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA  02110-1301  USA
+ */
+
+#include "xc_private.h"
+
+int
+xc_sched_rt_domain_set(
+    xc_interface *xch,
+    uint32_t domid,
+    struct xen_domctl_sched_rt_params *sdom)
+{
+    int rc;
+    DECLARE_DOMCTL;
+    DECLARE_HYPERCALL_BOUNCE(sdom, 
+        sizeof(*sdom), 
+        XC_HYPERCALL_BUFFER_BOUNCE_IN);
+
+    if ( xc_hypercall_bounce_pre(xch, sdom) )
+        return -1;
+
+    domctl.cmd = XEN_DOMCTL_scheduler_op;
+    domctl.domain = (domid_t) domid;
+    domctl.u.scheduler_op.sched_id = XEN_SCHEDULER_RT;
+    domctl.u.scheduler_op.cmd = XEN_DOMCTL_SCHEDOP_putinfo;
+    set_xen_guest_handle(domctl.u.scheduler_op.u.rt.schedule, sdom);
+
+    rc = do_domctl(xch, &domctl);
+
+    xc_hypercall_bounce_post(xch, sdom);
+
+    return rc;
+}
+
+int
+xc_sched_rt_domain_get(
+    xc_interface *xch,
+    uint32_t domid,
+    struct xen_domctl_sched_rt_params *sdom)
+{
+    int rc;
+    DECLARE_DOMCTL;
+    DECLARE_HYPERCALL_BOUNCE(sdom, 
+        sizeof(*sdom), 
+        XC_HYPERCALL_BUFFER_BOUNCE_OUT);
+
+    if ( xc_hypercall_bounce_pre(xch, sdom) )
+        return -1;
+
+    domctl.cmd = XEN_DOMCTL_scheduler_op;
+    domctl.domain = (domid_t) domid;
+    domctl.u.scheduler_op.sched_id = XEN_SCHEDULER_RT;
+    domctl.u.scheduler_op.cmd = XEN_DOMCTL_SCHEDOP_getinfo;
+    set_xen_guest_handle(domctl.u.scheduler_op.u.rt.schedule, sdom);
+
+    rc = do_domctl(xch, &domctl);
+
+    xc_hypercall_bounce_post(xch, sdom);
+
+    return rc;
+}
+
diff --git a/tools/libxc/xenctrl.h b/tools/libxc/xenctrl.h
index 3578b09..b7b0c04 100644
--- a/tools/libxc/xenctrl.h
+++ b/tools/libxc/xenctrl.h
@@ -865,6 +865,13 @@ int xc_sched_credit2_domain_get(xc_interface *xch,
                                uint32_t domid,
                                struct xen_domctl_sched_credit2 *sdom);
 
+int xc_sched_rt_domain_set(xc_interface *xch,
+                               uint32_t domid,
+                               struct xen_domctl_sched_rt_params *sdom);
+int xc_sched_rt_domain_get(xc_interface *xch,
+                               uint32_t domid,
+                               struct xen_domctl_sched_rt_params *sdom);
+
 int
 xc_sched_arinc653_schedule_set(
     xc_interface *xch,
-- 
1.7.9.5

^ permalink raw reply related	[flat|nested] 50+ messages in thread

* Re: Introduce rt real-time scheduler for Xen
  2014-07-11  4:49 Introduce rt real-time scheduler for Xen Meng Xu
                   ` (3 preceding siblings ...)
  2014-07-11  4:49 ` [PATCH RFC v1 4/4] libxc " Meng Xu
@ 2014-07-11 10:50 ` Wei Liu
  2014-07-11 11:06   ` Dario Faggioli
  2014-07-11 16:19 ` Dario Faggioli
  5 siblings, 1 reply; 50+ messages in thread
From: Wei Liu @ 2014-07-11 10:50 UTC (permalink / raw)
  To: Meng Xu
  Cc: wei.liu2, ian.campbell, xisisu, stefano.stabellini,
	george.dunlap, dario.faggioli, ian.jackson, xen-devel,
	xumengpanda, lichong659, dgolomb

On Fri, Jul 11, 2014 at 12:49:54AM -0400, Meng Xu wrote:
[...]
> 
> [PATCH RFC v1 1/4] rt: Add rt scheduler to hypervisor
> [PATCH RFC v1 2/4] xl for rt scheduler
> [PATCH RFC v1 3/4] libxl for rt scheduler
> [PATCH RFC v1 4/4] libxc for rt scheduler
> 

I have some general comments on how you arrange these patches.

At a glance at the titles and code, you should order them 1, 4, 3 and 2.
Apparently xl depends on libxl, libxl depends on libxc, and libxc depends
on the hypervisor. You will break bisection with the current ordering.

And we normally write titles like
  xen: add rt scheduler
  libxl: introduce rt scheduler
  xl: XXXX
  etc.
i.e., start with the component name, separated by a colon.

Last but not least, you need to CC relevant maintainers. You can find
out maintainers with scripts/get_maintainers.pl.

Wei.

> -----------
> Meng Xu
> PhD Student in Computer and Information Science
> University of Pennsylvania
> 
> _______________________________________________
> Xen-devel mailing list
> Xen-devel@lists.xen.org
> http://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [PATCH RFC v1 2/4] xl for rt scheduler
  2014-07-11  4:49 ` [PATCH RFC v1 2/4] xl for rt scheduler Meng Xu
@ 2014-07-11 11:02   ` Wei Liu
  2014-07-11 14:59     ` Meng Xu
  0 siblings, 1 reply; 50+ messages in thread
From: Wei Liu @ 2014-07-11 11:02 UTC (permalink / raw)
  To: Meng Xu
  Cc: wei.liu2, ian.campbell, xisisu, stefano.stabellini,
	george.dunlap, dario.faggioli, ian.jackson, xen-devel,
	xumengpanda, lichong659, dgolomb

On Fri, Jul 11, 2014 at 12:49:56AM -0400, Meng Xu wrote:
[...]
>  
>  =head1 CPUPOOLS COMMANDS
> diff --git a/tools/libxl/xl.h b/tools/libxl/xl.h
> index 10a2e66..51b634a 100644
> --- a/tools/libxl/xl.h
> +++ b/tools/libxl/xl.h
> @@ -67,6 +67,7 @@ int main_memset(int argc, char **argv);
>  int main_sched_credit(int argc, char **argv);
>  int main_sched_credit2(int argc, char **argv);
>  int main_sched_sedf(int argc, char **argv);
> +int main_sched_rt(int argc, char **argv);
>  int main_domid(int argc, char **argv);
>  int main_domname(int argc, char **argv);
>  int main_rename(int argc, char **argv);
> diff --git a/tools/libxl/xl_cmdimpl.c b/tools/libxl/xl_cmdimpl.c
> index 68df548..c043f88 100644
> --- a/tools/libxl/xl_cmdimpl.c
> +++ b/tools/libxl/xl_cmdimpl.c
> @@ -4947,6 +4947,7 @@ int main_sharing(int argc, char **argv)
>      return 0;
>  }
>  
> +

Stray blank line.

>  static int sched_domain_get(libxl_scheduler sched, int domid,
>                              libxl_domain_sched_params *scinfo)
>  {
> @@ -5098,6 +5099,52 @@ static int sched_sedf_domain_output(
>      return 0;
>  }
>  
> +
> +static int sched_rt_domain_output(
> +    int domid)
> +{
> +    char *domname;
> +    libxl_domain_sched_params scinfo;
> +    int rc, i;
> +
> +    if (domid < 0) {
> +        printf("%-33s %4s %4s %6s %6s\n", "Name", "ID", "VCPU", "Period", "Budget");
> +        return 0;
> +    }
> +

Just print the header and nothing more?

> +    libxl_domain_sched_params_init(&scinfo);
> +    rc = sched_domain_get(LIBXL_SCHEDULER_RT, domid, &scinfo);
> +    if (rc)
> +        return rc;
> +

This violates the libxl type paradigm. See libxl.h.
                                                                                
 263  * libxl types                                                             
 264  *                                                                         
 265  * Most libxl types are defined by the libxl IDL (see                      
 266  * libxl_types.idl). The library provides a common set of methods for         
 267  * initialising and freeing these types.                                   
 268  *                                                                         
 269  * IDL-generated libxl types should be used as follows: the user must         
 270  * always call the "init" function before using a type, even if the        
 271  * variable is simply being passed by reference as an out parameter        
 272  * to a libxl function.  The user must always calls "dispose" exactly         
 273  * once afterwards, to clean up, regardless of whether operations on          
 274  * this object succeeded or failed.  See the xl code for examples.         
                                                                                
And you should check other places that use libxl types as well.  
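
Roughly, the expected pattern is the following (just a sketch of the
paradigm, not code taken from the patch):

    static int sched_rt_domain_output(int domid)
    {
        libxl_domain_sched_params scinfo;
        int rc;

        libxl_domain_sched_params_init(&scinfo);    /* always init first */
        rc = sched_domain_get(LIBXL_SCHEDULER_RT, domid, &scinfo);
        if (rc)
            goto out;
        /* ... print the scinfo.rt fields here ... */
    out:
        libxl_domain_sched_params_dispose(&scinfo); /* exactly once, on both paths */
        return rc;
    }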

Wei.

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [PATCH RFC v1 3/4] libxl for rt scheduler
  2014-07-11  4:49 ` [PATCH RFC v1 3/4] libxl " Meng Xu
@ 2014-07-11 11:05   ` Wei Liu
  2014-07-11 15:08   ` Dario Faggioli
  2014-07-17 15:36   ` Ian Campbell
  2 siblings, 0 replies; 50+ messages in thread
From: Wei Liu @ 2014-07-11 11:05 UTC (permalink / raw)
  To: Meng Xu
  Cc: wei.liu2, ian.campbell, xisisu, stefano.stabellini,
	george.dunlap, dario.faggioli, ian.jackson, xen-devel,
	xumengpanda, lichong659, dgolomb

On Fri, Jul 11, 2014 at 12:49:57AM -0400, Meng Xu wrote:
[...]
>  
>  # Consistent with SHUTDOWN_* in sched.h (apart from UNKNOWN)
> @@ -289,6 +290,17 @@ libxl_domain_restore_params = Struct("domain_restore_params", [
>      ("checkpointed_stream", integer),
>      ])
>  
> +libxl_rt_vcpu = Struct("vcpu",[
> +    ("period",       integer, {'init_val': 'LIBXL_DOMAIN_SCHED_PARAM_PERIOD_DEFAULT'}),
> +    ("budget",       integer, {'init_val': 'LIBXL_DOMAIN_SCHED_PARAM_BUDGET_DEFAULT'}),
> +    ])
> +
> +libxl_domain_sched_rt_params = Struct("domain_sched_rt_params",[
> +    ("vcpus",        Array(libxl_rt_vcpu, "max_vcpus")),

The array counter should be named "num_vcpus" (and you probably need to
rename the other "num_vcpus" to something else).

Wei.

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: Introduce rt real-time scheduler for Xen
  2014-07-11 10:50 ` Introduce rt real-time scheduler for Xen Wei Liu
@ 2014-07-11 11:06   ` Dario Faggioli
  2014-07-11 16:14     ` Meng Xu
  0 siblings, 1 reply; 50+ messages in thread
From: Dario Faggioli @ 2014-07-11 11:06 UTC (permalink / raw)
  To: Wei Liu
  Cc: ian.campbell, xisisu, stefano.stabellini, george.dunlap,
	ian.jackson, xen-devel, xumengpanda, Meng Xu, lichong659,
	dgolomb


[-- Attachment #1.1: Type: text/plain, Size: 1447 bytes --]

On ven, 2014-07-11 at 11:50 +0100, Wei Liu wrote:
> On Fri, Jul 11, 2014 at 12:49:54AM -0400, Meng Xu wrote:
> [...]
> > 
> > [PATCH RFC v1 1/4] rt: Add rt scheduler to hypervisor
> > [PATCH RFC v1 2/4] xl for rt scheduler
> > [PATCH RFC v1 3/4] libxl for rt scheduler
> > [PATCH RFC v1 4/4] libxc for rt scheduler
> > 
> 
> I have some general comments on how you arrange these patches.
> 
> At a glance of the title and code you should do them in the order of 1,
> 4, 3 and 2. Apparently xl depends on libxl, libxl depends on libxc, and
> libxc depends on hypervisor. You will break bisection with current
> ordering.
> 
Yep, I agree with Wei.

> And we normally write titles like
>   xen: add rt scheduler
>   libxl: introduce rt scheduler
>   xl: XXXX
>   etc.
> start with component name and separate with colon.
> 
Indeed we do, and this helps quite a bit.

> Last but not least, you need to CC relevant maintainers. You can find
> out maintainers with scripts/get_maintainers.pl.
> 
Yes, but this one, Meng almost got it right, I think.

Basically, Meng, you're missing the hypervisor maintainers (at least for
patch 1). :-)

Regards,
Dario

-- 
<<This happens because I choose it to happen!>> (Raistlin Majere)
-----------------------------------------------------------------
Dario Faggioli, Ph.D, http://about.me/dario.faggioli
Senior Software Engineer, Citrix Systems R&D Ltd., Cambridge (UK)


[-- Attachment #1.2: This is a digitally signed message part --]
[-- Type: application/pgp-signature, Size: 181 bytes --]

[-- Attachment #2: Type: text/plain, Size: 126 bytes --]

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [PATCH RFC v1 1/4] rt: Add rt scheduler to hypervisor
  2014-07-11  4:49 ` [PATCH RFC v1 1/4] rt: Add rt scheduler to hypervisor Meng Xu
@ 2014-07-11 14:27   ` Dario Faggioli
  2014-07-11 14:37   ` Andrew Cooper
  1 sibling, 0 replies; 50+ messages in thread
From: Dario Faggioli @ 2014-07-11 14:27 UTC (permalink / raw)
  To: Meng Xu
  Cc: ian.campbell, xisisu, stefano.stabellini, george.dunlap,
	ian.jackson, xen-devel, xumengpanda, Jan Beulich, lichong659,
	dgolomb


[-- Attachment #1.1: Type: text/plain, Size: 36357 bytes --]

[Adding Jan]

On ven, 2014-07-11 at 00:49 -0400, Meng Xu wrote:
> This is the core rt scheduler patch. It adds the new real time scheduler to
> the hypervisor, as the non-default scheduler.
> 
So, honestly, I think these two lines above can just go away...

The subject can be improved a bit too, probably. I don't know...
something like "scheduling: introduce XEN_SCHEDULER_RT" or "sched:
implement XEN_SCHEDULER_RT scheduling policy", etc.

> This scheduler follows the pre-emptive Global EDF theory in real-time field.
> Each VCPU can have a dedicated period and budget.
> While scheduled, a VCPU burns its budget.
> A VCPU has its budget replenished at the beginning of each of its periods;
> The VCPU discards its unused budget at the end of each of its periods.
> If a VCPU runs out of budget in a period, it has to wait until next period.
> The mechanism of how to burn a VCPU's budget depends on the server mechanism
> implemented for each VCPU.
> 
> Server mechanism: a VCPU is implemented as a deferable server.
> When a VCPU has a task running on it, its budget is continuously
> burned;
>
"When a VCPU is scheduled", or "When a VCPU executes on a PCPU". Let's
avoid messing with _why_ a VCPU executes, i.e., what the VCPU is
actually doing... That's the guest's business, not ours! :-)

> When a VCPU has no task but with budget left, its budget is
> preserved.
> 
Ditto.

> Priority scheme: Preemptive Global Earliest Deadline First (gEDF).
> At any scheduling point, the VCPU with earliest deadline has highest
> priority.
> 
> Queue scheme: A global runqueue for each CPU pool.
  ^Runqueue, it's a bit more clear, I think.

> The runqueue holds all runnable VCPUs.
> VCPUs in the runqueue are divided into two parts: with and without budget.
> At each part, VCPUs are sorted based on EDF priority scheme.
> 
> Scheduling quanta: 1 ms; but accounting the budget is in microsecond.
> 
> Note: cpumask and cpupool is supported.
>
No need to say this in the changelog. Actually, it's probably useful to
say it right now, given this is an RFC... Not so much when this won't be
an RFC any longer.

> This is still in the development phase.
> 
Right, and therefore let's concentrate on the code... we'll refine the
changelog version after version. :-)


> diff --git a/xen/common/sched_rt.c b/xen/common/sched_rt.c
> new file mode 100644
> index 0000000..41543a2
> --- /dev/null
> +++ b/xen/common/sched_rt.c

> +/*
> + * TODO:
> + *
> + * Migration compensation and resist like credit2 to better use cache;
> + * Lock Holder Problem, using yield?
> + * Self switch problem: VCPUs of the same domain may preempt each other;
> + */
> +
This last one is an interesting point, even for non real-time
scheduling. I gave some thought to this a few times, but never had time
to properly gather ideas and/or collect some numbers... if you have
anything like that, feel free to share (in another thread, as it's, I
think, more general).

> +/*
> + * Locking:
> + * Just like credit2, a global system lock is used to protect the RunQ.
> + * The global lock is referenced by schedule_data.schedule_lock from all physical cpus.
> + *
Mmm... Actually, I think credit2 at least tries to do something
different. Well, each RunQ does have its own lock, but there is more
than one RunQ in credit2 (or, again, at least that's the design. There are
a few bugs in the credit2 implementation, but that does not matter
here!! :-P).

No big deal, of course, just perhaps try to describe a bit better what's
the intended locking scheme. :-)

> + * The lock is already grabbed when calling wake/sleep/schedule/ functions in schedule.c
> + *
> + * The functions involes RunQ and needs to grab locks are:
> + *    dump, vcpu_insert, vcpu_remove, context_saved,
> + */
> +
> +
> +/*
> + * Default parameters in ms
> + */
> +#define RT_DEFAULT_PERIOD     10
> +#define RT_DEFAULT_BUDGET      4
> +
ms? us? Can you put a comment saying what time units these 10 and 4 are.

> +/*
> + * Insert a vcpu in the RunQ basing on vcpu's deadline: 
> + * vcpus with shorter deadline are inserted first.
> + * EDF schedule policy: vcpu with smaller deadline has higher priority;
>
The two lines above sound redundant. I'd ditch the latter.

> + * When vcpus have the same deadline, insert the current one at the head of these vcpus.
> + */
>
Knowing how ties are broken is useful. However, I'm not sure what you
mean with "insert the current one at the head of these vcpus". Can you
rephrase it?

> +static void
> +__runq_insert(const struct scheduler *ops, struct rt_vcpu *svc)
> +{
> +    struct list_head *runq = RUNQ(ops);
> +    struct list_head *iter;
> +   
> +    ASSERT( spin_is_locked(per_cpu(schedule_data, svc->vcpu->processor).schedule_lock) );
> +
Coding style: this line looks very long. :-)

> +    if ( __vcpu_on_runq(svc) )
> +        return;
> +
Can this happen? I mean the fact that you try to put a VCPU on a RunQ
while it's already there. If yes, and if that's by design, ok, but
that's the sort of things I usually want to keep under control, as
they're frequently hiding bugs etc.

Again, I'm not saying this is the case. What I'd do is, if it can
happen, put a comment explaining under what circumstances that is the
case. If it can't, then put down another ASSERT rather than just
returning.
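
I.e., for the "can't happen" case, it would just be something like this
(sketch), in place of the early return:

    ASSERT( !__vcpu_on_runq(svc) );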

> +    list_for_each(iter, runq) {
> +        struct rt_vcpu * iter_svc = __runq_elem(iter);
> +
> +        if ( svc->cur_budget > 0 ) { /* svc still has budget */
>
Put the comment above (minor thing, though).

> +            if ( iter_svc->cur_budget == 0 ||
> +                 svc->cur_deadline <= iter_svc->cur_deadline )
> +                    break;
> +        } else { /* svc has no budget */
> +            if ( iter_svc->cur_budget == 0 &&
> +                 svc->cur_deadline <= iter_svc->cur_deadline )
> +                    break;
> +        }
>
Bear with me, as I'm really thinking out loud here. Have you thought about,
or perhaps explored, the solution of using two lists --one for vcpus with
budget, the other for depleted ones-- instead of only one divided in two
parts?

Perhaps it's not a big deal, but looking at this code, it seems that
inserting a fully depleted vcpu potentially involves pointlessly going
through all the elements with budget>0, which also means holding the
global RunQ spinlock longer than one would expect. Thoughts?
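
Something along these lines, just to make the idea concrete (a sketch,
field names made up):

    /* in struct rt_private: two queues instead of one two-part list */
    struct list_head runq;        /* vcpus with budget left, EDF-sorted */
    struct list_head depletedq;   /* vcpus with cur_budget == 0         */

__runq_insert() would then only walk the queue the vcpu actually belongs
to, and never step over vcpus of the other kind.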


Also, coding style: inside the hypervisor, we do:

if ( bla )
{
    blabla;
}
else    /* or else if (bleblu ) */
{
    bleah
}

(Of course, this needs fixing everywhere).

> +/*
> + * Init/Free related code
> + */
> +static int
> +rt_init(struct scheduler *ops)
> +{
> +    struct rt_private *prv;
> +
> +    prv = xzalloc(struct rt_private);
> +    if ( prv == NULL ) {
> +        printk("xzalloc failed at rt_private\n");
> +        return -ENOMEM;
> +    }
> +
> +    ops->sched_data = prv;
> +    spin_lock_init(&prv->lock);
> +    INIT_LIST_HEAD(&prv->sdom);
> +    INIT_LIST_HEAD(&prv->runq);
> +    cpumask_clear(&prv->tickled);
> +    cpumask_clear(&prv->cpus);
> +
> +    printtime();
> +    printk("\n");
> +
Perhaps printtime() can take a string argument and append it to the time
it prints. Well, it probably does not matter much, as I don't think
things like printtime() should make it to the final versions. I mean,
it's useful now, but when upstream, debugging happens via tracing, so it
hopefully will become useless.

Actually, I think adding a few xentrace tracepoints would be a great
plus, as it'll allow Xen devs to start investigating and debugging this
the usual 'Xen way'. Look at entries like TRC_CSCHED2_RUNQ_POS in
credit2, or similar ones in credit.
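
E.g., something like the below in __runq_insert() (the TRC_ event number
is made up here and would need to be defined properly, credit2-style):

    /* sketch: trace where in the RunQ the vcpu ended up */
    {
        struct {
            unsigned dom:16, vcpu:16;
            unsigned pos;
        } d;
        d.dom = svc->vcpu->domain->domain_id;
        d.vcpu = svc->vcpu->vcpu_id;
        d.pos = pos;    /* position computed while walking the RunQ */
        trace_var(TRC_RT_RUNQ_POS, 1, sizeof(d), (unsigned char *)&d);
    }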

> +    return 0;
> +}

> +static void *
> +rt_alloc_vdata(const struct scheduler *ops, struct vcpu *vc, void *dd)
> +{
> +    struct rt_vcpu *svc;
> +    s_time_t now = NOW();
> +    long count;
> +
> +    /* Allocate per-VCPU info */
> +    svc = xzalloc(struct rt_vcpu);
> +    if ( svc == NULL ) {
> +        printk("%s, xzalloc failed\n", __func__);
> +        return NULL;
> +    }
> +
> +    INIT_LIST_HEAD(&svc->runq_elem);
> +    INIT_LIST_HEAD(&svc->sdom_elem);
> +    svc->flags = 0U;
> +    svc->sdom = dd;
> +    svc->vcpu = vc;
> +    svc->last_start = 0;            /* init last_start is 0 */
> +
> +    svc->period = RT_DEFAULT_PERIOD;
> +    if ( !is_idle_vcpu(vc) && vc->domain->domain_id != 0 ) {
> +        svc->budget = RT_DEFAULT_BUDGET;
> +    } else {
> +        svc->budget = RT_DEFAULT_PERIOD; /* give vcpus of dom0 100% utilization */
> +    }
> +
Oh, so idle VCPUs and Dom0's VCPUs get 100%? Well, for one, write this
explicitly somewhere. Anyway, idle is not so important, it's kind of
obvious actually. Dom0 is certainly worth a few words, both here and
up in the 'Design' description.

I've got to think at whether or not that's a sane default setting. For
sure Dom0 is a very peculiar domain, but the beauty (well, one of the
beauties :-D) of Xen is that Dom0 is actually just one domain, there may
exist other "special" domains (driver/stub domains, etc)... So let's try
not making Dom0 itself too special, ok? :-P

> +    count = (now/MILLISECS(svc->period)) + 1;
> +    /* sync all VCPU's start time to 0 */
> +    svc->cur_deadline += count*MILLISECS(svc->period);
> +
> +    svc->cur_budget = svc->budget*1000; /* counting in microseconds level */
> +    /* Debug only: dump new vcpu's info */
> +    printtime();
> +    rt_dump_vcpu(svc);
> +
> +    return svc;
> +}

> +static void
> +rt_vcpu_insert(const struct scheduler *ops, struct vcpu *vc)
> +{
> +    struct rt_vcpu *svc = RT_VCPU(vc);
> +
> +    /* Debug only: dump info of vcpu to insert */
> +    printtime();
> +    rt_dump_vcpu(svc);
> +
> +    /* IDLE VCPU not allowed on RunQ */
> +    if ( is_idle_vcpu(vc) )
> +        return;
> +
Mmm... Why are we speaking about the RunQ... this is
not the function adding a VCPU to the RunQ, is it?

> +    list_add_tail(&svc->sdom_elem, &svc->sdom->vcpu);   /* add to dom vcpu list */
> +}
> +
> +static void
> +rt_vcpu_remove(const struct scheduler *ops, struct vcpu *vc)
> +{
> +    struct rt_vcpu * const svc = RT_VCPU(vc);
> +    struct rt_dom * const sdom = svc->sdom;
> +
> +    printtime();
> +    rt_dump_vcpu(svc);
> +
> +    BUG_ON( sdom == NULL );
> +    BUG_ON( __vcpu_on_runq(svc) );
> +
> +    if ( !is_idle_vcpu(vc) ) {
> +        list_del_init(&svc->sdom_elem);
> +    }
> +}
> +
About these two functions (rt_vcpu_insert and rt_vcpu_remove), in
addition to what I wrote yesterday.

It looks to me that, when calling them, for instance, from
sched_move_domain() in schedule.c, what you do is remove the VCPU from
the list of VCPUs of its domain, and then add it back there. Sure, the
net effect is ok, but it certainly does not look like the best possible
thing to happen in such a case.

I can't be positive about what I'm about to say, but it looks to me
that, given how the insert_vcpu hook is used, you may not want to define
it (just don't put any .insert_vcpu=xxx nor .remove_vcpu=yyy there when
initializing sched_rt_def, at the bottom of the file). Of course
you'd need to find another place from where to add the vcpu to the list
of its own domain's VCPUs, but that should be easy.

Or am I missing something?

For sure, the way it is now, although probably functional (it does look
so; I haven't tested this, but I trust you did :-) ), it's rather
inefficient and confusing to read.

> +/*
> + * Implemented as deferrable server. 
> + * Different server mechanism has different implementation.
>
Kill this line, until (if ever) we actually have more than one server
mechanism.

> + * burn budget at microsecond level. 
> + */
I.e., using microseconds everywhere, including in the interface, would kill
the need for doing these conversions inside the scheduler logic, which,
to me, is already a fair reason to go that way.

I see you've got this in your TODO list already, so good. :-)

> +static void
> +burn_budgets(const struct scheduler *ops, struct rt_vcpu *svc, s_time_t now) {
> +    s_time_t delta;
> +    unsigned int consume;
> +    long count = 0;
> +
> +    /* first time called for this svc, update last_start */
> +    if ( svc->last_start == 0 ) {
> +        svc->last_start = now;
> +        return;
> +    }
> +
> +    /* don't burn budget for idle VCPU */
> +    if ( is_idle_vcpu(svc->vcpu) ) {
> +        return;
> +    }
> +
This check on is_idle_vcpu() can be the very first thing you do,
can't it? I mean, is it important that we update last_start for the idle
VCPUs? If not, move it up.

> +    /* don't burn budget for Domain-0, RT-Xen use only */
> +    if ( svc->sdom->dom->domain_id == 0 ) {
> +        return;
> +    }
> +
Oh, so Dom0 is not inside a (deferrable) server? Nope, that needs to
change. It's probably ok to give it 100% bandwidth by default (still
thinking about this), but it certainly needs to be possible for a
sysadmin to change that, assign to Dom0's VCPUs budgets and periods, and
have it scheduled like all the other domains.

Is it going to be hard to make that happen? I honestly don't think
so. I think that just killing all of these (domain_id == 0) checks would
pretty much do...

> +    /* update deadline info */
>
Be a bit more explicit in the comment, explaining what's going on
here, i.e., deadline is in the past, so we need to compensate, etc. etc.

Also, can you provide me a little insight on what could cause delta to
be non-negative? Have you observed it? If yes, a lot? Under what
circumstances?

More important, is it ok to just silently fix things, or should we warn
the user/sysadmin that we're lagging behind?

> +    delta = now - svc->cur_deadline;
> +    if ( delta >= 0 ) {
> +        count = ( delta/MILLISECS(svc->period) ) + 1;
> +        svc->cur_deadline += count * MILLISECS(svc->period);
> +        svc->cur_budget = svc->budget * 1000;
> +        return;
> +    }
> +
So, in case delta was >= 0, the deadline was in the past. What about
cur_budget? I mean, what about overruns?

I know it's something that should not happen, but if it does, is
forgetting about it the best option?

However, I see budget is handled below, so I'll comment there.

> +    delta = now - svc->last_start;
> +    if ( delta < 0 ) {
> +        printk("%s, delta = %ld for ", __func__, delta);
> +        rt_dump_vcpu(svc);
> +        svc->last_start = now;  /* update last_start */
> +        svc->cur_budget = 0;
> +        return;
> +    }
> +
As said above, put down some words of comment on when this can happen
and why you think this is the best recovery action, please. :-)

> +    if ( svc->cur_budget == 0 ) return;
> +
> +    /* burn at microseconds level */
> +    consume = ( delta/MICROSECS(1) );
> +    if ( delta%MICROSECS(1) > MICROSECS(1)/2 ) consume++;
> +
ditto (about conversions).

> +    svc->cur_budget -= consume;
> +    if ( svc->cur_budget < 0 ) svc->cur_budget = 0;
>
What I've always done is have the task (in this case, that would be
the vcpu) pay back for that. So if its max_budget was 100 and I at
some point find out it managed to execute for 115, i.e., it overran by
15, I only replenish it to 100-15=85.

I'm saying this here because, even if it's not here that you replenish,
all the mechanisms I can think of to take care of overruns need the
current budget to actually be negative, when replenishment happens, in
order to be able to take overrun itself into account.

So what do you think? Can overrun happen in this scheduler? If yes, how
severely? Don't you think it's something important to consider?
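
What I mean is, roughly (just a sketch, assuming cur_budget is signed):

    /* in burn_budgets(): drop the clamping to 0 ... */
    svc->cur_budget -= consume;    /* may go negative on overrun */

    /* ... and, when replenishing, pay the overrun back: */
    if ( svc->cur_budget < 0 )
        svc->cur_budget += svc->budget * 1000;  /* e.g. 100-15=85 */
    else
        svc->cur_budget = svc->budget * 1000;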

> +}
> +
> +/* 
> + * RunQ is sorted. Pick first one within cpumask. If no one, return NULL
> + * lock is grabbed before calling this function 
> + */
> +static struct rt_vcpu *
> +__runq_pick(const struct scheduler *ops, cpumask_t mask)
> +{
> +    struct list_head *runq = RUNQ(ops);
> +    struct list_head *iter;
> +    struct rt_vcpu *svc = NULL;
> +    struct rt_vcpu *iter_svc = NULL;
> +    cpumask_t cpu_common;
> +    cpumask_t *online;
> +    struct rt_private * prv = RT_PRIV(ops);
> +
> +    list_for_each(iter, runq) {
> +        iter_svc = __runq_elem(iter);
> +
> +        /* mask is intersection of cpu_hard_affinity and cpupool and priv->cpus */
>
What's in priv->cpus, BTW? How is it different from the cpupool online
mask (returned in 'online' by cpupool_scheduler_cpumask() )?

> +        online = cpupool_scheduler_cpumask(iter_svc->vcpu->domain->cpupool);
> +        cpumask_and(&cpu_common, online, &prv->cpus);
> +        cpumask_and(&cpu_common, &cpu_common, iter_svc->vcpu->cpu_hard_affinity);
> +        cpumask_and(&cpu_common, &mask, &cpu_common);
> +        if ( cpumask_empty(&cpu_common) )
> +            continue;
> +
> +        if ( iter_svc->cur_budget <= 0 )
> +            continue;
> +
> +        svc = iter_svc;
> +        break;
> +    }
> +
> +    return svc;
> +}
> +
> +/*
> + * Update vcpu's budget and sort runq by insert the modifed vcpu back to runq
> + * lock is grabbed before calling this function 
> + */
> +static void
> +__repl_update(const struct scheduler *ops, s_time_t now)
> +{
> +    struct list_head *runq = RUNQ(ops);
> +    struct list_head *iter;
> +    struct list_head *tmp;
> +    struct rt_vcpu *svc = NULL;
> +
> +    s_time_t diff;
> +    long count;
> +
> +    list_for_each_safe(iter, tmp, runq) {
> +        svc = __runq_elem(iter);
> +
Wow, so we scan all the RunQ... :-/

Let's check when this is called, but I personally really don't like these
things. :-(

Can't we use something like a timer for replenishments?

> +        diff = now - svc->cur_deadline;
> +        if ( diff > 0 ) {
> +            count = (diff/MILLISECS(svc->period)) + 1;
> +            svc->cur_deadline += count * MILLISECS(svc->period);
> +            svc->cur_budget = svc->budget * 1000;
>
Payback for possible overrun again?

In any case, payback or not, this is the same code from inside
burn_budgets() above. A helper function is probably what we want.

> +            __runq_remove(svc);
> +            __runq_insert(ops, svc);
> +        }
> +    }
> +}
> +
> +/* 
> + * schedule function for rt scheduler.
> + * The lock is already grabbed in schedule.c, no need to lock here 
> + */
> +static struct task_slice
> +rt_schedule(const struct scheduler *ops, s_time_t now, bool_t tasklet_work_scheduled)
> +{
> +    const int cpu = smp_processor_id();
> +    struct rt_private * prv = RT_PRIV(ops);
> +    struct rt_vcpu * const scurr = RT_VCPU(current);
> +    struct rt_vcpu * snext = NULL;
> +    struct task_slice ret;
> +
    struct task_slice ret = { .migrated = 0 };

should work, and should allow you to get rid of the ret.migrated=0
below. It's a minor thing, though, mostly a matter of taste, I think.

> +    /* clear ticked bit now that we've been scheduled */
> +    if ( cpumask_test_cpu(cpu, &prv->tickled) )
> +        cpumask_clear_cpu(cpu, &prv->tickled);
> +
> +    /* burn_budget would return for IDLE VCPU */
> +    burn_budgets(ops, scurr, now);
> +
> +    __repl_update(ops, now);
> +
And here it is. So, at each tick (and possibly even more frequently) we
scan the full RunQ, just to update the deadlines and perform
replenishments.

I confirm that I don't like this very much. burn_budget() is something
that should indeed happen here. __repl_update(), OTOH, should somehow be
made event based. I can't provide a deferrable server example off the
top of my head.

IIRC, in my Linux CBS implementation I was setting a timer when a task
is inserted in the runqueue, to fire at the task's absolute deadline
time (well, after one period, if you want to allow deadlines different
from periods). At that point, you either find the task still running,
with some leftover budget, you find it still running with no (or very
few) budget, which means overrun (or something very close to overrun),
or (most of the cases) you find it stopped, because it ran out of
budget. That's when --in the timer handler-- you issue the replenishment
and postpone the deadline accordingly (depending on which one of the 3
above the actual situation was). When re-enqueueing the task, with new
budget and new deadline, I was also setting the timer again.
Of course, when the task was removed from the runqueue because it
blocked by itself (i.e., _not_ because he depleted its budget), I was
stopping the timer too.

I've also implemented the sporadic server on Linux, using the exact same
mechanism, although it was a bit more complex, as the sporadic server
handles replenishments in a more complicated way (it replenishes in
chunks, so I had to set the timer for each chunk, one after the other).

I'm sure something like this can be done here, in Xen, and with DS as an
algorithm.

It shouldn't even be too hard to just use either only one timer, or,
perhaps, one timer per PCPU. I'm quite sure we don't want one timer per
VCPU, but this is of course something we can discuss. For now, what do
you think about the timer idea?
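
Just to sketch it out (a rough draft of the one-timer variant; the
repl_timer field and the next_deadline() helper are made-up names, and
xen/timer.h would need including):

    /* in struct rt_private: */
    struct timer repl_timer;

    static void repl_timer_handler(void *data)
    {
        const struct scheduler *ops = data;
        struct rt_private *prv = RT_PRIV(ops);

        spin_lock_irq(&prv->lock);
        __repl_update(ops, NOW());   /* the only place that walks the RunQ */
        /* tickle pcpus whose running vcpu should now be preempted ...     */
        spin_unlock_irq(&prv->lock);

        /* re-arm for the earliest deadline currently in the RunQ */
        set_timer(&prv->repl_timer, next_deadline(ops));
    }

    /* in rt_init(): */
    init_timer(&prv->repl_timer, repl_timer_handler, (void *)ops, 0);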

> +    if ( tasklet_work_scheduled ) {
> +        snext = RT_VCPU(idle_vcpu[cpu]);
> +    } else {
> +        cpumask_t cur_cpu;
> +        cpumask_clear(&cur_cpu);
> +        cpumask_set_cpu(cpu, &cur_cpu);
> +        snext = __runq_pick(ops, cur_cpu);
> +        if ( snext == NULL )
> +            snext = RT_VCPU(idle_vcpu[cpu]);
> +
> +        /* if scurr has higher priority and budget, still pick scurr */
> +        if ( !is_idle_vcpu(current) &&
> +             vcpu_runnable(current) &&
> +             scurr->cur_budget > 0 &&
> +             ( is_idle_vcpu(snext->vcpu) ||
> +               scurr->cur_deadline <= snext->cur_deadline ) ) {
> +            snext = scurr;
> +        }
> +    }
> +
> +    if ( snext != scurr &&
> +         !is_idle_vcpu(current) &&
> +         vcpu_runnable(current) ) {
> +        set_bit(__RT_delayed_runq_add, &scurr->flags);
> +    }
> +
> +    snext->last_start = now;
> +    ret.migrated = 0;
> +    if ( !is_idle_vcpu(snext->vcpu) ) {
> +        if ( snext != scurr ) {
> +            __runq_remove(snext);
> +            set_bit(__RT_scheduled, &snext->flags);
> +        }
> +        if ( snext->vcpu->processor != cpu ) {
> +            snext->vcpu->processor = cpu;
> +            ret.migrated = 1;
> +        }
> +    }
> +
> +    ret.time = MILLISECS(1);
> +    ret.task = snext->vcpu;
> +
> +    return ret;
> +}
> +
> +/*
> + * Remove VCPU from RunQ
> + * The lock is already grabbed in schedule.c, no need to lock here 
> + */
> +static void
> +rt_vcpu_sleep(const struct scheduler *ops, struct vcpu *vc)
> +{
> +    struct rt_vcpu * const svc = RT_VCPU(vc);
> +
> +    BUG_ON( is_idle_vcpu(vc) );
> +
> +    if ( curr_on_cpu(vc->processor) == vc ) {
> +        cpu_raise_softirq(vc->processor, SCHEDULE_SOFTIRQ);
> +        return;
> +    }
> +
> +    if ( __vcpu_on_runq(svc) ) {
> +        __runq_remove(svc);
> +    }
> +
As said, can it actually happen that we try to put to sleep a vcpu
that's not in the RunQ? If yes, fine, but I'd like a comment about why
that's the case. If not, let's convert the if to an ASSERT.

Let me state once more that I'm not questioning the correctness of your
code, I just want to be sure that the final result is as straightforward
to read and understand as possible, when it comes to studying or debugging it.
Well, this, in my opinion, is one of the spots where having said aid
(comment or ASSERT) from the original implementors would help a lot with
that goal. :-)

> +    clear_bit(__RT_delayed_runq_add, &svc->flags);
> +}
> +
> +/*
> + * Pick a vcpu on a cpu to kick out to place the running candidate
> + * Called by wake() and context_saved()
> + * We have a running candidate here, the kick logic is:
> + * Among all the cpus that are within the cpu affinity
> + * 1) if the new->cpu is idle, kick it. This could benefit cache hit
> + * 2) if there are any idle vcpu, kick it.
> + * 3) now all pcpus are busy, among all the running vcpus, pick lowest priority one
> + *    if snext has higher priority, kick it.
> + *
> + * TODO:
> + * 1) what if these two vcpus belongs to the same domain?
> + *    replace a vcpu belonging to the same domain introduces more overhead
> + *
> + * lock is grabbed before calling this function 
> + */
Now this is a really good looking doc comment!! :-D

> +static void
> +runq_tickle(const struct scheduler *ops, struct rt_vcpu *new)
> +{
> +    struct rt_private * prv = RT_PRIV(ops);
> +    struct rt_vcpu * scheduled = NULL;    /* lowest priority scheduled */
>
Can I suggest a better name for this variable? Maybe
latest_deadline_vcpu, or latest_scheduled, or just latest... In general,
I'd like it to reflect the concept that it points to the VCPU with the
latest deadline (i.e., the lowest prio, but as repeatedly said, we're
dealing with EDF only!).

> +    struct rt_vcpu * iter_svc;
> +    struct vcpu * iter_vc;
> +    int cpu = 0;
> +    cpumask_t not_tickled;                      /* not tickled cpus */
>
Comment not necessary, the name speaks for itself in this case.

One issue here is that we are really trying hard to avoid putting
cpumask_t on the stack, as they can be big on large boxes, with lots of
cpus. However, let's consider this low prio for now.
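
FWIW, the usual trick for that is a per-pcpu scratch mask, something like
(sketch, variable name made up):

    static DEFINE_PER_CPU(cpumask_t, rt_scratch_mask);

    /* in runq_tickle(), instead of the on-stack cpumask_t: */
    cpumask_t *not_tickled = &this_cpu(rt_scratch_mask);

    cpumask_and(not_tickled, online, &prv->cpus);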

> +    cpumask_t *online;
> +
> +    if ( new == NULL || is_idle_vcpu(new->vcpu) ) return;
> +
> +    online = cpupool_scheduler_cpumask(new->vcpu->domain->cpupool);
> +    cpumask_and(&not_tickled, online, &prv->cpus);
> +    cpumask_and(&not_tickled, &not_tickled, new->vcpu->cpu_hard_affinity);
> +    cpumask_andnot(&not_tickled, &not_tickled, &prv->tickled);
> +
> +    /* 1) if new's previous cpu is idle, kick it for cache benefit */
> +    if ( is_idle_vcpu(curr_on_cpu(new->vcpu->processor)) ) {
>
Remember: '{' goes on a new line inside the hypervisor.

> +        cpumask_set_cpu(new->vcpu->processor, &prv->tickled);
> +        cpu_raise_softirq(new->vcpu->processor, SCHEDULE_SOFTIRQ);
> +        return;
> +    }
> +
> +    /* 2) if there are any idle pcpu, kick it */
> +    /* The same loop also find the one with lowest priority */
> +    for_each_cpu(cpu, &not_tickled) {
> +        iter_vc = curr_on_cpu(cpu);
> +        if ( is_idle_vcpu(iter_vc) ) {
> +            cpumask_set_cpu(cpu, &prv->tickled);
> +            cpu_raise_softirq(cpu, SCHEDULE_SOFTIRQ);
> +            return;
> +        }
> +        iter_svc = RT_VCPU(iter_vc);
> +        if ( scheduled == NULL || 
> +             iter_svc->cur_deadline > scheduled->cur_deadline ) {
> +            scheduled = iter_svc;
> +        }
> +    }
> +
> +    /* 3) candicate has higher priority, kick out the lowest priority vcpu */
> +    if ( scheduled != NULL && new->cur_deadline < scheduled->cur_deadline ) {
> +        cpumask_set_cpu(scheduled->vcpu->processor, &prv->tickled);
> +        cpu_raise_softirq(scheduled->vcpu->processor, SCHEDULE_SOFTIRQ);
> +    }
> +    return;
> +}
> +
> +/* 
> + * Should always wake up runnable vcpu, put it back to RunQ. 
> + * Check priority to raise interrupt 
> + * The lock is already grabbed in schedule.c, no need to lock here 
> + * TODO: what if these two vcpus belongs to the same domain? 
> + */
> +static void
> +rt_vcpu_wake(const struct scheduler *ops, struct vcpu *vc)
> +{
> +    struct rt_vcpu * const svc = RT_VCPU(vc);
> +    s_time_t diff;
> +    s_time_t now = NOW();
> +    long count = 0;
> +    struct rt_private * prv = RT_PRIV(ops);
> +    struct rt_vcpu * snext = NULL;        /* highest priority on RunQ */
> +
> +    BUG_ON( is_idle_vcpu(vc) );
> +
> +    if ( unlikely(curr_on_cpu(vc->processor) == vc) ) return;
> +
ASSERT or comment.

Also, here and everywhere else where this happens, 'return' (in general
the then branch of the if) goes on a new line, even if it's a single
statement.

> +    /* on RunQ, just update info is ok */
> +    if ( unlikely(__vcpu_on_runq(svc)) ) return;
> +
This happens to have a comment, but not as good as the case
requires. :-P

> +    /* If context hasn't been saved yet, set flag so it will add later */
> +    if ( unlikely(test_bit(__RT_scheduled, &svc->flags)) ) {
> +        set_bit(__RT_delayed_runq_add, &svc->flags);
> +        return;
> +    }
> +
The credit2 implementation has proper comments on what this flag (and the
other one too, actually) means and how it's used, both where the flags
are defined and where they're used. Either you provide similar comments in
here (taking them from sched_credit2.c is ok), or you somehow reference the
ones in sched_credit2.c.

Personally, I don't like referencing comments in other files, so
I'd go for the cut-&-paste-ish approach.
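
Roughly the kind of thing I mean (the wording here is mine, double check
it against what the code actually does):

    /*
     * __RT_scheduled: the vcpu is running on, or being context switched
     * off of, a pcpu, so it must not be put back on the RunQ yet.
     * __RT_delayed_runq_add: the vcpu became runnable while
     * __RT_scheduled was still set; context_saved() will do the RunQ
     * insertion once the context switch has completed.
     */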

> +    /* update deadline info */
> +    diff = now - svc->cur_deadline;
> +    if ( diff >= 0 ) {
> +        count = ( diff/MILLISECS(svc->period) ) + 1;
> +        svc->cur_deadline += count * MILLISECS(svc->period);
> +        svc->cur_budget = svc->budget * 1000;
> +    }
> +
And here comes the same code again. Definitely, make a helper
function and call it from all 3 spots.
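
Something like the below should do (just a sketch, name made up, same
arithmetic as the three open-coded copies):

    static void
    rt_update_deadline(struct rt_vcpu *svc, s_time_t now)
    {
        long count;

        if ( now < svc->cur_deadline )
            return;

        count = (now - svc->cur_deadline) / MILLISECS(svc->period) + 1;
        svc->cur_deadline += count * MILLISECS(svc->period);
        svc->cur_budget = svc->budget * 1000;
    }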

> +    __runq_insert(ops, svc);
> +    __repl_update(ops, now);
>
So, every time a VCPU wakes up, we scan the full RunQ, which also means
grabbing and holding the global RunQ lock, to check whether any task
needs a replenishment... Why is this required?

Once more, can't we use timers for replenishments, and get rid of this?
I'm sure it comes at a rather high cost, doesn't it?

> +    snext = __runq_pick(ops, prv->cpus);    /* pick snext from ALL valid cpus */
> +    runq_tickle(ops, snext);
> +
> +    return;
> +}
> +
> +/* 
> + * scurr has finished context switch, insert it back to the RunQ,
> + * and then pick the highest priority vcpu from runq to run 
> + */
> +static void
> +rt_context_saved(const struct scheduler *ops, struct vcpu *vc)
> +{
> +    struct rt_vcpu * svc = RT_VCPU(vc);
> +    struct rt_vcpu * snext = NULL;
> +    struct rt_private * prv = RT_PRIV(ops);
> +    spinlock_t *lock;
> +
> +    clear_bit(__RT_scheduled, &svc->flags);
>
You clear this flag outside of the critical section on lock... This is
different from what happens in credit2, and it does not sound safe to
me. Is it intentional?

> +    if ( is_idle_vcpu(vc) ) return;
> +
> +    lock = vcpu_schedule_lock_irq(vc);
> +    if ( test_and_clear_bit(__RT_delayed_runq_add, &svc->flags) && 
> +         likely(vcpu_runnable(vc)) ) {
> +        __runq_insert(ops, svc);
> +        __repl_update(ops, NOW());
ditto.

> +        snext = __runq_pick(ops, prv->cpus);    /* pick snext from ALL cpus */
> +        runq_tickle(ops, snext);
>
I also think that, if we use something like timers for replenishments,
we can avoid calling __runq_pick() here too, as we won't be re-sorting
the queue, so it'd be enough to call runq_tickle() for svc... I think. :-)

> +    }
> +    vcpu_schedule_unlock_irq(lock, vc);
> +}
> +
> +/*
> + * set/get each vcpu info of each domain
> + */
> +static int
> +rt_dom_cntl(const struct scheduler *ops, struct domain *d, struct xen_domctl_scheduler_op *op)
>
Lots of long lines in this function...

> +{
> +    xen_domctl_sched_rt_params_t local_sched;
> +    struct rt_dom * const sdom = RT_DOM(d);
> +    struct list_head *iter;
> +    int vcpu_index = 0;
> +    int rc = -EINVAL;
> +
> +    switch ( op->cmd )
> +    {
> +    case XEN_DOMCTL_SCHEDOP_getinfo:
> +        /* for debug use, whenever adjust Dom0 parameter, do global dump */
> +        if ( d->domain_id == 0 ) {
> +            rt_dump(ops);
> +        }
> +
> +        local_sched.num_vcpus = 0;
> +        list_for_each( iter, &sdom->vcpu ) {
> +            struct rt_vcpu * svc = list_entry(iter, struct rt_vcpu, sdom_elem);
> +
> +            ASSERT(vcpu_index < XEN_LEGACY_MAX_VCPUS);
> +            local_sched.vcpus[vcpu_index].budget = svc->budget;
> +            local_sched.vcpus[vcpu_index].period = svc->period;
> +
> +            vcpu_index++;
> +        }
> +        local_sched.num_vcpus = vcpu_index;
> +        copy_to_guest(op->u.rt.schedule, &local_sched, 1);
> +        rc = 0;
> +        break;
> +    case XEN_DOMCTL_SCHEDOP_putinfo:
> +        copy_from_guest(&local_sched, op->u.rt.schedule, 1);
> +        list_for_each( iter, &sdom->vcpu ) {
> +            struct rt_vcpu * svc = list_entry(iter, struct rt_vcpu, sdom_elem);
> +
> +            if ( local_sched.vcpu_index == svc->vcpu->vcpu_id ) { /* adjust per VCPU parameter */
> +                vcpu_index = local_sched.vcpu_index;
> +
> +                if ( vcpu_index < 0 || vcpu_index > XEN_LEGACY_MAX_VCPUS) {
> +                    printk("XEN_DOMCTL_SCHEDOP_putinfo: vcpu_index=%d\n",
> +                            vcpu_index);
> +                }else{
> +                    printk("XEN_DOMCTL_SCHEDOP_putinfo: vcpu_index=%d, period=%"PRId64", budget=%"PRId64"\n",
> +                            vcpu_index, local_sched.vcpus[vcpu_index].period, 
> +                            local_sched.vcpus[vcpu_index].budget);
> +                }
> +
> +                svc->period = local_sched.vcpus[vcpu_index].period;
> +                svc->budget = local_sched.vcpus[vcpu_index].budget;
> +
> +                break;
> +            }
> +        }
> +        rc = 0;
> +        break;
> +    }
> +
> +    return rc;
> +}
> +
> +static struct rt_private _rt_priv;
> +
> +const struct scheduler sched_rt_def = {
> +    .name           = "SMP RT Scheduler",
> +    .opt_name       = "rt",
> +    .sched_id       = XEN_SCHEDULER_RT,
> +    .sched_data     = &_rt_priv,
> +
> +    .dump_cpu_state = rt_dump_pcpu,
> +    .dump_settings  = rt_dump,
> +    .init           = rt_init,
> +    .deinit         = rt_deinit,
> +    .alloc_pdata    = rt_alloc_pdata,
> +    .free_pdata     = rt_free_pdata,
> +    .alloc_domdata  = rt_alloc_domdata,
> +    .free_domdata   = rt_free_domdata,
> +    .init_domain    = rt_dom_init,
> +    .destroy_domain = rt_dom_destroy,
> +    .alloc_vdata    = rt_alloc_vdata,
> +    .free_vdata     = rt_free_vdata,
> +    .insert_vcpu    = rt_vcpu_insert,
> +    .remove_vcpu    = rt_vcpu_remove,
> +
> +    .adjust         = rt_dom_cntl,
> +
> +    .pick_cpu       = rt_cpu_pick,
> +    .do_schedule    = rt_schedule,
> +    .sleep          = rt_vcpu_sleep,
> +    .wake           = rt_vcpu_wake,
> +    .context_saved  = rt_context_saved,
> +
> +    .yield          = NULL,
> +    .migrate        = NULL,
>
You can skip the hooks that you want to be NULL. If you don't, because
you think it's worthwhile to give more visual and explicit info about
a particular hook not being implemented, put down a comment on why that
is the case.

I think, in this case, it's good to show the reviewers and future
readers/hackers that we don't have yield and migrate here, so ack on the
"=NULL", but do add the comment on why that is.

> --- a/xen/include/xen/sched-if.h
> +++ b/xen/include/xen/sched-if.h
> @@ -169,7 +169,7 @@ extern const struct scheduler sched_sedf_def;
>  extern const struct scheduler sched_credit_def;
>  extern const struct scheduler sched_credit2_def;
>  extern const struct scheduler sched_arinc653_def;
> -
> +extern const struct scheduler sched_rt_def;
>  
Don't kill blank lines, just add your stuff.

Phew... that's it for this patch... let's move to (hopefully) simpler
ones! :-)

Regards,
Dario

-- 
<<This happens because I choose it to happen!>> (Raistlin Majere)
-----------------------------------------------------------------
Dario Faggioli, Ph.D, http://about.me/dario.faggioli
Senior Software Engineer, Citrix Systems R&D Ltd., Cambridge (UK)


[-- Attachment #1.2: This is a digitally signed message part --]
[-- Type: application/pgp-signature, Size: 181 bytes --]

[-- Attachment #2: Type: text/plain, Size: 126 bytes --]

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [PATCH RFC v1 1/4] rt: Add rt scheduler to hypervisor
  2014-07-11  4:49 ` [PATCH RFC v1 1/4] rt: Add rt scheduler to hypervisor Meng Xu
  2014-07-11 14:27   ` Dario Faggioli
@ 2014-07-11 14:37   ` Andrew Cooper
  2014-07-11 15:21     ` Dario Faggioli
  1 sibling, 1 reply; 50+ messages in thread
From: Andrew Cooper @ 2014-07-11 14:37 UTC (permalink / raw)
  To: Meng Xu, xen-devel
  Cc: ian.campbell, xisisu, stefano.stabellini, george.dunlap,
	dario.faggioli, ian.jackson, xumengpanda, lichong659, dgolomb

On 11/07/14 05:49, Meng Xu wrote:
> This is the core rt scheduler patch. It adds the new real time scheduler to
> the hypervisor, as the non-default scheduler.
>
> This scheduler follows the pre-emptive Global EDF theory in real-time field.
> Each VCPU can have a dedicated period and budget.
> While scheduled, a VCPU burns its budget.
> A VCPU has its budget replenished at the beginning of each of its periods;
> The VCPU discards its unused budget at the end of each of its periods.
> If a VCPU runs out of budget in a period, it has to wait until next period.
> The mechanism of how to burn a VCPU's budget depends on the server mechanism
> implemented for each VCPU.
>
> Server mechanism: a VCPU is implemented as a deferable server.
> When a VCPU has a task running on it, its budget is continuously
> burned;
> When a VCPU has no task but with budget left, its budget is
> preserved.
>
> Priority scheme: Preemptive Global Earliest Deadline First (gEDF).
> At any scheduling point, the VCPU with earliest deadline has highest
> priority.
>
> Queue scheme: A global runqueue for each CPU pool.
> The runqueue holds all runnable VCPUs.
> VCPUs in the runqueue are divided into two parts: with and without budget.
> At each part, VCPUs are sorted based on EDF priority scheme.
>
> Scheduling quanta: 1 ms; but accounting the budget is in microsecond.
>
> Note: cpumask and cpupool is supported.
>
> This is still in the development phase.
>
> Signed-off-by: Sisu Xi <xisisu@gmail.com>
> Signed-off-by: Meng Xu <mengxu@cis.upenn.edu>

Some notes from a hypervisor point of view.

> ---
>  xen/common/Makefile         |    1 +
>  xen/common/sched_rt.c       |  984 +++++++++++++++++++++++++++++++++++++++++++
>  xen/common/schedule.c       |    1 +
>  xen/include/public/domctl.h |   19 +
>  xen/include/xen/sched-if.h  |    2 +-
>  5 files changed, 1006 insertions(+), 1 deletion(-)
>  create mode 100644 xen/common/sched_rt.c
>
> diff --git a/xen/common/Makefile b/xen/common/Makefile
> index 3683ae3..5a23aa4 100644
> --- a/xen/common/Makefile
> +++ b/xen/common/Makefile
> @@ -26,6 +26,7 @@ obj-y += sched_credit.o
>  obj-y += sched_credit2.o
>  obj-y += sched_sedf.o
>  obj-y += sched_arinc653.o
> +obj-y += sched_rt.o
>  obj-y += schedule.o
>  obj-y += shutdown.o
>  obj-y += softirq.o
> diff --git a/xen/common/sched_rt.c b/xen/common/sched_rt.c
> new file mode 100644
> index 0000000..41543a2
> --- /dev/null
> +++ b/xen/common/sched_rt.c
> @@ -0,0 +1,984 @@
> +/******************************************************************************
> + * Preemptive Global Earliest Deadline First  (EDF) scheduler for Xen
> + * EDF scheduling is one of most popular real-time scheduling algorithm used in
> + * embedded field.
> + *
> + * by Sisu Xi, 2013, Washington University in Saint Louis
> + * and Meng Xu, 2014, University of Pennsylvania
> + *
> + * based on the code of credit Scheduler
> + */
> +
> +#include <xen/config.h>
> +#include <xen/init.h>
> +#include <xen/lib.h>
> +#include <xen/sched.h>
> +#include <xen/domain.h>
> +#include <xen/delay.h>
> +#include <xen/event.h>
> +#include <xen/time.h>
> +#include <xen/perfc.h>
> +#include <xen/sched-if.h>
> +#include <xen/softirq.h>
> +#include <asm/atomic.h>
> +#include <xen/errno.h>
> +#include <xen/trace.h>
> +#include <xen/cpu.h>
> +#include <xen/keyhandler.h>
> +#include <xen/trace.h>
> +#include <xen/guest_access.h>
> +
> +/*
> + * TODO:
> + *
> + * Migration compensation and resist like credit2 to better use cache;
> + * Lock Holder Problem, using yield?
> + * Self switch problem: VCPUs of the same domain may preempt each other;
> + */
> +
> +/*
> + * Design:
> + *
> + * This scheduler follows the Preemptive Global EDF theory in real-time field.
> + * Each VCPU can have a dedicated period and budget. 
> + * While scheduled, a VCPU burns its budget.
> + * A VCPU has its budget replenished at the beginning of each of its periods;
> + * The VCPU discards its unused budget at the end of each of its periods.
> + * If a VCPU runs out of budget in a period, it has to wait until next period.
> + * The mechanism of how to burn a VCPU's budget depends on the server mechanism
> + * implemented for each VCPU.
> + *
> + * Server mechanism: a VCPU is implemented as a deferrable server.
> + * When a VCPU has a task running on it, its budget is continuously burned;
> + * When a VCPU has no task but with budget left, its budget is preserved.
> + *
> + * Priority scheme: Preemptive Global Earliest Deadline First (gEDF).
> + * At any scheduling point, the VCPU with earliest deadline has highest priority.
> + *
> + * Queue scheme: A global runqueue for each CPU pool. 
> + * The runqueue holds all runnable VCPUs. 
> + * VCPUs in the runqueue are divided into two parts: with and without remaining budget. 
> + * At each part, VCPUs are sorted based on EDF priority scheme.
> + *
> + * Scheduling quanta: 1 ms; but accounting the budget is in microsecond.
> + *
> + * Note: cpumask and cpupool is supported.
> + */
> +
> +/*
> + * Locking:
> + * Just like credit2, a global system lock is used to protect the RunQ.
> + * The global lock is referenced by schedule_data.schedule_lock from all physical cpus.
> + *
> + * The lock is already grabbed when calling wake/sleep/schedule/ functions in schedule.c
> + *
> + * The functions that involve the RunQ and need to grab the lock are:
> + *    dump, vcpu_insert, vcpu_remove, context_saved,
> + */
> +
> +
> +/*
> + * Default parameters in ms
> + */
> +#define RT_DEFAULT_PERIOD     10
> +#define RT_DEFAULT_BUDGET      4
> +
> +/*
> + * Useful macros
> + */
> +#define RT_PRIV(_ops)     ((struct rt_private *)((_ops)->sched_data))
> +#define RT_VCPU(_vcpu)    ((struct rt_vcpu *)(_vcpu)->sched_priv)
> +#define RT_DOM(_dom)      ((struct rt_dom *)(_dom)->sched_priv)
> +#define RUNQ(_ops)              (&RT_PRIV(_ops)->runq)
> +
> +/*
> + * Flags
> + */
> +#define __RT_scheduled            1
> +#define RT_scheduled (1<<__RT_scheduled)
> +#define __RT_delayed_runq_add     2
> +#define RT_delayed_runq_add (1<<__RT_delayed_runq_add)
> +
> +/*
> + * Used to printout debug information
> + */
> +#define printtime()     ( printk("%d : %3ld.%3ld : %-19s ", smp_processor_id(), NOW()/MILLISECS(1), NOW()%MILLISECS(1)/1000, __func__) )

NOW() is fairly expensive, and using it twice like this is going to lead
to inaccurate times in the message.  Anything like this should be
XENLOG_DEBUG, and smp_processor_id() returns an unsigned number.

I would suggest something more like this:

#define printtime() \
({ s_time_t now = NOW(); \
    printk(XENLOG_DEBUG "%u : %3ld.%eld : %-19s\n", smp_processor_id(),\
              now / MILLISECS(1), (now % MILLISECS(1)) / 1000, \
              __func__); })

> +
> +/*
> + * System-wide private data, including a global RunQueue
> + * The global lock is referenced by schedule_data.schedule_lock from all physical cpus.
> + * It can be grabbed via vcpu_schedule_lock_irq()
> + */
> +struct rt_private {
> +    spinlock_t lock;        /* The global coarse grand lock */
> +    struct list_head sdom;  /* list of available domains, used for dump */
> +    struct list_head runq;  /* Ordered list of runnable VMs */
> +    cpumask_t cpus;         /* cpumask_t of available physical cpus */
> +    cpumask_t tickled;      /* another cpu in the queue already ticked this one */
> +};
> +
> +/*
> + * Virtual CPU
> + */
> +struct rt_vcpu {
> +    struct list_head runq_elem; /* On the runqueue list */
> +    struct list_head sdom_elem; /* On the domain VCPU list */
> +
> +    /* Up-pointers */
> +    struct rt_dom *sdom;
> +    struct vcpu *vcpu;
> +
> +    /* VCPU parameters, in milliseconds */
> +    s_time_t period;
> +    s_time_t budget;
> +
> +    /* VCPU current information */
> +    long cur_budget;             /* current budget in microseconds */

Can budget be negative ?

> +    s_time_t last_start;        /* last start time, used to calculate budget */
> +    s_time_t cur_deadline;      /* current deadline, used to do EDF */
> +    unsigned flags;             /* mark __RT_scheduled, etc.. */
> +};
> +
> +/*
> + * Domain
> + */
> +struct rt_dom {
> +    struct list_head vcpu;      /* link its VCPUs */
> +    struct list_head sdom_elem; /* link list on rt_priv */
> +    struct domain *dom;         /* pointer to upper domain */
> +};
> +
> +/*
> + * RunQueue helper functions
> + */
> +static int
> +__vcpu_on_runq(struct rt_vcpu *svc)
> +{
> +   return !list_empty(&svc->runq_elem);
> +}
> +
> +static struct rt_vcpu *
> +__runq_elem(struct list_head *elem)
> +{
> +    return list_entry(elem, struct rt_vcpu, runq_elem);
> +}
> +
> +static inline void
> +__runq_remove(struct rt_vcpu *svc)
> +{
> +    if ( __vcpu_on_runq(svc) )
> +        list_del_init(&svc->runq_elem);
> +}
> +
> +/*
> + * Insert a vcpu in the RunQ based on the vcpu's deadline:
> + * vcpus with shorter deadline are inserted first.
> + * EDF schedule policy: vcpu with smaller deadline has higher priority;
> + * When vcpus have the same deadline, insert the current one at the head of these vcpus.
> + */
> +static void
> +__runq_insert(const struct scheduler *ops, struct rt_vcpu *svc)
> +{
> +    struct list_head *runq = RUNQ(ops);
> +    struct list_head *iter;
> +   

The line above this has trailing whitespace, which you should clean up

> +    ASSERT( spin_is_locked(per_cpu(schedule_data, svc->vcpu->processor).schedule_lock) );

Beware assertions involving spin_is_locked().  It checks that someone
somewhere has the schedule lock held, not necessarily that this current
pcpu is holding the schedule lock.

Having the assertion is better than not having it, but do be aware that
it doesn't catch every case you might expect it to.

> +
> +    if ( __vcpu_on_runq(svc) )
> +        return;
> +
> +    list_for_each(iter, runq) {

Coding style - braces on next line.

> +        struct rt_vcpu * iter_svc = __runq_elem(iter);
> +
> +        if ( svc->cur_budget > 0 ) { /* svc still has budget */
> +            if ( iter_svc->cur_budget == 0 ||
> +                 svc->cur_deadline <= iter_svc->cur_deadline )
> +                    break;
> +        } else { /* svc has no budget */
> +            if ( iter_svc->cur_budget == 0 &&
> +                 svc->cur_deadline <= iter_svc->cur_deadline )
> +                    break;
> +        }
> +    }
> +
> +    list_add_tail(&svc->runq_elem, iter);
> +}
> +
> +
> +/*
> + * Debug related code, dump vcpu/cpu information
> + */
> +static void
> +rt_dump_vcpu(struct rt_vcpu *svc)
> +{
> +    if ( svc == NULL ) {
> +        printk("NULL!\n");
> +        return;
> +    }
> +#define cpustr keyhandler_scratch

char *cpustr = keyhandler_scratch;

Less macroism and more typesafety.

> +    cpumask_scnprintf(cpustr, sizeof(cpustr), svc->vcpu->cpu_hard_affinity);
> +    printk("[%5d.%-2d] cpu %d, (%"PRId64", %"PRId64"), cur_b=%"PRId64" cur_d=%"PRId64" last_start=%"PRId64" onR=%d runnable=%d cpu_hard_affinity=%s ",
> +            svc->vcpu->domain->domain_id,
> +            svc->vcpu->vcpu_id,
> +            svc->vcpu->processor,
> +            svc->period,
> +            svc->budget,
> +            svc->cur_budget,
> +            svc->cur_deadline,
> +            svc->last_start,
> +            __vcpu_on_runq(svc),
> +            vcpu_runnable(svc->vcpu),
> +            cpustr);
> +    memset(cpustr, 0, sizeof(char)*1024);
> +    cpumask_scnprintf(cpustr, sizeof(cpustr), cpupool_scheduler_cpumask(svc->vcpu->domain->cpupool));
> +    printk("cpupool=%s\n", cpustr);
> +#undef cpustr
> +}
> +
> +static void
> +rt_dump_pcpu(const struct scheduler *ops, int cpu)

cpu ids should be unsigned.

> +{
> +    struct rt_vcpu *svc = RT_VCPU(curr_on_cpu(cpu));
> +
> +    printtime();
> +    rt_dump_vcpu(svc);
> +}
> +
> +/*
> + * should not need lock here. only showing stuff 
> + */
> +static void
> +rt_dump(const struct scheduler *ops)
> +{
> +    struct list_head *iter_sdom, *iter_svc, *runq, *iter;
> +    struct rt_private *prv = RT_PRIV(ops);
> +    struct rt_vcpu *svc;
> +    int cpu = 0;
> +    int loop = 0;
> +
> +    printtime();
> +    printk("Priority Scheme: EDF\n");
> +
> +    printk("PCPU info: \n");
> +    for_each_cpu(cpu, &prv->cpus) {
> +        rt_dump_pcpu(ops, cpu);
> +    }
> +
> +    printk("Global RunQueue info: \n");
> +    loop = 0;
> +    runq = RUNQ(ops);
> +    list_for_each( iter, runq ) {
> +        svc = __runq_elem(iter);
> +        printk("\t%3d: ", ++loop);
> +        rt_dump_vcpu(svc);
> +    }
> +
> +    printk("Domain info: \n");
> +    loop = 0;
> +    list_for_each( iter_sdom, &prv->sdom ) {
> +        struct rt_dom *sdom;
> +        sdom = list_entry(iter_sdom, struct rt_dom, sdom_elem);
> +        printk("\tdomain: %d\n", sdom->dom->domain_id);
> +
> +        list_for_each( iter_svc, &sdom->vcpu ) {
> +            svc = list_entry(iter_svc, struct rt_vcpu, sdom_elem);
> +            printk("\t\t%3d: ", ++loop);
> +            rt_dump_vcpu(svc);
> +        }
> +    }
> +
> +    printk("\n");
> +}
> +
> +/*
> + * Init/Free related code
> + */
> +static int
> +rt_init(struct scheduler *ops)
> +{
> +    struct rt_private *prv;
> +
> +    prv = xzalloc(struct rt_private);

These 3 lines can be condensed into 1.

> +    if ( prv == NULL ) {
> +        printk("xzalloc failed at rt_private\n");

SCHED_OP(..., init) will bail on error.  No need to printk() as well.

> +        return -ENOMEM;
> +    }
> +
> +    ops->sched_data = prv;
> +    spin_lock_init(&prv->lock);
> +    INIT_LIST_HEAD(&prv->sdom);
> +    INIT_LIST_HEAD(&prv->runq);
> +    cpumask_clear(&prv->tickled);
> +    cpumask_clear(&prv->cpus);

Using the xzalloc(), you don't need to re-0 elements, so these
cpumask_clear()s can be dropped.
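
With all of the above folded in, rt_init() can shrink to something like
this (untested sketch, debug printing left out):

static int
rt_init(struct scheduler *ops)
{
    struct rt_private *prv = xzalloc(struct rt_private);

    if ( prv == NULL )
        return -ENOMEM;

    spin_lock_init(&prv->lock);
    INIT_LIST_HEAD(&prv->sdom);
    INIT_LIST_HEAD(&prv->runq);

    ops->sched_data = prv;

    return 0;
}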

> +
> +    printtime();
> +    printk("\n");
> +
> +    return 0;
> +}
> +
> +static void
> +rt_deinit(const struct scheduler *ops)
> +{
> +    struct rt_private *prv;
> +
> +    printtime();
> +    printk("\n");
> +
> +    prv = RT_PRIV(ops);

this line can be joined with the declaration of *prv.

> +    if ( prv )
> +        xfree(prv);

xfree() works just like free().  Please drop the NULL check.
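
IOW, leaving the debug printing aside, the whole function could collapse
to (sketch):

static void
rt_deinit(const struct scheduler *ops)
{
    xfree(RT_PRIV(ops));
}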

> +}
> +
> +/* 
> + * point per_cpu spinlock to the global system lock; all cpu have same global system lock 
> + */
> +static void *
> +rt_alloc_pdata(const struct scheduler *ops, int cpu)

Eww.  Not your fault, but struct scheduler needs its idea of cpus to
turn less signed.

> +{
> +    struct rt_private *prv = RT_PRIV(ops);
> +
> +    cpumask_set_cpu(cpu, &prv->cpus);
> +
> +    per_cpu(schedule_data, cpu).schedule_lock = &prv->lock;
> +
> +    printtime();
> +    printk("%s total cpus: %d", __FUNCTION__, cpumask_weight(&prv->cpus));
> +    return (void *)1;

This is going to end in disaster, as this_cpu(schedule_data).sched_priv
now contains a bogus pointer.
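
One way out (just a sketch; "rt_pcpu" is a name I am making up here, and
even an empty structure would do) is to hand back a real allocation, so
that sched_priv always holds a valid pointer:

struct rt_pcpu {
    unsigned int cpu;   /* nothing else needed yet */
};

static void *
rt_alloc_pdata(const struct scheduler *ops, int cpu)
{
    struct rt_private *prv = RT_PRIV(ops);
    struct rt_pcpu *spc = xzalloc(struct rt_pcpu);

    if ( spc == NULL )
        return NULL;

    spc->cpu = cpu;
    cpumask_set_cpu(cpu, &prv->cpus);
    per_cpu(schedule_data, cpu).schedule_lock = &prv->lock;

    return spc;
}

rt_free_pdata() would then simply xfree() what it is handed.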

> +}
> +
> +static void
> +rt_free_pdata(const struct scheduler *ops, void *pcpu, int cpu)
> +{
> +    struct rt_private * prv = RT_PRIV(ops);
> +    cpumask_clear_cpu(cpu, &prv->cpus);
> +    printtime();
> +    printk("%s cpu=%d\n", __FUNCTION__, cpu);
> +}
> +
> +static void *
> +rt_alloc_domdata(const struct scheduler *ops, struct domain *dom)
> +{
> +    unsigned long flags;
> +    struct rt_dom *sdom;
> +    struct rt_private * prv = RT_PRIV(ops);
> +
> +    printtime();
> +    printk("dom=%d\n", dom->domain_id);
> +
> +    sdom = xzalloc(struct rt_dom);
> +    if ( sdom == NULL ) {
> +        printk("%s, xzalloc failed\n", __func__);
> +        return NULL;
> +    }
> +
> +    INIT_LIST_HEAD(&sdom->vcpu);
> +    INIT_LIST_HEAD(&sdom->sdom_elem);
> +    sdom->dom = dom;
> +
> +    /* spinlock here to insert the dom */
> +    spin_lock_irqsave(&prv->lock, flags);
> +    list_add_tail(&sdom->sdom_elem, &(prv->sdom));
> +    spin_unlock_irqrestore(&prv->lock, flags);
> +
> +    return (void *)sdom;
> +}
> +
> +static void
> +rt_free_domdata(const struct scheduler *ops, void *data)
> +{
> +    unsigned long flags;
> +    struct rt_dom *sdom = data;
> +    struct rt_private * prv = RT_PRIV(ops);

Please be consistent with *s in pointers declarations.  Prevailing style
is lacking a space.

> +
> +    printtime();
> +    printk("dom=%d\n", sdom->dom->domain_id);
> +
> +    spin_lock_irqsave(&prv->lock, flags);
> +    list_del_init(&sdom->sdom_elem);
> +    spin_unlock_irqrestore(&prv->lock, flags);

Why irqsave ?

> +    xfree(data);
> +}
> +
> +static int
> +rt_dom_init(const struct scheduler *ops, struct domain *dom)
> +{
> +    struct rt_dom *sdom;
> +
> +    printtime();
> +    printk("dom=%d\n", dom->domain_id);
> +
> +    /* IDLE Domain does not link on rt_private */
> +    if ( is_idle_domain(dom) ) { return 0; }

Coding style - extraneous braces and statement on the same line as the if.

> +
> +    sdom = rt_alloc_domdata(ops, dom);
> +    if ( sdom == NULL ) {
> +        printk("%s, failed\n", __func__);
> +        return -ENOMEM;
> +    }
> +    dom->sched_priv = sdom;
> +
> +    return 0;
> +}
> +
> +static void
> +rt_dom_destroy(const struct scheduler *ops, struct domain *dom)
> +{
> +    printtime();
> +    printk("dom=%d\n", dom->domain_id);
> +
> +    rt_free_domdata(ops, RT_DOM(dom));
> +}
> +
> +static void *
> +rt_alloc_vdata(const struct scheduler *ops, struct vcpu *vc, void *dd)
> +{
> +    struct rt_vcpu *svc;
> +    s_time_t now = NOW();
> +    long count;
> +
> +    /* Allocate per-VCPU info */
> +    svc = xzalloc(struct rt_vcpu);
> +    if ( svc == NULL ) {
> +        printk("%s, xzalloc failed\n", __func__);
> +        return NULL;
> +    }
> +
> +    INIT_LIST_HEAD(&svc->runq_elem);
> +    INIT_LIST_HEAD(&svc->sdom_elem);
> +    svc->flags = 0U;
> +    svc->sdom = dd;
> +    svc->vcpu = vc;
> +    svc->last_start = 0;            /* init last_start is 0 */
> +
> +    svc->period = RT_DEFAULT_PERIOD;
> +    if ( !is_idle_vcpu(vc) && vc->domain->domain_id != 0 ) {
> +        svc->budget = RT_DEFAULT_BUDGET;
> +    } else {
> +        svc->budget = RT_DEFAULT_PERIOD; /* give vcpus of dom0 100% utilization */
> +    }
> +
> +    count = (now/MILLISECS(svc->period)) + 1;

Coding style - spaces around binary operators.

> +    /* sync all VCPU's start time to 0 */
> +    svc->cur_deadline += count*MILLISECS(svc->period);
> +
> +    svc->cur_budget = svc->budget*1000; /* counting in microseconds level */
> +    /* Debug only: dump new vcpu's info */
> +    printtime();
> +    rt_dump_vcpu(svc);
> +
> +    return svc;
> +}
> +
> +static void
> +rt_free_vdata(const struct scheduler *ops, void *priv)
> +{
> +    struct rt_vcpu *svc = priv;
> +
> +    /* Debug only: dump freed vcpu's info */
> +    printtime();
> +    rt_dump_vcpu(svc);
> +    xfree(svc);
> +}
> +
> +static void
> +rt_vcpu_insert(const struct scheduler *ops, struct vcpu *vc)
> +{
> +    struct rt_vcpu *svc = RT_VCPU(vc);
> +
> +    /* Debug only: dump info of vcpu to insert */
> +    printtime();
> +    rt_dump_vcpu(svc);
> +
> +    /* IDLE VCPU not allowed on RunQ */
> +    if ( is_idle_vcpu(vc) )
> +        return;
> +
> +    list_add_tail(&svc->sdom_elem, &svc->sdom->vcpu);   /* add to dom vcpu list */
> +}
> +
> +static void
> +rt_vcpu_remove(const struct scheduler *ops, struct vcpu *vc)
> +{
> +    struct rt_vcpu * const svc = RT_VCPU(vc);
> +    struct rt_dom * const sdom = svc->sdom;

Why are the pointers themselves const?

> +
> +    printtime();
> +    rt_dump_vcpu(svc);
> +
> +    BUG_ON( sdom == NULL );
> +    BUG_ON( __vcpu_on_runq(svc) );
> +
> +    if ( !is_idle_vcpu(vc) ) {
> +        list_del_init(&svc->sdom_elem);
> +    }
> +}
> +
> +/* 
> + * Pick a valid CPU for the vcpu vc
> + * A valid CPU of a vcpu is the intersection of the vcpu's affinity and the available cpus
> + */
> +static int
> +rt_cpu_pick(const struct scheduler *ops, struct vcpu *vc)
> +{
> +    cpumask_t cpus;
> +    cpumask_t *online;
> +    int cpu;
> +    struct rt_private * prv = RT_PRIV(ops);
> +
> +    online = cpupool_scheduler_cpumask(vc->domain->cpupool);
> +    cpumask_and(&cpus, &prv->cpus, online);
> +    cpumask_and(&cpus, &cpus, vc->cpu_hard_affinity);
> +
> +    cpu = cpumask_test_cpu(vc->processor, &cpus)
> +            ? vc->processor 
> +            : cpumask_cycle(vc->processor, &cpus);
> +    ASSERT( !cpumask_empty(&cpus) && cpumask_test_cpu(cpu, &cpus) );
> +
> +    return cpu;
> +}
> +
> +/*
> + * Implemented as deferrable server. 
> + * Different server mechanism has different implementation.
> + * burn budget at microsecond level. 
> + */
> +static void
> +burn_budgets(const struct scheduler *ops, struct rt_vcpu *svc, s_time_t now) {
> +    s_time_t delta;
> +    unsigned int consume;
> +    long count = 0;
> +
> +    /* first time called for this svc, update last_start */
> +    if ( svc->last_start == 0 ) {
> +        svc->last_start = now;
> +        return;
> +    }
> +
> +    /* don't burn budget for idle VCPU */
> +    if ( is_idle_vcpu(svc->vcpu) ) {
> +        return;
> +    }
> +
> +    /* don't burn budget for Domain-0, RT-Xen use only */
> +    if ( svc->sdom->dom->domain_id == 0 ) {
> +        return;
> +    }
> +
> +    /* update deadline info */
> +    delta = now - svc->cur_deadline;
> +    if ( delta >= 0 ) {
> +        count = ( delta/MILLISECS(svc->period) ) + 1;
> +        svc->cur_deadline += count * MILLISECS(svc->period);
> +        svc->cur_budget = svc->budget * 1000;
> +        return;
> +    }
> +
> +    delta = now - svc->last_start;
> +    if ( delta < 0 ) {
> +        printk("%s, delta = %ld for ", __func__, delta);
> +        rt_dump_vcpu(svc);
> +        svc->last_start = now;  /* update last_start */
> +        svc->cur_budget = 0;
> +        return;
> +    }
> +
> +    if ( svc->cur_budget == 0 ) return;
> +
> +    /* burn at microseconds level */
> +    consume = ( delta/MICROSECS(1) );
> +    if ( delta%MICROSECS(1) > MICROSECS(1)/2 ) consume++;
> +
> +    svc->cur_budget -= consume;
> +    if ( svc->cur_budget < 0 ) svc->cur_budget = 0;
> +}
> +
> +/* 
> + * RunQ is sorted. Pick first one within cpumask. If no one, return NULL
> + * lock is grabbed before calling this function 
> + */
> +static struct rt_vcpu *
> +__runq_pick(const struct scheduler *ops, cpumask_t mask)
> +{
> +    struct list_head *runq = RUNQ(ops);
> +    struct list_head *iter;
> +    struct rt_vcpu *svc = NULL;
> +    struct rt_vcpu *iter_svc = NULL;
> +    cpumask_t cpu_common;
> +    cpumask_t *online;
> +    struct rt_private * prv = RT_PRIV(ops);
> +
> +    list_for_each(iter, runq) {
> +        iter_svc = __runq_elem(iter);
> +
> +        /* mask is intersection of cpu_hard_affinity and cpupool and priv->cpus */
> +        online = cpupool_scheduler_cpumask(iter_svc->vcpu->domain->cpupool);
> +        cpumask_and(&cpu_common, online, &prv->cpus);
> +        cpumask_and(&cpu_common, &cpu_common, iter_svc->vcpu->cpu_hard_affinity);
> +        cpumask_and(&cpu_common, &mask, &cpu_common);
> +        if ( cpumask_empty(&cpu_common) )
> +            continue;
> +
> +        if ( iter_svc->cur_budget <= 0 )
> +            continue;
> +
> +        svc = iter_svc;
> +        break;
> +    }
> +
> +    return svc;
> +}
> +
> +/*
> + * Update vcpu's budget and sort the runq by inserting the modified vcpu back into the runq
> + * lock is grabbed before calling this function 
> + */
> +static void
> +__repl_update(const struct scheduler *ops, s_time_t now)
> +{
> +    struct list_head *runq = RUNQ(ops);
> +    struct list_head *iter;
> +    struct list_head *tmp;
> +    struct rt_vcpu *svc = NULL;
> +
> +    s_time_t diff;
> +    long count;
> +
> +    list_for_each_safe(iter, tmp, runq) {
> +        svc = __runq_elem(iter);
> +
> +        diff = now - svc->cur_deadline;
> +        if ( diff > 0 ) {
> +            count = (diff/MILLISECS(svc->period)) + 1;
> +            svc->cur_deadline += count * MILLISECS(svc->period);
> +            svc->cur_budget = svc->budget * 1000;
> +            __runq_remove(svc);
> +            __runq_insert(ops, svc);
> +        }
> +    }
> +}
> +
> +/* 
> + * schedule function for rt scheduler.
> + * The lock is already grabbed in schedule.c, no need to lock here 
> + */
> +static struct task_slice
> +rt_schedule(const struct scheduler *ops, s_time_t now, bool_t tasklet_work_scheduled)
> +{
> +    const int cpu = smp_processor_id();
> +    struct rt_private * prv = RT_PRIV(ops);
> +    struct rt_vcpu * const scurr = RT_VCPU(current);
> +    struct rt_vcpu * snext = NULL;
> +    struct task_slice ret;
> +
> +    /* clear ticked bit now that we've been scheduled */
> +    if ( cpumask_test_cpu(cpu, &prv->tickled) )
> +        cpumask_clear_cpu(cpu, &prv->tickled);
> +
> +    /* burn_budget would return for IDLE VCPU */
> +    burn_budgets(ops, scurr, now);
> +
> +    __repl_update(ops, now);
> +
> +    if ( tasklet_work_scheduled ) {
> +        snext = RT_VCPU(idle_vcpu[cpu]);
> +    } else {
> +        cpumask_t cur_cpu;
> +        cpumask_clear(&cur_cpu);
> +        cpumask_set_cpu(cpu, &cur_cpu);
> +        snext = __runq_pick(ops, cur_cpu);
> +        if ( snext == NULL )
> +            snext = RT_VCPU(idle_vcpu[cpu]);
> +
> +        /* if scurr has higher priority and budget, still pick scurr */
> +        if ( !is_idle_vcpu(current) &&
> +             vcpu_runnable(current) &&
> +             scurr->cur_budget > 0 &&
> +             ( is_idle_vcpu(snext->vcpu) ||
> +               scurr->cur_deadline <= snext->cur_deadline ) ) {
> +            snext = scurr;
> +        }
> +    }
> +
> +    if ( snext != scurr &&
> +         !is_idle_vcpu(current) &&
> +         vcpu_runnable(current) ) {
> +        set_bit(__RT_delayed_runq_add, &scurr->flags);
> +    }
> +
> +    snext->last_start = now;
> +    ret.migrated = 0;
> +    if ( !is_idle_vcpu(snext->vcpu) ) {
> +        if ( snext != scurr ) {
> +            __runq_remove(snext);
> +            set_bit(__RT_scheduled, &snext->flags);
> +        }
> +        if ( snext->vcpu->processor != cpu ) {
> +            snext->vcpu->processor = cpu;
> +            ret.migrated = 1;
> +        }
> +    }
> +
> +    ret.time = MILLISECS(1);
> +    ret.task = snext->vcpu;
> +
> +    return ret;
> +}
> +
> +/*
> + * Remove VCPU from RunQ
> + * The lock is already grabbed in schedule.c, no need to lock here 
> + */
> +static void
> +rt_vcpu_sleep(const struct scheduler *ops, struct vcpu *vc)
> +{
> +    struct rt_vcpu * const svc = RT_VCPU(vc);
> +
> +    BUG_ON( is_idle_vcpu(vc) );
> +
> +    if ( curr_on_cpu(vc->processor) == vc ) {
> +        cpu_raise_softirq(vc->processor, SCHEDULE_SOFTIRQ);
> +        return;
> +    }
> +
> +    if ( __vcpu_on_runq(svc) ) {
> +        __runq_remove(svc);
> +    }
> +
> +    clear_bit(__RT_delayed_runq_add, &svc->flags);
> +}
> +
> +/*
> + * Pick a vcpu on a cpu to kick out to place the running candidate
> + * Called by wake() and context_saved()
> + * We have a running candidate here, the kick logic is:
> + * Among all the cpus that are within the cpu affinity
> + * 1) if the new->cpu is idle, kick it. This could benefit cache hit
> + * 2) if there are any idle vcpu, kick it.
> + * 3) now all pcpus are busy, among all the running vcpus, pick lowest priority one
> + *    if snext has higher priority, kick it.
> + *
> + * TODO:
> + * 1) what if these two vcpus belongs to the same domain?
> + *    replace a vcpu belonging to the same domain introduces more overhead
> + *
> + * lock is grabbed before calling this function 
> + */
> +static void
> +runq_tickle(const struct scheduler *ops, struct rt_vcpu *new)
> +{
> +    struct rt_private * prv = RT_PRIV(ops);
> +    struct rt_vcpu * scheduled = NULL;    /* lowest priority scheduled */
> +    struct rt_vcpu * iter_svc;
> +    struct vcpu * iter_vc;
> +    int cpu = 0;
> +    cpumask_t not_tickled;                      /* not tickled cpus */
> +    cpumask_t *online;
> +
> +    if ( new == NULL || is_idle_vcpu(new->vcpu) ) return;
> +
> +    online = cpupool_scheduler_cpumask(new->vcpu->domain->cpupool);
> +    cpumask_and(&not_tickled, online, &prv->cpus);
> +    cpumask_and(&not_tickled, &not_tickled, new->vcpu->cpu_hard_affinity);
> +    cpumask_andnot(&not_tickled, &not_tickled, &prv->tickled);
> +
> +    /* 1) if new's previous cpu is idle, kick it for cache benefit */
> +    if ( is_idle_vcpu(curr_on_cpu(new->vcpu->processor)) ) {
> +        cpumask_set_cpu(new->vcpu->processor, &prv->tickled);
> +        cpu_raise_softirq(new->vcpu->processor, SCHEDULE_SOFTIRQ);
> +        return;
> +    }
> +
> +    /* 2) if there are any idle pcpu, kick it */
> +    /* The same loop also find the one with lowest priority */
> +    for_each_cpu(cpu, &not_tickled) {
> +        iter_vc = curr_on_cpu(cpu);
> +        if ( is_idle_vcpu(iter_vc) ) {
> +            cpumask_set_cpu(cpu, &prv->tickled);
> +            cpu_raise_softirq(cpu, SCHEDULE_SOFTIRQ);
> +            return;
> +        }
> +        iter_svc = RT_VCPU(iter_vc);
> +        if ( scheduled == NULL || 
> +             iter_svc->cur_deadline > scheduled->cur_deadline ) {
> +            scheduled = iter_svc;
> +        }
> +    }
> +
> +    /* 3) candidate has higher priority, kick out the lowest-priority vcpu */
> +    if ( scheduled != NULL && new->cur_deadline < scheduled->cur_deadline ) {
> +        cpumask_set_cpu(scheduled->vcpu->processor, &prv->tickled);
> +        cpu_raise_softirq(scheduled->vcpu->processor, SCHEDULE_SOFTIRQ);
> +    }
> +    return;
> +}
> +
> +/* 
> + * Should always wake up runnable vcpu, put it back to RunQ. 
> + * Check priority to raise interrupt 
> + * The lock is already grabbed in schedule.c, no need to lock here 
> + * TODO: what if these two vcpus belongs to the same domain? 
> + */
> +static void
> +rt_vcpu_wake(const struct scheduler *ops, struct vcpu *vc)
> +{
> +    struct rt_vcpu * const svc = RT_VCPU(vc);
> +    s_time_t diff;
> +    s_time_t now = NOW();
> +    long count = 0;
> +    struct rt_private * prv = RT_PRIV(ops);
> +    struct rt_vcpu * snext = NULL;        /* highest priority on RunQ */
> +
> +    BUG_ON( is_idle_vcpu(vc) );
> +
> +    if ( unlikely(curr_on_cpu(vc->processor) == vc) ) return;
> +
> +    /* on RunQ, just update info is ok */
> +    if ( unlikely(__vcpu_on_runq(svc)) ) return;
> +
> +    /* If context hasn't been saved yet, set flag so it will add later */
> +    if ( unlikely(test_bit(__RT_scheduled, &svc->flags)) ) {
> +        set_bit(__RT_delayed_runq_add, &svc->flags);
> +        return;
> +    }
> +
> +    /* update deadline info */
> +    diff = now - svc->cur_deadline;
> +    if ( diff >= 0 ) {
> +        count = ( diff/MILLISECS(svc->period) ) + 1;
> +        svc->cur_deadline += count * MILLISECS(svc->period);
> +        svc->cur_budget = svc->budget * 1000;
> +    }
> +
> +    __runq_insert(ops, svc);
> +    __repl_update(ops, now);
> +    snext = __runq_pick(ops, prv->cpus);    /* pick snext from ALL valid cpus */
> +    runq_tickle(ops, snext);
> +
> +    return;
> +}
> +
> +/* 
> + * scurr has finished context switch, insert it back to the RunQ,
> + * and then pick the highest priority vcpu from runq to run 
> + */
> +static void
> +rt_context_saved(const struct scheduler *ops, struct vcpu *vc)
> +{
> +    struct rt_vcpu * svc = RT_VCPU(vc);
> +    struct rt_vcpu * snext = NULL;
> +    struct rt_private * prv = RT_PRIV(ops);
> +    spinlock_t *lock;
> +
> +    clear_bit(__RT_scheduled, &svc->flags);
> +    if ( is_idle_vcpu(vc) ) return;
> +
> +    lock = vcpu_schedule_lock_irq(vc);
> +    if ( test_and_clear_bit(__RT_delayed_runq_add, &svc->flags) && 
> +         likely(vcpu_runnable(vc)) ) {
> +        __runq_insert(ops, svc);
> +        __repl_update(ops, NOW());
> +        snext = __runq_pick(ops, prv->cpus);    /* pick snext from ALL cpus */
> +        runq_tickle(ops, snext);
> +    }
> +    vcpu_schedule_unlock_irq(lock, vc);
> +}
> +
> +/*
> + * set/get each vcpu info of each domain
> + */
> +static int
> +rt_dom_cntl(const struct scheduler *ops, struct domain *d, struct xen_domctl_scheduler_op *op)
> +{
> +    xen_domctl_sched_rt_params_t local_sched;
> +    struct rt_dom * const sdom = RT_DOM(d);
> +    struct list_head *iter;
> +    int vcpu_index = 0;
> +    int rc = -EINVAL;
> +
> +    switch ( op->cmd )
> +    {
> +    case XEN_DOMCTL_SCHEDOP_getinfo:
> +        /* for debug use, whenever adjust Dom0 parameter, do global dump */
> +        if ( d->domain_id == 0 ) {
> +            rt_dump(ops);
> +        }
> +
> +        local_sched.num_vcpus = 0;
> +        list_for_each( iter, &sdom->vcpu ) {
> +            struct rt_vcpu * svc = list_entry(iter, struct rt_vcpu, sdom_elem);
> +
> +            ASSERT(vcpu_index < XEN_LEGACY_MAX_VCPUS);
> +            local_sched.vcpus[vcpu_index].budget = svc->budget;
> +            local_sched.vcpus[vcpu_index].period = svc->period;
> +
> +            vcpu_index++;
> +        }
> +        local_sched.num_vcpus = vcpu_index;
> +        copy_to_guest(op->u.rt.schedule, &local_sched, 1);
> +        rc = 0;
> +        break;
> +    case XEN_DOMCTL_SCHEDOP_putinfo:
> +        copy_from_guest(&local_sched, op->u.rt.schedule, 1);
> +        list_for_each( iter, &sdom->vcpu ) {
> +            struct rt_vcpu * svc = list_entry(iter, struct rt_vcpu, sdom_elem);
> +
> +            if ( local_sched.vcpu_index == svc->vcpu->vcpu_id ) { /* adjust per VCPU parameter */
> +                vcpu_index = local_sched.vcpu_index;
> +
> +                if ( vcpu_index < 0 || vcpu_index > XEN_LEGACY_MAX_VCPUS) {
> +                    printk("XEN_DOMCTL_SCHEDOP_putinfo: vcpu_index=%d\n",
> +                            vcpu_index);
> +                }else{
> +                    printk("XEN_DOMCTL_SCHEDOP_putinfo: vcpu_index=%d, period=%"PRId64", budget=%"PRId64"\n",
> +                            vcpu_index, local_sched.vcpus[vcpu_index].period, 
> +                            local_sched.vcpus[vcpu_index].budget);
> +                }
> +
> +                svc->period = local_sched.vcpus[vcpu_index].period;
> +                svc->budget = local_sched.vcpus[vcpu_index].budget;
> +
> +                break;
> +            }
> +        }
> +        rc = 0;
> +        break;
> +    }
> +
> +    return rc;
> +}
> +
> +static struct rt_private _rt_priv;
> +
> +const struct scheduler sched_rt_def = {
> +    .name           = "SMP RT Scheduler",
> +    .opt_name       = "rt",
> +    .sched_id       = XEN_SCHEDULER_RT,
> +    .sched_data     = &_rt_priv,
> +
> +    .dump_cpu_state = rt_dump_pcpu,
> +    .dump_settings  = rt_dump,
> +    .init           = rt_init,
> +    .deinit         = rt_deinit,
> +    .alloc_pdata    = rt_alloc_pdata,
> +    .free_pdata     = rt_free_pdata,
> +    .alloc_domdata  = rt_alloc_domdata,
> +    .free_domdata   = rt_free_domdata,
> +    .init_domain    = rt_dom_init,
> +    .destroy_domain = rt_dom_destroy,
> +    .alloc_vdata    = rt_alloc_vdata,
> +    .free_vdata     = rt_free_vdata,
> +    .insert_vcpu    = rt_vcpu_insert,
> +    .remove_vcpu    = rt_vcpu_remove,
> +
> +    .adjust         = rt_dom_cntl,
> +
> +    .pick_cpu       = rt_cpu_pick,
> +    .do_schedule    = rt_schedule,
> +    .sleep          = rt_vcpu_sleep,
> +    .wake           = rt_vcpu_wake,
> +    .context_saved  = rt_context_saved,
> +
> +    .yield          = NULL,
> +    .migrate        = NULL,
> +};
> diff --git a/xen/common/schedule.c b/xen/common/schedule.c
> index e9eb0bc..2d13966 100644
> --- a/xen/common/schedule.c
> +++ b/xen/common/schedule.c
> @@ -68,6 +68,7 @@ static const struct scheduler *schedulers[] = {
>      &sched_sedf_def,
>      &sched_credit_def,
>      &sched_credit2_def,
> +    &sched_rt_def,
>      &sched_arinc653_def,
>  };
>  
> diff --git a/xen/include/public/domctl.h b/xen/include/public/domctl.h
> index 5b11bbf..d1a8201 100644
> --- a/xen/include/public/domctl.h
> +++ b/xen/include/public/domctl.h
> @@ -339,6 +339,20 @@ struct xen_domctl_max_vcpus {
>  typedef struct xen_domctl_max_vcpus xen_domctl_max_vcpus_t;
>  DEFINE_XEN_GUEST_HANDLE(xen_domctl_max_vcpus_t);
>  
> +/*
> + * This structure is used to pass to rt scheduler from a 
> + * privileged domain to Xen
> + */
> +struct xen_domctl_sched_rt_params {
> +    struct {
> +        signed long period; /* s_time_t type */
> +        signed long budget; 

No implicitly-width'd types in the public headers.  int64_t's should do.

> +    } vcpus[XEN_LEGACY_MAX_VCPUS];

This is a limit which should be avoided in newly-added public API, as it
will force a change at some point in the future.

> +    uint16_t num_vcpus;
> +    uint16_t vcpu_index;

Can you explain what the purpose of these are?

As a gut feel, I think that these two ints should be part of the struct
xen_domctl_sched_rt below, with a guest handle pointing to an array of {
period, budget } structures, which avoids the LEGACY_MAX limit.
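
Roughly something like this (only a sketch; the names are invented on
the spot):

struct xen_domctl_sched_rt_vcpu_params {
    int64_t period;
    int64_t budget;
};
typedef struct xen_domctl_sched_rt_vcpu_params xen_domctl_sched_rt_vcpu_params_t;
DEFINE_XEN_GUEST_HANDLE(xen_domctl_sched_rt_vcpu_params_t);

struct xen_domctl_sched_rt {
    XEN_GUEST_HANDLE_64(xen_domctl_sched_rt_vcpu_params_t) vcpus;
    uint16_t num_vcpus;
    uint16_t vcpu_index;
};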

> +};
> +typedef struct xen_domctl_sched_rt_params xen_domctl_sched_rt_params_t;
> +DEFINE_XEN_GUEST_HANDLE(xen_domctl_sched_rt_params_t);
>  
>  /* XEN_DOMCTL_scheduler_op */
>  /* Scheduler types. */
> @@ -346,6 +360,8 @@ DEFINE_XEN_GUEST_HANDLE(xen_domctl_max_vcpus_t);
>  #define XEN_SCHEDULER_CREDIT   5
>  #define XEN_SCHEDULER_CREDIT2  6
>  #define XEN_SCHEDULER_ARINC653 7
> +#define XEN_SCHEDULER_RT       8
> +
>  /* Set or get info? */
>  #define XEN_DOMCTL_SCHEDOP_putinfo 0
>  #define XEN_DOMCTL_SCHEDOP_getinfo 1
> @@ -367,6 +383,9 @@ struct xen_domctl_scheduler_op {
>          struct xen_domctl_sched_credit2 {
>              uint16_t weight;
>          } credit2;
> +        struct xen_domctl_sched_rt{
> +            XEN_GUEST_HANDLE_64(xen_domctl_sched_rt_params_t) schedule;
> +        } rt;

Thinking more generally, is rt an appropriate name here?  It seems a
little generic, as there are several classes of realtime schedulers
employing different semantics and parameters.

>      } u;
>  };
>  typedef struct xen_domctl_scheduler_op xen_domctl_scheduler_op_t;
> diff --git a/xen/include/xen/sched-if.h b/xen/include/xen/sched-if.h
> index 4164dff..e452d32 100644
> --- a/xen/include/xen/sched-if.h
> +++ b/xen/include/xen/sched-if.h
> @@ -169,7 +169,7 @@ extern const struct scheduler sched_sedf_def;
>  extern const struct scheduler sched_credit_def;
>  extern const struct scheduler sched_credit2_def;
>  extern const struct scheduler sched_arinc653_def;
> -
> +extern const struct scheduler sched_rt_def;

If you keep the double newline, this diff gets smaller.

~Andrew

>  
>  struct cpupool
>  {

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [PATCH RFC v1 4/4] libxc for rt scheduler
  2014-07-11  4:49 ` [PATCH RFC v1 4/4] libxc " Meng Xu
@ 2014-07-11 14:49   ` Dario Faggioli
  2014-07-11 16:23     ` Meng Xu
  2014-07-17 15:29     ` Ian Campbell
  0 siblings, 2 replies; 50+ messages in thread
From: Dario Faggioli @ 2014-07-11 14:49 UTC (permalink / raw)
  To: Meng Xu
  Cc: ian.campbell, xisisu, stefano.stabellini, george.dunlap,
	ian.jackson, xen-devel, xumengpanda, lichong659, dgolomb



On ven, 2014-07-11 at 00:49 -0400, Meng Xu wrote:
> Add xc_sched_rt_* functions to interact with Xen to set/get domain's

> --- /dev/null
> +++ b/tools/libxc/xc_rt.c

> +#include "xc_private.h"
> +
> +int
> +xc_sched_rt_domain_set(
> +    xc_interface *xch,
> +    uint32_t domid,
> +    struct xen_domctl_sched_rt_params *sdom)
>
So, both here...

> +{
> +    int rc;
> +    DECLARE_DOMCTL;
> +    DECLARE_HYPERCALL_BOUNCE(sdom, 
> +        sizeof(*sdom), 
> +        XC_HYPERCALL_BUFFER_BOUNCE_IN);
> +
... and here... I know libxc is not terribly consistent when it comes to
coding style. Still, I like stuff to be aligned to the opening brace of
the function/macro.

Have a look at xc_domain.c, e.g., 

...
int xc_vcpu_setaffinity(xc_interface *xch,
                        uint32_t domid,
                        int vcpu,
                        xc_cpumap_t cpumap_hard_inout,
                        xc_cpumap_t cpumap_soft_inout,
                        uint32_t flags)
{
    DECLARE_DOMCTL;
    DECLARE_HYPERCALL_BOUNCE(cpumap_hard_inout, 0,
                             XC_HYPERCALL_BUFFER_BOUNCE_BOTH);
    DECLARE_HYPERCALL_BOUNCE(cpumap_soft_inout, 0,
                             XC_HYPERCALL_BUFFER_BOUNCE_BOTH);
...

> +    if ( xc_hypercall_bounce_pre(xch, sdom) )
> +        return -1;
> +
> +    domctl.cmd = XEN_DOMCTL_scheduler_op;
> +    domctl.domain = (domid_t) domid;
> +    domctl.u.scheduler_op.sched_id = XEN_SCHEDULER_RT;
> +    domctl.u.scheduler_op.cmd = XEN_DOMCTL_SCHEDOP_putinfo;
> +    set_xen_guest_handle(domctl.u.scheduler_op.u.rt.schedule, sdom);
> +
> +    rc = do_domctl(xch, &domctl);
> +
> +    xc_hypercall_bounce_post(xch, sdom);
> +
So, the bouncing logic seems fine. Looking at what other schedulers do,
there should be no particular need for bouncing anything.

xen/include/public/domctl.h:

struct xen_domctl_scheduler_op {
    uint32_t sched_id;  /* XEN_SCHEDULER_* */
    uint32_t cmd;       /* XEN_DOMCTL_SCHEDOP_* */
    union {
        struct xen_domctl_sched_sedf {
            uint64_aligned_t period;
            uint64_aligned_t slice;
            uint64_aligned_t latency;
            uint32_t extratime;
            uint32_t weight;
        } sedf;
        struct xen_domctl_sched_credit {
            uint16_t weight;
            uint16_t cap;
        } credit;
        struct xen_domctl_sched_credit2 {
            uint16_t weight;
        } credit2;
    } u;
};

I see you have a TODO item related to this interface, so I guess I'm
leaving this alone.

My view here is, since per-VCPU scheduling parameters are important for
this scheduler, you either:
 - provide an interface for getting and setting the parameters of _one_
   _VCPU_, and the call them repeatedly, if for instance you need to act
   on a domain. In this case, no array and no bouncing should be
   necessary.
 - provide both the above, and another interface, that one with an array
   as big as the number of VCPUs of the domain, and use it to provide 
   the hypervisor with the parameters of all the VCPUs, and act 
   appropriately inside Xen.

Personally, I'd start with the former, and add the latter later, if we
deem it necessary.
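
For the former, no bouncing is needed at all on the libxc side; the
wrapper boils down to something like this (sketch only, and of course
the u.rt field names have to match whatever the hypervisor interface
ends up looking like):

int xc_sched_rt_vcpu_set(xc_interface *xch, uint32_t domid,
                         uint16_t vcpuid, uint64_t period,
                         uint64_t budget)
{
    DECLARE_DOMCTL;

    domctl.cmd = XEN_DOMCTL_scheduler_op;
    domctl.domain = (domid_t)domid;
    domctl.u.scheduler_op.sched_id = XEN_SCHEDULER_RT;
    domctl.u.scheduler_op.cmd = XEN_DOMCTL_SCHEDOP_putinfo;
    domctl.u.scheduler_op.u.rt.vcpuid = vcpuid;
    domctl.u.scheduler_op.u.rt.period = period;
    domctl.u.scheduler_op.u.rt.budget = budget;

    return do_domctl(xch, &domctl);
}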

Regards,
Dario

-- 
<<This happens because I choose it to happen!>> (Raistlin Majere)
-----------------------------------------------------------------
Dario Faggioli, Ph.D, http://about.me/dario.faggioli
Senior Software Engineer, Citrix Systems R&D Ltd., Cambridge (UK)



^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [PATCH RFC v1 2/4] xl for rt scheduler
  2014-07-11 11:02   ` Wei Liu
@ 2014-07-11 14:59     ` Meng Xu
  2014-07-11 15:07       ` Dario Faggioli
  0 siblings, 1 reply; 50+ messages in thread
From: Meng Xu @ 2014-07-11 14:59 UTC (permalink / raw)
  To: Wei Liu
  Cc: Ian Campbell, Sisu Xi, Stefano Stabellini, George Dunlap,
	Dario Faggioli, Ian Jackson, xen-devel, Meng Xu, Chong Li,
	Dagaen Golomb



Hi Wei,

First, thank you very much for your comments!


> >  =head1 CPUPOOLS COMMANDS
> > diff --git a/tools/libxl/xl.h b/tools/libxl/xl.h
> > index 10a2e66..51b634a 100644
> > --- a/tools/libxl/xl.h
> > +++ b/tools/libxl/xl.h
> > @@ -67,6 +67,7 @@ int main_memset(int argc, char **argv);
> >  int main_sched_credit(int argc, char **argv);
> >  int main_sched_credit2(int argc, char **argv);
> >  int main_sched_sedf(int argc, char **argv);
> > +int main_sched_rt(int argc, char **argv);
> >  int main_domid(int argc, char **argv);
> >  int main_domname(int argc, char **argv);
> >  int main_rename(int argc, char **argv);
> > diff --git a/tools/libxl/xl_cmdimpl.c b/tools/libxl/xl_cmdimpl.c
> > index 68df548..c043f88 100644
> > --- a/tools/libxl/xl_cmdimpl.c
> > +++ b/tools/libxl/xl_cmdimpl.c
> > @@ -4947,6 +4947,7 @@ int main_sharing(int argc, char **argv)
> >      return 0;
> >  }
> >
> > +
>
> Stray blank line.
>

​I will delete the blank line.
​

>
> >  static int sched_domain_get(libxl_scheduler sched, int domid,
> >                              libxl_domain_sched_params *scinfo)
> >  {
> > @@ -5098,6 +5099,52 @@ static int sched_sedf_domain_output(
> >      return 0;
> >  }
> >
> > +
> > +static int sched_rt_domain_output(
> > +    int domid)
> > +{
> > +    char *domname;
> > +    libxl_domain_sched_params scinfo;
> > +    int rc, i;
> > +
> > +    if (domid < 0) {
> > +        printf("%-33s %4s %4s %6s %6s\n", "Name", "ID", "VCPU",
> "Period", "Budget");
> > +        return 0;
> > +    }
> > +
>
> Just print the header and nothing more?
>

​We just print out the header here. This function will call
​sched_domain_get(LIBXL_SCHEDULER_RT, domid, &scinfo) and print out the
VCPU parameters of each domain in the calling function.



>
> > +    libxl_domain_sched_params_init(&scinfo);
> > +    rc = sched_domain_get(LIBXL_SCHEDULER_RT, domid, &scinfo);
> > +    if (rc)
> > +        return rc;
> > +
>
> This is violating libxl type paradigm. See libxl.h.


>  263  * libxl types
>  264  *
>  265  * Most libxl types are defined by the libxl IDL (see
>  266  * libxl_types.idl). The library provides a common set of methods for
>  267  * initialising and freeing these types.
>  268  *
>  269  * IDL-generated libxl types should be used as follows: the user must
>  270  * always call the "init" function before using a type, even if the
>  271  * variable is simply being passed by reference as an out parameter
>  272  * to a libxl function.  The user must always calls "dispose" exactly
>  273  * once afterwards, to clean up, regardless of whether operations on
>  274  * this object succeeded or failed.  See the xl code for examples.
>
> And you should check other places that use libxl types as well.
>

​Thank you very much for pasting the rules here! I really appreciate it.
However, I didn't quite get why it violates the libxl type paradigm and how
I should correct it. (Sorry. :-()

We actually followed the way credit scheduler does in main_sched_credit(int
argc, char **argv)

        } else { /* set credit scheduler paramaters */
            libxl_domain_sched_params scinfo;
            libxl_domain_sched_params_init(&scinfo);
            scinfo.sched = LIBXL_SCHEDULER_CREDIT;
            if (opt_w)
                scinfo.weight = weight;
            if (opt_c)
                scinfo.cap = cap;
            rc = sched_domain_set(domid, &scinfo);
            libxl_domain_sched_params_dispose(&scinfo);

Could you let me know how I should modify it? I will then check and fix the
other places once I know how to do it.

Thank you very much!

Best,

Meng




-- 


-----------
Meng Xu
PhD Student in Computer and Information Science
University of Pennsylvania


^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [PATCH RFC v1 2/4] xl for rt scheduler
  2014-07-11 14:59     ` Meng Xu
@ 2014-07-11 15:07       ` Dario Faggioli
  2014-07-11 16:25         ` Meng Xu
  2014-07-13 12:58         ` Meng Xu
  0 siblings, 2 replies; 50+ messages in thread
From: Dario Faggioli @ 2014-07-11 15:07 UTC (permalink / raw)
  To: Meng Xu
  Cc: Wei Liu, Ian Campbell, Sisu Xi, Stefano Stabellini,
	George Dunlap, Ian Jackson, xen-devel, Meng Xu, Chong Li,
	Dagaen Golomb



On ven, 2014-07-11 at 10:59 -0400, Meng Xu wrote:


>         > +    libxl_domain_sched_params_init(&scinfo);
>         > +    rc = sched_domain_get(LIBXL_SCHEDULER_RT, domid,
>         &scinfo);
>         > +    if (rc)
>         > +        return rc;
>         > +

> ​Thank you very much for pasting the rules here! I really appreciate
> it. However, I didn't quite get why it violates the libxl type paradigm
> and how I should correct it. (Sorry. :-()​
> 
<<The user must always calls "dispose" exactly once afterwards, to clean
up, regardless of whether operations on this object succeeded or
failed>>

While, above, you're exiting, if rc is true, without calling dispose.

It depends a lot on the function, but what you usually do is group the
calls to the various dispose functions under a label (typically 'out:')
and goto there to exit.

Look around, both in xl and libxl, you'll find plenty of examples of
that.

> We actually followed the way credit scheduler does
> in main_sched_credit(int argc, char **argv)
> 
> 
>         } else { /* set credit scheduler paramaters */
>             libxl_domain_sched_params scinfo;
>             libxl_domain_sched_params_init(&scinfo);
>             scinfo.sched = LIBXL_SCHEDULER_CREDIT;
>             if (opt_w)
>                 scinfo.weight = weight;
>             if (opt_c)
>                 scinfo.cap = cap;
>             rc = sched_domain_set(domid, &scinfo);
>             libxl_domain_sched_params_dispose(&scinfo);
> 
And in fact, here's dispose! :-)

Dario

-- 
<<This happens because I choose it to happen!>> (Raistlin Majere)
-----------------------------------------------------------------
Dario Faggioli, Ph.D, http://about.me/dario.faggioli
Senior Software Engineer, Citrix Systems R&D Ltd., Cambridge (UK)



^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [PATCH RFC v1 3/4] libxl for rt scheduler
  2014-07-11  4:49 ` [PATCH RFC v1 3/4] libxl " Meng Xu
  2014-07-11 11:05   ` Wei Liu
@ 2014-07-11 15:08   ` Dario Faggioli
  2014-07-12 18:16     ` Meng Xu
  2014-07-17 15:34     ` Ian Campbell
  2014-07-17 15:36   ` Ian Campbell
  2 siblings, 2 replies; 50+ messages in thread
From: Dario Faggioli @ 2014-07-11 15:08 UTC (permalink / raw)
  To: Meng Xu
  Cc: ian.campbell, xisisu, stefano.stabellini, george.dunlap,
	ian.jackson, xen-devel, xumengpanda, lichong659, dgolomb



On ven, 2014-07-11 at 00:49 -0400, Meng Xu wrote:

> --- a/tools/libxl/libxl.c
> +++ b/tools/libxl/libxl.c

> +static int sched_rt_domain_get(libxl__gc *gc, uint32_t domid,
> +                               libxl_domain_sched_params *scinfo)
> +{
> +    struct xen_domctl_sched_rt_params sdom;
> +    int rc, i;
> +
> +    rc = xc_sched_rt_domain_get(CTX->xch, domid, &sdom);
> +    if (rc != 0) {
> +        LOGE(ERROR, "getting domain sched rt");
> +        return ERROR_FAIL;
> +    }
> +
> +    libxl_domain_sched_params_init(scinfo);
> +    scinfo->sched = LIBXL_SCHEDULER_RT;
> +    scinfo->rt.max_vcpus = LIBXL_XEN_LEGACY_MAX_VCPUS;
> +    /*TODO: alloc array with exact num of dom's vcpus; and use GCNEW_ARRAY()*/
>
Indeed. That, or just do an hypercall for each VCPU, at least as a first
step.

> +    scinfo->rt.vcpus = (libxl_vcpu *)
> +                    malloc( sizeof(libxl_vcpu) * scinfo->rt.max_vcpus );
>
Yes, you said it yourself, I see: GCNEW_ARRAY() and no malloc() in
libxl.

> +    if ( !scinfo->rt.vcpus ){
> +        LOGE(ERROR, "Allocate lib_vcpu array fails\n");
> +        return ERROR_INVAL;
> +    }
> +    scinfo->rt.num_vcpus = sdom.num_vcpus;
> +    for( i = 0; i < sdom.num_vcpus; ++i)
> +    {
> +        scinfo->rt.vcpus[i].period = sdom.vcpus[i].period;
> +        scinfo->rt.vcpus[i].budget = sdom.vcpus[i].budget;
> +    }
> +
> +    return 0;
> +}
> +
> +#define SCHED_RT_VCPU_PARAMS_MAX    4294967295 /* 2^32-1 */
> +
I don't like the name. I'd go for something like
SCHED_RT_VCPU_PERIOD_MAX and SCHED_RT_VCPU_BUDGET_MAX.

You well can define both to the same value, of course.

As per the value, yes, even if we switch the interface to usecs, 2^32
usecs is still ~4000 secs, which ought to be enough as a period or
budget for everyone! :-)
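
I.e., something like (just a sketch):

#define SCHED_RT_VCPU_PERIOD_MAX    4294967295U /* 2^32-1, in usec */
#define SCHED_RT_VCPU_BUDGET_MAX    SCHED_RT_VCPU_PERIOD_MAX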

For the rest of this file, I think I'll wait for v2, as the interface is
subject to change.

Regards,
Dario

-- 
<<This happens because I choose it to happen!>> (Raistlin Majere)
-----------------------------------------------------------------
Dario Faggioli, Ph.D, http://about.me/dario.faggioli
Senior Software Engineer, Citrix Systems R&D Ltd., Cambridge (UK)



^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [PATCH RFC v1 1/4] rt: Add rt scheduler to hypervisor
  2014-07-11 14:37   ` Andrew Cooper
@ 2014-07-11 15:21     ` Dario Faggioli
  2014-07-11 15:40       ` Andrew Cooper
  0 siblings, 1 reply; 50+ messages in thread
From: Dario Faggioli @ 2014-07-11 15:21 UTC (permalink / raw)
  To: Andrew Cooper
  Cc: ian.campbell, xisisu, stefano.stabellini, george.dunlap,
	ian.jackson, xen-devel, xumengpanda, Meng Xu, lichong659,
	dgolomb



On ven, 2014-07-11 at 15:37 +0100, Andrew Cooper wrote:
> On 11/07/14 05:49, Meng Xu wrote:

> > +/*
> > + * Virtual CPU
> > + */
> > +struct rt_vcpu {
> > +    struct list_head runq_elem; /* On the runqueue list */
> > +    struct list_head sdom_elem; /* On the domain VCPU list */
> > +
> > +    /* Up-pointers */
> > +    struct rt_dom *sdom;
> > +    struct vcpu *vcpu;
> > +
> > +    /* VCPU parameters, in milliseconds */
> > +    s_time_t period;
> > +    s_time_t budget;
> > +
> > +    /* VCPU current information */
> > +    long cur_budget;             /* current budget in microseconds */
> 
> Can budget be negative ?
> 
It depends on how you implement the core of the algorithm. The current SEDF
uses a cputime counter, which always increases, and checks whether the budget
has been overrun with something like (cputime >= budget).

Meng's implementation here does the opposite: it initializes cur_budget
with budget, and subtracts the actual execution time from there, until it
reaches zero or, in case of overrun, goes below zero.

So, yes, if we stick with this version of things (which is perfectly
fine), I think cur_budget should be allowed to go negative.

However, I'd use s_time_t for it, rather than long.
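
Roughly, what I mean is something like this (sketch):

    s_time_t delta = now - svc->last_start;

    svc->last_start = now;
    svc->cur_budget -= delta;
    /*
     * cur_budget <= 0 here means the budget is exhausted (and, if
     * negative, overrun) until the next replenishment.
     */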

> >  #define XEN_SCHEDULER_CREDIT   5
> >  #define XEN_SCHEDULER_CREDIT2  6
> >  #define XEN_SCHEDULER_ARINC653 7
> > +#define XEN_SCHEDULER_RT       8
> > +
> >  /* Set or get info? */
> >  #define XEN_DOMCTL_SCHEDOP_putinfo 0
> >  #define XEN_DOMCTL_SCHEDOP_getinfo 1
> > @@ -367,6 +383,9 @@ struct xen_domctl_scheduler_op {
> >          struct xen_domctl_sched_credit2 {
> >              uint16_t weight;
> >          } credit2;
> > +        struct xen_domctl_sched_rt{
> > +            XEN_GUEST_HANDLE_64(xen_domctl_sched_rt_params_t) schedule;
> > +        } rt;
> 
> Thinking more generally, is rt an appropriate name here?  It seems a
> little generic, as there are several classes of realtime schedulers
> employing different semantics and parameters.
> 
That's true. The challenge here is finding a name which is
representative enough of the specific characteristics of the scheduler,
without it being known only in real-time research rooms! :-O

I'd stick with sched_rt.c for the filename, as it's an implementation of
EDF, after all, on top of which (read: in the same file) you can
potentially implement a bunch of different scheduling algorithms and
policies.

This particular one is called "Deferrable Server" (DS). Since it's on
top of EDF, which is a dynamic priority algorithm, it is sometimes
called "Dynamic Deferrable Server" (DDS). It's also part of a class of
algorithms known as "Resource Reservation" algorithms, which is how
(well, one of the ways with which) the RT community refers to algos with
a budgeting scheme inside them.

So, ideas? RTDS (as per Real-Time Deferrable-Server)? RRESDS (Resource
Reservation Deferrable Server)? Just DS (Deferrable Server)?

I'm not sure I like much any of these...

What was it that you had in mind when you said "there are several
classes of realtime schedulers employing different semantics and
parameters" ?

Dario

-- 
<<This happens because I choose it to happen!>> (Raistlin Majere)
-----------------------------------------------------------------
Dario Faggioli, Ph.D, http://about.me/dario.faggioli
Senior Software Engineer, Citrix Systems R&D Ltd., Cambridge (UK)



^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [PATCH RFC v1 1/4] rt: Add rt scheduler to hypervisor
  2014-07-11 15:21     ` Dario Faggioli
@ 2014-07-11 15:40       ` Andrew Cooper
  2014-07-11 15:48         ` Dario Faggioli
  0 siblings, 1 reply; 50+ messages in thread
From: Andrew Cooper @ 2014-07-11 15:40 UTC (permalink / raw)
  To: Dario Faggioli
  Cc: ian.campbell, xisisu, stefano.stabellini, george.dunlap,
	ian.jackson, xen-devel, xumengpanda, Meng Xu, lichong659,
	dgolomb

On 11/07/14 16:21, Dario Faggioli wrote:

>>>  #define XEN_SCHEDULER_CREDIT   5
>>>  #define XEN_SCHEDULER_CREDIT2  6
>>>  #define XEN_SCHEDULER_ARINC653 7
>>> +#define XEN_SCHEDULER_RT       8
>>> +
>>>  /* Set or get info? */
>>>  #define XEN_DOMCTL_SCHEDOP_putinfo 0
>>>  #define XEN_DOMCTL_SCHEDOP_getinfo 1
>>> @@ -367,6 +383,9 @@ struct xen_domctl_scheduler_op {
>>>          struct xen_domctl_sched_credit2 {
>>>              uint16_t weight;
>>>          } credit2;
>>> +        struct xen_domctl_sched_rt{
>>> +            XEN_GUEST_HANDLE_64(xen_domctl_sched_rt_params_t) schedule;
>>> +        } rt;
>>
>> Thinking more generally, is rt an appropriate name here?  It seems a
>> little generic, as there are several classes of realtime schedulers
>> employing different semantics and parameters.
>>
> That's true. The challenge here is finding a name which is
> representative enough of the specific characteristics of the scheduler,
> without it being known only in real-time research rooms! :-O
>
> I'd stick with sched_rt.c for the filename, as it's an implementation of
> EDF, after all, on top of which (read: in the same file) you can
> potentially implement a bunch of different scheduling algorithms and
> policies.
>
> This particular one is called "Deferrable Server" (DS). Since it's on
> top of EDF, which is a dynamic priority algorithm, it is sometimes
> called "Dynamic Deferrable Server" (DDS). It's also part of a class of
> algorithms known as "Resource Reservation" algorithms, which is how
> (well, one of the ways with which) the RT community refers to algos with
> a budgeting scheme inside them.
>
> So, ideas? RTDS (as per Real-Time Deferrable-Server)? RRESDS (Resource
> Reservation Deferrable Server)? Just DS (Deferrable Server)?
>
> I'm not sure I like much any of these...
>
> What was it that you had in mind when you said "there are several
> classes of realtime schedulers employing different semantics and
> parameters" ?
>
> Dario
>

Hmm - naming things is difficult.  How about bob?

My concern is that naming it "rt" seems too generic, and will cause
confusion of a) what type of realtime scheduler it is, and b) further
problems if someone wants to come and implement a different realtime
scheduler in the future.

XEN_SCHEDULER_RT_DS doesn't sound too bad.

~Andrew

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [PATCH RFC v1 1/4] rt: Add rt scheduler to hypervisor
  2014-07-11 15:40       ` Andrew Cooper
@ 2014-07-11 15:48         ` Dario Faggioli
  2014-07-16 17:05           ` Konrad Rzeszutek Wilk
  0 siblings, 1 reply; 50+ messages in thread
From: Dario Faggioli @ 2014-07-11 15:48 UTC (permalink / raw)
  To: Andrew Cooper
  Cc: ian.campbell, xisisu, stefano.stabellini, george.dunlap,
	ian.jackson, xen-devel, xumengpanda, Meng Xu, lichong659,
	dgolomb



On ven, 2014-07-11 at 16:40 +0100, Andrew Cooper wrote:
> On 11/07/14 16:21, Dario Faggioli wrote:

> > So, ideas? RTDS (as per Real-Time Deferrable-Server)? RRESDS (Resource
> > Reservation Deferrable Server)? Just DS (Deferrable Server)?
> >
> > I'm not sure I like much any of these...
> >
> > What was it that you had in mind when you said "there are several
> > classes of realtime schedulers employing different semantics and
> > parameters" ?

> Hmm - naming things is difficult.  How about bob?
> 
Well, if we go that route, I prefer Alice! :-D :-D

> My concern is that naming it "rt" seems too generic, and will cause
> confusion of a) what type of realtime scheduler it is, and b) further
> problems if someone wants to come and implement a different realtime
> scheduler in the future.
> 
Totally understood and agreed.

> XEN_SCHEDULER_RT_DS doesn't sound too bad.
> 
It does not, indeed. Let's hear Meng, Sisu, and the other xen-devel
folks...

Thanks and Regards,
Dario

-- 
<<This happens because I choose it to happen!>> (Raistlin Majere)
-----------------------------------------------------------------
Dario Faggioli, Ph.D, http://about.me/dario.faggioli
Senior Software Engineer, Citrix Systems R&D Ltd., Cambridge (UK)


[-- Attachment #1.2: This is a digitally signed message part --]
[-- Type: application/pgp-signature, Size: 181 bytes --]

[-- Attachment #2: Type: text/plain, Size: 126 bytes --]

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: Introduce rt real-time scheduler for Xen
  2014-07-11 11:06   ` Dario Faggioli
@ 2014-07-11 16:14     ` Meng Xu
  0 siblings, 0 replies; 50+ messages in thread
From: Meng Xu @ 2014-07-11 16:14 UTC (permalink / raw)
  To: Dario Faggioli
  Cc: Wei Liu, Ian Campbell, Sisu Xi, Stefano Stabellini,
	George Dunlap, Ian Jackson, xen-devel, Meng Xu, Chong Li,
	Dagaen Golomb


[-- Attachment #1.1: Type: text/plain, Size: 1524 bytes --]

​Hi Wei and Dario,
​
2014-07-11 7:06 GMT-04:00 Dario Faggioli <dario.faggioli@citrix.com>:

> On ven, 2014-07-11 at 11:50 +0100, Wei Liu wrote:
> > On Fri, Jul 11, 2014 at 12:49:54AM -0400, Meng Xu wrote:
> > [...]
> > >
> > > [PATCH RFC v1 1/4] rt: Add rt scheduler to hypervisor
> > > [PATCH RFC v1 2/4] xl for rt scheduler
> > > [PATCH RFC v1 3/4] libxl for rt scheduler
> > > [PATCH RFC v1 4/4] libxc for rt scheduler
> > >
> >
> > I have some general comments on how you arrange these patches.
> >
> > At a glance of the title and code you should do them in the order of 1,
> > 4, 3 and 2. Apparently xl depends on libxl, libxl depends on libxc, and
> > libxc depends on hypervisor. You will break bisection with current
> > ordering.
> >
> Yep, I agree with Wei.
>
> > And we normally write titles like
> >   xen: add rt scheduler
> >   libxl: introduce rt scheduler
> >   xl: XXXX
> >   etc.
> > start with component name and separate with colon.
> >
> Indeed we do, and this helps quite a bit.
>
> > Last but not least, you need to CC relevant maintainers. You can find
> > out maintainers with scripts/get_maintainers.pl.
> >
> Yes, but this one, Meng almost got it right, I think.
>
> Basically, Meng, you're missing hypervisors maintainers (at least for
> patch 1). :-)
>
>
​Thank you very much for your advice!
I will modify them in the next version of the scheduler: 1) rearrange the
patch order, 2) change the commit logs, and 3) add all maintainers.

​Meng​

[-- Attachment #1.2: Type: text/html, Size: 2486 bytes --]

[-- Attachment #2: Type: text/plain, Size: 126 bytes --]

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: Introduce rt real-time scheduler for Xen
  2014-07-11  4:49 Introduce rt real-time scheduler for Xen Meng Xu
                   ` (4 preceding siblings ...)
  2014-07-11 10:50 ` Introduce rt real-time scheduler for Xen Wei Liu
@ 2014-07-11 16:19 ` Dario Faggioli
  5 siblings, 0 replies; 50+ messages in thread
From: Dario Faggioli @ 2014-07-11 16:19 UTC (permalink / raw)
  To: Meng Xu
  Cc: ian.campbell, xisisu, stefano.stabellini, george.dunlap,
	ian.jackson, xen-devel, xumengpanda, lichong659, dgolomb


[-- Attachment #1.1: Type: text/plain, Size: 6899 bytes --]

On ven, 2014-07-11 at 00:49 -0400, Meng Xu wrote:
> This serie of patches adds rt real-time scheduler to Xen.
> 
Hey Meng, Sisu!

Nice to see you here on xen-devel with this nice code-drop! :-P

> In summary, It supports:
> 1) Preemptive Global Earliest Deadline First scheduling policy by using a global RunQ for the scheduler;
> 2) Assign/display each VCPU's parameters of each domain;
> 3) Supports CPU Pool
> 
Great, thanks for making the effort of extracting this from your code
base and submitting it here. :-)

Having looked at the series carefully, I think it's a nice piece of work
already. There are quite a few modifications and cleanups to do, and I
think there's room for quite a bit of improvement, but I really like the
fact that all the features are basically there already.

In particular, proper SMP support, per-VCPU scheduling parameters, and a
sane and theoretically sound budgeting scheme are what we're missing in
SEDF[*], and we need these things badly!

[*] Josh's RFC is improving this, but only wrt the latter (a sane
scheduling algorithm).

> -----------------------------------------------------------------------------------------------------------------------------
> One scenario to show the functionality of this rt scheduler is as follows:
> //list each vcpu's parameters of each domain in cpu pools using rt scheduler
> #xl sched-rt
> Cpupool Pool-0: sched=EDF
> Name                                ID VCPU Period Budget
> Domain-0                             0    0     10     10
> Domain-0                             0    1     20     20
> Domain-0                             0    2     30     30
> Domain-0                             0    3     10     10
> litmus1                              1    0     10      4
> litmus1                              1    1     10      4
> 
> [...]
>
Thanks for showing this also.

> -----------------------------------------------------------------------------------------------------------------------------
> The differences between this new rt real-time scheduler and the sedf scheduler are as follows:
> 1) rt scheduler supports global EDF scheduling, while sedf only supports partitioned scheduling. With the support of vcpu mask, rt scheduler can also be used as partitioned scheduling by setting each VCPU’s cpumask to a specific cpu.
>
Which is the biggest and most important difference. In fact, although the
implementation of this scheduler can be improved (AFAICT) wrt this
aspect too, adding SMP support to SEDF would be much, much harder...

> 2) rt scheduler supports setting and getting each VCPU’s parameters of a domain. A domain can have multiple vcpus with different parameters, rt scheduler can let user get/set the parameters of each VCPU of a specific domain; (sedf scheduler does not support it now)
> 3) rt scheduler supports cpupool.
>
Right. Well, to be fair, SEDF supports cpupools as well. :-)

> 4) rt scheduler uses deferrable server to burn/replenish budget of a VCPU, while sedf uses constrant bandwidth server to burn/replenish budget of a VCPU. This is just two options of implementing a global EDF real-time scheduler and both options’ real-time performance have already been proved in academic.
> 
So, can you put up some links to some of your works on top of RT-Xen,
which is what this scheduler comes from? Or, if that's not possible, at
least the titles?

I really don't expect people to jump on research papers, but I've
seen a few, and the experimental sections were nice to read and quite
useful.

> -----------------------------------------------------------------------------------------------------------------------------
> TODO:
>
Allow me to add a few items here, in some sort of priority order (at
least my own):

  *) Deal with budget overrun in the algorithm [medium]
  *) Split runnable and depleted (=no budget left) VCPU queues [easy]
     (a sketch follows below)
> 1) Improve the code of getting/setting each VCPU’s parameters. [easy]
>     Right now, it creates an array with LIBXL_XEN_LEGACY_MAX_VCPUS (i.e., 32) elements to bounce all VCPUs’ parameters of a domain between the xen tools and Xen to get all VCPUs’ parameters of a domain. It is unnecessary to have LIBXL_XEN_LEGACY_MAX_VCPUS elements for this array.
>     The current work is to first get the exact number of VCPUs of a domain and then create an array with that exact number of elements to bounce between xen tool and xen.
> 2) Provide microsecond time precision in xl interface instead of millisecond time precision. [easy]
>     Right now, the rt scheduler lets the user specify each VCPU’s parameters (period, budget) in milliseconds (i.e., ms). In some real-time applications, users may want to specify VCPUs’ parameters in microseconds (i.e., us). The next work is to let the user specify VCPUs’ parameters in microseconds and count the time in microseconds (or nanoseconds) in the xen rt scheduler as well.
>
  *) Subject Dom0 to the EDF+DS scheduling, as all other domains [easy]
      We can discuss what default Dom0 parameters should be, but we
      certainly want it to be scheduled like all other domains, and not
      get too much special treatment.

> 3) Add Xen trace into the rt scheduler. [easy]
>     We will add a few xentrace tracepoints, like TRC_CSCHED2_RUNQ_POS in credit2 scheduler, in rt scheduler, to debug via tracing.
>
  *) Try using timers for replenishment, instead of scanning the full
     runqueue every now and then [medium]

> 4) Method of improving the performance of rt scheduler [future work]
>     VCPUs of the same domain may preempt each other based on the preemptive global EDF scheduling policy. This self-switch issue does not benefit the domain but introduces more overhead. When this situation happens, we can simply promote the currently running lower-priority VCPU’s priority and let it borrow budget from higher-priority VCPUs to avoid such self-switch issues.
> 
> Timeline of implementing the TODOs:
> We plan to finish the TODO 1), 2) and 3) within 3-4 weeks (or earlier).
> Because TODO 4) will make the scheduling policy not pure GEDF, (people who wants the real GEDF may not be happy with this.) we look forward to hearing people’s opinions.
>
That one is definitely something we can concentrate on later.
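
As a concrete illustration of the "split runnable and depleted VCPU
queues" item above, here is a minimal, self-contained sketch in C. The
structure and function names are made up for the example and are not the
patch's actual code; the idea is simply that VCPUs with remaining budget
stay ahead of depleted ones, and each group is kept deadline-sorted (EDF).

/* Illustrative sketch only, not the RFC's code. */
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

struct rt_vcpu {
    uint64_t cur_deadline;   /* absolute deadline, in ns */
    uint64_t cur_budget;     /* remaining budget, in ns  */
    struct rt_vcpu *next;
};

static void runq_insert(struct rt_vcpu **runq, struct rt_vcpu *svc)
{
    struct rt_vcpu **pos = runq;
    bool depleted = (svc->cur_budget == 0);

    while ( *pos != NULL )
    {
        bool cur_depleted = ((*pos)->cur_budget == 0);

        /* Within the same group, earlier deadline wins (EDF). */
        if ( depleted == cur_depleted &&
             svc->cur_deadline < (*pos)->cur_deadline )
            break;
        /* A VCPU with budget always goes ahead of the depleted group. */
        if ( !depleted && cur_depleted )
            break;
        pos = &(*pos)->next;
    }
    svc->next = *pos;
    *pos = svc;
}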

> -----------------------------------------------------------------------------------------------------------------------------
> Special huge thanks to Dario Faggioli for his helpful and detailed comments on the preview version of this rt scheduler. :-)
> 
:-)

Regards,
Dario

-- 
<<This happens because I choose it to happen!>> (Raistlin Majere)
-----------------------------------------------------------------
Dario Faggioli, Ph.D, http://about.me/dario.faggioli
Senior Software Engineer, Citrix Systems R&D Ltd., Cambridge (UK)


[-- Attachment #1.2: This is a digitally signed message part --]
[-- Type: application/pgp-signature, Size: 181 bytes --]

[-- Attachment #2: Type: text/plain, Size: 126 bytes --]

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [PATCH RFC v1 4/4] libxc for rt scheduler
  2014-07-11 14:49   ` Dario Faggioli
@ 2014-07-11 16:23     ` Meng Xu
  2014-07-11 16:35       ` Dario Faggioli
  2014-07-17 15:29     ` Ian Campbell
  1 sibling, 1 reply; 50+ messages in thread
From: Meng Xu @ 2014-07-11 16:23 UTC (permalink / raw)
  To: Dario Faggioli
  Cc: Ian Campbell, Sisu Xi, Stefano Stabellini, George Dunlap,
	Ian Jackson, xen-devel, Meng Xu, Chong Li, Dagaen Golomb


[-- Attachment #1.1: Type: text/plain, Size: 4472 bytes --]

Hi Dario,


2014-07-11 10:49 GMT-04:00 Dario Faggioli <dario.faggioli@citrix.com>:

> On ven, 2014-07-11 at 00:49 -0400, Meng Xu wrote:
> > Add xc_sched_rt_* functions to interact with Xen to set/get domain's
>
> > --- /dev/null
> > +++ b/tools/libxc/xc_rt.c
>
> > +#include "xc_private.h"
> > +
> > +int
> > +xc_sched_rt_domain_set(
> > +    xc_interface *xch,
> > +    uint32_t domid,
> > +    struct xen_domctl_sched_rt_params *sdom)
> >
> So, both here...
>
> > +{
> > +    int rc;
> > +    DECLARE_DOMCTL;
> > +    DECLARE_HYPERCALL_BOUNCE(sdom,
> > +        sizeof(*sdom),
> > +        XC_HYPERCALL_BUFFER_BOUNCE_IN);
> > +
> ... and here... I know libxc is not terribly consistent when it comes to
> coding style. Still, I like stuff to be aligned to the opening brace of
> the function/macro.
>
> Have a look at xc_domain.c, e.g.,
>
> ...
> int xc_vcpu_setaffinity(xc_interface *xch,
>                         uint32_t domid,
>                         int vcpu,
>                         xc_cpumap_t cpumap_hard_inout,
>                         xc_cpumap_t cpumap_soft_inout,
>                         uint32_t flags)
> {
>     DECLARE_DOMCTL;
>     DECLARE_HYPERCALL_BOUNCE(cpumap_hard_inout, 0,
>                              XC_HYPERCALL_BUFFER_BOUNCE_BOTH);
>     DECLARE_HYPERCALL_BOUNCE(cpumap_soft_inout, 0,
>                              XC_HYPERCALL_BUFFER_BOUNCE_BOTH);
> ...
>

It's so nice of you to give me such an example! I will modify them in
the next version.

​

>
> > +    if ( xc_hypercall_bounce_pre(xch, sdom) )
> > +        return -1;
> > +
> > +    domctl.cmd = XEN_DOMCTL_scheduler_op;
> > +    domctl.domain = (domid_t) domid;
> > +    domctl.u.scheduler_op.sched_id = XEN_SCHEDULER_RT;
> > +    domctl.u.scheduler_op.cmd = XEN_DOMCTL_SCHEDOP_putinfo;
> > +    set_xen_guest_handle(domctl.u.scheduler_op.u.rt.schedule, sdom);
> > +
> > +    rc = do_domctl(xch, &domctl);
> > +
> > +    xc_hypercall_bounce_post(xch, sdom);
> > +
> So, the bouncing logic seems fine. Looking at what other schedulers do,
> there should be no particular need for bouncing anything.
>
> xen/include/public/domctl.h:
>
> struct xen_domctl_scheduler_op {
>     uint32_t sched_id;  /* XEN_SCHEDULER_* */
>     uint32_t cmd;       /* XEN_DOMCTL_SCHEDOP_* */
>     union {
>         struct xen_domctl_sched_sedf {
>             uint64_aligned_t period;
>             uint64_aligned_t slice;
>             uint64_aligned_t latency;
>             uint32_t extratime;
>             uint32_t weight;
>         } sedf;
>         struct xen_domctl_sched_credit {
>             uint16_t weight;
>             uint16_t cap;
>         } credit;
>         struct xen_domctl_sched_credit2 {
>             uint16_t weight;
>         } credit2;
>     } u;
> };
>
> I see you have a TODO item relate to this interface, so I guess I'm
> leaving this alone.
>
> My view here is, since per-VCPU scheduling parameters are important for
> this scheduler, you either:
>  - provide an interface for getting and setting the parameters of _one_
>    _VCPU_, and the call them repeatedly, if for instance you need to act
>    on a domain. In this case, no array and no bouncing should be
>    necessary.
>  - provide both the above, and another interface, that one with an array
>    as big as the number of VCPUs of the domain, and use it to provide
>    the hypervisor with the parameters of all the VCPUs, and act
>    appropriately inside Xen.
>
> Personally, I'd start with the former, and add the latter later, if we
> deem it necessary.
>

Actually, we did the former in a very early version of the rt
scheduler. The only concern I have about the former version is that it has
to issue the hypercall many times to get all VCPUs' information for a
domain, while the latter only issues the hypercall once, with bouncing.

My concern is: will the former have worse performance than the latter?

Thanks,

Meng




>
> Regards,
> Dario
>
> --
> <<This happens because I choose it to happen!>> (Raistlin Majere)
> -----------------------------------------------------------------
> Dario Faggioli, Ph.D, http://about.me/dario.faggioli
> Senior Software Engineer, Citrix Systems R&D Ltd., Cambridge (UK)
>
>


-- 


-----------
Meng Xu
PhD Student in Computer and Information Science
University of Pennsylvania

[-- Attachment #1.2: Type: text/html, Size: 6605 bytes --]

[-- Attachment #2: Type: text/plain, Size: 126 bytes --]

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [PATCH RFC v1 2/4] xl for rt scheduler
  2014-07-11 15:07       ` Dario Faggioli
@ 2014-07-11 16:25         ` Meng Xu
  2014-07-13 12:58         ` Meng Xu
  1 sibling, 0 replies; 50+ messages in thread
From: Meng Xu @ 2014-07-11 16:25 UTC (permalink / raw)
  To: Dario Faggioli
  Cc: Wei Liu, Ian Campbell, Sisu Xi, Stefano Stabellini,
	George Dunlap, Ian Jackson, xen-devel, Meng Xu, Chong Li,
	Dagaen Golomb


[-- Attachment #1.1: Type: text/plain, Size: 1271 bytes --]

Hi Dario,

2014-07-11 11:07 GMT-04:00 Dario Faggioli <dario.faggioli@citrix.com>:

> On ven, 2014-07-11 at 10:59 -0400, Meng Xu wrote:
>
>
> >         > +    libxl_domain_sched_params_init(&scinfo);
> >         > +    rc = sched_domain_get(LIBXL_SCHEDULER_RT, domid,
> >         &scinfo);
> >         > +    if (rc)
> >         > +        return rc;
> >         > +
>
> > ​Thank you very much for pasting the rules here! I really appreciate
> > it. However, I didn't quite get why it violate the libxl type paradigm
> > and how I should correct it. (Sorry. :-()​
> >
> <<The user must always calls "dispose" exactly once afterwards, to clean
> up, regardless of whether operations on this object succeeded or
> failed>>
>
> While, above, you're exiting, if rc is true, without calling dispose.
>
> It depens a lot on the function, but what you usually do, is grouping
> the calls to the various dispose under a label (typically 'out:') and
> goto there to exit.
>
> Look around, both in xl and libxl, you'll find plenty of examples of
> that.
>

​Now I got it. Thank you very much! I will modify it. :-)

Best regards,

Meng


-----------
Meng Xu
PhD Student in Computer and Information Science
University of Pennsylvania

[-- Attachment #1.2: Type: text/html, Size: 2175 bytes --]

[-- Attachment #2: Type: text/plain, Size: 126 bytes --]

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [PATCH RFC v1 4/4] libxc for rt scheduler
  2014-07-11 16:23     ` Meng Xu
@ 2014-07-11 16:35       ` Dario Faggioli
  2014-07-11 16:49         ` Andrew Cooper
  2014-07-12 19:46         ` Meng Xu
  0 siblings, 2 replies; 50+ messages in thread
From: Dario Faggioli @ 2014-07-11 16:35 UTC (permalink / raw)
  To: Meng Xu
  Cc: Ian Campbell, Sisu Xi, Stefano Stabellini, George Dunlap,
	Ian Jackson, xen-devel, Meng Xu, Chong Li, Dagaen Golomb


[-- Attachment #1.1: Type: text/plain, Size: 3028 bytes --]

An unrelated thing: can you set your mail client to send text messages,
instead of HTML ones?

On ven, 2014-07-11 at 12:23 -0400, Meng Xu wrote:


>         My view here is, since per-VCPU scheduling parameters are
>         important for
>         this scheduler, you either:
>          - provide an interface for getting and setting the parameters
>         of _one_
>            _VCPU_, and the call them repeatedly, if for instance you
>         need to act
>            on a domain. In this case, no array and no bouncing should
>         be
>            necessary.
>          - provide both the above, and another interface, that one
>         with an array
>            as big as the number of VCPUs of the domain, and use it to
>         provide
>            the hypervisor with the parameters of all the VCPUs, and
>         act
>            appropriately inside Xen.
>         
>         Personally, I'd start with the former, and add the latter
>         later, if we
>         deem it necessary.
> 
> 
> Actually, we did the former one in the very early version of the rt
> scheduler. The only concern I have about the former version is that it
> has to issue the hypercall for many times to get all VCPUs'
> information of a domain, while the later one only issue the hypercall
> once with bouncing. 
> 
True. However, if you only want to get or set one particular vcpu, it's
annoying to have to deal with the array, isn't it?

So, both solutions have upsides and downsides, depending on what you
actually need. :-)

This is why I was suggesting implementing both. Of course, it does not
matter much which one you implement first. So, feel free to implement
the array variant properly first, and then we'll see if we also need the
single VCPU variant.

> My concern is: Will the former one has worse performance than the
> later one?  ​
> 
Well, Xen has its tricks, but yes, I think performing, e.g., 8 or 16
hypercalls in a loop is worse than issuing one that returns an array.

It also must be noted that getting and setting scheduling parameters
should not happen on a very critical path, so it probably won't matter
much.

Anyway, as said above, go for the array first if you like it better.
When you do that, consider Andrew's comments to the bottom of patch 1:

"As a gut feel, I think that these two ints should be part of the struct
xen_domctl_sched_rt below, with a guest handle pointing to an array of {
period, budget } structures, which avoids the LEGACY_MAX limit."

I don't think you'll need vcpu_index any longer, and I'm not sure
whether you need max_vcpus to be part of the interface of this hcall, or
if you're better off fetching it from some other call, though.
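
To make that concrete, here is a sketch of the kind of domctl layout
Andrew describes: a count plus a guest handle to an array of { period,
budget } entries. The period/budget fields follow his comment and the
handle follows the RFC hunk quoted earlier in the thread; the nr_vcpus
field and the exact naming are illustrative assumptions, not the patch.

struct xen_domctl_sched_rt_params {
    /* Per-VCPU parameters, in microseconds (illustrative units). */
    uint64_aligned_t period;
    uint64_aligned_t budget;
};
typedef struct xen_domctl_sched_rt_params xen_domctl_sched_rt_params_t;
DEFINE_XEN_GUEST_HANDLE(xen_domctl_sched_rt_params_t);

struct xen_domctl_sched_rt {
    /* IN: number of entries in the 'schedule' array (one per VCPU). */
    uint16_t nr_vcpus;
    /* IN/OUT: per-VCPU { period, budget }, indexed by VCPU id. */
    XEN_GUEST_HANDLE_64(xen_domctl_sched_rt_params_t) schedule;
};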

Dario

-- 
<<This happens because I choose it to happen!>> (Raistlin Majere)
-----------------------------------------------------------------
Dario Faggioli, Ph.D, http://about.me/dario.faggioli
Senior Software Engineer, Citrix Systems R&D Ltd., Cambridge (UK)


[-- Attachment #1.2: This is a digitally signed message part --]
[-- Type: application/pgp-signature, Size: 181 bytes --]

[-- Attachment #2: Type: text/plain, Size: 126 bytes --]

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [PATCH RFC v1 4/4] libxc for rt scheduler
  2014-07-11 16:35       ` Dario Faggioli
@ 2014-07-11 16:49         ` Andrew Cooper
  2014-07-12 19:46         ` Meng Xu
  1 sibling, 0 replies; 50+ messages in thread
From: Andrew Cooper @ 2014-07-11 16:49 UTC (permalink / raw)
  To: Dario Faggioli, Meng Xu
  Cc: Ian Campbell, Sisu Xi, Stefano Stabellini, George Dunlap,
	Ian Jackson, xen-devel, Meng Xu, Chong Li, Dagaen Golomb

On 11/07/14 17:35, Dario Faggioli wrote:
>
>> My concern is: Will the former one has worse performance than the
>> later one? ​
>>
> Well, Xen has its tricks, but yes, I think performing e.g., 8 or 16
> hypercalls in a loop is worse than issueing one returning an array.

As far as hypercalls go, the path to Xen is a long one. From userspace,
a hypercall involves a context switch into the kernel (which is a double
context switch through Xen) and a context switch from the kernel into
Xen, and back again. The return from guest kernel to guest userspace
requires a TLB flush.

So, 6 context switches and a TLB flush per hypercall (assuming a 64bit
dom0).

Use batches wherever possible, even for 2 items. It will be about twice
as fast as not batching.

~Andrew
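
To make the batching concrete, here is a hedged sketch of what a batched
getter could look like in libxc, reusing the bounce pattern from the
RFC's xc_sched_rt_domain_set() quoted earlier in the thread. The
num_vcpus parameter and the exact domctl payload are assumptions for
illustration, not the actual patch; the point is that the whole per-VCPU
array crosses the user/kernel/Xen boundary once, not once per VCPU.

int
xc_sched_rt_domain_get(xc_interface *xch,
                       uint32_t domid,
                       struct xen_domctl_sched_rt_params *sdom,
                       uint16_t num_vcpus)
{
    int rc;
    DECLARE_DOMCTL;
    /* One bounce buffer covering the whole per-VCPU array. */
    DECLARE_HYPERCALL_BOUNCE(sdom, sizeof(*sdom) * num_vcpus,
                             XC_HYPERCALL_BUFFER_BOUNCE_OUT);

    if ( xc_hypercall_bounce_pre(xch, sdom) )
        return -1;

    domctl.cmd = XEN_DOMCTL_scheduler_op;
    domctl.domain = (domid_t)domid;
    domctl.u.scheduler_op.sched_id = XEN_SCHEDULER_RT;
    domctl.u.scheduler_op.cmd = XEN_DOMCTL_SCHEDOP_getinfo;
    set_xen_guest_handle(domctl.u.scheduler_op.u.rt.schedule, sdom);

    rc = do_domctl(xch, &domctl);

    xc_hypercall_bounce_post(xch, sdom);

    return rc;
}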


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [PATCH RFC v1 3/4] libxl for rt scheduler
  2014-07-11 15:08   ` Dario Faggioli
@ 2014-07-12 18:16     ` Meng Xu
  2014-07-14 10:38       ` Dario Faggioli
  2014-07-17 15:34     ` Ian Campbell
  1 sibling, 1 reply; 50+ messages in thread
From: Meng Xu @ 2014-07-12 18:16 UTC (permalink / raw)
  To: Dario Faggioli
  Cc: ian.campbell, xisisu, stefano.stabellini, george.dunlap,
	ian.jackson, xen-devel, Meng Xu, lichong659, dgolomb


[-- Attachment #1.1: Type: text/plain, Size: 1547 bytes --]

Hi Dario,


> > +    if ( !scinfo->rt.vcpus ){
> > +        LOGE(ERROR, "Allocate lib_vcpu array fails\n");
> > +        return ERROR_INVAL;
> > +    }
> > +    scinfo->rt.num_vcpus = sdom.num_vcpus;
> > +    for( i = 0; i < sdom.num_vcpus; ++i)
> > +    {
> > +        scinfo->rt.vcpus[i].period = sdom.vcpus[i].period;
> > +        scinfo->rt.vcpus[i].budget = sdom.vcpus[i].budget;
> > +    }
> > +
> > +    return 0;
> > +}
> > +
> > +#define SCHED_RT_VCPU_PARAMS_MAX    4294967295 /* 2^32-1 */
> > +
> I don't like the name. I'd go for something like
> SCHED_RT_VCPU_PERIOD_MAX and SCHED_RT_VCPU_BUDGET_MAX.
>
>
I will modify this.


> You well can define both to the same value, of course.
>
> As per the value, yes, even if we turn the interface in usecs, 2^32
> usecs is still ~4000 secs, which ought to be enough as a period or
> budget for everyone! :-)
>
>
Actually, I'm not very sure about the range (max value) for the xl sched-rt
interface now: I will change the type of period and budget in Xen to
s_time_t, which is a signed 64-bit integer. (I can also change the type of
period and budget in the tools to 64 bits.)

My question is:
Do we really need to do this range check and set the max value of the xl
sched-rt interface to (2^63 - 1)/1000 us? (A signed 64-bit count of
nanoseconds can hold up to 2^63 - 1 ns, i.e. about (2^63 - 1)/1000 us.)
If we look at the actual range, 2^63 - 1 ns is roughly 292 years, which is
far too large to be realistic.

Thanks,

Meng




-- 


-----------
Meng Xu
PhD Student in Computer and Information Science
University of Pennsylvania

[-- Attachment #1.2: Type: text/html, Size: 2238 bytes --]

[-- Attachment #2: Type: text/plain, Size: 126 bytes --]

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [PATCH RFC v1 4/4] libxc for rt scheduler
  2014-07-11 16:35       ` Dario Faggioli
  2014-07-11 16:49         ` Andrew Cooper
@ 2014-07-12 19:46         ` Meng Xu
  1 sibling, 0 replies; 50+ messages in thread
From: Meng Xu @ 2014-07-12 19:46 UTC (permalink / raw)
  To: Dario Faggioli
  Cc: Ian Campbell, Sisu Xi, Stefano Stabellini, George Dunlap,
	Ian Jackson, xen-devel, Meng Xu, Chong Li, Dagaen Golomb


[-- Attachment #1.1: Type: text/plain, Size: 1631 bytes --]

Hi Dario,


>
> > My concern is: Will the former one has worse performance than the
> > later one?  ​
> >
> Well, Xen has its tricks, but yes, I think performing e.g., 8 or 16
> hypercalls in a loop is worse than issueing one returning an array.
>
> It also must be noted that, getting and setting scheduling parameters,
> should not happen in very critical path, so it probably won't matter
> much.
>
> Anyway, as said above, go for the array first if you like it better.
> When you do that, consider Andrew's comments to the bottom of patch 1:
>
> "As a gut feel, I think that these two ints should be part of the struct
> xen_domctl_sched_rt below, with a guest handle pointing to an array of {
> period, budget } structures, which avoids the LEGACY_MAX limit."
>
> I don't think you'll need vcpu_index any longer, and I'm not sure
> whether you need max_vcpus to be part of the interface of this hcall, or
> if you're better out fetching it from some other call, though.
>
Yes, I don't need the max_vcpus any more after I improve this interface.
I think I will use the array for the get operation since, as Andrew
mentioned, batching is better than doing hypercalls one by one when more
than one hypercall is involved. As for the set operation, I think I can
avoid using an array. Anyway, I will do this first and let people decide if
we need to make further changes. :-)

(BTW, I will modify the code based on your other comments in the previous
email.)

Thanks,

Meng



-- 


-----------
Meng Xu
PhD Student in Computer and Information Science
University of Pennsylvania

[-- Attachment #1.2: Type: text/html, Size: 2533 bytes --]

[-- Attachment #2: Type: text/plain, Size: 126 bytes --]

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [PATCH RFC v1 2/4] xl for rt scheduler
  2014-07-11 15:07       ` Dario Faggioli
  2014-07-11 16:25         ` Meng Xu
@ 2014-07-13 12:58         ` Meng Xu
  2014-07-14  7:40           ` Dario Faggioli
                             ` (2 more replies)
  1 sibling, 3 replies; 50+ messages in thread
From: Meng Xu @ 2014-07-13 12:58 UTC (permalink / raw)
  To: Dario Faggioli
  Cc: Wei Liu, Ian Campbell, Sisu Xi, Stefano Stabellini,
	George Dunlap, Ian Jackson, xen-devel, Meng Xu, Chong Li,
	Dagaen Golomb


[-- Attachment #1.1: Type: text/plain, Size: 2552 bytes --]

Hi Dario,

>
>
> >         > +    libxl_domain_sched_params_init(&scinfo);
> >         > +    rc = sched_domain_get(LIBXL_SCHEDULER_RT, domid,
> >         &scinfo);
> >         > +    if (rc)
> >         > +        return rc;
> >         > +
>
> > ​Thank you very much for pasting the rules here! I really appreciate
> > it. However, I didn't quite get why it violate the libxl type paradigm
> > and how I should correct it. (Sorry. :-()​
> >
> <<The user must always calls "dispose" exactly once afterwards, to clean
> up, regardless of whether operations on this object succeeded or
> failed>>
>
> While, above, you're exiting, if rc is true, without calling dispose.
>
> It depens a lot on the function, but what you usually do, is grouping
> the calls to the various dispose under a label (typically 'out:') and
> goto there to exit.
>
> Look around, both in xl and libxl, you'll find plenty of examples of
> that.
>
> > We actually followed the way credit scheduler does
> > in main_sched_credit(int argc, char **argv)
> >
> >
> >         } else { /* set credit scheduler paramaters */
> >             libxl_domain_sched_params scinfo;
> >             libxl_domain_sched_params_init(&scinfo);
> >             scinfo.sched = LIBXL_SCHEDULER_CREDIT;
> >             if (opt_w)
> >                 scinfo.weight = weight;
> >             if (opt_c)
> >                 scinfo.cap = cap;
> >             rc = sched_domain_set(domid, &scinfo);
> >             libxl_domain_sched_params_dispose(&scinfo);
> >
> And in fact, here's dispose! :-)
>
>
I think you are right! But I think the implementation of this functionality
for the credit scheduler also has exactly the same issue: scinfo will not be
disposed of when the call fails.

Here is the code for sched_credit_domain_output(int domid) in
tools/libxl/xl_cmdimpl.c.

    rc = sched_domain_get(LIBXL_SCHEDULER_CREDIT, domid, &scinfo);
    if (rc)
        return rc;
    domname = libxl_domid_to_name(ctx, domid);
    printf("%-33s %4d %6d %4d\n",
        domname,
        domid,
        scinfo.weight,
        scinfo.cap);
    free(domname);
    libxl_domain_sched_params_dispose(&scinfo);

(Note: sched_domain_get() init the scinfo, but didn't dispose it. )

As you can see, it has the exact same issue I had. :-) If that's the case,
I can submit a separate patch for this. :-)
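
For reference, one way the fix being discussed could look, following the
"dispose on every exit path" rule Dario quoted (the 'out:' label style).
This is only a sketch against the xl code pasted above, not the actual
patch that will be submitted.

static int sched_credit_domain_output(int domid)
{
    char *domname;
    libxl_domain_sched_params scinfo;
    int rc;

    /* sched_domain_get() calls libxl_domain_sched_params_init() itself. */
    rc = sched_domain_get(LIBXL_SCHEDULER_CREDIT, domid, &scinfo);
    if (rc)
        goto out;

    domname = libxl_domid_to_name(ctx, domid);
    printf("%-33s %4d %6d %4d\n",
           domname,
           domid,
           scinfo.weight,
           scinfo.cap);
    free(domname);

 out:
    /* Always dispose, regardless of whether the get succeeded. */
    libxl_domain_sched_params_dispose(&scinfo);
    return rc;
}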

Thanks,

Meng


-- 


-----------
Meng Xu
PhD Student in Computer and Information Science
University of Pennsylvania

[-- Attachment #1.2: Type: text/html, Size: 4247 bytes --]

[-- Attachment #2: Type: text/plain, Size: 126 bytes --]

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [PATCH RFC v1 2/4] xl for rt scheduler
  2014-07-13 12:58         ` Meng Xu
@ 2014-07-14  7:40           ` Dario Faggioli
  2014-07-14  9:31           ` Wei Liu
  2014-07-17 15:39           ` Ian Campbell
  2 siblings, 0 replies; 50+ messages in thread
From: Dario Faggioli @ 2014-07-14  7:40 UTC (permalink / raw)
  To: Meng Xu
  Cc: Wei Liu, Ian Campbell, Sisu Xi, Stefano Stabellini,
	George Dunlap, Ian Jackson, xen-devel, Meng Xu, Chong Li,
	Dagaen Golomb


[-- Attachment #1.1: Type: text/plain, Size: 900 bytes --]

On dom, 2014-07-13 at 08:58 -0400, Meng Xu wrote:

> I think you are right! But I think the implementation of this
> functionality for credit scheduler also has the exactly same issue:
> scinfo will not be disposed when hypercall returns false. 

> (Note: sched_domain_get() init the scinfo, but didn't dispose it. )
> 
ISWYM.

Yes, I think either in sched_domain_get(), or at the three (four
considering your code) callsites, a _dispose() would be good.

>  As you can see, it has the exact issue I had. :-) If it's the case, I
> can submit a separate patch for this. :-)
>
Go ahead. :-)

Thanks and Regards,
Dario

-- 
<<This happens because I choose it to happen!>> (Raistlin Majere)
-----------------------------------------------------------------
Dario Faggioli, Ph.D, http://about.me/dario.faggioli
Senior Software Engineer, Citrix Systems R&D Ltd., Cambridge (UK)


[-- Attachment #1.2: This is a digitally signed message part --]
[-- Type: application/pgp-signature, Size: 181 bytes --]

[-- Attachment #2: Type: text/plain, Size: 126 bytes --]

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [PATCH RFC v1 2/4] xl for rt scheduler
  2014-07-13 12:58         ` Meng Xu
  2014-07-14  7:40           ` Dario Faggioli
@ 2014-07-14  9:31           ` Wei Liu
  2014-07-17 15:39           ` Ian Campbell
  2 siblings, 0 replies; 50+ messages in thread
From: Wei Liu @ 2014-07-14  9:31 UTC (permalink / raw)
  To: Meng Xu
  Cc: Wei Liu, Ian Campbell, Sisu Xi, Stefano Stabellini,
	George Dunlap, Dario Faggioli, Ian Jackson, xen-devel, Meng Xu,
	Chong Li, Dagaen Golomb

On Sun, Jul 13, 2014 at 08:58:01AM -0400, Meng Xu wrote:
[...]
> >
> >
> I think you are right! But I think the implementation of this functionality
> for credit scheduler also has the exactly same issue: scinfo will not be
> disposed when hypercall returns false.
> 

The paradigm was documented not long ago, so I wouldn't be surprised if
there's existing code that violates that paradigm.

In any case we need to make sure new code follows our rule. :-)

> Here is the code for sched_credit_domain_output(int domid) in
> tools/libxl/xl_cmdimpl.c.
>     rc = sched_domain_get(LIBXL_SCHEDULER_CREDIT, domid, &scinfo);
>     if (rc)
>         return rc;
>     domname = libxl_domid_to_name(ctx, domid);
>     printf("%-33s %4d %6d %4d\n",
>         domname,
>         domid,
>         scinfo.weight,
>         scinfo.cap);
>     free(domname);
>     libxl_domain_sched_params_dispose(&scinfo);
> 
> (Note: sched_domain_get() init the scinfo, but didn't dispose it. )
> 
>  As you can see, it has the exact issue I had. :-) If it's the case, I can
> submit a separate patch for this. :-)
> 

That's a good idea.

Wei.

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [PATCH RFC v1 3/4] libxl for rt scheduler
  2014-07-12 18:16     ` Meng Xu
@ 2014-07-14 10:38       ` Dario Faggioli
  0 siblings, 0 replies; 50+ messages in thread
From: Dario Faggioli @ 2014-07-14 10:38 UTC (permalink / raw)
  To: Meng Xu
  Cc: ian.campbell, xisisu, stefano.stabellini, george.dunlap,
	ian.jackson, xen-devel, Meng Xu, lichong659, dgolomb


[-- Attachment #1.1: Type: text/plain, Size: 2579 bytes --]

On sab, 2014-07-12 at 14:16 -0400, Meng Xu wrote:

> 
>         You well can define both to the same value, of course.
>         
>         As per the value, yes, even if we turn the interface in usecs,
>         2^32
>         usecs is still ~4000 secs, which ought to be enough as a
>         period or
>         budget for everyone! :-)
>         
> 
> 
> Actually, I'm not very sure about the range (max value) for the xl
> sched-rt interface now: I will change the type of period and budget in
> Xen to s_time_t which is signed long (64bits). (I can also change the
> type of period and budget in tool to be 64bits)
> 
In libxc, you probably should transition to uint64_t, yes.

In libxl, you probably will have to use 'integer', as sedf does:
libxl_domain_sched_params = Struct("domain_sched_params",[
    ("sched",        libxl_scheduler),
    ("weight",       integer, {'init_val': 'LIBXL_DOMAIN_SCHED_PARAM_WEIGHT_DEFAULT'}),
    ("cap",          integer, {'init_val': 'LIBXL_DOMAIN_SCHED_PARAM_CAP_DEFAULT'}),
    ("period",       integer, {'init_val': 'LIBXL_DOMAIN_SCHED_PARAM_PERIOD_DEFAULT'}),
    ("slice",        integer, {'init_val': 'LIBXL_DOMAIN_SCHED_PARAM_SLICE_DEFAULT'}),
    ("latency",      integer, {'init_val': 'LIBXL_DOMAIN_SCHED_PARAM_LATENCY_DEFAULT'}),
    ("extratime",    integer, {'init_val': 'LIBXL_DOMAIN_SCHED_PARAM_EXTRATIME_DEFAULT'}),
    ])

> My question is:
> Do we really need to do this range check and set the max value of the
> xl sched-rt to (2^64 - 1)/1000 (us)? (64 bits can have 2^61-1 ns, so
> it's (2^64-1)/1000 us)
> If we look at the actually range value (2^64-1)/1000 ns, it's equal to
> 5.8*10^11 years, which is too large to be realistic. 
> 
I concur. It's quite hard to decide what is a sane upper bound for
period and budget, as users may want to try doing all kinds of crazy
stuff! :-O

For sure, if we want to put an upper bound, it has to come from some
reasoning about the scheduler's usage, rather than from limitations
coming from the actual type used (which makes sense, actually).

If, for instance, you keep 2^32 (no matter whether you use 32 or 64
bits), you end up with, if I've done the math correctly, a period and
budget of a bit more than 1 hour (2^32 us is about 71.6 minutes), which I
think is fine, don't you?

Regards,
Dario

-- 
<<This happens because I choose it to happen!>> (Raistlin Majere)
-----------------------------------------------------------------
Dario Faggioli, Ph.D, http://about.me/dario.faggioli
Senior Software Engineer, Citrix Systems R&D Ltd., Cambridge (UK)


[-- Attachment #1.2: This is a digitally signed message part --]
[-- Type: application/pgp-signature, Size: 181 bytes --]

[-- Attachment #2: Type: text/plain, Size: 126 bytes --]

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [PATCH RFC v1 1/4] rt: Add rt scheduler to hypervisor
  2014-07-11 15:48         ` Dario Faggioli
@ 2014-07-16 17:05           ` Konrad Rzeszutek Wilk
  2014-07-17 10:12             ` Meng Xu
  0 siblings, 1 reply; 50+ messages in thread
From: Konrad Rzeszutek Wilk @ 2014-07-16 17:05 UTC (permalink / raw)
  To: Dario Faggioli
  Cc: ian.campbell, xisisu, stefano.stabellini, george.dunlap,
	Andrew Cooper, ian.jackson, xen-devel, xumengpanda, Meng Xu,
	lichong659, dgolomb

On Fri, Jul 11, 2014 at 05:48:17PM +0200, Dario Faggioli wrote:
> On ven, 2014-07-11 at 16:40 +0100, Andrew Cooper wrote:
> > On 11/07/14 16:21, Dario Faggioli wrote:
> 
> > > So, ideas? RTDS (as per Real-Time Deferrable-Server)? RRESDS (Resource
> > > Reservation Deferrable Server)? Just DS (Deferrable Server)?
> > >
> > > I'm not sure I like much any of these...
> > >
> > > What was it that you had in mind when you said "there are several
> > > classes of realtime schedulers employing different semantics and
> > > parameters" ?
> 
> > Hmm - naming things is difficult.  How about bob?
> > 
> Well, if we go that route, I prefer Alice! :-D :-D
> 
> > My concern is that naming it "rt" seems too generic, and will cause
> > confusion of a) what type of realtime scheduler it is, and b) further
> > problems if someone wants to come and implement a different realtime
> > scheduler in the future.
> > 
> Totally understood and agreed.
> 
> > XEN_SCHEDULER_RT_DS doesn't sound too bad.
> > 
> It does not, indeed. Let's hear Meng, Sisu, and the other xen-devel
> folks...

DS9 :-)

XEN_SCHEDULER_RT_PGEDF ?

And then we can also have
 XEN_SCHEDULER_RT_CBS ?

> 
> Thanks and Regards,
> Dario
> 
> -- 
> <<This happens because I choose it to happen!>> (Raistlin Majere)
> -----------------------------------------------------------------
> Dario Faggioli, Ph.D, http://about.me/dario.faggioli
> Senior Software Engineer, Citrix Systems R&D Ltd., Cambridge (UK)
> 



> _______________________________________________
> Xen-devel mailing list
> Xen-devel@lists.xen.org
> http://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [PATCH RFC v1 1/4] rt: Add rt scheduler to hypervisor
  2014-07-16 17:05           ` Konrad Rzeszutek Wilk
@ 2014-07-17 10:12             ` Meng Xu
  2014-07-17 15:12               ` Dario Faggioli
  2014-07-18 18:40               ` Konrad Rzeszutek Wilk
  0 siblings, 2 replies; 50+ messages in thread
From: Meng Xu @ 2014-07-17 10:12 UTC (permalink / raw)
  To: Konrad Rzeszutek Wilk
  Cc: ian.campbell, xisisu, stefano.stabellini, george.dunlap,
	Andrew Cooper, Dario Faggioli, ian.jackson, xen-devel, Meng Xu,
	lichong659, dgolomb


[-- Attachment #1.1: Type: text/plain, Size: 3545 bytes --]

Hi Dario and Konrad,

First, thank you very much for your detailed and very useful comments on
this patch! I'm modifying it based on your comments now. (I'm sorry I
didn't reply promptly; I have been traveling these days and cannot always
get network access. :-()

Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> wrote on Wednesday, 16 July 2014:

> On Fri, Jul 11, 2014 at 05:48:17PM +0200, Dario Faggioli wrote:
> > On ven, 2014-07-11 at 16:40 +0100, Andrew Cooper wrote:
> > > On 11/07/14 16:21, Dario Faggioli wrote:
> >
> > > > So, ideas? RTDS (as per Real-Time Deferrable-Server)? RRESDS
> (Resource
> > > > Reservation Deferrable Server)? Just DS (Deferrable Server)?
> > > >
> > > > I'm not sure I like much any of these...
> > > >
> > > > What was it that you had in mind when you said "there are several
> > > > classes of realtime schedulers employing different semantics and
> > > > parameters" ?
> >
> > > Hmm - naming things is difficult.  How about bob?
> > >
> > Well, if we go that route, I prefer Alice! :-D :-D
> >
> > > My concern is that naming it "rt" seems too generic, and will cause
> > > confusion of a) what type of realtime scheduler it is, and b) further
> > > problems if someone wants to come and implement a different realtime
> > > scheduler in the future.
> > >
> > Totally understood and agreed.
> >
> > > XEN_SCHEDULER_RT_DS doesn't sound too bad.
> > >
> > It does not, indeed. Let's hear Meng, Sisu, and the other xen-devel
> > folks...
>
> DS9 :-)
>
> XEN_SCHEDULER_RT_PGEDF ?
>
> And then we can also have
>  XEN_SCHEDULER_RT_CBS ?
>
>
I see your concerns about the name here.
The strength of the names you proposed, such as XEN_SCHEDULER_RT_PGEDF or
XEN_SCHEDULER_RT_DS, is that they clearly state the characteristics of this
scheduler.

However, if people want to implement a new server mechanism for the EDF
scheduling policy, they actually don't have to implement a fresh new one
and add new sched_*.c files. They only need to modify the budget_update()
and burn_budget() functions based on the new server mechanism. (Actually,
we can make this scheduler more generic to make it easier to add a new
server mechanism just as the schedule.c does.)
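
As a rough illustration of the split Meng describes, here is a minimal
sketch of deferrable-server style budget_update()/burn_budget() helpers.
The two function names come from the thread; the structure and the bodies
are assumptions for illustration only, not the RFC's code.

#include <stdint.h>

struct rt_vcpu_sketch {
    uint64_t period;         /* in us */
    uint64_t budget;         /* in us */
    uint64_t cur_budget;     /* budget left in the current period  */
    uint64_t cur_deadline;   /* end of the current period, in us   */
    uint64_t last_start;     /* when the VCPU last started running */
};

/* Replenish at period boundaries: full budget back, deadline pushed on
 * by one period; unused budget from the old period is discarded. */
static void budget_update(struct rt_vcpu_sketch *svc, uint64_t now)
{
    while ( now >= svc->cur_deadline )
    {
        svc->cur_deadline += svc->period;
        svc->cur_budget = svc->budget;
    }
}

/* Deferrable server: budget is burned only for time actually spent
 * running (assumes last_start was set when the VCPU was scheduled in);
 * an idle VCPU keeps whatever budget it has until the period ends. */
static void burn_budget(struct rt_vcpu_sketch *svc, uint64_t now)
{
    uint64_t ran = now - svc->last_start;

    svc->cur_budget = (ran >= svc->cur_budget) ? 0 : svc->cur_budget - ran;
    svc->last_start = now;
}

A different server mechanism (e.g. a periodic or constant bandwidth
server) would change only how these two helpers treat idle time and
replenishment, which is the point about not needing a new sched_*.c file.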

In addition, if people want to implement a new real-time scheduling policy,
like Rate Monotonic scheduling, they only need to modify a few lines in
sched_rt.c. (We actually had Rate Monotonic scheduling already, but
only extracted the EDF variant for Xen right now to avoid causing
unnecessary confusion.)

So adding new server mechanisms and new scheduling policies (i.e.,
priority schemes) only requires modifying several functions in sched_rt.c,
instead of creating a fresh new file and scheduler. If we use too specific a
name, like XEN_SCHEDULER_RT_DS, we will have to change the name in
the future when we extend the scheduler. Of course, we could also
implement a fresh new scheduler, such as XEN_SCHEDULER_RT_CBS or
XEN_SCHEDULER_RT_PS (periodic server), in a brand new file, but it's
actually unnecessary to introduce a new file.

Of course, I'm not against using a more specific name, such as
XEN_SCHEDULER_RT_DS. What concerns me is: when we implement a new server
mechanism or a new scheduling policy inside the current sched_rt.c, the
too-specific name will become inappropriate and we will have to change the
name again. :-)

Thanks,

Meng


-- 


-----------
Meng Xu
PhD Student in Computer and Information Science
University of Pennsylvania

[-- Attachment #1.2: Type: text/html, Size: 4233 bytes --]

[-- Attachment #2: Type: text/plain, Size: 126 bytes --]

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [PATCH RFC v1 1/4] rt: Add rt scheduler to hypervisor
  2014-07-17 10:12             ` Meng Xu
@ 2014-07-17 15:12               ` Dario Faggioli
  2014-07-18  5:46                 ` Meng Xu
  2014-07-18 18:40               ` Konrad Rzeszutek Wilk
  1 sibling, 1 reply; 50+ messages in thread
From: Dario Faggioli @ 2014-07-17 15:12 UTC (permalink / raw)
  To: Meng Xu
  Cc: ian.campbell, xisisu, stefano.stabellini, george.dunlap,
	Andrew Cooper, ian.jackson, xen-devel, Meng Xu, lichong659,
	dgolomb


[-- Attachment #1.1: Type: text/plain, Size: 4585 bytes --]

On gio, 2014-07-17 at 06:12 -0400, Meng Xu wrote:

> Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> wrote on Wednesday, 16 July 2014:

>         DS9 :-)
>         
>         XEN_SCHEDULER_RT_PGEDF ?
>         
>         And then we can also have
>          XEN_SCHEDULER_RT_CBS ?
>         
> 
> 
> I see your concerns about the name here. 
> The strength of the name you proposed, such as  XEN_SCHEDULER_RT_PGEDF
> or XEN_SCHEDULER_RT_DS, is that it clearly states the characteristics
> of this scheduler. 
> 
Agreed, and I'd go for XEN_SCHEDULER_RT_DS. It means people have to learn
at least the basics of what _D_eferrable _S_erver means (which in turn
means we must provide decent documentation! :-P), but if we want to go
for being specific, let's go there at full blast :-)

(and being specific means mentioning the actual algorithm, while GEDF is
sort of a 'basis', that many algorithms can share, and I don't think I
got what the 'P' in PGEDF stands for... did you perhaps mean
'PG'='Partitioned & Global'?)

> However, if people want to implement a new server mechanism for the
> EDF scheduling policy, they actually don't have to implement a fresh
> new one and add new sched_*.c files. They only need to modify the
> budget_update() and burn_budget() functions based on the new server
> mechanism. (Actually, we can make this scheduler more generic to make
> it easier to add a new server mechanism just as the schedule.c does.) 
>
> In addition, if people want to implement a new real-time scheduling
> policy, like Rate Monotonic scheduling, they only need to modify a few
> lines in sched_rt.c. (We actually had the Rate Monotonic scheduling
> already, but only extract the EDF one for Xen right now to avoid
> causing unnecessary confusion.)  
>
> So adding new server mechanisms and adding new scheduling policy
> (i.e., priority schemes) only requires modify several functions in
> sched_rt.c, instead of creating a fresh new file and scheduler. If we
> use the too specific name, like XEN_SCHEDULER_RT_DS, we will have to
> modify the name in the future when we extend the scheduler.  Of
> course, we could also implement a fresh new scheduler, such as
> XEN_SCHEDULER_RT_CBS, or XEN_SCHEDULER_RT_PS (periodic server), in a
> brand new file, but it's actually unnecessary to introduce a new
> file. 
> 
This is all true. However, I think I agree with Andrew and Konrad about
the need of being a bit more specific with naming.

Probably, I'd distinguish between the name of the source file and the name
of the scheduling policy(ies). In fact, as you say, sched_rt.c is probably
fine, as one may just implement new/different things by hacking its
content, without the need to add a new one.

At that point, you can well introduce, say, XEN_SCHEDULER_RT_CBS, which,
as far as the implementation is concerned, only modifies a function or
two in sched_rt.c, but from the user's perspective, it's a different
scheduling policy, and it makes sense for it to have a different name.

Both (or more, if we like) scheduling policies may well even coexist,
share most of the code in sched_rt.c, and yet be two different
scheduling policies...

> Of course, I'm not against using the more specific name, such as
> XEN_SCHEDULER_RT_DS. What I'm concerned is: when we implement a new
> server mechanism or new scheduling policy inside the current
> sched_rt.c, the too specific name will become impropriate and we will
> have to change name again. :-)
> 
As I said, it's always going to be an RT related scheduling policy so,
if the algorithm will be similar enough, it could be a first class
citizen inside sched_rt.c.

If we want to have more than a scheduling policy, we *need* names (of
the policies, not of the source file) to be specific.

The only downside of the specific name is if, at some point, we want to
_replace_ the DS algorithm with something else, rather than just add to it.
But even in that case, I don't think there will be many issues: at the
Xen level (not many compatibility requirements) we can just kill the old
and introduce the new. At the libxl level, we'll handle API stability in
the usual ways.

So, yes, I think I'm all for XEN_SCHEDULER_RT_DS implemented in
xen/common/sched_rt.c.
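
In terms of code, the rename converged on here amounts to a one-line
change against the scheduler-ID hunk quoted earlier in the thread; shown
purely for illustration, the final name and value are whatever ends up
being committed:

 #define XEN_SCHEDULER_CREDIT    5
 #define XEN_SCHEDULER_CREDIT2   6
 #define XEN_SCHEDULER_ARINC653  7
+#define XEN_SCHEDULER_RT_DS     8   /* EDF + Deferrable Server, in sched_rt.c */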

Regards,
Dario

-- 
<<This happens because I choose it to happen!>> (Raistlin Majere)
-----------------------------------------------------------------
Dario Faggioli, Ph.D, http://about.me/dario.faggioli
Senior Software Engineer, Citrix Systems R&D Ltd., Cambridge (UK)


[-- Attachment #1.2: This is a digitally signed message part --]
[-- Type: application/pgp-signature, Size: 181 bytes --]

[-- Attachment #2: Type: text/plain, Size: 126 bytes --]

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [PATCH RFC v1 4/4] libxc for rt scheduler
  2014-07-11 14:49   ` Dario Faggioli
  2014-07-11 16:23     ` Meng Xu
@ 2014-07-17 15:29     ` Ian Campbell
  2014-07-17 15:34       ` George Dunlap
  1 sibling, 1 reply; 50+ messages in thread
From: Ian Campbell @ 2014-07-17 15:29 UTC (permalink / raw)
  To: Dario Faggioli
  Cc: xisisu, stefano.stabellini, george.dunlap, ian.jackson,
	xen-devel, xumengpanda, Meng Xu, lichong659, dgolomb

On Fri, 2014-07-11 at 16:49 +0200, Dario Faggioli wrote:

> So, the bouncing logic seems fine. Looking at what other schedulers do,
> there should be no particular need for bouncing anything.

Seems like there is some confusing precedent around the use of sysctl vs
domctl for sched parameters here. 

Most schedulers use domctl but arinc uses sysctl.

All the existing domctl variants use the "inline parameters" scheme
which you quoted, whereas arinc+sysctl uses a pointer to a struct and
bouncing.

This new interface seems to combine the two and uses a domctl with a
pointer. I think it should follow either the existing sysctl or domctl,
not mix the two forms.

Also looking at the new interface added in the first patch I don't think
that array of XEN_LEGACY_MAX_VCPUS elements is going to be sufficient.

Ian.

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [PATCH RFC v1 3/4] libxl for rt scheduler
  2014-07-11 15:08   ` Dario Faggioli
  2014-07-12 18:16     ` Meng Xu
@ 2014-07-17 15:34     ` Ian Campbell
  1 sibling, 0 replies; 50+ messages in thread
From: Ian Campbell @ 2014-07-17 15:34 UTC (permalink / raw)
  To: Dario Faggioli
  Cc: xisisu, stefano.stabellini, george.dunlap, ian.jackson,
	xen-devel, xumengpanda, Meng Xu, lichong659, dgolomb

On Fri, 2014-07-11 at 17:08 +0200, Dario Faggioli wrote:

> > +    if ( !scinfo->rt.vcpus ){
> > +        LOGE(ERROR, "Allocate lib_vcpu array fails\n");
> > +        return ERROR_INVAL;
> > +    }
> > +    scinfo->rt.num_vcpus = sdom.num_vcpus;
> > +    for( i = 0; i < sdom.num_vcpus; ++i)
> > +    {
> > +        scinfo->rt.vcpus[i].period = sdom.vcpus[i].period;
> > +        scinfo->rt.vcpus[i].budget = sdom.vcpus[i].budget;
> > +    }
> > +
> > +    return 0;
> > +}
> > +
> > +#define SCHED_RT_VCPU_PARAMS_MAX    4294967295 /* 2^32-1 */
> > +
> I don't like the name. I'd go for something like
> SCHED_RT_VCPU_PERIOD_MAX and SCHED_RT_VCPU_BUDGET_MAX.
> 
> You well can define both to the same value, of course.

If you want the largest unsigned which can be represented in 32-bits
then please use UINT32_MAX from stdint.h

Ian.
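
A small sketch of how the range check could look with Ian's suggestion.
The macro names follow Dario's earlier proposal (SCHED_RT_VCPU_PERIOD_MAX
/ SCHED_RT_VCPU_BUDGET_MAX); the helper and the budget-within-period rule
are illustrative assumptions, not the RFC's code.

#include <stdint.h>

#define SCHED_RT_VCPU_PERIOD_MAX  UINT32_MAX   /* microseconds */
#define SCHED_RT_VCPU_BUDGET_MAX  UINT32_MAX   /* microseconds */

/* Returns 0 if the pair is acceptable, 1 otherwise. */
static int sched_rt_vcpu_check(uint64_t period, uint64_t budget)
{
    if (period == 0 || period > SCHED_RT_VCPU_PERIOD_MAX)
        return 1;
    if (budget == 0 || budget > SCHED_RT_VCPU_BUDGET_MAX || budget > period)
        return 1;
    return 0;
}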

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [PATCH RFC v1 4/4] libxc for rt scheduler
  2014-07-17 15:29     ` Ian Campbell
@ 2014-07-17 15:34       ` George Dunlap
  2014-07-17 22:16         ` Meng Xu
  2014-07-18  9:47         ` Ian Campbell
  0 siblings, 2 replies; 50+ messages in thread
From: George Dunlap @ 2014-07-17 15:34 UTC (permalink / raw)
  To: Ian Campbell
  Cc: Sisu Xi, Stefano Stabellini, Dario Faggioli, Ian Jackson,
	xen-devel, Meng Xu, Meng Xu, lichong659, dgolomb

On Thu, Jul 17, 2014 at 4:29 PM, Ian Campbell <Ian.Campbell@citrix.com> wrote:
> On Fri, 2014-07-11 at 16:49 +0200, Dario Faggioli wrote:
>
>> So, the bouncing logic seems fine. Looking at what other schedulers do,
>> there should be no particular need for bouncing anything.
>
> Seems like there is some confusing precedent around the use of sysctl vs
> domctl for sched parameters here.
>
> Most schedulers use domctl but arinc uses sysctl.

They're controlling different things.

domctl is controlling parameters related to *a particular domain* (for
instance, weight or cap); sysctl relates to setting parameters
*for the scheduler as a whole* (for example, timeslice).

 -George

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [PATCH RFC v1 3/4] libxl for rt scheduler
  2014-07-11  4:49 ` [PATCH RFC v1 3/4] libxl " Meng Xu
  2014-07-11 11:05   ` Wei Liu
  2014-07-11 15:08   ` Dario Faggioli
@ 2014-07-17 15:36   ` Ian Campbell
  2014-07-18 11:05     ` Meng Xu
  2 siblings, 1 reply; 50+ messages in thread
From: Ian Campbell @ 2014-07-17 15:36 UTC (permalink / raw)
  To: Meng Xu
  Cc: xisisu, stefano.stabellini, george.dunlap, dario.faggioli,
	ian.jackson, xen-devel, xumengpanda, lichong659, dgolomb

On Fri, 2014-07-11 at 00:49 -0400, Meng Xu wrote:

> +/* Consistent with XEN_LEGACY_MAX_VCPUS xen/arch-x86/xen.h*/
> +#define LIBXL_XEN_LEGACY_MAX_VCPUS                  32 

As the name suggests this limit is legacy. Modern Xen and modern guests
can support considerably more VCPUs than this.

You need to base your interface on a dynamically sized/allocated array.
Perhaps this needs to be carried down all the way to the domctl
interface and into Xen, I haven't looked.

Ian.
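
For illustration, one way to size the array dynamically in libxl is to
query the domain's VCPU count first and allocate exactly that many
entries. libxl_domain_info() and its vcpu_max_id field exist in libxl
today; the rt.vcpus/rt.num_vcpus fields follow the RFC, and the helper
name here is made up for the sketch.

static int sched_rt_alloc_vcpus(libxl__gc *gc, uint32_t domid,
                                libxl_domain_sched_params *scinfo)
{
    libxl_dominfo info;
    int rc;

    libxl_dominfo_init(&info);
    rc = libxl_domain_info(CTX, &info, domid);
    if (rc)
        goto out;

    /* Allocate one entry per VCPU the domain actually has. */
    scinfo->rt.num_vcpus = info.vcpu_max_id + 1;
    scinfo->rt.vcpus = libxl__calloc(gc, scinfo->rt.num_vcpus,
                                     sizeof(*scinfo->rt.vcpus));
 out:
    libxl_dominfo_dispose(&info);
    return rc;
}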

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [PATCH RFC v1 2/4] xl for rt scheduler
  2014-07-13 12:58         ` Meng Xu
  2014-07-14  7:40           ` Dario Faggioli
  2014-07-14  9:31           ` Wei Liu
@ 2014-07-17 15:39           ` Ian Campbell
  2 siblings, 0 replies; 50+ messages in thread
From: Ian Campbell @ 2014-07-17 15:39 UTC (permalink / raw)
  To: Meng Xu
  Cc: Wei Liu, Sisu Xi, Stefano Stabellini, George Dunlap,
	Dario Faggioli, Ian Jackson, xen-devel, Meng Xu, Chong Li,
	Dagaen Golomb

On Sun, 2014-07-13 at 08:58 -0400, Meng Xu wrote:

>         Look around, both in xl and libxl, you'll find plenty of
>         examples of
>         that.
>         
>         > We actually followed the way credit scheduler does
>         > in main_sched_credit(int argc, char **argv)
>         >
>         >
>         >         } else { /* set credit scheduler paramaters */
>         >             libxl_domain_sched_params scinfo;
>         >             libxl_domain_sched_params_init(&scinfo);
>         >             scinfo.sched = LIBXL_SCHEDULER_CREDIT;
>         >             if (opt_w)
>         >                 scinfo.weight = weight;
>         >             if (opt_c)
>         >                 scinfo.cap = cap;
>         >             rc = sched_domain_set(domid, &scinfo);
>         >             libxl_domain_sched_params_dispose(&scinfo);
>         >
>         And in fact, here's dispose! :-)
>         
> 
> 
> I think you are right! But I think the implementation of this
> functionality for credit scheduler also has the exactly same issue:
> scinfo will not be disposed when hypercall returns false. 

Yes, there's quite a lot of code in libxl and xl which is not correct in
this area but which happens to be "OK" because the struct in question
has no dynamically allocated members.

So this is a latent bug since if we were to add a dynamic member to the
struct all this code would become immediately buggy. We should try to
stamp these out as we notice them (as we have done here).

Ian.

^ permalink raw reply	[flat|nested] 50+ messages in thread

* [PATCH RFC v1 4/4] libxc for rt scheduler
  2014-07-17 15:34       ` George Dunlap
@ 2014-07-17 22:16         ` Meng Xu
  2014-07-18  9:49           ` Dario Faggioli
  2014-07-18  9:51           ` Ian Campbell
  2014-07-18  9:47         ` Ian Campbell
  1 sibling, 2 replies; 50+ messages in thread
From: Meng Xu @ 2014-07-17 22:16 UTC (permalink / raw)
  To: George Dunlap
  Cc: Ian Campbell, Sisu Xi, Stefano Stabellini, Dario Faggioli,
	Ian Jackson, xen-devel, Meng Xu, lichong659, dgolomb


[-- Attachment #1.1: Type: text/plain, Size: 1530 bytes --]

Hi Ian and George,

George Dunlap <George.Dunlap@eu.citrix.com> wrote on Thursday, 17 July 2014:

> On Thu, Jul 17, 2014 at 4:29 PM, Ian Campbell <Ian.Campbell@citrix.com>
> wrote:
> > On Fri, 2014-07-11 at 16:49 +0200, Dario Faggioli wrote:
> >
> >> So, the bouncing logic seems fine. Looking at what other schedulers do,
> >> there should be no particular need for bouncing anything.
> >
> > Seems like there is some confusing precedent around the use of sysctl vs
> > domctl for sched parameters here.
> >
> > Most schedulers use domctl but arinc uses sysctl.
>
> They're controlling different things.
>
> domctl is controlling parameters related to *a particular domain* (for
> instance, weight or cap); sysctl is relating to setting parameters
> *for the scheduler as a whole* (for example, timeslice).
>
>
Do we have to use inline parameters in domctl?

Right now, we use the domctl to set/get the parameters of each vcpu of a
domain. So, judging from what the functionality does, it belongs in domctl.
If we don't have to use inline parameters, I would prefer to use domctl. :-)

The reason why the ARINC653 scheduler uses sysctl to set parameters is as
George said: it sets the scheduling sequence for the ARINC653 scheduler as
a whole, instead of setting the parameters of a particular domain.

Thank you very much!

Best,

Meng




-- 


-----------
Meng Xu
PhD Student in Computer and Information Science
University of Pennsylvania

[-- Attachment #1.2: Type: text/html, Size: 1986 bytes --]

[-- Attachment #2: Type: text/plain, Size: 126 bytes --]

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [PATCH RFC v1 1/4] rt: Add rt scheduler to hypervisor
  2014-07-17 15:12               ` Dario Faggioli
@ 2014-07-18  5:46                 ` Meng Xu
  0 siblings, 0 replies; 50+ messages in thread
From: Meng Xu @ 2014-07-18  5:46 UTC (permalink / raw)
  To: Dario Faggioli
  Cc: ian.campbell, xisisu, stefano.stabellini, george.dunlap,
	Andrew Cooper, ian.jackson, xen-devel, Meng Xu, lichong659,
	dgolomb


[-- Attachment #1.1: Type: text/plain, Size: 1317 bytes --]

Hi Dario,


> > Of course, I'm not against using the more specific name, such as
> > XEN_SCHEDULER_RT_DS. What I'm concerned is: when we implement a new
> > server mechanism or new scheduling policy inside the current
> > sched_rt.c, the too specific name will become inappropriate and we will
> > have to change the name again. :-)
> >
> As I said, it's always going to be an RT related scheduling policy so,
> if the algorithm will be similar enough, it could be a first class
> citizen inside sched_rt.c.
>
> If we want to have more than a scheduling policy, we *need* names (of
> the policies, not of the source file) to be specific.
>
> The only downside of the specific name is if we at some point will want
> to _replace_, rather than add, the DS algorithm with some other thing.
> But even in that case, I don't think there will be much issues: at the
> Xen level (not much compatibility requirements) we can just kill the old
> and introduce the new. At the libxl level, we'll handle API stability in
> the usual ways.
>
> So, yes, I think I'm all for XEN_SCHEDULER_RT_DS implemented in
> xen/common/sched_rt.c.
>
>
OK. I got it. I changed the name from XEN_SCHEDULER_RT to
XEN_SCHEDULER_RT_DS.

Thanks,

Meng



-- 


-----------
Meng Xu
PhD Student in Computer and Information Science
University of Pennsylvania

[-- Attachment #1.2: Type: text/html, Size: 2365 bytes --]

[-- Attachment #2: Type: text/plain, Size: 126 bytes --]

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [PATCH RFC v1 4/4] libxc for rt scheduler
  2014-07-17 15:34       ` George Dunlap
  2014-07-17 22:16         ` Meng Xu
@ 2014-07-18  9:47         ` Ian Campbell
  2014-07-18 10:00           ` Dario Faggioli
  1 sibling, 1 reply; 50+ messages in thread
From: Ian Campbell @ 2014-07-18  9:47 UTC (permalink / raw)
  To: George Dunlap
  Cc: Sisu Xi, Stefano Stabellini, Dario Faggioli, Ian Jackson,
	xen-devel, Meng Xu, Meng Xu, lichong659, dgolomb

On Thu, 2014-07-17 at 16:34 +0100, George Dunlap wrote:
> On Thu, Jul 17, 2014 at 4:29 PM, Ian Campbell <Ian.Campbell@citrix.com> wrote:
> > On Fri, 2014-07-11 at 16:49 +0200, Dario Faggioli wrote:
> >
> >> So, the bouncing logic seems fine. Looking at what other schedulers do,
> >> there should be no particular need for bouncing anything.
> >
> > Seems like there is some confusing precedent around the use of sysctl vs
> > domctl for sched parameters here.
> >
> > Most schedulers use domctl but arinc uses sysctl.
> 
> They're controlling different things.
> 
> domctl is controlling parameters related to *a particular domain* (for
> instance, weight or cap); sysctl is relating to setting parameters
> *for the scheduler as a whole* (for example, timeslice).

Ah, that makes sense.

It seems that arinc handles some degree of per-dom settings via the
sysctl interface though, since xen_sysctl_arinc653_schedule takes an
array of things which contain domid+vcpuid+params. Perhaps that's
peculiar to the way arinc works though. (Although it seems rather
incorrect to me for the array to be fixed size instead of a guest
handle...)

Let's not repeat that mistake with this new scheduler: if the domctl (or
a sysctl for that matter) needs per-VCPU information then it should IMHO
take the parameters inline in the struct (like the others); anything
dynamic like the per-vcpu stuff should then be a GUEST_HANDLE pointer to
a dynamically sized array (accessed with copy_from_guest_offset et al).
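
For illustration, the consuming side in the hypervisor would then look
something like this (the u.rt field names and the params type here are
hypothetical, just to show the copy_from_guest_offset pattern):

    case XEN_DOMCTL_SCHEDOP_putinfo:
        for ( i = 0; i < op->u.rt.nr_vcpus; i++ )
        {
            xen_domctl_sched_rt_vcpu_params_t local;

            /* Copy the i-th element of the guest-supplied array. */
            if ( copy_from_guest_offset(&local, op->u.rt.vcpu, i, 1) )
                return -EFAULT;

            /* ... validate and apply local.period / local.budget ... */
        }
        break;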

Ian.

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [PATCH RFC v1 4/4] libxc for rt scheduler
  2014-07-17 22:16         ` Meng Xu
@ 2014-07-18  9:49           ` Dario Faggioli
  2014-07-18  9:51           ` Ian Campbell
  1 sibling, 0 replies; 50+ messages in thread
From: Dario Faggioli @ 2014-07-18  9:49 UTC (permalink / raw)
  To: Meng Xu
  Cc: Ian Campbell, Sisu Xi, Stefano Stabellini, George Dunlap,
	Ian Jackson, xen-devel, Meng Xu, lichong659, dgolomb


[-- Attachment #1.1: Type: text/plain, Size: 2618 bytes --]

On gio, 2014-07-17 at 18:16 -0400, Meng Xu wrote:
> Hi Ian and George,
> 
> George Dunlap <George.Dunlap@eu.citrix.com> wrote on Thursday, 17 July 2014:
>         On Thu, Jul 17, 2014 at 4:29 PM, Ian Campbell
>         <Ian.Campbell@citrix.com> wrote:
>         > On Fri, 2014-07-11 at 16:49 +0200, Dario Faggioli wrote:
>         >
>         >> So, the bouncing logic seems fine. Looking at what other
>         schedulers do,
>         >> there should be no particular need for bouncing anything.
>         >
>         > Seems like there is some confusing precedent around the use
>         of sysctl vs
>         > domctl for sched parameters here.
>         >
>         > Most schedulers use domctl but arinc uses sysctl.
>         
>         They're controlling different things.
>         
>         domctl is controlling parameters related to *a particular
>         domain* (for
>         instance, weight or cap); sysctl is relating to setting
>         parameters
>         *for the scheduler as a whole* (for example, timeslice).
>         
>  
> Do we have to use inline parameters in domctl?  
> 
I think you should use inline parameters as much as possible. Of course,
if you need to transfer the parameters for N vcpus (of the same domain)
all at once, you can use an array (i.e., a handle, of course), but only
for those parameters.

So, for instance, you could have something like this:

{
    integer: num_vcpus
    array[]: { time_t: budget, time_t: period }
}

This is for the case where you want to batch the transfer of all the
parameters for all the vcpus in one hcall. BTW, I'm not sure you need
num_vcpus to be there; it depends on whether you already have it available
from another hcall performed before, etc.

For a hypercall that only retrieves the parameters for _one_ _vcpu_, it
should be something like this:

{
    time_t budget
    time_t period
}

(forgive the meta-language)

You don't have to implement the second hypercall if you think we don't
need it for now; it was just an example.
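
In actual C, the first (batched) variant could look something like the
following (type and field names are placeholders, plus the usual
typedef and DEFINE_XEN_GUEST_HANDLE boilerplate):

    struct xen_domctl_sched_rt_vcpu {
        int64_t period;   /* s_time_t */
        int64_t budget;   /* s_time_t */
    };

    struct xen_domctl_sched_rt {
        uint16_t nr_vcpus;
        XEN_GUEST_HANDLE_64(xen_domctl_sched_rt_vcpu_t) vcpus;
    };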

> Right now, we used the domctl to set/get the parameters of each vcpu
> of a domain. So from what the functionality does, it should be in
> domctl.  If we don't have to use inline parameters, I would prefer to
> use domctl. :-)
> 
You shall use domctl! :-)

Regards,
Dario

-- 
<<This happens because I choose it to happen!>> (Raistlin Majere)
-----------------------------------------------------------------
Dario Faggioli, Ph.D, http://about.me/dario.faggioli
Senior Software Engineer, Citrix Systems R&D Ltd., Cambridge (UK)


[-- Attachment #1.2: This is a digitally signed message part --]
[-- Type: application/pgp-signature, Size: 181 bytes --]

[-- Attachment #2: Type: text/plain, Size: 126 bytes --]

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [PATCH RFC v1 4/4] libxc for rt scheduler
  2014-07-17 22:16         ` Meng Xu
  2014-07-18  9:49           ` Dario Faggioli
@ 2014-07-18  9:51           ` Ian Campbell
  2014-07-18 12:11             ` Meng Xu
  1 sibling, 1 reply; 50+ messages in thread
From: Ian Campbell @ 2014-07-18  9:51 UTC (permalink / raw)
  To: Meng Xu
  Cc: Sisu Xi, Stefano Stabellini, George Dunlap, Dario Faggioli,
	Ian Jackson, xen-devel, Meng Xu, lichong659, dgolomb

On Thu, 2014-07-17 at 18:16 -0400, Meng Xu wrote:
> Hi Ian and George,
> 
> George Dunlap <George.Dunlap@eu.citrix.com> wrote on Thursday, 17 July 2014:
>         On Thu, Jul 17, 2014 at 4:29 PM, Ian Campbell
>         <Ian.Campbell@citrix.com> wrote:
>         > On Fri, 2014-07-11 at 16:49 +0200, Dario Faggioli wrote:
>         >
>         >> So, the bouncing logic seems fine. Looking at what other
>         schedulers do,
>         >> there should be no particular need for bouncing anything.
>         >
>         > Seems like there is some confusing precedent around the use
>         of sysctl vs
>         > domctl for sched parameters here.
>         >
>         > Most schedulers use domctl but arinc uses sysctl.
>         
>         They're controlling different things.
>         
>         domctl is controlling parameters related to *a particular
>         domain* (for
>         instance, weight or cap); sysctl is relating to setting
>         parameters
>         *for the scheduler as a whole* (for example, timeslice).
>         
>  
> Do we have to use inline parameters in domctl?  
> 
> 
> Right now, we used the domctl to set/get the parameters of each vcpu
> of a domain. So from what the functionality does, it should be in
> domctl.  If we don't have to use inline parameters, I would prefer to
> use domctl. :-)

The important thing is that the array of per-VCPU parameters needs to be
a GUEST_HANDLE and not a statically sized array with some arbitrary
size.

AFAICT that means that the xen_domctl_scheduler_op change should be:
+/*
+ * This structure is used to pass to rt scheduler from a 
+ * privileged domain to Xen
+ */
+struct xen_domctl_sched_rt_vcpus_params {
+    signed long period; /* s_time_t type */
+    signed long budget; 
+};
+typedef struct xen_domctl_sched_rt...
+DEFINE_XEN_GUE....

@@ -367,6 +383,9 @@ struct xen_domctl_scheduler_op {
         struct xen_domctl_sched_credit2 {
             uint16_t weight;
         } credit2;
+        struct xen_domctl_sched_rt {
+            type some_per_domain_parameter;
+            uint16_t nr_vcpus;
+            uint16_t vcpu_index;
+            XEN_GUEST_HANDLE_64(xen_domctl_sched_rt_vcpu_params_t) vcpu;
+        } rt;
     } u;

Currently you don't appear to have any per_domain_parameters, but I have
included one as an illustration.

Ian.



_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [PATCH RFC v1 4/4] libxc for rt scheduler
  2014-07-18  9:47         ` Ian Campbell
@ 2014-07-18 10:00           ` Dario Faggioli
  0 siblings, 0 replies; 50+ messages in thread
From: Dario Faggioli @ 2014-07-18 10:00 UTC (permalink / raw)
  To: Ian Campbell
  Cc: Sisu Xi, Stefano Stabellini, George Dunlap, Ian Jackson,
	xen-devel, Meng Xu, Meng Xu, lichong659, dgolomb


[-- Attachment #1.1: Type: text/plain, Size: 1263 bytes --]

On ven, 2014-07-18 at 10:47 +0100, Ian Campbell wrote:

> Lets not repeat that mistake with this new scheduler, if the domctl (or
> a sysctl for that matter) needs per VCPU information then it should IMHO
> take the parameters inline in the struct (like the others), anything
> dynamic like the per-vcpu stuff should then be a GUEST_HANDLE pointer to
> a dynamically sized array (accessed with copy_from_guest_offset et al).
> 
Indeed.

There is one other subtle difference between the new scheduler Meng is
working on and the existing ones (even leaving ARINC out).

In fact, SEDF, Credit and Credit2 only allow per-domain parameters, while
Meng's scheduler supports per-vcpu parameters, and hence he needs a guest
handle to deal with the array of these per-vcpu parameters.

BTW, while writing this, I saw your
<1405677108.13883.9.camel@kazak.uk.xensource.com>, which goes straight
to the point I was trying to make, and with which I totally concur. :-)

Regards,
Dario

-- 
<<This happens because I choose it to happen!>> (Raistlin Majere)
-----------------------------------------------------------------
Dario Faggioli, Ph.D, http://about.me/dario.faggioli
Senior Software Engineer, Citrix Systems R&D Ltd., Cambridge (UK)


[-- Attachment #1.2: This is a digitally signed message part --]
[-- Type: application/pgp-signature, Size: 181 bytes --]

[-- Attachment #2: Type: text/plain, Size: 126 bytes --]

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [PATCH RFC v1 3/4] libxl for rt scheduler
  2014-07-17 15:36   ` Ian Campbell
@ 2014-07-18 11:05     ` Meng Xu
  0 siblings, 0 replies; 50+ messages in thread
From: Meng Xu @ 2014-07-18 11:05 UTC (permalink / raw)
  To: Ian Campbell
  Cc: xisisu, stefano.stabellini, george.dunlap, dario.faggioli,
	ian.jackson, xen-devel, Meng Xu, lichong659, dgolomb


[-- Attachment #1.1: Type: text/plain, Size: 865 bytes --]

Hi Ian,

Ian Campbell <Ian.Campbell@citrix.com> wrote on Thursday, 17 July 2014:

> On Fri, 2014-07-11 at 00:49 -0400, Meng Xu wrote:
>
> > +/* Consistent with XEN_LEGACY_MAX_VCPUS xen/arch-x86/xen.h*/
> > +#define LIBXL_XEN_LEGACY_MAX_VCPUS                  32
>
> As the name suggests this limit is legacy. Modern Xen and modern guests
> can support considerably more VCPUs than this.
>
> You need to base your interface on a dynamically sized/allocated array.
> Perhaps this needs to be carried down all the way to the domctl
> interface and into Xen, I haven't looked.
>

Yes, you are right! I'm going to dynamically allocate the array, instead
of using LIBXL_XEN_LEGACY_MAX_VCPUS, in the next version.

Thanks,

Meng



-- 


-----------
Meng Xu
PhD Student in Computer and Information Science
University of Pennsylvania

[-- Attachment #1.2: Type: text/html, Size: 1204 bytes --]

[-- Attachment #2: Type: text/plain, Size: 126 bytes --]

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [PATCH RFC v1 4/4] libxc for rt scheduler
  2014-07-18  9:51           ` Ian Campbell
@ 2014-07-18 12:11             ` Meng Xu
  0 siblings, 0 replies; 50+ messages in thread
From: Meng Xu @ 2014-07-18 12:11 UTC (permalink / raw)
  To: Ian Campbell
  Cc: Sisu Xi, Stefano Stabellini, George Dunlap, Dario Faggioli,
	Ian Jackson, xen-devel, Meng Xu, lichong659, dgolomb


[-- Attachment #1.1: Type: text/plain, Size: 2904 bytes --]

Hi Ian,

Ian Campbell <Ian.Campbell@citrix.com> wrote on Friday, 18 July 2014:

> On Thu, 2014-07-17 at 18:16 -0400, Meng Xu wrote:
> > Hi Ian and George,
> >
> > George Dunlap <George.Dunlap@eu.citrix.com> wrote on Thursday, 17 July 2014:
> >         On Thu, Jul 17, 2014 at 4:29 PM, Ian Campbell
> >         <Ian.Campbell@citrix.com> wrote:
> >         > On Fri, 2014-07-11 at 16:49 +0200, Dario Faggioli wrote:
> >         >
> >         >> So, the bouncing logic seems fine. Looking at what other
> >         schedulers do,
> >         >> there should be no particular need for bouncing anything.
> >         >
> >         > Seems like there is some confusing precedent around the use
> >         of sysctl vs
> >         > domctl for sched parameters here.
> >         >
> >         > Most schedulers use domctl but arinc uses sysctl.
> >
> >         They're controlling different things.
> >
> >         domctl is controlling parameters related to *a particular
> >         domain* (for
> >         instance, weight or cap); sysctl is relating to setting
> >         parameters
> >         *for the scheduler as a whole* (for example, timeslice).
> >
> >
> > Do we have to use inline parameters in domctl?
> >
> >
> > Right now, we used the domctl to set/get the parameters of each vcpu
> > of a domain. So from what the functionality does, it should be in
> > domctl.  If we don't have to use inline parameters, I would prefer to
> > use domctl. :-)
>
> The important thing is that the array of per-VCPU parameters needs to be
> a GUEST_HANDLE and not a statically sized array with some arbitrary
> size.
>
> AFAICT that means that the xen_domctl_shceduler_op change should be:
> +/*
> + * This structure is used to pass to rt scheduler from a
> + * privileged domain to Xen
> + */
> +struct xen_domctl_sched_rt_vcpus_params {
> +    signed long period; /* s_time_t type */
> +    signed long budget;
> +};
> +typedef struct xen_domctl_sched_rt...
> +DEFINE_XEN_GUE....
>
> @@ -367,6 +383,9 @@ struct xen_domctl_scheduler_op {
>          struct xen_domctl_sched_credit2 {
>              uint16_t weight;
>          } credit2;
> +        struct xen_domctl_sched_rt {
> +            type some_per_domain_parameter;
> +            uint16_t nr_vcpus;
> +            uint16_t vcpu_index;
> +            XEN_GUEST_HANDLE_64(xen_domctl_sched_rt_vcpu_params_t) vcpu;
> +        } rt;
>      } u;
>
> Currently you don't appear to have any per_domain_parameters, but I have
> included one as an illustration.
>
>
Thank you so much for your detailed explanation! This is exactly what I'm
doing right now. :-) It will be in the next version of the patch. :-)
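
For the libxc side, here is a rough (untested) sketch of what I have in
mind, following the existing xc_sched_credit_domain_set() pattern; the
function name and the u.rt fields are tentative and simply mirror the
illustration above:

int xc_sched_rt_domain_set(xc_interface *xch, uint32_t domid,
                           xen_domctl_sched_rt_vcpu_params_t *vcpus,
                           uint16_t nr_vcpus)
{
    int rc;
    DECLARE_DOMCTL;
    DECLARE_HYPERCALL_BOUNCE(vcpus, nr_vcpus * sizeof(*vcpus),
                             XC_HYPERCALL_BUFFER_BOUNCE_IN);

    if ( xc_hypercall_bounce_pre(xch, vcpus) )
        return -1;

    domctl.cmd = XEN_DOMCTL_scheduler_op;
    domctl.domain = (domid_t)domid;
    domctl.u.scheduler_op.sched_id = XEN_SCHEDULER_RT_DS;
    domctl.u.scheduler_op.cmd = XEN_DOMCTL_SCHEDOP_putinfo;
    domctl.u.scheduler_op.u.rt.nr_vcpus = nr_vcpus;
    set_xen_guest_handle(domctl.u.scheduler_op.u.rt.vcpu, vcpus);

    rc = do_domctl(xch, &domctl);

    xc_hypercall_bounce_post(xch, vcpus);
    return rc;
}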

Best,

Meng




-- 


-----------
Meng Xu
PhD Student in Computer and Information Science
University of Pennsylvania

[-- Attachment #1.2: Type: text/html, Size: 3801 bytes --]

[-- Attachment #2: Type: text/plain, Size: 126 bytes --]

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [PATCH RFC v1 1/4] rt: Add rt scheduler to hypervisor
  2014-07-17 10:12             ` Meng Xu
  2014-07-17 15:12               ` Dario Faggioli
@ 2014-07-18 18:40               ` Konrad Rzeszutek Wilk
  1 sibling, 0 replies; 50+ messages in thread
From: Konrad Rzeszutek Wilk @ 2014-07-18 18:40 UTC (permalink / raw)
  To: Meng Xu, josh.whitehead, robert.vanvossen, Paul.Skentzos,
	Steve.VanderLeest
  Cc: ian.campbell, xisisu, stefano.stabellini, george.dunlap,
	Andrew Cooper, Dario Faggioli, ian.jackson, xen-devel, Meng Xu,
	lichong659, dgolomb

On Thu, Jul 17, 2014 at 06:12:22AM -0400, Meng Xu wrote:
> Hi Dario and Konrad,
> 
> First, thank you very much for your detailed and very useful comments on
> this patch! I'm modifying it based on your comments now. (I'm sorry I
> didn't reply promptly because I was traveling these days and could not
> always get access to the network. :-()

Hey,

CCing some of the DornerWorks folks.
> 
> Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> wrote on Wednesday, 16 July 2014:
> 
> > On Fri, Jul 11, 2014 at 05:48:17PM +0200, Dario Faggioli wrote:
> > > On ven, 2014-07-11 at 16:40 +0100, Andrew Cooper wrote:
> > > > On 11/07/14 16:21, Dario Faggioli wrote:
> > >
> > > > > So, ideas? RTDS (as per Real-Time Deferrable-Server)? RRESDS
> > (Resource
> > > > > Reservation Deferrable Server)? Just DS (Deferrable Server)?
> > > > >
> > > > > I'm not sure I like much any of these...
> > > > >
> > > > > What was it that you had in mind when you said "there are several
> > > > > classes of realtime schedulers employing different semantics and
> > > > > parameters" ?
> > >
> > > > Hmm - naming things is difficult.  How about bob?
> > > >
> > > Well, if we go that route, I prefer Alice! :-D :-D
> > >
> > > > My concern is that naming it "rt" seems too generic, and will cause
> > > > confusion of a) what type of realtime scheduler it is, and b) further
> > > > problems if someone wants to come and implement a different realtime
> > > > scheduler in the future.
> > > >
> > > Totally understood and agreed.
> > >
> > > > XEN_SCHEDULER_RT_DS doesn't sound too bad.
> > > >
> > > It does not, indeed. Let's hear Meng, Sisu, and the other xen-devel
> > > folks...
> >
> > DS9 :-)
> >
> > XEN_SCHEDULER_RT_PGEDF ?
> >
> > And then we can also have
> >  XEN_SCHEDULER_RT_CBS ?
> >
> >
> I see your concerns about the name here.
> The strength of the name you proposed, such as  XEN_SCHEDULER_RT_PGEDF or
> XEN_SCHEDULER_RT_DS, is that it clearly states the characteristics of this
> scheduler.
> 
> However, if people want to implement a new server mechanism for the EDF
> scheduling policy, they actually don't have to implement a fresh new one
> and add new sched_*.c files. They only need to modify the budget_update()
> and burn_budget() functions based on the new server mechanism. (Actually,
> we can make this scheduler more generic to make it easier to add a new
> server mechanism just as the schedule.c does.)
> 
> In addition, if people want to implement a new real-time scheduling policy,
> like Rate Monotonic scheduling, they only need to modify a few lines in
> sched_rt.c. (We actually had the Rate Monotonic scheduling already, but
> only extract the EDF one for Xen right now to avoid causing unnecessary
> confusion.)
> 
> So adding new server mechanisms and adding new scheduling policies (i.e.,
> priority schemes) only requires modifying several functions in sched_rt.c,
> instead of creating a fresh new file and scheduler. If we use the too
> specific name, like XEN_SCHEDULER_RT_DS, we will have to modify the name in
> the future when we extend the scheduler.  Of course, we could also
> implement a fresh new scheduler, such as XEN_SCHEDULER_RT_CBS, or
> XEN_SCHEDULER_RT_PS (periodic server), in a brand new file, but it's
> actually unnecessary to introduce a new file.

Joshua, Robert, et. al,

Would it be feasible to rebase the core changes on top of this
file instead of using the SEDF?
> 
> Of course, I'm not against using the more specific name, such as
> XEN_SCHEDULER_RT_DS. What I'm concerned is: when we implement a new server
> mechanism or new scheduling policy inside the current sched_rt.c, the too
> specific name will become inappropriate and we will have to change the
> name again. :-)

Right.

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Introduce rt real-time scheduler for Xen
@ 2014-09-07 19:40 Meng Xu
  0 siblings, 0 replies; 50+ messages in thread
From: Meng Xu @ 2014-09-07 19:40 UTC (permalink / raw)
  To: xen-devel
  Cc: ian.campbell, xisisu, stefano.stabellini, george.dunlap, lu,
	dario.faggioli, ian.jackson, ptxlinh, xumengpanda, JBeulich,
	chaowang, lichong659, dgolomb

This serie of patches adds rt real-time scheduler to Xen.

In summary, It supports:
1) Preemptive Global Earliest Deadline First scheduling policy by using a global RunQ for the scheduler;
2) Assign/display VCPUs' parameters of each domain (All VCPUs of each domain have the same period and budget);
3) Supports CPU Pool
Note: 
a) Although the toolstack only allows users to set the parameters of all VCPUs of the same domain to the same value, the scheduler itself supports scheduling VCPUs of the same domain with different parameters. In Xen 4.6, we plan to support assigning/displaying each VCPU's parameters of each domain.
b) Parameters of a domain at the toolstack level are in microseconds, instead of milliseconds.

Compared with the PATCH v1, this set of patches has the following modifications:
    a) The toolstack only allows users to set the parameters of all VCPUs of the same domain to the same value; the toolstack only displays the VCPUs' period and budget of a domain. (In PATCH v1, the toolstack could assign/display each VCPU's parameters of each domain, but because it is hard to reach an agreement on the libxl interface for this functionality, we decided to delay this functionality to Xen 4.6, after the scheduler is merged into Xen 4.5.)
    b) Miscellaneous modifications of the scheduler in sched_rt.c based on Dario's detailed comments.
    c) Code style corrections in libxl.

-----------------------------------------------------------------------------------------------------------------------------
TODO after Xen 4.5:
    a) Burn budget in a finer granularity than 1ms; [medium]
    b) Use a separate timer per vcpu for each vcpu's budget replenishment, instead of scanning the full runqueue every now and then; [medium]
    c) Handle time stolen from domU by the hypervisor. When it runs on a machine with many sockets and lots of cores, the spin-lock for the global RunQ used in the rt scheduler could eat up time from domU, which could leave domU with less budget than it requires. [not sure about the difficulty right now] (Thanks to Konrad Rzeszutek Wilk for pointing this out at the Xen Summit. :-))
    d) Toolstack support for assigning/displaying each VCPU's parameters of each domain.

-----------------------------------------------------------------------------------------------------------------------------
The design of this rt scheduler is as follows:
This scheduler follows the Preemptive Global Earliest Deadline First (EDF) theory in the real-time field.
At any scheduling point, the VCPU with the earlier deadline has higher priority. The scheduler always picks the highest-priority VCPU to run on a feasible PCPU.
A PCPU is feasible for a VCPU if the VCPU is allowed to run on that PCPU and the PCPU is either idle or running a lower-priority VCPU.
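
An illustrative sketch of that feasibility test (helper and field names
here are assumptions, not the actual sched_rt.c code):

    static int is_feasible_pcpu(const struct rt_vcpu *svc, unsigned int cpu)
    {
        struct vcpu *cur = curr_on_cpu(cpu);

        /* The VCPU must be allowed to run on this PCPU at all ... */
        if ( !cpumask_test_cpu(cpu, svc->vcpu->cpu_hard_affinity) )
            return 0;

        /* ... and the PCPU must be idle or running a later-deadline VCPU. */
        return is_idle_vcpu(cur) ||
               rt_vcpu(cur)->cur_deadline > svc->cur_deadline;
    }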

Each VCPU has a dedicated period and budget.
The deadline of a VCPU is at the end of each of its periods;
A VCPU has its budget replenished at the beginning of each of its periods;
While scheduled, a VCPU burns its budget.
The VCPU needs to finish its budget before its deadline in each period;
The VCPU discards its unused budget at the end of each of its periods.
If a VCPU runs out of budget in a period, it has to wait until next period.
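
A minimal sketch of that per-period accounting (field and function names
are illustrative, not the actual sched_rt.c code):

    static void rt_update_deadline(struct rt_vcpu *svc, s_time_t now)
    {
        if ( svc->cur_deadline <= now )
        {
            /* Move to the period containing 'now': the new deadline is the
             * end of that period, and the budget is replenished in full
             * (whatever was left from the previous period is discarded). */
            svc->cur_deadline = now - (now % svc->period) + svc->period;
            svc->cur_budget = svc->budget;
        }
    }

    static void rt_burn_budget(struct rt_vcpu *svc, s_time_t delta)
    {
        svc->cur_budget -= delta;
        if ( svc->cur_budget <= 0 )
            svc->cur_budget = 0;   /* depleted: wait for the next period */
    }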

Each VCPU is implemented as a deferrable server.
When a VCPU has a task running on it, its budget is continuously burned;
When a VCPU has no task but with budget left, its budget is preserved.

Queue scheme: A global runqueue for each CPU pool.
The runqueue holds all runnable VCPUs.
VCPUs in the runqueue are divided into two parts: with and without budget.
In the first part, VCPUs are sorted based on the EDF priority scheme.

Note: cpumask and cpupool are supported.

If you are interested in the details of the design and evaluation of this rt scheduler, please refer to our paper "Real-Time Multi-Core Virtual Machine Scheduling in Xen" (http://www.cis.upenn.edu/~mengxu/emsoft14/emsoft14.pdf), which will be published in EMSOFT14. This paper has the following details:
    a) Design of this scheduler;
    b) Measurement of the implementation overhead, e.g., scheduler overhead, context switch overhead, etc.
    c) Comparison of this rt scheduler and the credit scheduler in terms of real-time performance.

If you are interested in other real-time schedulers in Xen, please refer to the RT-Xen project's website (https://sites.google.com/site/realtimexen/). It also supports Preemptive Global Rate Monotonic schedulers.
-----------------------------------------------------------------------------------------------------------------------------
One scenario to show the functionality of this rt scheduler is as follows:
//list the domains
#xl list
Name                                        ID   Mem VCPUs  State   Time(s)
Domain-0                                     0  3344     4     r-----     146.1
vm1                                          1   512     2     r-----     155.1

//list VCPUs' parameters of each domain in cpu pools using rt scheduler
#xl sched-rt
Cpupool Pool-0: sched=EDF
Name                                ID    Period    Budget
Domain-0                             0     10000      4000
vm1                                  1     10000      4000

//set VCPUs' parameters of each domain to new value
xl sched-rt -d Domain-0 -p 20000 -b 10000
//Now all vcpus of Domain-0 have period 20000us and budget 10000us.
#xl sched-rt
Cpupool Pool-0: sched=EDF
Name                                ID    Period    Budget
Domain-0                             0     20000     10000
vm1                                  1     10000      4000

// list cpupool information              
#xl cpupool-list
Name               CPUs   Sched     Active   Domain count
Pool-0               4     rt_ds       y          2
#xl cpupool-list -c    
Name               CPU list    
Pool-0             0,1,2,3

//create a cpupool test
#xl cpupool-cpu-remove Pool-0 3
#xl cpupool-cpu-remove Pool-0 2
#xl cpupool-create name=\"test\" sched=\"rt_ds\"
#xl cpupool-cpu-add test 3 
#xl cpupool-cpu-add test 2
#xl cpupool-list
Name               CPUs   Sched     Active   Domain count
Pool-0               2     rt_ds       y          2
test                 2     rt_ds       y          0

//migrate vm1 from cpupool Pool-0 to cpupool test.    
#xl cpupool-migrate vm1 test

//now vm1 is in cpupool test
# xl sched-rt
Cpupool Pool-0: sched=EDF
Name                                ID    Period    Budget
Domain-0                             0     20000     10000
Cpupool test: sched=EDF                 
Name                                ID    Period    Budget
vm1                                  1     10000      4000

-----------------------------------------------------------------------------------------------------------------------------
Any comment, question, and concerns are more than welcome! :-)

Thank you very much!

Best,

Meng

[PATCH v2 1/4] xen: add real time scheduler rt
[PATCH v2 2/4] libxc: add rt scheduler
[PATCH v2 3/4] libxl: add rt scheduler
[PATCH v2 4/4] xl: introduce rt scheduler

---
Meng Xu
PhD Student in Computer and Information Science
University of Pennsylvania
http://www.cis.upenn.edu/~mengxu/

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Introduce rt real-time scheduler for Xen
@ 2014-08-24 22:58 Meng Xu
  0 siblings, 0 replies; 50+ messages in thread
From: Meng Xu @ 2014-08-24 22:58 UTC (permalink / raw)
  To: xen-devel
  Cc: ian.campbell, xisisu, stefano.stabellini, george.dunlap,
	dario.faggioli, ian.jackson, xumengpanda, JBeulich, chaowang,
	lichong659, dgolomb

Hi all,

This serie of patches adds rt real-time scheduler to Xen.

In summary, It supports:
1) Preemptive Global Earliest Deadline First scheduling policy by using a global RunQ for the scheduler;
2) Assign/display each VCPU's parameters of each domain;
3) Supports CPU Pool

Compared with the set of patches in version RFC v2, this set of patch has the following improvement:
    a) Added an rt-scheduler-specific TRACE facility;
    b) A more efficient RunQ implementation, to avoid scanning the whole RunQ when inserting a vcpu without budget;
    c) Bug fixes for cpupool support.

-----------------------------------------------------------------------------------------------------------------------------
TODO:
    a) Burn budget in a finer granularity than 1ms; [medium]
    b) Use a separate timer per vcpu for each vcpu's budget replenishment, instead of scanning the full runqueue every now and then; [medium]
    c) Handle time stolen from domU by the hypervisor. When it runs on a machine with many sockets and lots of cores, the spin-lock for the global RunQ used in the rt scheduler could eat up time from domU, which could leave domU with less budget than it requires. [not sure about the difficulty right now] (Thanks to Konrad Rzeszutek Wilk for pointing this out at the Xen Summit. :-))

Plan:
    We will work on TODO a) and b) and try to finish these two items before September 10th. (We will also tackle the comments raised in the review of this set of patches.)

-----------------------------------------------------------------------------------------------------------------------------
The design of this rt scheduler is as follows:
This rt scheduler follows the Preemptive Global Earliest Deadline First (GEDF) theory in real-time field.
Each VCPU can have a dedicated period and budget. While scheduled, a VCPU burns its budget. Each VCPU has its budget replenished at the beginning of each of its periods; Each VCPU discards its unused budget at the end of each of its periods. If a VCPU runs out of budget in a period, it has to wait until next period.
The mechanism of how to burn a VCPU's budget depends on the server mechanism implemented for each VCPU.
The mechanism of deciding the priority of VCPUs at each scheduling point is based on the Preemptive Global Earliest Deadline First scheduling scheme.

Server mechanism: a VCPU is implemented as a deferrable server.
When a VCPU has a task running on it, its budget is continuously burned;
When a VCPU has no task but with budget left, its budget is preserved.

Priority scheme: Global Earliest Deadline First (EDF).
At any scheduling point, the VCPU with earliest deadline has highest priority.

Queue scheme: A global runqueue for each CPU pool.
The runqueue holds all runnable VCPUs.
VCPUs in the runqueue are divided into two parts: with and without remaining budget.
At each part, VCPUs are sorted based on GEDF priority scheme.
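
An illustrative sketch of a runqueue insert that respects this two-part
ordering (field and helper names are assumptions, not the actual
__runq_insert() code):

    static void runq_insert(struct rt_private *prv, struct rt_vcpu *svc)
    {
        struct list_head *iter;
        bool_t depleted = (svc->cur_budget <= 0);

        list_for_each ( iter, &prv->runq )
        {
            struct rt_vcpu *cur = list_entry(iter, struct rt_vcpu, q_elem);
            bool_t cur_depleted = (cur->cur_budget <= 0);

            if ( !depleted && cur_depleted )
                break;        /* end of the first part: insert here */
            if ( depleted && !cur_depleted )
                continue;     /* skip over the first (with-budget) part */
            if ( svc->cur_deadline < cur->cur_deadline )
                break;        /* EDF order within the matching part */
        }
        /* list_add_tail() inserts just before 'iter'. */
        list_add_tail(&svc->q_elem, iter);
    }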

Scheduling quanta: 1 ms.

If you are interested in the details of the design and evaluation of this rt scheduler, please refer to our paper "Real-Time Multi-Core Virtual Machine Scheduling in Xen" (http://www.cis.upenn.edu/~mengxu/emsoft14/emsoft14.pdf), which will be published in EMSOFT14. This paper has the following details:
    a) Design of this scheduler;
    b) Measurement of the implementation overhead, e.g., scheduler overhead, context switch overhead, etc.
    c) Comparison of this rt scheduler and the credit scheduler in terms of real-time performance.
-----------------------------------------------------------------------------------------------------------------------------
One scenario to show the functionality of this rt scheduler is as follows:
//list each vcpu's parameters of each domain in cpu pools using rt scheduler
#xl sched-rt
Cpupool Pool-0: sched=EDF
Name                                ID VCPU Period Budget
Domain-0                             0    0  10000  10000
Domain-0                             0    1  20000  20000
Domain-0                             0    2  30000  30000
Domain-0                             0    3  10000  10000
litmus1                              1    0  10000   4000
litmus1                              1    1  10000   4000

//set the parameters of the vcpu 1 of domain litmus1:
# xl sched-rt -d litmus1 -v 1 -p 20000 -b 10000

//domain litmus1's vcpu 1's parameters are changed, display each VCPU's parameters separately:
# xl sched-rt -d litmus1
Name                                ID VCPU Period Budget
litmus1                              1    0  10000   4000
litmus1                              1    1  20000  10000

// list cpupool information
xl cpupool-list
Name               CPUs   Sched     Active   Domain count
Pool-0              12        rt       y          2

//create a cpupool test
#xl cpupool-cpu-remove Pool-0 11
#xl cpupool-cpu-remove Pool-0 10
#xl cpupool-create name=\"test\" sched=\"credit\"
#xl cpupool-cpu-add test 11
#xl cpupool-cpu-add test 10
#xl cpupool-list
Name               CPUs   Sched     Active   Domain count
Pool-0              10        rt       y          2
test                 2    credit       y          0   

//migrate litmus1 from cpupool Pool-0 to cpupool test.
#xl cpupool-migrate litmus1 test

//now litmus1 is in cpupool test
# xl sched-credit 
Cpupool test: tslice=30ms ratelimit=1000us
Name                                ID Weight  Cap
litmus1                              1    256    0 

-----------------------------------------------------------------------------------------------------------------------------
This set of patches was tested by using the above scenario and running cpu-intensive tasks inside each guest domain. We manually checked whether a domain gets its required resources without being interfered with by other domains; we also manually checked whether the scheduling sequence of vcpus follows the Earliest Deadline First scheduling policy.

Any comment, question, and concerns are more than welcome! :-)

Thank you very much!

Best,

Meng

[PATCH v1 1/4] xen: add real time scheduler rt
[PATCH v1 2/4] libxc: add rt scheduler
[PATCH v1 3/4] libxl: add rt scheduler
[PATCH v1 4/4] xl: introduce rt scheduler

---
Meng Xu
PhD Student in Computer and Information Science
University of Pennsylvania

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Introduce rt real-time scheduler for Xen
@ 2014-07-29  1:52 Meng Xu
  0 siblings, 0 replies; 50+ messages in thread
From: Meng Xu @ 2014-07-29  1:52 UTC (permalink / raw)
  To: xen-devel
  Cc: ian.campbell, xisisu, stefano.stabellini, george.dunlap,
	ian.jackson, xumengpanda, JBeulich, lichong659, dgolomb

Hi all,

This serie of patches adds rt real-time scheduler to Xen.

In summary, It supports:
1) Preemptive Global Earliest Deadline First scheduling policy by using a global RunQ for the scheduler;
2) Assign/display each VCPU's parameters of each domain;
3) Supports CPU Pool

Based on the review comments/suggestion on version 1, the version 2 has the following improvements:
    a) Changed the interface for getting/setting a vcpu's parameters from statically allocating a large array to dynamically allocating memory based on the number of vcpus of a domain. (This is a major change from v1 to v2 because many comments on v1 are about the code related to this functionality.)
    b) Changed the time unit at the user interface from 1ms to 1us; changed the type of a VCPU's period and budget from uint16 to s_time_t.
    c) Changed the code style, rearranged the patch order, and added comments to better explain the code.
    d) Domain 0 is no longer treated as a special domain. Domain 0's VCPUs are now handled the same as domUs' VCPUs: they have the same default values and are scheduled in the same way.
    e) Added more ASSERT()s, e.g., in __runq_insert() in sched_rt.c.

-----------------------------------------------------------------------------------------------------------------------------
TODO:
    a) Add TRACE() in sched_rt.c functions.[easy]
       We will add a few xentrace tracepoints, like TRC_CSCHED2_RUNQ_POS in credit2 scheduler, in rt scheduler, to debug via tracing.
    b) Split runnable and depleted (=no budget left) VCPU queues.[easy]
    c) Deal with budget overrun in the algorithm [medium]
    d) Try using timers for replenishment, instead of scanning the full runqueue every now and then [medium]
    e) Reconsider the rt_vcpu_insert() and rt_vcpu_remove() for cpu pool support. 
    f) Method of improving the performance of the rt scheduler [future work]
       VCPUs of the same domain may preempt each other under the preemptive global EDF scheduling policy. This self-switch issue does not benefit the domain but introduces more overhead. When this happens, we can simply promote the currently running lower-priority VCPU's priority and let it borrow budget from higher-priority VCPUs to avoid such self-switching.

Plan: 
    TODO a) and b) are expected in RFC v3; (2 weeks)
    TODO c), d) and e) are expected in RFC v4, v5; (3-4 weeks)
    TODO f) will be delayed after this scheduler is upstreamed because the improvement will make the scheduler not a pure global EDF scheduler.

-----------------------------------------------------------------------------------------------------------------------------
The design of this rt scheduler is as follows:
This rt scheduler follows the Preemptive Global Earliest Deadline First (GEDF) theory in real-time field.
Each VCPU can have a dedicated period and budget. While scheduled, a VCPU burns its budget. Each VCPU has its budget replenished at the beginning of each of its periods; Each VCPU discards its unused budget at the end of each of its periods. If a VCPU runs out of budget in a period, it has to wait until next period.
The mechanism of how to burn a VCPU's budget depends on the server mechanism implemented for each VCPU.
The mechanism of deciding the priority of VCPUs at each scheduling point is based on the Preemptive Global Earliest Deadline First scheduling scheme.

Server mechanism: a VCPU is implemented as a deferrable server.
When a VCPU has a task running on it, its budget is continuously burned;
When a VCPU has no task but with budget left, its budget is preserved.

Priority scheme: Global Earliest Deadline First (EDF).
At any scheduling point, the VCPU with earliest deadline has highest priority.

Queue scheme: A global runqueue for each CPU pool.
The runqueue holds all runnable VCPUs.
VCPUs in the runqueue are divided into two parts: with and without remaining budget.
At each part, VCPUs are sorted based on GEDF priority scheme.

Scheduling quanta: 1 ms.

If you are interested in the details of the design and evaluation of this rt scheduler, please refer to our paper "Real-Time Multi-Core Virtual Machine Scheduling in Xen" (http://www.cis.upenn.edu/~mengxu/emsoft14/emsoft14.pdf) in EMSOFT14. This paper has the following details:
    a) Design of this scheduler;
    b) Measurement of the implementation overhead, e.g., scheduler overhead, context switch overhead, etc.
    c) Comparison of this rt scheduler and the credit scheduler in terms of real-time performance.
-----------------------------------------------------------------------------------------------------------------------------
One scenario to show the functionality of this rt scheduler is as follows:
//list each vcpu's parameters of each domain in cpu pools using rt scheduler
#xl sched-rt
Cpupool Pool-0: sched=EDF
Name                                ID VCPU Period Budget
Domain-0                             0    0  10000  10000
Domain-0                             0    1  20000  20000
Domain-0                             0    2  30000  30000
Domain-0                             0    3  10000  10000
litmus1                              1    0  10000   4000
litmus1                              1    1  10000   4000

//set the parameters of the vcpu 1 of domain litmus1:
# xl sched-rt -d litmus1 -v 1 -p 20000 -b 10000

//domain litmus1's vcpu 1's parameters are changed, display each VCPU's parameters separately:
# xl sched-rt -d litmus1
Name                                ID VCPU Period Budget
litmus1                              1    0  10000   4000
litmus1                              1    1  20000  10000

// list cpupool information
xl cpupool-list
Name               CPUs   Sched     Active   Domain count
Pool-0              12        rt       y          2

//create a cpupool test
#xl cpupool-cpu-remove Pool-0 11
#xl cpupool-cpu-remove Pool-0 10
#xl cpupool-create name=\"test\" sched=\"credit\"
#xl cpupool-cpu-add test 11
#xl cpupool-cpu-add test 10
#xl cpupool-list
Name               CPUs   Sched     Active   Domain count
Pool-0              10        rt       y          2
test                 2    credit       y          0

//migrate litmus1 from cpupool Pool-0 to cpupool test.
#xl cpupool-migrate litmus1 test

//now litmus1 is in cpupool test
# xl sched-credit
Cpupool test: tslice=30ms ratelimit=1000us
Name                                ID Weight  Cap
litmus1                              1    256    0

-----------------------------------------------------------------------------------------------------------------------------
[PATCH RFC v2 1/4] xen: add real time scheduler rt
[PATCH RFC v2 2/4] libxc: add rt scheduler
[PATCH RFC v2 3/4] libxl: add rt scheduler
[PATCH RFC v2 4/4] xl: introduce rt scheduler
-----------------------------------------------------------------------------------------------------------------------------
Thanks to Dario, Wei, Ian, Andrew, George, and Konrad for your valuable comments and suggestions!

Any comment, question, and concerns are more than welcome! :-)

Thank you very much!

Best,

Meng

---
Meng Xu
PhD Student in Computer and Information Science
University of Pennsylvania

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 50+ messages in thread

end of thread, other threads:[~2014-09-07 19:40 UTC | newest]

Thread overview: 50+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2014-07-11  4:49 Introduce rt real-time scheduler for Xen Meng Xu
2014-07-11  4:49 ` [PATCH RFC v1 1/4] rt: Add rt scheduler to hypervisor Meng Xu
2014-07-11 14:27   ` Dario Faggioli
2014-07-11 14:37   ` Andrew Cooper
2014-07-11 15:21     ` Dario Faggioli
2014-07-11 15:40       ` Andrew Cooper
2014-07-11 15:48         ` Dario Faggioli
2014-07-16 17:05           ` Konrad Rzeszutek Wilk
2014-07-17 10:12             ` Meng Xu
2014-07-17 15:12               ` Dario Faggioli
2014-07-18  5:46                 ` Meng Xu
2014-07-18 18:40               ` Konrad Rzeszutek Wilk
2014-07-11  4:49 ` [PATCH RFC v1 2/4] xl for rt scheduler Meng Xu
2014-07-11 11:02   ` Wei Liu
2014-07-11 14:59     ` Meng Xu
2014-07-11 15:07       ` Dario Faggioli
2014-07-11 16:25         ` Meng Xu
2014-07-13 12:58         ` Meng Xu
2014-07-14  7:40           ` Dario Faggioli
2014-07-14  9:31           ` Wei Liu
2014-07-17 15:39           ` Ian Campbell
2014-07-11  4:49 ` [PATCH RFC v1 3/4] libxl " Meng Xu
2014-07-11 11:05   ` Wei Liu
2014-07-11 15:08   ` Dario Faggioli
2014-07-12 18:16     ` Meng Xu
2014-07-14 10:38       ` Dario Faggioli
2014-07-17 15:34     ` Ian Campbell
2014-07-17 15:36   ` Ian Campbell
2014-07-18 11:05     ` Meng Xu
2014-07-11  4:49 ` [PATCH RFC v1 4/4] libxc " Meng Xu
2014-07-11 14:49   ` Dario Faggioli
2014-07-11 16:23     ` Meng Xu
2014-07-11 16:35       ` Dario Faggioli
2014-07-11 16:49         ` Andrew Cooper
2014-07-12 19:46         ` Meng Xu
2014-07-17 15:29     ` Ian Campbell
2014-07-17 15:34       ` George Dunlap
2014-07-17 22:16         ` Meng Xu
2014-07-18  9:49           ` Dario Faggioli
2014-07-18  9:51           ` Ian Campbell
2014-07-18 12:11             ` Meng Xu
2014-07-18  9:47         ` Ian Campbell
2014-07-18 10:00           ` Dario Faggioli
2014-07-11 10:50 ` Introduce rt real-time scheduler for Xen Wei Liu
2014-07-11 11:06   ` Dario Faggioli
2014-07-11 16:14     ` Meng Xu
2014-07-11 16:19 ` Dario Faggioli
2014-07-29  1:52 Meng Xu
2014-08-24 22:58 Meng Xu
2014-09-07 19:40 Meng Xu
