* Introduce rtds real-time scheduler for Xen
@ 2014-09-14 21:37 Meng Xu
  2014-09-14 21:37 ` [PATCH v3 1/4] xen: add real time scheduler rtds Meng Xu
                   ` (6 more replies)
  0 siblings, 7 replies; 49+ messages in thread
From: Meng Xu @ 2014-09-14 21:37 UTC (permalink / raw)
  To: xen-devel
  Cc: ian.campbell, xisisu, stefano.stabellini, george.dunlap, lu,
	dario.faggioli, ian.jackson, ptxlinh, xumengpanda, JBeulich,
	chaowang, lichong659, dgolomb

This series of patches adds the rtds real-time scheduler to Xen.

In summary, it supports:
1) A preemptive Global Earliest Deadline First scheduling policy, using a global RunQ for the scheduler;
2) Assigning/displaying the VCPU parameters of each domain (all VCPUs of a domain have the same period and budget);
3) CPU pools.
Note: 1)  Although the toolstack only allows setting the parameters of all VCPUs of a domain to the same values, the scheduler itself can schedule VCPUs of the same domain with different parameters. In Xen 4.6, we plan to support assigning/displaying each VCPU's parameters per domain.
      2)  Parameters of a domain at the toolstack level are in microseconds, not milliseconds.

Compared with PATCH v2, this patch set has the following modifications:
    a) Split the single global RunQ into two queues, runq and depletedq, instead of using flag_vcpu;
    b) Remove prv->cpus and use cpupool_scheduler_cpumask to get the valid cpus for the scheduler (one scheduler per cpupool);
    c) Remove unnecessary printk in sched_rt.c;
    d) Change the scheduler's name from rt to rtds;
    e) Miscellaneous modifications, such as removing trailing spaces.

[PATCH v3 1/4] xen: add real time scheduler rtds
[PATCH v3 2/4] libxc: add rtds scheduler
[PATCH v3 3/4] libxl: add rtds scheduler
[PATCH v3 4/4] xl: introduce rtds scheduler
-----------------------------------------------------------------------------------------------------------------------------
TODO after Xen 4.5:
    a) Burn budget at a finer granularity than 1ms; [medium]
    b) Use a separate timer per VCPU for each VCPU's budget replenishment, instead of scanning the full runqueue every now and then; [medium]
    c) Handle time stolen from domU by the hypervisor. On a machine with many sockets and lots of cores, the spinlock for the global RunQ used in the rtds scheduler could eat into domU's time, which could leave domU with less budget than it requires. [not sure about difficulty right now] (Thanks to Konrad Rzeszutek for pointing this out at the Xen Summit. :-))
    d) Toolstack support for assigning/displaying each VCPU's parameters of each domain.

-----------------------------------------------------------------------------------------------------------------------------
The design of this rtds scheduler is as follows:

This scheduler follows the Preemptive Global Earliest Deadline First
(EDF) theory from the real-time field.
At any scheduling point, the VCPU with the earlier deadline has higher
priority. The scheduler always picks the highest-priority VCPU to run on a
feasible PCPU.
A PCPU is feasible if the VCPU can run on it and the PCPU is either
idle or running a lower-priority VCPU.

Each VCPU has a dedicated period and budget.
The deadline of a VCPU is at the end of each period;
A VCPU has its budget replenished at the beginning of each period;
While scheduled, a VCPU burns its budget.
The VCPU needs to finish its budget before its deadline in each period;
The VCPU discards its unused budget at the end of each period.
If a VCPU runs out of budget in a period, it has to wait until the next period.
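
To make the replenishment rule above concrete, here is a minimal, self-contained
sketch of the bookkeeping (illustration only, not the code in sched_rt.c; the
struct and function names are made up):

    /* Illustrative only: period, budget and deadlines are in nanoseconds. */
    struct vcpu_acct {
        int64_t period;        /* fixed parameter */
        int64_t budget;        /* fixed parameter */
        int64_t cur_deadline;  /* end of the current period */
        int64_t cur_budget;    /* budget left in the current period */
    };

    static void replenish(struct vcpu_acct *v, int64_t now)
    {
        if ( now < v->cur_deadline )
            return;                       /* still inside the current period */
        do
            v->cur_deadline += v->period; /* advance to the period containing now */
        while ( v->cur_deadline <= now );
        v->cur_budget = v->budget;        /* unused budget has been discarded */
    }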

Each VCPU is implemented as a deferrable server.
When a VCPU has a task running on it, its budget is continuously burned;
When a VCPU has no task but still has budget left, its budget is preserved.

Queue scheme:
A global runqueue and a global depletedq for each CPU pool.
The runqueue holds all runnable VCPUs with budget, sorted by deadline;
The depletedq holds all VCPUs without budget, unsorted.
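
As a rough sketch of this two-queue policy (again illustrative only, reusing the
hypothetical struct vcpu_acct above and plain arrays instead of the hypervisor's
linked lists): a VCPU with budget left is inserted into the runqueue in deadline
order, anything else is appended to the depletedq.

    /* Sketch: place a vcpu that just became runnable on the right queue. */
    static void queue_vcpu(struct vcpu_acct *v,
                           struct vcpu_acct **runq, int *nr_run,
                           struct vcpu_acct **depletedq, int *nr_dep)
    {
        int i;

        if ( v->cur_budget <= 0 )
        {
            depletedq[(*nr_dep)++] = v;   /* depletedq is kept unsorted */
            return;
        }

        /* keep the runqueue sorted by cur_deadline, earliest first (EDF) */
        for ( i = *nr_run; i > 0 && runq[i - 1]->cur_deadline > v->cur_deadline; i-- )
            runq[i] = runq[i - 1];
        runq[i] = v;
        (*nr_run)++;
    }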

Note: cpumask and cpupool are supported.

If you are interested in the details of the design and evaluation of this rt scheduler, please refer to our paper "Real-Time Multi-Core Virtual Machine Scheduling in Xen" (http://www.cis.upenn.edu/~mengxu/emsoft14/emsoft14.pdf), which will be published at EMSOFT14. The paper covers:
    a) The design of this scheduler;
    b) Measurement of the implementation overhead, e.g., scheduler overhead, context switch overhead, etc.;
    c) Comparison of this rt scheduler and the credit scheduler in terms of real-time performance.

If you are interested in other real-time schedulers in Xen, please refer to the RT-Xen project's website (https://sites.google.com/site/realtimexen/). It also supports Preemptive Global Rate Monotonic schedulers.
-----------------------------------------------------------------------------------------------------------------------------
One scenario to show the functionality of this rtds scheduler is as follows:
//list the domains
#xl list
Name                                        ID   Mem VCPUs  State   Time(s)
Domain-0                                     0  3344     4     r-----     146.1
vm1                                          1   512     2     r-----     155.1

//list VCPUs' parameters of each domain in cpu pools using rtds scheduler
#xl sched-rtds
Cpupool Pool-0: sched=EDF
Name                                ID    Period    Budget
Domain-0                             0     10000      4000
vm1                                  1     10000      4000

//set VCPUs' parameters of a domain to new values
#xl sched-rtds -d Domain-0 -p 20000 -b 10000
//Now all vcpus of Domain-0 have period 20000us and budget 10000us.
#xl sched-rtds
Cpupool Pool-0: sched=EDF
Name                                ID    Period    Budget
Domain-0                             0     20000     10000
vm1                                  1     10000      4000

// list cpupool information
#xl cpupool-list
Name               CPUs   Sched     Active   Domain count
Pool-0               4     rtds       y          2
#xl cpupool-list -c
Name               CPU list
Pool-0             0,1,2,3

//create a cpupool test
#xl cpupool-cpu-remove Pool-0 3
#xl cpupool-cpu-remove Pool-0 2
#xl cpupool-create name=\"test\" sched=\"rtds\"
#xl cpupool-cpu-add test 3
#xl cpupool-cpu-add test 2
#xl cpupool-list
Name               CPUs   Sched     Active   Domain count
Pool-0               2     rtds       y          2
test                 2     rtds       y          0

//migrate vm1 from cpupool Pool-0 to cpupool test.
#xl cpupool-migrate vm1 test

//now vm1 is in cpupool test
# xl sched-rtds
Cpupool Pool-0: sched=EDF
Name                                ID    Period    Budget
Domain-0                             0     20000     10000
Cpupool test: sched=EDF
Name                                ID    Period    Budget
vm1                                  1     10000      4000

-----------------------------------------------------------------------------------------------------------------------------
Any comments, questions, and concerns are more than welcome! :-)

Thank you very much!

Best,

Meng

---
Meng Xu
PhD Student in Computer and Information Science
University of Pennsylvania
http://www.cis.upenn.edu/~mengxu/

^ permalink raw reply	[flat|nested] 49+ messages in thread

* [PATCH v3 1/4] xen: add real time scheduler rtds
  2014-09-14 21:37 Introduce rtds real-time scheduler for Xen Meng Xu
@ 2014-09-14 21:37 ` Meng Xu
  2014-09-15 10:40   ` Jan Beulich
                     ` (2 more replies)
  2014-09-14 21:37 ` [PATCH v3 2/4] libxc: add rtds scheduler Meng Xu
                   ` (5 subsequent siblings)
  6 siblings, 3 replies; 49+ messages in thread
From: Meng Xu @ 2014-09-14 21:37 UTC (permalink / raw)
  To: xen-devel
  Cc: ian.campbell, xisisu, stefano.stabellini, george.dunlap, lu,
	dario.faggioli, ian.jackson, ptxlinh, xumengpanda, Meng Xu,
	JBeulich, chaowang, lichong659, dgolomb

This scheduler follows the Preemptive Global Earliest Deadline First
(EDF) theory from the real-time field.
At any scheduling point, the VCPU with the earlier deadline has higher
priority. The scheduler always picks the highest-priority VCPU to run on a
feasible PCPU.
A PCPU is feasible if the VCPU can run on it and the PCPU is either
idle or running a lower-priority VCPU.

Each VCPU has a dedicated period and budget.
The deadline of a VCPU is at the end of each period;
A VCPU has its budget replenished at the beginning of each period;
While scheduled, a VCPU burns its budget.
The VCPU needs to finish its budget before its deadline in each period;
The VCPU discards its unused budget at the end of each period.
If a VCPU runs out of budget in a period, it has to wait until the next period.

Each VCPU is implemented as a deferrable server.
When a VCPU has a task running on it, its budget is continuously burned;
When a VCPU has no task but still has budget left, its budget is preserved.

Queue scheme:
A global runqueue and a global depletedq for each CPU pool.
The runqueue holds all runnable VCPUs with budget, sorted by deadline;
The depletedq holds all VCPUs without budget, unsorted.

Note: cpumask and cpupool are supported.

This is an experimental scheduler.

Signed-off-by: Meng Xu <mengxu@cis.upenn.edu>
Signed-off-by: Sisu Xi <xisisu@gmail.com>
---
 xen/common/Makefile         |    1 +
 xen/common/sched_rt.c       | 1126 +++++++++++++++++++++++++++++++++++++++++++
 xen/common/schedule.c       |    1 +
 xen/include/public/domctl.h |    6 +
 xen/include/public/trace.h  |    1 +
 xen/include/xen/sched-if.h  |    1 +
 6 files changed, 1136 insertions(+)
 create mode 100644 xen/common/sched_rt.c

diff --git a/xen/common/Makefile b/xen/common/Makefile
index 3683ae3..5a23aa4 100644
--- a/xen/common/Makefile
+++ b/xen/common/Makefile
@@ -26,6 +26,7 @@ obj-y += sched_credit.o
 obj-y += sched_credit2.o
 obj-y += sched_sedf.o
 obj-y += sched_arinc653.o
+obj-y += sched_rt.o
 obj-y += schedule.o
 obj-y += shutdown.o
 obj-y += softirq.o
diff --git a/xen/common/sched_rt.c b/xen/common/sched_rt.c
new file mode 100644
index 0000000..7ee2500
--- /dev/null
+++ b/xen/common/sched_rt.c
@@ -0,0 +1,1126 @@
+/*****************************************************************************
+ * Preemptive Global Earliest Deadline First  (EDF) scheduler for Xen
+ * EDF scheduling is a real-time scheduling algorithm used in the embedded field.
+ *
+ * by Sisu Xi, 2013, Washington University in Saint Louis
+ * and Meng Xu, 2014, University of Pennsylvania
+ *
+ * based on the code of credit Scheduler
+ */
+
+#include <xen/config.h>
+#include <xen/init.h>
+#include <xen/lib.h>
+#include <xen/sched.h>
+#include <xen/domain.h>
+#include <xen/delay.h>
+#include <xen/event.h>
+#include <xen/time.h>
+#include <xen/perfc.h>
+#include <xen/sched-if.h>
+#include <xen/softirq.h>
+#include <asm/atomic.h>
+#include <xen/errno.h>
+#include <xen/trace.h>
+#include <xen/cpu.h>
+#include <xen/keyhandler.h>
+#include <xen/trace.h>
+#include <xen/guest_access.h>
+
+/*
+ * TODO:
+ *
+ * Migration compensation and resistance, like credit2, to make better use of the cache;
+ * Lock holder problem: address it using yield?
+ * Self-switch problem: VCPUs of the same domain may preempt each other;
+ */
+
+/*
+ * Design:
+ *
+ * This scheduler follows the Preemptive Global Earliest Deadline First (EDF)
+ * theory from the real-time field.
+ * At any scheduling point, the VCPU with the earlier deadline has higher priority.
+ * The scheduler always picks the highest-priority VCPU to run on a feasible PCPU.
+ * A PCPU is feasible if the VCPU can run on this PCPU and the PCPU is either
+ * idle or running a lower-priority VCPU.
+ *
+ * Each VCPU has a dedicated period and budget.
+ * The deadline of a VCPU is at the end of each period;
+ * A VCPU has its budget replenished at the beginning of each period;
+ * While scheduled, a VCPU burns its budget.
+ * The VCPU needs to finish its budget before its deadline in each period;
+ * The VCPU discards its unused budget at the end of each period.
+ * If a VCPU runs out of budget in a period, it has to wait until next period.
+ *
+ * Each VCPU is implemented as a deferrable server.
+ * When a VCPU has a task running on it, its budget is continuously burned;
+ * When a VCPU has no task but with budget left, its budget is preserved.
+ *
+ * Queue scheme:
+ * A global runqueue and a global depletedqueue for each CPU pool.
+ * The runqueue holds all runnable VCPUs with budget, sorted by deadline;
+ * The depletedqueue holds all VCPUs without budget, unsorted;
+ *
+ * Note: cpumask and cpupool are supported.
+ */
+
+/*
+ * Locking:
+ * A global system lock is used to protect the RunQ and DepletedQ.
+ * The global lock is referenced by schedule_data.schedule_lock
+ * from all physical cpus.
+ *
+ * The lock is already grabbed when calling the wake/sleep/schedule functions
+ * in schedule.c
+ *
+ * The functions that involve the RunQ and need to grab the lock are:
+ *    vcpu_insert, vcpu_remove, context_saved, __runq_insert
+ */
+
+
+/*
+ * Default parameters:
+ * The default period and budget are 10 ms and 4 ms, respectively
+ */
+#define RTDS_DEFAULT_PERIOD     (MICROSECS(10000))
+#define RTDS_DEFAULT_BUDGET     (MICROSECS(4000))
+
+/*
+ * Flags
+ */
+/*
+ * RTDS_scheduled: Is this vcpu either running on, or context-switching off,
+ * a physical cpu?
+ * + Accessed only with global lock held.
+ * + Set when chosen as next in rt_schedule().
+ * + Cleared after context switch has been saved in rt_context_saved()
+ * + Checked in vcpu_wake to see if we can add to the Runqueue, or if we should
+ *   set RTDS_delayed_runq_add
+ * + Checked to be false in runq_insert.
+ */
+#define __RTDS_scheduled            1
+#define RTDS_scheduled (1<<__RTDS_scheduled)
+/*
+ * RTDS_delayed_runq_add: Do we need to add this to the RunQ/DepletedQ
+ * once it's done being context switching out?
+ * + Set when scheduling out in rt_schedule() if prev is runnable
+ * + Set in rt_vcpu_wake if it finds RTDS_scheduled set
+ * + Read in rt_context_saved(). If set, it adds prev to the Runqueue/DepletedQ
+ *   and clears the bit.
+ */
+#define __RTDS_delayed_runq_add     2
+#define RTDS_delayed_runq_add (1<<__RTDS_delayed_runq_add)
+
+/*
+ * rt tracing events ("only" 512 available!). Check
+ * include/public/trace.h for more details.
+ */
+#define TRC_RTDS_TICKLE           TRC_SCHED_CLASS_EVT(RTDS, 1)
+#define TRC_RTDS_RUNQ_PICK        TRC_SCHED_CLASS_EVT(RTDS, 2)
+#define TRC_RTDS_BUDGET_BURN      TRC_SCHED_CLASS_EVT(RTDS, 3)
+#define TRC_RTDS_BUDGET_REPLENISH TRC_SCHED_CLASS_EVT(RTDS, 4)
+#define TRC_RTDS_SCHED_TASKLET    TRC_SCHED_CLASS_EVT(RTDS, 5)
+#define TRC_RTDS_VCPU_CREATE      TRC_SCHED_CLASS_EVT(RTDS, 6)
+
+/*
+ * System-wide private data, including the global RunQueue/DepletedQ
+ * Global lock is referenced by schedule_data.schedule_lock from all
+ * physical cpus. It can be grabbed via vcpu_schedule_lock_irq()
+ */
+struct rt_private {
+    spinlock_t lock;            /* the global coarse-grained lock */
+    struct list_head sdom;      /* list of available domains, used for dump */
+    struct list_head runq;      /* ordered list of runnable vcpus */
+    struct list_head depletedq; /* unordered list of depleted vcpus */
+    cpumask_t tickled;          /* cpus been tickled */
+};
+
+/*
+ * Virtual CPU
+ */
+struct rt_vcpu {
+    struct list_head q_elem;    /* on the runq/depletedq list */
+    struct list_head sdom_elem; /* on the domain VCPU list */
+
+    /* Up-pointers */
+    struct rt_dom *sdom;
+    struct vcpu *vcpu;
+
+    /* VCPU parameters, in nanoseconds */
+    s_time_t period;
+    s_time_t budget;
+
+    /* Current VCPU information, in nanoseconds */
+    s_time_t cur_budget;        /* current budget */
+    s_time_t last_start;        /* last start time */
+    s_time_t cur_deadline;      /* current deadline for EDF */
+
+    unsigned flags;             /* mark __RTDS_scheduled, etc.. */
+};
+
+/*
+ * Domain
+ */
+struct rt_dom {
+    struct list_head vcpu;      /* link its VCPUs */
+    struct list_head sdom_elem; /* link list on rt_priv */
+    struct domain *dom;         /* pointer to upper domain */
+};
+
+/*
+ * Useful inline functions
+ */
+static inline struct rt_private *rt_priv(const struct scheduler *ops)
+{
+    return ops->sched_data;
+}
+
+static inline struct rt_vcpu *rt_vcpu(const struct vcpu *vcpu)
+{
+    return vcpu->sched_priv;
+}
+
+static inline struct rt_dom *rt_dom(const struct domain *dom)
+{
+    return dom->sched_priv;
+}
+
+static inline struct list_head *rt_runq(const struct scheduler *ops)
+{
+    return &rt_priv(ops)->runq;
+}
+
+static inline struct list_head *rt_depletedq(const struct scheduler *ops)
+{
+    return &rt_priv(ops)->depletedq;
+}
+
+/*
+ * Queue helper functions for runq and depletedq
+ */
+static int
+__vcpu_on_q(const struct rt_vcpu *svc)
+{
+   return !list_empty(&svc->q_elem);
+}
+
+static struct rt_vcpu *
+__q_elem(struct list_head *elem)
+{
+    return list_entry(elem, struct rt_vcpu, q_elem);
+}
+
+/*
+ * Debug related code, dump vcpu/cpu information
+ */
+static void
+rt_dump_vcpu(const struct scheduler *ops, const struct rt_vcpu *svc)
+{
+    char cpustr[1024];
+    cpumask_t *cpupool_mask;
+
+    ASSERT(svc != NULL);
+    /* idle vcpus (sdom == NULL) are not dumped */
+    if( svc->sdom == NULL )
+        return;
+
+    cpumask_scnprintf(cpustr, sizeof(cpustr), svc->vcpu->cpu_hard_affinity);
+    printk("[%5d.%-2u] cpu %u, (%"PRI_stime", %"PRI_stime"),"
+           " cur_b=%"PRI_stime" cur_d=%"PRI_stime" last_start=%"PRI_stime"\n"
+           " \t\t onQ=%d runnable=%d cpu_hard_affinity=%s ",
+            svc->vcpu->domain->domain_id,
+            svc->vcpu->vcpu_id,
+            svc->vcpu->processor,
+            svc->period,
+            svc->budget,
+            svc->cur_budget,
+            svc->cur_deadline,
+            svc->last_start,
+            __vcpu_on_q(svc),
+            vcpu_runnable(svc->vcpu),
+            cpustr);
+    memset(cpustr, 0, sizeof(cpustr));
+    cpupool_mask = cpupool_scheduler_cpumask(svc->vcpu->domain->cpupool);
+    cpumask_scnprintf(cpustr, sizeof(cpustr), cpupool_mask);
+    printk("cpupool=%s\n", cpustr);
+}
+
+static void
+rt_dump_pcpu(const struct scheduler *ops, int cpu)
+{
+    struct rt_vcpu *svc = rt_vcpu(curr_on_cpu(cpu));
+
+    rt_dump_vcpu(ops, svc);
+}
+
+static void
+rt_dump(const struct scheduler *ops)
+{
+    struct list_head *iter_sdom, *iter_svc, *runq, *depletedq, *iter;
+    struct rt_private *prv = rt_priv(ops);
+    struct rt_vcpu *svc;
+    unsigned int cpu = 0;
+    cpumask_t *online;
+    struct rt_dom *sdom;
+    unsigned long flags;
+
+    ASSERT(!list_empty(&prv->sdom));
+    sdom = list_entry(prv->sdom.next, struct rt_dom, sdom_elem);
+    online = cpupool_scheduler_cpumask(sdom->dom->cpupool);
+    runq = rt_runq(ops);
+    depletedq = rt_depletedq(ops);
+
+    printk("PCPU info:\n");
+    for_each_cpu(cpu, online)
+        rt_dump_pcpu(ops, cpu);
+
+    printk("Global RunQueue info:\n");
+    spin_lock_irqsave(&prv->lock, flags);
+    list_for_each( iter, runq )
+    {
+        svc = __q_elem(iter);
+        rt_dump_vcpu(ops, svc);
+    }
+
+    printk("Global DepletedQueue info:\n");
+    list_for_each( iter, depletedq )
+    {
+        svc = __q_elem(iter);
+        rt_dump_vcpu(ops, svc);
+    }
+
+    printk("Domain info:\n");
+    list_for_each( iter_sdom, &prv->sdom )
+    {
+        sdom = list_entry(iter_sdom, struct rt_dom, sdom_elem);
+        printk("\tdomain: %d\n", sdom->dom->domain_id);
+
+        list_for_each( iter_svc, &sdom->vcpu )
+        {
+            svc = list_entry(iter_svc, struct rt_vcpu, sdom_elem);
+            rt_dump_vcpu(ops, svc);
+        }
+    }
+
+    spin_unlock_irqrestore(&prv->lock, flags);
+}
+
+/*
+ * update deadline and budget when now >= cur_deadline
+ * it needs to be updated to the deadline of the current period
+ */
+static void
+rt_update_deadline(s_time_t now, struct rt_vcpu *svc)
+{
+    ASSERT(now >= svc->cur_deadline);
+    ASSERT(svc->period != 0);
+
+    do
+        svc->cur_deadline += svc->period;
+    while ( svc->cur_deadline <= now );
+
+    svc->cur_budget = svc->budget;
+
+    /* TRACE */
+    {
+        struct {
+            unsigned dom:16,vcpu:16;
+            unsigned cur_deadline_lo, cur_deadline_hi;
+            unsigned cur_budget_lo, cur_budget_hi;
+        } d;
+        d.dom = svc->vcpu->domain->domain_id;
+        d.vcpu = svc->vcpu->vcpu_id;
+        d.cur_deadline_lo = (unsigned) svc->cur_deadline;
+        d.cur_deadline_hi = (unsigned) (svc->cur_deadline >> 32);
+        d.cur_budget_lo = (unsigned) svc->cur_budget;
+        d.cur_budget_hi = (unsigned) (svc->cur_budget >> 32);
+        trace_var(TRC_RTDS_BUDGET_REPLENISH, 1,
+                  sizeof(d),
+                  (unsigned char *) &d);
+    }
+
+    return;
+}
+
+static inline void
+__q_remove(struct rt_vcpu *svc)
+{
+    if ( __vcpu_on_q(svc) )
+        list_del_init(&svc->q_elem);
+}
+
+/*
+ * Insert svc with budget in RunQ according to EDF:
+ * vcpus with smaller deadlines go first.
+ * Insert svc without budget in DepletedQ unsorted;
+ */
+static void
+__runq_insert(const struct scheduler *ops, struct rt_vcpu *svc)
+{
+    struct rt_private *prv = rt_priv(ops);
+    struct list_head *runq = rt_runq(ops);
+    struct list_head *iter;
+
+    ASSERT( spin_is_locked(&prv->lock) );
+
+    ASSERT( !__vcpu_on_q(svc) );
+
+    /* add svc to runq if svc still has budget */
+    if ( svc->cur_budget > 0 )
+    {
+        list_for_each(iter, runq)
+        {
+            struct rt_vcpu * iter_svc = __q_elem(iter);
+            if ( svc->cur_deadline <= iter_svc->cur_deadline )
+                    break;
+         }
+        list_add_tail(&svc->q_elem, iter);
+    }
+    else
+    {
+        list_add(&svc->q_elem, &prv->depletedq);
+    }
+}
+
+/*
+ * Init/Free related code
+ */
+static int
+rt_init(struct scheduler *ops)
+{
+    struct rt_private *prv = xzalloc(struct rt_private);
+
+    printk("Initializing RTDS scheduler\n" \
+           "WARNING: This is experimental software in development.\n" \
+           "Use at your own risk.\n");
+
+    if ( prv == NULL )
+        return -ENOMEM;
+
+    spin_lock_init(&prv->lock);
+    INIT_LIST_HEAD(&prv->sdom);
+    INIT_LIST_HEAD(&prv->runq);
+    INIT_LIST_HEAD(&prv->depletedq);
+
+    cpumask_clear(&prv->tickled);
+
+    ops->sched_data = prv;
+
+    return 0;
+}
+
+static void
+rt_deinit(const struct scheduler *ops)
+{
+    struct rt_private *prv = rt_priv(ops);
+
+    xfree(prv);
+}
+
+/*
+ * Point per_cpu spinlock to the global system lock;
+ * All cpus share the same global system lock
+ */
+static void *
+rt_alloc_pdata(const struct scheduler *ops, int cpu)
+{
+    struct rt_private *prv = rt_priv(ops);
+
+    per_cpu(schedule_data, cpu).schedule_lock = &prv->lock;
+
+    /* 1 indicates alloc. succeed in schedule.c */
+    return (void *)1;
+}
+
+static void
+rt_free_pdata(const struct scheduler *ops, void *pcpu, int cpu)
+{
+    per_cpu(schedule_data, cpu).schedule_lock = NULL;
+}
+
+static void *
+rt_alloc_domdata(const struct scheduler *ops, struct domain *dom)
+{
+    unsigned long flags;
+    struct rt_dom *sdom;
+    struct rt_private * prv = rt_priv(ops);
+
+    sdom = xzalloc(struct rt_dom);
+    if ( sdom == NULL )
+        return NULL;
+
+    INIT_LIST_HEAD(&sdom->vcpu);
+    INIT_LIST_HEAD(&sdom->sdom_elem);
+    sdom->dom = dom;
+
+    /* spinlock here to insert the dom */
+    spin_lock_irqsave(&prv->lock, flags);
+    list_add_tail(&sdom->sdom_elem, &(prv->sdom));
+    spin_unlock_irqrestore(&prv->lock, flags);
+
+    return sdom;
+}
+
+static void
+rt_free_domdata(const struct scheduler *ops, void *data)
+{
+    unsigned long flags;
+    struct rt_dom *sdom = data;
+    struct rt_private *prv = rt_priv(ops);
+
+    spin_lock_irqsave(&prv->lock, flags);
+    list_del_init(&sdom->sdom_elem);
+    spin_unlock_irqrestore(&prv->lock, flags);
+    xfree(data);
+}
+
+static int
+rt_dom_init(const struct scheduler *ops, struct domain *dom)
+{
+    struct rt_dom *sdom;
+
+    /* IDLE Domain does not link on rt_private */
+    if ( is_idle_domain(dom) )
+        return 0;
+
+    sdom = rt_alloc_domdata(ops, dom);
+    if ( sdom == NULL )
+        return -ENOMEM;
+
+    dom->sched_priv = sdom;
+
+    return 0;
+}
+
+static void
+rt_dom_destroy(const struct scheduler *ops, struct domain *dom)
+{
+    rt_free_domdata(ops, rt_dom(dom));
+}
+
+static void *
+rt_alloc_vdata(const struct scheduler *ops, struct vcpu *vc, void *dd)
+{
+    struct rt_vcpu *svc;
+    long count;
+    s_time_t now = NOW();
+
+    /* Allocate per-VCPU info */
+    svc = xzalloc(struct rt_vcpu);
+    if ( svc == NULL )
+        return NULL;
+
+    INIT_LIST_HEAD(&svc->q_elem);
+    INIT_LIST_HEAD(&svc->sdom_elem);
+    svc->flags = 0U;
+    svc->sdom = dd;
+    svc->vcpu = vc;
+    svc->last_start = 0;
+
+    svc->period = RTDS_DEFAULT_PERIOD;
+    if ( !is_idle_vcpu(vc) )
+        svc->budget = RTDS_DEFAULT_BUDGET;
+
+    ASSERT( now >= svc->cur_deadline );
+
+    count = now / svc->period + 1;
+    svc->cur_deadline = count * svc->period;
+    svc->cur_budget = svc->budget;
+
+    /* TRACE */
+    {
+        struct {
+            unsigned dom:16,vcpu:16;
+            unsigned cur_deadline_lo, cur_deadline_hi;
+            unsigned cur_budget_lo, cur_budget_hi;
+        } d;
+        d.dom = svc->vcpu->domain->domain_id;
+        d.vcpu = svc->vcpu->vcpu_id;
+        d.cur_deadline_lo = (unsigned) svc->cur_deadline;
+        d.cur_deadline_hi = (unsigned) (svc->cur_deadline >> 32);
+        d.cur_budget_lo = (unsigned) svc->cur_budget;
+        d.cur_budget_hi = (unsigned) (svc->cur_budget >> 32);
+        trace_var(TRC_RTDS_VCPU_CREATE, 1,
+                  sizeof(d),
+                  (unsigned char *) &d);
+    }
+
+    return svc;
+}
+
+static void
+rt_free_vdata(const struct scheduler *ops, void *priv)
+{
+    struct rt_vcpu *svc = priv;
+
+    xfree(svc);
+}
+
+/*
+ * This function is called in sched_move_domain() in schedule.c
+ * when moving a domain to a new cpupool.
+ * It inserts the vcpus of the moving domain into the RunQ of the
+ * destination cpupool's scheduler, and inserts each rt_vcpu svc into
+ * the scheduler-specific vcpu list of the domain
+ */
+static void
+rt_vcpu_insert(const struct scheduler *ops, struct vcpu *vc)
+{
+    struct rt_private *prv = rt_priv(ops);
+    struct rt_vcpu *svc = rt_vcpu(vc);
+
+    /* do not add the idle vcpu to the domain's vcpu list */
+    if ( is_idle_vcpu(vc) )
+        return;
+
+    if ( !__vcpu_on_q(svc) && vcpu_runnable(vc) && !vc->is_running )
+    {
+        if ( svc->cur_budget > 0 )
+            __runq_insert(ops, svc);
+        else
+            list_add(&svc->q_elem, &prv->depletedq);
+    }
+
+    /* add rt_vcpu svc to scheduler-specific vcpu list of the dom */
+    list_add_tail(&svc->sdom_elem, &svc->sdom->vcpu);
+}
+
+/*
+ * Remove rt_vcpu svc from the old scheduler in the source cpupool, and
+ * remove it from the scheduler-specific vcpu list of the domain
+ */
+static void
+rt_vcpu_remove(const struct scheduler *ops, struct vcpu *vc)
+{
+    struct rt_vcpu * const svc = rt_vcpu(vc);
+    struct rt_dom * const sdom = svc->sdom;
+
+    BUG_ON( sdom == NULL );
+
+    if ( __vcpu_on_q(svc) )
+        __q_remove(svc);
+
+    if ( !is_idle_vcpu(vc) )
+        list_del_init(&svc->sdom_elem);
+}
+
+/*
+ * Pick a valid CPU for the vcpu vc
+ * A valid CPU for a vcpu is in the intersection of the vcpu's affinity
+ * and the available cpus
+ */
+static int
+rt_cpu_pick(const struct scheduler *ops, struct vcpu *vc)
+{
+    cpumask_t cpus;
+    cpumask_t *online;
+    int cpu;
+
+    online = cpupool_scheduler_cpumask(vc->domain->cpupool);
+    cpumask_and(&cpus, online, vc->cpu_hard_affinity);
+
+    cpu = cpumask_test_cpu(vc->processor, &cpus)
+            ? vc->processor
+            : cpumask_cycle(vc->processor, &cpus);
+    ASSERT( !cpumask_empty(&cpus) && cpumask_test_cpu(cpu, &cpus) );
+
+    return cpu;
+}
+
+/*
+ * Burn budget in nanosecond granularity
+ */
+static void
+burn_budget(const struct scheduler *ops, struct rt_vcpu *svc, s_time_t now)
+{
+    s_time_t delta;
+
+    /* don't burn budget for idle VCPU */
+    if ( is_idle_vcpu(svc->vcpu) )
+        return;
+
+    /* burn at nanoseconds level */
+    delta = now - svc->last_start;
+    /*
+     * delta < 0 only happens in nested virtualization;
+     * TODO: how should we handle delta < 0 in a better way?
+     */
+    if ( delta < 0 )
+    {
+        printk("%s, ATTENTION: now is behind last_start! delta = %ld\n",
+                __func__, delta);
+        svc->last_start = now;
+        return;
+    }
+
+    svc->cur_budget -= delta;
+
+    if ( svc->cur_budget < 0 )
+        svc->cur_budget = 0;
+
+    /* TRACE */
+    {
+        struct {
+            unsigned dom:16, vcpu:16;
+            unsigned cur_budget_lo;
+            unsigned cur_budget_hi;
+            int delta;
+        } d;
+        d.dom = svc->vcpu->domain->domain_id;
+        d.vcpu = svc->vcpu->vcpu_id;
+        d.cur_budget_lo = (unsigned) svc->cur_budget;
+        d.cur_budget_hi = (unsigned) (svc->cur_budget >> 32);
+        d.delta = delta;
+        trace_var(TRC_RTDS_BUDGET_BURN, 1,
+                  sizeof(d),
+                  (unsigned char *) &d);
+    }
+}
+
+/*
+ * The RunQ is sorted. Pick the first vcpu within the cpumask. If none, return NULL.
+ * The lock is grabbed before calling this function
+ */
+static struct rt_vcpu *
+__runq_pick(const struct scheduler *ops, cpumask_t *mask)
+{
+    struct list_head *runq = rt_runq(ops);
+    struct list_head *iter;
+    struct rt_vcpu *svc = NULL;
+    struct rt_vcpu *iter_svc = NULL;
+    cpumask_t cpu_common;
+    cpumask_t *online;
+
+    list_for_each(iter, runq)
+    {
+        iter_svc = __q_elem(iter);
+
+        /* mask cpu_hard_affinity & cpupool & mask */
+        online = cpupool_scheduler_cpumask(iter_svc->vcpu->domain->cpupool);
+        cpumask_and(&cpu_common, online, iter_svc->vcpu->cpu_hard_affinity);
+        cpumask_and(&cpu_common, mask, &cpu_common);
+        if ( cpumask_empty(&cpu_common) )
+            continue;
+
+        ASSERT( iter_svc->cur_budget > 0 );
+
+        svc = iter_svc;
+        break;
+    }
+
+    /* TRACE */
+    {
+        if( svc != NULL )
+        {
+            struct {
+                unsigned dom:16, vcpu:16;
+                unsigned cur_deadline_lo, cur_deadline_hi;
+                unsigned cur_budget_lo, cur_budget_hi;
+            } d;
+            d.dom = svc->vcpu->domain->domain_id;
+            d.vcpu = svc->vcpu->vcpu_id;
+            d.cur_deadline_lo = (unsigned) svc->cur_deadline;
+            d.cur_deadline_hi = (unsigned) (svc->cur_deadline >> 32);
+            d.cur_budget_lo = (unsigned) svc->cur_budget;
+            d.cur_budget_hi = (unsigned) (svc->cur_budget >> 32);
+            trace_var(TRC_RTDS_RUNQ_PICK, 1,
+                      sizeof(d),
+                      (unsigned char *) &d);
+        }
+        else
+            trace_var(TRC_RTDS_RUNQ_PICK, 1, 0, NULL);
+    }
+
+    return svc;
+}
+
+/*
+ * Update vcpus' budgets and keep the runq sorted by re-inserting modified vcpus.
+ * The lock is grabbed before calling this function
+ */
+static void
+__repl_update(const struct scheduler *ops, s_time_t now)
+{
+    struct list_head *runq = rt_runq(ops);
+    struct list_head *depletedq = rt_depletedq(ops);
+    struct list_head *iter;
+    struct list_head *tmp;
+    struct rt_vcpu *svc = NULL;
+
+    list_for_each_safe(iter, tmp, runq)
+    {
+        svc = __q_elem(iter);
+
+        if ( now >= svc->cur_deadline )
+        {
+            rt_update_deadline(now, svc);
+            /* reinsert the vcpu if its deadline is updated */
+            __q_remove(svc);
+            __runq_insert(ops, svc);
+        }
+        else
+            break;
+    }
+
+    list_for_each_safe(iter, tmp, depletedq)
+    {
+        svc = __q_elem(iter);
+        if ( now >= svc->cur_deadline )
+        {
+            rt_update_deadline(now, svc);
+            __q_remove(svc); /* remove from depleted queue */
+            __runq_insert(ops, svc); /* add to runq */
+        }
+    }
+}
+
+/*
+ * schedule function for rt scheduler.
+ * The lock is already grabbed in schedule.c, no need to lock here
+ */
+static struct task_slice
+rt_schedule(const struct scheduler *ops, s_time_t now, bool_t tasklet_work_scheduled)
+{
+    const int cpu = smp_processor_id();
+    struct rt_private *prv = rt_priv(ops);
+    struct rt_vcpu *const scurr = rt_vcpu(current);
+    struct rt_vcpu *snext = NULL;
+    struct task_slice ret = { .migrated = 0 };
+
+    /* clear tickled bit now that we've been scheduled */
+    cpumask_clear_cpu(cpu, &prv->tickled);
+
+    /* burn_budget would return for IDLE VCPU */
+    burn_budget(ops, scurr, now);
+
+    __repl_update(ops, now);
+
+    if ( tasklet_work_scheduled )
+    {
+        snext = rt_vcpu(idle_vcpu[cpu]);
+    }
+    else
+    {
+        cpumask_t cur_cpu;
+        cpumask_clear(&cur_cpu);
+        cpumask_set_cpu(cpu, &cur_cpu);
+        snext = __runq_pick(ops, &cur_cpu);
+        if ( snext == NULL )
+            snext = rt_vcpu(idle_vcpu[cpu]);
+
+        /* if scurr has higher priority and budget, still pick scurr */
+        if ( !is_idle_vcpu(current) &&
+             vcpu_runnable(current) &&
+             scurr->cur_budget > 0 &&
+             ( is_idle_vcpu(snext->vcpu) ||
+               scurr->cur_deadline <= snext->cur_deadline ) )
+            snext = scurr;
+    }
+
+    if ( snext != scurr &&
+         !is_idle_vcpu(current) &&
+         vcpu_runnable(current) )
+        set_bit(__RTDS_delayed_runq_add, &scurr->flags);
+
+    snext->last_start = now;
+    if ( !is_idle_vcpu(snext->vcpu) )
+    {
+        if ( snext != scurr )
+        {
+            __q_remove(snext);
+            set_bit(__RTDS_scheduled, &snext->flags);
+        }
+        if ( snext->vcpu->processor != cpu )
+        {
+            snext->vcpu->processor = cpu;
+            ret.migrated = 1;
+        }
+    }
+
+    ret.time = MILLISECS(1); /* sched quantum */
+    ret.task = snext->vcpu;
+
+    /* TRACE */
+    {
+        struct {
+            unsigned dom:16,vcpu:16;
+            unsigned cur_deadline_lo, cur_deadline_hi;
+            unsigned cur_budget_lo, cur_budget_hi;
+        } d;
+        d.dom = snext->vcpu->domain->domain_id;
+        d.vcpu = snext->vcpu->vcpu_id;
+        d.cur_deadline_lo = (unsigned) snext->cur_deadline;
+        d.cur_deadline_hi = (unsigned) (snext->cur_deadline >> 32);
+        d.cur_budget_lo = (unsigned) snext->cur_budget;
+        d.cur_budget_hi = (unsigned) (snext->cur_budget >> 32);
+        trace_var(TRC_RTDS_SCHED_TASKLET, 1,
+                  sizeof(d),
+                  (unsigned char *)&d);
+    }
+
+    return ret;
+}
+
+/*
+ * Remove VCPU from RunQ
+ * The lock is already grabbed in schedule.c, no need to lock here
+ */
+static void
+rt_vcpu_sleep(const struct scheduler *ops, struct vcpu *vc)
+{
+    struct rt_vcpu * const svc = rt_vcpu(vc);
+
+    BUG_ON( is_idle_vcpu(vc) );
+
+    if ( curr_on_cpu(vc->processor) == vc )
+        cpu_raise_softirq(vc->processor, SCHEDULE_SOFTIRQ);
+    else if ( __vcpu_on_q(svc) )
+        __q_remove(svc);
+    else if ( test_bit(__RTDS_delayed_runq_add, &svc->flags) )
+        clear_bit(__RTDS_delayed_runq_add, &svc->flags);
+}
+
+/*
+ * Pick a cpu where to run a vcpu, possibly kicking out the vcpu running there
+ * Called by wake() and context_saved()
+ * We have a running candidate here; the kick logic is:
+ * Among all the cpus that are within the cpu affinity:
+ * 1) if new's previous cpu is idle, kick it, which benefits cache locality;
+ * 2) if there is any idle pcpu, kick it;
+ * 3) all pcpus are busy now; among all the running vcpus, pick the one with the
+ *    lowest priority and, if snext has higher priority, kick it.
+ *
+ * TODO:
+ * 1) what if these two vcpus belong to the same domain?
+ *    Replacing a vcpu of the same domain introduces more overhead.
+ *
+ * lock is grabbed before calling this function
+ */
+static void
+runq_tickle(const struct scheduler *ops, struct rt_vcpu *new)
+{
+    struct rt_private *prv = rt_priv(ops);
+    struct rt_vcpu *latest_deadline_vcpu = NULL;    /* lowest priority scheduled */
+    struct rt_vcpu *iter_svc;
+    struct vcpu *iter_vc;
+    int cpu = 0, cpu_to_tickle = 0;
+    cpumask_t not_tickled;
+    cpumask_t *online;
+
+    if ( new == NULL || is_idle_vcpu(new->vcpu) )
+        return;
+
+    online = cpupool_scheduler_cpumask(new->vcpu->domain->cpupool);
+    cpumask_and(&not_tickled, online, new->vcpu->cpu_hard_affinity);
+    cpumask_andnot(&not_tickled, &not_tickled, &prv->tickled);
+
+    /* 1) if new's previous cpu is idle, kick it for cache benefit */
+    if ( is_idle_vcpu(curr_on_cpu(new->vcpu->processor)) )
+    {
+        cpu_to_tickle = new->vcpu->processor;
+        goto out;
+    }
+
+    /* 2) if there are any idle pcpu, kick it */
+    /* The same loop also finds the vcpu with the lowest priority */
+    for_each_cpu(cpu, &not_tickled)
+    {
+        iter_vc = curr_on_cpu(cpu);
+        if ( is_idle_vcpu(iter_vc) )
+        {
+            cpu_to_tickle = cpu;
+            goto out;
+        }
+        iter_svc = rt_vcpu(iter_vc);
+        if ( latest_deadline_vcpu == NULL ||
+             iter_svc->cur_deadline > latest_deadline_vcpu->cur_deadline )
+            latest_deadline_vcpu = iter_svc;
+    }
+
+    /* 3) candidate has higher priority, kick out the lowest-priority vcpu */
+    if ( latest_deadline_vcpu != NULL && new->cur_deadline < latest_deadline_vcpu->cur_deadline )
+    {
+        cpu_to_tickle = latest_deadline_vcpu->vcpu->processor;
+        goto out;
+    }
+
+out:
+    /* TRACE */
+    {
+        struct {
+            unsigned cpu:8, pad:24;
+        } d;
+        d.cpu = cpu_to_tickle;
+        d.pad = 0;
+        trace_var(TRC_RTDS_TICKLE, 0,
+                  sizeof(d),
+                  (unsigned char *)&d);
+    }
+
+    cpumask_set_cpu(cpu_to_tickle, &prv->tickled);
+    cpu_raise_softirq(cpu_to_tickle, SCHEDULE_SOFTIRQ);
+    return;
+}
+
+/*
+ * Should always wake up runnable vcpu, put it back to RunQ.
+ * Check priority to raise interrupt
+ * The lock is already grabbed in schedule.c, no need to lock here
+ * TODO: what if these two vcpus belong to the same domain?
+ */
+static void
+rt_vcpu_wake(const struct scheduler *ops, struct vcpu *vc)
+{
+    struct rt_vcpu * const svc = rt_vcpu(vc);
+    s_time_t now = NOW();
+    struct rt_private *prv = rt_priv(ops);
+    struct rt_vcpu *snext = NULL;        /* highest priority on RunQ */
+    struct rt_dom *sdom = NULL;
+    cpumask_t *online;
+
+    BUG_ON( is_idle_vcpu(vc) );
+
+    if ( unlikely(curr_on_cpu(vc->processor) == vc) )
+        return;
+
+    /* on RunQ/DepletedQ, just update info is ok */
+    if ( unlikely(__vcpu_on_q(svc)) )
+        return;
+
+    /* If context hasn't been saved for this vcpu yet, we can't put it on
+     * the Runqueue/DepletedQ. Instead, we set a flag so that it will be
+     * put on the Runqueue/DepletedQ after the context has been saved.
+     */
+    if ( unlikely(test_bit(__RTDS_scheduled, &svc->flags)) )
+    {
+        set_bit(__RTDS_delayed_runq_add, &svc->flags);
+        return;
+    }
+
+    if ( now >= svc->cur_deadline)
+        rt_update_deadline(now, svc);
+
+    /* insert svc to runq/depletedq because svc is not in queue now */
+    if ( svc->cur_budget > 0 )
+        __runq_insert(ops, svc);
+    else
+        list_add(&svc->q_elem, &prv->depletedq);
+
+    __repl_update(ops, now);
+
+    ASSERT(!list_empty(&prv->sdom));
+    sdom = list_entry(prv->sdom.next, struct rt_dom, sdom_elem);
+    online = cpupool_scheduler_cpumask(sdom->dom->cpupool);
+    snext = __runq_pick(ops, online);    /* pick snext from ALL valid cpus */
+
+    runq_tickle(ops, snext);
+
+    return;
+}
+
+/*
+ * scurr has finished context switch, insert it back to the RunQ,
+ * and then pick the highest priority vcpu from runq to run
+ */
+static void
+rt_context_saved(const struct scheduler *ops, struct vcpu *vc)
+{
+    struct rt_vcpu *svc = rt_vcpu(vc);
+    struct rt_vcpu *snext = NULL;
+    struct rt_dom *sdom = NULL;
+    struct rt_private *prv = rt_priv(ops);
+    cpumask_t *online;
+    spinlock_t *lock = vcpu_schedule_lock_irq(vc);
+
+    clear_bit(__RTDS_scheduled, &svc->flags);
+    /* not insert idle vcpu to runq */
+    if ( is_idle_vcpu(vc) )
+        goto out;
+
+    if ( test_and_clear_bit(__RTDS_delayed_runq_add, &svc->flags) &&
+         likely(vcpu_runnable(vc)) )
+    {
+        __runq_insert(ops, svc);
+        __repl_update(ops, NOW());
+
+        ASSERT(!list_empty(&prv->sdom));
+        sdom = list_entry(prv->sdom.next, struct rt_dom, sdom_elem);
+        online = cpupool_scheduler_cpumask(sdom->dom->cpupool);
+        snext = __runq_pick(ops, online);    /* pick snext from ALL cpus */
+
+        runq_tickle(ops, snext);
+    }
+out:
+    vcpu_schedule_unlock_irq(lock, vc);
+}
+
+/*
+ * set/get each vcpu info of each domain
+ */
+static int
+rt_dom_cntl(
+    const struct scheduler *ops,
+    struct domain *d,
+    struct xen_domctl_scheduler_op *op)
+{
+    struct rt_dom * const sdom = rt_dom(d);
+    struct rt_vcpu *svc;
+    struct list_head *iter;
+    int rc = 0;
+
+    switch ( op->cmd )
+    {
+    case XEN_DOMCTL_SCHEDOP_getinfo:
+        /* for debugging: whenever Dom0's parameters are adjusted, do a global dump */
+        if ( d->domain_id == 0 )
+            rt_dump(ops);
+
+        svc = list_entry(sdom->vcpu.next, struct rt_vcpu, sdom_elem);
+        op->u.rtds.period = svc->period / MICROSECS(1); /* convert to microseconds */
+        op->u.rtds.budget = svc->budget / MICROSECS(1);
+        break;
+    case XEN_DOMCTL_SCHEDOP_putinfo:
+        list_for_each( iter, &sdom->vcpu )
+        {
+            struct rt_vcpu * svc = list_entry(iter, struct rt_vcpu, sdom_elem);
+            svc->period = MICROSECS(op->u.rtds.period); /* convert to nanoseconds */
+            svc->budget = MICROSECS(op->u.rtds.budget);
+        }
+        break;
+    }
+
+    return rc;
+}
+
+static struct rt_private _rt_priv;
+
+const struct scheduler sched_rtds_def = {
+    .name           = "SMP RTDS Scheduler",
+    .opt_name       = "rtds",
+    .sched_id       = XEN_SCHEDULER_RTDS,
+    .sched_data     = &_rt_priv,
+
+    .dump_cpu_state = rt_dump_pcpu,
+    .dump_settings  = rt_dump,
+    .init           = rt_init,
+    .deinit         = rt_deinit,
+    .alloc_pdata    = rt_alloc_pdata,
+    .free_pdata     = rt_free_pdata,
+    .alloc_domdata  = rt_alloc_domdata,
+    .free_domdata   = rt_free_domdata,
+    .init_domain    = rt_dom_init,
+    .destroy_domain = rt_dom_destroy,
+    .alloc_vdata    = rt_alloc_vdata,
+    .free_vdata     = rt_free_vdata,
+    .insert_vcpu    = rt_vcpu_insert,
+    .remove_vcpu    = rt_vcpu_remove,
+
+    .adjust         = rt_dom_cntl,
+
+    .pick_cpu       = rt_cpu_pick,
+    .do_schedule    = rt_schedule,
+    .sleep          = rt_vcpu_sleep,
+    .wake           = rt_vcpu_wake,
+    .context_saved  = rt_context_saved,
+};
diff --git a/xen/common/schedule.c b/xen/common/schedule.c
index 73cc2ea..6285a6e 100644
--- a/xen/common/schedule.c
+++ b/xen/common/schedule.c
@@ -69,6 +69,7 @@ static const struct scheduler *schedulers[] = {
     &sched_credit_def,
     &sched_credit2_def,
     &sched_arinc653_def,
+    &sched_rtds_def,
 };
 
 static struct scheduler __read_mostly ops;
diff --git a/xen/include/public/domctl.h b/xen/include/public/domctl.h
index 69a8b44..951289b 100644
--- a/xen/include/public/domctl.h
+++ b/xen/include/public/domctl.h
@@ -347,6 +347,8 @@ DEFINE_XEN_GUEST_HANDLE(xen_domctl_max_vcpus_t);
 #define XEN_SCHEDULER_CREDIT   5
 #define XEN_SCHEDULER_CREDIT2  6
 #define XEN_SCHEDULER_ARINC653 7
+#define XEN_SCHEDULER_RTDS     8
+
 /* Set or get info? */
 #define XEN_DOMCTL_SCHEDOP_putinfo 0
 #define XEN_DOMCTL_SCHEDOP_getinfo 1
@@ -368,6 +370,10 @@ struct xen_domctl_scheduler_op {
         struct xen_domctl_sched_credit2 {
             uint16_t weight;
         } credit2;
+        struct xen_domctl_sched_rtds{
+            uint32_t period;
+            uint32_t budget;
+        } rtds;
     } u;
 };
 typedef struct xen_domctl_scheduler_op xen_domctl_scheduler_op_t;
diff --git a/xen/include/public/trace.h b/xen/include/public/trace.h
index cfcf4aa..5211ae7 100644
--- a/xen/include/public/trace.h
+++ b/xen/include/public/trace.h
@@ -77,6 +77,7 @@
 #define TRC_SCHED_CSCHED2  1
 #define TRC_SCHED_SEDF     2
 #define TRC_SCHED_ARINC653 3
+#define TRC_SCHED_RTDS     4
 
 /* Per-scheduler tracing */
 #define TRC_SCHED_CLASS_EVT(_c, _e) \
diff --git a/xen/include/xen/sched-if.h b/xen/include/xen/sched-if.h
index 4164dff..7cc25c6 100644
--- a/xen/include/xen/sched-if.h
+++ b/xen/include/xen/sched-if.h
@@ -169,6 +169,7 @@ extern const struct scheduler sched_sedf_def;
 extern const struct scheduler sched_credit_def;
 extern const struct scheduler sched_credit2_def;
 extern const struct scheduler sched_arinc653_def;
+extern const struct scheduler sched_rtds_def;
 
 
 struct cpupool
-- 
1.7.9.5

^ permalink raw reply related	[flat|nested] 49+ messages in thread

* [PATCH v3 2/4] libxc: add rtds scheduler
  2014-09-14 21:37 Introduce rtds real-time scheduler for Xen Meng Xu
  2014-09-14 21:37 ` [PATCH v3 1/4] xen: add real time scheduler rtds Meng Xu
@ 2014-09-14 21:37 ` Meng Xu
  2014-09-18 16:15   ` George Dunlap
  2014-09-14 21:37 ` [PATCH v3 3/4] libxl: " Meng Xu
                   ` (4 subsequent siblings)
  6 siblings, 1 reply; 49+ messages in thread
From: Meng Xu @ 2014-09-14 21:37 UTC (permalink / raw)
  To: xen-devel
  Cc: ian.campbell, xisisu, stefano.stabellini, george.dunlap, lu,
	dario.faggioli, ian.jackson, ptxlinh, xumengpanda, Meng Xu,
	JBeulich, chaowang, lichong659, dgolomb

Add xc_sched_rtds_* functions to interact with Xen to set/get a domain's
parameters for the rtds scheduler.
Note: VCPU's information (period, budget) is in microsecond (us).
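
For illustration, a minimal caller of the new interface might look like the
sketch below (not part of this patch: error handling is trimmed and the domid
is whatever domain you want to adjust):

    #include <xenctrl.h>

    /* Read a domain's rtds parameters and double its period (values in us). */
    static int bump_period(uint32_t domid)
    {
        xc_interface *xch = xc_interface_open(NULL, NULL, 0);
        struct xen_domctl_sched_rtds sdom;
        int rc;

        if ( !xch )
            return -1;

        rc = xc_sched_rtds_domain_get(xch, domid, &sdom);
        if ( rc == 0 )
        {
            sdom.period *= 2;
            rc = xc_sched_rtds_domain_set(xch, domid, &sdom);
        }

        xc_interface_close(xch);
        return rc;
    }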

Signed-off-by: Meng Xu <mengxu@cis.upenn.edu>
Signed-off-by: Sisu Xi <xisisu@gmail.com>
Reviewed-by: Dario Faggioli <dario.faggioli@citrix.com>
Acked-by: Ian Campbell <ian.campbell@citrix.com>
---
 tools/libxc/Makefile  |    1 +
 tools/libxc/xc_rt.c   |   65 +++++++++++++++++++++++++++++++++++++++++++++++++
 tools/libxc/xenctrl.h |    7 ++++++
 3 files changed, 73 insertions(+)
 create mode 100644 tools/libxc/xc_rt.c

diff --git a/tools/libxc/Makefile b/tools/libxc/Makefile
index 3b04027..8db0d97 100644
--- a/tools/libxc/Makefile
+++ b/tools/libxc/Makefile
@@ -20,6 +20,7 @@ CTRL_SRCS-y       += xc_sedf.c
 CTRL_SRCS-y       += xc_csched.c
 CTRL_SRCS-y       += xc_csched2.c
 CTRL_SRCS-y       += xc_arinc653.c
+CTRL_SRCS-y       += xc_rt.c
 CTRL_SRCS-y       += xc_tbuf.c
 CTRL_SRCS-y       += xc_pm.c
 CTRL_SRCS-y       += xc_cpu_hotplug.c
diff --git a/tools/libxc/xc_rt.c b/tools/libxc/xc_rt.c
new file mode 100644
index 0000000..b2d1cc5
--- /dev/null
+++ b/tools/libxc/xc_rt.c
@@ -0,0 +1,65 @@
+/****************************************************************************
+ *
+ *        File: xc_rt.c
+ *      Author: Sisu Xi
+ *              Meng Xu
+ *
+ * Description: XC Interface to the rtds scheduler
+ * Note: VCPU's parameter (period, budget) is in microsecond (us).
+ *       All VCPUs of the same domain have the same period and budget.
+ *
+ * This library is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU Lesser General Public
+ * License as published by the Free Software Foundation;
+ * version 2.1 of the License.
+ *
+ * This library is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+ * Lesser General Public License for more details.
+ *
+ * You should have received a copy of the GNU Lesser General Public
+ * License along with this library; if not, write to the Free Software
+ * Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA  02110-1301  USA
+ */
+
+#include "xc_private.h"
+
+int xc_sched_rtds_domain_set(xc_interface *xch,
+                           uint32_t domid,
+                           struct xen_domctl_sched_rtds *sdom)
+{
+    int rc;
+    DECLARE_DOMCTL;
+
+    domctl.cmd = XEN_DOMCTL_scheduler_op;
+    domctl.domain = (domid_t) domid;
+    domctl.u.scheduler_op.sched_id = XEN_SCHEDULER_RTDS;
+    domctl.u.scheduler_op.cmd = XEN_DOMCTL_SCHEDOP_putinfo;
+    domctl.u.scheduler_op.u.rtds.period = sdom->period;
+    domctl.u.scheduler_op.u.rtds.budget = sdom->budget;
+
+    rc = do_domctl(xch, &domctl);
+
+    return rc;
+}
+
+int xc_sched_rtds_domain_get(xc_interface *xch,
+                           uint32_t domid,
+                           struct xen_domctl_sched_rtds *sdom)
+{
+    int rc;
+    DECLARE_DOMCTL;
+
+    domctl.cmd = XEN_DOMCTL_scheduler_op;
+    domctl.domain = (domid_t) domid;
+    domctl.u.scheduler_op.sched_id = XEN_SCHEDULER_RTDS;
+    domctl.u.scheduler_op.cmd = XEN_DOMCTL_SCHEDOP_getinfo;
+
+    rc = do_domctl(xch, &domctl);
+
+    if ( rc == 0 )
+        *sdom = domctl.u.scheduler_op.u.rtds;
+
+    return rc;
+}
diff --git a/tools/libxc/xenctrl.h b/tools/libxc/xenctrl.h
index 1c8aa42..453ed2f 100644
--- a/tools/libxc/xenctrl.h
+++ b/tools/libxc/xenctrl.h
@@ -875,6 +875,13 @@ int xc_sched_credit2_domain_get(xc_interface *xch,
                                uint32_t domid,
                                struct xen_domctl_sched_credit2 *sdom);
 
+int xc_sched_rtds_domain_set(xc_interface *xch,
+                            uint32_t domid,
+                            struct xen_domctl_sched_rtds *sdom);
+int xc_sched_rtds_domain_get(xc_interface *xch,
+                            uint32_t domid,
+                            struct xen_domctl_sched_rtds *sdom);
+
 int
 xc_sched_arinc653_schedule_set(
     xc_interface *xch,
-- 
1.7.9.5

^ permalink raw reply related	[flat|nested] 49+ messages in thread

* [PATCH v3 3/4] libxl: add rtds scheduler
  2014-09-14 21:37 Introduce rtds real-time scheduler for Xen Meng Xu
  2014-09-14 21:37 ` [PATCH v3 1/4] xen: add real time scheduler rtds Meng Xu
  2014-09-14 21:37 ` [PATCH v3 2/4] libxc: add rtds scheduler Meng Xu
@ 2014-09-14 21:37 ` Meng Xu
  2014-09-15 22:07   ` Ian Campbell
                     ` (2 more replies)
  2014-09-14 21:37 ` [PATCH v3 4/4] xl: introduce " Meng Xu
                   ` (3 subsequent siblings)
  6 siblings, 3 replies; 49+ messages in thread
From: Meng Xu @ 2014-09-14 21:37 UTC (permalink / raw)
  To: xen-devel
  Cc: ian.campbell, xisisu, stefano.stabellini, george.dunlap, lu,
	dario.faggioli, ian.jackson, ptxlinh, xumengpanda, Meng Xu,
	JBeulich, chaowang, lichong659, dgolomb

Add libxl functions to set/get a domain's parameters for the rtds scheduler.
Note: VCPU's information (period, budget) is in microsecond (us).
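
For illustration, a libxl client could use these as follows (a minimal sketch,
not part of this patch, assuming a libxl_ctx has already been created elsewhere;
values are in microseconds):

    #include <libxl.h>

    /* Set a domain's rtds period and budget through libxl. */
    static int set_rtds_params(libxl_ctx *ctx, uint32_t domid,
                               int period, int budget)
    {
        libxl_domain_sched_params params;
        int rc;

        libxl_domain_sched_params_init(&params);
        params.sched  = LIBXL_SCHEDULER_RTDS;
        params.period = period;
        params.budget = budget;

        rc = libxl_domain_sched_params_set(ctx, domid, &params);

        libxl_domain_sched_params_dispose(&params);
        return rc;
    }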

Signed-off-by: Meng Xu <mengxu@cis.upenn.edu>
Signed-off-by: Sisu Xi <xisisu@gmail.com>
---
 tools/libxl/libxl.c         |   72 +++++++++++++++++++++++++++++++++++++++++++
 tools/libxl/libxl.h         |    1 +
 tools/libxl/libxl_types.idl |    2 ++
 3 files changed, 75 insertions(+)

diff --git a/tools/libxl/libxl.c b/tools/libxl/libxl.c
index ad3495a..e202c46 100644
--- a/tools/libxl/libxl.c
+++ b/tools/libxl/libxl.c
@@ -5217,6 +5217,72 @@ static int sched_sedf_domain_set(libxl__gc *gc, uint32_t domid,
     return 0;
 }
 
+static int sched_rtds_domain_get(libxl__gc *gc, uint32_t domid,
+                               libxl_domain_sched_params *scinfo)
+{
+    struct xen_domctl_sched_rtds sdom;
+    int rc;
+
+    rc = xc_sched_rtds_domain_get(CTX->xch, domid, &sdom);
+    if (rc != 0) {
+        LOGE(ERROR, "getting domain sched rtds");
+        return ERROR_FAIL;
+    }
+
+    libxl_domain_sched_params_init(scinfo);
+
+    scinfo->sched = LIBXL_SCHEDULER_RTDS;
+    scinfo->period = sdom.period;
+    scinfo->budget = sdom.budget;
+
+    return 0;
+}
+
+static int sched_rtds_domain_set(libxl__gc *gc, uint32_t domid,
+                               const libxl_domain_sched_params *scinfo)
+{
+    struct xen_domctl_sched_rtds sdom;
+    int rc;
+
+    rc = xc_sched_rtds_domain_get(CTX->xch, domid, &sdom);
+    if (rc != 0) {
+        LOGE(ERROR, "getting domain sched rtds");
+        return ERROR_FAIL;
+    }
+
+    if (scinfo->period != LIBXL_DOMAIN_SCHED_PARAM_PERIOD_DEFAULT) {
+        if (scinfo->period < 1) {
+            LOG(ERROR, "VCPU period is not set or out of range, "
+                       "valid values are larger than 1");
+            return ERROR_INVAL;
+        }
+        sdom.period = scinfo->period;
+    }
+
+    if (scinfo->budget != LIBXL_DOMAIN_SCHED_PARAM_BUDGET_DEFAULT) {
+        if (scinfo->budget < 1) {
+            LOG(ERROR, "VCPU budget is not set or out of range, "
+                       "valid values are larger than 1");
+            return ERROR_INVAL;
+        }
+        sdom.budget = scinfo->budget;
+    }
+
+    if (sdom.budget > sdom.period) {
+        LOG(ERROR, "VCPU budget is larger than VCPU period, "
+                   "VCPU budget should be no larger than VCPU period");
+        return ERROR_INVAL;
+    }
+
+    rc = xc_sched_rtds_domain_set(CTX->xch, domid, &sdom);
+    if (rc < 0) {
+        LOGE(ERROR, "setting domain sched rtds");
+        return ERROR_FAIL;
+    }
+
+    return 0;
+}
+
 int libxl_domain_sched_params_set(libxl_ctx *ctx, uint32_t domid,
                                   const libxl_domain_sched_params *scinfo)
 {
@@ -5240,6 +5306,9 @@ int libxl_domain_sched_params_set(libxl_ctx *ctx, uint32_t domid,
     case LIBXL_SCHEDULER_ARINC653:
         ret=sched_arinc653_domain_set(gc, domid, scinfo);
         break;
+    case LIBXL_SCHEDULER_RTDS:
+        ret=sched_rtds_domain_set(gc, domid, scinfo);
+        break;
     default:
         LOG(ERROR, "Unknown scheduler");
         ret=ERROR_INVAL;
@@ -5270,6 +5339,9 @@ int libxl_domain_sched_params_get(libxl_ctx *ctx, uint32_t domid,
     case LIBXL_SCHEDULER_CREDIT2:
         ret=sched_credit2_domain_get(gc, domid, scinfo);
         break;
+    case LIBXL_SCHEDULER_RTDS:
+        ret=sched_rtds_domain_get(gc, domid, scinfo);
+        break;
     default:
         LOG(ERROR, "Unknown scheduler");
         ret=ERROR_INVAL;
diff --git a/tools/libxl/libxl.h b/tools/libxl/libxl.h
index 5136d02..00badcc 100644
--- a/tools/libxl/libxl.h
+++ b/tools/libxl/libxl.h
@@ -1293,6 +1293,7 @@ int libxl_sched_credit_params_set(libxl_ctx *ctx, uint32_t poolid,
 #define LIBXL_DOMAIN_SCHED_PARAM_SLICE_DEFAULT     -1
 #define LIBXL_DOMAIN_SCHED_PARAM_LATENCY_DEFAULT   -1
 #define LIBXL_DOMAIN_SCHED_PARAM_EXTRATIME_DEFAULT -1
+#define LIBXL_DOMAIN_SCHED_PARAM_BUDGET_DEFAULT    -1
 
 int libxl_domain_sched_params_get(libxl_ctx *ctx, uint32_t domid,
                                   libxl_domain_sched_params *params);
diff --git a/tools/libxl/libxl_types.idl b/tools/libxl/libxl_types.idl
index f1fcbc3..15234d6 100644
--- a/tools/libxl/libxl_types.idl
+++ b/tools/libxl/libxl_types.idl
@@ -156,6 +156,7 @@ libxl_scheduler = Enumeration("scheduler", [
     (5, "credit"),
     (6, "credit2"),
     (7, "arinc653"),
+    (8, "rtds"),
     ])
 
 # Consistent with SHUTDOWN_* in sched.h (apart from UNKNOWN)
@@ -318,6 +319,7 @@ libxl_domain_sched_params = Struct("domain_sched_params",[
     ("slice",        integer, {'init_val': 'LIBXL_DOMAIN_SCHED_PARAM_SLICE_DEFAULT'}),
     ("latency",      integer, {'init_val': 'LIBXL_DOMAIN_SCHED_PARAM_LATENCY_DEFAULT'}),
     ("extratime",    integer, {'init_val': 'LIBXL_DOMAIN_SCHED_PARAM_EXTRATIME_DEFAULT'}),
+    ("budget",       integer, {'init_val': 'LIBXL_DOMAIN_SCHED_PARAM_BUDGET_DEFAULT'}),
     ])
 
 libxl_domain_build_info = Struct("domain_build_info",[
-- 
1.7.9.5

^ permalink raw reply related	[flat|nested] 49+ messages in thread

* [PATCH v3 4/4] xl: introduce rtds scheduler
  2014-09-14 21:37 Introduce rtds real-time scheduler for Xen Meng Xu
                   ` (2 preceding siblings ...)
  2014-09-14 21:37 ` [PATCH v3 3/4] libxl: " Meng Xu
@ 2014-09-14 21:37 ` Meng Xu
  2014-09-15 22:18   ` Ian Campbell
                     ` (2 more replies)
  2014-09-16  7:43 ` Introduce rtds real-time scheduler for Xen Dario Faggioli
                   ` (2 subsequent siblings)
  6 siblings, 3 replies; 49+ messages in thread
From: Meng Xu @ 2014-09-14 21:37 UTC (permalink / raw)
  To: xen-devel
  Cc: ian.campbell, xisisu, stefano.stabellini, george.dunlap, lu,
	dario.faggioli, ian.jackson, ptxlinh, xumengpanda, Meng Xu,
	JBeulich, chaowang, lichong659, dgolomb

Add xl command for rtds scheduler
Note: VCPU's parameter (period, budget) is in microsecond (us).

Signed-off-by: Meng Xu <mengxu@cis.upenn.edu>
Signed-off-by: Sisu Xi <xisisu@gmail.com>
---
 docs/man/xl.pod.1         |   35 +++++++++++++
 tools/libxl/xl.h          |    1 +
 tools/libxl/xl_cmdimpl.c  |  121 +++++++++++++++++++++++++++++++++++++++++++++
 tools/libxl/xl_cmdtable.c |    8 +++
 4 files changed, 165 insertions(+)

diff --git a/docs/man/xl.pod.1 b/docs/man/xl.pod.1
index 9d1c2a5..4c87292 100644
--- a/docs/man/xl.pod.1
+++ b/docs/man/xl.pod.1
@@ -1035,6 +1035,41 @@ Restrict output to domains in the specified cpupool.
 
 =back
 
+=item B<sched-rtds> [I<OPTIONS>]
+
+Set or get rtds (Real Time Deferrable Server) scheduler parameters.
+This rt scheduler applies Preemptive Global Earliest Deadline First
+real-time scheduling algorithm to schedule VCPUs in the system.
+Each VCPU has a dedicated period and budget.
+VCPUs in the same domain have the same period and budget (in Xen 4.5).
+While scheduled, a VCPU burns its budget.
+A VCPU has its budget replenished at the beginning of each period;
+Unused budget is discarded at the end of each period.
+
+B<OPTIONS>
+
+=over 4
+
+=item B<-d DOMAIN>, B<--domain=DOMAIN>
+
+Specify domain for which scheduler parameters are to be modified or retrieved.
+Mandatory for modifying scheduler parameters.
+
+=item B<-p PERIOD>, B<--period=PERIOD>
+
+Period of time, in microseconds, over which to replenish the budget.
+
+=item B<-b BUDGET>, B<--budget=BUDGET>
+
+Amount of time, in microseconds, that the VCPU will be allowed
+to run every period.
+
+=item B<-c CPUPOOL>, B<--cpupool=CPUPOOL>
+
+Restrict output to domains in the specified cpupool.
+
+=back
+
 =back
 
 =head1 CPUPOOLS COMMANDS
diff --git a/tools/libxl/xl.h b/tools/libxl/xl.h
index 10a2e66..6a6a0f9 100644
--- a/tools/libxl/xl.h
+++ b/tools/libxl/xl.h
@@ -67,6 +67,7 @@ int main_memset(int argc, char **argv);
 int main_sched_credit(int argc, char **argv);
 int main_sched_credit2(int argc, char **argv);
 int main_sched_sedf(int argc, char **argv);
+int main_sched_rtds(int argc, char **argv);
 int main_domid(int argc, char **argv);
 int main_domname(int argc, char **argv);
 int main_rename(int argc, char **argv);
diff --git a/tools/libxl/xl_cmdimpl.c b/tools/libxl/xl_cmdimpl.c
index 26492fc..296599d 100644
--- a/tools/libxl/xl_cmdimpl.c
+++ b/tools/libxl/xl_cmdimpl.c
@@ -5222,6 +5222,47 @@ static int sched_sedf_domain_output(
     return 0;
 }
 
+static int sched_rtds_domain_output(
+    int domid)
+{
+    char *domname;
+    libxl_domain_sched_params scinfo;
+    int rc = 0;
+
+    if (domid < 0) {
+        printf("%-33s %4s %9s %9s\n", "Name", "ID", "Period", "Budget");
+        return 0;
+    }
+
+    libxl_domain_sched_params_init(&scinfo);
+    rc = sched_domain_get(LIBXL_SCHEDULER_RTDS, domid, &scinfo);
+    if (rc)
+        goto out;
+
+    domname = libxl_domid_to_name(ctx, domid);
+    printf("%-33s %4d %9d %9d\n",
+        domname,
+        domid,
+        scinfo.period,
+        scinfo.budget);
+    free(domname);
+
+out:
+    libxl_domain_sched_params_dispose(&scinfo);
+    return rc;
+}
+
+static int sched_rtds_pool_output(uint32_t poolid)
+{
+    char *poolname;
+
+    poolname = libxl_cpupoolid_to_name(ctx, poolid);
+    printf("Cpupool %s: sched=RTDS algorithm=EDF\n", poolname);
+
+    free(poolname);
+    return 0;
+}
+
 static int sched_default_pool_output(uint32_t poolid)
 {
     char *poolname;
@@ -5589,6 +5630,86 @@ int main_sched_sedf(int argc, char **argv)
     return 0;
 }
 
+/*
+ * <nothing>            : List all domain parameters and sched params
+ * -d [domid]           : List domain params for domain
+ * -d [domid] [params]  : Set domain params for domain
+ */
+int main_sched_rtds(int argc, char **argv)
+{
+    const char *dom = NULL;
+    const char *cpupool = NULL;
+    int period = 0; /* period is in microsecond */
+    int budget = 0; /* budget is in microsecond */
+    bool opt_p = false;
+    bool opt_b = false;
+    int opt, rc;
+    static struct option opts[] = {
+        {"domain", 1, 0, 'd'},
+        {"period", 1, 0, 'p'},
+        {"budget", 1, 0, 'b'},
+        {"cpupool", 1, 0, 'c'},
+        COMMON_LONG_OPTS,
+        {0, 0, 0, 0}
+    };
+
+    SWITCH_FOREACH_OPT(opt, "d:p:b:c:h", opts, "sched-rtds", 0) {
+    case 'd':
+        dom = optarg;
+        break;
+    case 'p':
+        period = strtol(optarg, NULL, 10);
+        opt_p = 1;
+        break;
+    case 'b':
+        budget = strtol(optarg, NULL, 10);
+        opt_b = 1;
+        break;
+    case 'c':
+        cpupool = optarg;
+        break;
+    }
+
+    if (cpupool && (dom || opt_p || opt_b)) {
+        fprintf(stderr, "Specifying a cpupool is not allowed with other options.\n");
+        return 1;
+    }
+    if (!dom && (opt_p || opt_b)) {
+        fprintf(stderr, "Must specify a domain.\n");
+        return 1;
+    }
+    if (opt_p != opt_b) {
+        fprintf(stderr, "Must specify period and budget\n");
+        return 1;
+    }
+
+    if (!dom) { /* list all domains' rt scheduler info */
+        return -sched_domain_output(LIBXL_SCHEDULER_RTDS,
+                                    sched_rtds_domain_output,
+                                    sched_rtds_pool_output,
+                                    cpupool);
+    } else {
+        uint32_t domid = find_domain(dom);
+        if (!opt_p && !opt_b) { /* output rt scheduler info */
+            sched_rtds_domain_output(-1);
+            return -sched_rtds_domain_output(domid);
+        } else { /* set rt scheduler parameters */
+            libxl_domain_sched_params scinfo;
+            libxl_domain_sched_params_init(&scinfo);
+            scinfo.sched = LIBXL_SCHEDULER_RTDS;
+            scinfo.period = period;
+            scinfo.budget = budget;
+
+            rc = sched_domain_set(domid, &scinfo);
+            libxl_domain_sched_params_dispose(&scinfo);
+            if (rc)
+                return -rc;
+        }
+    }
+
+    return 0;
+}
+
 int main_domid(int argc, char **argv)
 {
     uint32_t domid;
diff --git a/tools/libxl/xl_cmdtable.c b/tools/libxl/xl_cmdtable.c
index 7b7fa92..1420b86 100644
--- a/tools/libxl/xl_cmdtable.c
+++ b/tools/libxl/xl_cmdtable.c
@@ -277,6 +277,14 @@ struct cmd_spec cmd_table[] = {
       "                               --period/--slice)\n"
       "-c CPUPOOL, --cpupool=CPUPOOL  Restrict output to CPUPOOL"
     },
+    { "sched-rtds",
+      &main_sched_rtds, 0, 1,
+      "Get/set rtds scheduler parameters",
+      "[-d <Domain> [-p[=PERIOD]] [-b[=BUDGET]]]",
+      "-d DOMAIN, --domain=DOMAIN     Domain to modify\n"
+      "-p PERIOD, --period=PERIOD     Period (us)\n"
+      "-b BUDGET, --budget=BUDGET     Budget (us)\n"
+    },
     { "domid",
       &main_domid, 0, 0,
       "Convert a domain name to domain id",
-- 
1.7.9.5

^ permalink raw reply related	[flat|nested] 49+ messages in thread

* Re: [PATCH v3 1/4] xen: add real time scheduler rtds
  2014-09-14 21:37 ` [PATCH v3 1/4] xen: add real time scheduler rtds Meng Xu
@ 2014-09-15 10:40   ` Jan Beulich
  2014-09-16  8:42   ` Dario Faggioli
  2014-09-18 16:08   ` George Dunlap
  2 siblings, 0 replies; 49+ messages in thread
From: Jan Beulich @ 2014-09-15 10:40 UTC (permalink / raw)
  To: Meng Xu
  Cc: ian.campbell, xisisu, stefano.stabellini, george.dunlap, lu,
	dario.faggioli, ian.jackson, xen-devel, ptxlinh, xumengpanda,
	chaowang, lichong659, dgolomb

>>> On 14.09.14 at 23:37, <mengxu@cis.upenn.edu> wrote:
> This scheduler follows the Preemptive Global Earliest Deadline First
> (EDF) theory in real-time field.
> At any scheduling point, the VCPU with earlier deadline has higher
> priority. The scheduler always picks the highest priority VCPU to run on a
> feasible PCPU.
> A PCPU is feasible if the VCPU can run on this PCPU and (the PCPU is
> idle or has a lower-priority VCPU running on it.)
> 
> Each VCPU has a dedicated period and budget.
> The deadline of a VCPU is at the end of each period;
> A VCPU has its budget replenished at the beginning of each period;
> While scheduled, a VCPU burns its budget.
> The VCPU needs to finish its budget before its deadline in each period;
> The VCPU discards its unused budget at the end of each period.
> If a VCPU runs out of budget in a period, it has to wait until next period.
> 
> Each VCPU is implemented as a deferable server.
> When a VCPU has a task running on it, its budget is continuously burned;
> When a VCPU has no task but with budget left, its budget is preserved.
> 
> Queue scheme:
> A global runqueue and a global depletedq for each CPU pool.
> The runqueue holds all runnable VCPUs with budget and sorted by deadline;
> The depletedq holds all VCPUs without budget and unsorted.
> 
> Note: cpumask and cpupool is supported.
> 
> This is an experimental scheduler.
> 
> Signed-off-by: Meng Xu <mengxu@cis.upenn.edu>
> Signed-off-by: Sisu Xi <xisisu@gmail.com>

For everything except the meat of it (xen/common/sched_rt.c)
Acked-by: Jan Beulich <jbeulich@suse.com>
with one coding style nit:

> --- a/xen/include/public/domctl.h
> +++ b/xen/include/public/domctl.h
> @@ -347,6 +347,8 @@ DEFINE_XEN_GUEST_HANDLE(xen_domctl_max_vcpus_t);
>  #define XEN_SCHEDULER_CREDIT   5
>  #define XEN_SCHEDULER_CREDIT2  6
>  #define XEN_SCHEDULER_ARINC653 7
> +#define XEN_SCHEDULER_RTDS     8
> +
>  /* Set or get info? */
>  #define XEN_DOMCTL_SCHEDOP_putinfo 0
>  #define XEN_DOMCTL_SCHEDOP_getinfo 1
> @@ -368,6 +370,10 @@ struct xen_domctl_scheduler_op {
>          struct xen_domctl_sched_credit2 {
>              uint16_t weight;
>          } credit2;
> +        struct xen_domctl_sched_rtds{

Missing blank before opening brace.

Jan

> +            uint32_t period;
> +            uint32_t budget;
> +        } rtds;
>      } u;
>  };
>  typedef struct xen_domctl_scheduler_op xen_domctl_scheduler_op_t;
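
With the blank added, the declaration would simply read:

    struct xen_domctl_sched_rtds {
        uint32_t period;
        uint32_t budget;
    } rtds;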

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [PATCH v3 3/4] libxl: add rtds scheduler
  2014-09-14 21:37 ` [PATCH v3 3/4] libxl: " Meng Xu
@ 2014-09-15 22:07   ` Ian Campbell
  2014-09-16  1:11     ` Meng Xu
  2014-09-16  8:04   ` Dario Faggioli
  2014-09-18 16:24   ` George Dunlap
  2 siblings, 1 reply; 49+ messages in thread
From: Ian Campbell @ 2014-09-15 22:07 UTC (permalink / raw)
  To: Meng Xu
  Cc: xisisu, stefano.stabellini, george.dunlap, lu, dario.faggioli,
	ian.jackson, xen-devel, ptxlinh, xumengpanda, JBeulich, chaowang,
	lichong659, dgolomb

On Sun, 2014-09-14 at 17:37 -0400, Meng Xu wrote:
> Add libxl functions to set/get domain's parameters for rtds scheduler
> Note: VCPU's information (period, budget) is in microsecond (us).
> 
> Signed-off-by: Meng Xu <mengxu@cis.upenn.edu>
> Signed-off-by: Sisu Xi <xisisu@gmail.com>

Mostly looks good.

The main thing which is missing is a #define LIBXL_HAVE_XXX in libxl to
advertise this new feature (there are a bunch of existing examples which
you can copy).

> +    if (scinfo->period != LIBXL_DOMAIN_SCHED_PARAM_PERIOD_DEFAULT) {
> +        if (scinfo->period < 1) {
> +            LOG(ERROR, "VCPU period is not set or out of range, "
> +                       "valid values are larger than 1");

There's no upper limit? Likewise for the budget further down?

Ian.

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [PATCH v3 4/4] xl: introduce rtds scheduler
  2014-09-14 21:37 ` [PATCH v3 4/4] xl: introduce " Meng Xu
@ 2014-09-15 22:18   ` Ian Campbell
  2014-09-16  7:49   ` Dario Faggioli
  2014-09-18 16:35   ` George Dunlap
  2 siblings, 0 replies; 49+ messages in thread
From: Ian Campbell @ 2014-09-15 22:18 UTC (permalink / raw)
  To: Meng Xu
  Cc: xisisu, stefano.stabellini, george.dunlap, lu, dario.faggioli,
	ian.jackson, xen-devel, ptxlinh, xumengpanda, JBeulich, chaowang,
	lichong659, dgolomb

On Sun, 2014-09-14 at 17:37 -0400, Meng Xu wrote:
> Add xl command for rtds scheduler
> Note: VCPU's parameter (period, budget) is in microsecond (us).
> 
> Signed-off-by: Meng Xu <mengxu@cis.upenn.edu>
> Signed-off-by: Sisu Xi <xisisu@gmail.com>

Looks good: Acked-by: Ian Campbell <ian.campbell@citrix.com>

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [PATCH v3 3/4] libxl: add rtds scheduler
  2014-09-15 22:07   ` Ian Campbell
@ 2014-09-16  1:11     ` Meng Xu
  2014-09-16  1:49       ` Ian Campbell
  0 siblings, 1 reply; 49+ messages in thread
From: Meng Xu @ 2014-09-16  1:11 UTC (permalink / raw)
  To: Ian Campbell
  Cc: Sisu Xi, Stefano Stabellini, George Dunlap, Chenyang Lu,
	Dario Faggioli, Ian Jackson, xen-devel, Linh Thi Xuan Phan,
	Meng Xu, Jan Beulich, Chao Wang, Chong Li, Dagaen Golomb



2014-09-15 18:07 GMT-04:00 Ian Campbell <ian.campbell@citrix.com>:

> On Sun, 2014-09-14 at 17:37 -0400, Meng Xu wrote:
> > Add libxl functions to set/get domain's parameters for rtds scheduler
> > Note: VCPU's information (period, budget) is in microsecond (us).
> >
> > Signed-off-by: Meng Xu <mengxu@cis.upenn.edu>
> > Signed-off-by: Sisu Xi <xisisu@gmail.com>
>
> Mostly looks good.
>
> The main thing which is missing is a #define LIBXL_HAVE_XXX in libxl to
> advertise this new feature (there are a bunch of existing examples which
> you can copy).
>

Will add it inside libxl.h.



>
> > +    if (scinfo->period != LIBXL_DOMAIN_SCHED_PARAM_PERIOD_DEFAULT) {
> > +        if (scinfo->period < 1) {
> > +            LOG(ERROR, "VCPU period is not set or out of range, "
> > +                       "valid values are larger than 1");
>
> There's no upper limit? Likewise for the budget further down?
>

Yes. In the previous version, I used UINT_MAX as the upper bound.
Because period is an int, it can never reach UINT_MAX. Even if I
change UINT_MAX to INT_MAX as the upper bound, period still cannot
exceed it. So George suggested removing the upper bound check when the
check simply spans the whole range of period's type.

Since period is an int, the largest period we support is 2^31-1 us =
0.5965 hours. That's why we don't have an upper bound check. Of course, I
can add INT_MAX (2^31-1) as the upper bound and report an error when users
try to assign a larger value to period or budget.
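
(For reference, the arithmetic behind that figure: 2^31 - 1 us = 2,147,483,647 us ≈ 2,147.5 s ≈ 35.8 min ≈ 0.5965 hours.)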

What do you think?

Thanks,

Meng


-- 


-----------
Meng Xu
PhD Student in Computer and Information Science
University of Pennsylvania


^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [PATCH v3 3/4] libxl: add rtds scheduler
  2014-09-16  1:11     ` Meng Xu
@ 2014-09-16  1:49       ` Ian Campbell
  2014-09-16  3:32         ` Meng Xu
  0 siblings, 1 reply; 49+ messages in thread
From: Ian Campbell @ 2014-09-16  1:49 UTC (permalink / raw)
  To: Meng Xu
  Cc: Sisu Xi, Stefano Stabellini, George Dunlap, Chenyang Lu,
	Dario Faggioli, Ian Jackson, xen-devel, Linh Thi Xuan Phan,
	Meng Xu, Jan Beulich, Chao Wang, Chong Li, Dagaen Golomb

On Mon, 2014-09-15 at 21:11 -0400, Meng Xu wrote:

>         > +    if (scinfo->period !=
>         LIBXL_DOMAIN_SCHED_PARAM_PERIOD_DEFAULT) {
>         > +        if (scinfo->period < 1) {
>         > +            LOG(ERROR, "VCPU period is not set or out of
>         range, "
>         > +                       "valid values are larger than 1");
>         
>         There's no upper limit? Likewise for the budget further down?
> 
> 
> ​Yes. In the previous version, I used the UINT_MAX as the upper bound.
> Because period is int, so it will never reach the UINT_MAX. Even if I
> change the UINT_MAX to INT_MAX as the upper bound, period may still
> not able to reach it. So George suggests to remove the upper bound
> check if it uses the range of type of period. 
> 
> 
> Since period is int, the largest period we support should be 2^31-1us
> = 0.5965 hours. That's why we don't have the upper bound check. Of
> course, I can add the upper bound as INT_MAX (2^31-1) and reports
> error when users try to assign a large value to period and budget.
> 
> 
> What do you think?

If the upper limit is the same as the maximum value of the field then
there is no need for a check.

Is a 0.5hr maximum period considered to be larger than any practically
useful value?

Ian.



^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [PATCH v3 3/4] libxl: add rtds scheduler
  2014-09-16  1:49       ` Ian Campbell
@ 2014-09-16  3:32         ` Meng Xu
  2014-09-16  7:27           ` Dario Faggioli
  0 siblings, 1 reply; 49+ messages in thread
From: Meng Xu @ 2014-09-16  3:32 UTC (permalink / raw)
  To: Ian Campbell
  Cc: Sisu Xi, Stefano Stabellini, George Dunlap, Chenyang Lu,
	Dario Faggioli, Ian Jackson, xen-devel, Linh Thi Xuan Phan,
	Meng Xu, Jan Beulich, Chao Wang, Chong Li, Dagaen Golomb



2014-09-15 21:49 GMT-04:00 Ian Campbell <ian.campbell@citrix.com>:

> On Mon, 2014-09-15 at 21:11 -0400, Meng Xu wrote:
>
> >         > +    if (scinfo->period !=
> >         LIBXL_DOMAIN_SCHED_PARAM_PERIOD_DEFAULT) {
> >         > +        if (scinfo->period < 1) {
> >         > +            LOG(ERROR, "VCPU period is not set or out of
> >         range, "
> >         > +                       "valid values are larger than 1");
> >
> >         There's no upper limit? Likewise for the budget further down?
> >
> >
> > ​Yes. In the previous version, I used the UINT_MAX as the upper bound.
> > Because period is int, so it will never reach the UINT_MAX. Even if I
> > change the UINT_MAX to INT_MAX as the upper bound, period may still
> > not able to reach it. So George suggests to remove the upper bound
> > check if it uses the range of type of period.
> >
> >
> > Since period is int, the largest period we support should be 2^31-1us
> > = 0.5965 hours. That's why we don't have the upper bound check. Of
> > course, I can add the upper bound as INT_MAX (2^31-1) and reports
> > error when users try to assign a large value to period and budget.
> >
> >
> > What do you think?
>
> If the upper limit is the same as the maximum value of the field then
> there is no need for a check.
>
> Is a 0.5hr maximum period considered to be larger than any practically
> useful value?
>

I think so. If the period is larger than 0.5hr, users can scale it down to a
smaller value with the same bandwidth. The guest domain should then be more
responsive.


Meng

-- 


-----------
Meng Xu
PhD Student in Computer and Information Science
University of Pennsylvania


^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [PATCH v3 3/4] libxl: add rtds scheduler
  2014-09-16  3:32         ` Meng Xu
@ 2014-09-16  7:27           ` Dario Faggioli
  2014-09-16 16:54             ` Ian Campbell
  0 siblings, 1 reply; 49+ messages in thread
From: Dario Faggioli @ 2014-09-16  7:27 UTC (permalink / raw)
  To: Meng Xu
  Cc: Ian Campbell, Sisu Xi, Stefano Stabellini, George Dunlap,
	Chenyang Lu, Ian Jackson, xen-devel, Linh Thi Xuan Phan, Meng Xu,
	Jan Beulich, Chao Wang, Chong Li, Dagaen Golomb



On Mon, 2014-09-15 at 23:32 -0400, Meng Xu wrote:

> 2014-09-15 21:49 GMT-04:00 Ian Campbell <ian.campbell@citrix.com>:

>         Is a 0.5hr maximum period considered to be larger than any
>         practically
>         useful value?
> 
> 
> I think so. 
>
Yes, it is. Most of the people using this scheduler will be interested
in periods and budgets of "a few [tens, hundreds of] milliseconds". Someone
might want to try microseconds, someone may be fine with seconds.

For anyone that is interested in anything else, this scheduler is the
wrong solution. Then, of course, they can have a go with it (never say
never! :-D), but no point in optimizing --neither the algorithm nor the
interface-- for such use cases.

Regards,
Dario

-- 
<<This happens because I choose it to happen!>> (Raistlin Majere)
-----------------------------------------------------------------
Dario Faggioli, Ph.D, http://about.me/dario.faggioli
Senior Software Engineer, Citrix Systems R&D Ltd., Cambridge (UK)



^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: Introduce rtds real-time scheduler for Xen
  2014-09-14 21:37 Introduce rtds real-time scheduler for Xen Meng Xu
                   ` (3 preceding siblings ...)
  2014-09-14 21:37 ` [PATCH v3 4/4] xl: introduce " Meng Xu
@ 2014-09-16  7:43 ` Dario Faggioli
  2014-09-17 14:15 ` Dario Faggioli
  2014-09-23 13:50 ` Ian Campbell
  6 siblings, 0 replies; 49+ messages in thread
From: Dario Faggioli @ 2014-09-16  7:43 UTC (permalink / raw)
  To: Meng Xu
  Cc: ian.campbell, xisisu, stefano.stabellini, george.dunlap, lu,
	ian.jackson, xen-devel, ptxlinh, xumengpanda, JBeulich, chaowang,
	lichong659, dgolomb



On Sun, 2014-09-14 at 17:37 -0400, Meng Xu wrote:

> //list VCPUs' parameters of each domain in cpu pools using rtds scheduler
> #xl sched-rtds
> Cpupool Pool-0: sched=EDF
>
This needs to become 'sched=RTDS'

Regards, Dario

-- 
<<This happens because I choose it to happen!>> (Raistlin Majere)
-----------------------------------------------------------------
Dario Faggioli, Ph.D, http://about.me/dario.faggioli
Senior Software Engineer, Citrix Systems R&D Ltd., Cambridge (UK)



^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [PATCH v3 4/4] xl: introduce rtds scheduler
  2014-09-14 21:37 ` [PATCH v3 4/4] xl: introduce " Meng Xu
  2014-09-15 22:18   ` Ian Campbell
@ 2014-09-16  7:49   ` Dario Faggioli
  2014-09-18 16:35   ` George Dunlap
  2 siblings, 0 replies; 49+ messages in thread
From: Dario Faggioli @ 2014-09-16  7:49 UTC (permalink / raw)
  To: Meng Xu
  Cc: ian.campbell, xisisu, stefano.stabellini, george.dunlap, lu,
	ian.jackson, xen-devel, ptxlinh, xumengpanda, JBeulich, chaowang,
	lichong659, dgolomb



On Sun, 2014-09-14 at 17:37 -0400, Meng Xu wrote:
> Add xl command for rtds scheduler
> Note: VCPU's parameter (period, budget) is in microsecond (us).
> 
> Signed-off-by: Meng Xu <mengxu@cis.upenn.edu>
> Signed-off-by: Sisu Xi <xisisu@gmail.com>
>
Reviewed-by: Dario Faggioli <dario.faggioli@citrix.com>

I only have two minor comments.

> --- a/docs/man/xl.pod.1
> +++ b/docs/man/xl.pod.1

> +static int sched_rtds_pool_output(uint32_t poolid)
> +{
> +    char *poolname;
> +
> +    poolname = libxl_cpupoolid_to_name(ctx, poolid);
> +    printf("Cpupool %s: sched=RTDS algorithm=EDF\n", poolname);
>
Oh, so you are printing RTDS. Perhaps the cover letter is outdated
then...

In any case, just use "sched=RTDS", no need for "algorithm=FOO"
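
I.e., something like:

    printf("Cpupool %s: sched=RTDS\n", poolname);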

> +
> +    free(poolname);
> +    return 0;
> +}

> +int main_sched_rtds(int argc, char **argv)
> +{
> +    const char *dom = NULL;
> +    const char *cpupool = NULL;
> +    int period = 0; /* period is in microsecond */
> +    int budget = 0; /* budget is in microsecond */
> +    bool opt_p = false;
> +    bool opt_b = false;
> +    int opt, rc;
> +    static struct option opts[] = {
> +        {"domain", 1, 0, 'd'},
> +        {"period", 1, 0, 'p'},
> +        {"budget", 1, 0, 'b'},
> +        {"cpupool", 1, 0, 'c'},
> +        COMMON_LONG_OPTS,
> +        {0, 0, 0, 0}
> +    };
> +
> +    SWITCH_FOREACH_OPT(opt, "d:p:b:c:h", opts, "sched-rtds", 0) {
> +    case 'd':
> +        dom = optarg;
> +        break;
> +    case 'p':
> +        period = strtol(optarg, NULL, 10);
> +        opt_p = 1;
> +        break;
> +    case 'b':
> +        budget = strtol(optarg, NULL, 10);
> +        opt_b = 1;
> +        break;
> +    case 'c':
> +        cpupool = optarg;
> +        break;
> +    }
> +
> +    if (cpupool && (dom || opt_p || opt_b)) {
> +        fprintf(stderr, "Specifying a cpupool is not allowed with other options.\n");
>
Long line.  We probably can live with this, but since I spotted it...

Regards,
Dario

-- 
<<This happens because I choose it to happen!>> (Raistlin Majere)
-----------------------------------------------------------------
Dario Faggioli, Ph.D, http://about.me/dario.faggioli
Senior Software Engineer, Citrix Systems R&D Ltd., Cambridge (UK)



^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [PATCH v3 3/4] libxl: add rtds scheduler
  2014-09-14 21:37 ` [PATCH v3 3/4] libxl: " Meng Xu
  2014-09-15 22:07   ` Ian Campbell
@ 2014-09-16  8:04   ` Dario Faggioli
  2014-09-16 16:56     ` Ian Campbell
  2014-09-18 16:24   ` George Dunlap
  2 siblings, 1 reply; 49+ messages in thread
From: Dario Faggioli @ 2014-09-16  8:04 UTC (permalink / raw)
  To: Meng Xu
  Cc: ian.campbell, xisisu, stefano.stabellini, george.dunlap, lu,
	ian.jackson, xen-devel, ptxlinh, xumengpanda, JBeulich, chaowang,
	lichong659, dgolomb



On Sun, 2014-09-14 at 17:37 -0400, Meng Xu wrote:
> Add libxl functions to set/get domain's parameters for rtds scheduler
> Note: VCPU's information (period, budget) is in microsecond (us).
> 
> Signed-off-by: Meng Xu <mengxu@cis.upenn.edu>
> Signed-off-by: Sisu Xi <xisisu@gmail.com>
>
With the proper LIBXL_HAVE_XXX added:

Reviewed-by: Dario Faggioli <dario.faggioli@citrix.com>

As Ian said, look at libxl.h. You're adding a valid enum value ("rtds")
in libxl_scheduler, and a "budget" field in libxl_domain_sched_params.

I'm not sure whether I'd go for something like

#define LIBXL_HAVE_SCHED_RTDS 1

or

#define LIBXL_HAVE_SCHEDPARAMS_BUDGET 1

If it were me, I probably would have added both, but perhaps both is too
much...
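
Either way, the addition to libxl.h boils down to a one-line define plus a
short comment; using the first name above, something like:

    /*
     * LIBXL_HAVE_SCHED_RTDS indicates the presence of the RTDS scheduler:
     * the LIBXL_SCHEDULER_RTDS enum value in libxl_scheduler and the
     * "budget" field in libxl_domain_sched_params.
     */
    #define LIBXL_HAVE_SCHED_RTDS 1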

Regards,
Dario

-- 
<<This happens because I choose it to happen!>> (Raistlin Majere)
-----------------------------------------------------------------
Dario Faggioli, Ph.D, http://about.me/dario.faggioli
Senior Software Engineer, Citrix Systems R&D Ltd., Cambridge (UK)



^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [PATCH v3 1/4] xen: add real time scheduler rtds
  2014-09-14 21:37 ` [PATCH v3 1/4] xen: add real time scheduler rtds Meng Xu
  2014-09-15 10:40   ` Jan Beulich
@ 2014-09-16  8:42   ` Dario Faggioli
  2014-09-16  8:52     ` Jan Beulich
  2014-09-18 16:08   ` George Dunlap
  2 siblings, 1 reply; 49+ messages in thread
From: Dario Faggioli @ 2014-09-16  8:42 UTC (permalink / raw)
  To: Meng Xu
  Cc: ian.campbell, xisisu, stefano.stabellini, george.dunlap, lu,
	ian.jackson, xen-devel, ptxlinh, xumengpanda, JBeulich, chaowang,
	lichong659, dgolomb



On Sun, 2014-09-14 at 17:37 -0400, Meng Xu wrote:

> This is an experimental scheduler.
> 
> Signed-off-by: Meng Xu <mengxu@cis.upenn.edu>
> Signed-off-by: Sisu Xi <xisisu@gmail.com>
>
So, I just finished looking at this patch too, and it really looks fine
this time. Thanks Meng and Sisu for the good work!

I'm leaving a few comments below, but they really are nothing more than
nits, nothing that requires resending, IMO.

That being said, I'd like to quickly test running Xen with this
scheduler, in a few configurations, and I can't access my test boxes
until the afternoon.

I'll give it a go in no more than a few hours and, if everything goes
fine as I think, I'll reply again with the proper tag.

> --- /dev/null
> +++ b/xen/common/sched_rt.c

> +/*
> + * Init/Free related code
> + */
> +static int
> +rt_init(struct scheduler *ops)
> +{
> +    struct rt_private *prv = xzalloc(struct rt_private);
> +
> +    printk("Initializing RTDS scheduler\n" \
> +           "WARNING: This is experimental software in development.\n" \
> +           "Use at your own risk.\n");
>
I don't think you need the '\' in this case...

> +
> +    if ( prv == NULL )
> +        return -ENOMEM;
> +
> +    spin_lock_init(&prv->lock);
> +    INIT_LIST_HEAD(&prv->sdom);
> +    INIT_LIST_HEAD(&prv->runq);
> +    INIT_LIST_HEAD(&prv->depletedq);
> +
> +    cpumask_clear(&prv->tickled);
> +
> +    ops->sched_data = prv;
> +
> +    return 0;
> +}

> +/*
> + * Update vcpu's budget and sort runq by insert the modifed vcpu back to runq
> + * lock is grabbed before calling this function
> + */
> +static void
> +__repl_update(const struct scheduler *ops, s_time_t now)
> +{
> +    struct list_head *runq = rt_runq(ops);
> +    struct list_head *depletedq = rt_depletedq(ops);
> +    struct list_head *iter;
> +    struct list_head *tmp;
> +    struct rt_vcpu *svc = NULL;
> +
> +    list_for_each_safe(iter, tmp, runq)
> +    {
> +        svc = __q_elem(iter);
> +
> +        if ( now >= svc->cur_deadline )
> +        {
> +            rt_update_deadline(now, svc);
> +            /* reinsert the vcpu if its deadline is updated */
> +            __q_remove(svc);
> +            __runq_insert(ops, svc);
> +        }
> +        else
> +            break;
>
Just from an aesthetic perspective, I think I'd have inverted the
condition and, hence, the two branches (i.e., "if ( < ) break; else {}").

But, please, *do* *not* *resend* only for the sake of this!! :-D :-D

> +    }
> +
> +    list_for_each_safe(iter, tmp, depletedq)
> +    {
> +        svc = __q_elem(iter);
> +        if ( now >= svc->cur_deadline )
> +        {
> +            rt_update_deadline(now, svc);
> +            __q_remove(svc); /* remove from depleted queue */
> +            __runq_insert(ops, svc); /* add to runq */
> +        }
> +    }
> +}

> +/*
> + * Pick a cpu where to run a vcpu, possibly kicking out the vcpu running there
> + * Called by wake() and context_saved()
> + * We have a running candidate here, the kick logic is:
> + * Among all the cpus that are within the cpu affinity
> + * 1) if the new->cpu is idle, kick it. This could benefit cache hit
> + * 2) if there are any idle vcpu, kick it.					
> + * 3) now all pcpus are busy, among all the running vcpus, pick lowest priority one
>
Long line.

> + *    if snext has higher priority, kick it.
> + *
> + * TODO:
> + * 1) what if these two vcpus belongs to the same domain?
> + *    replace a vcpu belonging to the same domain introduces more overhead
> + *
> + * lock is grabbed before calling this function
> + */
> +static void
> +runq_tickle(const struct scheduler *ops, struct rt_vcpu *new)
> +{
> +    struct rt_private *prv = rt_priv(ops);
> +    struct rt_vcpu *latest_deadline_vcpu = NULL;    /* lowest priority scheduled */
> +    struct rt_vcpu *iter_svc;
> +    struct vcpu *iter_vc;
> +    int cpu = 0, cpu_to_tickle = 0;
> +    cpumask_t not_tickled;
> +    cpumask_t *online;
> +
> +    if ( new == NULL || is_idle_vcpu(new->vcpu) )
> +        return;
> +
> +    online = cpupool_scheduler_cpumask(new->vcpu->domain->cpupool);
> +    cpumask_and(&not_tickled, online, new->vcpu->cpu_hard_affinity);
> +    cpumask_andnot(&not_tickled, &not_tickled, &prv->tickled);
> +
> +    /* 1) if new's previous cpu is idle, kick it for cache benefit */
> +    if ( is_idle_vcpu(curr_on_cpu(new->vcpu->processor)) )
> +    {
> +        cpu_to_tickle = new->vcpu->processor;
> +        goto out;
> +    }
> +
> +    /* 2) if there are any idle pcpu, kick it */
> +    /* The same loop also find the one with lowest priority */
> +    for_each_cpu(cpu, &not_tickled)
> +    {
> +        iter_vc = curr_on_cpu(cpu);
> +        if ( is_idle_vcpu(iter_vc) )
> +        {
> +            cpu_to_tickle = cpu;
> +            goto out;
> +        }
> +        iter_svc = rt_vcpu(iter_vc);
> +        if ( latest_deadline_vcpu == NULL ||
> +             iter_svc->cur_deadline > latest_deadline_vcpu->cur_deadline )
> +            latest_deadline_vcpu = iter_svc;
> +    }
> +
> +    /* 3) candicate has higher priority, kick out the lowest priority vcpu */
> +    if ( latest_deadline_vcpu != NULL && new->cur_deadline < latest_deadline_vcpu->cur_deadline )
>
Long line again.

> +    {
> +        cpu_to_tickle = latest_deadline_vcpu->vcpu->processor;
> +        goto out;
> +    }
> +
> +out:
> +    /* TRACE */
> +    {
> +        struct {
> +            unsigned cpu:8, pad:24;
> +        } d;
> +        d.cpu = cpu_to_tickle;
> +        d.pad = 0;
> +        trace_var(TRC_RTDS_TICKLE, 0,
> +                  sizeof(d),
> +                  (unsigned char *)&d);
> +    }
> +
> +    cpumask_set_cpu(cpu_to_tickle, &prv->tickled);
> +    cpu_raise_softirq(cpu_to_tickle, SCHEDULE_SOFTIRQ);
> +    return;
> +}
> +
> +/*
> + * Should always wake up runnable vcpu, put it back to RunQ.
> + * Check priority to raise interrupt
> + * The lock is already grabbed in schedule.c, no need to lock here
> + * TODO: what if these two vcpus belongs to the same domain?
> + */
> +static void
> +rt_vcpu_wake(const struct scheduler *ops, struct vcpu *vc)
> +{
> +    struct rt_vcpu * const svc = rt_vcpu(vc);
> +    s_time_t now = NOW();
> +    struct rt_private *prv = rt_priv(ops);
> +    struct rt_vcpu *snext = NULL;        /* highest priority on RunQ */
> +    struct rt_dom *sdom = NULL;
> +    cpumask_t *online;
> +
> +    BUG_ON( is_idle_vcpu(vc) );
> +
> +    if ( unlikely(curr_on_cpu(vc->processor) == vc) )
> +        return;
> +
> +    /* on RunQ/DepletedQ, just update info is ok */
> +    if ( unlikely(__vcpu_on_q(svc)) )
> +        return;
> +
How frequent is it that we wake up a vcpu that is already running, or
already woken and queued? That should not happen too often, right?

Again, this is fine as it is right now, and there's nothing you should
do. However, and I'm just mentioning it here in order not to forget
about it :-P, in future we may want to add at least a trace point here.
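
Something mirroring the TRC_RTDS_TICKLE trace point above would probably do.
Just as a sketch (the event name below is made up for illustration; it is not
defined anywhere in this series):

    /* hypothetical trace point: wake of a vcpu that is already running/queued */
    {
        struct {
            unsigned vcpu:16, dom:16;
        } d;
        d.dom = vc->domain->domain_id;
        d.vcpu = vc->vcpu_id;
        trace_var(TRC_RTDS_VCPU_WAKE_SKIP, 0, sizeof(d), (unsigned char *)&d);
    }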

Regards,
Dario

-- 
<<This happens because I choose it to happen!>> (Raistlin Majere)
-----------------------------------------------------------------
Dario Faggioli, Ph.D, http://about.me/dario.faggioli
Senior Software Engineer, Citrix Systems R&D Ltd., Cambridge (UK)



^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [PATCH v3 1/4] xen: add real time scheduler rtds
  2014-09-16  8:42   ` Dario Faggioli
@ 2014-09-16  8:52     ` Jan Beulich
  2014-09-16  8:59       ` Dario Faggioli
  2014-09-16 16:38       ` Meng Xu
  0 siblings, 2 replies; 49+ messages in thread
From: Jan Beulich @ 2014-09-16  8:52 UTC (permalink / raw)
  To: Meng Xu, Dario Faggioli
  Cc: ian.campbell, xisisu, stefano.stabellini, george.dunlap, lu,
	ian.jackson, xen-devel, ptxlinh, xumengpanda, chaowang,
	lichong659, dgolomb

>>> On 16.09.14 at 10:42, <dario.faggioli@citrix.com> wrote:
> On Sun, 2014-09-14 at 17:37 -0400, Meng Xu wrote:
>> --- /dev/null
>> +++ b/xen/common/sched_rt.c
> 
>> +/*
>> + * Init/Free related code
>> + */
>> +static int
>> +rt_init(struct scheduler *ops)
>> +{
>> +    struct rt_private *prv = xzalloc(struct rt_private);
>> +
>> +    printk("Initializing RTDS scheduler\n" \
>> +           "WARNING: This is experimental software in development.\n" \
>> +           "Use at your own risk.\n");
>>
> I don't think you need the '\' in this case...

Definitely not. Please drop.
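
Adjacent string literals are concatenated on their own, i.e. simply:

    printk("Initializing RTDS scheduler\n"
           "WARNING: This is experimental software in development.\n"
           "Use at your own risk.\n");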

>> +    list_for_each_safe(iter, tmp, runq)
>> +    {
>> +        svc = __q_elem(iter);
>> +
>> +        if ( now >= svc->cur_deadline )
>> +        {
>> +            rt_update_deadline(now, svc);
>> +            /* reinsert the vcpu if its deadline is updated */
>> +            __q_remove(svc);
>> +            __runq_insert(ops, svc);
>> +        }
>> +        else
>> +            break;
>>
> Just from an aesthetic perspective, I think I'd have inverted the
> condition and, hence, the two brances (i.e., "if ( < ) break; else {}").

In which case the "else" and with it one level of indentation
should go away.

>> +/*
>> + * Pick a cpu where to run a vcpu, possibly kicking out the vcpu running there
>> + * Called by wake() and context_saved()
>> + * We have a running candidate here, the kick logic is:
>> + * Among all the cpus that are within the cpu affinity
>> + * 1) if the new->cpu is idle, kick it. This could benefit cache hit
>> + * 2) if there are any idle vcpu, kick it.					
>> + * 3) now all pcpus are busy, among all the running vcpus, pick lowest priority one
>>
> Long line.

This ...

>> +    /* 3) candicate has higher priority, kick out the lowest priority vcpu */
>> +    if ( latest_deadline_vcpu != NULL && new->cur_deadline < latest_deadline_vcpu->cur_deadline )
>>
> Long line again.

... and even more so this warrant re-sending, but perhaps only
after George also had a chance to give review feedback.

Jan

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [PATCH v3 1/4] xen: add real time scheduler rtds
  2014-09-16  8:52     ` Jan Beulich
@ 2014-09-16  8:59       ` Dario Faggioli
  2014-09-16 16:38       ` Meng Xu
  1 sibling, 0 replies; 49+ messages in thread
From: Dario Faggioli @ 2014-09-16  8:59 UTC (permalink / raw)
  To: Jan Beulich
  Cc: ian.campbell, xisisu, stefano.stabellini, george.dunlap, lu,
	ian.jackson, xen-devel, ptxlinh, xumengpanda, Meng Xu, chaowang,
	lichong659, dgolomb



On Tue, 2014-09-16 at 09:52 +0100, Jan Beulich wrote:
> >>> On 16.09.14 at 10:42, <dario.faggioli@citrix.com> wrote:
> > On Sun, 2014-09-14 at 17:37 -0400, Meng Xu wrote:

> >> +    list_for_each_safe(iter, tmp, runq)
> >> +    {
> >> +        svc = __q_elem(iter);
> >> +
> >> +        if ( now >= svc->cur_deadline )
> >> +        {
> >> +            rt_update_deadline(now, svc);
> >> +            /* reinsert the vcpu if its deadline is updated */
> >> +            __q_remove(svc);
> >> +            __runq_insert(ops, svc);
> >> +        }
> >> +        else
> >> +            break;
> >>
> > Just from an aesthetic perspective, I think I'd have inverted the
> > condition and, hence, the two brances (i.e., "if ( < ) break; else {}").
> 
> In which case the "else" and with it one level of indentation
> should go away.
> 
Indeed. That was the idea behind my comment in the first place, and what
I had in mind; then, while putting it into written words, I messed up! :-/

Regards,
Dario

-- 
<<This happens because I choose it to happen!>> (Raistlin Majere)
-----------------------------------------------------------------
Dario Faggioli, Ph.D, http://about.me/dario.faggioli
Senior Software Engineer, Citrix Systems R&D Ltd., Cambridge (UK)



^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [PATCH v3 1/4] xen: add real time scheduler rtds
  2014-09-16  8:52     ` Jan Beulich
  2014-09-16  8:59       ` Dario Faggioli
@ 2014-09-16 16:38       ` Meng Xu
  2014-09-17  9:22         ` Jan Beulich
  1 sibling, 1 reply; 49+ messages in thread
From: Meng Xu @ 2014-09-16 16:38 UTC (permalink / raw)
  To: Jan Beulich
  Cc: Ian Campbell, Sisu Xi, Stefano Stabellini, George Dunlap,
	Chenyang Lu, Dario Faggioli, Ian Jackson, xen-devel,
	Linh Thi Xuan Phan, Meng Xu, Chao Wang, Chong Li, Dagaen Golomb

2014-09-16 4:52 GMT-04:00 Jan Beulich <JBeulich@suse.com>:
>
> >>> On 16.09.14 at 10:42, <dario.faggioli@citrix.com> wrote:
> > On Sun, 2014-09-14 at 17:37 -0400, Meng Xu wrote:
> >> --- /dev/null
> >> +++ b/xen/common/sched_rt.c
> >
> >> +/*
> >> + * Init/Free related code
> >> + */
> >> +static int
> >> +rt_init(struct scheduler *ops)
> >> +{
> >> +    struct rt_private *prv = xzalloc(struct rt_private);
> >> +
> >> +    printk("Initializing RTDS scheduler\n" \
> >> +           "WARNING: This is experimental software in development.\n" \
> >> +           "Use at your own risk.\n");
> >>
> > I don't think you need the '\' in this case...
>
> Definitely not. Please drop.
>
> >> +    list_for_each_safe(iter, tmp, runq)
> >> +    {
> >> +        svc = __q_elem(iter);
> >> +
> >> +        if ( now >= svc->cur_deadline )
> >> +        {
> >> +            rt_update_deadline(now, svc);
> >> +            /* reinsert the vcpu if its deadline is updated */
> >> +            __q_remove(svc);
> >> +            __runq_insert(ops, svc);
> >> +        }
> >> +        else
> >> +            break;
> >>
> > Just from an aesthetic perspective, I think I'd have inverted the
> > condition and, hence, the two brances (i.e., "if ( < ) break; else {}").
>
> In which case the "else" and with it one level of indentation
> should go away.


So the code will be like this:

if ( < )
    break;

rt_update_deadline(now, svc);
 /* reinsert the vcpu if its deadline is updated */
 __q_remove(svc);
__runq_insert(ops, svc);
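
i.e., in full context the loop would become:

    list_for_each_safe(iter, tmp, runq)
    {
        svc = __q_elem(iter);

        if ( now < svc->cur_deadline )
            break;

        rt_update_deadline(now, svc);
        /* reinsert the vcpu if its deadline is updated */
        __q_remove(svc);
        __runq_insert(ops, svc);
    }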


Am I correct? (Just confirm I understand what you said correctly. :-) )

Thanks,

Meng



-- 


-----------
Meng Xu
PhD Student in Computer and Information Science
University of Pennsylvania

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [PATCH v3 3/4] libxl: add rtds scheduler
  2014-09-16  7:27           ` Dario Faggioli
@ 2014-09-16 16:54             ` Ian Campbell
  2014-09-17 10:19               ` Dario Faggioli
  0 siblings, 1 reply; 49+ messages in thread
From: Ian Campbell @ 2014-09-16 16:54 UTC (permalink / raw)
  To: Dario Faggioli
  Cc: Sisu Xi, Stefano Stabellini, George Dunlap, Chenyang Lu,
	Ian Jackson, xen-devel, Linh Thi Xuan Phan, Meng Xu, Meng Xu,
	Jan Beulich, Chao Wang, Chong Li, Dagaen Golomb

On Tue, 2014-09-16 at 09:27 +0200, Dario Faggioli wrote:
> On Mon, 2014-09-15 at 23:32 -0400, Meng Xu wrote:
> 
> > 2014-09-15 21:49 GMT-04:00 Ian Campbell <ian.campbell@citrix.com>:
> 
> >         Is a 0.5hr maximum period considered to be larger than any
> >         practically
> >         useful value?
> > 
> > 
> > I think so. 
> >
> Yes, it is. Most of the people using this scheduler will be interested
> in "a few [decades, hundreds] millisecond" period and budget. Someone
> might want to try microsecs, someone may be fine with seconds.

OK, so no need for me to ask if it should be a 64-bit value then.

Ian.

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [PATCH v3 3/4] libxl: add rtds scheduler
  2014-09-16  8:04   ` Dario Faggioli
@ 2014-09-16 16:56     ` Ian Campbell
  0 siblings, 0 replies; 49+ messages in thread
From: Ian Campbell @ 2014-09-16 16:56 UTC (permalink / raw)
  To: Dario Faggioli
  Cc: xisisu, stefano.stabellini, george.dunlap, lu, ian.jackson,
	xen-devel, ptxlinh, xumengpanda, Meng Xu, JBeulich, chaowang,
	lichong659, dgolomb

On Tue, 2014-09-16 at 10:04 +0200, Dario Faggioli wrote:
> On Sun, 2014-09-14 at 17:37 -0400, Meng Xu wrote:
> > Add libxl functions to set/get domain's parameters for rtds scheduler
> > Note: VCPU's information (period, budget) is in microsecond (us).
> > 
> > Signed-off-by: Meng Xu <mengxu@cis.upenn.edu>
> > Signed-off-by: Sisu Xi <xisisu@gmail.com>
> >
> With the proper LIBXL_HAVE_XXX added:
> 
> Reviewed-by: Dario Faggioli <dario.faggioli@citrix.com>
> 
> As Ian said, look at libxl.h. You're adding a valid enum value ("rtds")
> in libxl_scheduler, and a "budget" field in libxl_domain_sched_params.
> 
> I'm not sure whether I'd go for something like
> 
> #define LIBXL_HAVE_SCHED_RTDS 1
> 
> or
> 
> #define LIBXL_HAVE_SCHEDPARAMS_BUDGET 1
> 
> If it were me, I probably would have added both, but perhaps both is too
> much...

Just the former is sufficient: having advertised the generic feature, it
can then be assumed that the associated set of initially available fields
and interfaces is there, which is what the application developer cares
about.

If you like, you can spell out exactly which fields that involves in the
associated comment (most of them do today, although in many cases IMHO
it isn't strictly needed).

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [PATCH v3 1/4] xen: add real time scheduler rtds
  2014-09-16 16:38       ` Meng Xu
@ 2014-09-17  9:22         ` Jan Beulich
  0 siblings, 0 replies; 49+ messages in thread
From: Jan Beulich @ 2014-09-17  9:22 UTC (permalink / raw)
  To: Meng Xu
  Cc: Ian Campbell, Sisu Xi, Stefano Stabellini, George Dunlap,
	Chenyang Lu, Dario Faggioli, Ian Jackson, xen-devel,
	Linh Thi Xuan Phan, Meng Xu, Chao Wang, Chong Li, Dagaen Golomb

>>> On 16.09.14 at 18:38, <xumengpanda@gmail.com> wrote:
> 2014-09-16 4:52 GMT-04:00 Jan Beulich <JBeulich@suse.com>:
>>
>> >>> On 16.09.14 at 10:42, <dario.faggioli@citrix.com> wrote:
>> > On Sun, 2014-09-14 at 17:37 -0400, Meng Xu wrote:
>> >> --- /dev/null
>> >> +++ b/xen/common/sched_rt.c
>> >
>> >> +/*
>> >> + * Init/Free related code
>> >> + */
>> >> +static int
>> >> +rt_init(struct scheduler *ops)
>> >> +{
>> >> +    struct rt_private *prv = xzalloc(struct rt_private);
>> >> +
>> >> +    printk("Initializing RTDS scheduler\n" \
>> >> +           "WARNING: This is experimental software in development.\n" \
>> >> +           "Use at your own risk.\n");
>> >>
>> > I don't think you need the '\' in this case...
>>
>> Definitely not. Please drop.
>>
>> >> +    list_for_each_safe(iter, tmp, runq)
>> >> +    {
>> >> +        svc = __q_elem(iter);
>> >> +
>> >> +        if ( now >= svc->cur_deadline )
>> >> +        {
>> >> +            rt_update_deadline(now, svc);
>> >> +            /* reinsert the vcpu if its deadline is updated */
>> >> +            __q_remove(svc);
>> >> +            __runq_insert(ops, svc);
>> >> +        }
>> >> +        else
>> >> +            break;
>> >>
>> > Just from an aesthetic perspective, I think I'd have inverted the
>> > condition and, hence, the two brances (i.e., "if ( < ) break; else {}").
>>
>> In which case the "else" and with it one level of indentation
>> should go away.
> 
> 
> So the code will be like this:
> 
> if ( < )
>     break;
> 
> rt_update_deadline(now, svc);
>  /* reinsert the vcpu if its deadline is updated */
>  __q_remove(svc);
> __runq_insert(ops, svc);
> 
> 
> Am I correct? (Just confirm I understand what you said correctly. :-) )

Yes.

Jan

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [PATCH v3 3/4] libxl: add rtds scheduler
  2014-09-16 16:54             ` Ian Campbell
@ 2014-09-17 10:19               ` Dario Faggioli
  0 siblings, 0 replies; 49+ messages in thread
From: Dario Faggioli @ 2014-09-17 10:19 UTC (permalink / raw)
  To: Ian Campbell
  Cc: Sisu Xi, Stefano Stabellini, George Dunlap, Chenyang Lu,
	Ian Jackson, xen-devel, Linh Thi Xuan Phan, Meng Xu, Meng Xu,
	Jan Beulich, Chao Wang, Chong Li, Dagaen Golomb



On mar, 2014-09-16 at 17:54 +0100, Ian Campbell wrote:
> On Tue, 2014-09-16 at 09:27 +0200, Dario Faggioli wrote:
> > On Mon, 2014-09-15 at 23:32 -0400, Meng Xu wrote:
> > 
> > > 2014-09-15 21:49 GMT-04:00 Ian Campbell <ian.campbell@citrix.com>:
> > 
> > >         Is a 0.5hr maximum period considered to be larger than any
> > >         practically
> > >         useful value?
> > > 
> > > 
> > > I think so. 
> > >
> > Yes, it is. Most of the people using this scheduler will be interested
> > in "a few [decades, hundreds] millisecond" period and budget. Someone
> > might want to try microsecs, someone may be fine with seconds.
> 
> OK, so no need for me to ask if it should be a 64bit value then.
> 
It actually was. Then Jan asked whether it really needed to be, and we
all agreed it did not.

TBF, what we were talking about in that thread was the xc and hv interface.
Nothing forbids the two interfaces from differing in the type of this field,
I think. I guess the question here is whether we would like, at the
libxl level, to be a bit, let's say, 'less strict'.

That would mean, e.g., using something like int64 or uint64 in libxl
(type declarations and all the related functions), and then doing some
range checking, in the libxl implementation, before passing the values
down to libxc. That would be because, while the libxc and Xen interfaces
can change, libxl should not. One possible reason could be, for example,
that at some point we may want the budget and period to be specified
in ns rather than us.

Honestly, I don't foresee such a thing becoming useful, not even in the
mid or long term. It is what Linux, at least, sort of does (POSIX
timespec-s are used there), though, so it is at least worth some thought?

As I said already, I think this is a fine interface, even with stability
in mind, but I'd be fine with an int64_t version, here in libxl, if people
feel it may be better. :-)
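
Just to make that concrete, a 64-bit libxl field would simply be range-checked
in the setter before being handed down to libxc, roughly like this
(illustrative sketch only, not code from this series; the hypervisor side
keeps its uint32_t):

    /* hypothetical libxl-level check if scinfo->period were int64_t */
    if (scinfo->period < 1 || scinfo->period > UINT32_MAX) {
        LOG(ERROR, "VCPU period out of range for the hypervisor interface");
        return ERROR_INVAL;
    }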

Regards,
Dario

-- 
<<This happens because I choose it to happen!>> (Raistlin Majere)
-----------------------------------------------------------------
Dario Faggioli, Ph.D, http://about.me/dario.faggioli
Senior Software Engineer, Citrix Systems R&D Ltd., Cambridge (UK)



^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: Introduce rtds real-time scheduler for Xen
  2014-09-14 21:37 Introduce rtds real-time scheduler for Xen Meng Xu
                   ` (4 preceding siblings ...)
  2014-09-16  7:43 ` Introduce rtds real-time scheduler for Xen Dario Faggioli
@ 2014-09-17 14:15 ` Dario Faggioli
  2014-09-17 14:33   ` Meng Xu
  2014-09-18 16:00   ` Meng Xu
  2014-09-23 13:50 ` Ian Campbell
  6 siblings, 2 replies; 49+ messages in thread
From: Dario Faggioli @ 2014-09-17 14:15 UTC (permalink / raw)
  To: Meng Xu
  Cc: ian.campbell, xisisu, stefano.stabellini, george.dunlap, lu,
	ian.jackson, xen-devel, ptxlinh, xumengpanda, JBeulich, chaowang,
	lichong659, dgolomb



On dom, 2014-09-14 at 17:37 -0400, Meng Xu wrote:
> This serie of patches adds rtds real-time scheduler to Xen.
> 
I gave this series some testing, and the behavior of the scheduler is as
expected, so again, Meng and Sisu, good work.

While doing it, I've also put the series in this git repo/branch:

  git://xenbits.xen.org/people/dariof/xen.git  sched/rt/rtds-v3
  http://xenbits.xen.org/gitweb/?p=people/dariof/xen.git;a=shortlog;h=refs/heads/sched/rt/rtds-v3


There are a couple of issues, though, one minor and one serious, which
I'd like you to fix, if possible, before the freeze date. More info below.

> //list VCPUs' parameters of each domain in cpu pools using rtds scheduler
> #xl sched-rtds
> Cpupool Pool-0: sched=EDF
> Name                                ID    Period    Budget
> Domain-0                             0     10000      4000
> vm1                                  1     10000      4000
> 
So, when I boot Xen with sched=rtds, issuing this command (`xl
sched-rtds') produces a lot of printk output on the serial console,
basically dumping the scheduler information.

I guess there is one call to rt_sched_dump() (or whatever that was) left
somewhere. Could you please check?

This is not a serious issue, but since you're resending anyway...

> //create a cpupool test
> #xl cpupool-cpu-remove Pool-0 3
> #xl cpupool-cpu-remove Pool-0 2
> #xl cpupool-create name=\"test\" sched=\"rtds\"
> #xl cpupool-cpu-add test 3
> #xl cpupool-cpu-add test 2
> #xl cpupool-list
> Name               CPUs   Sched     Active   Domain count
> Pool-0               2     rtds       y          2
> test                 2     rtds       y          0
> 
This works for me too.

Booting with sched=credit, creating an rtds cpupool and migrating
domains there also works here.

However, booting with sched=rtds, and issuing the following commands:
# xl cpupool-cpu-remove Pool-0 20
# xl cpupool-cpu-remove Pool-0 21
# xl cpupool-create /etc/xen/be-cpupool

Where /etc/xen/be-cpupool looks like this:
 name = "be"
 sched = "credit"
 cpus = ["20", "21"]
 sched="credit"

Makes Xen *CRASH* with the following trace:

(XEN) ----[ Xen-4.5-unstable  x86_64  debug=y  Not tainted ]----
(XEN) ----[ Xen-4.5-unstable  x86_64  debug=y  Not tainted ]----
(XEN) CPU:    20
(XEN) CPU:    21
(XEN) RIP:    e008:[<ffff82d08012bb1e>]RIP:    e008:[<ffff82d08012bb1e>] check_lock+0x1e/0x3b check_lock+0x1e/0x3b
(XEN) RFLAGS: 0000000000010002   
(XEN) RFLAGS: 0000000000010002   CONTEXT: hypervisor
(XEN) CONTEXT: hypervisor
(XEN) rax: 0000000000000001   rbx: 0000000000000000   rcx: 0000000000000001
(XEN) rax: 0000000000000001   rbx: 0000000000000000   rcx: 0000000000000001
(XEN) rdx: 0000000000000001   rsi: ffff830917fc4c80   rdi: 0000000000000004
(XEN) rdx: 0000000000000001   rsi: ffff830917fc2c80   rdi: 0000000000000004
(XEN) rbp: ffff830917fb7e08   rsp: ffff830917fb7e08   r8:  0000001d52034e80
(XEN) rbp: ffff830917fafe08   rsp: ffff830917fafe08   r8:  0000000000000000
(XEN) r9:  ffff830828a47978   r10: 00000000deadbeef   r11: 0000000000000246
(XEN) r9:  ffff830917fe3ea8   r10: 00000000deadbeef   r11: 0000000000000246
(XEN) r12: 0000001d51efbd89   r13: ffff82d080320de0   r14: 0000000000000000
(XEN) r12: 0000001d51efbb99   r13: ffff82d080320de0   r14: 0000000000000000
(XEN) r15: 0000000000000014   cr0: 000000008005003b   cr4: 00000000000026f0
(XEN) r15: 0000000000000015   cr0: 000000008005003b   cr4: 00000000000026f0
(XEN) cr3: 00000000cf08f000   cr2: 0000000000000004
(XEN) cr3: 00000000cf08f000   cr2: 0000000000000004
(XEN) ds: 002b   es: 002b   fs: 0000   gs: 0000   ss: e010   cs: e008
(XEN) ds: 002b   es: 002b   fs: 0000   gs: 0000   ss: e010   cs: e008
(XEN) Xen stack trace from rsp=ffff830917fb7e08:
(XEN)   Xen stack trace from rsp=ffff830917fafe08:
(XEN)    ffff830917fb7e20 ffff830917fafe20 ffff82d08012bbc4 ffff82d08012bbc4 ffff8300cf12d000 ffff8300cf12c000 ffff830917fb7eb0 ffff830917fafeb0
(XEN)   
(XEN)    ffff82d080128175 ffff82d080128175 ffff830917fb7e40 ffff830917fafe40 ffff82d0801879ef ffff82d0801879ef 0000001400fb7e60 0000001500fafe60
(XEN)   
(XEN)    ffff830917fc4060 ffff830917fc2060 ffff830917fb7e60 ffff830917fafe60 ffff830917fc4200 ffff830917fc2200 ffff830917fb7eb0 ffff830917fafeb0
(XEN)   
(XEN)    ffff82d08012e983 ffff82d08012e983 ffff830917fb7ef0 ffff830917fafef0 ffff82d0801aa941 ffff82d0801aa941 ffff830917fb7e90 ffff830917fafe90
(XEN)   
(XEN)    ffff82d0802f8980 ffff82d0802f8a00 ffff82d0802f7f80 ffff82d0802f7f80 ffffffffffffffff ffffffffffffffff ffff830917fb0000 ffff830917fa8000
(XEN)   
(XEN)    00000000000f4240 00000000000f4240 ffff830917fb7ee0 ffff830917fafee0 ffff82d08012b539 ffff82d08012b539 ffff830917fb0000 ffff830917fa8000
(XEN)   
(XEN)    0000001d51dfd294 0000001d51dfcb8f ffff8300cf12d000 ffff8300cf12c000 ffff83092d6a0990 ffff83092d6a0990 ffff830917fb7ef0 ffff830917fafef0
(XEN)   
(XEN)    ffff82d08012b591 ffff82d08012b591 ffff830917fb7f10 ffff830917faff10 ffff82d080160425 ffff82d080160425 ffff82d08012b591 ffff82d08012b591
(XEN)   
(XEN)    ffff8300cf12d000 ffff8300cf12c000 ffff830917fb7e10 ffff830917fafe10 0000000000000000 0000000000000000 ffff88003a0cbfd8 ffff88003a0edfd8
(XEN)   
(XEN)    ffff88003a0cbfd8 ffff88003a0edfd8 0000000000000007 0000000000000014 ffff88003a0cbec0 ffff88003a0edec0 0000000000000000 0000000000000000
(XEN)   
(XEN)    0000000000000246 0000000000000246 0000001c6462ff88 0000001c986bebc8 0000000000000000 0000000000000000 0000000000000000 0000000000000000
(XEN)   
(XEN)    0000000000000000 0000000000000000 ffffffff810013aa ffffffff810013aa ffffffff81c31160 ffffffff81c31160 00000000deadbeef 00000000deadbeef
(XEN)   
(XEN)    00000000deadbeef 00000000deadbeef 0000010000000000 0000010000000000 ffffffff810013aa ffffffff810013aa 000000000000e033 000000000000e033
(XEN)   
(XEN)    0000000000000246 0000000000000246 ffff88003a0cbea8 ffff88003a0edea8 000000000000e02b 000000000000e02b c2c2c2c2c2c2c2c2 c2c2c2c2c2c2c2c2
(XEN)   
(XEN)    c2c2c2c2c2c2c2c2 c2c2c2c2c2c2c2c2 c2c2c2c2c2c2c2c2 c2c2c2c2c2c2c2c2 c2c2c2c2c2c2c2c2 c2c2c2c2c2c2c2c2 c2c2c2c200000014 c2c2c2c200000015
(XEN)   
(XEN)    ffff8300cf12d000 ffff8300cf12c000 0000003897ca3280 0000003897ca1280 c2c2c2c2c2c2c2c2 c2c2c2c2c2c2c2c2
(XEN) 
(XEN) Xen call trace:
(XEN) Xen call trace:
(XEN)    [<ffff82d08012bb1e>] check_lock+0x1e/0x3b
(XEN)    [<ffff82d08012bb1e>] check_lock+0x1e/0x3b
(XEN)    [<ffff82d08012bbc4>] _spin_lock_irq+0x1b/0x6c
(XEN)    [<ffff82d08012bbc4>] _spin_lock_irq+0x1b/0x6c
(XEN)    [<ffff82d080128175>] schedule+0xc0/0x5da
(XEN)    [<ffff82d080128175>] schedule+0xc0/0x5da
(XEN)    [<ffff82d08012b539>] __do_softirq+0x81/0x8c
(XEN)    [<ffff82d08012b539>] __do_softirq+0x81/0x8c
(XEN)    [<ffff82d08012b591>] do_softirq+0x13/0x15
(XEN)    [<ffff82d08012b591>] do_softirq+0x13/0x15
(XEN)    [<ffff82d080160425>] idle_loop+0x5e/0x6e
(XEN)    [<ffff82d080160425>] idle_loop+0x5e/0x6e
(XEN) 
(XEN) 
(XEN) Pagetable walk from 0000000000000004:
(XEN) Pagetable walk from 0000000000000004:
(XEN)  L4[0x000] = 000000092d6a4063 ffffffffffffffff
(XEN)  L4[0x000] = 000000092d6a4063 ffffffffffffffff
(XEN)  L3[0x000] = 000000092d6a3063 ffffffffffffffff
(XEN)  L3[0x000] = 000000092d6a3063 ffffffffffffffff
(XEN)  L2[0x000] = 000000092d6a2063 ffffffffffffffff 
(XEN)  L2[0x000] = 000000092d6a2063 ffffffffffffffff 
(XEN)  L1[0x000] = 0000000000000000 ffffffffffffffff
(XEN)  L1[0x000] = 0000000000000000 ffffffffffffffff
(XEN) 
(XEN) ****************************************
(XEN) Panic on CPU 20:
(XEN) FATAL PAGE FAULT
(XEN) [error_code=0000]
(XEN) Faulting linear address: 0000000000000004
(XEN) ****************************************
(XEN) 
(XEN) Manual reset required ('noreboot' specified)
(XEN) 
(XEN) ****************************************
(XEN) Panic on CPU 21:
(XEN) FATAL PAGE FAULT
(XEN) [error_code=0000]
(XEN) Faulting linear address: 0000000000000004
(XEN) ****************************************
(XEN) 
(XEN) Manual reset required ('noreboot' specified)

Can you please check, try to reproduce, and fix ASAP?

This, IMO, does not alter the prognosis, wrt 4.5 inclusion, at least not
per-se. In fact, it's ok for some bugs to be around at feature freeze
time, for the features we said we want. What we need to know is that
we're likely going to be able to fix them without making the release
slip.

So you should really either fix this, or provide enough insight here
to convince people you're on the way to doing that. :-)

Regards,
Dario

-- 
<<This happens because I choose it to happen!>> (Raistlin Majere)
-----------------------------------------------------------------
Dario Faggioli, Ph.D, http://about.me/dario.faggioli
Senior Software Engineer, Citrix Systems R&D Ltd., Cambridge (UK)


[-- Attachment #1.2: This is a digitally signed message part --]
[-- Type: application/pgp-signature, Size: 181 bytes --]

[-- Attachment #2: Type: text/plain, Size: 126 bytes --]

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: Introduce rtds real-time scheduler for Xen
  2014-09-17 14:15 ` Dario Faggioli
@ 2014-09-17 14:33   ` Meng Xu
  2014-09-18 16:00   ` Meng Xu
  1 sibling, 0 replies; 49+ messages in thread
From: Meng Xu @ 2014-09-17 14:33 UTC (permalink / raw)
  To: Dario Faggioli
  Cc: Ian Campbell, Sisu Xi, Stefano Stabellini, George Dunlap,
	Chenyang Lu, Ian Jackson, xen-devel, Linh Thi Xuan Phan, Meng Xu,
	Jan Beulich, Chao Wang, Chong Li, Dagaen Golomb


[-- Attachment #1.1: Type: text/plain, Size: 9352 bytes --]

Hi Dario,

First, thank you so much for testing it. I really appreciate it. :-)

2014-09-17 10:15 GMT-04:00 Dario Faggioli <dario.faggioli@citrix.com>:

> On dom, 2014-09-14 at 17:37 -0400, Meng Xu wrote:
> > This serie of patches adds rtds real-time scheduler to Xen.
> >
> I gave this series some testing, and the behavior of the scheduler is as
> expected, so again, Meng and Sisu, good work.
>
> While doing it, I've also put the series in this git repo/branch:
>
>   git://xenbits.xen.org/people/dariof/xen.git  sched/rt/rtds-v3
>
> http://xenbits.xen.org/gitweb/?p=people/dariof/xen.git;a=shortlog;h=refs/heads/sched/rt/rtds-v3
>
>
> There are a couple of issue, though, one minor and one serious, which
> I'd like you to fix, if possible, before freeze date. More info below.
>
> > //list VCPUs' parameters of each domain in cpu pools using rtds scheduler
> > #xl sched-rtds
> > Cpupool Pool-0: sched=EDF
> > Name                                ID    Period    Budget
> > Domain-0                             0     10000      4000
> > vm1                                  1     10000      4000
> >
> So, when I boot Xen with sched=rtds, issueing this command (`xl
> sched-rtds') produces a lot of printk on the serial console, basically
> outputting the dump of the scheduler information.
>
> I guess there is one call to rt_sched_dump() (or whatever that was) left
> somewhere. Could you please check?
>
> This is not a serious issue, but since you're resending anyway...
>
> > //create a cpupool test
> > #xl cpupool-cpu-remove Pool-0 3
> > #xl cpupool-cpu-remove Pool-0 2
> > #xl cpupool-create name=\"test\" sched=\"rtds\"
> > #xl cpupool-cpu-add test 3
> > #xl cpupool-cpu-add test 2
> > #xl cpupool-list
> > Name               CPUs   Sched     Active   Domain count
> > Pool-0               2     rtds       y          2
> > test                 2     rtds       y          0
> >
> This works for me too.
>
> Booting with sched=credit, creating an rtds cpupool and migruting
> domains there also works here too.
>
> However, booting with sched=rtds, and issuing the following commands:
> # xl cpupool-cpu-remove Pool-0 20
> # xl cpupool-cpu-remove Pool-0 21
> # xl cpupool-create /etc/xen/be-cpupool
>
> Where /etc/xen/be-cpupool looks like this:
>  name = "be"
>  sched = "credit"
>  cpus = ["20", "21"]
>  sched="credit"
>
> Makes Xen *CRASH* with the following trace:
>
> (XEN) ----[ Xen-4.5-unstable  x86_64  debug=y  Not tainted ]----
> (XEN) ----[ Xen-4.5-unstable  x86_64  debug=y  Not tainted ]----
> (XEN) CPU:    20
> (XEN) CPU:    21
> (XEN) RIP:    e008:[<ffff82d08012bb1e>]RIP:    e008:[<ffff82d08012bb1e>]
> check_lock+0x1e/0x3b check_lock+0x1e/0x3b
> (XEN) RFLAGS: 0000000000010002
> (XEN) RFLAGS: 0000000000010002   CONTEXT: hypervisor
> (XEN) CONTEXT: hypervisor
> (XEN) rax: 0000000000000001   rbx: 0000000000000000   rcx: 0000000000000001
> (XEN) rax: 0000000000000001   rbx: 0000000000000000   rcx: 0000000000000001
> (XEN) rdx: 0000000000000001   rsi: ffff830917fc4c80   rdi: 0000000000000004
> (XEN) rdx: 0000000000000001   rsi: ffff830917fc2c80   rdi: 0000000000000004
> (XEN) rbp: ffff830917fb7e08   rsp: ffff830917fb7e08   r8:  0000001d52034e80
> (XEN) rbp: ffff830917fafe08   rsp: ffff830917fafe08   r8:  0000000000000000
> (XEN) r9:  ffff830828a47978   r10: 00000000deadbeef   r11: 0000000000000246
> (XEN) r9:  ffff830917fe3ea8   r10: 00000000deadbeef   r11: 0000000000000246
> (XEN) r12: 0000001d51efbd89   r13: ffff82d080320de0   r14: 0000000000000000
> (XEN) r12: 0000001d51efbb99   r13: ffff82d080320de0   r14: 0000000000000000
> (XEN) r15: 0000000000000014   cr0: 000000008005003b   cr4: 00000000000026f0
> (XEN) r15: 0000000000000015   cr0: 000000008005003b   cr4: 00000000000026f0
> (XEN) cr3: 00000000cf08f000   cr2: 0000000000000004
> (XEN) cr3: 00000000cf08f000   cr2: 0000000000000004
> (XEN) ds: 002b   es: 002b   fs: 0000   gs: 0000   ss: e010   cs: e008
> (XEN) ds: 002b   es: 002b   fs: 0000   gs: 0000   ss: e010   cs: e008
> (XEN) Xen stack trace from rsp=ffff830917fb7e08:
> (XEN)   Xen stack trace from rsp=ffff830917fafe08:
> (XEN)    ffff830917fb7e20 ffff830917fafe20 ffff82d08012bbc4
> ffff82d08012bbc4 ffff8300cf12d000 ffff8300cf12c000 ffff830917fb7eb0
> ffff830917fafeb0
> (XEN)
> (XEN)    ffff82d080128175 ffff82d080128175 ffff830917fb7e40
> ffff830917fafe40 ffff82d0801879ef ffff82d0801879ef 0000001400fb7e60
> 0000001500fafe60
> (XEN)
> (XEN)    ffff830917fc4060 ffff830917fc2060 ffff830917fb7e60
> ffff830917fafe60 ffff830917fc4200 ffff830917fc2200 ffff830917fb7eb0
> ffff830917fafeb0
> (XEN)
> (XEN)    ffff82d08012e983 ffff82d08012e983 ffff830917fb7ef0
> ffff830917fafef0 ffff82d0801aa941 ffff82d0801aa941 ffff830917fb7e90
> ffff830917fafe90
> (XEN)
> (XEN)    ffff82d0802f8980 ffff82d0802f8a00 ffff82d0802f7f80
> ffff82d0802f7f80 ffffffffffffffff ffffffffffffffff ffff830917fb0000
> ffff830917fa8000
> (XEN)
> (XEN)    00000000000f4240 00000000000f4240 ffff830917fb7ee0
> ffff830917fafee0 ffff82d08012b539 ffff82d08012b539 ffff830917fb0000
> ffff830917fa8000
> (XEN)
> (XEN)    0000001d51dfd294 0000001d51dfcb8f ffff8300cf12d000
> ffff8300cf12c000 ffff83092d6a0990 ffff83092d6a0990 ffff830917fb7ef0
> ffff830917fafef0
> (XEN)
> (XEN)    ffff82d08012b591 ffff82d08012b591 ffff830917fb7f10
> ffff830917faff10 ffff82d080160425 ffff82d080160425 ffff82d08012b591
> ffff82d08012b591
> (XEN)
> (XEN)    ffff8300cf12d000 ffff8300cf12c000 ffff830917fb7e10
> ffff830917fafe10 0000000000000000 0000000000000000 ffff88003a0cbfd8
> ffff88003a0edfd8
> (XEN)
> (XEN)    ffff88003a0cbfd8 ffff88003a0edfd8 0000000000000007
> 0000000000000014 ffff88003a0cbec0 ffff88003a0edec0 0000000000000000
> 0000000000000000
> (XEN)
> (XEN)    0000000000000246 0000000000000246 0000001c6462ff88
> 0000001c986bebc8 0000000000000000 0000000000000000 0000000000000000
> 0000000000000000
> (XEN)
> (XEN)    0000000000000000 0000000000000000 ffffffff810013aa
> ffffffff810013aa ffffffff81c31160 ffffffff81c31160 00000000deadbeef
> 00000000deadbeef
> (XEN)
> (XEN)    00000000deadbeef 00000000deadbeef 0000010000000000
> 0000010000000000 ffffffff810013aa ffffffff810013aa 000000000000e033
> 000000000000e033
> (XEN)
> (XEN)    0000000000000246 0000000000000246 ffff88003a0cbea8
> ffff88003a0edea8 000000000000e02b 000000000000e02b c2c2c2c2c2c2c2c2
> c2c2c2c2c2c2c2c2
> (XEN)
> (XEN)    c2c2c2c2c2c2c2c2 c2c2c2c2c2c2c2c2 c2c2c2c2c2c2c2c2
> c2c2c2c2c2c2c2c2 c2c2c2c2c2c2c2c2 c2c2c2c2c2c2c2c2 c2c2c2c200000014
> c2c2c2c200000015
> (XEN)
> (XEN)    ffff8300cf12d000 ffff8300cf12c000 0000003897ca3280
> 0000003897ca1280 c2c2c2c2c2c2c2c2 c2c2c2c2c2c2c2c2
> (XEN)
> (XEN) Xen call trace:
> (XEN) Xen call trace:
> (XEN)    [<ffff82d08012bb1e>] check_lock+0x1e/0x3b
> (XEN)    [<ffff82d08012bb1e>] check_lock+0x1e/0x3b
> (XEN)    [<ffff82d08012bbc4>] _spin_lock_irq+0x1b/0x6c
> (XEN)    [<ffff82d08012bbc4>] _spin_lock_irq+0x1b/0x6c
> (XEN)    [<ffff82d080128175>] schedule+0xc0/0x5da
> (XEN)    [<ffff82d080128175>] schedule+0xc0/0x5da
> (XEN)    [<ffff82d08012b539>] __do_softirq+0x81/0x8c
> (XEN)    [<ffff82d08012b539>] __do_softirq+0x81/0x8c
> (XEN)    [<ffff82d08012b591>] do_softirq+0x13/0x15
> (XEN)    [<ffff82d08012b591>] do_softirq+0x13/0x15
> (XEN)    [<ffff82d080160425>] idle_loop+0x5e/0x6e
> (XEN)    [<ffff82d080160425>] idle_loop+0x5e/0x6e
> (XEN)
> (XEN)
> (XEN) Pagetable walk from 0000000000000004:
> (XEN) Pagetable walk from 0000000000000004:
> (XEN)  L4[0x000] = 000000092d6a4063 ffffffffffffffff
> (XEN)  L4[0x000] = 000000092d6a4063 ffffffffffffffff
> (XEN)  L3[0x000] = 000000092d6a3063 ffffffffffffffff
> (XEN)  L3[0x000] = 000000092d6a3063 ffffffffffffffff
> (XEN)  L2[0x000] = 000000092d6a2063 ffffffffffffffff
> (XEN)  L2[0x000] = 000000092d6a2063 ffffffffffffffff
> (XEN)  L1[0x000] = 0000000000000000 ffffffffffffffff
> (XEN)  L1[0x000] = 0000000000000000 ffffffffffffffff
> (XEN)
> (XEN) ****************************************
> (XEN) Panic on CPU 20:
> (XEN) FATAL PAGE FAULT
> (XEN) [error_code=0000]
> (XEN) Faulting linear address: 0000000000000004
> (XEN) ****************************************
> (XEN)
> (XEN) Manual reset required ('noreboot' specified)
> (XEN)
> (XEN) ****************************************
> (XEN) Panic on CPU 21:
> (XEN) FATAL PAGE FAULT
> (XEN) [error_code=0000]
> (XEN) Faulting linear address: 0000000000000004
> (XEN) ****************************************
> (XEN)
> (XEN) Manual reset required ('noreboot' specified)
>
> Can you please check, try to reproduce, and fix ASAP?
>

Sure! I'm starting on it now. I will try to reproduce it and let you know.


>
> This, IMO, does not alter the prognosis, wrt 4.5 inclusion, at least not
> per-se. In fact, it's ok for some bugs to be around at feature freeze
> time, for the features we said we want. What we need to know is that
> we're likely going to be able to fix them without making the release
> slip.
>
> So you should really either fix this, or provide here enough insights,
> to convince people you're on the way to that. :-)
>

Sure! Thanks!

Meng




-- 


-----------
Meng Xu
PhD Student in Computer and Information Science
University of Pennsylvania

[-- Attachment #1.2: Type: text/html, Size: 11419 bytes --]

[-- Attachment #2: Type: text/plain, Size: 126 bytes --]

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: Introduce rtds real-time scheduler for Xen
  2014-09-17 14:15 ` Dario Faggioli
  2014-09-17 14:33   ` Meng Xu
@ 2014-09-18 16:00   ` Meng Xu
  1 sibling, 0 replies; 49+ messages in thread
From: Meng Xu @ 2014-09-18 16:00 UTC (permalink / raw)
  To: Dario Faggioli
  Cc: Ian Campbell, Sisu Xi, Stefano Stabellini, George Dunlap,
	Chenyang Lu, Ian Jackson, xen-devel, Linh Thi Xuan Phan, Meng Xu,
	Jan Beulich, Chao Wang, Chong Li, Dagaen Golomb

Hi Dario,

I think I have fixed the bug; could you please test it again? :-) I will
comment on the error below with a detailed explanation.

2014-09-17 10:15 GMT-04:00 Dario Faggioli <dario.faggioli@citrix.com>:
> On dom, 2014-09-14 at 17:37 -0400, Meng Xu wrote:
>> This serie of patches adds rtds real-time scheduler to Xen.
>>
> I gave this series some testing, and the behavior of the scheduler is as
> expected, so again, Meng and Sisu, good work.
>
> While doing it, I've also put the series in this git repo/branch:
>
>   git://xenbits.xen.org/people/dariof/xen.git  sched/rt/rtds-v3
>   http://xenbits.xen.org/gitweb/?p=people/dariof/xen.git;a=shortlog;h=refs/heads/sched/rt/rtds-v3
>
>
> There are a couple of issue, though, one minor and one serious, which
> I'd like you to fix, if possible, before freeze date. More info below.
>
>> //list VCPUs' parameters of each domain in cpu pools using rtds scheduler
>> #xl sched-rtds
>> Cpupool Pool-0: sched=EDF
>> Name                                ID    Period    Budget
>> Domain-0                             0     10000      4000
>> vm1                                  1     10000      4000
>>
> So, when I boot Xen with sched=rtds, issueing this command (`xl
> sched-rtds') produces a lot of printk on the serial console, basically
> outputting the dump of the scheduler information.
>
> I guess there is one call to rt_sched_dump() (or whatever that was) left
> somewhere. Could you please check?
>
> This is not a serious issue, but since you're resending anyway...
>
>> //create a cpupool test
>> #xl cpupool-cpu-remove Pool-0 3
>> #xl cpupool-cpu-remove Pool-0 2
>> #xl cpupool-create name=\"test\" sched=\"rtds\"
>> #xl cpupool-cpu-add test 3
>> #xl cpupool-cpu-add test 2
>> #xl cpupool-list
>> Name               CPUs   Sched     Active   Domain count
>> Pool-0               2     rtds       y          2
>> test                 2     rtds       y          0
>>
> This works for me too.
>
> Booting with sched=credit, creating an rtds cpupool and migruting
> domains there also works here too.
>
> However, booting with sched=rtds, and issuing the following commands:
> # xl cpupool-cpu-remove Pool-0 20
> # xl cpupool-cpu-remove Pool-0 21
> # xl cpupool-create /etc/xen/be-cpupool
>
> Where /etc/xen/be-cpupool looks like this:
>  name = "be"
>  sched = "credit"
>  cpus = ["20", "21"]
>  sched="credit"
>
> Makes Xen *CRASH* with the following trace:
>
> (XEN) ----[ Xen-4.5-unstable  x86_64  debug=y  Not tainted ]----
> (XEN) ----[ Xen-4.5-unstable  x86_64  debug=y  Not tainted ]----
> (XEN) CPU:    20
> (XEN) CPU:    21
> (XEN) RIP:    e008:[<ffff82d08012bb1e>]RIP:    e008:[<ffff82d08012bb1e>] check_lock+0x1e/0x3b check_lock+0x1e/0x3b
> (XEN) RFLAGS: 0000000000010002
> (XEN) RFLAGS: 0000000000010002   CONTEXT: hypervisor
> (XEN) CONTEXT: hypervisor
> (XEN) rax: 0000000000000001   rbx: 0000000000000000   rcx: 0000000000000001
> (XEN) rax: 0000000000000001   rbx: 0000000000000000   rcx: 0000000000000001
> (XEN) rdx: 0000000000000001   rsi: ffff830917fc4c80   rdi: 0000000000000004
> (XEN) rdx: 0000000000000001   rsi: ffff830917fc2c80   rdi: 0000000000000004
> (XEN) rbp: ffff830917fb7e08   rsp: ffff830917fb7e08   r8:  0000001d52034e80
> (XEN) rbp: ffff830917fafe08   rsp: ffff830917fafe08   r8:  0000000000000000
> (XEN) r9:  ffff830828a47978   r10: 00000000deadbeef   r11: 0000000000000246
> (XEN) r9:  ffff830917fe3ea8   r10: 00000000deadbeef   r11: 0000000000000246
> (XEN) r12: 0000001d51efbd89   r13: ffff82d080320de0   r14: 0000000000000000
> (XEN) r12: 0000001d51efbb99   r13: ffff82d080320de0   r14: 0000000000000000
> (XEN) r15: 0000000000000014   cr0: 000000008005003b   cr4: 00000000000026f0
> (XEN) r15: 0000000000000015   cr0: 000000008005003b   cr4: 00000000000026f0
> (XEN) cr3: 00000000cf08f000   cr2: 0000000000000004
> (XEN) cr3: 00000000cf08f000   cr2: 0000000000000004
> (XEN) ds: 002b   es: 002b   fs: 0000   gs: 0000   ss: e010   cs: e008
> (XEN) ds: 002b   es: 002b   fs: 0000   gs: 0000   ss: e010   cs: e008
> (XEN) Xen stack trace from rsp=ffff830917fb7e08:
> (XEN)   Xen stack trace from rsp=ffff830917fafe08:
> (XEN)    ffff830917fb7e20 ffff830917fafe20 ffff82d08012bbc4 ffff82d08012bbc4 ffff8300cf12d000 ffff8300cf12c000 ffff830917fb7eb0 ffff830917fafeb0
> (XEN)
> (XEN)    ffff82d080128175 ffff82d080128175 ffff830917fb7e40 ffff830917fafe40 ffff82d0801879ef ffff82d0801879ef 0000001400fb7e60 0000001500fafe60
> (XEN)
> (XEN)    ffff830917fc4060 ffff830917fc2060 ffff830917fb7e60 ffff830917fafe60 ffff830917fc4200 ffff830917fc2200 ffff830917fb7eb0 ffff830917fafeb0
> (XEN)
> (XEN)    ffff82d08012e983 ffff82d08012e983 ffff830917fb7ef0 ffff830917fafef0 ffff82d0801aa941 ffff82d0801aa941 ffff830917fb7e90 ffff830917fafe90
> (XEN)
> (XEN)    ffff82d0802f8980 ffff82d0802f8a00 ffff82d0802f7f80 ffff82d0802f7f80 ffffffffffffffff ffffffffffffffff ffff830917fb0000 ffff830917fa8000
> (XEN)
> (XEN)    00000000000f4240 00000000000f4240 ffff830917fb7ee0 ffff830917fafee0 ffff82d08012b539 ffff82d08012b539 ffff830917fb0000 ffff830917fa8000
> (XEN)
> (XEN)    0000001d51dfd294 0000001d51dfcb8f ffff8300cf12d000 ffff8300cf12c000 ffff83092d6a0990 ffff83092d6a0990 ffff830917fb7ef0 ffff830917fafef0
> (XEN)
> (XEN)    ffff82d08012b591 ffff82d08012b591 ffff830917fb7f10 ffff830917faff10 ffff82d080160425 ffff82d080160425 ffff82d08012b591 ffff82d08012b591
> (XEN)
> (XEN)    ffff8300cf12d000 ffff8300cf12c000 ffff830917fb7e10 ffff830917fafe10 0000000000000000 0000000000000000 ffff88003a0cbfd8 ffff88003a0edfd8
> (XEN)
> (XEN)    ffff88003a0cbfd8 ffff88003a0edfd8 0000000000000007 0000000000000014 ffff88003a0cbec0 ffff88003a0edec0 0000000000000000 0000000000000000
> (XEN)
> (XEN)    0000000000000246 0000000000000246 0000001c6462ff88 0000001c986bebc8 0000000000000000 0000000000000000 0000000000000000 0000000000000000
> (XEN)
> (XEN)    0000000000000000 0000000000000000 ffffffff810013aa ffffffff810013aa ffffffff81c31160 ffffffff81c31160 00000000deadbeef 00000000deadbeef
> (XEN)
> (XEN)    00000000deadbeef 00000000deadbeef 0000010000000000 0000010000000000 ffffffff810013aa ffffffff810013aa 000000000000e033 000000000000e033
> (XEN)
> (XEN)    0000000000000246 0000000000000246 ffff88003a0cbea8 ffff88003a0edea8 000000000000e02b 000000000000e02b c2c2c2c2c2c2c2c2 c2c2c2c2c2c2c2c2
> (XEN)
> (XEN)    c2c2c2c2c2c2c2c2 c2c2c2c2c2c2c2c2 c2c2c2c2c2c2c2c2 c2c2c2c2c2c2c2c2 c2c2c2c2c2c2c2c2 c2c2c2c2c2c2c2c2 c2c2c2c200000014 c2c2c2c200000015
> (XEN)
> (XEN)    ffff8300cf12d000 ffff8300cf12c000 0000003897ca3280 0000003897ca1280 c2c2c2c2c2c2c2c2 c2c2c2c2c2c2c2c2
> (XEN)
> (XEN) Xen call trace:
> (XEN) Xen call trace:
> (XEN)    [<ffff82d08012bb1e>] check_lock+0x1e/0x3b
> (XEN)    [<ffff82d08012bb1e>] check_lock+0x1e/0x3b
> (XEN)    [<ffff82d08012bbc4>] _spin_lock_irq+0x1b/0x6c
> (XEN)    [<ffff82d08012bbc4>] _spin_lock_irq+0x1b/0x6c
> (XEN)    [<ffff82d080128175>] schedule+0xc0/0x5da
> (XEN)    [<ffff82d080128175>] schedule+0xc0/0x5da
> (XEN)    [<ffff82d08012b539>] __do_softirq+0x81/0x8c
> (XEN)    [<ffff82d08012b539>] __do_softirq+0x81/0x8c
> (XEN)    [<ffff82d08012b591>] do_softirq+0x13/0x15
> (XEN)    [<ffff82d08012b591>] do_softirq+0x13/0x15
> (XEN)    [<ffff82d080160425>] idle_loop+0x5e/0x6e
> (XEN)    [<ffff82d080160425>] idle_loop+0x5e/0x6e
> (XEN)
> (XEN)
> (XEN) Pagetable walk from 0000000000000004:
> (XEN) Pagetable walk from 0000000000000004:
> (XEN)  L4[0x000] = 000000092d6a4063 ffffffffffffffff
> (XEN)  L4[0x000] = 000000092d6a4063 ffffffffffffffff
> (XEN)  L3[0x000] = 000000092d6a3063 ffffffffffffffff
> (XEN)  L3[0x000] = 000000092d6a3063 ffffffffffffffff
> (XEN)  L2[0x000] = 000000092d6a2063 ffffffffffffffff
> (XEN)  L2[0x000] = 000000092d6a2063 ffffffffffffffff
> (XEN)  L1[0x000] = 0000000000000000 ffffffffffffffff
> (XEN)  L1[0x000] = 0000000000000000 ffffffffffffffff
> (XEN)
> (XEN) ****************************************
> (XEN) Panic on CPU 20:
> (XEN) FATAL PAGE FAULT
> (XEN) [error_code=0000]
> (XEN) Faulting linear address: 0000000000000004
> (XEN) ****************************************
> (XEN)
> (XEN) Manual reset required ('noreboot' specified)
> (XEN)
> (XEN) ****************************************
> (XEN) Panic on CPU 21:
> (XEN) FATAL PAGE FAULT
> (XEN) [error_code=0000]
> (XEN) Faulting linear address: 0000000000000004
> (XEN) ****************************************
> (XEN)
> (XEN) Manual reset required ('noreboot' specified)
>
> Can you please check, try to reproduce, and fix ASAP?

This is a lock / NULL pointer issue. When we remove a cpu from a
cpupool, rt_vcpu_remove() @ xen/common/sched_rt.c is called. In that
function, the v3 patch just set the schedule_lock of schedule_data to
NULL, which causes a NULL pointer dereference when schedule() @
xen/common/schedule.c tries to take the lock via "lock =
pcpu_schedule_lock_irq(cpu);".

I fixed this by changing rt_vcpu_remove() in sched_rt.c. The changed
rt_vcpu_remove() function is at line 436 of
https://github.com/PennPanda/xenproject/blob/rtxen-v1.90-patch-v4/xen/common/sched_rt.c
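
For reference, the general shape of the change is roughly the following
(just a sketch, not the exact code in the branch; it assumes the per-cpu
schedule_data's built-in _lock can serve as the fallback lock, as other
schedulers do, and it glosses over the locking needed while switching
the pointer):

/*
 * Sketch only: when the rtds scheduler releases a pCPU, re-point the
 * per-CPU schedule_lock at the default per-CPU lock instead of leaving
 * it NULL, so a concurrent schedule() -> pcpu_schedule_lock_irq(cpu)
 * never dereferences a NULL pointer.
 */
static void
rt_release_pcpu_sketch(int cpu)
{
    struct schedule_data *sd = &per_cpu(schedule_data, cpu);

    /* Never leave this NULL: fall back to the built-in per-cpu lock. */
    sd->schedule_lock = &sd->_lock;
}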

I also tested the scenario you gave above on my 12-core machine. No bugs now.

After booting up the system with rtds, I did:
# xl cpupool-cpu-remove Pool-0 11
# xl cpupool-cpu-remove Pool-0 10
# xl cpupool-create /etc/xen/be-cpupool

Where /etc/xen/be-cpupool looks like this:
    name = "be"
    sched = "credit"
    cpus = ["10", "11"]
    sched="credit"


Could you please pull the code and test it, to confirm that it works on
your 24-core machine as well?
The latest code is at https://github.com/PennPanda/xenproject ,
branch rtxen-v1.90-patch-v4.

(Of course, I could send the next version of the patch set with this fix
right away, but I would like to make sure it works on your machine
first, so that people don't get too many patch versions. Then I "hope"
the next patch set can be committed. :-))

Once I have confirmation that the bug no longer shows up on your
machine, I will send the next version with all the other comments
addressed.


>
> This, IMO, does not alter the prognosis, wrt 4.5 inclusion, at least not
> per-se. In fact, it's ok for some bugs to be around at feature freeze
> time, for the features we said we want. What we need to know is that
> we're likely going to be able to fix them without making the release
> slip.
>
> So you should really either fix this, or provide here enough insights,
> to convince people you're on the way to that. :-)

I think I figured out what happened, so the bug should be solved now. :-)

Thanks,

Meng


-----------
Meng Xu
PhD Student in Computer and Information Science
University of Pennsylvania

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [PATCH v3 1/4] xen: add real time scheduler rtds
  2014-09-14 21:37 ` [PATCH v3 1/4] xen: add real time scheduler rtds Meng Xu
  2014-09-15 10:40   ` Jan Beulich
  2014-09-16  8:42   ` Dario Faggioli
@ 2014-09-18 16:08   ` George Dunlap
  2014-09-18 18:08     ` Dario Faggioli
                       ` (3 more replies)
  2 siblings, 4 replies; 49+ messages in thread
From: George Dunlap @ 2014-09-18 16:08 UTC (permalink / raw)
  To: Meng Xu, xen-devel
  Cc: ian.campbell, xisisu, stefano.stabellini, lu, dario.faggioli,
	ian.jackson, ptxlinh, xumengpanda, JBeulich, chaowang,
	lichong659, dgolomb

On 09/14/2014 10:37 PM, Meng Xu wrote:
> This scheduler follows the Preemptive Global Earliest Deadline First
> (EDF) theory in real-time field.
> At any scheduling point, the VCPU with earlier deadline has higher
> priority. The scheduler always picks the highest priority VCPU to run on a
> feasible PCPU.
> A PCPU is feasible if the VCPU can run on this PCPU and (the PCPU is
> idle or has a lower-priority VCPU running on it.)
>
> Each VCPU has a dedicated period and budget.
> The deadline of a VCPU is at the end of each period;
> A VCPU has its budget replenished at the beginning of each period;
> While scheduled, a VCPU burns its budget.
> The VCPU needs to finish its budget before its deadline in each period;
> The VCPU discards its unused budget at the end of each period.
> If a VCPU runs out of budget in a period, it has to wait until next period.
>
> Each VCPU is implemented as a deferable server.
> When a VCPU has a task running on it, its budget is continuously burned;
> When a VCPU has no task but with budget left, its budget is preserved.
>
> Queue scheme:
> A global runqueue and a global depletedq for each CPU pool.
> The runqueue holds all runnable VCPUs with budget and sorted by deadline;
> The depletedq holds all VCPUs without budget and unsorted.
>
> Note: cpumask and cpupool is supported.
>
> This is an experimental scheduler.
>
> Signed-off-by: Meng Xu <mengxu@cis.upenn.edu>
> Signed-off-by: Sisu Xi <xisisu@gmail.com>

Getting there, but unfortunately I've got a number of further comments.

Konrad, I think this is very close to being ready -- when is the 
deadline again, and how hard is it?  Would it be better to check it in 
before the deadline, and then address the things I'm bringing up here?  
Or would it be better to wait until all the issues are sorted and then 
check it in (even if it's after the deadline)?

+/*
+ * Debug related code, dump vcpu/cpu information
+ */
+static void
+rt_dump_vcpu(const struct scheduler *ops, const struct rt_vcpu *svc)
+{
+    char cpustr[1024];
+    cpumask_t *cpupool_mask;
+
+    ASSERT(svc != NULL);
+    /* flag vcpu */
+    if( svc->sdom == NULL )
+        return;
+
+    cpumask_scnprintf(cpustr, sizeof(cpustr), svc->vcpu->cpu_hard_affinity);
+    printk("[%5d.%-2u] cpu %u, (%"PRI_stime", %"PRI_stime"),"
+           " cur_b=%"PRI_stime" cur_d=%"PRI_stime" last_start=%"PRI_stime"\n"
+           " \t\t onQ=%d runnable=%d cpu_hard_affinity=%s ",
+            svc->vcpu->domain->domain_id,
+            svc->vcpu->vcpu_id,
+            svc->vcpu->processor,
+            svc->period,
+            svc->budget,
+            svc->cur_budget,
+            svc->cur_deadline,
+            svc->last_start,
+            __vcpu_on_q(svc),
+            vcpu_runnable(svc->vcpu),
+            cpustr);
+    memset(cpustr, 0, sizeof(cpustr));
+    cpupool_mask = cpupool_scheduler_cpumask(svc->vcpu->domain->cpupool);
+    cpumask_scnprintf(cpustr, sizeof(cpustr), cpupool_mask);
+    printk("cpupool=%s\n", cpustr);
+}
+
+static void
+rt_dump_pcpu(const struct scheduler *ops, int cpu)
+{
+    struct rt_vcpu *svc = rt_vcpu(curr_on_cpu(cpu));
+
+    rt_dump_vcpu(ops, svc);
+}


These svc structures are allocated dynamically and may disappear at any 
time... I think you need the lock to cover this as well.

And since this is called both externally (via ops->dump_cpu_state) and 
internally (below), you probably need a locking version and a 
non-locking version (normally you'd make rt_dump_pcpu() a locking 
"wrapper" around __rt_dump_pcpu()).

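To make that concrete, something along these lines is what I have in
mind (just a sketch; using prv->lock as the protecting lock is an
assumption on my part, adjust to whatever lock actually covers svc):

static void
__rt_dump_pcpu(const struct scheduler *ops, int cpu)
{
    /* Caller must hold prv->lock. */
    struct rt_vcpu *svc = rt_vcpu(curr_on_cpu(cpu));

    rt_dump_vcpu(ops, svc);
}

static void
rt_dump_pcpu(const struct scheduler *ops, int cpu)
{
    struct rt_private *prv = rt_priv(ops);
    unsigned long flags;

    spin_lock_irqsave(&prv->lock, flags);
    __rt_dump_pcpu(ops, cpu);
    spin_unlock_irqrestore(&prv->lock, flags);
}
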
> +
> +static void
> +rt_dump(const struct scheduler *ops)
> +{
> +    struct list_head *iter_sdom, *iter_svc, *runq, *depletedq, *iter;
> +    struct rt_private *prv = rt_priv(ops);
> +    struct rt_vcpu *svc;
> +    unsigned int cpu = 0;
> +    cpumask_t *online;
> +    struct rt_dom *sdom;
> +    unsigned long flags;
> +
> +    ASSERT(!list_empty(&prv->sdom));
> +    sdom = list_entry(prv->sdom.next, struct rt_dom, sdom_elem);
> +    online = cpupool_scheduler_cpumask(sdom->dom->cpupool);
> +    runq = rt_runq(ops);
> +    depletedq = rt_depletedq(ops);

Same thing with all these -- other CPUs may be modifying prv->sdom.

> +
> +    printk("PCPU info:\n");
> +    for_each_cpu(cpu, online)
> +        rt_dump_pcpu(ops, cpu);
> +
> +    printk("Global RunQueue info:\n");
> +    spin_lock_irqsave(&prv->lock, flags);
> +    list_for_each( iter, runq )
> +    {
> +        svc = __q_elem(iter);
> +        rt_dump_vcpu(ops, svc);
> +    }
> +
> +    printk("Global DepletedQueue info:\n");
> +    list_for_each( iter, depletedq )
> +    {
> +        svc = __q_elem(iter);
> +        rt_dump_vcpu(ops, svc);
> +    }
> +
> +    printk("Domain info:\n");
> +    list_for_each( iter_sdom, &prv->sdom )
> +    {
> +        sdom = list_entry(iter_sdom, struct rt_dom, sdom_elem);
> +        printk("\tdomain: %d\n", sdom->dom->domain_id);
> +
> +        list_for_each( iter_svc, &sdom->vcpu )
> +        {
> +            svc = list_entry(iter_svc, struct rt_vcpu, sdom_elem);
> +            rt_dump_vcpu(ops, svc);
> +        }
> +    }
> +
> +    spin_unlock_irqrestore(&prv->lock, flags);
> +}
> +
> +/*
> + * update deadline and budget when now >= cur_deadline
> + * it need to be updated to the deadline of the current period
> + */
> +static void
> +rt_update_deadline(s_time_t now, struct rt_vcpu *svc)
> +{
> +    ASSERT(now >= svc->cur_deadline);
> +    ASSERT(svc->period != 0);
> +
> +    do
> +        svc->cur_deadline += svc->period;
> +    while ( svc->cur_deadline <= now );

Is there any possibility that this loop could run on for an unusually 
long time?  Suppose a vcpu ran, then was put to sleep for a few months, 
then woke up again?  Or suppose that there was some other condition that 
could be triggered that would make this take a large number of 
iterations -- it might end up with a DoS at some point down the line.

Would it make sense to add some kind of a failsafe here to fall back to 
the slow method (using division and multiplication) if the difference is 
too large?  We should be able to make the check really simple, like this:

if ( svc->cur_deadline + (svc->period << UPDATE_LIMIT_SHIFT) > now )
  [while loop];
else
  [multiplication and division];

Where UPDATE_LIMIT_SHIFT could be something that would limit the loop to 
a reasonable number of iterations; say, 10 (which would give you 1024 
iterations).

Another option would be to scrap the multiplication and addition 
altogether, and just set svc->cur_deadline = now + svc->period.  If so 
much time has passed, it's unlikely that being aligned will be critical.

Another advantage of this is that we could use this update function both 
here and below when creating the vcpu (and setting the initial deadline).

Thoughts?

This isn't critical, so if we can't come to a consensus quickly it's 
probably fine to go in as it is; but if nobody has any objections, I 
think we should put in some kind of a check like this.
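
Putting the pieces together, I'm thinking of something like the sketch
below (UPDATE_LIMIT_SHIFT's value and the division fallback are just
illustrative; the "reset to now + period" variant would be even simpler):

#define UPDATE_LIMIT_SHIFT 10

static void
rt_update_deadline(s_time_t now, struct rt_vcpu *svc)
{
    ASSERT(now >= svc->cur_deadline);
    ASSERT(svc->period != 0);

    if ( svc->cur_deadline + (svc->period << UPDATE_LIMIT_SHIFT) > now )
    {
        /* Common case: a bounded number of additions. */
        do
            svc->cur_deadline += svc->period;
        while ( svc->cur_deadline <= now );
    }
    else
    {
        /* Rare case: catch up in one step, keeping the alignment. */
        long count = ((now - svc->cur_deadline) / svc->period) + 1;

        svc->cur_deadline += count * svc->period;
    }

    svc->cur_budget = svc->budget;
}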

> +
> +    svc->cur_budget = svc->budget;
> +
> +    /* TRACE */
> +    {
> +        struct {
> +            unsigned dom:16,vcpu:16;
> +            unsigned cur_deadline_lo, cur_deadline_hi;
> +            unsigned cur_budget_lo, cur_budget_hi;
> +        } d;
> +        d.dom = svc->vcpu->domain->domain_id;
> +        d.vcpu = svc->vcpu->vcpu_id;
> +        d.cur_deadline_lo = (unsigned) svc->cur_deadline;
> +        d.cur_deadline_hi = (unsigned) (svc->cur_deadline >> 32);
> +        d.cur_budget_lo = (unsigned) svc->cur_budget;
> +        d.cur_budget_hi = (unsigned) (svc->cur_budget >> 32);
> +        trace_var(TRC_RTDS_BUDGET_REPLENISH, 1,
> +                  sizeof(d),
> +                  (unsigned char *) &d);
> +    }
> +
> +    return;
> +}
> +
[snip]
>
> +/*
> + * This function is called in sched_move_domain() in schedule.c
> + * When move a domain to a new cpupool.
> + * It inserts vcpus of moving domain to the scheduler's RunQ in
> + * dest. cpupool; and insert rt_vcpu svc to scheduler-specific
> + * vcpu list of the dom
> + */
> +static void
> +rt_vcpu_insert(const struct scheduler *ops, struct vcpu *vc)
> +{
> +    struct rt_private *prv = rt_priv(ops);
> +    struct rt_vcpu *svc = rt_vcpu(vc);
> +
> +    /* not addlocate idle vcpu to dom vcpu list */
> +    if ( is_idle_vcpu(vc) )
> +        return;
> +
> +    if ( !__vcpu_on_q(svc) && vcpu_runnable(vc) && !vc->is_running )
> +    {
> +        if ( svc->cur_budget > 0 )
> +            __runq_insert(ops, svc);
> +        else
> +            list_add(&svc->q_elem, &prv->depletedq);

This check is done in __runq_insert(), you don't need to duplicate it here.

[snip]
> +/*
> + * schedule function for rt scheduler.
> + * The lock is already grabbed in schedule.c, no need to lock here
> + */
> +static struct task_slice
> +rt_schedule(const struct scheduler *ops, s_time_t now, bool_t tasklet_work_scheduled)
> +{
> +    const int cpu = smp_processor_id();
> +    struct rt_private *prv = rt_priv(ops);
> +    struct rt_vcpu *const scurr = rt_vcpu(current);
> +    struct rt_vcpu *snext = NULL;
> +    struct task_slice ret = { .migrated = 0 };
> +
> +    /* clear ticked bit now that we've been scheduled */
> +    cpumask_clear_cpu(cpu, &prv->tickled);
> +
> +    /* burn_budget would return for IDLE VCPU */
> +    burn_budget(ops, scurr, now);
> +
> +    __repl_update(ops, now);
> +
> +    if ( tasklet_work_scheduled )
> +    {
> +        snext = rt_vcpu(idle_vcpu[cpu]);
> +    }
> +    else
> +    {
> +        cpumask_t cur_cpu;
> +        cpumask_clear(&cur_cpu);
> +        cpumask_set_cpu(cpu, &cur_cpu);
> +        snext = __runq_pick(ops, &cur_cpu);
> +        if ( snext == NULL )
> +            snext = rt_vcpu(idle_vcpu[cpu]);
> +
> +        /* if scurr has higher priority and budget, still pick scurr */
> +        if ( !is_idle_vcpu(current) &&
> +             vcpu_runnable(current) &&
> +             scurr->cur_budget > 0 &&
> +             ( is_idle_vcpu(snext->vcpu) ||
> +               scurr->cur_deadline <= snext->cur_deadline ) )
> +            snext = scurr;
> +    }
> +
> +    if ( snext != scurr &&
> +         !is_idle_vcpu(current) &&
> +         vcpu_runnable(current) )
> +        set_bit(__RTDS_delayed_runq_add, &scurr->flags);
> +
> +    snext->last_start = now;
> +    if ( !is_idle_vcpu(snext->vcpu) )
> +    {
> +        if ( snext != scurr )
> +        {
> +            __q_remove(snext);
> +            set_bit(__RTDS_scheduled, &snext->flags);
> +        }
> +        if ( snext->vcpu->processor != cpu )
> +        {
> +            snext->vcpu->processor = cpu;
> +            ret.migrated = 1;
> +        }
> +    }
> +
> +    ret.time = MILLISECS(1); /* sched quantum */
> +    ret.task = snext->vcpu;

Isn't having a fixed 1ms quantum going to break with 
microsecond-granularity period and budget?  If someone sets their period 
sub-millisecond, won't it be allowed to run for a full ms (even though 
its budget is smaller), and then, because the period has passed, be 
given a new budget again?  This will allow vcpus with sub-millisecond 
periods to starve out those with more, won't it? I'm not sure how two 
sub-millisecond-period vms will interact, but I doubt it would be what 
anybody's expecting.

What about #define MAX_SCHEDULE MILLISECS(1), then set ret.time = 
MIN(snext->budget, MAX_SCHEDULE)

It might also make sense to do some kind of experiments to determine a 
minimum reliable period / budget and not allow people to set values 
below that.

(We could consider allowing MAX_SCHEDULE to be configurable at some 
point as well.)
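
Concretely, the end of rt_schedule() could then look something like this
(sketch only; I'm assuming the remaining budget is snext->cur_budget):

#define MAX_SCHEDULE MILLISECS(1)

    /* Never hand out more time than the remaining budget or the quantum. */
    ret.time = MAX_SCHEDULE;
    if ( !is_idle_vcpu(snext->vcpu) && snext->cur_budget < MAX_SCHEDULE )
        ret.time = snext->cur_budget;
    ret.task = snext->vcpu;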

> +
> +    /* TRACE */
> +    {
> +        struct {
> +            unsigned dom:16,vcpu:16;
> +            unsigned cur_deadline_lo, cur_deadline_hi;
> +            unsigned cur_budget_lo, cur_budget_hi;
> +        } d;
> +        d.dom = snext->vcpu->domain->domain_id;
> +        d.vcpu = snext->vcpu->vcpu_id;
> +        d.cur_deadline_lo = (unsigned) snext->cur_deadline;
> +        d.cur_deadline_hi = (unsigned) (snext->cur_deadline >> 32);
> +        d.cur_budget_lo = (unsigned) snext->cur_budget;
> +        d.cur_budget_hi = (unsigned) (snext->cur_budget >> 32);
> +        trace_var(TRC_RTDS_SCHED_TASKLET, 1,
> +                  sizeof(d),
> +                  (unsigned char *)&d);
> +    }
> +
> +    return ret;
> +}
> +
> +/*
> + * Remove VCPU from RunQ
> + * The lock is already grabbed in schedule.c, no need to lock here
> + */
> +static void
> +rt_vcpu_sleep(const struct scheduler *ops, struct vcpu *vc)
> +{
> +    struct rt_vcpu * const svc = rt_vcpu(vc);
> +
> +    BUG_ON( is_idle_vcpu(vc) );
> +
> +    if ( curr_on_cpu(vc->processor) == vc )
> +        cpu_raise_softirq(vc->processor, SCHEDULE_SOFTIRQ);
> +    else if ( __vcpu_on_q(svc) )
> +        __q_remove(svc);
> +    else if ( test_bit(__RTDS_delayed_runq_add, &svc->flags) )
> +        clear_bit(__RTDS_delayed_runq_add, &svc->flags);
> +}
> +
> +/*
> + * Pick a cpu where to run a vcpu, possibly kicking out the vcpu running there
> + * Called by wake() and context_saved()
> + * We have a running candidate here, the kick logic is:
> + * Among all the cpus that are within the cpu affinity
> + * 1) if the new->cpu is idle, kick it. This could benefit cache hit
> + * 2) if there are any idle vcpu, kick it.
> + * 3) now all pcpus are busy, among all the running vcpus, pick lowest priority one
> + *    if snext has higher priority, kick it.
> + *
> + * TODO:
> + * 1) what if these two vcpus belongs to the same domain?
> + *    replace a vcpu belonging to the same domain introduces more overhead
> + *
> + * lock is grabbed before calling this function
> + */
> +static void
> +runq_tickle(const struct scheduler *ops, struct rt_vcpu *new)
> +{
> +    struct rt_private *prv = rt_priv(ops);
> +    struct rt_vcpu *latest_deadline_vcpu = NULL;    /* lowest priority scheduled */
> +    struct rt_vcpu *iter_svc;
> +    struct vcpu *iter_vc;
> +    int cpu = 0, cpu_to_tickle = 0;
> +    cpumask_t not_tickled;
> +    cpumask_t *online;
> +
> +    if ( new == NULL || is_idle_vcpu(new->vcpu) )
> +        return;
> +
> +    online = cpupool_scheduler_cpumask(new->vcpu->domain->cpupool);
> +    cpumask_and(&not_tickled, online, new->vcpu->cpu_hard_affinity);
> +    cpumask_andnot(&not_tickled, &not_tickled, &prv->tickled);
> +
> +    /* 1) if new's previous cpu is idle, kick it for cache benefit */
> +    if ( is_idle_vcpu(curr_on_cpu(new->vcpu->processor)) )
> +    {
> +        cpu_to_tickle = new->vcpu->processor;
> +        goto out;
> +    }
> +
> +    /* 2) if there are any idle pcpu, kick it */
> +    /* The same loop also find the one with lowest priority */
> +    for_each_cpu(cpu, &not_tickled)
> +    {
> +        iter_vc = curr_on_cpu(cpu);
> +        if ( is_idle_vcpu(iter_vc) )
> +        {
> +            cpu_to_tickle = cpu;
> +            goto out;
> +        }
> +        iter_svc = rt_vcpu(iter_vc);
> +        if ( latest_deadline_vcpu == NULL ||
> +             iter_svc->cur_deadline > latest_deadline_vcpu->cur_deadline )
> +            latest_deadline_vcpu = iter_svc;
> +    }
> +
> +    /* 3) candicate has higher priority, kick out the lowest priority vcpu */
> +    if ( latest_deadline_vcpu != NULL && new->cur_deadline < latest_deadline_vcpu->cur_deadline )
> +    {
> +        cpu_to_tickle = latest_deadline_vcpu->vcpu->processor;
> +        goto out;
> +    }
> +
> +out:
> +    /* TRACE */
> +    {
> +        struct {
> +            unsigned cpu:8, pad:24;
> +        } d;
> +        d.cpu = cpu_to_tickle;
> +        d.pad = 0;
> +        trace_var(TRC_RTDS_TICKLE, 0,
> +                  sizeof(d),
> +                  (unsigned char *)&d);
> +    }
> +
> +    cpumask_set_cpu(cpu_to_tickle, &prv->tickled);
> +    cpu_raise_softirq(cpu_to_tickle, SCHEDULE_SOFTIRQ);

Won't the "fall-through" here mean that, if it doesn't find a suitable
cpu to place the new vcpu on, it will *always* pointlessly tickle cpu 0
(the initial value of cpu_to_tickle)?

It seems like you should either put a return before the out: label 
(possibly with another trace to say the tickle didn't do anything), or 
set cpu_to_tickle to -1 and then only set the tickled bit and raise the 
softirq if cpu_to_tickle >= 0.

Also, I think at the moment Xen supports up to 4096 CPUs on some 
systems, so you should probably make cpu at least 16 bits (if not just a 
full word).
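
For the first option, the tail of runq_tickle() would then look roughly
like this (just a sketch; the extra "didn't tickle anyone" trace is left
out):

    /* 3) candidate has higher priority, kick out the lowest priority vcpu */
    if ( latest_deadline_vcpu != NULL &&
         new->cur_deadline < latest_deadline_vcpu->cur_deadline )
    {
        cpu_to_tickle = latest_deadline_vcpu->vcpu->processor;
        goto out;
    }

    /* Nobody needs tickling: don't pointlessly kick pcpu 0. */
    return;

 out:
    /* TRACE of cpu_to_tickle, as in the current patch, goes here. */
    cpumask_set_cpu(cpu_to_tickle, &prv->tickled);
    cpu_raise_softirq(cpu_to_tickle, SCHEDULE_SOFTIRQ);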

> +    return;
> +}
> +
> +/*
> + * Should always wake up runnable vcpu, put it back to RunQ.
> + * Check priority to raise interrupt
> + * The lock is already grabbed in schedule.c, no need to lock here
> + * TODO: what if these two vcpus belongs to the same domain?
> + */
> +static void
> +rt_vcpu_wake(const struct scheduler *ops, struct vcpu *vc)
> +{
> +    struct rt_vcpu * const svc = rt_vcpu(vc);
> +    s_time_t now = NOW();
> +    struct rt_private *prv = rt_priv(ops);
> +    struct rt_vcpu *snext = NULL;        /* highest priority on RunQ */
> +    struct rt_dom *sdom = NULL;
> +    cpumask_t *online;
> +
> +    BUG_ON( is_idle_vcpu(vc) );
> +
> +    if ( unlikely(curr_on_cpu(vc->processor) == vc) )
> +        return;
> +
> +    /* on RunQ/DepletedQ, just update info is ok */
> +    if ( unlikely(__vcpu_on_q(svc)) )
> +        return;
> +
> +    /* If context hasn't been saved for this vcpu yet, we can't put it on
> +     * the Runqueue/DepletedQ. Instead, we set a flag so that it will be
> +     * put on the Runqueue/DepletedQ after the context has been saved.
> +     */
> +    if ( unlikely(test_bit(__RTDS_scheduled, &svc->flags)) )
> +    {
> +        set_bit(__RTDS_delayed_runq_add, &svc->flags);
> +        return;
> +    }
> +
> +    if ( now >= svc->cur_deadline)
> +        rt_update_deadline(now, svc);
> +
> +    /* insert svc to runq/depletedq because svc is not in queue now */
> +    if ( svc->cur_budget > 0 )
> +        __runq_insert(ops, svc);
> +    else
> +        list_add(&svc->q_elem, &prv->depletedq);

This check is already done inside of __runq_insert() -- you don't need 
to duplicate it here.

> +
> +    __repl_update(ops, now);
> +
> +    ASSERT(!list_empty(&prv->sdom));
> +    sdom = list_entry(prv->sdom.next, struct rt_dom, sdom_elem);
> +    online = cpupool_scheduler_cpumask(sdom->dom->cpupool);
> +    snext = __runq_pick(ops, online);    /* pick snext from ALL valid cpus */
> +
> +    runq_tickle(ops, snext);

Is there a need to check the status of the "front of the queue" on every 
wake?  In theory the only vcpu which we want to check is the one we just 
woke up.  Would it make sense instead to do something like if(snext == 
svc) runq_tickle()?
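
Something along these lines at the end of rt_vcpu_wake(), I mean (again,
just a sketch of the idea):

    snext = __runq_pick(ops, online);   /* highest priority vcpu on the RunQ */

    /* Only bother tickling if the vcpu we just woke is the new head. */
    if ( snext == svc )
        runq_tickle(ops, snext);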

I was about to say that since our scheduling quantum is so short (1ms), 
it almost seems like we could get away without tickling entirely; or, 
that we could just check to see if it's got a high enough priority to 
run on the cpu it's currently running on, and just run it there, without 
checking everywhere else.

But then I realized that you can specify budget and period in 
microseconds, which would completely not work without tickling; that's 
when I realized the problem with us-granularity period with a 1-ms 
scheduling quantum.

  -George

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [PATCH v3 2/4] libxc: add rtds scheduler
  2014-09-14 21:37 ` [PATCH v3 2/4] libxc: add rtds scheduler Meng Xu
@ 2014-09-18 16:15   ` George Dunlap
  0 siblings, 0 replies; 49+ messages in thread
From: George Dunlap @ 2014-09-18 16:15 UTC (permalink / raw)
  To: Meng Xu
  Cc: Ian Campbell, Sisu Xi, Stefano Stabellini, Chenyang Lu,
	Dario Faggioli, Ian Jackson, xen-devel, Linh Thi Xuan Phan,
	Meng Xu, Jan Beulich, Chao Wang, Chong Li, Dagaen Golomb

On Sun, Sep 14, 2014 at 10:37 PM, Meng Xu <mengxu@cis.upenn.edu> wrote:
> Add xc_sched_rtds_* functions to interact with Xen to set/get domain's
> parameters for rtds scheduler.
> Note: VCPU's information (period, budget) is in microsecond (us).
>
> Signed-off-by: Meng Xu <mengxu@cis.upenn.edu>
> Signed-off-by: Sisu Xi <xisisu@gmail.com>
> Reviewed-by: Dario Faggioli <dario.faggioli@citrix.com>
> Acked-by: Ian Campbell <ian.campbell@citrix.com>

Acked-by: George Dunlap <george.dunlap@eu.citrix.com>

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [PATCH v3 3/4] libxl: add rtds scheduler
  2014-09-14 21:37 ` [PATCH v3 3/4] libxl: " Meng Xu
  2014-09-15 22:07   ` Ian Campbell
  2014-09-16  8:04   ` Dario Faggioli
@ 2014-09-18 16:24   ` George Dunlap
  2014-09-18 17:19     ` Ian Campbell
  2 siblings, 1 reply; 49+ messages in thread
From: George Dunlap @ 2014-09-18 16:24 UTC (permalink / raw)
  To: Meng Xu, xen-devel
  Cc: ian.campbell, xisisu, stefano.stabellini, lu, dario.faggioli,
	ian.jackson, ptxlinh, xumengpanda, JBeulich, chaowang,
	lichong659, dgolomb

On 09/14/2014 10:37 PM, Meng Xu wrote:
> Add libxl functions to set/get domain's parameters for rtds scheduler
> Note: VCPU's information (period, budget) is in microsecond (us).
>
> Signed-off-by: Meng Xu <mengxu@cis.upenn.edu>
> Signed-off-by: Sisu Xi <xisisu@gmail.com>

Looks good -- once you have the LIBXL_HAS_foo then:

Reviewed-by: George Dunlap <george.dunlap@eu.citrix.com>

> ---
>   tools/libxl/libxl.c         |   72 +++++++++++++++++++++++++++++++++++++++++++
>   tools/libxl/libxl.h         |    1 +
>   tools/libxl/libxl_types.idl |    2 ++
>   3 files changed, 75 insertions(+)
>
> diff --git a/tools/libxl/libxl.c b/tools/libxl/libxl.c
> index ad3495a..e202c46 100644
> --- a/tools/libxl/libxl.c
> +++ b/tools/libxl/libxl.c
> @@ -5217,6 +5217,72 @@ static int sched_sedf_domain_set(libxl__gc *gc, uint32_t domid,
>       return 0;
>   }
>   
> +static int sched_rtds_domain_get(libxl__gc *gc, uint32_t domid,
> +                               libxl_domain_sched_params *scinfo)
> +{
> +    struct xen_domctl_sched_rtds sdom;
> +    int rc;
> +
> +    rc = xc_sched_rtds_domain_get(CTX->xch, domid, &sdom);
> +    if (rc != 0) {
> +        LOGE(ERROR, "getting domain sched rtds");
> +        return ERROR_FAIL;
> +    }
> +
> +    libxl_domain_sched_params_init(scinfo);
> +
> +    scinfo->sched = LIBXL_SCHEDULER_RTDS;
> +    scinfo->period = sdom.period;
> +    scinfo->budget = sdom.budget;
> +
> +    return 0;
> +}
> +
> +static int sched_rtds_domain_set(libxl__gc *gc, uint32_t domid,
> +                               const libxl_domain_sched_params *scinfo)
> +{
> +    struct xen_domctl_sched_rtds sdom;
> +    int rc;
> +
> +    rc = xc_sched_rtds_domain_get(CTX->xch, domid, &sdom);
> +    if (rc != 0) {
> +        LOGE(ERROR, "getting domain sched rtds");
> +        return ERROR_FAIL;
> +    }
> +
> +    if (scinfo->period != LIBXL_DOMAIN_SCHED_PARAM_PERIOD_DEFAULT) {
> +        if (scinfo->period < 1) {
> +            LOG(ERROR, "VCPU period is not set or out of range, "
> +                       "valid values are larger than 1");
> +            return ERROR_INVAL;
> +        }
> +        sdom.period = scinfo->period;
> +    }
> +
> +    if (scinfo->budget != LIBXL_DOMAIN_SCHED_PARAM_BUDGET_DEFAULT) {
> +        if (scinfo->budget < 1) {
> +            LOG(ERROR, "VCPU budget is not set or out of range, "
> +                       "valid values are larger than 1");
> +            return ERROR_INVAL;
> +        }
> +        sdom.budget = scinfo->budget;
> +    }
> +
> +    if (sdom.budget > sdom.period) {
> +        LOG(ERROR, "VCPU budget is larger than VCPU period, "
> +                   "VCPU budget should be no larger than VCPU period");
> +        return ERROR_INVAL;
> +    }
> +
> +    rc = xc_sched_rtds_domain_set(CTX->xch, domid, &sdom);
> +    if (rc < 0) {
> +        LOGE(ERROR, "setting domain sched rtds");
> +        return ERROR_FAIL;
> +    }
> +
> +    return 0;
> +}
> +
>   int libxl_domain_sched_params_set(libxl_ctx *ctx, uint32_t domid,
>                                     const libxl_domain_sched_params *scinfo)
>   {
> @@ -5240,6 +5306,9 @@ int libxl_domain_sched_params_set(libxl_ctx *ctx, uint32_t domid,
>       case LIBXL_SCHEDULER_ARINC653:
>           ret=sched_arinc653_domain_set(gc, domid, scinfo);
>           break;
> +    case LIBXL_SCHEDULER_RTDS:
> +        ret=sched_rtds_domain_set(gc, domid, scinfo);
> +        break;
>       default:
>           LOG(ERROR, "Unknown scheduler");
>           ret=ERROR_INVAL;
> @@ -5270,6 +5339,9 @@ int libxl_domain_sched_params_get(libxl_ctx *ctx, uint32_t domid,
>       case LIBXL_SCHEDULER_CREDIT2:
>           ret=sched_credit2_domain_get(gc, domid, scinfo);
>           break;
> +    case LIBXL_SCHEDULER_RTDS:
> +        ret=sched_rtds_domain_get(gc, domid, scinfo);
> +        break;
>       default:
>           LOG(ERROR, "Unknown scheduler");
>           ret=ERROR_INVAL;
> diff --git a/tools/libxl/libxl.h b/tools/libxl/libxl.h
> index 5136d02..00badcc 100644
> --- a/tools/libxl/libxl.h
> +++ b/tools/libxl/libxl.h
> @@ -1293,6 +1293,7 @@ int libxl_sched_credit_params_set(libxl_ctx *ctx, uint32_t poolid,
>   #define LIBXL_DOMAIN_SCHED_PARAM_SLICE_DEFAULT     -1
>   #define LIBXL_DOMAIN_SCHED_PARAM_LATENCY_DEFAULT   -1
>   #define LIBXL_DOMAIN_SCHED_PARAM_EXTRATIME_DEFAULT -1
> +#define LIBXL_DOMAIN_SCHED_PARAM_BUDGET_DEFAULT    -1
>   
>   int libxl_domain_sched_params_get(libxl_ctx *ctx, uint32_t domid,
>                                     libxl_domain_sched_params *params);
> diff --git a/tools/libxl/libxl_types.idl b/tools/libxl/libxl_types.idl
> index f1fcbc3..15234d6 100644
> --- a/tools/libxl/libxl_types.idl
> +++ b/tools/libxl/libxl_types.idl
> @@ -156,6 +156,7 @@ libxl_scheduler = Enumeration("scheduler", [
>       (5, "credit"),
>       (6, "credit2"),
>       (7, "arinc653"),
> +    (8, "rtds"),
>       ])
>   
>   # Consistent with SHUTDOWN_* in sched.h (apart from UNKNOWN)
> @@ -318,6 +319,7 @@ libxl_domain_sched_params = Struct("domain_sched_params",[
>       ("slice",        integer, {'init_val': 'LIBXL_DOMAIN_SCHED_PARAM_SLICE_DEFAULT'}),
>       ("latency",      integer, {'init_val': 'LIBXL_DOMAIN_SCHED_PARAM_LATENCY_DEFAULT'}),
>       ("extratime",    integer, {'init_val': 'LIBXL_DOMAIN_SCHED_PARAM_EXTRATIME_DEFAULT'}),
> +    ("budget",       integer, {'init_val': 'LIBXL_DOMAIN_SCHED_PARAM_BUDGET_DEFAULT'}),
>       ])
>   
>   libxl_domain_build_info = Struct("domain_build_info",[

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [PATCH v3 4/4] xl: introduce rtds scheduler
  2014-09-14 21:37 ` [PATCH v3 4/4] xl: introduce " Meng Xu
  2014-09-15 22:18   ` Ian Campbell
  2014-09-16  7:49   ` Dario Faggioli
@ 2014-09-18 16:35   ` George Dunlap
  2 siblings, 0 replies; 49+ messages in thread
From: George Dunlap @ 2014-09-18 16:35 UTC (permalink / raw)
  To: Meng Xu, xen-devel
  Cc: ian.campbell, xisisu, stefano.stabellini, lu, dario.faggioli,
	ian.jackson, ptxlinh, xumengpanda, JBeulich, chaowang,
	lichong659, dgolomb

On 09/14/2014 10:37 PM, Meng Xu wrote:
> Add xl command for rtds scheduler
> Note: VCPU's parameter (period, budget) is in microsecond (us).
>
> Signed-off-by: Meng Xu <mengxu@cis.upenn.edu>
> Signed-off-by: Sisu Xi <xisisu@gmail.com>

Minor thing:

> ---
>   docs/man/xl.pod.1         |   35 +++++++++++++
>   tools/libxl/xl.h          |    1 +
>   tools/libxl/xl_cmdimpl.c  |  121 +++++++++++++++++++++++++++++++++++++++++++++
>   tools/libxl/xl_cmdtable.c |    8 +++
>   4 files changed, 165 insertions(+)
>
> diff --git a/docs/man/xl.pod.1 b/docs/man/xl.pod.1
> index 9d1c2a5..4c87292 100644
> --- a/docs/man/xl.pod.1
> +++ b/docs/man/xl.pod.1
> @@ -1035,6 +1035,41 @@ Restrict output to domains in the specified cpupool.
>   
>   =back
>   
> +=item B<sched-rtds> [I<OPTIONS>]
> +
> +Set or get rtds (Real Time Deferrable Server) scheduler parameters.
> +This rt scheduler applies Preemptive Global Earliest Deadline First
> +real-time scheduling algorithm to schedule VCPUs in the system.
> +Each VCPU has a dedicated period and budget.
> +VCPUs in the same domain have the same period and budget (in Xen 4.5).

I think we normally don't include information about specific versions in 
the man pages included in that version; we just describe how that 
specific version works.  (An exception might be something like, "Users 
upgrading from Xen N.M should...", but that wouldn't apply here.)  
Information about specific versions would be included in the wiki or 
other on-line howtos, which are meant to cover multiple versions.

With that change:

Reviewed-by: George Dunlap <george.dunlap@eu.citrix.com>

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [PATCH v3 3/4] libxl: add rtds scheduler
  2014-09-18 16:24   ` George Dunlap
@ 2014-09-18 17:19     ` Ian Campbell
  0 siblings, 0 replies; 49+ messages in thread
From: Ian Campbell @ 2014-09-18 17:19 UTC (permalink / raw)
  To: George Dunlap
  Cc: xisisu, stefano.stabellini, lu, dario.faggioli, ian.jackson,
	xen-devel, ptxlinh, xumengpanda, Meng Xu, JBeulich, chaowang,
	lichong659, dgolomb

On Thu, 2014-09-18 at 17:24 +0100, George Dunlap wrote:
> On 09/14/2014 10:37 PM, Meng Xu wrote:
> > Add libxl functions to set/get domain's parameters for rtds scheduler
> > Note: VCPU's information (period, budget) is in microsecond (us).
> >
> > Signed-off-by: Meng Xu <mengxu@cis.upenn.edu>
> > Signed-off-by: Sisu Xi <xisisu@gmail.com>
> 
> Looks good -- once you have the LIBXL_HAS_foo then:

s/HAS/HAVE/.

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [PATCH v3 1/4] xen: add real time scheduler rtds
  2014-09-18 16:08   ` George Dunlap
@ 2014-09-18 18:08     ` Dario Faggioli
  2014-09-19 13:11       ` Meng Xu
  2014-09-19  9:45     ` Dario Faggioli
                       ` (2 subsequent siblings)
  3 siblings, 1 reply; 49+ messages in thread
From: Dario Faggioli @ 2014-09-18 18:08 UTC (permalink / raw)
  To: George Dunlap
  Cc: ian.campbell, xisisu, stefano.stabellini, lu, ian.jackson,
	xen-devel, ptxlinh, xumengpanda, Meng Xu, JBeulich, chaowang,
	lichong659, dgolomb


[-- Attachment #1.1: Type: text/plain, Size: 15108 bytes --]

On gio, 2014-09-18 at 17:08 +0100, George Dunlap wrote:
> On 09/14/2014 10:37 PM, Meng Xu wrote:

> > +/*
> > + * update deadline and budget when now >= cur_deadline
> > + * it need to be updated to the deadline of the current period
> > + */
> > +static void
> > +rt_update_deadline(s_time_t now, struct rt_vcpu *svc)
> > +{
> > +    ASSERT(now >= svc->cur_deadline);
> > +    ASSERT(svc->period != 0);
> > +
> > +    do
> > +        svc->cur_deadline += svc->period;
> > +    while ( svc->cur_deadline <= now );
> 
> Is there any possibility that his loop could run on for an unusually 
> long time?  Suppose a vcpu ran, then was put to sleep for a few months, 
> then woke up again?  Or suppose that there was some other condition that 
> could be triggered that would make this take a large number of 
> iterations -- it might end up with a DoS at some point down the line.
> 
> Would it make sense to add some kind of a failsafe here to fall back to 
> the slow method (using division and multiplication) if the difference is 
> too large?  We should be able to make the check really simple, like this:
> 
> if ( svc->cur_deadline + (svc->period << UPDATE_LIMIT_SHIFT) > now )
>   [while loop];
> else
>   [multiplication and division];
> 
> Where UPDATE_LIMIT_SHIFT could be something that would limit the loop to 
> a reasonable number of iterations; say, 10 (which would give you 1024 
> iterations).
> 
I guess this makes sense. I'm not sure whether I'd put the failsafe here
or somewhere else (e.g., directly in the vcpu wakeup code), nor am I sure
whether this is a real DoS risk, but I can't exclude it, not off the top
of my head at least, so, I'd say, let's go for it.

> Another option would be to scrap the multiplication and addition 
> altogether, and just set svc->cur_deadline = now + svc->period.  If so 
> much time has passed, it's unlikely that being aligned will be critical.
> 
Indeed.

> Another advantage of this is that we could use this update function both 
> here and below when creating the vcpu (and setting the initial deadline).
> 
I'm fine with the failsafe and, wrt what to do, I'm fine with both the
"reset" (i.e., using now + period) and the occasional division.

> This isn't critical, so if we can't come to a consensus quickly it's 
> probably fine to go in as it is; but if nobody has any objections, I 
> think we should put in some kind of a check like this.
> 
No objections. :-)

> > +/*
> > + * schedule function for rt scheduler.
> > + * The lock is already grabbed in schedule.c, no need to lock here
> > + */
> > +static struct task_slice
> > +rt_schedule(const struct scheduler *ops, s_time_t now, bool_t tasklet_work_scheduled)
> > +{
> > +    const int cpu = smp_processor_id();
> > +    struct rt_private *prv = rt_priv(ops);
> > +    struct rt_vcpu *const scurr = rt_vcpu(current);
> > +    struct rt_vcpu *snext = NULL;
> > +    struct task_slice ret = { .migrated = 0 };
> > +
> > +    /* clear ticked bit now that we've been scheduled */
> > +    cpumask_clear_cpu(cpu, &prv->tickled);
> > +
> > +    /* burn_budget would return for IDLE VCPU */
> > +    burn_budget(ops, scurr, now);
> > +
> > +    __repl_update(ops, now);
> > +
> > +    if ( tasklet_work_scheduled )
> > +    {
> > +        snext = rt_vcpu(idle_vcpu[cpu]);
> > +    }
> > +    else
> > +    {
> > +        cpumask_t cur_cpu;
> > +        cpumask_clear(&cur_cpu);
> > +        cpumask_set_cpu(cpu, &cur_cpu);
> > +        snext = __runq_pick(ops, &cur_cpu);
> > +        if ( snext == NULL )
> > +            snext = rt_vcpu(idle_vcpu[cpu]);
> > +
> > +        /* if scurr has higher priority and budget, still pick scurr */
> > +        if ( !is_idle_vcpu(current) &&
> > +             vcpu_runnable(current) &&
> > +             scurr->cur_budget > 0 &&
> > +             ( is_idle_vcpu(snext->vcpu) ||
> > +               scurr->cur_deadline <= snext->cur_deadline ) )
> > +            snext = scurr;
> > +    }
> > +
> > +    if ( snext != scurr &&
> > +         !is_idle_vcpu(current) &&
> > +         vcpu_runnable(current) )
> > +        set_bit(__RTDS_delayed_runq_add, &scurr->flags);
> > +
> > +    snext->last_start = now;
> > +    if ( !is_idle_vcpu(snext->vcpu) )
> > +    {
> > +        if ( snext != scurr )
> > +        {
> > +            __q_remove(snext);
> > +            set_bit(__RTDS_scheduled, &snext->flags);
> > +        }
> > +        if ( snext->vcpu->processor != cpu )
> > +        {
> > +            snext->vcpu->processor = cpu;
> > +            ret.migrated = 1;
> > +        }
> > +    }
> > +
> > +    ret.time = MILLISECS(1); /* sched quantum */
> > +    ret.task = snext->vcpu;
> 
> Isn't having a fixed 1ms quantum going to break with 
> microsecond-granularity period and budget?  If someone sets their period 
> sub-millisecond, won't it be allowed to run for a full ms (even though 
> its budget is smaller), and then, because the period has passed, be 
> given a new budget again?  This will allow vcpus with sub-millisecond 
> periods to starve out those with more, won't it? I'm not sure how two 
> sub-millisecond-period vms will interact, but I doubt it would be what 
> anybody's expecting.
> 
Meng and I discussed quite a bit about the need for this scheduler to
become a lot more, let's say, time driven, a pretty key feature for a
real-time scheduler! :-P

This is, IMO, an example of that. Another one is the fact that
replenishments and other operations can be done by setting timers,
rather than via "polling", avoiding having to scan the full RunQ in hot
paths like this function, and other places (I said this on xen-devel
too, while reviewing one of the RFCs).

Actually, what you're saying about the budget, points at another lack of
the current implementation, i.e., payback for budget overrun (also
mentioned in the same email, I think). That basically means, in the case
you're describing, recognizing that the vcpu in question overran, by
allowing its budget to go negative, and making it pay a price for the
budget to become positive again, in terms of how far its deadline will
be set (this can be implemented quite easily, by slightly changing the
while loop we were discussing above).

My view is that these things can come as future development, especially
considering we're thinking of checking this in as an experimental
feature. However...

> What about #define MAX_SCHEDULE MILLISECS(1), then set ret.time = 
> MIN(snext->budget, MAX_SCHEDULE)
> 
... yes, something like this should at least help, without requiring too
much code restructuring, so, yes, I'd go for it.
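Something like the following minimal sketch, on top of the posted
rt_schedule() tail, is what I have in mind (MAX_SCHEDULE is just the name
suggested above, and the idle-vcpu special case is only a guess at how it
would be handled):

#define MAX_SCHEDULE MILLISECS(1)
...
    /* run for at most one quantum, but never beyond the remaining budget */
    ret.time = MAX_SCHEDULE;
    if ( !is_idle_vcpu(snext->vcpu) && snext->cur_budget < MAX_SCHEDULE )
        ret.time = snext->cur_budget;
    ret.task = snext->vcpu;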

> It might also make sense to do some kind of experiments to determine a 
> minimum reliable period / budget and not allow people to set values 
> below that.
> 
Yeah, experiments are always a good thing... :-)

> > +/*
> > + * Pick a cpu where to run a vcpu, possibly kicking out the vcpu running there
> > + * Called by wake() and context_saved()
> > + * We have a running candidate here, the kick logic is:
> > + * Among all the cpus that are within the cpu affinity
> > + * 1) if the new->cpu is idle, kick it. This could benefit cache hit
> > + * 2) if there are any idle vcpu, kick it.
> > + * 3) now all pcpus are busy, among all the running vcpus, pick lowest priority one
> > + *    if snext has higher priority, kick it.
> > + *
> > + * TODO:
> > + * 1) what if these two vcpus belongs to the same domain?
> > + *    replace a vcpu belonging to the same domain introduces more overhead
> > + *
> > + * lock is grabbed before calling this function
> > + */
> > +static void
> > +runq_tickle(const struct scheduler *ops, struct rt_vcpu *new)
> > +{
> > +    struct rt_private *prv = rt_priv(ops);
> > +    struct rt_vcpu *latest_deadline_vcpu = NULL;    /* lowest priority scheduled */
> > +    struct rt_vcpu *iter_svc;
> > +    struct vcpu *iter_vc;
> > +    int cpu = 0, cpu_to_tickle = 0;
> > +    cpumask_t not_tickled;
> > +    cpumask_t *online;
> > +
> > +    if ( new == NULL || is_idle_vcpu(new->vcpu) )
> > +        return;
> > +
> > +    online = cpupool_scheduler_cpumask(new->vcpu->domain->cpupool);
> > +    cpumask_and(&not_tickled, online, new->vcpu->cpu_hard_affinity);
> > +    cpumask_andnot(&not_tickled, &not_tickled, &prv->tickled);
> > +
> > +    /* 1) if new's previous cpu is idle, kick it for cache benefit */
> > +    if ( is_idle_vcpu(curr_on_cpu(new->vcpu->processor)) )
> > +    {
> > +        cpu_to_tickle = new->vcpu->processor;
> > +        goto out;
> > +    }
> > +
> > +    /* 2) if there are any idle pcpu, kick it */
> > +    /* The same loop also find the one with lowest priority */
> > +    for_each_cpu(cpu, &not_tickled)
> > +    {
> > +        iter_vc = curr_on_cpu(cpu);
> > +        if ( is_idle_vcpu(iter_vc) )
> > +        {
> > +            cpu_to_tickle = cpu;
> > +            goto out;
> > +        }
> > +        iter_svc = rt_vcpu(iter_vc);
> > +        if ( latest_deadline_vcpu == NULL ||
> > +             iter_svc->cur_deadline > latest_deadline_vcpu->cur_deadline )
> > +            latest_deadline_vcpu = iter_svc;
> > +    }
> > +
> > +    /* 3) candicate has higher priority, kick out the lowest priority vcpu */
> > +    if ( latest_deadline_vcpu != NULL && new->cur_deadline < latest_deadline_vcpu->cur_deadline )
> > +    {
> > +        cpu_to_tickle = latest_deadline_vcpu->vcpu->processor;
> > +        goto out;
> > +    }
> > +
> > +out:
> > +    /* TRACE */
> > +    {
> > +        struct {
> > +            unsigned cpu:8, pad:24;
> > +        } d;
> > +        d.cpu = cpu_to_tickle;
> > +        d.pad = 0;
> > +        trace_var(TRC_RTDS_TICKLE, 0,
> > +                  sizeof(d),
> > +                  (unsigned char *)&d);
> > +    }
> > +
> > +    cpumask_set_cpu(cpu_to_tickle, &prv->tickled);
> > +    cpu_raise_softirq(cpu_to_tickle, SCHEDULE_SOFTIRQ);
> 
> Won't the "fall-through" here mean that if it doesn't find a suitable 
> cpu to place next, that it will *always* pointlessly tickle cpu 0 (the 
> initial value of cpu_to_tickle)?
> 
Yep, I think you're right (and, yes, I missed this in my own
review. :-))

> It seems like you should either put a return before the out: label 
> (possibly with another trace to say the tickle didn't do anything), or 
> set cpu_to_tickle to -1 and then only set the tickled bit and raise the 
> softirq if cpu_to_tickle >= 0.
> 
+1 (both ways look fine).

> > +/*
> > + * Should always wake up runnable vcpu, put it back to RunQ.
> > + * Check priority to raise interrupt
> > + * The lock is already grabbed in schedule.c, no need to lock here
> > + * TODO: what if these two vcpus belongs to the same domain?
> > + */
> > +static void
> > +rt_vcpu_wake(const struct scheduler *ops, struct vcpu *vc)
> > +{
> > +    struct rt_vcpu * const svc = rt_vcpu(vc);
> > +    s_time_t now = NOW();
> > +    struct rt_private *prv = rt_priv(ops);
> > +    struct rt_vcpu *snext = NULL;        /* highest priority on RunQ */
> > +    struct rt_dom *sdom = NULL;
> > +    cpumask_t *online;
> > +
> > +    BUG_ON( is_idle_vcpu(vc) );
> > +
> > +    if ( unlikely(curr_on_cpu(vc->processor) == vc) )
> > +        return;
> > +
> > +    /* on RunQ/DepletedQ, just update info is ok */
> > +    if ( unlikely(__vcpu_on_q(svc)) )
> > +        return;
> > +
> > +    /* If context hasn't been saved for this vcpu yet, we can't put it on
> > +     * the Runqueue/DepletedQ. Instead, we set a flag so that it will be
> > +     * put on the Runqueue/DepletedQ after the context has been saved.
> > +     */
> > +    if ( unlikely(test_bit(__RTDS_scheduled, &svc->flags)) )
> > +    {
> > +        set_bit(__RTDS_delayed_runq_add, &svc->flags);
> > +        return;
> > +    }
> > +
> > +    if ( now >= svc->cur_deadline)
> > +        rt_update_deadline(now, svc);
> > +
> > +    /* insert svc to runq/depletedq because svc is not in queue now */
> > +    if ( svc->cur_budget > 0 )
> > +        __runq_insert(ops, svc);
> > +    else
> > +        list_add(&svc->q_elem, &prv->depletedq);
> 
> This check is already done inside of __runq_insert() -- you don't need 
> to duplicate it here.
> 
> > +
> > +    __repl_update(ops, now);
> > +
> > +    ASSERT(!list_empty(&prv->sdom));
> > +    sdom = list_entry(prv->sdom.next, struct rt_dom, sdom_elem);
> > +    online = cpupool_scheduler_cpumask(sdom->dom->cpupool);
> > +    snext = __runq_pick(ops, online);    /* pick snext from ALL valid cpus */
> > +
> > +    runq_tickle(ops, snext);
> 
> Is there a need to check the status of the "front of the queue" on every 
> wake?  In theory the only vcpu which we want to check is the one we just 
> woke up.  Would it make sense instead to do something like if(snext == 
> svc) runq_tickle()?
> 
Well, since __repl_update() is called, I think there is, as things may
have changed in the RunQ, not only (at least potentially) for the task
that is waking up.

Or perhaps you were suggesting not to call __repl_update()?

What Meng is doing here, I think, is basically to use the wakeup of a
vcpu as a potential "scheduling point", i.e., he's taking the chance
to update the vcpus' parameters, look at the RunQ, and see if anything
changed and a re-schedule is necessary.

This implies some overhead, but it also increases the precision... It's,
IIUIC, another way of introducing a varying scheduling quantum, similar
to the one you proposed above, when commenting to do_schedule().

I really don't like the fact that we scan the RunQ so often. By making
(clever) use of timers and other tricks, I'm sure we can improve, but
probably not while the code is structured this way.

As said before, I'm fine with such improvements to come later, as
further development, after 4.5.

> I was about to say that since our scheduling quantum is so short (1ms), 
> it almost seems like we could get away without tickling entirely; or, 
> that we could just check to see if it's got a high enough priority to 
> run on the cpu it's currently running on, and just run it there, without 
> checking everywhere else.
> 
Maybe we can use something different than tickling (in future!), but I
don't think we can do without something that achieves the same, not if
we want to stick to implementing EDF properly, as I think we should.

Or, if you want, since we're dealing with (potentially) very small
numbers already, a scheduling quantum that would allow us to avoid
actively pushing a vcpu to a specific processor would be too small
to be practical... which, yes, is pretty much your point. :-)

Regards,
Dario

-- 
<<This happens because I choose it to happen!>> (Raistlin Majere)
-----------------------------------------------------------------
Dario Faggioli, Ph.D, http://about.me/dario.faggioli
Senior Software Engineer, Citrix Systems R&D Ltd., Cambridge (UK)


[-- Attachment #1.2: This is a digitally signed message part --]
[-- Type: application/pgp-signature, Size: 181 bytes --]

[-- Attachment #2: Type: text/plain, Size: 126 bytes --]

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [PATCH v3 1/4] xen: add real time scheduler rtds
  2014-09-18 16:08   ` George Dunlap
  2014-09-18 18:08     ` Dario Faggioli
@ 2014-09-19  9:45     ` Dario Faggioli
  2014-09-19 16:44     ` Konrad Rzeszutek Wilk
  2014-09-20 21:10     ` Meng Xu
  3 siblings, 0 replies; 49+ messages in thread
From: Dario Faggioli @ 2014-09-19  9:45 UTC (permalink / raw)
  To: George Dunlap
  Cc: ian.campbell, xisisu, stefano.stabellini, lu, ian.jackson,
	xen-devel, ptxlinh, xumengpanda, Meng Xu, JBeulich, chaowang,
	lichong659, dgolomb


[-- Attachment #1.1: Type: text/plain, Size: 1233 bytes --]

On Thu, 2014-09-18 at 17:08 +0100, George Dunlap wrote:
> On 09/14/2014 10:37 PM, Meng Xu wrote:

> +static void
> +rt_dump_pcpu(const struct scheduler *ops, int cpu)
> +{
> +    struct rt_vcpu *svc = rt_vcpu(curr_on_cpu(cpu));
> +
> +    rt_dump_vcpu(ops, svc);
> +}
> 
> 
> These svc structures are allocated dynamically and may disappear at any 
> time... I think you need the lock to cover this as well.
> 
About locking, I agree with George that we need these functions to be
protected. Right now, this scheduler only uses one lock for everything,
so that would mean grabbing that lock here too.

This is far from optimal, e.g., with scalability in mind (although dumping
is going to be pretty rare an operation), but I think it'd be fine for
now.

In future (after 4.5), we may want to have a dedicated lock for all the
operations that manipulate the list of domains, as pretty much all our
other schedulers are doing.

Regards,
Dario

-- 
<<This happens because I choose it to happen!>> (Raistlin Majere)
-----------------------------------------------------------------
Dario Faggioli, Ph.D, http://about.me/dario.faggioli
Senior Software Engineer, Citrix Systems R&D Ltd., Cambridge (UK)


[-- Attachment #1.2: This is a digitally signed message part --]
[-- Type: application/pgp-signature, Size: 181 bytes --]

[-- Attachment #2: Type: text/plain, Size: 126 bytes --]

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [PATCH v3 1/4] xen: add real time scheduler rtds
  2014-09-18 18:08     ` Dario Faggioli
@ 2014-09-19 13:11       ` Meng Xu
  2014-09-22 17:11         ` Dario Faggioli
  0 siblings, 1 reply; 49+ messages in thread
From: Meng Xu @ 2014-09-19 13:11 UTC (permalink / raw)
  To: Dario Faggioli
  Cc: Ian Campbell, Sisu Xi, Stefano Stabellini, George Dunlap,
	Chenyang Lu, Ian Jackson, xen-devel, Linh Thi Xuan Phan, Meng Xu,
	Jan Beulich, Chao Wang, Chong Li, Dagaen Golomb

2014-09-18 14:08 GMT-04:00 Dario Faggioli <dario.faggioli@citrix.com>:
> On gio, 2014-09-18 at 17:08 +0100, George Dunlap wrote:
>> On 09/14/2014 10:37 PM, Meng Xu wrote:
>
>> > +/*
>> > + * update deadline and budget when now >= cur_deadline
>> > + * it need to be updated to the deadline of the current period
>> > + */
>> > +static void
>> > +rt_update_deadline(s_time_t now, struct rt_vcpu *svc)
>> > +{
>> > +    ASSERT(now >= svc->cur_deadline);
>> > +    ASSERT(svc->period != 0);
>> > +
>> > +    do
>> > +        svc->cur_deadline += svc->period;
>> > +    while ( svc->cur_deadline <= now );
>>
>> Is there any possibility that this loop could run on for an unusually
>> long time?  Suppose a vcpu ran, then was put to sleep for a few months,
>> then woke up again?  Or suppose that there was some other condition that
>> could be triggered that would make this take a large number of
>> iterations -- it might end up with a DoS at some point down the line.
>>
>> Would it make sense to add some kind of a failsafe here to fall back to
>> the slow method (using division and multiplication) if the difference is
>> too large?  We should be able to make the check really simple, like this:
>>
>> if ( svc->cur_deadline + (svc->period << UPDATE_LIMIT_SHIFT) > now )
>>   [while loop];
>> else
>>   [multiplication and division];
>>
>> Where UPDATE_LIMIT_SHIFT could be something that would limit the loop to
>> a reasonable number of iterations; say, 10 (which would give you 1024
>> iterations).
>>
> I guess this makes sense. I'm not sure whether I'd put the failsafe here
> or somewhere else (e.g., directly in the vcpu wakeup code), nor I'm sure
> whether this is a real DoS risk, but I can't exclude that, not out of
> the top of my head, at least, so, I'd say, let's go for it.

I like this and will change it as above. Of course, I will also
change the code where we initially set the parameters of a vcpu :-)

>
>> Another option would be to scrap the multiplication and addition
>> altogether, and just set svc->cur_deadline = now + svc->period.  If so
>> much time has passed, it's unlikely that being aligned will be critical.
>>
> Indeed.

Right. I choose to go with the UPDATE_LIMIT_SHIFT as I said above.
Well, either is fine with me. :-)

>
>> Another advantage of this is that we could use this update function both
>> here and below when creating the vcpu (and setting the initial deadline).
>>
> I'm fine with the failsafe and, wrt what to do, I'm fine with both the
> "reset (i.e., using now+deadline) and the occasional division.
>
>> This isn't critical, so if we can't come to a consensus quickly it's
>> probably fine to go in as it is; but if nobody has any objections, I
>> think we should put in some kind of a check like this.
>>
> No objections. :-)

No objections.

>> > +/*
>> > + * schedule function for rt scheduler.
>> > + * The lock is already grabbed in schedule.c, no need to lock here
>> > + */
>> > +static struct task_slice
>> > +rt_schedule(const struct scheduler *ops, s_time_t now, bool_t tasklet_work_scheduled)
>> > +{
>> > +    const int cpu = smp_processor_id();
>> > +    struct rt_private *prv = rt_priv(ops);
>> > +    struct rt_vcpu *const scurr = rt_vcpu(current);
>> > +    struct rt_vcpu *snext = NULL;
>> > +    struct task_slice ret = { .migrated = 0 };
>> > +
>> > +    /* clear ticked bit now that we've been scheduled */
>> > +    cpumask_clear_cpu(cpu, &prv->tickled);
>> > +
>> > +    /* burn_budget would return for IDLE VCPU */
>> > +    burn_budget(ops, scurr, now);
>> > +
>> > +    __repl_update(ops, now);
>> > +
>> > +    if ( tasklet_work_scheduled )
>> > +    {
>> > +        snext = rt_vcpu(idle_vcpu[cpu]);
>> > +    }
>> > +    else
>> > +    {
>> > +        cpumask_t cur_cpu;
>> > +        cpumask_clear(&cur_cpu);
>> > +        cpumask_set_cpu(cpu, &cur_cpu);
>> > +        snext = __runq_pick(ops, &cur_cpu);
>> > +        if ( snext == NULL )
>> > +            snext = rt_vcpu(idle_vcpu[cpu]);
>> > +
>> > +        /* if scurr has higher priority and budget, still pick scurr */
>> > +        if ( !is_idle_vcpu(current) &&
>> > +             vcpu_runnable(current) &&
>> > +             scurr->cur_budget > 0 &&
>> > +             ( is_idle_vcpu(snext->vcpu) ||
>> > +               scurr->cur_deadline <= snext->cur_deadline ) )
>> > +            snext = scurr;
>> > +    }
>> > +
>> > +    if ( snext != scurr &&
>> > +         !is_idle_vcpu(current) &&
>> > +         vcpu_runnable(current) )
>> > +        set_bit(__RTDS_delayed_runq_add, &scurr->flags);
>> > +
>> > +    snext->last_start = now;
>> > +    if ( !is_idle_vcpu(snext->vcpu) )
>> > +    {
>> > +        if ( snext != scurr )
>> > +        {
>> > +            __q_remove(snext);
>> > +            set_bit(__RTDS_scheduled, &snext->flags);
>> > +        }
>> > +        if ( snext->vcpu->processor != cpu )
>> > +        {
>> > +            snext->vcpu->processor = cpu;
>> > +            ret.migrated = 1;
>> > +        }
>> > +    }
>> > +
>> > +    ret.time = MILLISECS(1); /* sched quantum */
>> > +    ret.task = snext->vcpu;
>>
>> Isn't having a fixed 1ms quantum going to break with
>> microsecond-granularity period and budget?  If someone sets their period
>> sub-millisecond, won't it be allowed to run for a full ms (even though
>> its budget is smaller), and then, because the period has passed, be
>> given a new budget again?  This will allow vcpus with sub-millisecond
>> periods to starve out those with more, won't it? I'm not sure how two
>> sub-millisecond-period vms will interact, but I doubt it would be what
>> anybody's expecting.
>>
> Meng and I discussed quite a bit about the need for this scheduler to
> become a lot more, let's say, time driven, a pretty key feature for a
> real-time scheduler! :-P

Yes! Dario and I discussed this back and forth before. This can be
solved by adding timers to trigger each vcpu's budget replenishment and
deadline update; when we return the time, we can then just return the
remaining budget of the vcpu. (We cannot simply return the remaining
budget of a vcpu without adding timers to update each vcpu's deadline;
that would prevent a vcpu from starting its new period on time.)
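Just to make the idea concrete, a very rough sketch of that direction
(the repl_timer field and the handler below are hypothetical, nothing
like them exists in this patch series, and locking is omitted):

/* hypothetical per-vcpu replenishment timer handler */
static void repl_timer_fn(void *data)
{
    struct rt_vcpu *svc = data;

    /* full budget again, deadline moved to the end of the new period */
    rt_update_deadline(NOW(), svc);

    /* re-arm for the next period and let the scheduler re-evaluate */
    set_timer(&svc->repl_timer, svc->cur_deadline);
    cpu_raise_softirq(svc->vcpu->processor, SCHEDULE_SOFTIRQ);
}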

>
> This is, IMO, an example of that. Another one is the fact that
> replenishments and other operations can be done by setting timers,
> rather than via "polling", avoiding having to scan the full RunQ in hot
> paths like this function, and other places (I said this on xen-devel
> too, while reviewing one of the RFCs).
>
> Actually, what you're saying about the budget, points at another lack of
> the current implementation, i.e., payback for budget overrun (also
> mentioned in the same email, I think). That basically means, in the case
> you're describing, recognizing that the vcpu in question overran, by
> allowing its budget to go negative, and making it pay a price for the
> budget to become positive again, in terms of how far its deadline will
> be set (this can be implemented quite easily, by slightly changing the
> while loop we were discussing above).

Hmm, I think we have other ways to pay the price for overrun budget.
For example, if a vcpu overruns in the current period, we can deduct
the overrun from its budget in the next period. Once we add timers and
avoid the MILLISECS(1) scheduling quantum, the overrun budget should
not be too large, IMO. But anyway, this is something to solve in the
future, because it needs to incorporate the extra timers we will add.

>
> My view is that these things can come as future development, especially
> considering we're thinking of checking this in as an experimental
> feature. However...
>
>> What about #define MAX_SCHEDULE MILLISECS(1), then set ret.time =
>> MIN(snext->budget, MAX_SCHEDULE)
>>
> ... yes, something like this should at least help, without requiring too
> much code restructuring, so, yes, I'd go for it.

Roger.

>
>> It might also make sense to do some kind of experiments to determine a
>> minimum reliable period / budget and not allow people to set values
>> below that.
>>
> Yeah, experiments are always a good thing... :-)
>
>> > +/*
>> > + * Pick a cpu where to run a vcpu, possibly kicking out the vcpu running there
>> > + * Called by wake() and context_saved()
>> > + * We have a running candidate here, the kick logic is:
>> > + * Among all the cpus that are within the cpu affinity
>> > + * 1) if the new->cpu is idle, kick it. This could benefit cache hit
>> > + * 2) if there are any idle vcpu, kick it.
>> > + * 3) now all pcpus are busy, among all the running vcpus, pick lowest priority one
>> > + *    if snext has higher priority, kick it.
>> > + *
>> > + * TODO:
>> > + * 1) what if these two vcpus belongs to the same domain?
>> > + *    replace a vcpu belonging to the same domain introduces more overhead
>> > + *
>> > + * lock is grabbed before calling this function
>> > + */
>> > +static void
>> > +runq_tickle(const struct scheduler *ops, struct rt_vcpu *new)
>> > +{
>> > +    struct rt_private *prv = rt_priv(ops);
>> > +    struct rt_vcpu *latest_deadline_vcpu = NULL;    /* lowest priority scheduled */
>> > +    struct rt_vcpu *iter_svc;
>> > +    struct vcpu *iter_vc;
>> > +    int cpu = 0, cpu_to_tickle = 0;
>> > +    cpumask_t not_tickled;
>> > +    cpumask_t *online;
>> > +
>> > +    if ( new == NULL || is_idle_vcpu(new->vcpu) )
>> > +        return;
>> > +
>> > +    online = cpupool_scheduler_cpumask(new->vcpu->domain->cpupool);
>> > +    cpumask_and(&not_tickled, online, new->vcpu->cpu_hard_affinity);
>> > +    cpumask_andnot(&not_tickled, &not_tickled, &prv->tickled);
>> > +
>> > +    /* 1) if new's previous cpu is idle, kick it for cache benefit */
>> > +    if ( is_idle_vcpu(curr_on_cpu(new->vcpu->processor)) )
>> > +    {
>> > +        cpu_to_tickle = new->vcpu->processor;
>> > +        goto out;
>> > +    }
>> > +
>> > +    /* 2) if there are any idle pcpu, kick it */
>> > +    /* The same loop also find the one with lowest priority */
>> > +    for_each_cpu(cpu, &not_tickled)
>> > +    {
>> > +        iter_vc = curr_on_cpu(cpu);
>> > +        if ( is_idle_vcpu(iter_vc) )
>> > +        {
>> > +            cpu_to_tickle = cpu;
>> > +            goto out;
>> > +        }
>> > +        iter_svc = rt_vcpu(iter_vc);
>> > +        if ( latest_deadline_vcpu == NULL ||
>> > +             iter_svc->cur_deadline > latest_deadline_vcpu->cur_deadline )
>> > +            latest_deadline_vcpu = iter_svc;
>> > +    }
>> > +
>> > +    /* 3) candicate has higher priority, kick out the lowest priority vcpu */
>> > +    if ( latest_deadline_vcpu != NULL && new->cur_deadline < latest_deadline_vcpu->cur_deadline )
>> > +    {
>> > +        cpu_to_tickle = latest_deadline_vcpu->vcpu->processor;
>> > +        goto out;
>> > +    }
>> > +
>> > +out:
>> > +    /* TRACE */
>> > +    {
>> > +        struct {
>> > +            unsigned cpu:8, pad:24;
>> > +        } d;
>> > +        d.cpu = cpu_to_tickle;
>> > +        d.pad = 0;
>> > +        trace_var(TRC_RTDS_TICKLE, 0,
>> > +                  sizeof(d),
>> > +                  (unsigned char *)&d);
>> > +    }
>> > +
>> > +    cpumask_set_cpu(cpu_to_tickle, &prv->tickled);
>> > +    cpu_raise_softirq(cpu_to_tickle, SCHEDULE_SOFTIRQ);
>>
>> Won't the "fall-through" here mean that if it doesn't find a suitable
>> cpu to place next, that it will *always* pointlessly tickle cpu 0 (the
>> initial value of cpu_to_tickle)?
>>
> Yep, I think you're right (and, yes, I missed this in my own
> review. :-))
>
>> It seems like you should either put a return before the out: label
>> (possibly with another trace to say the tickle didn't do anything), or
>> set cpu_to_tickle to -1 and then only set the tickled bit and raise the
>> softirq if cpu_to_tickle >= 0.
>>
> +1 (both ways look fine).

I will go with the first option: putting a return before the out label.
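Roughly like this (just a sketch of that option; whether the "no suitable
cpu" case deserves its own trace point is left open):

    /* didn't find any cpu worth tickling: nothing to do */
    return;

 out:
    /* TRACE */
    {
        struct {
            unsigned cpu:8, pad:24;
        } d;
        d.cpu = cpu_to_tickle;
        d.pad = 0;
        trace_var(TRC_RTDS_TICKLE, 0, sizeof(d), (unsigned char *)&d);
    }

    cpumask_set_cpu(cpu_to_tickle, &prv->tickled);
    cpu_raise_softirq(cpu_to_tickle, SCHEDULE_SOFTIRQ);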

>
>> > +/*
>> > + * Should always wake up runnable vcpu, put it back to RunQ.
>> > + * Check priority to raise interrupt
>> > + * The lock is already grabbed in schedule.c, no need to lock here
>> > + * TODO: what if these two vcpus belongs to the same domain?
>> > + */
>> > +static void
>> > +rt_vcpu_wake(const struct scheduler *ops, struct vcpu *vc)
>> > +{
>> > +    struct rt_vcpu * const svc = rt_vcpu(vc);
>> > +    s_time_t now = NOW();
>> > +    struct rt_private *prv = rt_priv(ops);
>> > +    struct rt_vcpu *snext = NULL;        /* highest priority on RunQ */
>> > +    struct rt_dom *sdom = NULL;
>> > +    cpumask_t *online;
>> > +
>> > +    BUG_ON( is_idle_vcpu(vc) );
>> > +
>> > +    if ( unlikely(curr_on_cpu(vc->processor) == vc) )
>> > +        return;
>> > +
>> > +    /* on RunQ/DepletedQ, just update info is ok */
>> > +    if ( unlikely(__vcpu_on_q(svc)) )
>> > +        return;
>> > +
>> > +    /* If context hasn't been saved for this vcpu yet, we can't put it on
>> > +     * the Runqueue/DepletedQ. Instead, we set a flag so that it will be
>> > +     * put on the Runqueue/DepletedQ after the context has been saved.
>> > +     */
>> > +    if ( unlikely(test_bit(__RTDS_scheduled, &svc->flags)) )
>> > +    {
>> > +        set_bit(__RTDS_delayed_runq_add, &svc->flags);
>> > +        return;
>> > +    }
>> > +
>> > +    if ( now >= svc->cur_deadline)
>> > +        rt_update_deadline(now, svc);
>> > +
>> > +    /* insert svc to runq/depletedq because svc is not in queue now */
>> > +    if ( svc->cur_budget > 0 )
>> > +        __runq_insert(ops, svc);
>> > +    else
>> > +        list_add(&svc->q_elem, &prv->depletedq);
>>
>> This check is already done inside of __runq_insert() -- you don't need
>> to duplicate it here.
>>
>> > +
>> > +    __repl_update(ops, now);
>> > +
>> > +    ASSERT(!list_empty(&prv->sdom));
>> > +    sdom = list_entry(prv->sdom.next, struct rt_dom, sdom_elem);
>> > +    online = cpupool_scheduler_cpumask(sdom->dom->cpupool);
>> > +    snext = __runq_pick(ops, online);    /* pick snext from ALL valid cpus */
>> > +
>> > +    runq_tickle(ops, snext);
>>
>> Is there a need to check the status of the "front of the queue" on every
>> wake?  In theory the only vcpu which we want to check is the one we just
>> woke up.  Would it make sense instead to do something like if(snext ==
>> svc) runq_tickle()?
>>
> Well, since __repl_update() is called, I think there is, as things may
> have changed in the RunQ, not only (at least potentially) for the task
> that is waking up.
>
> Or perhaps you were suggesting not to call __repl_update()?
>
> What Meng is doing here, I think, is basically to use the wakeup of a
> vcpu as a potential "scheduling point", i.e., he's taking the chance
> to update the vcpus' parameters, look at the RunQ, and see if anything
> changed and a re-schedule is necessary.
>
> This implies some overhead, but it also increases the precision... It's,
> IIUIC, another way of introducing a varying scheduling quantum, similar
> to the one you proposed above, when commenting to do_schedule().
>
> I really don't like the fact that we scan the RunQ so often. By making
> (clever) use of timers and other tricks, I'm sure we can improve, but
> probably not while the code is structured this way.
>
> As said before, I'm fine with such improvements to come later, as
> further development, after 4.5.
>
>> I was about to say that since our scheduling quantum is so short (1ms),
>> it almost seems like we could get away without tickling entirely; or,
>> that we could just check to see if it's got a high enough priority to
>> run on the cpu it's currently running on, and just run it there, without
>> checking everywhere else.
>>
> Maybe we can use something different than tickling (in future!), but I
> don't think we can do without something that achieves the same, not if
> we want to stick to implement EDF properly, as I think we should.
>
> Or, if you want, since we're dealing with (potentially) very small
> numbers already, a scheduling quantum that would allow us to avoid
> actively pushing a vcpu to a specific processor would be too small
> to be practical... which, yes, is pretty much your point. :-)
>

I totally agree with Dario's explanation above about the
runq_tickle(). Thank you very much Dario! :-P
A vcpu wakeup is an event that triggers the __repl_update() function,
because we have one more vcpu in the scheduler, and this newly added
vcpu could potentially affect the currently scheduled vcpus. As time
also elapses, the vcpus in the RunQ/DepletedQ may also be due to have
their deadlines and budgets updated (if now > their deadline); so we need
to call __repl_update() to update all vcpus' information and pick the
current highest priority one.

If we remove runq_tickle(), we may deviate from EDF for up to 1ms, since
1ms is the scheduling quantum. At the next scheduling point, the same
logic will be invoked in the schedule() function.

Thank you very much!

Meng


-----------
Meng Xu
PhD Student in Computer and Information Science
University of Pennsylvania

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [PATCH v3 1/4] xen: add real time scheduler rtds
  2014-09-18 16:08   ` George Dunlap
  2014-09-18 18:08     ` Dario Faggioli
  2014-09-19  9:45     ` Dario Faggioli
@ 2014-09-19 16:44     ` Konrad Rzeszutek Wilk
  2014-09-22 17:26       ` Dario Faggioli
  2014-09-20 21:10     ` Meng Xu
  3 siblings, 1 reply; 49+ messages in thread
From: Konrad Rzeszutek Wilk @ 2014-09-19 16:44 UTC (permalink / raw)
  To: George Dunlap
  Cc: ian.campbell, xisisu, stefano.stabellini, lu, dario.faggioli,
	ian.jackson, xen-devel, ptxlinh, xumengpanda, Meng Xu, JBeulich,
	chaowang, lichong659, dgolomb

On Thu, Sep 18, 2014 at 05:08:45PM +0100, George Dunlap wrote:
> On 09/14/2014 10:37 PM, Meng Xu wrote:
> >This scheduler follows the Preemptive Global Earliest Deadline First
> >(EDF) theory in real-time field.
> >At any scheduling point, the VCPU with earlier deadline has higher
> >priority. The scheduler always picks the highest priority VCPU to run on a
> >feasible PCPU.
> >A PCPU is feasible if the VCPU can run on this PCPU and (the PCPU is
> >idle or has a lower-priority VCPU running on it.)
> >
> >Each VCPU has a dedicated period and budget.
> >The deadline of a VCPU is at the end of each period;
> >A VCPU has its budget replenished at the beginning of each period;
> >While scheduled, a VCPU burns its budget.
> >The VCPU needs to finish its budget before its deadline in each period;
> >The VCPU discards its unused budget at the end of each period.
> >If a VCPU runs out of budget in a period, it has to wait until next period.
> >
> >Each VCPU is implemented as a deferable server.
> >When a VCPU has a task running on it, its budget is continuously burned;
> >When a VCPU has no task but with budget left, its budget is preserved.
> >
> >Queue scheme:
> >A global runqueue and a global depletedq for each CPU pool.
> >The runqueue holds all runnable VCPUs with budget and sorted by deadline;
> >The depletedq holds all VCPUs without budget and unsorted.
> >
> >Note: cpumask and cpupool is supported.
> >
> >This is an experimental scheduler.
> >
> >Signed-off-by: Meng Xu <mengxu@cis.upenn.edu>
> >Signed-off-by: Sisu Xi <xisisu@gmail.com>
> 
> Getting there, but unfortunately I've got a number more comments.
> 
> Konrad, I think this is very close to being ready -- when is the deadline
> again, and how hard is it?  Would it be better to check it in before the

Sep 23.
> deadline, and then address the things I'm bringing up here?  Or would it be
> better to wait until all the issues are sorted and then check it in (even if
> it's after the deadline)?

We can check it in after the deadline - and have those issues resolved.

I am basing this on the assumption that:
 - The risks of regressions to the rest of the schedulers is nil (as this is
   all new codepaths).
 - The risks of regressions to the rest of the code-base is nil (as this is
   all new).
 - The resolution of the 'couple of things' are not going to lead to more
   'couple of things' and lead to re-design.
 - I am OK with this having bugs (ones that haven't been discovered, or just
   now discovered) - as long as we aim to have them fixed.

The common code that is touched does not look scary to me. And both of the
scheduler maintainers - you and Dariof - are OK with the design and the patchset
(minus the 'couple of things').

Are we aiming to have this be experimental for Xen 4.5, or do we want it
to be 'stable'?

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [PATCH v3 1/4] xen: add real time scheduler rtds
  2014-09-18 16:08   ` George Dunlap
                       ` (2 preceding siblings ...)
  2014-09-19 16:44     ` Konrad Rzeszutek Wilk
@ 2014-09-20 21:10     ` Meng Xu
  2014-09-23 10:47       ` George Dunlap
  3 siblings, 1 reply; 49+ messages in thread
From: Meng Xu @ 2014-09-20 21:10 UTC (permalink / raw)
  To: George Dunlap
  Cc: Ian Campbell, Sisu Xi, Stefano Stabellini, Chenyang Lu,
	Dario Faggioli, Ian Jackson, xen-devel, Linh Thi Xuan Phan,
	Meng Xu, Jan Beulich, Chao Wang, Chong Li, Dagaen Golomb

Hi George,

2014-09-18 12:08 GMT-04:00 George Dunlap <george.dunlap@eu.citrix.com>:
> On 09/14/2014 10:37 PM, Meng Xu wrote:
>>
>> This scheduler follows the Preemptive Global Earliest Deadline First
>> (EDF) theory in real-time field.
>> At any scheduling point, the VCPU with earlier deadline has higher
>> priority. The scheduler always picks the highest priority VCPU to run on a
>> feasible PCPU.
>> A PCPU is feasible if the VCPU can run on this PCPU and (the PCPU is
>> idle or has a lower-priority VCPU running on it.)
>>
>> Each VCPU has a dedicated period and budget.
>> The deadline of a VCPU is at the end of each period;
>> A VCPU has its budget replenished at the beginning of each period;
>> While scheduled, a VCPU burns its budget.
>> The VCPU needs to finish its budget before its deadline in each period;
>> The VCPU discards its unused budget at the end of each period.
>> If a VCPU runs out of budget in a period, it has to wait until next
>> period.
>>
>> Each VCPU is implemented as a deferable server.
>> When a VCPU has a task running on it, its budget is continuously burned;
>> When a VCPU has no task but with budget left, its budget is preserved.
>>
>> Queue scheme:
>> A global runqueue and a global depletedq for each CPU pool.
>> The runqueue holds all runnable VCPUs with budget and sorted by deadline;
>> The depletedq holds all VCPUs without budget and unsorted.
>>
>> Note: cpumask and cpupool is supported.
>>
>> This is an experimental scheduler.
>>
>> Signed-off-by: Meng Xu <mengxu@cis.upenn.edu>
>> Signed-off-by: Sisu Xi <xisisu@gmail.com>
>
>
> Getting there, but unfortunately I've got a number more comments.
>
> Konrad, I think this is very close to being ready -- when is the deadline
> again, and how hard is it?  Would it be better to check it in before the
> deadline, and then address the things I'm bringing up here?  Or would it be
> better to wait until all the issues are sorted and then check it in (even if
> it's after the deadline)?
>
>
> +/*
> + * Debug related code, dump vcpu/cpu information
> + */
> +static void
> +rt_dump_vcpu(const struct scheduler *ops, const struct rt_vcpu *svc)
> +{
> +    char cpustr[1024];
> +    cpumask_t *cpupool_mask;
> +
> +    ASSERT(svc != NULL);
> +    /* flag vcpu */
> +    if( svc->sdom == NULL )
> +        return;
> +
> +    cpumask_scnprintf(cpustr, sizeof(cpustr), svc->vcpu->cpu_hard_affinity);
> +    printk("[%5d.%-2u] cpu %u, (%"PRI_stime", %"PRI_stime"),"
> +           " cur_b=%"PRI_stime" cur_d=%"PRI_stime"
> last_start=%"PRI_stime"\n"
> +           " \t\t onQ=%d runnable=%d cpu_hard_affinity=%s ",
> +            svc->vcpu->domain->domain_id,
> +            svc->vcpu->vcpu_id,
> +            svc->vcpu->processor,
> +            svc->period,
> +            svc->budget,
> +            svc->cur_budget,
> +            svc->cur_deadline,
> +            svc->last_start,
> +            __vcpu_on_q(svc),
> +            vcpu_runnable(svc->vcpu),
> +            cpustr);
> +    memset(cpustr, 0, sizeof(cpustr));
> +    cpupool_mask = cpupool_scheduler_cpumask(svc->vcpu->domain->cpupool);
> +    cpumask_scnprintf(cpustr, sizeof(cpustr), cpupool_mask);
> +    printk("cpupool=%s\n", cpustr);
> +}
> +
> +static void
> +rt_dump_pcpu(const struct scheduler *ops, int cpu)
> +{
> +    struct rt_vcpu *svc = rt_vcpu(curr_on_cpu(cpu));
> +
> +    rt_dump_vcpu(ops, svc);
> +}
>
>
> These svc structures are allocated dynamically and may disappear at any
> time... I think you need the lock to cover this as well.
>
> And since this is called both externally (via ops->dump_cpu_state) and
> internally (below), you probably need a locking version and a non-locking
> version (normally you'd make rt_dump_pcpu() a locking "wrapper" around
> __rt_dump_pcpu()).


I tried to add the schedule lock (prv->lock) here, which caused a system
freeze. The reason is that, when this function is called by
schedule_dump() in schedule.c, the lock has already been grabbed. So I
don't need the lock here, IMO. :-)

void schedule_dump(struct cpupool *c)
{
    int               i;
    struct scheduler *sched;
    cpumask_t        *cpus;

    sched = (c == NULL) ? &ops : c->sched;
    cpus = cpupool_scheduler_cpumask(c);
    printk("Scheduler: %s (%s)\n", sched->name, sched->opt_name);
    SCHED_OP(sched, dump_settings);

    for_each_cpu (i, cpus)
    {
        spinlock_t *lock = pcpu_schedule_lock(i);

        printk("CPU[%02d] ", i);
        SCHED_OP(sched, dump_cpu_state, i);
        pcpu_schedule_unlock(lock, i);
    }
}


>
>> +
>> +static void
>> +rt_dump(const struct scheduler *ops)
>> +{
>> +    struct list_head *iter_sdom, *iter_svc, *runq, *depletedq, *iter;
>> +    struct rt_private *prv = rt_priv(ops);
>> +    struct rt_vcpu *svc;
>> +    unsigned int cpu = 0;
>> +    cpumask_t *online;
>> +    struct rt_dom *sdom;
>> +    unsigned long flags;
>> +
>> +    ASSERT(!list_empty(&prv->sdom));
>> +    sdom = list_entry(prv->sdom.next, struct rt_dom, sdom_elem);
>> +    online = cpupool_scheduler_cpumask(sdom->dom->cpupool);
>> +    runq = rt_runq(ops);
>> +    depletedq = rt_depletedq(ops);
>
>
> Same thing with all these -- other CPUs may be modifying prv->sdom.
>

I do need the lock here, because when this function is called there is
no lock protection; as you saw above, it is reached via:
    SCHED_OP(sched, dump_settings);

So I will add a lock for rt_dump() but not for rt_dump_pcpu().
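i.e., something along these lines (only the locking is sketched here; the
body stays as in the patch):

static void
rt_dump(const struct scheduler *ops)
{
    struct rt_private *prv = rt_priv(ops);
    unsigned long flags;

    spin_lock_irqsave(&prv->lock, flags);
    /* ... walk prv->sdom, the runq and the depletedq as before ... */
    spin_unlock_irqrestore(&prv->lock, flags);
}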

Thanks,

Meng

-----------
Meng Xu
PhD Student in Computer and Information Science
University of Pennsylvania

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [PATCH v3 1/4] xen: add real time scheduler rtds
  2014-09-19 13:11       ` Meng Xu
@ 2014-09-22 17:11         ` Dario Faggioli
  2014-09-22 20:40           ` Meng Xu
  0 siblings, 1 reply; 49+ messages in thread
From: Dario Faggioli @ 2014-09-22 17:11 UTC (permalink / raw)
  To: Meng Xu
  Cc: Ian Campbell, Sisu Xi, Stefano Stabellini, George Dunlap,
	Chenyang Lu, Ian Jackson, xen-devel, Linh Thi Xuan Phan, Meng Xu,
	Jan Beulich, Chao Wang, Chong Li, Dagaen Golomb


[-- Attachment #1.1: Type: text/plain, Size: 2158 bytes --]

On ven, 2014-09-19 at 09:11 -0400, Meng Xu wrote:
> 2014-09-18 14:08 GMT-04:00 Dario Faggioli <dario.faggioli@citrix.com>:

> > Actually, what you're saying about the budget, points at another lack of
> > the current implementation, i.e., payback for budget overrun (also
> > mentioned in the same email, I think). That basically means, in the case
> > you're describing, recognizing that the vcpu in question overran, by
> > allowing its budget to go negative, and making it pay a price for the
> > budget to become positive again, in terms on how far its deadline will
> > be set (this can be implemented quite easily, by slightly changing the
> > while loop we were discussing above).
> 
> Hmm, I think we have other ways to pay a price for the overrun budget.
> For example, we can reduce the budget from its next period to pay for
> the overrun budget in the current period if it overruns. When we add
> timers and avoid such MILLISECS(1) scheduling quantum, the overrun
> budget should not be too much, IMO. But anyway, this is something in
> the future to solve because it needs to incorporate the extra timers
> we will add in the future.
> 
Just FTR, we're talking about the same thing. In fact, I of course meant
that overrun must be paid for in terms of less than full budget in the
following period. The point is, what if the overrun exceeds the budget?
That's when the "pushing the deadline ahead" comes into play.

Say we have a 40ms budget vcpu that (somehow) manages to run for
90ms, going down to -50ms. If you push the deadline one period ahead,
and give it a recharge of 40ms (i.e., its full budget), that will
leave it with -10ms. What I was saying was, in this case, we give it
another 40ms recharge, but at the price of having its deadline one more
period ahead.
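A sketch of how the deadline/budget update loop could be adapted for that
(purely illustrative; the posted patch never lets the budget go negative):

    /*
     * Each missed period grants a replenishment but also pushes the
     * deadline one more period ahead, so a vcpu that overran pays the
     * overrun back before it can run at full budget again.
     */
    while ( svc->cur_budget <= 0 || svc->cur_deadline <= now )
    {
        svc->cur_deadline += svc->period;
        svc->cur_budget += svc->budget;
    }
    /* unused budget is never carried over */
    if ( svc->cur_budget > svc->budget )
        svc->cur_budget = svc->budget;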

Anyway, that's for future development, as we agreed.

Regards,
Dario

-- 
<<This happens because I choose it to happen!>> (Raistlin Majere)
-----------------------------------------------------------------
Dario Faggioli, Ph.D, http://about.me/dario.faggioli
Senior Software Engineer, Citrix Systems R&D Ltd., Cambridge (UK)


[-- Attachment #1.2: This is a digitally signed message part --]
[-- Type: application/pgp-signature, Size: 181 bytes --]

[-- Attachment #2: Type: text/plain, Size: 126 bytes --]

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [PATCH v3 1/4] xen: add real time scheduler rtds
  2014-09-19 16:44     ` Konrad Rzeszutek Wilk
@ 2014-09-22 17:26       ` Dario Faggioli
  2014-09-22 20:45         ` Meng Xu
  2014-09-22 22:43         ` Konrad Rzeszutek Wilk
  0 siblings, 2 replies; 49+ messages in thread
From: Dario Faggioli @ 2014-09-22 17:26 UTC (permalink / raw)
  To: Konrad Rzeszutek Wilk
  Cc: ian.campbell, xisisu, stefano.stabellini, George Dunlap, lu,
	ian.jackson, xen-devel, ptxlinh, xumengpanda, Meng Xu, JBeulich,
	chaowang, lichong659, dgolomb


[-- Attachment #1.1: Type: text/plain, Size: 2555 bytes --]

On ven, 2014-09-19 at 12:44 -0400, Konrad Rzeszutek Wilk wrote:
> On Thu, Sep 18, 2014 at 05:08:45PM +0100, George Dunlap wrote:

> > deadline, and then address the things I'm bringing up here?  Or would it be
> > better to wait until all the issues are sorted and then check it in (even if
> > it's after the deadline)?
> 
> We can check it in after the deadline - and have those issues resolved.
> 
FWIW, I think the series looks good now, and in fact I sent in my
Reviewed-by for all of it.

> I am basing this on the assumption that:
>  - The risks of regressions to the rest of schedulers is nill (as this is all
>    new codepaths (as this is all
>  - The risks of regressions to the rest of the code-base is nill (as this is all
>    new).
>  - The resolution of the 'couple of things' are not going to lead to more
>    'couple of things' and lead to re-design.
>
This is all true, IMO.

> The common code that is touched does not look scary to me. And both of the
> scheduler maintainers -  you and Dariof are OK with the design and the patchset
> (minus the 'couple of things').
> 
Exactly.

> Are we aim to have this be experimental for Xen 4.5 or do we want this
> to be on the 'stable' ?
>
Not sure. What I'm sure about is that
1) the interface needs to change a bit, to include support for the
   per-vcpu parameters setting (although, that can happen in a
   backward compatible way, i.e., without touching or altering either the
   look or the semantics of the interface we'll be checking in if we
   take v4)
2) there is _a_lot_ to gain, from a performance point of view, and Meng
   already agreed on continuing working toward that, after 4.5

Having it in is, IMO, important, especially for the new
embedded/mobile/automotive uses of Xen we're seeing in these days (in
fact, I think GlobalLogic is using RT-Xen already, so the upstreaming of
this scheduler would be quite useful at least to them [correct me if I'm
wrong]).

However, given 2 above, if we mark it as stable, we risk that people
(mostly people not yet involved in Xen development and not on this
mailing list) would try it, run into non-optimal performance, and get
upset/angry. For that reason, I think I'd go for 'experimental for 4.5'.

Regards,
Dario

-- 
<<This happens because I choose it to happen!>> (Raistlin Majere)
-----------------------------------------------------------------
Dario Faggioli, Ph.D, http://about.me/dario.faggioli
Senior Software Engineer, Citrix Systems R&D Ltd., Cambridge (UK)


[-- Attachment #1.2: This is a digitally signed message part --]
[-- Type: application/pgp-signature, Size: 181 bytes --]

[-- Attachment #2: Type: text/plain, Size: 126 bytes --]

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [PATCH v3 1/4] xen: add real time scheduler rtds
  2014-09-22 17:11         ` Dario Faggioli
@ 2014-09-22 20:40           ` Meng Xu
  0 siblings, 0 replies; 49+ messages in thread
From: Meng Xu @ 2014-09-22 20:40 UTC (permalink / raw)
  To: Dario Faggioli
  Cc: Ian Campbell, Sisu Xi, Stefano Stabellini, George Dunlap,
	Chenyang Lu, Ian Jackson, xen-devel, Linh Thi Xuan Phan, Meng Xu,
	Jan Beulich, Chao Wang, Chong Li, Dagaen Golomb

2014-09-22 13:11 GMT-04:00 Dario Faggioli <dario.faggioli@citrix.com>:
> On ven, 2014-09-19 at 09:11 -0400, Meng Xu wrote:
>> 2014-09-18 14:08 GMT-04:00 Dario Faggioli <dario.faggioli@citrix.com>:
>
>> > Actually, what you're saying about the budget, points at another lack of
>> > the current implementation, i.e., payback for budget overrun (also
>> > mentioned in the same email, I think). That basically means, in the case
>> > you're describing, recognizing that the vcpu in question overran, by
>> > allowing its budget to go negative, and making it pay a price for the
>> > budget to become positive again, in terms on how far its deadline will
>> > be set (this can be implemented quite easily, by slightly changing the
>> > while loop we were discussing above).
>>
>> Hmm, I think we have other ways to pay a price for the overrun budget.
>> For example, we can reduce the budget from its next period to pay for
>> the overrun budget in the current period if it overruns. When we add
>> timers and avoid such MILLISECS(1) scheduling quantum, the overrun
>> budget should not be too much, IMO. But anyway, this is something in
>> the future to solve because it needs to incorporate the extra timers
>> we will add in the future.
>>
> Just FTR, we're talking about the same thing. In fact, I of course meant
> that overrun must be paid for in terms of less than full budget in the
> following period. The point is, what if the overrun exceeds the budget?
> That's when the "pushing the deadline ahead" comes into play.
>
> Say we have a 40ms budget vcpu that (somehow) manages to run for
> 90ms, going down to -50ms. If you push the deadline one period ahead,
> and give it a recharge of 40ms (i.e., its full budget), that will
> leave it with -10ms. What I was saying was, in this case, we give it
> another 40ms recharge, but at the price of having its deadline one more
> period ahead.

I see the point and agree with you, although it seems unlikely for a
vcpu to overrun by 50ms while its budget is 40ms. But anyway, when
this situation happens, I agree with you about the solution.

>
> Anyway, that's for future development, as we agreed.

Sure. When we arrive there I will propose these options and then
settle down one solution to implement.

Thanks,

Meng

>
> Regards,
> Dario
>
> --
> <<This happens because I choose it to happen!>> (Raistlin Majere)
> -----------------------------------------------------------------
> Dario Faggioli, Ph.D, http://about.me/dario.faggioli
> Senior Software Engineer, Citrix Systems R&D Ltd., Cambridge (UK)
>



-- 


-----------
Meng Xu
PhD Student in Computer and Information Science
University of Pennsylvania

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [PATCH v3 1/4] xen: add real time scheduler rtds
  2014-09-22 17:26       ` Dario Faggioli
@ 2014-09-22 20:45         ` Meng Xu
  2014-09-22 22:43         ` Konrad Rzeszutek Wilk
  1 sibling, 0 replies; 49+ messages in thread
From: Meng Xu @ 2014-09-22 20:45 UTC (permalink / raw)
  To: Dario Faggioli
  Cc: Ian Campbell, Sisu Xi, Stefano Stabellini, George Dunlap,
	Chenyang Lu, Ian Jackson, xen-devel, Linh Thi Xuan Phan, Meng Xu,
	Jan Beulich, Chao Wang, Chong Li, Dagaen Golomb

2014-09-22 13:26 GMT-04:00 Dario Faggioli <dario.faggioli@citrix.com>:
> On ven, 2014-09-19 at 12:44 -0400, Konrad Rzeszutek Wilk wrote:
>> On Thu, Sep 18, 2014 at 05:08:45PM +0100, George Dunlap wrote:
>
>> > deadline, and then address the things I'm bringing up here?  Or would it be
>> > better to wait until all the issues are sorted and then check it in (even if
>> > it's after the deadline)?
>>
>> We can check it in after the deadline - and have those issues resolved.
>>
> FWIW, I think the series looks good now, and in fact I sent in my
> Reviewed-by for all of it.
>
>> I am basing this on the assumption that:
>>  - The risks of regressions to the rest of the schedulers is nil (as this
>>    is all new codepaths).
>>  - The risks of regressions to the rest of the code-base is nil (as this is
>>    all new).
>>  - The resolution of the 'couple of things' are not going to lead to more
>>    'couple of things' and lead to re-design.
>>
> This is all true, IMO.
>
>> The common code that is touched does not look scary to me. And both of the
>> scheduler maintainers -  you and Dariof are OK with the design and the patchset
>> (minus the 'couple of things').
>>
> Exactly.
>
>> Are we aiming to have this be experimental for Xen 4.5, or do we want it
>> to be 'stable'?
>>
> Not sure. What I'm sure about is that
> 1) the interface needs to change a bit, to include support for the
>    per-vcpu parameters setting (although, that can happen in a
>    backward compatible way, i.e., without touching or altering either the
>    look or the semantics of the interface we'll be checking in if we
>    take v4)
> 2) there is _a_lot_ to gain, from a performance point of view, and Meng
>    already agreed on continuing working toward that, after 4.5
>
> Having it in is, IMO, important, especially for the new
> embedded/mobile/automotive uses of Xen we're seeing in these days (in
> fact, I think GlobalLogic is using RT-Xen already, so the upstreaming of
> this scheduler would be quite useful at least to them [correct me if I'm
> wrong]).
>
> However, given 2 above, if we mark it as stable, we risk that people
> (mostly people not yet involved into Xen development and not on this
> mailing list) would try it, run into non-optimal performance, and get
> upset/angry. For that reason, I think I'd go for 'experimental for 4.5'.
>

Agree. I think it's better to make it experimental for 4.5 first. Then we
will extend the interface as Dario pointed out, and improve the
performance of the implementation.

Meng


-- 


-----------
Meng Xu
PhD Student in Computer and Information Science
University of Pennsylvania

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [PATCH v3 1/4] xen: add real time scheduler rtds
  2014-09-22 17:26       ` Dario Faggioli
  2014-09-22 20:45         ` Meng Xu
@ 2014-09-22 22:43         ` Konrad Rzeszutek Wilk
  1 sibling, 0 replies; 49+ messages in thread
From: Konrad Rzeszutek Wilk @ 2014-09-22 22:43 UTC (permalink / raw)
  To: Dario Faggioli
  Cc: ian.campbell, xisisu, stefano.stabellini, George Dunlap, lu,
	ian.jackson, xen-devel, ptxlinh, xumengpanda, Meng Xu, JBeulich,
	chaowang, lichong659, dgolomb

On Mon, Sep 22, 2014 at 07:26:28PM +0200, Dario Faggioli wrote:
> On ven, 2014-09-19 at 12:44 -0400, Konrad Rzeszutek Wilk wrote:
> > On Thu, Sep 18, 2014 at 05:08:45PM +0100, George Dunlap wrote:
> 
> > > deadline, and then address the things I'm bringing up here?  Or would it be
> > > better to wait until all the issues are sorted and then check it in (even if
> > > it's after the deadline)?
> > 
> > We can check it in after the deadline - and have those issues resolved.
> > 
> FWIW, I think the series looks good now, and in fact I sent in my
> Reviewed-by for all of it.
> 
> > I am basing this on the assumption that:
> >  - The risks of regressions to the rest of the schedulers is nil (as this
> >    is all new codepaths).
> >  - The risks of regressions to the rest of the code-base is nil (as this is
> >    all new).
> >  - The resolution of the 'couple of things' are not going to lead to more
> >    'couple of things' and lead to re-design.
> >
> This is all true, IMO.
> 
> > The common code that is touched does not look scary to me. And both of the
> > scheduler maintainers -  you and Dariof are OK with the design and the patchset
> > (minus the 'couple of things').
> > 
> Exactly.
> 
> > Are we aiming to have this be experimental for Xen 4.5, or do we want it
> > to be 'stable'?
> >
> Not sure. What I'm sure about is that
> 1) the interface needs to change a bit, to include support for the
>    per-vcpu parameter setting (although that can happen in a
>    backward-compatible way, i.e., without altering either the look or
>    the semantics of the interface we'll be checking in if we take v4)
> 2) there is _a_lot_ to gain from a performance point of view, and Meng
>    has already agreed to continue working on that after 4.5
> 
> Having it in is, IMO, important, especially for the new
> embedded/mobile/automotive uses of Xen we're seeing these days (in
> fact, I think GlobalLogic is using RT-Xen already, so the upstreaming of
> this scheduler would be quite useful at least to them [correct me if I'm
> wrong]).
> 
> However, given 2 above, if we mark it as stable, we risk that people
> (mostly people not yet involved in Xen development and not on this
> mailing list) would try it, run into non-optimal performance, and get
> upset/angry. For that reason, I think I'd go for 'experimental for 4.5'.

OK, will make sure that it is labeled as such in the release notes/wiki.

> 
> Regards,
> Dario
> 
> -- 
> <<This happens because I choose it to happen!>> (Raistlin Majere)
> -----------------------------------------------------------------
> Dario Faggioli, Ph.D, http://about.me/dario.faggioli
> Senior Software Engineer, Citrix Systems R&D Ltd., Cambridge (UK)
> 

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [PATCH v3 1/4] xen: add real time scheduler rtds
  2014-09-20 21:10     ` Meng Xu
@ 2014-09-23 10:47       ` George Dunlap
  0 siblings, 0 replies; 49+ messages in thread
From: George Dunlap @ 2014-09-23 10:47 UTC (permalink / raw)
  To: Meng Xu
  Cc: Ian Campbell, Sisu Xi, Stefano Stabellini, Chenyang Lu,
	Dario Faggioli, Ian Jackson, xen-devel, Linh Thi Xuan Phan,
	Meng Xu, Jan Beulich, Chao Wang, Chong Li, Dagaen Golomb

On Sat, Sep 20, 2014 at 10:10 PM, Meng Xu <xumengpanda@gmail.com> wrote:
>>
>> These svc structures are allocated dynamically and may disappear at any
>> time... I think you need the lock to cover this as well.
>>
>> And since this is called both externally (via ops->dump_cpu_state) and
>> internally (below), you probably need a locking version and a non-locking
>> version (normally you'd make rt_dump_pcpu() a locking "wrapper" around
>> __rt_dump_pcpu()).
>
>
> I tried to add the scheduler lock (prv->lock) here, which causes a system
> freeze. The reason is that when this function is called by
> schedule_dump() in schedule.c, the lock has already been grabbed. So I
> don't need the lock here, IMO. :-)
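
For reference, the locking-wrapper pattern mentioned in the quoted
comment would look roughly like the sketch below; the names and the
exact lock used are illustrative assumptions, not the code from the
patch:

/* Non-locking worker: callers must already hold the scheduler lock. */
static void __rt_dump_pcpu(const struct scheduler *ops, int cpu)
{
    /* ... walk and print the per-PCPU / per-VCPU state here ... */
}

/* Locking wrapper for external callers (e.g. ops->dump_cpu_state). */
static void rt_dump_pcpu(const struct scheduler *ops, int cpu)
{
    struct rt_private *prv = rt_priv(ops);
    unsigned long flags;

    spin_lock_irqsave(&prv->lock, flags);
    __rt_dump_pcpu(ops, cpu);
    spin_unlock_irqrestore(&prv->lock, flags);
}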

Oops. :-)

 -George

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: Introduce rtds real-time scheduler for Xen
  2014-09-14 21:37 Introduce rtds real-time scheduler for Xen Meng Xu
                   ` (5 preceding siblings ...)
  2014-09-17 14:15 ` Dario Faggioli
@ 2014-09-23 13:50 ` Ian Campbell
  2014-09-24 20:59   ` Meng Xu
  6 siblings, 1 reply; 49+ messages in thread
From: Ian Campbell @ 2014-09-23 13:50 UTC (permalink / raw)
  To: Meng Xu
  Cc: xisisu, stefano.stabellini, george.dunlap, lu, dario.faggioli,
	ian.jackson, xen-devel, ptxlinh, xumengpanda, JBeulich, chaowang,
	lichong659, dgolomb

On Sun, 2014-09-14 at 17:37 -0400, Meng Xu wrote:
> This series of patches adds the rtds real-time scheduler to Xen.

I've applied v4 of this series, which was apparently unthreaded and had
no introductory email (hence replying to what I think is v3's intro).

For the avoidance of doubt I've applied the patches with these
message-ids in this order:

<1411251228-2093-1-git-send-email-mengxu@cis.upenn.edu>
<1411251258-2131-1-git-send-email-mengxu@cis.upenn.edu>
<1411251283-2169-1-git-send-email-mengxu@cis.upenn.edu>
<1411251302-2207-1-git-send-email-mengxu@cis.upenn.edu>

I think that's right but I suggest you check that the end result in the
staging branch corresponds with what you thought you were sending. 

Note that tools/libxc/xenctrl.h moved to tools/libxc/include/xenctrl.h
since your patch was posted and I fixed that up as I went. Also, it
didn't build for arm32 due to:

        sched_rt.c: In function ‘burn_budget’:
        sched_rt.c:630:17: error: format ‘%ld’ expects argument of type ‘long int’, but argument 3 has type ‘s_time_t’ [-Werror=format]
        
I therefore folded this into the first patch:

diff --git a/xen/common/sched_rt.c b/xen/common/sched_rt.c
index 4c522bb..6c8faeb 100644
--- a/xen/common/sched_rt.c
+++ b/xen/common/sched_rt.c
@@ -626,7 +626,7 @@ burn_budget(const struct scheduler *ops, struct rt_vcpu *svc, s_time_t now)
      */
     if ( delta < 0 )
     {
-        printk("%s, ATTENTION: now is behind last_start! delta = %ld\n",
+        printk("%s, ATTENTION: now is behind last_start! delta=%"PRI_stime"\n",
                 __func__, delta);
         svc->last_start = now;
         return;

(this seemed easier than another round trip, I hope it is ok)

Apart from trying not to  forget the introductory mail and threading in
the future, please use the same $subject prefix on the introductory mail
as the rest of the series (e.g. '[PATCH for 4.5 v4 0/4]: Introduce rtds
real-time scheduler for Xen'), so I can figure out what goes with what.

Ian.



^ permalink raw reply related	[flat|nested] 49+ messages in thread

* Re: Introduce rtds real-time scheduler for Xen
  2014-09-23 13:50 ` Ian Campbell
@ 2014-09-24 20:59   ` Meng Xu
  2014-09-24 21:14     ` Wei Liu
  0 siblings, 1 reply; 49+ messages in thread
From: Meng Xu @ 2014-09-24 20:59 UTC (permalink / raw)
  To: Ian Campbell
  Cc: Sisu Xi, Stefano Stabellini, George Dunlap, Chenyang Lu,
	Dario Faggioli, Ian Jackson, xen-devel, Linh Thi Xuan Phan,
	Meng Xu, Jan Beulich, Chao Wang, Chong Li, Dagaen Golomb

2014-09-23 9:50 GMT-04:00 Ian Campbell <Ian.Campbell@citrix.com>:
> On Sun, 2014-09-14 at 17:37 -0400, Meng Xu wrote:
>> This series of patches adds the rtds real-time scheduler to Xen.
>
> I've applied v4 of this series, which was apparently unthreaded and had
> no introductory email (hence replying to what I think is v3's intro).
>
> For the avoidance of doubt I've applied the patches with these
> message-ids in this order:
>
> <1411251228-2093-1-git-send-email-mengxu@cis.upenn.edu>
> <1411251258-2131-1-git-send-email-mengxu@cis.upenn.edu>
> <1411251283-2169-1-git-send-email-mengxu@cis.upenn.edu>
> <1411251302-2207-1-git-send-email-mengxu@cis.upenn.edu>
>
> I think that's right but I suggest you check that the end result in the
> staging branch corresponds with what you thought you were sending.

Yes, all of these are correct. I double-checked them and ran them.

>
> Note that tools/libxc/xenctrl.h moved to tools/libxc/include/xenctrl.h
> since your patch was posted and I fixed that up as I went. Also, it
> didn't build for arm32 due to:
>
>         sched_rt.c: In function ‘burn_budget’:
>         sched_rt.c:630:17: error: format ‘%ld’ expects argument of type ‘long int’, but argument 3 has type ‘s_time_t’ [-Werror=format]
>
> I therefore folded this into the first patch:
>
> diff --git a/xen/common/sched_rt.c b/xen/common/sched_rt.c
> index 4c522bb..6c8faeb 100644
> --- a/xen/common/sched_rt.c
> +++ b/xen/common/sched_rt.c
> @@ -626,7 +626,7 @@ burn_budget(const struct scheduler *ops, struct rt_vcpu *svc, s_time_t now)
>       */
>      if ( delta < 0 )
>      {
> -        printk("%s, ATTENTION: now is behind last_start! delta = %ld\n",
> +        printk("%s, ATTENTION: now is behind last_start! delta=%"PRI_stime"\n",
>                  __func__, delta);
>          svc->last_start = now;
>          return;
>
> (this seemed easier than another round trip, I hope it is ok)

It's right! Thank you so much!

>
> Apart from trying not to  forget the introductory mail and threading in
> the future, please use the same $subject prefix on the introductory mail
> as the rest of the series (e.g. '[PATCH for 4.5 v4 0/4]: Introduce rtds
> real-time scheduler for Xen'), so I can figure out what goes with what.

I will remember that and avoid this in the future. I used my Gmail
account to send the introductory email and my school's email to send
the patches, which could be why the introductory mail was hard to
find. Next time I will send the introductory mail as 0/4, as you
suggested.

Thank you very much!

Best,

Meng

>
> Ian.
>



-- 


-----------
Meng Xu
PhD Student in Computer and Information Science
University of Pennsylvania


^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: Introduce rtds real-time scheduler for Xen
  2014-09-24 20:59   ` Meng Xu
@ 2014-09-24 21:14     ` Wei Liu
  2014-09-25  7:39       ` Ian Campbell
  0 siblings, 1 reply; 49+ messages in thread
From: Wei Liu @ 2014-09-24 21:14 UTC (permalink / raw)
  To: Meng Xu
  Cc: wei.liu2, Ian Campbell, Sisu Xi, Stefano Stabellini,
	George Dunlap, Chenyang Lu, Dario Faggioli, Ian Jackson,
	xen-devel, Linh Thi Xuan Phan, Meng Xu, Jan Beulich, Chao Wang,
	Chong Li, Dagaen Golomb

On Wed, Sep 24, 2014 at 04:59:18PM -0400, Meng Xu wrote:
[...]
> 
> >
> > Apart from trying not to  forget the introductory mail and threading in
> > the future, please use the same $subject prefix on the introductory mail
> > as the rest of the series (e.g. '[PATCH for 4.5 v4 0/4]: Introduce rtds
> > real-time scheduler for Xen'), so I can figure out what goes with what.
> 
> I will remember that and avoid this in the future. I used my Gmail
> account to send the introductory email and my school's email to send
> the patches, which could be why the introductory mail was hard to
> find. Next time I will send the introductory mail as 0/4, as you
> suggested.
> 

FWIW, git format-patch has a --cover-letter option to generate a cover
letter with a useful diffstat in it. You can then add your own text to
the cover letter and send it along with the other patches in one go
with git send-email.
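
For example, roughly (the commit range and output directory here are
just placeholders):

  $ git format-patch --cover-letter -v4 -o outgoing/ HEAD~4
  $ $EDITOR outgoing/v4-0000-cover-letter.patch   # fill in subject and blurb
  $ git send-email --to=xen-devel@lists.xen.org outgoing/*.patch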

Wei.

> Thank you very much!
> 
> Best,
> 
> Meng
> 
> >
> > Ian.
> >
> 
> 
> 
> -- 
> 
> 
> -----------
> Meng Xu
> PhD Student in Computer and Information Science
> University of Pennsylvania
> 

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: Introduce rtds real-time scheduler for Xen
  2014-09-24 21:14     ` Wei Liu
@ 2014-09-25  7:39       ` Ian Campbell
  0 siblings, 0 replies; 49+ messages in thread
From: Ian Campbell @ 2014-09-25  7:39 UTC (permalink / raw)
  To: Wei Liu
  Cc: Sisu Xi, Stefano Stabellini, George Dunlap, Chenyang Lu,
	Dario Faggioli, Ian Jackson, xen-devel, Linh Thi Xuan Phan,
	Meng Xu, Meng Xu, Jan Beulich, Chao Wang, Chong Li,
	Dagaen Golomb

On Wed, 2014-09-24 at 22:14 +0100, Wei Liu wrote:
> On Wed, Sep 24, 2014 at 04:59:18PM -0400, Meng Xu wrote:
> [...]
> > 
> > >
> > > Apart from trying not to  forget the introductory mail and threading in
> > > the future, please use the same $subject prefix on the introductory mail
> > > as the rest of the series (e.g. '[PATCH for 4.5 v4 0/4]: Introduce rtds
> > > real-time scheduler for Xen'), so I can figure out what goes with what.
> > 
> > I will remember that and avoid this in the future. I used my Gmail
> > account to send the introductory email and my school's email to send
> > the patches, which could be why the introductory mail was hard to
> > find. Next time I will send the introductory mail as 0/4, as you
> > suggested.
> > 
> 
> FWIW, git format-patch has a --cover-letter option to generate a cover
> letter with a useful diffstat in it. You can then add your own text to
> the cover letter and send it along with the other patches in one go
> with git send-email.

Also FWIW you can use
         git send-email --in-reply-to=<message.id@domain.example>
to hook the series onto a 0/N mail which you sent manually.
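
For instance, something along these lines (the message-id and patch
file names are placeholders):

  $ git send-email --to=xen-devel@lists.xen.org \
        --in-reply-to='<message.id@domain.example>' \
        v4-000*.patch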

Ian.

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: Introduce rtds real-time scheduler for Xen
  2014-09-23 13:27 ` Meng Xu
@ 2014-09-23 13:30   ` Ian Campbell
  0 siblings, 0 replies; 49+ messages in thread
From: Ian Campbell @ 2014-09-23 13:30 UTC (permalink / raw)
  To: Meng Xu
  Cc: Sisu Xi, Stefano Stabellini, George Dunlap, Chenyang Lu,
	Dario Faggioli, Ian Jackson, xen-devel, Linh Thi Xuan Phan,
	Jan Beulich, Chao Wang, Chong Li, Dagaen Golomb

On Tue, 2014-09-23 at 09:27 -0400, Meng Xu wrote:
> Do I need to do anything now?

No (not even this mail was strictly needed), I'm in the middle of my
precommit test script...

Ian.

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: Introduce rtds real-time scheduler for Xen
  2014-09-20 22:16 Meng Xu
@ 2014-09-23 13:27 ` Meng Xu
  2014-09-23 13:30   ` Ian Campbell
  0 siblings, 1 reply; 49+ messages in thread
From: Meng Xu @ 2014-09-23 13:27 UTC (permalink / raw)
  To: xen-devel
  Cc: Ian Campbell, Sisu Xi, Stefano Stabellini, George Dunlap,
	Chenyang Lu, Dario Faggioli, Ian Jackson, Linh Thi Xuan Phan,
	Meng Xu, Jan Beulich, Chao Wang, Chong Li, Dagaen Golomb

Every patch in this set has now been reviewed and acked by maintainers.
I have updated the latest status in this email.


2014-09-20 18:16 GMT-04:00 Meng Xu <xumengpanda@gmail.com>:
> This series of patches adds the rtds real-time scheduler to Xen.
>
> In summary, it supports:
> 1) a Preemptive Global Earliest Deadline First scheduling policy,
> using a global RunQ for the scheduler;
> 2) assigning/displaying the VCPU parameters of each domain (all VCPUs
> of a domain have the same period and budget);
> 3) CPU pools.
> Note:
>  a)  Although the toolstack only allows users to set the parameters of
> all VCPUs of the same domain to the same values, the scheduler itself
> can schedule VCPUs of the same domain with different parameters. In
> Xen 4.6, we plan to support assigning/displaying each VCPU's
> parameters per domain.
>  b)  A domain's parameters at the toolstack level are in microseconds,
> not milliseconds.
>
> The status of this patch set is as follows:
> [PATCH v4 1/4] xen: add real time scheduler rtds
>       Acked-by: Jan Beulich <jbeulich@suse.com>

Reviewed-and-Tested-by: Dario Faggioli <dario.faggioli@citrix.com>
Reviewed-by: George Dunlap <george.dunlap@eu.citrix.com>

>
> [PATCH v4 2/4] libxc: add rtds scheduler
>     Reviewed-by: Dario Faggioli <dario.faggioli@citrix.com>
>     Acked-by: Ian Campbell <ian.campbell@citrix.com>
>     Acked-by: George Dunlap <george.dunlap@eu.citrix.com>
>
> [PATCH v4 3/4] libxl: add rtds scheduler
>      Reviewed-by: Dario Faggioli <dario.faggioli@citrix.com>
>      Reviewed-by: George Dunlap <george.dunlap@eu.citrix.com>

Acked-by: Ian Campbell <ian.campbell@citrix.com>

>
> [PATCH v4 4/4] xl: introduce rtds scheduler
>      Reviewed-by: Dario Faggioli <dario.faggioli@citrix.com>
>      Reviewed-by: George Dunlap <george.dunlap@eu.citrix.com>
>      Acked-by: Ian Campbell <ian.campbell@citrix.com>
>

A quick question:
Do I need to do anything now?

Thanks,

-----------
Meng Xu
PhD Student in Computer and Information Science
University of Pennsylvania

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Introduce rtds real-time scheduler for Xen
@ 2014-09-20 22:16 Meng Xu
  2014-09-23 13:27 ` Meng Xu
  0 siblings, 1 reply; 49+ messages in thread
From: Meng Xu @ 2014-09-20 22:16 UTC (permalink / raw)
  To: xen-devel
  Cc: Ian Campbell, Sisu Xi, Stefano Stabellini, George Dunlap,
	Chenyang Lu, Dario Faggioli, Ian Jackson, Linh Thi Xuan Phan,
	Meng Xu, Jan Beulich, Chao Wang, Chong Li, Dagaen Golomb

This series of patches adds the rtds real-time scheduler to Xen.

In summary, it supports:
1) a Preemptive Global Earliest Deadline First scheduling policy,
using a global RunQ for the scheduler;
2) assigning/displaying the VCPU parameters of each domain (all VCPUs
of a domain have the same period and budget);
3) CPU pools.
Note:
 a)  Although the toolstack only allows users to set the parameters of
all VCPUs of the same domain to the same values, the scheduler itself
can schedule VCPUs of the same domain with different parameters. In
Xen 4.6, we plan to support assigning/displaying each VCPU's
parameters per domain.
 b)  A domain's parameters at the toolstack level are in microseconds,
not milliseconds.

The status of this patch set is as follows:
[PATCH v4 1/4] xen: add real time scheduler rtds
      Acked-by: Jan Beulich <jbeulich@suse.com>

[PATCH v4 2/4] libxc: add rtds scheduler
    Reviewed-by: Dario Faggioli <dario.faggioli@citrix.com>
    Acked-by: Ian Campbell <ian.campbell@citrix.com>
    Acked-by: George Dunlap <george.dunlap@eu.citrix.com>

[PATCH v4 3/4] libxl: add rtds scheduler
     Reviewed-by: Dario Faggioli <dario.faggioli@citrix.com>
     Reviewed-by: George Dunlap <george.dunlap@eu.citrix.com>

[PATCH v4 4/4] xl: introduce rtds scheduler
     Reviewed-by: Dario Faggioli <dario.faggioli@citrix.com>
     Reviewed-by: George Dunlap <george.dunlap@eu.citrix.com>
     Acked-by: Ian Campbell <ian.campbell@citrix.com>

-----------------------------------------------------------------------------------------------------------------------------
TODO after Xen 4.5:
    a) Burn budget at a finer granularity than 1ms; [medium]
    b) Use a separate timer per VCPU for each VCPU's budget
replenishment, instead of scanning the full runqueue every now and
then; [medium]
    c) Handle time stolen from domU by the hypervisor. When it runs on
a machine with many sockets and lots of cores, the spin-lock for the
global RunQ used in the rtds scheduler could eat up time from domU,
which could leave domU with less budget than it requires. [not sure
about difficulty right now] (Thanks to Konrad Rzeszutek Wilk for
pointing this out at the Xen Summit. :-))
    d) Toolstack support for assigning/displaying each VCPU's
parameters per domain.

-----------------------------------------------------------------------------------------------------------------------------
The design of this rtds scheduler is as follows:

This scheduler follows the Preemptive Global Earliest Deadline First
(EDF) theory from the real-time field.
At any scheduling point, the VCPU with the earlier deadline has higher
priority. The scheduler always picks the highest-priority VCPU to run
on a feasible PCPU.
A PCPU is feasible for a VCPU if the VCPU can run on that PCPU and the
PCPU is either idle or running a lower-priority VCPU.

Each VCPU has a dedicated period and budget.
The deadline of a VCPU is at the end of each period;
A VCPU has its budget replenished at the beginning of each period;
While scheduled, a VCPU burns its budget.
The VCPU needs to finish its budget before its deadline in each period;
The VCPU discards its unused budget at the end of each period.
If a VCPU runs out of budget in a period, it has to wait until the next period.

Each VCPU is implemented as a deferrable server.
When a VCPU has a task running on it, its budget is continuously burned;
When a VCPU has no task to run but still has budget left, its budget is preserved.

Queue scheme:
A global runqueue and a global depletedq for each CPU pool.
The runqueue holds all runnable VCPUs with budget, sorted by deadline;
The depletedq holds all VCPUs without budget, unsorted.

Note: cpumask and cpupool are supported.
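
As a rough sketch of the scheme above (illustrative only, not the
actual sched_rt.c code; the type and helper names are simplified
assumptions), the per-period replenishment, the EDF priority test and
the budget burning look roughly like:

#include <stdint.h>
#include <stdbool.h>

typedef int64_t s_time_t;          /* time in nanoseconds, as in Xen */

struct rt_vcpu_sketch {
    s_time_t period;               /* replenishment period            */
    s_time_t budget;               /* budget per period               */
    s_time_t cur_budget;           /* budget left in current period   */
    s_time_t cur_deadline;         /* end of the current period       */
};

/* At a period boundary: push the deadline forward and refill the budget. */
static void rt_update_deadline(struct rt_vcpu_sketch *svc, s_time_t now)
{
    while ( svc->cur_deadline <= now )
        svc->cur_deadline += svc->period;
    svc->cur_budget = svc->budget;
}

/* EDF priority: among runnable VCPUs with budget, earliest deadline wins. */
static bool rt_higher_prio(const struct rt_vcpu_sketch *a,
                           const struct rt_vcpu_sketch *b)
{
    return a->cur_deadline < b->cur_deadline;
}

/* Burn budget for the time actually spent running; once it reaches zero,
 * the VCPU moves from the runqueue to the depletedq until replenishment. */
static void rt_burn_budget(struct rt_vcpu_sketch *svc, s_time_t delta)
{
    svc->cur_budget -= delta;
    if ( svc->cur_budget <= 0 )
        svc->cur_budget = 0;
}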

If you are interested in the details of the design and evaluation of
this rt scheduler, please refer to our paper "Real-Time Multi-Core
Virtual Machine Scheduling in Xen"
(http://www.cis.upenn.edu/~mengxu/emsoft14/emsoft14.pdf), which will
be published at EMSOFT 2014. The paper covers:
    a) the design of this scheduler;
    b) measurements of the implementation overhead, e.g., scheduler
overhead, context-switch overhead, etc.;
    c) a comparison of this rt scheduler and the credit scheduler in
terms of real-time performance.

If you are interested in other real-time schedulers in Xen, please
refer to the RT-Xen project's website
(https://sites.google.com/site/realtimexen/). It also supports
Preemptive Global Rate Monotonic scheduling.
-----------------------------------------------------------------------------------------------------------------------------
We tested the following commands/scenarios:
#xl list
#xl sched-rtds
//set VCPUs' parameters of each domain to new value
#xl sched-rtds -d Domain-0 -p 20000 -b 10000

// list cpupool information
#xl cpupool-list

//create a cpupool test
#xl cpupool-cpu-remove Pool-0 3
#xl cpupool-cpu-remove Pool-0 2
#xl cpupool-create name=\"test\" sched=\"rtds\"
#xl cpupool-cpu-add test 3
#xl cpupool-cpu-add test 2

//migrate vm1 from cpupool Pool-0 to cpupool test.
#xl cpupool-migrate vm1 test

We also tested the following scenarios:
1) Boot the system with the rtds scheduler, create a cpupool with the
credit scheduler, and migrate a domU vm1 to the new cpupool:
    #xl cpupool-cpu-remove Pool-0 3
    #xl cpupool-create name=\"test\" sched=\"credit\"
    #xl cpupool-cpu-add test 3
    #xl cpupool-migrate vm1 test
2) Boot the system with the credit scheduler, create a cpupool with the
rtds scheduler, and migrate a domU vm1 to the new cpupool:
    #xl cpupool-cpu-remove Pool-0 3
    #xl cpupool-cpu-remove Pool-0 2
    #xl cpupool-create name=\"test\" sched=\"rtds\"
    #xl cpupool-cpu-add test 3
    #xl cpupool-cpu-add test 2
    #xl cpupool-migrate vm1 test

-----------------------------------------------------------------------------------------------------------------------------
Any comments, questions, and concerns are more than welcome! :-)

Thank you very much!

Best,

Meng

---
Meng Xu
PhD Student in Computer and Information Science
University of Pennsylvania
http://www.cis.upenn.edu/~mengxu/

^ permalink raw reply	[flat|nested] 49+ messages in thread

end of thread, other threads:[~2014-09-25  7:39 UTC | newest]

Thread overview: 49+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2014-09-14 21:37 Introduce rtds real-time scheduler for Xen Meng Xu
2014-09-14 21:37 ` [PATCH v3 1/4] xen: add real time scheduler rtds Meng Xu
2014-09-15 10:40   ` Jan Beulich
2014-09-16  8:42   ` Dario Faggioli
2014-09-16  8:52     ` Jan Beulich
2014-09-16  8:59       ` Dario Faggioli
2014-09-16 16:38       ` Meng Xu
2014-09-17  9:22         ` Jan Beulich
2014-09-18 16:08   ` George Dunlap
2014-09-18 18:08     ` Dario Faggioli
2014-09-19 13:11       ` Meng Xu
2014-09-22 17:11         ` Dario Faggioli
2014-09-22 20:40           ` Meng Xu
2014-09-19  9:45     ` Dario Faggioli
2014-09-19 16:44     ` Konrad Rzeszutek Wilk
2014-09-22 17:26       ` Dario Faggioli
2014-09-22 20:45         ` Meng Xu
2014-09-22 22:43         ` Konrad Rzeszutek Wilk
2014-09-20 21:10     ` Meng Xu
2014-09-23 10:47       ` George Dunlap
2014-09-14 21:37 ` [PATCH v3 2/4] libxc: add rtds scheduler Meng Xu
2014-09-18 16:15   ` George Dunlap
2014-09-14 21:37 ` [PATCH v3 3/4] libxl: " Meng Xu
2014-09-15 22:07   ` Ian Campbell
2014-09-16  1:11     ` Meng Xu
2014-09-16  1:49       ` Ian Campbell
2014-09-16  3:32         ` Meng Xu
2014-09-16  7:27           ` Dario Faggioli
2014-09-16 16:54             ` Ian Campbell
2014-09-17 10:19               ` Dario Faggioli
2014-09-16  8:04   ` Dario Faggioli
2014-09-16 16:56     ` Ian Campbell
2014-09-18 16:24   ` George Dunlap
2014-09-18 17:19     ` Ian Campbell
2014-09-14 21:37 ` [PATCH v3 4/4] xl: introduce " Meng Xu
2014-09-15 22:18   ` Ian Campbell
2014-09-16  7:49   ` Dario Faggioli
2014-09-18 16:35   ` George Dunlap
2014-09-16  7:43 ` Introduce rtds real-time scheduler for Xen Dario Faggioli
2014-09-17 14:15 ` Dario Faggioli
2014-09-17 14:33   ` Meng Xu
2014-09-18 16:00   ` Meng Xu
2014-09-23 13:50 ` Ian Campbell
2014-09-24 20:59   ` Meng Xu
2014-09-24 21:14     ` Wei Liu
2014-09-25  7:39       ` Ian Campbell
2014-09-20 22:16 Meng Xu
2014-09-23 13:27 ` Meng Xu
2014-09-23 13:30   ` Ian Campbell
