* Introduce rt real-time scheduler for Xen
@ 2014-08-24 22:58 Meng Xu
  2014-08-24 22:58 ` [PATCH v1 1/4] xen: add real time scheduler rt Meng Xu
                   ` (3 more replies)
  0 siblings, 4 replies; 72+ messages in thread
From: Meng Xu @ 2014-08-24 22:58 UTC (permalink / raw)
  To: xen-devel
  Cc: ian.campbell, xisisu, stefano.stabellini, george.dunlap,
	dario.faggioli, ian.jackson, xumengpanda, JBeulich, chaowang,
	lichong659, dgolomb

Hi all,

This series of patches adds an rt real-time scheduler to Xen.

In summary, it supports:
1) Preemptive Global Earliest Deadline First scheduling policy, using a global RunQ for the scheduler;
2) Assigning/displaying the parameters of each VCPU of each domain;
3) CPU pools.

Compared with the RFC v2 version of this series, this set of patches has the following improvements:
    a) added an rt-scheduler-specific TRACE facility;
    b) a more efficient RunQ implementation that avoids scanning the whole RunQ when inserting a vcpu without budget;
    c) bug fixes for cpupool support.

-----------------------------------------------------------------------------------------------------------------------------
TODO:
    a) Burn budget at a finer granularity than 1 ms; [medium]
    b) Use a separate timer per vcpu for each vcpu's budget replenishment, instead of scanning the full runqueue every now and then; [medium]
    c) Handle time stolen from domU by the hypervisor. On a machine with many sockets and lots of cores, the spin-lock for the global RunQ used in the rt scheduler could eat up domU's time, which could leave a domU with less budget than it requires. [not sure about the difficulty right now] (Thanks to Konrad Rzeszutek for pointing this out at the Xen Summit. :-))

Plan:
    We will work on TODO items a) and b) and try to finish them before September 10th. (We will also address the comments raised in the review of this set of patches.)

-----------------------------------------------------------------------------------------------------------------------------
The design of this rt scheduler is as follows:
This rt scheduler follows the Preemptive Global Earliest Deadline First (GEDF) theory from the real-time field.
Each VCPU can have a dedicated period and budget. While it is scheduled, a VCPU burns its budget. Each VCPU has its budget replenished at the beginning of each of its periods, and discards any unused budget at the end of each of its periods. If a VCPU runs out of budget in a period, it has to wait until its next period.
How a VCPU's budget is burned depends on the server mechanism implemented for each VCPU.
The priority of a VCPU at each scheduling point is decided by the Preemptive Global Earliest Deadline First scheduling scheme.

Server mechanism: a VCPU is implemented as a deferrable server.
When a VCPU has a task running on it, its budget is continuously burned;
When a VCPU has no task but still has budget left, its budget is preserved.

Priority scheme: Global Earliest Deadline First (EDF).
At any scheduling point, the VCPU with earliest deadline has highest priority.

Queue scheme: A global runqueue for each CPU pool.
The runqueue holds all runnable VCPUs.
VCPUs in the runqueue are divided into two parts: those with and those without remaining budget.
Within each part, VCPUs are sorted based on the GEDF priority scheme.

Scheduling quantum: 1 ms.
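
To make the budget accounting above concrete, below is a minimal C sketch of the per-period replenishment rule (illustrative only, not the patch code; the names vcpu_budget and replenish are made up for this example): when a VCPU's current deadline is in the past, the deadline is pushed forward by whole periods and the budget is refilled, discarding whatever was left unused.

struct vcpu_budget {
    long long period;        /* period length, in ns */
    long long budget;        /* full budget per period, in ns */
    long long cur_deadline;  /* absolute deadline of the current period, in ns */
    long long cur_budget;    /* budget remaining in the current period, in ns */
};

static void replenish(struct vcpu_budget *v, long long now)
{
    if ( now >= v->cur_deadline )
    {
        /* number of whole periods the deadline has fallen behind by */
        long long missed = (now - v->cur_deadline) / v->period + 1;

        v->cur_deadline += missed * v->period;
        v->cur_budget = v->budget;   /* unused budget is discarded */
    }
}

For example, with the parameters used for domain litmus1 in the scenario below (period=10000, budget=4000), a VCPU can consume at most 40% of one physical CPU in each of its periods.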

If you are interested in the details of the design and evaluation of this rt scheduler, please refer to our paper "Real-Time Multi-Core Virtual Machine Scheduling in Xen" (http://www.cis.upenn.edu/~mengxu/emsoft14/emsoft14.pdf), which will be published at EMSOFT14. The paper covers:
    a) the design of this scheduler;
    b) measurements of the implementation overhead, e.g., scheduler overhead, context switch overhead, etc.;
    c) a comparison of this rt scheduler and the credit scheduler in terms of real-time performance.
-----------------------------------------------------------------------------------------------------------------------------
The following scenario demonstrates the functionality of this rt scheduler:
//list each VCPU's parameters of each domain in the cpupools that use the rt scheduler
#xl sched-rt
Cpupool Pool-0: sched=EDF
Name                                ID VCPU Period Budget
Domain-0                             0    0  10000  10000
Domain-0                             0    1  20000  20000
Domain-0                             0    2  30000  30000
Domain-0                             0    3  10000  10000
litmus1                              1    0  10000   4000
litmus1                              1    1  10000   4000

//set the parameters of vcpu 1 of domain litmus1:
# xl sched-rt -d litmus1 -v 1 -p 20000 -b 10000

//domain litmus1's vcpu 1's parameters have been changed; display each VCPU's parameters:
# xl sched-rt -d litmus1
Name                                ID VCPU Period Budget
litmus1                              1    0  10000   4000
litmus1                              1    1  20000  10000

// list cpupool information
# xl cpupool-list
Name               CPUs   Sched     Active   Domain count
Pool-0              12        rt       y          2

//create a cpupool test
#xl cpupool-cpu-remove Pool-0 11
#xl cpupool-cpu-remove Pool-0 10
#xl cpupool-create name=\"test\" sched=\"credit\"
#xl cpupool-cpu-add test 11
#xl cpupool-cpu-add test 10
#xl cpupool-list
Name               CPUs   Sched     Active   Domain count
Pool-0              10        rt       y          2
test                 2    credit       y          0   

//migrate litmus1 from cpupool Pool-0 to cpupool test.
#xl cpupool-migrate litmus1 test

//now litmus1 is in cpupool test
# xl sched-credit 
Cpupool test: tslice=30ms ratelimit=1000us
Name                                ID Weight  Cap
litmus1                              1    256    0 

-----------------------------------------------------------------------------------------------------------------------------
This set of patches was tested using the above scenario while running CPU-intensive tasks inside each guest domain. We manually checked that each domain gets its required resources without interference from other domains; we also manually checked that the scheduling sequence of vcpus follows the Earliest Deadline First scheduling policy.

Any comments, questions, and concerns are more than welcome! :-)

Thank you very much!

Best,

Meng

[PATCH v1 1/4] xen: add real time scheduler rt
[PATCH v1 2/4] libxc: add rt scheduler
[PATCH v1 3/4] libxl: add rt scheduler
[PATCH v1 4/4] xl: introduce rt scheduler

---
Meng Xu
PhD Student in Computer and Information Science
University of Pennsylvania


* [PATCH v1 1/4] xen: add real time scheduler rt
  2014-08-24 22:58 Introduce rt real-time scheduler for Xen Meng Xu
@ 2014-08-24 22:58 ` Meng Xu
  2014-08-26 14:27   ` Jan Beulich
                     ` (3 more replies)
  2014-08-24 22:58 ` [PATCH v1 2/4] libxc: add rt scheduler Meng Xu
                   ` (2 subsequent siblings)
  3 siblings, 4 replies; 72+ messages in thread
From: Meng Xu @ 2014-08-24 22:58 UTC (permalink / raw)
  To: xen-devel
  Cc: ian.campbell, xisisu, stefano.stabellini, george.dunlap,
	dario.faggioli, ian.jackson, xumengpanda, Meng Xu, JBeulich,
	chaowang, lichong659, dgolomb

This scheduler follows the preemptive Global EDF theory from the real-time field.
Each VCPU can have a dedicated period and budget.
While it is scheduled, a VCPU burns its budget.
A VCPU has its budget replenished at the beginning of each of its periods;
the VCPU discards its unused budget at the end of each of its periods.
If a VCPU runs out of budget in a period, it has to wait until its next period.
The mechanism of how to burn a VCPU's budget depends on the server mechanism
implemented for each VCPU.

Server mechanism: a VCPU is implemented as a deferrable server.
When a VCPU is scheduled to execute on a PCPU, its budget is continuously
burned.

Priority scheme: Preemptive Global Earliest Deadline First (gEDF).
At any scheduling point, the VCPU with earliest deadline has highest
priority.

Queue scheme: A global Runqueue for each CPU pool.
The Runqueue holds all runnable VCPUs.
VCPUs in the Runqueue are divided into two parts: those with and those without remaining budget.
Within each part, VCPUs are sorted based on the gEDF priority scheme.

Scheduling quantum: 1 ms.

Note: cpumask and cpupool are supported.

This is still in the development phase.

Signed-off-by: Sisu Xi <xisisu@gmail.com>
Signed-off-by: Meng Xu <mengxu@cis.upenn.edu>
---
 xen/common/Makefile         |    1 +
 xen/common/sched_rt.c       | 1205 +++++++++++++++++++++++++++++++++++++++++++
 xen/common/schedule.c       |    4 +-
 xen/include/public/domctl.h |   30 +-
 xen/include/public/trace.h  |    1 +
 xen/include/xen/sched-if.h  |    1 +
 6 files changed, 1239 insertions(+), 3 deletions(-)
 create mode 100644 xen/common/sched_rt.c

diff --git a/xen/common/Makefile b/xen/common/Makefile
index 3683ae3..5a23aa4 100644
--- a/xen/common/Makefile
+++ b/xen/common/Makefile
@@ -26,6 +26,7 @@ obj-y += sched_credit.o
 obj-y += sched_credit2.o
 obj-y += sched_sedf.o
 obj-y += sched_arinc653.o
+obj-y += sched_rt.o
 obj-y += schedule.o
 obj-y += shutdown.o
 obj-y += softirq.o
diff --git a/xen/common/sched_rt.c b/xen/common/sched_rt.c
new file mode 100644
index 0000000..b1d9e6a
--- /dev/null
+++ b/xen/common/sched_rt.c
@@ -0,0 +1,1205 @@
+/******************************************************************************
+ * Preemptive Global Earliest Deadline First (EDF) scheduler for Xen
+ * EDF scheduling is one of the most popular real-time scheduling algorithms
+ * used in the embedded field.
+ *
+ * by Sisu Xi, 2013, Washington University in Saint Louis
+ * and Meng Xu, 2014, University of Pennsylvania
+ *
+ * based on the code of credit Scheduler
+ */
+
+#include <xen/config.h>
+#include <xen/init.h>
+#include <xen/lib.h>
+#include <xen/sched.h>
+#include <xen/domain.h>
+#include <xen/delay.h>
+#include <xen/event.h>
+#include <xen/time.h>
+#include <xen/perfc.h>
+#include <xen/sched-if.h>
+#include <xen/softirq.h>
+#include <asm/atomic.h>
+#include <xen/errno.h>
+#include <xen/trace.h>
+#include <xen/cpu.h>
+#include <xen/keyhandler.h>
+#include <xen/trace.h>
+#include <xen/guest_access.h>
+
+/*
+ * TODO:
+ *
+ * Migration compensation and resist like credit2 to better use cache;
+ * Lock Holder Problem, using yield?
+ * Self switch problem: VCPUs of the same domain may preempt each other;
+ */
+
+/*
+ * Design:
+ *
+ * This scheduler follows the Preemptive Global EDF theory from the real-time field.
+ * Each VCPU can have a dedicated period and budget. 
+ * While scheduled, a VCPU burns its budget.
+ * A VCPU has its budget replenished at the beginning of each of its periods;
+ * The VCPU discards its unused budget at the end of each of its periods.
+ * If a VCPU runs out of budget in a period, it has to wait until next period.
+ * The mechanism of how to burn a VCPU's budget depends on the server mechanism
+ * implemented for each VCPU.
+ *
+ * Server mechanism: a VCPU is implemented as a deferrable server.
+ * When a VCPU has a task running on it, its budget is continuously burned;
+ * When a VCPU has no task but still has budget left, its budget is preserved.
+ *
+ * Priority scheme: Preemptive Global Earliest Deadline First (gEDF).
+ * At any scheduling point, the VCPU with earliest deadline has highest priority.
+ *
+ * Queue scheme: A global runqueue for each CPU pool. 
+ * The runqueue holds all runnable VCPUs. 
+ * VCPUs in the runqueue are divided into two parts: with and without remaining budget. 
+ * Within each part, VCPUs are sorted based on the EDF priority scheme.
+ *
+ * Scheduling quantum: 1 ms; but budget accounting is done at microsecond granularity.
+ *
+ * Note: cpumask and cpupool are supported.
+ */
+
+/*
+ * Locking:
+ * Just like credit2, a global system lock is used to protect the RunQ.
+ * The global lock is referenced by schedule_data.schedule_lock from all physical cpus.
+ *
+ * The lock is already grabbed when the wake/sleep/schedule functions in schedule.c are called.
+ *
+ * The functions that involve the RunQ and need to grab the lock are:
+ *    dump, vcpu_insert, vcpu_remove, context_saved.
+ */
+
+
+/*
+ * Default parameters: period and budget default to 10 and 4 ms, respectively
+ */
+#define RT_DEFAULT_PERIOD     (MICROSECS(10))
+#define RT_DEFAULT_BUDGET     (MICROSECS(4))
+
+/*
+ * Flags
+ */
+/* RT_scheduled: Is this vcpu either running on, or context-switching off,
+ * a physical cpu?
+ * + Accessed only with Runqueue lock held.
+ * + Set when chosen as next in rt_schedule().
+ * + Cleared after context switch has been saved in rt_context_saved()
+ * + Checked in vcpu_wake to see if we can add to the Runqueue, or if we should
+ *   set RT_delayed_runq_add
+ * + Checked to be false in runq_insert.
+ */
+#define __RT_scheduled            1
+#define RT_scheduled (1<<__RT_scheduled)
+/* RT_delayed_runq_add: Do we need to add this to the Runqueue once it is done
+ * being context-switched out?
+ * + Set when scheduling out in rt_schedule() if prev is runnable
+ * + Set in rt_vcpu_wake if it finds RT_scheduled set
+ * + Read in rt_context_saved(). If set, it adds prev to the Runqueue and
+ *   clears the bit.
+ *
+ */
+#define __RT_delayed_runq_add     2
+#define RT_delayed_runq_add (1<<__RT_delayed_runq_add)
+
+/*
+ * Debug only. Used to print out debug information
+ */
+#define printtime()\
+        ({s_time_t now = NOW(); \
+          printk("%u : %3ld.%3ldus : %-19s ",\
+          smp_processor_id(), now/MICROSECS(1), now%MICROSECS(1)/1000, __func__);} )
+
+/*
+ * rt tracing events ("only" 512 available!). Check
+ * include/public/trace.h for more details.
+ */
+#define TRC_RT_TICKLE           TRC_SCHED_CLASS_EVT(RT, 1)
+#define TRC_RT_RUNQ_PICK        TRC_SCHED_CLASS_EVT(RT, 2)
+#define TRC_RT_BUDGET_BURN      TRC_SCHED_CLASS_EVT(RT, 3)
+#define TRC_RT_BUDGET_REPLENISH TRC_SCHED_CLASS_EVT(RT, 4)
+#define TRC_RT_SCHED_TASKLET    TRC_SCHED_CLASS_EVT(RT, 5)
+#define TRC_RT_VCPU_DUMP        TRC_SCHED_CLASS_EVT(RT, 6)
+
+/*
+ * System-wide private data, including a global RunQueue
+ * The global lock is referenced by schedule_data.schedule_lock from all physical cpus.
+ * It can be grabbed via vcpu_schedule_lock_irq()
+ */
+struct rt_private {
+    spinlock_t lock;           /* The global coarse-grained lock */
+    struct list_head sdom;     /* list of available domains, used for dump */
+    struct list_head runq;     /* Ordered list of runnable VMs */
+    struct rt_vcpu *flag_vcpu; /* position of the first depleted vcpu (zero budget) */
+    cpumask_t cpus;            /* cpumask_t of available physical cpus */
+    cpumask_t tickled;         /* another cpu in the queue already tickled this one */
+};
+
+/*
+ * Virtual CPU
+ */
+struct rt_vcpu {
+    struct list_head runq_elem; /* On the runqueue list */
+    struct list_head sdom_elem; /* On the domain VCPU list */
+
+    /* Up-pointers */
+    struct rt_dom *sdom;
+    struct vcpu *vcpu;
+
+    /* VCPU parameters, in milliseconds */
+    s_time_t period;
+    s_time_t budget;
+
+    /* Current VCPU information, in nanoseconds */
+    long cur_budget;             /* current budget */
+    s_time_t last_start;        /* last start time */
+    s_time_t cur_deadline;      /* current deadline for EDF */
+
+    unsigned flags;             /* mark __RT_scheduled, etc.. */
+};
+
+/*
+ * Domain
+ */
+struct rt_dom {
+    struct list_head vcpu;      /* link its VCPUs */
+    struct list_head sdom_elem; /* link list on rt_priv */
+    struct domain *dom;         /* pointer to upper domain */
+};
+
+/*
+ * Useful inline functions
+ */
+static inline struct rt_private *RT_PRIV(const struct scheduler *_ops)
+{
+    return _ops->sched_data;
+}
+
+static inline struct rt_vcpu *RT_VCPU(const struct vcpu *_vcpu)
+{
+    return _vcpu->sched_priv;
+}
+
+static inline struct rt_dom *RT_DOM(const struct domain *_dom)
+{
+    return _dom->sched_priv;
+}
+
+static inline struct list_head *RUNQ(const struct scheduler *_ops)
+{
+    return &RT_PRIV(_ops)->runq;
+}
+
+//#define RT_PRIV(_ops)     ((struct rt_private *)((_ops)->sched_data))
+//#define RT_VCPU(_vcpu)    ((struct rt_vcpu *)(_vcpu)->sched_priv)
+//#define RT_DOM(_dom)      ((struct rt_dom *)(_dom)->sched_priv)
+//#define RUNQ(_ops)              (&RT_PRIV(_ops)->runq)
+
+/*
+ * RunQueue helper functions
+ */
+static int
+__vcpu_on_runq(const struct rt_vcpu *svc)
+{
+   return !list_empty(&svc->runq_elem);
+}
+
+static struct rt_vcpu *
+__runq_elem(struct list_head *elem)
+{
+    return list_entry(elem, struct rt_vcpu, runq_elem);
+}
+
+/*
+ * Debug related code, dump vcpu/cpu information
+ */
+static void
+rt_dump_vcpu(const struct rt_vcpu *svc)
+{
+    char cpustr[1024];
+
+    ASSERT(svc != NULL);
+    /* flag vcpu */
+    if( svc->sdom == NULL )
+        return;
+
+    cpumask_scnprintf(cpustr, sizeof(cpustr), svc->vcpu->cpu_hard_affinity);
+    printk("[%5d.%-2u] cpu %u, (%"PRId64", %"PRId64"), cur_b=%"PRId64" cur_d=%"PRId64" last_start=%"PRId64" onR=%d runnable=%d cpu_hard_affinity=%s ",
+            svc->vcpu->domain->domain_id,
+            svc->vcpu->vcpu_id,
+            svc->vcpu->processor,
+            svc->period,
+            svc->budget,
+            svc->cur_budget,
+            svc->cur_deadline,
+            svc->last_start,
+            __vcpu_on_runq(svc),
+            vcpu_runnable(svc->vcpu),
+            cpustr);
+    memset(cpustr, 0, sizeof(cpustr));
+    cpumask_scnprintf(cpustr, sizeof(cpustr), cpupool_scheduler_cpumask(svc->vcpu->domain->cpupool));
+    printk("cpupool=%s\n", cpustr);
+
+    /* TRACE */
+    {
+        struct {
+            unsigned dom:16,vcpu:16;
+            unsigned processor;
+            unsigned cur_budget_lo, cur_budget_hi, cur_deadline_lo, cur_deadline_hi;
+            unsigned is_vcpu_on_runq:16,is_vcpu_runnable:16;
+        } d;
+        d.dom = svc->vcpu->domain->domain_id;
+        d.vcpu = svc->vcpu->vcpu_id;
+        d.processor = svc->vcpu->processor;
+        d.cur_budget_lo = (unsigned) svc->cur_budget;
+        d.cur_budget_hi = (unsigned) (svc->cur_budget >> 32);
+        d.cur_deadline_lo = (unsigned) svc->cur_deadline;
+        d.cur_deadline_hi = (unsigned) (svc->cur_deadline >> 32);
+        d.is_vcpu_on_runq = __vcpu_on_runq(svc);
+        d.is_vcpu_runnable = vcpu_runnable(svc->vcpu);
+        trace_var(TRC_RT_VCPU_DUMP, 1,
+                  sizeof(d),
+                  (unsigned char *)&d);
+    }
+}
+
+static void
+rt_dump_pcpu(const struct scheduler *ops, int cpu)
+{
+    struct rt_vcpu *svc = RT_VCPU(curr_on_cpu(cpu));
+
+    printtime();
+    rt_dump_vcpu(svc);
+}
+
+/*
+ * Should not need a lock here; only printing information.
+ */
+static void
+rt_dump(const struct scheduler *ops)
+{
+    struct list_head *iter_sdom, *iter_svc, *runq, *iter;
+    struct rt_private *prv = RT_PRIV(ops);
+    struct rt_vcpu *svc;
+    unsigned int cpu = 0;
+    unsigned int loop = 0;
+
+    printtime();
+    printk("Priority Scheme: EDF\n");
+
+    printk("PCPU info:\n");
+    for_each_cpu(cpu, &prv->cpus) 
+        rt_dump_pcpu(ops, cpu);
+
+    printk("Global RunQueue info:\n");
+    loop = 0;
+    runq = RUNQ(ops);
+    list_for_each( iter, runq ) 
+    {
+        svc = __runq_elem(iter);
+        printk("\t%3d: ", ++loop);
+        rt_dump_vcpu(svc);
+    }
+
+    printk("Domain info:\n");
+    loop = 0;
+    list_for_each( iter_sdom, &prv->sdom ) 
+    {
+        struct rt_dom *sdom;
+        sdom = list_entry(iter_sdom, struct rt_dom, sdom_elem);
+        printk("\tdomain: %d\n", sdom->dom->domain_id);
+
+        list_for_each( iter_svc, &sdom->vcpu ) 
+        {
+            svc = list_entry(iter_svc, struct rt_vcpu, sdom_elem);
+            printk("\t\t%3d: ", ++loop);
+            rt_dump_vcpu(svc);
+        }
+    }
+
+    printk("\n");
+}
+
+static inline void
+__runq_remove(struct rt_vcpu *svc)
+{
+    if ( __vcpu_on_runq(svc) )
+        list_del_init(&svc->runq_elem);
+}
+
+/*
+ * Insert a vcpu into the RunQ based on the vcpus' deadlines:
+ * EDF scheduling policy: a vcpu with a smaller deadline has higher priority;
+ * the vcpu svc to be inserted is placed just before the very first
+ * vcpu iter_svc in the RunQ whose deadline is equal to or larger than
+ * svc's deadline.
+ */
+static void
+__runq_insert(const struct scheduler *ops, struct rt_vcpu *svc)
+{
+    struct rt_private *prv = RT_PRIV(ops);
+    struct list_head *runq = RUNQ(ops);
+    struct list_head *iter;
+    spinlock_t *schedule_lock;
+    
+    schedule_lock = per_cpu(schedule_data, svc->vcpu->processor).schedule_lock;
+    ASSERT( spin_is_locked(schedule_lock) );
+    
+    /* Debug only */
+    if ( __vcpu_on_runq(svc) )
+    {
+        rt_dump(ops);
+    }
+    ASSERT( !__vcpu_on_runq(svc) );
+
+    /* svc still has budget */
+    if ( svc->cur_budget > 0 ) 
+    {
+        list_for_each(iter, runq) 
+        {
+            struct rt_vcpu * iter_svc = __runq_elem(iter);
+            if ( iter_svc->cur_budget == 0 ||
+                 svc->cur_deadline <= iter_svc->cur_deadline )
+                    break;
+         }
+        list_add_tail(&svc->runq_elem, iter);
+     }
+    else 
+    { /* svc has no budget */
+        list_add(&svc->runq_elem, &prv->flag_vcpu->runq_elem);
+    }
+}
+
+/*
+ * Init/Free related code
+ */
+static int
+rt_init(struct scheduler *ops)
+{
+    struct rt_private *prv = xzalloc(struct rt_private);
+
+    if ( prv == NULL )
+        return -ENOMEM;
+
+    spin_lock_init(&prv->lock);
+    INIT_LIST_HEAD(&prv->sdom);
+    INIT_LIST_HEAD(&prv->runq);
+
+    prv->flag_vcpu = xzalloc(struct rt_vcpu);
+    prv->flag_vcpu->cur_budget = 0;
+    prv->flag_vcpu->sdom = NULL; /* distinguish this vcpu from others */
+    list_add(&prv->flag_vcpu->runq_elem, &prv->runq);
+
+    ops->sched_data = prv;
+
+    printtime();
+    printk("\n");
+
+    return 0;
+}
+
+static void
+rt_deinit(const struct scheduler *ops)
+{
+    struct rt_private *prv = RT_PRIV(ops);
+
+    printtime();
+    printk("\n");
+    xfree(prv->flag_vcpu);
+    xfree(prv);
+}
+
+/* 
+ * Point the per_cpu spinlock to the global system lock; all cpus share the same global lock.
+ */
+static void *
+rt_alloc_pdata(const struct scheduler *ops, int cpu)
+{
+    struct rt_private *prv = RT_PRIV(ops);
+
+    cpumask_set_cpu(cpu, &prv->cpus);
+
+    per_cpu(schedule_data, cpu).schedule_lock = &prv->lock;
+
+    printtime();
+    printk("%s total cpus: %d", __func__, cpumask_weight(&prv->cpus));
+    /* same as credit2, not a bogus pointer */
+    return (void *)1;
+}
+
+static void
+rt_free_pdata(const struct scheduler *ops, void *pcpu, int cpu)
+{
+    struct rt_private * prv = RT_PRIV(ops);
+    cpumask_clear_cpu(cpu, &prv->cpus);
+    printtime();
+    printk("%s cpu=%d\n", __FUNCTION__, cpu);
+}
+
+static void *
+rt_alloc_domdata(const struct scheduler *ops, struct domain *dom)
+{
+    unsigned long flags;
+    struct rt_dom *sdom;
+    struct rt_private * prv = RT_PRIV(ops);
+
+    printtime();
+    printk("dom=%d\n", dom->domain_id);
+
+    sdom = xzalloc(struct rt_dom);
+    if ( sdom == NULL ) 
+    {
+        printk("%s, xzalloc failed\n", __func__);
+        return NULL;
+    }
+
+    INIT_LIST_HEAD(&sdom->vcpu);
+    INIT_LIST_HEAD(&sdom->sdom_elem);
+    sdom->dom = dom;
+
+    /* spinlock here to insert the dom */
+    spin_lock_irqsave(&prv->lock, flags);
+    list_add_tail(&sdom->sdom_elem, &(prv->sdom));
+    spin_unlock_irqrestore(&prv->lock, flags);
+
+    return sdom;
+}
+
+static void
+rt_free_domdata(const struct scheduler *ops, void *data)
+{
+    unsigned long flags;
+    struct rt_dom *sdom = data;
+    struct rt_private *prv = RT_PRIV(ops);
+
+    printtime();
+    printk("dom=%d\n", sdom->dom->domain_id);
+
+    spin_lock_irqsave(&prv->lock, flags);
+    list_del_init(&sdom->sdom_elem);
+    spin_unlock_irqrestore(&prv->lock, flags);
+    xfree(data);
+}
+
+static int
+rt_dom_init(const struct scheduler *ops, struct domain *dom)
+{
+    struct rt_dom *sdom;
+
+    printtime();
+    printk("dom=%d\n", dom->domain_id);
+
+    /* IDLE Domain does not link on rt_private */
+    if ( is_idle_domain(dom) ) 
+        return 0;
+
+    sdom = rt_alloc_domdata(ops, dom);
+    if ( sdom == NULL ) 
+    {
+        printk("%s, failed\n", __func__);
+        return -ENOMEM;
+    }
+    dom->sched_priv = sdom;
+
+    return 0;
+}
+
+static void
+rt_dom_destroy(const struct scheduler *ops, struct domain *dom)
+{
+    printtime();
+    printk("dom=%d\n", dom->domain_id);
+
+    rt_free_domdata(ops, RT_DOM(dom));
+}
+
+static void *
+rt_alloc_vdata(const struct scheduler *ops, struct vcpu *vc, void *dd)
+{
+    struct rt_vcpu *svc;
+    s_time_t now = NOW();
+    long count;
+
+    /* Allocate per-VCPU info */
+    svc = xzalloc(struct rt_vcpu);
+    if ( svc == NULL ) 
+    {
+        printk("%s, xzalloc failed\n", __func__);
+        return NULL;
+    }
+
+    INIT_LIST_HEAD(&svc->runq_elem);
+    INIT_LIST_HEAD(&svc->sdom_elem);
+    svc->flags = 0U;
+    svc->sdom = dd;
+    svc->vcpu = vc;
+    svc->last_start = 0;            /* init last_start is 0 */
+
+    svc->period = RT_DEFAULT_PERIOD;
+    if ( !is_idle_vcpu(vc) )
+        svc->budget = RT_DEFAULT_BUDGET;
+
+    count = (now/MICROSECS(svc->period)) + 1;
+    /* sync all VCPUs' start times to 0 */
+    svc->cur_deadline += count * MICROSECS(svc->period);
+
+    svc->cur_budget = svc->budget*1000; /* counting in microseconds level */
+    /* Debug only: dump new vcpu's info */
+    printtime();
+    rt_dump_vcpu(svc);
+
+    return svc;
+}
+
+static void
+rt_free_vdata(const struct scheduler *ops, void *priv)
+{
+    struct rt_vcpu *svc = priv;
+
+    /* Debug only: dump freed vcpu's info */
+    printtime();
+    rt_dump_vcpu(svc);
+    xfree(svc);
+}
+
+/*
+ * TODO: Do we need to add vc to the new RunQ?
+ * This function is called in sched_move_domain() in schedule.c.
+ * When a domain is moved to a new cpupool,
+ * we may have to add vc to the RunQ of the new cpupool.
+ */
+static void
+rt_vcpu_insert(const struct scheduler *ops, struct vcpu *vc)
+{
+    struct rt_vcpu *svc = RT_VCPU(vc);
+
+    /* Debug only: dump info of vcpu to insert */
+    printtime();
+    rt_dump_vcpu(svc);
+
+    /* do not add the idle vcpu to the domain's vcpu list */
+    if ( is_idle_vcpu(vc) )
+        return;
+
+    list_add_tail(&svc->sdom_elem, &svc->sdom->vcpu);   /* add to dom vcpu list */
+}
+
+/*
+ * TODO: same as rt_vcpu_insert()
+ */
+static void
+rt_vcpu_remove(const struct scheduler *ops, struct vcpu *vc)
+{
+    struct rt_vcpu * const svc = RT_VCPU(vc);
+    struct rt_dom * const sdom = svc->sdom;
+
+    printtime();
+    rt_dump_vcpu(svc);
+
+    BUG_ON( sdom == NULL );
+    BUG_ON( __vcpu_on_runq(svc) );
+
+    if ( !is_idle_vcpu(vc) ) 
+        list_del_init(&svc->sdom_elem);
+}
+
+/* 
+ * Pick a valid CPU for the vcpu vc
+ * A valid CPU of a vcpu is in the intersection of the vcpu's affinity and the available cpus
+ */
+static int
+rt_cpu_pick(const struct scheduler *ops, struct vcpu *vc)
+{
+    cpumask_t cpus;
+    cpumask_t *online;
+    int cpu;
+    struct rt_private * prv = RT_PRIV(ops);
+
+    online = cpupool_scheduler_cpumask(vc->domain->cpupool);
+    cpumask_and(&cpus, &prv->cpus, online);
+    cpumask_and(&cpus, &cpus, vc->cpu_hard_affinity);
+
+    cpu = cpumask_test_cpu(vc->processor, &cpus)
+            ? vc->processor 
+            : cpumask_cycle(vc->processor, &cpus);
+    ASSERT( !cpumask_empty(&cpus) && cpumask_test_cpu(cpu, &cpus) );
+
+    return cpu;
+}
+
+/*
+ * Burn budget at microsecond level. 
+ */
+static void
+burn_budgets(const struct scheduler *ops, struct rt_vcpu *svc, s_time_t now) 
+{
+    s_time_t delta;
+    long count = 0;
+
+    /* don't burn budget for idle VCPU */
+    if ( is_idle_vcpu(svc->vcpu) ) 
+    {
+        return;
+    }
+
+    /* first time called for this svc, update last_start */
+    if ( svc->last_start == 0 ) 
+    {
+        svc->last_start = now;
+        return;
+    }
+
+    /*
+     * update deadline info: when the deadline is in the past,
+     * it needs to be updated to the deadline of the current period,
+     * and the budget replenished
+     */
+    delta = now - svc->cur_deadline;
+    if ( delta >= 0 ) 
+    {
+        count = ( delta/MICROSECS(svc->period) ) + 1;
+        svc->cur_deadline += count * MICROSECS(svc->period);
+        svc->cur_budget = svc->budget * 1000;
+
+        /* TRACE */
+        {
+            struct {
+                unsigned dom:16,vcpu:16;
+                unsigned cur_budget_lo, cur_budget_hi;
+            } d;
+            d.dom = svc->vcpu->domain->domain_id;
+            d.vcpu = svc->vcpu->vcpu_id;
+            d.cur_budget_lo = (unsigned) svc->cur_budget;
+            d.cur_budget_hi = (unsigned) (svc->cur_budget >> 32);
+            trace_var(TRC_RT_BUDGET_REPLENISH, 1,
+                      sizeof(d),
+                      (unsigned char *) &d);
+        }
+
+        return;
+    }
+
+    /* burn at nanoseconds level */
+    delta = now - svc->last_start;
+    /* 
+     * delta < 0 only happens in nested virtualization;
+     * TODO: how should we handle delta < 0 in a better way? */
+    if ( delta < 0 ) 
+    {
+        printk("%s, ATTENTION: now is behind last_start! delta = %ld for ",
+                __func__, delta);
+        rt_dump_vcpu(svc);
+        svc->last_start = now;  /* update last_start */
+        svc->cur_budget = 0;   /* FIXME: should we recover like this? */
+        return;
+    }
+
+    if ( svc->cur_budget == 0 ) 
+        return;
+
+    svc->cur_budget -= delta;
+    if ( svc->cur_budget < 0 ) 
+        svc->cur_budget = 0;
+
+    /* TRACE */
+    {
+        struct {
+            unsigned dom:16, vcpu:16;
+            unsigned cur_budget_lo;
+            unsigned cur_budget_hi;
+            int delta;
+        } d;
+        d.dom = svc->vcpu->domain->domain_id;
+        d.vcpu = svc->vcpu->vcpu_id;
+        d.cur_budget_lo = (unsigned) svc->cur_budget;
+        d.cur_budget_hi = (unsigned) (svc->cur_budget >> 32);
+        d.delta = delta;
+        trace_var(TRC_RT_BUDGET_BURN, 1,
+                  sizeof(d),
+                  (unsigned char *) &d);
+    }
+}
+
+/* 
+ * The RunQ is sorted. Pick the first vcpu within the cpumask; if there is none, return NULL.
+ * The lock is grabbed before calling this function.
+ */
+static struct rt_vcpu *
+__runq_pick(const struct scheduler *ops, cpumask_t mask)
+{
+    struct list_head *runq = RUNQ(ops);
+    struct list_head *iter;
+    struct rt_vcpu *svc = NULL;
+    struct rt_vcpu *iter_svc = NULL;
+    cpumask_t cpu_common;
+    cpumask_t *online;
+    struct rt_private * prv = RT_PRIV(ops);
+
+    list_for_each(iter, runq) 
+    {
+        iter_svc = __runq_elem(iter);
+
+        /* flag vcpu */
+        if(iter_svc->sdom == NULL)
+            break;
+
+        /* mask is intersection of cpu_hard_affinity and cpupool and priv->cpus */
+        online = cpupool_scheduler_cpumask(iter_svc->vcpu->domain->cpupool);
+        cpumask_and(&cpu_common, online, &prv->cpus);
+        cpumask_and(&cpu_common, &cpu_common, iter_svc->vcpu->cpu_hard_affinity);
+        cpumask_and(&cpu_common, &mask, &cpu_common);
+        if ( cpumask_empty(&cpu_common) )
+            continue;
+
+        ASSERT( iter_svc->cur_budget > 0 );
+
+        svc = iter_svc;
+        break;
+    }
+
+    /* TRACE */
+    {
+        if( svc != NULL )
+        {
+            struct {
+                unsigned dom:16, vcpu:16;
+                unsigned cur_deadline_lo, cur_deadline_hi;
+                unsigned cur_budget_lo, cur_budget_hi;
+            } d;
+            d.dom = svc->vcpu->domain->domain_id;
+            d.vcpu = svc->vcpu->vcpu_id;
+            d.cur_deadline_lo = (unsigned) svc->cur_deadline;
+            d.cur_deadline_hi = (unsigned) (svc->cur_deadline >> 32);
+            d.cur_budget_lo = (unsigned) svc->cur_budget;
+            d.cur_budget_hi = (unsigned) (svc->cur_budget >> 32);
+            trace_var(TRC_RT_RUNQ_PICK, 1,
+                      sizeof(d),
+                      (unsigned char *) &d);
+        }
+        else
+            trace_var(TRC_RT_RUNQ_PICK, 1, 0, NULL);
+    }
+
+    return svc;
+}
+
+/*
+ * Update vcpus' budgets and deadlines, and keep the runq sorted by re-inserting each modified vcpu.
+ * The lock is grabbed before calling this function.
+ */
+static void
+__repl_update(const struct scheduler *ops, s_time_t now)
+{
+    struct list_head *runq = RUNQ(ops);
+    struct list_head *iter;
+    struct list_head *tmp;
+    struct rt_vcpu *svc = NULL;
+
+    s_time_t diff;
+    long count;
+
+    list_for_each_safe(iter, tmp, runq) 
+    {
+        svc = __runq_elem(iter);
+
+        /* do not update the flag_vcpu's budget */
+        if(svc->sdom == NULL)
+            continue;
+
+        diff = now - svc->cur_deadline;
+        if ( diff > 0 ) 
+        {
+            count = (diff/MICROSECS(svc->period)) + 1;
+            svc->cur_deadline += count * MICROSECS(svc->period);
+            svc->cur_budget = svc->budget * 1000;
+            __runq_remove(svc);
+            __runq_insert(ops, svc);
+        }
+    }
+}
+
+/* 
+ * schedule function for rt scheduler.
+ * The lock is already grabbed in schedule.c, no need to lock here 
+ */
+static struct task_slice
+rt_schedule(const struct scheduler *ops, s_time_t now, bool_t tasklet_work_scheduled)
+{
+    const int cpu = smp_processor_id();
+    struct rt_private * prv = RT_PRIV(ops);
+    struct rt_vcpu * const scurr = RT_VCPU(current);
+    struct rt_vcpu * snext = NULL;
+    struct task_slice ret = { .migrated = 0 };
+
+    /* clear ticked bit now that we've been scheduled */
+    if ( cpumask_test_cpu(cpu, &prv->tickled) )
+        cpumask_clear_cpu(cpu, &prv->tickled);
+
+    /* burn_budget would return for IDLE VCPU */
+    burn_budgets(ops, scurr, now);
+
+    __repl_update(ops, now);
+
+    if ( tasklet_work_scheduled ) 
+    {
+        snext = RT_VCPU(idle_vcpu[cpu]);
+    } 
+    else 
+    {
+        cpumask_t cur_cpu;
+        cpumask_clear(&cur_cpu);
+        cpumask_set_cpu(cpu, &cur_cpu);
+        snext = __runq_pick(ops, cur_cpu);
+        if ( snext == NULL )
+            snext = RT_VCPU(idle_vcpu[cpu]);
+
+        /* if scurr has higher priority and budget, still pick scurr */
+        if ( !is_idle_vcpu(current) &&
+             vcpu_runnable(current) &&
+             scurr->cur_budget > 0 &&
+             ( is_idle_vcpu(snext->vcpu) ||
+               scurr->cur_deadline <= snext->cur_deadline ) ) 
+            snext = scurr;
+    }
+
+    if ( snext != scurr &&
+         !is_idle_vcpu(current) &&
+         vcpu_runnable(current) )
+        set_bit(__RT_delayed_runq_add, &scurr->flags);
+    
+
+    snext->last_start = now;
+    if ( !is_idle_vcpu(snext->vcpu) ) 
+    {
+        if ( snext != scurr ) 
+        {
+            __runq_remove(snext);
+            set_bit(__RT_scheduled, &snext->flags);
+        }
+        if ( snext->vcpu->processor != cpu ) 
+        {
+            snext->vcpu->processor = cpu;
+            ret.migrated = 1;
+        }
+    }
+
+    ret.time = MILLISECS(1); /* sched quantum */
+    ret.task = snext->vcpu;
+
+    /* TRACE */
+    {
+        struct {
+            unsigned dom:16,vcpu:16;
+            unsigned cur_deadline_lo, cur_deadline_hi;
+            unsigned cur_budget_lo, cur_budget_hi;
+        } d;
+        d.dom = snext->vcpu->domain->domain_id;
+        d.vcpu = snext->vcpu->vcpu_id;
+        d.cur_deadline_lo = (unsigned) snext->cur_deadline;
+        d.cur_deadline_hi = (unsigned) (snext->cur_deadline >> 32);
+        d.cur_budget_lo = (unsigned) snext->cur_budget;
+        d.cur_budget_hi = (unsigned) (snext->cur_budget >> 32);
+        trace_var(TRC_RT_SCHED_TASKLET, 1,
+                  sizeof(d),
+                  (unsigned char *)&d);
+    }
+
+    return ret;
+}
+
+/*
+ * Remove VCPU from RunQ
+ * The lock is already grabbed in schedule.c, no need to lock here 
+ */
+static void
+rt_vcpu_sleep(const struct scheduler *ops, struct vcpu *vc)
+{
+    struct rt_vcpu * const svc = RT_VCPU(vc);
+
+    BUG_ON( is_idle_vcpu(vc) );
+
+    if ( curr_on_cpu(vc->processor) == vc ) 
+        cpu_raise_softirq(vc->processor, SCHEDULE_SOFTIRQ);
+    else if ( __vcpu_on_runq(svc) ) 
+        __runq_remove(svc);
+    else if ( test_bit(__RT_delayed_runq_add, &svc->flags) )
+        clear_bit(__RT_delayed_runq_add, &svc->flags);
+}
+
+/*
+ * Pick a vcpu on a cpu to kick out, to make room for the running candidate.
+ * Called by wake() and context_saved().
+ * We have a running candidate here; the kick logic is:
+ * Among all the cpus that are within the cpu affinity:
+ * 1) if new's cpu is idle, kick it. This benefits cache locality.
+ * 2) if there is any idle cpu, kick it.
+ * 3) now all pcpus are busy; among all the running vcpus, pick the one with
+ *    the lowest priority; if snext has higher priority, kick it.
+ *
+ * TODO:
+ * 1) what if these two vcpus belong to the same domain?
+ *    replacing a vcpu belonging to the same domain introduces more overhead
+ *
+ * lock is grabbed before calling this function 
+ */
+static void
+runq_tickle(const struct scheduler *ops, struct rt_vcpu *new)
+{
+    struct rt_private * prv = RT_PRIV(ops);
+    struct rt_vcpu * latest_deadline_vcpu = NULL;    /* lowest priority scheduled */
+    struct rt_vcpu * iter_svc;
+    struct vcpu * iter_vc;
+    int cpu = 0, cpu_to_tickle = 0;
+    cpumask_t not_tickled;
+    cpumask_t *online;
+
+    if ( new == NULL || is_idle_vcpu(new->vcpu) ) 
+        return;
+
+    online = cpupool_scheduler_cpumask(new->vcpu->domain->cpupool);
+    cpumask_and(&not_tickled, online, &prv->cpus);
+    cpumask_and(&not_tickled, &not_tickled, new->vcpu->cpu_hard_affinity);
+    cpumask_andnot(&not_tickled, &not_tickled, &prv->tickled);
+
+    /* 1) if new's previous cpu is idle, kick it for cache benefit */
+    if ( is_idle_vcpu(curr_on_cpu(new->vcpu->processor)) ) 
+    {
+        cpu_to_tickle = new->vcpu->processor;
+        goto out;
+    }
+
+    /* 2) if there are any idle pcpu, kick it */
+    /* The same loop also find the one with lowest priority */
+    for_each_cpu(cpu, &not_tickled) 
+    {
+        iter_vc = curr_on_cpu(cpu);
+        if ( is_idle_vcpu(iter_vc) ) 
+        {
+            cpu_to_tickle = cpu;
+            goto out;
+        }
+        iter_svc = RT_VCPU(iter_vc);
+        if ( latest_deadline_vcpu == NULL || 
+             iter_svc->cur_deadline > latest_deadline_vcpu->cur_deadline )
+            latest_deadline_vcpu = iter_svc;
+    }
+
+    /* 3) candidate has higher priority, kick out the lowest-priority vcpu */
+    if ( latest_deadline_vcpu != NULL && new->cur_deadline < latest_deadline_vcpu->cur_deadline ) 
+    {
+        cpu_to_tickle = latest_deadline_vcpu->vcpu->processor;
+        goto out;
+    }
+
+out:
+    /* TRACE */ 
+    {
+        struct {
+            unsigned cpu:8, pad:24;
+        } d;
+        d.cpu = cpu_to_tickle;
+        d.pad = 0;
+        trace_var(TRC_RT_TICKLE, 0,
+                  sizeof(d),
+                  (unsigned char *)&d);
+    }
+
+    cpumask_set_cpu(cpu_to_tickle, &prv->tickled);
+    cpu_raise_softirq(cpu_to_tickle, SCHEDULE_SOFTIRQ);
+    return;    
+}
+
+/* 
+ * Should always wake up a runnable vcpu and put it back on the RunQ.
+ * Check priorities to decide whether to tickle a cpu.
+ * The lock is already grabbed in schedule.c, no need to lock here.
+ * TODO: what if these two vcpus belong to the same domain?
+ */
+static void
+rt_vcpu_wake(const struct scheduler *ops, struct vcpu *vc)
+{
+    struct rt_vcpu * const svc = RT_VCPU(vc);
+    s_time_t diff;
+    s_time_t now = NOW();
+    long count = 0;
+    struct rt_private * prv = RT_PRIV(ops);
+    struct rt_vcpu * snext = NULL;        /* highest priority on RunQ */
+
+    BUG_ON( is_idle_vcpu(vc) );
+
+    if ( unlikely(curr_on_cpu(vc->processor) == vc) ) 
+        return;
+
+    /* already on the RunQ, nothing to do */
+    if ( unlikely(__vcpu_on_runq(svc)) ) 
+        return;
+
+    /* If context hasn't been saved for this vcpu yet, we can't put it on
+     * the Runqueue. Instead, we set a flag so that it will be put on the Runqueue
+     * after the context has been saved. */
+    if ( unlikely(test_bit(__RT_scheduled, &svc->flags)) ) 
+    {
+        set_bit(__RT_delayed_runq_add, &svc->flags);
+        return;
+    }
+
+    /* update deadline info */
+    diff = now - svc->cur_deadline;
+    if ( diff >= 0 ) 
+    {
+        count = ( diff/MICROSECS(svc->period) ) + 1;
+        svc->cur_deadline += count * MICROSECS(svc->period);
+        svc->cur_budget = svc->budget * 1000;
+    }
+
+    __runq_insert(ops, svc);
+    __repl_update(ops, now);
+    snext = __runq_pick(ops, prv->cpus);    /* pick snext from ALL valid cpus */
+    runq_tickle(ops, snext);
+
+    return;
+}
+
+/* 
+ * scurr has finished context switch, insert it back to the RunQ,
+ * and then pick the highest priority vcpu from runq to run 
+ */
+static void
+rt_context_saved(const struct scheduler *ops, struct vcpu *vc)
+{
+    struct rt_vcpu * svc = RT_VCPU(vc);
+    struct rt_vcpu * snext = NULL;
+    struct rt_private * prv = RT_PRIV(ops);
+    spinlock_t *lock = vcpu_schedule_lock_irq(vc);
+
+    clear_bit(__RT_scheduled, &svc->flags);
+    /* do not insert the idle vcpu into the runq */
+    if ( is_idle_vcpu(vc) ) 
+        goto out;
+
+    if ( test_and_clear_bit(__RT_delayed_runq_add, &svc->flags) && 
+         likely(vcpu_runnable(vc)) ) 
+    {
+        __runq_insert(ops, svc);
+        __repl_update(ops, NOW());
+        snext = __runq_pick(ops, prv->cpus);    /* pick snext from ALL cpus */
+        runq_tickle(ops, snext);
+    }
+out:
+    vcpu_schedule_unlock_irq(lock, vc);
+}
+
+/*
+ * set/get each vcpu info of each domain
+ */
+static int
+rt_dom_cntl(
+    const struct scheduler *ops, 
+    struct domain *d, 
+    struct xen_domctl_scheduler_op *op)
+{
+    struct rt_dom * const sdom = RT_DOM(d);
+    struct list_head *iter;
+    int vcpu_index = 0;
+    int rc = 0;
+
+    switch ( op->cmd )
+    {
+    case XEN_DOMCTL_SCHEDOP_getnumvcpus:
+        op->u.rt.nr_vcpus = 0;
+        list_for_each( iter, &sdom->vcpu ) 
+            vcpu_index++;
+        op->u.rt.nr_vcpus = vcpu_index;
+        break;
+    case XEN_DOMCTL_SCHEDOP_getinfo:
+        /* for debug use: whenever Dom0's parameters are adjusted, do a global dump */
+        if ( d->domain_id == 0 ) 
+            rt_dump(ops);
+
+        vcpu_index = 0;
+        list_for_each( iter, &sdom->vcpu ) 
+        {
+            xen_domctl_sched_rt_params_t local_sched;
+            struct rt_vcpu * svc = list_entry(iter, struct rt_vcpu, sdom_elem);
+
+            if( vcpu_index >= op->u.rt.nr_vcpus )
+                break;
+
+            local_sched.budget = svc->budget;
+            local_sched.period = svc->period;
+            local_sched.index = vcpu_index;
+            if( copy_to_guest_offset(op->u.rt.vcpu, vcpu_index, &local_sched, 1) )
+            {
+                rc = -EFAULT;
+                break;
+            }
+            vcpu_index++;
+        }
+        break;
+    case XEN_DOMCTL_SCHEDOP_putinfo:
+        list_for_each( iter, &sdom->vcpu ) 
+        {
+            struct rt_vcpu * svc = list_entry(iter, struct rt_vcpu, sdom_elem);
+
+            /* adjust per VCPU parameter */
+            if ( op->u.rt.vcpu_index == svc->vcpu->vcpu_id ) 
+            { 
+                vcpu_index = op->u.rt.vcpu_index;
+
+                if ( vcpu_index < 0 ) 
+                    printk("XEN_DOMCTL_SCHEDOP_putinfo: vcpu_index=%d\n",
+                            vcpu_index);
+                else
+                    printk("XEN_DOMCTL_SCHEDOP_putinfo: "
+                            "vcpu_index=%d, period=%"PRId64", budget=%"PRId64"\n",
+                            vcpu_index, op->u.rt.period, op->u.rt.budget);
+
+                svc->period = op->u.rt.period;
+                svc->budget = op->u.rt.budget;
+
+                break;
+            }
+        }
+        break;
+    }
+
+    return rc;
+}
+
+static struct rt_private _rt_priv;
+
+const struct scheduler sched_rt_def = {
+    .name           = "SMP RT DS Scheduler",
+    .opt_name       = "rt_ds",
+    .sched_id       = XEN_SCHEDULER_RT_DS,
+    .sched_data     = &_rt_priv,
+
+    .dump_cpu_state = rt_dump_pcpu,
+    .dump_settings  = rt_dump,
+    .init           = rt_init,
+    .deinit         = rt_deinit,
+    .alloc_pdata    = rt_alloc_pdata,
+    .free_pdata     = rt_free_pdata,
+    .alloc_domdata  = rt_alloc_domdata,
+    .free_domdata   = rt_free_domdata,
+    .init_domain    = rt_dom_init,
+    .destroy_domain = rt_dom_destroy,
+    .alloc_vdata    = rt_alloc_vdata,
+    .free_vdata     = rt_free_vdata,
+    .insert_vcpu    = rt_vcpu_insert,
+    .remove_vcpu    = rt_vcpu_remove,
+
+    .adjust         = rt_dom_cntl,
+
+    .pick_cpu       = rt_cpu_pick,
+    .do_schedule    = rt_schedule,
+    .sleep          = rt_vcpu_sleep,
+    .wake           = rt_vcpu_wake,
+    .context_saved  = rt_context_saved,
+};
diff --git a/xen/common/schedule.c b/xen/common/schedule.c
index 55503e0..7d2c6d1 100644
--- a/xen/common/schedule.c
+++ b/xen/common/schedule.c
@@ -69,6 +69,7 @@ static const struct scheduler *schedulers[] = {
     &sched_credit_def,
     &sched_credit2_def,
     &sched_arinc653_def,
+    &sched_rt_def,
 };
 
 static struct scheduler __read_mostly ops;
@@ -1090,7 +1091,8 @@ long sched_adjust(struct domain *d, struct xen_domctl_scheduler_op *op)
 
     if ( (op->sched_id != DOM2OP(d)->sched_id) ||
          ((op->cmd != XEN_DOMCTL_SCHEDOP_putinfo) &&
-          (op->cmd != XEN_DOMCTL_SCHEDOP_getinfo)) )
+          (op->cmd != XEN_DOMCTL_SCHEDOP_getinfo) &&
+          (op->cmd != XEN_DOMCTL_SCHEDOP_getnumvcpus)) )
         return -EINVAL;
 
     /* NB: the pluggable scheduler code needs to take care
diff --git a/xen/include/public/domctl.h b/xen/include/public/domctl.h
index 5b11bbf..27d01c1 100644
--- a/xen/include/public/domctl.h
+++ b/xen/include/public/domctl.h
@@ -339,6 +339,19 @@ struct xen_domctl_max_vcpus {
 typedef struct xen_domctl_max_vcpus xen_domctl_max_vcpus_t;
 DEFINE_XEN_GUEST_HANDLE(xen_domctl_max_vcpus_t);
 
+/*
+ * This structure is used to pass rt scheduler parameters from a
+ * privileged domain to Xen
+ */
+struct xen_domctl_sched_rt_params {
+    /* get vcpus' info */
+    uint64_t period; /* s_time_t type */
+    uint64_t budget;
+    uint16_t index;
+    uint16_t padding[3];
+};
+typedef struct xen_domctl_sched_rt_params xen_domctl_sched_rt_params_t;
+DEFINE_XEN_GUEST_HANDLE(xen_domctl_sched_rt_params_t);
 
 /* XEN_DOMCTL_scheduler_op */
 /* Scheduler types. */
@@ -346,9 +359,12 @@ DEFINE_XEN_GUEST_HANDLE(xen_domctl_max_vcpus_t);
 #define XEN_SCHEDULER_CREDIT   5
 #define XEN_SCHEDULER_CREDIT2  6
 #define XEN_SCHEDULER_ARINC653 7
+#define XEN_SCHEDULER_RT_DS    8
+
 /* Set or get info? */
-#define XEN_DOMCTL_SCHEDOP_putinfo 0
-#define XEN_DOMCTL_SCHEDOP_getinfo 1
+#define XEN_DOMCTL_SCHEDOP_putinfo      0
+#define XEN_DOMCTL_SCHEDOP_getinfo      1
+#define XEN_DOMCTL_SCHEDOP_getnumvcpus   2
 struct xen_domctl_scheduler_op {
     uint32_t sched_id;  /* XEN_SCHEDULER_* */
     uint32_t cmd;       /* XEN_DOMCTL_SCHEDOP_* */
@@ -367,6 +383,16 @@ struct xen_domctl_scheduler_op {
         struct xen_domctl_sched_credit2 {
             uint16_t weight;
         } credit2;
+        struct xen_domctl_sched_rt{
+            /* get vcpus' params */
+            XEN_GUEST_HANDLE_64(xen_domctl_sched_rt_params_t) vcpu;
+            uint16_t nr_vcpus;
+            /* set one vcpu's params */
+            uint16_t vcpu_index;
+            uint16_t padding[2];
+            uint64_t period;
+            uint64_t budget;
+        } rt;
     } u;
 };
 typedef struct xen_domctl_scheduler_op xen_domctl_scheduler_op_t;
diff --git a/xen/include/public/trace.h b/xen/include/public/trace.h
index cfcf4aa..87340c4 100644
--- a/xen/include/public/trace.h
+++ b/xen/include/public/trace.h
@@ -77,6 +77,7 @@
 #define TRC_SCHED_CSCHED2  1
 #define TRC_SCHED_SEDF     2
 #define TRC_SCHED_ARINC653 3
+#define TRC_SCHED_RT       4
 
 /* Per-scheduler tracing */
 #define TRC_SCHED_CLASS_EVT(_c, _e) \
diff --git a/xen/include/xen/sched-if.h b/xen/include/xen/sched-if.h
index 4164dff..bcbe234 100644
--- a/xen/include/xen/sched-if.h
+++ b/xen/include/xen/sched-if.h
@@ -169,6 +169,7 @@ extern const struct scheduler sched_sedf_def;
 extern const struct scheduler sched_credit_def;
 extern const struct scheduler sched_credit2_def;
 extern const struct scheduler sched_arinc653_def;
+extern const struct scheduler sched_rt_def;
 
 
 struct cpupool
-- 
1.7.9.5


* [PATCH v1 2/4] libxc: add rt scheduler
  2014-08-24 22:58 Introduce rt real-time scheduler for Xen Meng Xu
  2014-08-24 22:58 ` [PATCH v1 1/4] xen: add real time scheduler rt Meng Xu
@ 2014-08-24 22:58 ` Meng Xu
  2014-09-05 10:34   ` Dario Faggioli
  2014-08-24 22:58 ` [PATCH v1 3/4] libxl: " Meng Xu
  2014-08-24 22:58 ` [PATCH v1 4/4] xl: introduce " Meng Xu
  3 siblings, 1 reply; 72+ messages in thread
From: Meng Xu @ 2014-08-24 22:58 UTC (permalink / raw)
  To: xen-devel
  Cc: ian.campbell, xisisu, stefano.stabellini, george.dunlap,
	dario.faggioli, ian.jackson, xumengpanda, Meng Xu, JBeulich,
	chaowang, lichong659, dgolomb

Add xc_sched_rt_* functions to interact with Xen to set/get a domain's
parameters for the rt scheduler.
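
As a rough illustration (not part of this patch; error handling is simplified and the helper name adjust_vcpu0_period is made up), a caller could use these functions along the following lines to read a domain's per-VCPU parameters and then update one VCPU:

#include <stdlib.h>
#include <xenctrl.h>

/* Sketch: double vcpu 0's period for a domain using the new interfaces. */
static int adjust_vcpu0_period(xc_interface *xch, uint32_t domid)
{
    struct xen_domctl_sched_rt_params *params;
    uint16_t nr_vcpus;
    int rc;

    if ( xc_sched_rt_domain_get_num_vcpus(xch, domid, &nr_vcpus) )
        return -1;

    params = calloc(nr_vcpus, sizeof(*params));
    if ( params == NULL )
        return -1;

    rc = xc_sched_rt_domain_get(xch, domid, params, nr_vcpus);
    if ( rc == 0 )
    {
        params[0].period *= 2;   /* .index selects which vcpu is updated */
        rc = xc_sched_rt_domain_set(xch, domid, &params[0]);
    }

    free(params);
    return rc;
}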

Signed-off-by: Sisu Xi <xisisu@gmail.com>
Signed-off-by: Meng Xu <mengxu@cis.upenn.edu>
---
 tools/libxc/Makefile  |    1 +
 tools/libxc/xc_rt.c   |   90 +++++++++++++++++++++++++++++++++++++++++++++++++
 tools/libxc/xenctrl.h |   12 +++++++
 3 files changed, 103 insertions(+)
 create mode 100644 tools/libxc/xc_rt.c

diff --git a/tools/libxc/Makefile b/tools/libxc/Makefile
index 22eef8e..c2b02a4 100644
--- a/tools/libxc/Makefile
+++ b/tools/libxc/Makefile
@@ -20,6 +20,7 @@ CTRL_SRCS-y       += xc_sedf.c
 CTRL_SRCS-y       += xc_csched.c
 CTRL_SRCS-y       += xc_csched2.c
 CTRL_SRCS-y       += xc_arinc653.c
+CTRL_SRCS-y       += xc_rt.c
 CTRL_SRCS-y       += xc_tbuf.c
 CTRL_SRCS-y       += xc_pm.c
 CTRL_SRCS-y       += xc_cpu_hotplug.c
diff --git a/tools/libxc/xc_rt.c b/tools/libxc/xc_rt.c
new file mode 100644
index 0000000..e2ddda5
--- /dev/null
+++ b/tools/libxc/xc_rt.c
@@ -0,0 +1,90 @@
+/****************************************************************************
+ *
+ *        File: xc_rt.c
+ *      Author: Sisu Xi 
+ *              Meng Xu
+ *
+ * Description: XC Interface to the rt scheduler
+ *
+ * This library is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU Lesser General Public
+ * License as published by the Free Software Foundation;
+ * version 2.1 of the License.
+ *
+ * This library is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+ * Lesser General Public License for more details.
+ *
+ * You should have received a copy of the GNU Lesser General Public
+ * License along with this library; if not, write to the Free Software
+ * Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA  02110-1301  USA
+ */
+
+#include "xc_private.h"
+
+int xc_sched_rt_domain_set(xc_interface *xch,
+                           uint32_t domid,
+                           struct xen_domctl_sched_rt_params *sdom)
+{
+    int rc;
+    DECLARE_DOMCTL;
+
+    domctl.cmd = XEN_DOMCTL_scheduler_op;
+    domctl.domain = (domid_t) domid;
+    domctl.u.scheduler_op.sched_id = XEN_SCHEDULER_RT_DS;
+    domctl.u.scheduler_op.cmd = XEN_DOMCTL_SCHEDOP_putinfo;
+    domctl.u.scheduler_op.u.rt.vcpu_index = sdom->index;
+    domctl.u.scheduler_op.u.rt.period = sdom->period;
+    domctl.u.scheduler_op.u.rt.budget = sdom->budget;
+
+    rc = do_domctl(xch, &domctl);
+
+    return rc;
+}
+
+int xc_sched_rt_domain_get(xc_interface *xch,
+                           uint32_t domid,
+                           struct xen_domctl_sched_rt_params *sdom,
+                           uint16_t num_vcpus)
+{
+    int rc;
+    DECLARE_DOMCTL;
+    DECLARE_HYPERCALL_BOUNCE(sdom, 
+        sizeof(*sdom) * num_vcpus, 
+        XC_HYPERCALL_BUFFER_BOUNCE_OUT);
+
+    if ( xc_hypercall_bounce_pre(xch, sdom) )
+        return -1;
+
+    domctl.cmd = XEN_DOMCTL_scheduler_op;
+    domctl.domain = (domid_t) domid;
+    domctl.u.scheduler_op.sched_id = XEN_SCHEDULER_RT_DS;
+    domctl.u.scheduler_op.cmd = XEN_DOMCTL_SCHEDOP_getinfo;
+    domctl.u.scheduler_op.u.rt.nr_vcpus = num_vcpus;
+    set_xen_guest_handle(domctl.u.scheduler_op.u.rt.vcpu, sdom);
+
+    rc = do_domctl(xch, &domctl);
+
+    xc_hypercall_bounce_post(xch, sdom);
+
+    return rc;
+}
+
+int xc_sched_rt_domain_get_num_vcpus(xc_interface *xch,
+                                     uint32_t domid,
+                                     uint16_t *num_vcpus)
+{
+    int rc;
+    DECLARE_DOMCTL;
+
+    domctl.cmd = XEN_DOMCTL_scheduler_op;
+    domctl.domain = (domid_t) domid;
+    domctl.u.scheduler_op.sched_id = XEN_SCHEDULER_RT_DS;
+    domctl.u.scheduler_op.cmd = XEN_DOMCTL_SCHEDOP_getnumvcpus;
+
+    rc = do_domctl(xch, &domctl);
+
+    *num_vcpus = domctl.u.scheduler_op.u.rt.nr_vcpus;
+    return rc;
+}
diff --git a/tools/libxc/xenctrl.h b/tools/libxc/xenctrl.h
index 1c5d0db..fd066cc 100644
--- a/tools/libxc/xenctrl.h
+++ b/tools/libxc/xenctrl.h
@@ -875,6 +875,18 @@ int xc_sched_credit2_domain_get(xc_interface *xch,
                                uint32_t domid,
                                struct xen_domctl_sched_credit2 *sdom);
 
+int xc_sched_rt_domain_set(xc_interface *xch,
+                          uint32_t domid,
+                          struct xen_domctl_sched_rt_params *sdom);
+int xc_sched_rt_domain_get(xc_interface *xch,
+                          uint32_t domid,
+                          struct xen_domctl_sched_rt_params *sdom,
+                          uint16_t num_vcpus);
+
+int xc_sched_rt_domain_get_num_vcpus(xc_interface *xch,
+                                    uint32_t domid,
+                                    uint16_t *num_vcpus);
+
 int
 xc_sched_arinc653_schedule_set(
     xc_interface *xch,
-- 
1.7.9.5


* [PATCH v1 3/4] libxl: add rt scheduler
  2014-08-24 22:58 Introduce rt real-time scheduler for Xen Meng Xu
  2014-08-24 22:58 ` [PATCH v1 1/4] xen: add real time scheduler rt Meng Xu
  2014-08-24 22:58 ` [PATCH v1 2/4] libxc: add rt scheduler Meng Xu
@ 2014-08-24 22:58 ` Meng Xu
  2014-08-25 13:17   ` Wei Liu
                     ` (2 more replies)
  2014-08-24 22:58 ` [PATCH v1 4/4] xl: introduce " Meng Xu
  3 siblings, 3 replies; 72+ messages in thread
From: Meng Xu @ 2014-08-24 22:58 UTC (permalink / raw)
  To: xen-devel
  Cc: ian.campbell, xisisu, stefano.stabellini, george.dunlap,
	dario.faggioli, ian.jackson, xumengpanda, Meng Xu, JBeulich,
	chaowang, lichong659, dgolomb

Add libxl functions to set/get a domain's parameters for the rt scheduler.
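
For illustration only (a sketch of how a caller might use the new rt fields; not code from this patch, and the helper name set_vcpu1_params is made up), setting one VCPU's period and budget through libxl could look roughly like this, assuming the usual libxl init/dispose helpers for the type:

#include <libxl.h>

/* Sketch: set period/budget of vcpu 1 of a domain (values in the same
 * units as the xl examples in the cover letter). */
static int set_vcpu1_params(libxl_ctx *ctx, uint32_t domid)
{
    libxl_domain_sched_params p;
    int rc;

    libxl_domain_sched_params_init(&p);
    p.sched = LIBXL_SCHEDULER_RT;
    p.rt.vcpu_index = 1;
    p.rt.period = 20000;
    p.rt.budget = 10000;

    rc = libxl_domain_sched_params_set(ctx, domid, &p);
    libxl_domain_sched_params_dispose(&p);
    return rc;
}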

Signed-off-by: Sisu Xi <xisisu@gmail.com>
Signed-off-by: Meng Xu <mengxu@cis.upenn.edu>
---
 tools/libxl/libxl.c         |  139 +++++++++++++++++++++++++++++++++++++++++++
 tools/libxl/libxl.h         |    7 +++
 tools/libxl/libxl_types.idl |   15 +++++
 3 files changed, 161 insertions(+)

diff --git a/tools/libxl/libxl.c b/tools/libxl/libxl.c
index 3526539..440e8df31 100644
--- a/tools/libxl/libxl.c
+++ b/tools/libxl/libxl.c
@@ -5154,6 +5154,139 @@ static int sched_sedf_domain_set(libxl__gc *gc, uint32_t domid,
     return 0;
 }
 
+static int sched_rt_domain_get(libxl__gc *gc, uint32_t domid,
+                               libxl_domain_sched_params *scinfo)
+{
+    struct xen_domctl_sched_rt_params* sdom;
+    uint16_t num_vcpus;
+    int rc, i;
+
+    rc = xc_sched_rt_domain_get_num_vcpus(CTX->xch, domid, &num_vcpus);
+    if (rc != 0) {
+        LOGE(ERROR, "getting num_vcpus of domain sched rt");
+        return ERROR_FAIL;
+    }
+    
+    /* FIXME: can malloc be used in libxl? seems it was used in the file */
+    sdom = (struct xen_domctl_sched_rt_params *)
+            malloc( sizeof(struct xen_domctl_sched_rt_params) * num_vcpus );
+    if ( !sdom ){
+        LOGE(ERROR, "Allocate sdom array fails\n");
+        return ERROR_INVAL;
+    }
+
+    rc = xc_sched_rt_domain_get(CTX->xch, domid, sdom, num_vcpus);
+    if (rc != 0) {
+        LOGE(ERROR, "getting domain sched rt");
+        return ERROR_FAIL;
+    }
+
+    /* FIXME: how to guarantee libxl_*_dispose be called exactly once? */
+    libxl_domain_sched_params_init(scinfo);
+    
+    scinfo->rt.num_vcpus = num_vcpus;
+    scinfo->sched = LIBXL_SCHEDULER_RT;
+    /* FIXME: can malloc be used in libxl? seems it was used in the file */
+    scinfo->rt.vcpus = (libxl_vcpu *)
+                       malloc( sizeof(libxl_vcpu) * scinfo->rt.num_vcpus );
+    if ( !scinfo->rt.vcpus ){
+        LOGE(ERROR, "Allocate lib_vcpu array fails\n");
+        return ERROR_INVAL;
+    }
+    for( i = 0; i < num_vcpus; ++i)
+    {
+        scinfo->rt.vcpus[i].period = sdom[i].period;
+        scinfo->rt.vcpus[i].budget = sdom[i].budget;
+        scinfo->rt.vcpus[i].index = sdom[i].index;
+    }
+    
+    free(sdom);
+    return 0;
+}
+
+#define SCHED_RT_VCPU_PERIOD_MAX    31536000000000 /* one year in microsecond*/
+#define SCHED_RT_VCPU_BUDGET_MAX    SCHED_RT_VCPU_PERIOD_MAX
+
+/*
+ * Sanity check of the scinfo parameters
+ * return 0 if all values are valid
+ * return 1 if one param is default value
+ * return 2 if the target vcpu's index, period or budget is out of range
+ */
+static int sched_rt_domain_set_validate_params(libxl__gc *gc,
+                                               const libxl_domain_sched_params *scinfo,
+                                               const uint16_t num_vcpus)
+{
+    int vcpu_index = scinfo->rt.vcpu_index;
+
+    if (vcpu_index == LIBXL_DOMAIN_SCHED_PARAM_VCPU_DEFAULT ||
+        scinfo->rt.period == LIBXL_DOMAIN_SCHED_PARAM_PERIOD_DEFAULT ||
+        scinfo->rt.budget == LIBXL_DOMAIN_SCHED_PARAM_BUDGET_DEFAULT)
+    {
+        return 1;
+    }
+
+    if (vcpu_index < 0 || vcpu_index > num_vcpus)
+    {
+        LOG(ERROR, "VCPU index is not set or out of range, "
+                    "valid values are within range from 0 to %d", num_vcpus);
+        return 2;
+    }
+
+    if (scinfo->rt.period < 1 ||
+        scinfo->rt.period > SCHED_RT_VCPU_PERIOD_MAX)
+    {
+        LOG(ERROR, "VCPU period is not set or out of range, "
+                    "valid values are within range from 0 to %lu", SCHED_RT_VCPU_PERIOD_MAX);
+        return 2;
+    }
+
+    if (scinfo->rt.budget < 1 ||
+        scinfo->rt.budget > SCHED_RT_VCPU_BUDGET_MAX)
+    {
+        LOG(ERROR, "VCPU budget is not set or out of range, "
+                    "valid values are within range from 0 to %lu", SCHED_RT_VCPU_BUDGET_MAX);
+        return 2;
+    }
+
+    return 0;
+
+}
+
+static int sched_rt_domain_set(libxl__gc *gc, uint32_t domid,
+                               const libxl_domain_sched_params *scinfo)
+{
+    struct xen_domctl_sched_rt_params sdom;
+    uint16_t num_vcpus;
+    int rc;
+ 
+    rc = xc_sched_rt_domain_get_num_vcpus(CTX->xch, domid, &num_vcpus);
+    if (rc != 0) {
+        LOGE(ERROR, "getting domain sched rt");
+        return ERROR_FAIL;
+    }
+    
+    rc = sched_rt_domain_set_validate_params(gc, scinfo, num_vcpus);
+    if (rc == 2)
+        return ERROR_INVAL;
+    if (rc == 1)
+        return 0;
+    if (rc == 0)
+    {
+        sdom.index = scinfo->rt.vcpu_index;
+        sdom.period = scinfo->rt.period;
+        sdom.budget = scinfo->rt.budget;
+    }
+
+    rc = xc_sched_rt_domain_set(CTX->xch, domid, &sdom);
+    if ( rc < 0 ) {
+        LOGE(ERROR, "setting domain sched rt");
+        return ERROR_FAIL;
+    }
+
+    return 0;
+}
+
 int libxl_domain_sched_params_set(libxl_ctx *ctx, uint32_t domid,
                                   const libxl_domain_sched_params *scinfo)
 {
@@ -5177,6 +5310,9 @@ int libxl_domain_sched_params_set(libxl_ctx *ctx, uint32_t domid,
     case LIBXL_SCHEDULER_ARINC653:
         ret=sched_arinc653_domain_set(gc, domid, scinfo);
         break;
+    case LIBXL_SCHEDULER_RT:
+        ret=sched_rt_domain_set(gc, domid, scinfo);
+        break;
     default:
         LOG(ERROR, "Unknown scheduler");
         ret=ERROR_INVAL;
@@ -5207,6 +5343,9 @@ int libxl_domain_sched_params_get(libxl_ctx *ctx, uint32_t domid,
     case LIBXL_SCHEDULER_CREDIT2:
         ret=sched_credit2_domain_get(gc, domid, scinfo);
         break;
+    case LIBXL_SCHEDULER_RT:
+        ret=sched_rt_domain_get(gc, domid, scinfo);
+        break;
     default:
         LOG(ERROR, "Unknown scheduler");
         ret=ERROR_INVAL;
diff --git a/tools/libxl/libxl.h b/tools/libxl/libxl.h
index bfeb3bc..4657056 100644
--- a/tools/libxl/libxl.h
+++ b/tools/libxl/libxl.h
@@ -1240,6 +1240,13 @@ int libxl_sched_credit_params_set(libxl_ctx *ctx, uint32_t poolid,
 #define LIBXL_DOMAIN_SCHED_PARAM_LATENCY_DEFAULT   -1
 #define LIBXL_DOMAIN_SCHED_PARAM_EXTRATIME_DEFAULT -1
 
+#define LIBXL_DOMAIN_SCHED_PARAM_BUDGET_DEFAULT     -1
+#define LIBXL_DOMAIN_SCHED_PARAM_VCPU_DEFAULT       -1
+#define LIBXL_DOMAIN_SCHED_PARAM_NUM_VCPUS_DEFAULT  -1
+#define LIBXL_DOMAIN_SCHED_PARAM_VCPU_INDEX_DEFAULT -1
+/* Consistent with XEN_LEGACY_MAX_VCPUS xen/arch-x86/xen.h*/
+#define LIBXL_XEN_LEGACY_MAX_VCPUS                  32 
+
 int libxl_domain_sched_params_get(libxl_ctx *ctx, uint32_t domid,
                                   libxl_domain_sched_params *params);
 int libxl_domain_sched_params_set(libxl_ctx *ctx, uint32_t domid,
diff --git a/tools/libxl/libxl_types.idl b/tools/libxl/libxl_types.idl
index 0b3496f..c33a776 100644
--- a/tools/libxl/libxl_types.idl
+++ b/tools/libxl/libxl_types.idl
@@ -153,6 +153,7 @@ libxl_scheduler = Enumeration("scheduler", [
     (5, "credit"),
     (6, "credit2"),
     (7, "arinc653"),
+    (8, "rt"),
     ])
 
 # Consistent with SHUTDOWN_* in sched.h (apart from UNKNOWN)
@@ -303,6 +304,19 @@ libxl_domain_restore_params = Struct("domain_restore_params", [
     ("checkpointed_stream", integer),
     ])
 
+libxl_rt_vcpu = Struct("vcpu",[
+    ("period",       uint64, {'init_val': 'LIBXL_DOMAIN_SCHED_PARAM_PERIOD_DEFAULT'}),
+    ("budget",       uint64, {'init_val': 'LIBXL_DOMAIN_SCHED_PARAM_BUDGET_DEFAULT'}),
+    ("index",        integer, {'init_val': 'LIBXL_DOMAIN_SCHED_PARAM_VCPU_INDEX_DEFAULT'}),
+    ])
+
+libxl_domain_sched_rt_params = Struct("domain_sched_rt_params",[
+    ("vcpus",        Array(libxl_rt_vcpu, "num_vcpus")),
+    ("period",       uint64, {'init_val': 'LIBXL_DOMAIN_SCHED_PARAM_PERIOD_DEFAULT'}),
+    ("budget",       uint64, {'init_val': 'LIBXL_DOMAIN_SCHED_PARAM_BUDGET_DEFAULT'}),
+    ("vcpu_index",   integer, {'init_val': 'LIBXL_DOMAIN_SCHED_PARAM_VCPU_INDEX_DEFAULT'}),
+    ])
+
 libxl_domain_sched_params = Struct("domain_sched_params",[
     ("sched",        libxl_scheduler),
     ("weight",       integer, {'init_val': 'LIBXL_DOMAIN_SCHED_PARAM_WEIGHT_DEFAULT'}),
@@ -311,6 +325,7 @@ libxl_domain_sched_params = Struct("domain_sched_params",[
     ("slice",        integer, {'init_val': 'LIBXL_DOMAIN_SCHED_PARAM_SLICE_DEFAULT'}),
     ("latency",      integer, {'init_val': 'LIBXL_DOMAIN_SCHED_PARAM_LATENCY_DEFAULT'}),
     ("extratime",    integer, {'init_val': 'LIBXL_DOMAIN_SCHED_PARAM_EXTRATIME_DEFAULT'}),
+    ("rt",           libxl_domain_sched_rt_params),
     ])
 
 libxl_domain_build_info = Struct("domain_build_info",[
-- 
1.7.9.5

^ permalink raw reply related	[flat|nested] 72+ messages in thread

* [PATCH v1 4/4] xl: introduce rt scheduler
  2014-08-24 22:58 Introduce rt real-time scheduler for Xen Meng Xu
                   ` (2 preceding siblings ...)
  2014-08-24 22:58 ` [PATCH v1 3/4] libxl: " Meng Xu
@ 2014-08-24 22:58 ` Meng Xu
  2014-08-25 13:31   ` Wei Liu
  2014-09-03 15:52   ` George Dunlap
  3 siblings, 2 replies; 72+ messages in thread
From: Meng Xu @ 2014-08-24 22:58 UTC (permalink / raw)
  To: xen-devel
  Cc: ian.campbell, xisisu, stefano.stabellini, george.dunlap,
	dario.faggioli, ian.jackson, xumengpanda, Meng Xu, JBeulich,
	chaowang, lichong659, dgolomb

Add xl command for rt scheduler

Signed-off-by: Sisu Xi <xisisu@gmail.com>
Signed-off-by: Meng Xu <mengxu@cis.upenn.edu>
---
 docs/man/xl.pod.1         |   40 ++++++++++++++
 tools/libxl/xl.h          |    1 +
 tools/libxl/xl_cmdimpl.c  |  131 +++++++++++++++++++++++++++++++++++++++++++++
 tools/libxl/xl_cmdtable.c |    9 ++++
 4 files changed, 181 insertions(+)

diff --git a/docs/man/xl.pod.1 b/docs/man/xl.pod.1
index 9d1c2a5..bd26447 100644
--- a/docs/man/xl.pod.1
+++ b/docs/man/xl.pod.1
@@ -1035,6 +1035,46 @@ Restrict output to domains in the specified cpupool.
 
 =back
 
+=item B<sched-rt> [I<OPTIONS>]
+
+Set or get rt (Real Time) scheduler parameters. This rt scheduler applies the
+Preemptive Global Earliest Deadline First real-time scheduling algorithm to
+schedule VCPUs in the system. Each VCPU has a dedicated period and budget.
+While scheduled, a VCPU burns its budget.
+A VCPU has its budget replenished at the beginning of each of its periods;
+the VCPU discards its unused budget at the end of each of its periods.
+
+B<OPTIONS>
+
+=over 4
+
+=item B<-d DOMAIN>, B<--domain=DOMAIN>
+
+Specify domain for which scheduler parameters are to be modified or retrieved.
+Mandatory for modifying scheduler parameters.
+
+=item B<-v VCPU>, B<--vcpu=VCPU>
+
+Specify the index of the VCPU whose parameters will be set.
+A domain can have multiple VCPUs, and each VCPU has a unique index within its
+domain; when setting a domain's parameters, the parameters of each VCPU of
+that domain need to be set.
+
+=item B<-p PERIOD>, B<--period=PERIOD>
+
+A VCPU replenishes its budget in every period. Time unit is millisecond.
+
+=item B<-b BUDGET>, B<--budget=BUDGET>
+
+A VCPU has BUDGET amount of time to run for each period. 
+Time unit is millisecond.
+
+=item B<-c CPUPOOL>, B<--cpupool=CPUPOOL>
+
+Restrict output to domains in the specified cpupool.
+
+=back
+
 =back
 
 =head1 CPUPOOLS COMMANDS
diff --git a/tools/libxl/xl.h b/tools/libxl/xl.h
index 10a2e66..51b634a 100644
--- a/tools/libxl/xl.h
+++ b/tools/libxl/xl.h
@@ -67,6 +67,7 @@ int main_memset(int argc, char **argv);
 int main_sched_credit(int argc, char **argv);
 int main_sched_credit2(int argc, char **argv);
 int main_sched_sedf(int argc, char **argv);
+int main_sched_rt(int argc, char **argv);
 int main_domid(int argc, char **argv);
 int main_domname(int argc, char **argv);
 int main_rename(int argc, char **argv);
diff --git a/tools/libxl/xl_cmdimpl.c b/tools/libxl/xl_cmdimpl.c
index f1c136a..22f7f9a 100644
--- a/tools/libxl/xl_cmdimpl.c
+++ b/tools/libxl/xl_cmdimpl.c
@@ -5175,6 +5175,52 @@ static int sched_sedf_domain_output(
     return 0;
 }
 
+
+static int sched_rt_domain_output(
+    int domid)
+{
+    char *domname;
+    libxl_domain_sched_params scinfo;
+    int rc = 0, i;
+
+    if (domid < 0) {
+        printf("%-33s %4s %4s %9s %9s\n", "Name", "ID", "VCPU", "Period", "Budget");
+        return 0;
+    }
+
+    libxl_domain_sched_params_init(&scinfo);
+    rc = sched_domain_get(LIBXL_SCHEDULER_RT, domid, &scinfo);
+    if (rc)
+        goto out;
+
+    domname = libxl_domid_to_name(ctx, domid);
+    for( i = 0; i < scinfo.rt.num_vcpus; i++ )
+    {
+        printf("%-33s %4d %4d %9"PRIu64" %9"PRIu64"\n",
+            domname,
+            domid,
+            scinfo.rt.vcpus[i].index,
+            scinfo.rt.vcpus[i].period,
+            scinfo.rt.vcpus[i].budget);
+    }
+    free(domname);
+
+out:
+    libxl_domain_sched_params_dispose(&scinfo);
+    return rc;
+}
+
+static int sched_rt_pool_output(uint32_t poolid)
+{
+    char *poolname;
+
+    poolname = libxl_cpupoolid_to_name(ctx, poolid);
+    printf("Cpupool %s: sched=EDF\n", poolname);
+
+    free(poolname);
+    return 0;
+}
+
 static int sched_default_pool_output(uint32_t poolid)
 {
     char *poolname;
@@ -5542,6 +5588,91 @@ int main_sched_sedf(int argc, char **argv)
     return 0;
 }
 
+/*
+ * <nothing>            : List all domain paramters and sched params
+ * -d [domid]           : List domain params for domain
+ * -d [domid] [params]  : Set domain params for domain 
+ */
+int main_sched_rt(int argc, char **argv)
+{
+    const char *dom = NULL;
+    const char *cpupool = NULL;
+    int period = 10, opt_p = 0;
+    int budget = 4, opt_b = 0;
+    int vcpu_index = 0, opt_v = 0;
+    int opt, rc;
+    static struct option opts[] = {
+        {"domain", 1, 0, 'd'},
+        {"period", 1, 0, 'p'},
+        {"budget", 1, 0, 'b'},
+        {"vcpu", 1, 0, 'v'},
+        {"cpupool", 1, 0, 'c'},
+        COMMON_LONG_OPTS,
+        {0, 0, 0, 0}
+    };
+
+    SWITCH_FOREACH_OPT(opt, "d:p:b:v:c:h", opts, "sched-rt", 0) {
+    case 'd':
+        dom = optarg;
+        break;
+    case 'p':
+        period = strtol(optarg, NULL, 10);
+        opt_p = 1;
+        break;
+    case 'b':
+        budget = strtol(optarg, NULL, 10);
+        opt_b = 1;
+        break;
+    case 'v':
+        vcpu_index = strtol(optarg, NULL, 10);
+        opt_v = 1;
+        break;
+    case 'c':
+        cpupool = optarg;
+        break;
+    }
+
+    if (cpupool && (dom || opt_p || opt_b || opt_v)) {
+        fprintf(stderr, "Specifying a cpupool is not allowed with other options.\n");
+        return 1;
+    }
+    if (!dom && (opt_p || opt_b || opt_v)) {
+        fprintf(stderr, "Must specify a domain.\n");
+        return 1;
+    }
+    if ( (opt_v || opt_p || opt_b) && (opt_p + opt_b + opt_v != 3) ) {
+        fprintf(stderr, "Must specify vcpu, period, budget\n");
+        return 1;
+    }
+    
+    if (!dom) { /* list all domain's rt scheduler info */
+        return -sched_domain_output(LIBXL_SCHEDULER_RT,
+                                    sched_rt_domain_output,
+                                    sched_rt_pool_output,
+                                    cpupool);
+    } else {
+        uint32_t domid = find_domain(dom);
+        if (!opt_p && !opt_b && !opt_v) { /* output rt scheduler info */
+            sched_rt_domain_output(-1);
+            return -sched_rt_domain_output(domid);
+        } else { /* set rt scheduler paramaters */
+            libxl_domain_sched_params scinfo;
+            libxl_domain_sched_params_init(&scinfo);
+            scinfo.sched = LIBXL_SCHEDULER_RT;
+            scinfo.rt.vcpu_index = vcpu_index;
+            scinfo.rt.period = period;
+            scinfo.rt.budget = budget;
+
+            rc = sched_domain_set(domid, &scinfo);
+            libxl_domain_sched_params_dispose(&scinfo);
+            if (rc)
+                return -rc;
+        }
+    }
+
+    return 0;
+}
+
 int main_domid(int argc, char **argv)
 {
     uint32_t domid;
diff --git a/tools/libxl/xl_cmdtable.c b/tools/libxl/xl_cmdtable.c
index 7b7fa92..de4e954 100644
--- a/tools/libxl/xl_cmdtable.c
+++ b/tools/libxl/xl_cmdtable.c
@@ -277,6 +277,15 @@ struct cmd_spec cmd_table[] = {
       "                               --period/--slice)\n"
       "-c CPUPOOL, --cpupool=CPUPOOL  Restrict output to CPUPOOL"
     },
+    { "sched-rt",
+      &main_sched_rt, 0, 1,
+      "Get/set rt scheduler parameters",
+      "[-d <Domain> [-v[=VCPU]] [-p[=PERIOD]] [-b[=BUDGET]]]",
+      "-d DOMAIN, --domain=DOMAIN     Domain to modify\n"
+      "-v VCPU,   --vcpu=VCPU         VCPU\n"
+      "-p PERIOD, --period=PERIOD     Period (us)\n"
+      "-b BUDGET, --budget=BUDGET     Budget (us)\n"
+    },
     { "domid",
       &main_domid, 0, 0,
       "Convert a domain name to domain id",
-- 
1.7.9.5

^ permalink raw reply related	[flat|nested] 72+ messages in thread

* Re: [PATCH v1 3/4] libxl: add rt scheduler
  2014-08-24 22:58 ` [PATCH v1 3/4] libxl: " Meng Xu
@ 2014-08-25 13:17   ` Wei Liu
  2014-08-25 15:55     ` Meng Xu
  2014-09-03 15:33   ` George Dunlap
  2014-09-05 10:21   ` Dario Faggioli
  2 siblings, 1 reply; 72+ messages in thread
From: Wei Liu @ 2014-08-25 13:17 UTC (permalink / raw)
  To: Meng Xu
  Cc: wei.liu2, ian.campbell, xisisu, stefano.stabellini,
	george.dunlap, dario.faggioli, ian.jackson, xen-devel,
	xumengpanda, JBeulich, chaowang, lichong659, dgolomb

On Sun, Aug 24, 2014 at 06:58:44PM -0400, Meng Xu wrote:
> Add libxl functions to set/get domain's parameters for rt scheduler
> 
> Signed-off-by: Sisu Xi <xisisu@gmail.com>
> Signed-off-by: Meng Xu <mengxu@cis.upenn.edu>
> ---
>  tools/libxl/libxl.c         |  139 +++++++++++++++++++++++++++++++++++++++++++
>  tools/libxl/libxl.h         |    7 +++
>  tools/libxl/libxl_types.idl |   15 +++++
>  3 files changed, 161 insertions(+)
> 
> diff --git a/tools/libxl/libxl.c b/tools/libxl/libxl.c
> index 3526539..440e8df31 100644
> --- a/tools/libxl/libxl.c
> +++ b/tools/libxl/libxl.c
> @@ -5154,6 +5154,139 @@ static int sched_sedf_domain_set(libxl__gc *gc, uint32_t domid,
>      return 0;
>  }
>  
> +static int sched_rt_domain_get(libxl__gc *gc, uint32_t domid,
> +                               libxl_domain_sched_params *scinfo)
> +{
> +    struct xen_domctl_sched_rt_params* sdom;
> +    uint16_t num_vcpus;
> +    int rc, i;
> +
> +    rc = xc_sched_rt_domain_get_num_vcpus(CTX->xch, domid, &num_vcpus);
> +    if (rc != 0) {
> +        LOGE(ERROR, "getting num_vcpus of domain sched rt");
> +        return ERROR_FAIL;
> +    }
> +    
> +    /* FIXME: can malloc be used in libxl? seems it was used in the file */
> +    sdom = (struct xen_domctl_sched_rt_params *)
> +            malloc( sizeof(struct xen_domctl_sched_rt_params) * num_vcpus );

It's better to use libxl__malloc here. With that change you can also
omit the test of sdom.

> +    if ( !sdom ){

No need to add spaces inside the brackets; on the other hand, you need
one space before {.

I can see this issue appears repeatedly in this patch. Please check and
fix the other occurrences.

> +        LOGE(ERROR, "Allocate sdom array fails\n");
> +        return ERROR_INVAL;
> +    }
> +
> +    rc = xc_sched_rt_domain_get(CTX->xch, domid, sdom, num_vcpus);
> +    if (rc != 0) {
> +        LOGE(ERROR, "getting domain sched rt");
> +        return ERROR_FAIL;
> +    }
> +
> +    /* FIXME: how to guarantee libxl_*_dispose be called exactly once? */
> +    libxl_domain_sched_params_init(scinfo);
> +    

libxl_*_dispose should be idempotent (at least we intend to make it so).
And it's the caller's responsibility to ensure it's called once if it
wishes to.

> +    scinfo->rt.num_vcpus = num_vcpus;
> +    scinfo->sched = LIBXL_SCHEDULER_RT;
> +    /* FIXME: can malloc be used in libxl? seems it was used in the file */
> +    scinfo->rt.vcpus = (libxl_vcpu *)
> +                       malloc( sizeof(libxl_vcpu) * scinfo->rt.num_vcpus );

libxl__malloc.

If you don't want this allocation to be automatically freed, use NOGC
instead of gc.
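
As a rough sketch of what that could look like (assuming the
libxl__malloc()/libxl__calloc() helpers and the NOGC pseudo-gc from
libxl_internal.h, which never return NULL and, when given a real gc,
free the memory automatically when the gc is torn down):

    /* gc-tracked: no NULL check and no later free(sdom) needed */
    sdom = libxl__calloc(gc, num_vcpus, sizeof(*sdom));

    /* memory that must outlive the gc (the caller disposes of scinfo),
     * so allocate it with NOGC instead */
    scinfo->rt.vcpus = libxl__calloc(NOGC, num_vcpus,
                                     sizeof(*scinfo->rt.vcpus));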

> +    if ( !scinfo->rt.vcpus ){
> +        LOGE(ERROR, "Allocate lib_vcpu array fails\n");
> +        return ERROR_INVAL;
> +    }
> +    for( i = 0; i < num_vcpus; ++i)
> +    {
> +        scinfo->rt.vcpus[i].period = sdom[i].period;
> +        scinfo->rt.vcpus[i].budget = sdom[i].budget;
> +        scinfo->rt.vcpus[i].index = sdom[i].index;
> +    }
> +    
> +    free(sdom);

Remove this if you use libxl__malloc.

> +    return 0;
> +}
> +
> +#define SCHED_RT_VCPU_PERIOD_MAX    31536000000000 /* one year in microsecond*/
> +#define SCHED_RT_VCPU_BUDGET_MAX    SCHED_RT_VCPU_PERIOD_MAX
> +
> +/*
> + * Sanity check of the scinfo parameters
> + * return 0 if all values are valid
> + * return 1 if one param is default value
> + * return 2 if the target vcpu's index, period or budget is out of range
> + */
> +static int sched_rt_domain_set_validate_params(libxl__gc *gc,
> +                                               const libxl_domain_sched_params *scinfo,
> +                                               const uint16_t num_vcpus)
> +{
> +    int vcpu_index = scinfo->rt.vcpu_index;
> +
> +    if (vcpu_index == LIBXL_DOMAIN_SCHED_PARAM_VCPU_DEFAULT ||
> +        scinfo->rt.period == LIBXL_DOMAIN_SCHED_PARAM_PERIOD_DEFAULT ||
> +        scinfo->rt.budget == LIBXL_DOMAIN_SCHED_PARAM_BUDGET_DEFAULT)
> +    {
> +        return 1;
> +    }
> +
> +    if (vcpu_index < 0 || vcpu_index > num_vcpus)
> +    {

Libxl coding style normally has { on the same line as "if", "for" etc.
Please check and fix the other occurrences.

> +        LOG(ERROR, "VCPU index is not set or out of range, "
> +                    "valid values are within range from 0 to %d", num_vcpus);
> +        return 2;
> +    }
> +
> +    if (scinfo->rt.period < 1 ||
> +        scinfo->rt.period > SCHED_RT_VCPU_PERIOD_MAX)
> +    {
> +        LOG(ERROR, "VCPU period is not set or out of range, "
> +                    "valid values are within range from 0 to %lu", SCHED_RT_VCPU_PERIOD_MAX);

Line too long. Move SCHED_RT_VCPU_PERIOD_MAX to a new line.

> +        return 2;
> +    }
> +
> +    if (scinfo->rt.budget < 1 ||
> +        scinfo->rt.budget > SCHED_RT_VCPU_BUDGET_MAX)
> +    {
> +        LOG(ERROR, "VCPU budget is not set or out of range, "
> +                    "valid values are within range from 0 to %lu", SCHED_RT_VCPU_BUDGET_MAX);

Ditto.

> +        return 2;
> +    }
> +
> +    return 0;
> +
> +}
> +
> +static int sched_rt_domain_set(libxl__gc *gc, uint32_t domid,
> +                               const libxl_domain_sched_params *scinfo)
> +{
> +    struct xen_domctl_sched_rt_params sdom;
> +    uint16_t num_vcpus;
> +    int rc;
> + 
> +    rc = xc_sched_rt_domain_get_num_vcpus(CTX->xch, domid, &num_vcpus);
> +    if (rc != 0) {
> +        LOGE(ERROR, "getting domain sched rt");
> +        return ERROR_FAIL;
> +    }
> +    
> +    rc = sched_rt_domain_set_validate_params(gc, scinfo, num_vcpus);
> +    if (rc == 2)
> +        return ERROR_INVAL;
> +    if (rc == 1)
> +        return 0;
> +    if (rc == 0)
> +    {
> +        sdom.index = scinfo->rt.vcpu_index;
> +        sdom.period = scinfo->rt.period;
> +        sdom.budget = scinfo->rt.budget;
> +    }
> +

No need to test for 0 here, as there can't be any other value at this point.

> +    rc = xc_sched_rt_domain_set(CTX->xch, domid, &sdom);
> +    if ( rc < 0 ) {
> +        LOGE(ERROR, "setting domain sched rt");
> +        return ERROR_FAIL;
> +    }
> +
> +    return 0;

The code structure can be rearranged to use "goto out" style so that
there's only one "return rc" at the end.
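
Something along these lines, for instance (just a sketch of the
single-exit pattern, reusing the names from the patch above; not a
complete function):

    rc = sched_rt_domain_set_validate_params(gc, scinfo, num_vcpus);
    if (rc == 2) { rc = ERROR_INVAL; goto out; }
    if (rc == 1) { rc = 0; goto out; }

    sdom.index  = scinfo->rt.vcpu_index;
    sdom.period = scinfo->rt.period;
    sdom.budget = scinfo->rt.budget;

    rc = xc_sched_rt_domain_set(CTX->xch, domid, &sdom);
    if (rc < 0) {
        LOGE(ERROR, "setting domain sched rt");
        rc = ERROR_FAIL;
        goto out;
    }
    rc = 0;
out:
    return rc;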

> +}
> +
>  int libxl_domain_sched_params_set(libxl_ctx *ctx, uint32_t domid,
>                                    const libxl_domain_sched_params *scinfo)
>  {
> @@ -5177,6 +5310,9 @@ int libxl_domain_sched_params_set(libxl_ctx *ctx, uint32_t domid,
>      case LIBXL_SCHEDULER_ARINC653:
>          ret=sched_arinc653_domain_set(gc, domid, scinfo);
>          break;
> +    case LIBXL_SCHEDULER_RT:
> +        ret=sched_rt_domain_set(gc, domid, scinfo);
> +        break;
>      default:
>          LOG(ERROR, "Unknown scheduler");
>          ret=ERROR_INVAL;
> @@ -5207,6 +5343,9 @@ int libxl_domain_sched_params_get(libxl_ctx *ctx, uint32_t domid,
>      case LIBXL_SCHEDULER_CREDIT2:
>          ret=sched_credit2_domain_get(gc, domid, scinfo);
>          break;
> +    case LIBXL_SCHEDULER_RT:
> +        ret=sched_rt_domain_get(gc, domid, scinfo);
> +        break;
>      default:
>          LOG(ERROR, "Unknown scheduler");
>          ret=ERROR_INVAL;
> diff --git a/tools/libxl/libxl.h b/tools/libxl/libxl.h
> index bfeb3bc..4657056 100644
> --- a/tools/libxl/libxl.h
> +++ b/tools/libxl/libxl.h
> @@ -1240,6 +1240,13 @@ int libxl_sched_credit_params_set(libxl_ctx *ctx, uint32_t poolid,
>  #define LIBXL_DOMAIN_SCHED_PARAM_LATENCY_DEFAULT   -1
>  #define LIBXL_DOMAIN_SCHED_PARAM_EXTRATIME_DEFAULT -1
>  
> +#define LIBXL_DOMAIN_SCHED_PARAM_BUDGET_DEFAULT     -1
> +#define LIBXL_DOMAIN_SCHED_PARAM_VCPU_DEFAULT       -1
> +#define LIBXL_DOMAIN_SCHED_PARAM_NUM_VCPUS_DEFAULT  -1
> +#define LIBXL_DOMAIN_SCHED_PARAM_VCPU_INDEX_DEFAULT -1
> +/* Consistent with XEN_LEGACY_MAX_VCPUS xen/arch-x86/xen.h*/
> +#define LIBXL_XEN_LEGACY_MAX_VCPUS                  32 
> +

Where is this macro used? I cannot find it in this patch or in the
following xl patch.

Wei.

^ permalink raw reply	[flat|nested] 72+ messages in thread

* Re: [PATCH v1 4/4] xl: introduce rt scheduler
  2014-08-24 22:58 ` [PATCH v1 4/4] xl: introduce " Meng Xu
@ 2014-08-25 13:31   ` Wei Liu
  2014-08-25 16:12     ` Meng Xu
  2014-09-03 15:52   ` George Dunlap
  1 sibling, 1 reply; 72+ messages in thread
From: Wei Liu @ 2014-08-25 13:31 UTC (permalink / raw)
  To: Meng Xu
  Cc: wei.liu2, ian.campbell, xisisu, stefano.stabellini,
	george.dunlap, dario.faggioli, ian.jackson, xen-devel,
	xumengpanda, JBeulich, chaowang, lichong659, dgolomb

I know most of the code is copied from existing code, so I only
commented on some nits I found.

On Sun, Aug 24, 2014 at 06:58:45PM -0400, Meng Xu wrote:
[...]
>  =head1 CPUPOOLS COMMANDS
> diff --git a/tools/libxl/xl.h b/tools/libxl/xl.h
> index 10a2e66..51b634a 100644
> --- a/tools/libxl/xl.h
> +++ b/tools/libxl/xl.h
> @@ -67,6 +67,7 @@ int main_memset(int argc, char **argv);
>  int main_sched_credit(int argc, char **argv);
>  int main_sched_credit2(int argc, char **argv);
>  int main_sched_sedf(int argc, char **argv);
> +int main_sched_rt(int argc, char **argv);
>  int main_domid(int argc, char **argv);
>  int main_domname(int argc, char **argv);
>  int main_rename(int argc, char **argv);
> diff --git a/tools/libxl/xl_cmdimpl.c b/tools/libxl/xl_cmdimpl.c
> index f1c136a..22f7f9a 100644
> --- a/tools/libxl/xl_cmdimpl.c
> +++ b/tools/libxl/xl_cmdimpl.c
> @@ -5175,6 +5175,52 @@ static int sched_sedf_domain_output(
>      return 0;
>  }
>  
> +
> +static int sched_rt_domain_output(
> +    int domid)

Join this line to previous line please.

> +{
> +    char *domname;
> +    libxl_domain_sched_params scinfo;
> +    int rc = 0, i;
> +
> +    if (domid < 0) {
> +        printf("%-33s %4s %4s %9s %9s\n", "Name", "ID", "VCPU", "Period", "Budget");
> +        return 0;
> +    }
> +
> +    libxl_domain_sched_params_init(&scinfo);
> +    rc = sched_domain_get(LIBXL_SCHEDULER_RT, domid, &scinfo);
> +    if (rc)
> +        goto out;
> +
> +    domname = libxl_domid_to_name(ctx, domid);
> +    for( i = 0; i < scinfo.rt.num_vcpus; i++ )
> +    {

Spaces and coding style.

Note that toolstack coding style is different from hypervisor coding
style.

> +        printf("%-33s %4d %4d %9"PRIu64" %9"PRIu64"\n",
> +            domname,
> +            domid,
> +            scinfo.rt.vcpus[i].index,
> +            scinfo.rt.vcpus[i].period,
> +            scinfo.rt.vcpus[i].budget);
> +    }
> +    free(domname);
> +
> +out:
> +    libxl_domain_sched_params_dispose(&scinfo);
> +    return rc;
> +}
> +
[...]
> +
> +    if (cpupool && (dom || opt_p || opt_b || opt_v)) {
> +        fprintf(stderr, "Specifying a cpupool is not allowed with other options.\n");

Line too long.

Wei.

^ permalink raw reply	[flat|nested] 72+ messages in thread

* Re: [PATCH v1 3/4] libxl: add rt scheduler
  2014-08-25 13:17   ` Wei Liu
@ 2014-08-25 15:55     ` Meng Xu
  2014-08-26  9:51       ` Wei Liu
  0 siblings, 1 reply; 72+ messages in thread
From: Meng Xu @ 2014-08-25 15:55 UTC (permalink / raw)
  To: Wei Liu
  Cc: Ian Campbell, Sisu Xi, Stefano Stabellini, George Dunlap,
	Dario Faggioli, Ian Jackson, xen-devel, Meng Xu, Jan Beulich,
	Chao Wang, Chong Li, Dagaen Golomb



Hi Wei,

Thank you very much for your quick review! I really appreciate it. :-)

In summary, I have corrected the code you pointed out.


>
> > +        LOGE(ERROR, "Allocate sdom array fails\n");
> > +        return ERROR_INVAL;
> > +    }
> > +
> > +    rc = xc_sched_rt_domain_get(CTX->xch, domid, sdom, num_vcpus);
> > +    if (rc != 0) {
> > +        LOGE(ERROR, "getting domain sched rt");
> > +        return ERROR_FAIL;
> > +    }
> > +
> > +    /* FIXME: how to guarantee libxl_*_dispose be called exactly once?
> */
> > +    libxl_domain_sched_params_init(scinfo);
> > +
>
> libxl_*_dispose should be idempotent (as least we intent to make it so).
> And it's caller's responsibility to ensure it's called once if it wishes
> to.
>

​I see. Right now, it is disposed by the caller in xl.​


> > +    if (vcpu_index < 0 || vcpu_index > num_vcpus)
> > +    {
>
> Libxl coding style normally has { in the same line of "if" "for" etc.
> Please
> check and fix other occurences.
>

​A quick question:
Should I use the style "if () { " for all files in the tool stack or just
in libxl files?

If I need to use this style in all tool stack files, I will check and
modify them. (Right now, I just made the correction for this exact patch.)


> > diff --git a/tools/libxl/libxl.h b/tools/libxl/libxl.h
> > index bfeb3bc..4657056 100644
> > --- a/tools/libxl/libxl.h
> > +++ b/tools/libxl/libxl.h
> > @@ -1240,6 +1240,13 @@ int libxl_sched_credit_params_set(libxl_ctx *ctx,
> uint32_t poolid,
> >  #define LIBXL_DOMAIN_SCHED_PARAM_LATENCY_DEFAULT   -1
> >  #define LIBXL_DOMAIN_SCHED_PARAM_EXTRATIME_DEFAULT -1
> >
> > +#define LIBXL_DOMAIN_SCHED_PARAM_BUDGET_DEFAULT     -1
> > +#define LIBXL_DOMAIN_SCHED_PARAM_VCPU_DEFAULT       -1
> > +#define LIBXL_DOMAIN_SCHED_PARAM_NUM_VCPUS_DEFAULT  -1
> > +#define LIBXL_DOMAIN_SCHED_PARAM_VCPU_INDEX_DEFAULT -1
> > +/* Consistent with XEN_LEGACY_MAX_VCPUS xen/arch-x86/xen.h*/
> > +#define LIBXL_XEN_LEGACY_MAX_VCPUS                  32
> > +
>
> Where is this macro used? I cannot find it in this one and following xl
> patch.
>
>
My bad. This should not exist any more, since we now dynamically allocate the
vcpus structure based on the number of vcpus in a domain to display the
vcpus' information.
I will delete it now.

Thank you again for your useful comments, Wei!

Best,

Meng



-- 


-----------
Meng Xu
PhD Student in Computer and Information Science
University of Pennsylvania


^ permalink raw reply	[flat|nested] 72+ messages in thread

* Re: [PATCH v1 4/4] xl: introduce rt scheduler
  2014-08-25 13:31   ` Wei Liu
@ 2014-08-25 16:12     ` Meng Xu
  0 siblings, 0 replies; 72+ messages in thread
From: Meng Xu @ 2014-08-25 16:12 UTC (permalink / raw)
  To: Wei Liu
  Cc: Ian Campbell, Sisu Xi, Stefano Stabellini, George Dunlap,
	Dario Faggioli, Ian Jackson, xen-devel, Linh Thi Xuan Phan,
	Meng Xu, Jan Beulich, Chao Wang, Chong Li, Dagaen Golomb



Hi Wei,

Thank you very much for your comments! All of the points you mentioned on
this patch have been addressed now.


> > +{
> > +    char *domname;
> > +    libxl_domain_sched_params scinfo;
> > +    int rc = 0, i;
> > +
> > +    if (domid < 0) {
> > +        printf("%-33s %4s %4s %9s %9s\n", "Name", "ID", "VCPU",
> "Period", "Budget");
> > +        return 0;
> > +    }
> > +
> > +    libxl_domain_sched_params_init(&scinfo);
> > +    rc = sched_domain_get(LIBXL_SCHEDULER_RT, domid, &scinfo);
> > +    if (rc)
> > +        goto out;
> > +
> > +    domname = libxl_domid_to_name(ctx, domid);
> > +    for( i = 0; i < scinfo.rt.num_vcpus; i++ )
> > +    {
>
> Spaces and coding style.
>
> Note that toolstack coding style is different from hypervisor coding
> style.
>

​I see. :-) Thanks!​


​Meng​


-----------
Meng Xu
PhD Student in Computer and Information Science
University of Pennsylvania


^ permalink raw reply	[flat|nested] 72+ messages in thread

* Re: [PATCH v1 3/4] libxl: add rt scheduler
  2014-08-25 15:55     ` Meng Xu
@ 2014-08-26  9:51       ` Wei Liu
  0 siblings, 0 replies; 72+ messages in thread
From: Wei Liu @ 2014-08-26  9:51 UTC (permalink / raw)
  To: Meng Xu
  Cc: Wei Liu, Ian Campbell, Sisu Xi, Stefano Stabellini,
	George Dunlap, Dario Faggioli, Ian Jackson, xen-devel, Meng Xu,
	Jan Beulich, Chao Wang, Chong Li, Dagaen Golomb

On Mon, Aug 25, 2014 at 11:55:26AM -0400, Meng Xu wrote:
[...]
> 
> ​I see. Right now, it is disposed by the caller in xl.​
> 
> 
> > > +    if (vcpu_index < 0 || vcpu_index > num_vcpus)
> > > +    {
> >
> > Libxl coding style normally has { in the same line of "if" "for" etc.
> > Please
> > check and fix other occurences.
> >
> 
> ​A quick question:
> Should I use the style "if () { " for all files in the tool stack or just
> in libxl files?
> 
> If I need to use this style in all tool stack files, I will check and
> modify them. (Right now, I just made the correction for this exact patch.)
> 

There is a CODING_STYLE file in the top-level directory for the hypervisor.
Libxl also has one in its own directory.

If there's no coding guideline, I think you need to follow the existing code.
In your case libxc doesn't have one, so you just follow what's already
there to be consistent.

Wei.


^ permalink raw reply	[flat|nested] 72+ messages in thread

* Re: [PATCH v1 1/4] xen: add real time scheduler rt
  2014-08-24 22:58 ` [PATCH v1 1/4] xen: add real time scheduler rt Meng Xu
@ 2014-08-26 14:27   ` Jan Beulich
  2014-08-27  2:07     ` Meng Xu
  2014-09-03 13:40   ` George Dunlap
                     ` (2 subsequent siblings)
  3 siblings, 1 reply; 72+ messages in thread
From: Jan Beulich @ 2014-08-26 14:27 UTC (permalink / raw)
  To: Meng Xu
  Cc: ian.campbell, xisisu, stefano.stabellini, george.dunlap,
	dario.faggioli, ian.jackson, xen-devel, xumengpanda, chaowang,
	lichong659, dgolomb

>>> On 25.08.14 at 00:58, <mengxu@cis.upenn.edu> wrote:
> --- a/xen/include/public/domctl.h
> +++ b/xen/include/public/domctl.h

Just a couple of comments on the interface changes; I'll leave the
actual scheduler code to the scheduler specialists.

> @@ -367,6 +383,16 @@ struct xen_domctl_scheduler_op {
>          struct xen_domctl_sched_credit2 {
>              uint16_t weight;
>          } credit2;
> +        struct xen_domctl_sched_rt{

Missing blank before {.

> +            /* get vcpus' params */
> +            XEN_GUEST_HANDLE_64(xen_domctl_sched_rt_params_t) vcpu;

Why does this need to be a handle? Do you permit setting these
to different values for different vCPU-s? Considering that other
schedulers don't do this, why does yours need to?

> +            uint16_t nr_vcpus;
> +            /* set one vcpu's params */
> +            uint16_t vcpu_index;
> +            uint16_t padding[2];
> +            uint64_t period;
> +            uint64_t budget;

Are values overflowing 32 bits here really useful/meaningful?

Jan

^ permalink raw reply	[flat|nested] 72+ messages in thread

* Re: [PATCH v1 1/4] xen: add real time scheduler rt
  2014-08-26 14:27   ` Jan Beulich
@ 2014-08-27  2:07     ` Meng Xu
  2014-08-27  6:26       ` Jan Beulich
  0 siblings, 1 reply; 72+ messages in thread
From: Meng Xu @ 2014-08-27  2:07 UTC (permalink / raw)
  To: Jan Beulich
  Cc: Ian Campbell, Sisu Xi, Stefano Stabellini, George Dunlap,
	Dario Faggioli, Ian Jackson, xen-devel, Linh Thi Xuan Phan,
	Meng Xu, Chao Wang, Chong Li, Dagaen Golomb



Hi Jan,


> > +            /* get vcpus' params */
> > +            XEN_GUEST_HANDLE_64(xen_domctl_sched_rt_params_t) vcpu;
>
> Why does this need to be a handle? Do you permit setting these
> to different values for different vCPU-s? Considering that other
> schedulers don't do this, why does yours need to?
>

​Yes, we need a handler here to get each vcpu's parameters of a domain.

Let me explain why we need to set and get the parameters of "each" vcpu:
1) A VCPU is the basic scheduling and accounting unit in Global
Earliest Deadline First (gEDF) scheduling. We account the budget
consumption for each vcpu instead of for each domain, while the credit and
credit2 schedulers account the credit consumption for each domain.
2) Based on Global Earliest Deadline First (gEDF) scheduling theory,
each vcpu's parameters are used to decide the scheduling sequence of
these vcpus. Two vcpus with the same utilization but different period and
budget can be scheduled differently. For example, a vcpu with budget
10ms and period 20ms is less responsive than a vcpu with budget 2ms and
period 8ms, although they have the same utilization of 0.5 (see the toy
sketch below).
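
A toy, standalone illustration of that example (plain C, not Xen code;
the struct and the values are made up for illustration):

    #include <stdio.h>

    struct vcpu_params { long budget_ms, period_ms; };

    int main(void)
    {
        struct vcpu_params a = { 10, 20 };  /* budget 10ms, period 20ms */
        struct vcpu_params b = {  2,  8 };  /* budget  2ms, period  8ms */

        /* Both consume half a physical CPU on average... */
        printf("utilization: a = %.2f, b = %.2f\n",
               (double)a.budget_ms / a.period_ms,
               (double)b.budget_ms / b.period_ms);

        /* ...but under EDF the earlier deadline wins: b's first deadline
         * (8ms away) beats a's (20ms away), so b tends to react sooner. */
        printf("first deadline: a = %ldms, b = %ldms\n",
               a.period_ms, b.period_ms);
        return 0;
    }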

Therefore, a domain's real-time performance is based on the parameters of
each VCPU of this domain.
Hence, users need to be able to set and get each vcpu's parameters of a
domain.

This gEDF scheduler is different from the credit and credit2 schedulers.
The existing credit and credit2 schedulers account the credit for each
domain, instead of for each vcpu; that's why they set parameters per domain
instead of per vcpu.

As I recall, we had a discussion on this question in the mailing list
after the first RFC patch of this rt scheduler was released. We agreed that
the real-time scheduler should support setting and getting each vcpu's
parameters. :-)


>
> > +            uint16_t nr_vcpus;
> > +            /* set one vcpu's params */
> > +            uint16_t vcpu_index;
> > +            uint16_t padding[2];
> > +            uint64_t period;
> > +            uint64_t budget;
>
> Are values overflowing 32 bits here really useful/meaningful?
>

We allow the period and budget to be at most 31536000000000 (which is one
year in microseconds) in libxl.c. 31536000000000 is larger than 2^32 =
4294967296, so we have to use a 64-bit type here for the period and budget.

In addition, this is consistent with the period and budget type s_time_t
in the kernel space. In the kernel space (sched_rt.c), we represent the
period and budget as s_time_t, which is signed 64-bit, so we use
uint64_t for the period and budget here to avoid some type conversion.
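
For reference, a quick standalone check of the numbers involved (plain C,
not Xen code):

    #include <stdio.h>
    #include <stdint.h>

    int main(void)
    {
        /* one year in microseconds: 31536000000000, needs more than 32 bits */
        uint64_t one_year_us = 365ULL * 24 * 3600 * 1000000;

        printf("one year   = %llu us\n", (unsigned long long)one_year_us);
        printf("UINT32_MAX = %llu us (about %.2f hours)\n",
               (unsigned long long)UINT32_MAX,
               (double)UINT32_MAX / 3600e6);
        return 0;
    }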

Thank you very much for your time, comment and advice in this patch!

Best,

Meng



-----------
Meng Xu
PhD Student in Computer and Information Science
University of Pennsylvania


^ permalink raw reply	[flat|nested] 72+ messages in thread

* Re: [PATCH v1 1/4] xen: add real time scheduler rt
  2014-08-27  2:07     ` Meng Xu
@ 2014-08-27  6:26       ` Jan Beulich
  2014-08-27 14:28         ` Meng Xu
  0 siblings, 1 reply; 72+ messages in thread
From: Jan Beulich @ 2014-08-27  6:26 UTC (permalink / raw)
  To: Meng Xu
  Cc: Ian Campbell, Sisu Xi, Stefano Stabellini, George Dunlap,
	Dario Faggioli, Ian Jackson, xen-devel, Linh Thi Xuan Phan,
	Meng Xu, Chao Wang, Chong Li, Dagaen Golomb

>>> On 27.08.14 at 04:07, <xumengpanda@gmail.com> wrote:
>> > +            /* get vcpus' params */
>> > +            XEN_GUEST_HANDLE_64(xen_domctl_sched_rt_params_t) vcpu;
>>
>> Why does this need to be a handle? Do you permit setting these
>> to different values for different vCPU-s? Considering that other
>> schedulers don't do this, why does yours need to?
>>
> 
> ​Yes, we need a handler here to get each vcpu's parameters of a domain.
> 
> let me explain why we need to set and get the parameters of "each" vcpu:
> 1) A VCPU is the basic scheduling and accounting unit in the Global
> Earliest Deadline First (gEDF) scheduling. We account the budget
> consumption for each vcpu instead of each domain, while the credit or
> credit2 scheduler account the credit consumption for each domain.
> 2) Based on the Global Earliest Deadline First (gEDF) scheduling theory,
> each vcpu's parameter will be used to decide the scheduling sequence of
> these vcpus.  Two vcpus with same utilization but different period and
>  budget can be scheduled differently. For example, the vcpu with budget
> 10ms and period 20ms is less responsive than the vcpu with budget 2ms and
> period 8ms, although they have the same utilization 0.5.
> 
> Therefore, a domain's real-time performance is based on the parameters of
> each VCPU of this domain.
> Hence, users need to be able to set and get each vcpu's parameters of a
> domain.
> 
> This gEDF scheduler is different from the credit and credit2 schedulers.
> The existing credit and credit2 scheduler account the credit for each
> domain, instead of each vcpu, that's why they set parameter per domain
> instead of per vcpu.

Parameter setting and accounting aren't tied together, and both
credit schedulers account on a per-vCPU basis afaict. Hence this
doesn't really answer the question.

> In my memory, we had such discussion on this question in the mailing list
> after the first RFC patch of this rt scheduler was released. We agreed that
> the real-time scheduler should supports setting and getting each vcpu's
> parameters. :-)

If so, can you point me to the specific mails rather than have me go
dig for them?

>> > +            uint16_t nr_vcpus;
>> > +            /* set one vcpu's params */
>> > +            uint16_t vcpu_index;
>> > +            uint16_t padding[2];
>> > +            uint64_t period;
>> > +            uint64_t budget;
>>
>> Are values overflowing 32 bits here really useful/meaningful?
>>
> 
> We allow the period and budget to be at most 31536000000000 (which is one
> year in microsecond) in the libxl.c. 31536000000000 is larger than 2^32
> =4294967296. So we have to use 64bit type here for period and budget.
> 
> ​In addition, This is consistent with the period and budget type s_time_t
> in the kernel space. In the kernel space (sched_rt.c), we represent the
> period and budget in the type s_time_t, which is signed 64bit. So we use
> the uint​64_t for period and budget here to avoid some type conversion.

Neither of this answers the question: Is this really a useful value
range?

Jan


^ permalink raw reply	[flat|nested] 72+ messages in thread

* Re: [PATCH v1 1/4] xen: add real time scheduler rt
  2014-08-27  6:26       ` Jan Beulich
@ 2014-08-27 14:28         ` Meng Xu
  2014-08-27 15:04           ` Jan Beulich
  0 siblings, 1 reply; 72+ messages in thread
From: Meng Xu @ 2014-08-27 14:28 UTC (permalink / raw)
  To: Jan Beulich
  Cc: Ian Campbell, Sisu Xi, Stefano Stabellini, George Dunlap,
	Dario Faggioli, Ian Jackson, xen-devel, Linh Thi Xuan Phan,
	Meng Xu, Chao Wang, Chong Li, Dagaen Golomb



Hi Jan,


2014-08-27 2:26 GMT-04:00 Jan Beulich <JBeulich@suse.com>:

> >>> On 27.08.14 at 04:07, <xumengpanda@gmail.com> wrote:
> >> > +            /* get vcpus' params */
> >> > +            XEN_GUEST_HANDLE_64(xen_domctl_sched_rt_params_t) vcpu;
> >>
> >> Why does this need to be a handle? Do you permit setting these
> >> to different values for different vCPU-s? Considering that other
> >> schedulers don't do this, why does yours need to?
> >>
> >
> > Yes, we need a handler here to get each vcpu's parameters of a domain.
> >
> > let me explain why we need to set and get the parameters of "each" vcpu:
> > 1) A VCPU is the basic scheduling and accounting unit in the Global
> > Earliest Deadline First (gEDF) scheduling. We account the budget
> > consumption for each vcpu instead of each domain, while the credit or
> > credit2 scheduler account the credit consumption for each domain.
> > 2) Based on the Global Earliest Deadline First (gEDF) scheduling theory,
> > each vcpu's parameter will be used to decide the scheduling sequence of
> > these vcpus.  Two vcpus with same utilization but different period and
> >  budget can be scheduled differently. For example, the vcpu with budget
> > 10ms and period 20ms is less responsive than the vcpu with budget 2ms and
> > period 8ms, although they have the same utilization 0.5.
> >
> > Therefore, a domain's real-time performance is based on the parameters of
> > each VCPU of this domain.
> > Hence, users need to be able to set and get each vcpu's parameters of a
> > domain.
> >
> > This gEDF scheduler is different from the credit and credit2 schedulers.
> > The existing credit and credit2 scheduler account the credit for each
> > domain, instead of each vcpu, that's why they set parameter per domain
> > instead of per vcpu.
>
> Parameter setting and accounting aren't tied together, and both
> credit schedulers account on a per-vCPU basis afaict. Hence this
> doesn't really answer the question.
>

Let me explain in another, shorter and clearer way.

Because each vcpu's parameters can affect the scheduling sequence and thus
affect the real-time performance of a domain, users may want to know what
the parameters of each vcpu of each domain are, so that they can have an
intuition of how the vcpus will be scheduled. (Do you agree? :-))
Users may also need to set each VCPU's parameters of a domain to achieve
their desired real-time performance for this domain. After they set a
vcpu's parameters of a domain, they need a way to check the new
parameters of this vcpu of this domain, right?

Because of the above two scenarios, users need to know each vcpu's
parameters of a domain. So we need the handler to pass each vcpu's
parameters from kernel to userspace to show to users.

One thing to note is that: this handler is only used to get each vcpu's
parameters of a domain. We don't need this handler to set a vcpu's
parameter.



> > In my memory, we had such discussion on this question in the mailing list
> > after the first RFC patch of this rt scheduler was released. We agreed
> that
> > the real-time scheduler should supports setting and getting each vcpu's
> > parameters. :-)
>
> If so, can you point me to the specific mails rather than have me go
> dig for them?
>

Sure! My bad.

We had a long discussion of the design of this functionality of getting
each vcpu's parameters. It's here:
http://www.gossamer-threads.com/lists/xen/devel/339146

Another thread that discusses the interface for improved SEDF also
discusses the idea of getting/setting each vcpu's parameters for a
real-time scheduler.  This rt scheduler is supposed to replace the existing
SEDF scheduler.

Here is the link to this thread:
http://www.gossamer-threads.com/lists/xen/devel/339056

I extract the interesting part related to this question, quoting Dario:

"I don't
think the renaming+SEDF deprecation should happen until proper SMP
support is implemented, and probably also not until support for per-VCPU
scheduling parameters (quite important for an advanced real-time
scheduling solution) is there."

"The problems SEDF has are:
1. it has really really really poor SMP support
2. it does not allow to specify scheduling parameters on a per-VCPU
basis, but only on a domain basis. This is fine for general purpose
schedulers, but can be quite important in real-time workloads "

Please let me know if you have further questions. Maybe Dario could also
give more insight on this, later. :-) 


> >> > +            uint16_t nr_vcpus;
> >> > +            /* set one vcpu's params */
> >> > +            uint16_t vcpu_index;
> >> > +            uint16_t padding[2];
> >> > +            uint64_t period;
> >> > +            uint64_t budget;
> >>
> >> Are values overflowing 32 bits here really useful/meaningful?
> >>
> >
> > W
> > e allow the period and budget to be at most 31536000000000 (which is one
> > year in microsecond) in the libxl.c. 31536000000000 is larger than 2^32
> > =4294967296. So we have to use 64bit type here for period and budget.
> >
> > In addition, This is consistent with the period and budget type s_time_t
> > in the kernel space. In the kernel space (sched_rt.c), we represent the
> > period and budget in the type s_time_t, which is signed 64bit. So we use
> > the uint64_t for period and budget here to avoid some type conversion.
>
> Neither of this answers the question: Is this really a useful value
> range?
>

I see the issue. Is 31536000000000 a good upper bound for the period and
budget?
Actually, I'm not sure. This totally depends on users' requirements.
4294967296us = 1.19 hours. I'm not sure whether 1.19 hours is long enough
for real-time applications.
If it's enough, I can definitely change the type from uint64 to uint32.
Do you have any suggestion of how we can get a proper upper bound for the
period and budget?

Thank you very much!

Best,

Meng


-- 


-----------
Meng Xu
PhD Student in Computer and Information Science
University of Pennsylvania


^ permalink raw reply	[flat|nested] 72+ messages in thread

* Re: [PATCH v1 1/4] xen: add real time scheduler rt
  2014-08-27 14:28         ` Meng Xu
@ 2014-08-27 15:04           ` Jan Beulich
  2014-08-28 16:06             ` Meng Xu
  0 siblings, 1 reply; 72+ messages in thread
From: Jan Beulich @ 2014-08-27 15:04 UTC (permalink / raw)
  To: Meng Xu
  Cc: Ian Campbell, Sisu Xi, Stefano Stabellini, George Dunlap,
	Dario Faggioli, Ian Jackson, xen-devel, Linh Thi Xuan Phan,
	Meng Xu, Chao Wang, Chong Li, Dagaen Golomb

>>> On 27.08.14 at 16:28, <xumengpanda@gmail.com> wrote:
> Because each vcpu's parameters can affect the scheduling sequence and thus
> affect the real-time performance of a domain, users may want to know what
> is the parameters of each vcpu of each domain so that they can have an
> intuition of how the vcpus will be scheduled.  (Do you agree? :-))

I agree, but that isn't a specific property of this scheduler, just
one of its specific intended uses (RT).

> Users may also need to set each VCPU's parameter of a domain to achieve
> their desired real-time performance for this domain. After they set a
> vcpu's parameter of a domain, they need to have a way to check the new
> parameters of this vcpu of this domain. right?

Of course.

> We had a long discussion of the design of this functionality of getting
> each vcpu's parameters. It's here:
> http://www.gossamer-threads.com/lists/xen/devel/339146 

This thread doesn't discuss this at all - Dario takes it as given in
his first reply: "My view here is, since per-VCPU scheduling
parameters are important for this scheduler, ..."

> Another thread that discusses the interface for improved SEDF also
> discusses the idea of getting/setting each vcpu's parameters for a
> real-time scheduler.  This rt scheduler is supposed to replace the existing
> SEDF scheduler.
> 
> Here is the link to this thread:
> http://www.gossamer-threads.com/lists/xen/devel/339056 

Again this one more states than discusses the need.

> I extract the interesting part related to this question:
> Quote from Dario
> :
> 
> 
> "I don't
> think the renaming+SEDF deprecation should happen until proper SMP
> support is implemented, and probably also not until support for per-VCPU
> scheduling parameters (quite important for an advanced real-time
> scheduling solution) is there."

That doesn't directly relate to the question I raised, it's more like a
follow-up assuming that per-vCPU parameters are needed here,
but not in other schedulers.

So bottom line - While I realize that RT may desire more fine
grained control, I still don't see why such wouldn't be applicable
uniformly to all schedulers.

>> >> > +            uint16_t nr_vcpus;
>> >> > +            /* set one vcpu's params */
>> >> > +            uint16_t vcpu_index;
>> >> > +            uint16_t padding[2];
>> >> > +            uint64_t period;
>> >> > +            uint64_t budget;
>> >>
>> >> Are values overflowing 32 bits here really useful/meaningful?
>> >>
>> >
>> > W
>> > e allow the period and budget to be at most 31536000000000 (which is one
>> > year in microsecond) in the libxl.c. 31536000000000 is larger than 2^32
>> > =4294967296. So we have to use 64bit type here for period and budget.
>> >
>> > In addition, This is consistent with the period and budget type s_time_t
>> > in the kernel space. In the kernel space (sched_rt.c), we represent the
>> > period and budget in the type s_time_t, which is signed 64bit. So we use
>> > the uint64_t for period and budget here to avoid some type conversion.
>>
>> Neither of this answers the question: Is this really a useful value
>> range?
>>
> 
> I see the issue. Is 31536000000000 a good upper bound for period and
> budget?
> Actually, I'm not sure. This totally depends on users' requirement.
> 4294967296us = 1.19hour. I'm not sure 1.19hour should be long enough for
> real-time applications?
> If it's enough, I can definitely change the type from uint64 to uint32.
> Do you have any suggestion of how we can get a proper upper bound for
> period and budget?

I think anything going into the seconds, not to speak of minutes or
hours, range is already beyond boundaries of being reasonable /
useful.

Jan

^ permalink raw reply	[flat|nested] 72+ messages in thread

* Re: [PATCH v1 1/4] xen: add real time scheduler rt
  2014-08-27 15:04           ` Jan Beulich
@ 2014-08-28 16:06             ` Meng Xu
  2014-08-29  9:05               ` Jan Beulich
  0 siblings, 1 reply; 72+ messages in thread
From: Meng Xu @ 2014-08-28 16:06 UTC (permalink / raw)
  To: Jan Beulich
  Cc: Ian Campbell, Sisu Xi, Stefano Stabellini, George Dunlap,
	Chenyang Lu, Dario Faggioli, Ian Jackson, xen-devel,
	Linh Thi Xuan Phan, Meng Xu, Chao Wang, Chong Li, Dagaen Golomb



Hi Jan,

2014-08-27 11:04 GMT-04:00 Jan Beulich <JBeulich@suse.com>:

> >>> On 27.08.14 at 16:28, <xumengpanda@gmail.com> wrote:
> > Because each vcpu's parameters can affect the scheduling sequence and
> thus
> > affect the real-time performance of a domain, users may want to know what
> > is the parameters of each vcpu of each domain so that they can have an
> > intuition of how the vcpus will be scheduled.  (Do you agree? :-))
>
> I agree, but that isn't a specific property of this scheduler, just
> one of it's specific intended use (RT).
>
> > Users may also need to set each VCPU's parameter of a domain to achieve
> > their desired real-time performance for this domain. After they set a
> > vcpu's parameter of a domain, they need to have a way to check the new
> > parameters of this vcpu of this domain. right?
>
> Of course.
>
> > We had a long discussion of the design of this functionality of getting
> > each vcpu's parameters. It's here:
> > http://www.gossamer-threads.com/lists/xen/devel/339146
>
> This thread doesn't discuss this at all - Dario takes it for given in
> his first reply: "My view here is, since per-VCPU scheduling
> parameters are important for this scheduler, ..."
>
> > Another thread that discusses the interface for improved SEDF also
> > discusses the idea of getting/setting each vcpu's parameters for a
> > real-time scheduler.  This rt scheduler is supposed to replace the
> existing
> > SEDF scheduler.
> >
> > Here is the link to this thread:
> > http://www.gossamer-threads.com/lists/xen/devel/339056
>
> Again this one more states than discusses the need.
>
> > I extract the interesting part related to this question:
> > Quote from Dario
> > :
> >
> >
> > "I don't
> > think the renaming+SEDF deprecation should happen until proper SMP
> > support is implemented, and probably also not until support for per-VCPU
> > scheduling parameters (quite important for an advanced real-time
> > scheduling solution) is there."
>
> That doesn't directly relate to the question I raised, it's more like a
> follow-up assuming that per-vCPU parameters are needed here,
> but not in other schedulers.
>
> So bottom line - While I realize that RT may desire more fine
> grained control, I still don't see why such wouldn't be applicable
> uniformly to all schedulers.
>

Yes, RT needs more fine-grained control: it needs users to be able to
set/get each VCPU's parameters. (This states that the RT scheduler needs
such functionality for setting/getting each VCPU's parameters.)

As to your concern "I still don't see why such wouldn't be
applicable uniformly to all schedulers.", are you suggesting that the
credit and credit2 schedulers could also allow users to set/get each VCPU's
parameters? (I think that could be possible, but this should be a design
decision made by the credit and credit2 scheduler developers?)



>
> >> >> > +            uint16_t nr_vcpus;
> >> >> > +            /* set one vcpu's params */
> >> >> > +            uint16_t vcpu_index;
> >> >> > +            uint16_t padding[2];
> >> >> > +            uint64_t period;
> >> >> > +            uint64_t budget;
> >> >>
> >> >> Are values overflowing 32 bits here really useful/meaningful?
> >> >>
> >> >
> >> > W
> >> > e allow the period and budget to be at most 31536000000000 (which is
> one
> >> > year in microsecond) in the libxl.c. 31536000000000 is larger than
> 2^32
> >> > =4294967296. So we have to use 64bit type here for period and budget.
> >> >
> >> > In addition, This is consistent with the period and budget type
> s_time_t
> >> > in the kernel space. In the kernel space (sched_rt.c), we represent
> the
> >> > period and budget in the type s_time_t, which is signed 64bit. So we
> use
> >> > the uint64_t for period and budget here to avoid some type conversion.
> >>
> >> Neither of this answers the question: Is this really a useful value
> >> range?
> >>
> >
> > I see the issue. Is 31536000000000 a good upper bound for period and
> > budget?
> > Actually, I'm not sure; it totally depends on users' requirements.
> > 4294967296 us ~= 1.19 hours. I'm not sure whether 1.19 hours is long
> > enough for real-time applications.
> > If it's enough, I can definitely change the type from uint64 to uint32.
> > Do you have any suggestion of how we can get a proper upper bound for
> > period and budget?
>
> I think anything going into the seconds, not to speak of minutes or
> hours, range is already beyond boundaries of being reasonable /
> useful.
>

Hmm, that's fair. If we want to limit the range to seconds, then uint32 is
enough. However, what if some user really wants to set the period/budget to
hours/days? Then we couldn't support it with a uint32 type for
period/budget. (This is just a minor concern; I'm not arguing that we have
to use uint64 or uint32. As long as everyone agrees on a type, I can change
it to the agreed type very quickly.) Do you have any suggestion for how we
can reach an agreement and finalize the range?
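
(For reference: 2^32 us = 4294967296 us ~= 71.6 minutes ~= 1.19 hours,
while one year is 31536000000000 us ~= 2^44.8 us, so the one-year bound
only fits in 64 bits.)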


​Thank you very much for your time and help in this matter!

Best,

Meng​



-----------
Meng Xu
PhD Student in Computer and Information Science
University of Pennsylvania

[-- Attachment #1.2: Type: text/html, Size: 7580 bytes --]

[-- Attachment #2: Type: text/plain, Size: 126 bytes --]

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 72+ messages in thread

* Re: [PATCH v1 1/4] xen: add real time scheduler rt
  2014-08-28 16:06             ` Meng Xu
@ 2014-08-29  9:05               ` Jan Beulich
  2014-08-29 19:35                 ` Meng Xu
  2014-09-03 14:08                 ` George Dunlap
  0 siblings, 2 replies; 72+ messages in thread
From: Jan Beulich @ 2014-08-29  9:05 UTC (permalink / raw)
  To: Meng Xu
  Cc: Ian Campbell, Sisu Xi, Stefano Stabellini, George Dunlap,
	Chenyang Lu, Dario Faggioli, Ian Jackson, xen-devel,
	Linh Thi Xuan Phan, Meng Xu, Chao Wang, Chong Li, Dagaen Golomb

>>> On 28.08.14 at 18:06, <xumengpanda@gmail.com> wrote:
> As to your concern "I still don't see why such wouldn't be
> applicable uniformly to all schedulers.", are you suggesting that the
> credit and credit2 schedulers could also allow users to set/get each VCPU's
> parameters? (I think that could be possible, but it should be a design
> decision made by the credit and credit2 scheduler developers.)

Perhaps. But I think a more uniform interface to the individual
schedulers would help on the tools side too. But in the end I'm
just raising questions/concerns here, it's George who needs to
be okay with your approach.

>> I think anything going into the seconds, not to speak of minutes or
>> hours, range is already beyond boundaries of being reasonable /
>> useful.
> 
> ​Hmm, that's fair. If we want to limit the range to seconds, then uint32 is
> enough. However, what if some user really want to set the period/budget to
> hours/days? Then we couldn't support it if we use uint32 type for
> period/budget. (This is just my subtle concern. I'm not arguing that we
> have to use uint64 or uint32. As long as everyone agrees with a type, I can
> just change it to the agreed type very quickly.) Do you have any suggestion
> of how we can reach an agreement and get the range finalized? ​

Let's see what others, mainly George and Dario, think.

Jan

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 72+ messages in thread

* Re: [PATCH v1 1/4] xen: add real time scheduler rt
  2014-08-29  9:05               ` Jan Beulich
@ 2014-08-29 19:35                 ` Meng Xu
  2014-09-03 14:08                 ` George Dunlap
  1 sibling, 0 replies; 72+ messages in thread
From: Meng Xu @ 2014-08-29 19:35 UTC (permalink / raw)
  To: Jan Beulich
  Cc: Ian Campbell, Sisu Xi, Stefano Stabellini, George Dunlap,
	Chenyang Lu, Dario Faggioli, Ian Jackson, xen-devel,
	Linh Thi Xuan Phan, Meng Xu, Chao Wang, Chong Li, Dagaen Golomb


[-- Attachment #1.1: Type: text/plain, Size: 3064 bytes --]

Hi Jan,


2014-08-29 5:05 GMT-04:00 Jan Beulich <JBeulich@suse.com>:

> >>> On 28.08.14 at 18:06, <xumengpanda@gmail.com> wrote:
> > As to your concern "I still don't see why such wouldn't be
> > applicable uniformly to all schedulers.", are you suggesting that the
> > credit and credit2 schedulers could also allow users to set/get each
> > VCPU's parameters? (I think that could be possible, but it should be a
> > design decision made by the credit and credit2 scheduler developers.)
>
> Perhaps. But I think a more uniform interface to the individual
> schedulers would help on the tools side too. But in the end I'm
> just raising questions/concerns here, it's George who needs to
> be okay with your approach.
>

​Thank you very much for your questions and suggestions!

BTW, the ARINC653 scheduler in Xen also specifies each VCPU's parameters.
It's just FYI. :-)

In xen/include/public/sysctl.h, each vcpu's parameters are specified in the
sched_entries[] array:

/*
 * This structure is used to pass a new ARINC653 schedule from a
 * privileged domain (ie dom0) to Xen.
 */
struct xen_sysctl_arinc653_schedule {
    /* major_frame holds the time for the new schedule's major frame
     * in nanoseconds. */
    uint64_aligned_t     major_frame;
    /* num_sched_entries holds how many of the entries in the
     * sched_entries[] array are valid. */
    uint8_t     num_sched_entries;
    /* The sched_entries array holds the actual schedule entries. */
    struct {
        /* dom_handle must match a domain's UUID */
        xen_domain_handle_t dom_handle;
        /* If a domain has multiple VCPUs, vcpu_id specifies which one
         * this schedule entry applies to. It should be set to 0 if
         * there is only one VCPU for the domain. */
        unsigned int vcpu_id;
        /* runtime specifies the amount of time that should be allocated
         * to this VCPU per major frame. It is specified in nanoseconds */
        uint64_aligned_t runtime;
    } sched_entries[ARINC653_MAX_DOMAINS_PER_SCHEDULE];
};


>
> >> I think anything going into the seconds, not to speak of minutes or
> >> hours, range is already beyond boundaries of being reasonable /
> >> useful.
> >
> > Hmm, that's fair. If we want to limit the range to seconds, then uint32 is
> > enough. However, what if some user really wants to set the period/budget to
> > hours/days? Then we couldn't support it with a uint32 type for
> > period/budget. (This is just a minor concern; I'm not arguing that we have
> > to use uint64 or uint32. As long as everyone agrees on a type, I can change
> > it to the agreed type very quickly.) Do you have any suggestion for how we
> > can reach an agreement and finalize the range?
>
> Let's see what others, mainly George and Dario, think.
>

​Sure! Thanks!

Meng​


-- 


-----------
Meng Xu
PhD Student in Computer and Information Science
University of Pennsylvania

[-- Attachment #1.2: Type: text/html, Size: 5086 bytes --]

[-- Attachment #2: Type: text/plain, Size: 126 bytes --]

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 72+ messages in thread

* Re: [PATCH v1 1/4] xen: add real time scheduler rt
  2014-08-24 22:58 ` [PATCH v1 1/4] xen: add real time scheduler rt Meng Xu
  2014-08-26 14:27   ` Jan Beulich
@ 2014-09-03 13:40   ` George Dunlap
  2014-09-03 14:11     ` Meng Xu
  2014-09-05  9:46     ` Dario Faggioli
  2014-09-03 14:20   ` George Dunlap
  2014-09-05 17:17   ` Dario Faggioli
  3 siblings, 2 replies; 72+ messages in thread
From: George Dunlap @ 2014-09-03 13:40 UTC (permalink / raw)
  To: Meng Xu
  Cc: Ian Campbell, Sisu Xi, Stefano Stabellini, Dario Faggioli,
	Ian Jackson, xen-devel, Meng Xu, Jan Beulich, chaowang, Chong Li,
	dgolomb

On Sun, Aug 24, 2014 at 11:58 PM, Meng Xu <mengxu@cis.upenn.edu> wrote:
> This scheduler follows the pre-emptive Global EDF theory in real-time field.
> Each VCPU can have a dedicated period and budget.
> While scheduled, a VCPU burns its budget.
> A VCPU has its budget replenished at the beginning of each of its periods;
> The VCPU discards its unused budget at the end of each of its periods.
> If a VCPU runs out of budget in a period, it has to wait until next period.
> The mechanism of how to burn a VCPU's budget depends on the server mechanism
> implemented for each VCPU.
>
> Server mechanism: a VCPU is implemented as a deferrable server.
> When a VCPU is scheduled to execute on a PCPU, its budget is continuously
> burned.
>
> Priority scheme: Preemptive Global Earliest Deadline First (gEDF).
> At any scheduling point, the VCPU with earliest deadline has highest
> priority.
>
> Queue scheme: A global Runqueue for each CPU pool.
> The Runqueue holds all runnable VCPUs.
> VCPUs in the Runqueue are divided into two parts: with and without budget.
> At each part, VCPUs are sorted based on gEDF priority scheme.
>
> Scheduling quantum: 1 ms;
>
> Note: cpumask and cpupool are supported.
>
> This is still in the development phase.

You should probably take this out now that you've removed the RFC. :-)

I'm just doing a first pass, so just a few quick comments to begin with.

> diff --git a/xen/common/schedule.c b/xen/common/schedule.c
> index 55503e0..7d2c6d1 100644
> --- a/xen/common/schedule.c
> +++ b/xen/common/schedule.c
> @@ -69,6 +69,7 @@ static const struct scheduler *schedulers[] = {
>      &sched_credit_def,
>      &sched_credit2_def,
>      &sched_arinc653_def,
> +    &sched_rt_def,
>  };
>
>  static struct scheduler __read_mostly ops;
> @@ -1090,7 +1091,8 @@ long sched_adjust(struct domain *d, struct xen_domctl_scheduler_op *op)
>
>      if ( (op->sched_id != DOM2OP(d)->sched_id) ||
>           ((op->cmd != XEN_DOMCTL_SCHEDOP_putinfo) &&
> -          (op->cmd != XEN_DOMCTL_SCHEDOP_getinfo)) )
> +          (op->cmd != XEN_DOMCTL_SCHEDOP_getinfo) &&
> +          (op->cmd != XEN_DOMCTL_SCHEDOP_getnumvcpus)) )

Why are you introducing this as a schedop?  Isn't this information
already exposed in getdomaininfo?

^ permalink raw reply	[flat|nested] 72+ messages in thread

* Re: [PATCH v1 1/4] xen: add real time scheduler rt
  2014-08-29  9:05               ` Jan Beulich
  2014-08-29 19:35                 ` Meng Xu
@ 2014-09-03 14:08                 ` George Dunlap
  2014-09-03 14:24                   ` Meng Xu
  1 sibling, 1 reply; 72+ messages in thread
From: George Dunlap @ 2014-09-03 14:08 UTC (permalink / raw)
  To: Jan Beulich
  Cc: Ian Campbell, Sisu Xi, Stefano Stabellini, Chenyang Lu,
	Dario Faggioli, Ian Jackson, xen-devel, Linh Thi Xuan Phan,
	Meng Xu, Meng Xu, Chao Wang, Chong Li, Dagaen Golomb

On Fri, Aug 29, 2014 at 10:05 AM, Jan Beulich <JBeulich@suse.com> wrote:
>>>> On 28.08.14 at 18:06, <xumengpanda@gmail.com> wrote:
>> As to your concern "I still don't see why such wouldn't be
>> applicable uniformly to all schedulers.", are you suggesting that the
>> credit and credit2 scheduler could also allow users to set/get each VCPU's
>> parameters? (I think that could be possible but this should be some design
>> decision made by credit and credit2 scheduler developer?)
>
> Perhaps. But I think a more uniform interface to the individual
> schedulers would help on the tools side too. But in the end I'm
> just raising questions/concerns here, it's George who needs to
> be okay with your approach.

Well the domctls aren't part of the stable ABI, so we can always
refactor things if we want to.

One could imagine restructuring the hypercall so that you could
specify a list of vcpus at the top level; but given that none of the
other schedulers actually want that at the moment, it seems like a lot
of faff for no good reason.

>
>>> I think anything going into the seconds, not to speak of minutes or
>>> hours, range is already beyond boundaries of being reasonable /
>>> useful.
>>
>> Hmm, that's fair. If we want to limit the range to seconds, then uint32 is
>> enough. However, what if some user really want to set the period/budget to
>> hours/days? Then we couldn't support it if we use uint32 type for
>> period/budget. (This is just my subtle concern. I'm not arguing that we
>> have to use uint64 or uint32. As long as everyone agrees with a type, I can
>> just change it to the agreed type very quickly.) Do you have any suggestion
>> of how we can reach an agreement and get the range finalized?
>
> Let's see what others, mainly George and Dario, think.

Under what circumstances would anyone want a period of an hour?  At
this point we're talking about a vcpu being allowed to run
continuously without pre-emption for half an hour, and then not
*allowed* to run at all for another half hour.  That just seems really
ridiculous.

 -George

^ permalink raw reply	[flat|nested] 72+ messages in thread

* Re: [PATCH v1 1/4] xen: add real time scheduler rt
  2014-09-03 13:40   ` George Dunlap
@ 2014-09-03 14:11     ` Meng Xu
  2014-09-03 14:15       ` George Dunlap
  2014-09-05  9:46     ` Dario Faggioli
  1 sibling, 1 reply; 72+ messages in thread
From: Meng Xu @ 2014-09-03 14:11 UTC (permalink / raw)
  To: George Dunlap
  Cc: Ian Campbell, Sisu Xi, Stefano Stabellini, Chenyang Lu,
	Dario Faggioli, Ian Jackson, xen-devel, Linh Thi Xuan Phan,
	Meng Xu, Jan Beulich, Chao Wang, Chong Li, Dagaen Golomb


[-- Attachment #1.1: Type: text/plain, Size: 3930 bytes --]

Hi George,


2014-09-03 9:40 GMT-04:00 George Dunlap <George.Dunlap@eu.citrix.com>:

> On Sun, Aug 24, 2014 at 11:58 PM, Meng Xu <mengxu@cis.upenn.edu> wrote:
> > This scheduler follows the pre-emptive Global EDF theory in real-time
> field.
> > Each VCPU can have a dedicated period and budget.
> > While scheduled, a VCPU burns its budget.
> > A VCPU has its budget replenished at the beginning of each of its
> periods;
> > The VCPU discards its unused budget at the end of each of its periods.
> > If a VCPU runs out of budget in a period, it has to wait until next
> period.
> > The mechanism of how to burn a VCPU's budget depends on the server
> mechanism
> > implemented for each VCPU.
> >
> > Server mechanism: a VCPU is implemented as a deferable server.
> > When a VCPU is scheduled to execute on a PCPU, its budget is continuously
> > burned.
> >
> > Priority scheme: Preemptive Global Earliest Deadline First (gEDF).
> > At any scheduling point, the VCPU with earliest deadline has highest
> > priority.
> >
> > Queue scheme: A global Runqueue for each CPU pool.
> > The Runqueue holds all runnable VCPUs.
> > VCPUs in the Runqueue are divided into two parts: with and without
> budget.
> > At each part, VCPUs are sorted based on gEDF priority scheme.
> >
> > Scheduling quantum: 1 ms;
> >
> > Note: cpumask and cpupool is supported.
> >
> > This is still in the development phase.
>
> You should probably take this out now that you've removed the RFC. :-)
>
>
Ditched now. Thanks!



> I'm just doing a first pass, so just a few quick comments to begin with.
>

​Thank you very much for your review! :-)​


> > diff --git a/xen/common/schedule.c b/xen/common/schedule.c
> > index 55503e0..7d2c6d1 100644
> > --- a/xen/common/schedule.c
> > +++ b/xen/common/schedule.c
> > @@ -69,6 +69,7 @@ static const struct scheduler *schedulers[] = {
> >      &sched_credit_def,
> >      &sched_credit2_def,
> >      &sched_arinc653_def,
> > +    &sched_rt_def,
> >  };
> >
> >  static struct scheduler __read_mostly ops;
> > @@ -1090,7 +1091,8 @@ long sched_adjust(struct domain *d, struct
> xen_domctl_scheduler_op *op)
> >
> >      if ( (op->sched_id != DOM2OP(d)->sched_id) ||
> >           ((op->cmd != XEN_DOMCTL_SCHEDOP_putinfo) &&
> > -          (op->cmd != XEN_DOMCTL_SCHEDOP_getinfo)) )
> > +          (op->cmd != XEN_DOMCTL_SCHEDOP_getinfo) &&
> > +          (op->cmd != XEN_DOMCTL_SCHEDOP_getnumvcpus)) )
>
> Why are you introducing this as a schedop?  Isn't this information
> already exposed in getdomaininfo?


I introduce XEN_DOMCTL_SCHEDOP_getnumvcpus as a schedop because we need to
know the number of vcpus a domain has when the tool stack wants to display
the parameters of EACH vcpu.

I think the operation you meant in getdomaininfo is XEN_DOMCTL_max_vcpus
(in xen/common/domctl.c)? If so, that operation sets the maximum number of
vcpus for a domain instead of getting the number of vcpus the domain has,
so I don't think I can reuse XEN_DOMCTL_max_vcpus here.

The detailed reason why I need the number of vcpus a domain has is as
follows:
When the tool stack (command xl sched-rt -d domain) displays the parameters
of EACH vcpu, it allocates an array of size "sizeof(struct
xen_domctl_sched_rt_params) * num_vcpus_of_this_domain" and bounces this
array to the hypervisor. After the hypervisor fills in the parameters of
each vcpu, the array is bounced back out to the tool stack for display.

In order to know how large this array should be, we need to know the number
of vcpus this domain has.
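
Roughly, the tool-stack side of this is the following sketch (error
handling trimmed; xc_sched_rt_domain_get_num_vcpus() and
xc_sched_rt_domain_get() are the libxc helpers added later in this series):

    uint16_t num_vcpus;
    struct xen_domctl_sched_rt_params *sdom;

    xc_sched_rt_domain_get_num_vcpus(xch, domid, &num_vcpus);
    sdom = calloc(num_vcpus, sizeof(*sdom));        /* array bounced to Xen */
    xc_sched_rt_domain_get(xch, domid, sdom, num_vcpus);
    /* sdom[i].period / sdom[i].budget now hold vcpu i's parameters */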

Please let me know if you have any other concerns or questions. :-)

Thank you very much!

Best,

​Meng​


-----------
Meng Xu
PhD Student in Computer and Information Science
University of Pennsylvania

[-- Attachment #1.2: Type: text/html, Size: 6342 bytes --]

[-- Attachment #2: Type: text/plain, Size: 126 bytes --]

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 72+ messages in thread

* Re: [PATCH v1 1/4] xen: add real time scheduler rt
  2014-09-03 14:11     ` Meng Xu
@ 2014-09-03 14:15       ` George Dunlap
  2014-09-03 14:35         ` Meng Xu
  0 siblings, 1 reply; 72+ messages in thread
From: George Dunlap @ 2014-09-03 14:15 UTC (permalink / raw)
  To: Meng Xu
  Cc: Ian Campbell, Sisu Xi, Stefano Stabellini, Chenyang Lu,
	Dario Faggioli, Ian Jackson, xen-devel, Linh Thi Xuan Phan,
	Meng Xu, Jan Beulich, Chao Wang, Chong Li, Dagaen Golomb

On Wed, Sep 3, 2014 at 3:11 PM, Meng Xu <xumengpanda@gmail.com> wrote:
>> > diff --git a/xen/common/schedule.c b/xen/common/schedule.c
>> > index 55503e0..7d2c6d1 100644
>> > --- a/xen/common/schedule.c
>> > +++ b/xen/common/schedule.c
>> > @@ -69,6 +69,7 @@ static const struct scheduler *schedulers[] = {
>> >      &sched_credit_def,
>> >      &sched_credit2_def,
>> >      &sched_arinc653_def,
>> > +    &sched_rt_def,
>> >  };
>> >
>> >  static struct scheduler __read_mostly ops;
>> > @@ -1090,7 +1091,8 @@ long sched_adjust(struct domain *d, struct
>> > xen_domctl_scheduler_op *op)
>> >
>> >      if ( (op->sched_id != DOM2OP(d)->sched_id) ||
>> >           ((op->cmd != XEN_DOMCTL_SCHEDOP_putinfo) &&
>> > -          (op->cmd != XEN_DOMCTL_SCHEDOP_getinfo)) )
>> > +          (op->cmd != XEN_DOMCTL_SCHEDOP_getinfo) &&
>> > +          (op->cmd != XEN_DOMCTL_SCHEDOP_getnumvcpus)) )
>>
>> Why are you introducing this as a schedop?  Isn't this information
>> already exposed in getdomaininfo?
>
>
> I introduce XEN_DOMCTL_SCHEDOP_getnumvcpus as a schedop because we need to
> know the number of vcpus a domain has when the tool stack wants to display
> the parameters of EACH vcpu.
>
> I think the operation you meant in getdomaininfo is XEN_DOMCTL_max_vcpus (in
> file xen/common/domctl.c)?

No, the operation I had in mind was XEN_DOMCTL_getdomaininfo, which
will give you nr_online_vcpus.

> When the tool stack (command xl sched-rt -d domain) displays the parameters
> of EACH vcpu, the tool stack will allocate an array whose size is
> "sizeof(struct xen_domctl_sched_rt_params) * num_vcpus_of_this_domain" and
> bounce this array to the hypervisor. After hypervisor fills out the
> parameters of each vcpu, this array will be bounced out to tool stack to
> display to users.

Sure, there are lots of operations the toolstack wants that needs the
number of vcpus, which is why it's already exposed with
XEN_DOMCTL_getdomaininfo. :-)

 -George

^ permalink raw reply	[flat|nested] 72+ messages in thread

* Re: [PATCH v1 1/4] xen: add real time scheduler rt
  2014-08-24 22:58 ` [PATCH v1 1/4] xen: add real time scheduler rt Meng Xu
  2014-08-26 14:27   ` Jan Beulich
  2014-09-03 13:40   ` George Dunlap
@ 2014-09-03 14:20   ` George Dunlap
  2014-09-03 14:45     ` Jan Beulich
                       ` (2 more replies)
  2014-09-05 17:17   ` Dario Faggioli
  3 siblings, 3 replies; 72+ messages in thread
From: George Dunlap @ 2014-09-03 14:20 UTC (permalink / raw)
  To: Meng Xu
  Cc: Ian Campbell, Sisu Xi, Stefano Stabellini, Dario Faggioli,
	Ian Jackson, xen-devel, Meng Xu, Jan Beulich, Chao Wang,
	Chong Li, Dagaen Golomb

On Sun, Aug 24, 2014 at 11:58 PM, Meng Xu <mengxu@cis.upenn.edu> wrote:
> diff --git a/xen/include/public/domctl.h b/xen/include/public/domctl.h
> index 5b11bbf..27d01c1 100644
> --- a/xen/include/public/domctl.h
> +++ b/xen/include/public/domctl.h
> @@ -339,6 +339,19 @@ struct xen_domctl_max_vcpus {
>  typedef struct xen_domctl_max_vcpus xen_domctl_max_vcpus_t;
>  DEFINE_XEN_GUEST_HANDLE(xen_domctl_max_vcpus_t);
>
> +/*
> + * This structure is used to pass to rt scheduler from a
> + * privileged domain to Xen
> + */
> +struct xen_domctl_sched_rt_params {
> +    /* get vcpus' info */
> +    uint64_t period; /* s_time_t type */
> +    uint64_t budget;
> +    uint16_t index;
> +    uint16_t padding[3];

Why the padding?

> +};
> +typedef struct xen_domctl_sched_rt_params xen_domctl_sched_rt_params_t;
> +DEFINE_XEN_GUEST_HANDLE(xen_domctl_sched_rt_params_t);
>
>  /* XEN_DOMCTL_scheduler_op */
>  /* Scheduler types. */
> @@ -346,9 +359,12 @@ DEFINE_XEN_GUEST_HANDLE(xen_domctl_max_vcpus_t);
>  #define XEN_SCHEDULER_CREDIT   5
>  #define XEN_SCHEDULER_CREDIT2  6
>  #define XEN_SCHEDULER_ARINC653 7
> +#define XEN_SCHEDULER_RT_DS    8
> +
>  /* Set or get info? */
> -#define XEN_DOMCTL_SCHEDOP_putinfo 0
> -#define XEN_DOMCTL_SCHEDOP_getinfo 1
> +#define XEN_DOMCTL_SCHEDOP_putinfo      0
> +#define XEN_DOMCTL_SCHEDOP_getinfo      1
> +#define XEN_DOMCTL_SCHEDOP_getnumvcpus   2
>  struct xen_domctl_scheduler_op {
>      uint32_t sched_id;  /* XEN_SCHEDULER_* */
>      uint32_t cmd;       /* XEN_DOMCTL_SCHEDOP_* */
> @@ -367,6 +383,16 @@ struct xen_domctl_scheduler_op {
>          struct xen_domctl_sched_credit2 {
>              uint16_t weight;
>          } credit2;
> +        struct xen_domctl_sched_rt{
> +            /* get vcpus' params */
> +            XEN_GUEST_HANDLE_64(xen_domctl_sched_rt_params_t) vcpu;
> +            uint16_t nr_vcpus;
> +            /* set one vcpu's params */
> +            uint16_t vcpu_index;
> +            uint16_t padding[2];

And again, why the padding?  This isn't a performance-critical bit of
code: you can safely let the compiler deal with adding padding to the
structure or managing mis-aligned reads.  Or if it really matters to
you, you can re-order the elements of the array so that they're
aligned naturally (e.g., by putting period and budget before
nr_vcpus).

> +            uint64_t period;
> +            uint64_t budget;
> +        } rt;

So if I'm reading this right, you set the information for vcpus one
vcpu at a time, but you want to read the whole lot out all at once?

I don't like the inconsistency.  It would be better if you did the
same thing each direction:  Either pass in an array with info about
the vcpus, or just read the vcpu information one-by-one.

 -George

^ permalink raw reply	[flat|nested] 72+ messages in thread

* Re: [PATCH v1 1/4] xen: add real time scheduler rt
  2014-09-03 14:08                 ` George Dunlap
@ 2014-09-03 14:24                   ` Meng Xu
  2014-09-03 14:35                     ` Dario Faggioli
  0 siblings, 1 reply; 72+ messages in thread
From: Meng Xu @ 2014-09-03 14:24 UTC (permalink / raw)
  To: George Dunlap
  Cc: Ian Campbell, Sisu Xi, Stefano Stabellini, Chenyang Lu,
	Dario Faggioli, Ian Jackson, xen-devel, Linh Thi Xuan Phan,
	Meng Xu, Jan Beulich, Chao Wang, Chong Li, Dagaen Golomb


[-- Attachment #1.1: Type: text/plain, Size: 3197 bytes --]

2014-09-03 10:08 GMT-04:00 George Dunlap <George.Dunlap@eu.citrix.com>:

> On Fri, Aug 29, 2014 at 10:05 AM, Jan Beulich <JBeulich@suse.com> wrote:
> >>>> On 28.08.14 at 18:06, <xumengpanda@gmail.com> wrote:
> >> As to your concern "I still don't see why such wouldn't be
> >> applicable uniformly to all schedulers.", are you suggesting that the
> >> credit and credit2 schedulers could also allow users to set/get each
> >> VCPU's parameters? (I think that could be possible, but it should be a
> >> design decision made by the credit and credit2 scheduler developers.)
> >
> > Perhaps. But I think a more uniform interface to the individual
> > schedulers would help on the tools side too. But in the end I'm
> > just raising questions/concerns here, it's George who needs to
> > be okay with your approach.
>
> Well the domctls aren't part of the stable ABI, so we can always
> refactor things if we want to.
>
> One could imagine restructuring the hypercall so that you could
> specify a list of vcpus at the top level; but given that none of the
> other schedulers actually want that at the moment, it seems like a lot
> of faff for no good reason.
>
> >
> >>> I think anything going into the seconds, not to speak of minutes or
> >>> hours, range is already beyond boundaries of being reasonable /
> >>> useful.
> >>
> >> Hmm, that's fair. If we want to limit the range to seconds, then uint32 is
> >> enough. However, what if some user really wants to set the period/budget to
> >> hours/days? Then we couldn't support it with a uint32 type for
> >> period/budget. (This is just a minor concern; I'm not arguing that we have
> >> to use uint64 or uint32. As long as everyone agrees on a type, I can change
> >> it to the agreed type very quickly.) Do you have any suggestion for how we
> >> can reach an agreement and finalize the range?
> >
> > Let's see what others, mainly George and Dario, think.
>
> Under what circumstances would anyone want a period of an hour?  At
> this point we're talking about a vcpu being allowed to run
> continuously without pre-emption for half an hour, and then not
> *allowed* to run at all for another half hour.  That just seems really
> ridiculous.
>

Honestly speaking, I don't think setting period and budget to hours is a
good idea either.

Actually, if users want to run a vcpu for half an hour every hour, they
could also set it to run 5 milliseconds every 10 milliseconds. So any large
period and budget can be scaled down to a smaller period and budget, and
the domain's performance will not degrade when the period and budget of
each of its vcpus are scaled down this way.

How about this:
I change the type to uint32 and use 2^32 us ~= 1.19 hours as the upper
bound of period and budget.
I document the range of period and budget on the website and also let
users know how to scale down the parameters of vcpus in case they have a
very large period and budget.

What do you guys think?

Thanks,

Meng​



-- 


-----------
Meng Xu
PhD Student in Computer and Information Science
University of Pennsylvania

[-- Attachment #1.2: Type: text/html, Size: 5143 bytes --]

[-- Attachment #2: Type: text/plain, Size: 126 bytes --]

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 72+ messages in thread

* Re: [PATCH v1 1/4] xen: add real time scheduler rt
  2014-09-03 14:15       ` George Dunlap
@ 2014-09-03 14:35         ` Meng Xu
  0 siblings, 0 replies; 72+ messages in thread
From: Meng Xu @ 2014-09-03 14:35 UTC (permalink / raw)
  To: George Dunlap
  Cc: Ian Campbell, Sisu Xi, Stefano Stabellini, Chenyang Lu,
	Dario Faggioli, Ian Jackson, xen-devel, Linh Thi Xuan Phan,
	Meng Xu, Jan Beulich, Chao Wang, Chong Li, Dagaen Golomb


[-- Attachment #1.1: Type: text/plain, Size: 2548 bytes --]

Hi George,


2014-09-03 10:15 GMT-04:00 George Dunlap <George.Dunlap@eu.citrix.com>:

> On Wed, Sep 3, 2014 at 3:11 PM, Meng Xu <xumengpanda@gmail.com> wrote:
> >> > diff --git a/xen/common/schedule.c b/xen/common/schedule.c
> >> > index 55503e0..7d2c6d1 100644
> >> > --- a/xen/common/schedule.c
> >> > +++ b/xen/common/schedule.c
> >> > @@ -69,6 +69,7 @@ static const struct scheduler *schedulers[] = {
> >> >      &sched_credit_def,
> >> >      &sched_credit2_def,
> >> >      &sched_arinc653_def,
> >> > +    &sched_rt_def,
> >> >  };
> >> >
> >> >  static struct scheduler __read_mostly ops;
> >> > @@ -1090,7 +1091,8 @@ long sched_adjust(struct domain *d, struct
> >> > xen_domctl_scheduler_op *op)
> >> >
> >> >      if ( (op->sched_id != DOM2OP(d)->sched_id) ||
> >> >           ((op->cmd != XEN_DOMCTL_SCHEDOP_putinfo) &&
> >> > -          (op->cmd != XEN_DOMCTL_SCHEDOP_getinfo)) )
> >> > +          (op->cmd != XEN_DOMCTL_SCHEDOP_getinfo) &&
> >> > +          (op->cmd != XEN_DOMCTL_SCHEDOP_getnumvcpus)) )
> >>
> >> Why are you introducing this as a schedop?  Isn't this information
> >> already exposed in getdomaininfo?
> >
> >
> > I introduce XEN_DOMCTL_SCHEDOP_getnumvcpus as a schedop because we need
> to
> > know the number of vcpus a domain has when the tool stack wants to
> display
> > the parameters of EACH vcpu.
> >
> > I think the operation you meant in getdomaininfo is XEN_DOMCTL_max_vcpus
> (in
> > file xen/common/domctl.c)?
>
> No, the operation I had in mind was XEN_DOMCTL_getdomaininfo, which
> will give you nr_online_vcpus.
>
> > When the tool stack (command xl sched-rt -d domain) displays the
> parameters
> > of EACH vcpu, the tool stack will allocate an array whose size is
> > "sizeof(struct xen_domctl_sched_rt_params) * num_vcpus_of_this_domain"
> and
> > bounce this array to the hypervisor. After hypervisor fills out the
> > parameters of each vcpu, this array will be bounced out to tool stack to
> > display to users.
>
> Sure, there are lots of operations the toolstack wants that needs the
> number of vcpus, which is why it's already exposed with
> XEN_DOMCTL_getdomaininfo. :-)


I see. Thank you very much for your information and advice! It's really
useful!

I will remove XEN_DOMCTL_SCHEDOP_getnumvcpus from the schedop interface and
use XEN_DOMCTL_getdomaininfo to get the number of vcpus instead! :-)

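Something along these lines on the libxc side should work (a rough sketch;
I'm assuming xc_domain_getinfo() and xc_dominfo_t behave as described in
xenctrl.h):

    xc_dominfo_t info;
    uint16_t num_vcpus = 0;

    /* XEN_DOMCTL_getdomaininfo via libxc; returns number of domains found */
    if ( xc_domain_getinfo(xch, domid, 1, &info) == 1 && info.domid == domid )
        num_vcpus = info.nr_online_vcpus;
    /* num_vcpus then sizes the xen_domctl_sched_rt_params array as before */
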
​Best,

Meng

-----------
Meng Xu
PhD Student in Computer and Information Science
University of Pennsylvania

[-- Attachment #1.2: Type: text/html, Size: 4084 bytes --]

[-- Attachment #2: Type: text/plain, Size: 126 bytes --]

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 72+ messages in thread

* Re: [PATCH v1 1/4] xen: add real time scheduler rt
  2014-09-03 14:24                   ` Meng Xu
@ 2014-09-03 14:35                     ` Dario Faggioli
  0 siblings, 0 replies; 72+ messages in thread
From: Dario Faggioli @ 2014-09-03 14:35 UTC (permalink / raw)
  To: Meng Xu
  Cc: Ian Campbell, Sisu Xi, Stefano Stabellini, George Dunlap,
	Chenyang Lu, Ian Jackson, xen-devel, Linh Thi Xuan Phan, Meng Xu,
	Jan Beulich, Chao Wang, Chong Li, Dagaen Golomb


[-- Attachment #1.1: Type: text/plain, Size: 1961 bytes --]

On mer, 2014-09-03 at 10:24 -0400, Meng Xu wrote:

> 2014-09-03 10:08 GMT-04:00 George Dunlap
> <George.Dunlap@eu.citrix.com>:

>         Under what circumstances would anyone want a period of an
>         hour?  At this point we're talking about a vcpu being allowed
>         to run continuously without pre-emption for half an hour, and
>         then not *allowed* to run at all for another half hour.  That
>         just seems really ridiculous.
>
>
> Honestly speaking, I don't think setting period and budget to hours
> is a good idea either.
>
Indeed, it makes no sense to me either.

> How about this:
> I change the type to uint32 bit and use 2^32us ~= 1.19h as the upper
> bound of period and budget. 
>
Fine for me.

> ​I document the range of period and budget on the website and also let
> users know how to scale down the parameters of vcpus in case they have
> a very large period and budget.
> 
I guess something like that won't harm on any website (BTW, are you
talking about the RT-Xen website? The Xen wiki? Both? :-D). In the code, I
wouldn't "waste" too many lines of comments on this. I don't see any
special need for explaining the scaling down either; it's pretty
straightforward once you have even a very light knowledge of what the
scheduling algorithm does (the bare minimum you need to be able to use it!).

Just make sure that you specify, especially in the public headers of all
the various components (so Xen, libxc and libxl), what the time unit is;
that is _the_ important piece of information one needs to use the
scheduler properly.

Regards,
Dario

-- 
<<This happens because I choose it to happen!>> (Raistlin Majere)
-----------------------------------------------------------------
Dario Faggioli, Ph.D, http://about.me/dario.faggioli
Senior Software Engineer, Citrix Systems R&D Ltd., Cambridge (UK)


[-- Attachment #1.2: This is a digitally signed message part --]
[-- Type: application/pgp-signature, Size: 181 bytes --]

[-- Attachment #2: Type: text/plain, Size: 126 bytes --]

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 72+ messages in thread

* Re: [PATCH v1 1/4] xen: add real time scheduler rt
  2014-09-03 14:20   ` George Dunlap
@ 2014-09-03 14:45     ` Jan Beulich
  2014-09-03 14:59     ` Dario Faggioli
  2014-09-03 15:13     ` Meng Xu
  2 siblings, 0 replies; 72+ messages in thread
From: Jan Beulich @ 2014-09-03 14:45 UTC (permalink / raw)
  To: Meng Xu, George Dunlap
  Cc: Ian Campbell, Sisu Xi, Stefano Stabellini, Dario Faggioli,
	Ian Jackson, xen-devel, Meng Xu, Chao Wang, Chong Li,
	Dagaen Golomb

>>> On 03.09.14 at 16:20, <George.Dunlap@eu.citrix.com> wrote:
> On Sun, Aug 24, 2014 at 11:58 PM, Meng Xu <mengxu@cis.upenn.edu> wrote:
>> diff --git a/xen/include/public/domctl.h b/xen/include/public/domctl.h
>> index 5b11bbf..27d01c1 100644
>> --- a/xen/include/public/domctl.h
>> +++ b/xen/include/public/domctl.h
>> @@ -339,6 +339,19 @@ struct xen_domctl_max_vcpus {
>>  typedef struct xen_domctl_max_vcpus xen_domctl_max_vcpus_t;
>>  DEFINE_XEN_GUEST_HANDLE(xen_domctl_max_vcpus_t);
>>
>> +/*
>> + * This structure is used to pass to rt scheduler from a
>> + * privileged domain to Xen
>> + */
>> +struct xen_domctl_sched_rt_params {
>> +    /* get vcpus' info */
>> +    uint64_t period; /* s_time_t type */
>> +    uint64_t budget;
>> +    uint16_t index;
>> +    uint16_t padding[3];
> 
> Why the padding?

These structures must look the same for a 32-bit and 64-bit Dom0.

Jan

^ permalink raw reply	[flat|nested] 72+ messages in thread

* Re: [PATCH v1 1/4] xen: add real time scheduler rt
  2014-09-03 14:20   ` George Dunlap
  2014-09-03 14:45     ` Jan Beulich
@ 2014-09-03 14:59     ` Dario Faggioli
  2014-09-03 15:27       ` Meng Xu
  2014-09-03 15:13     ` Meng Xu
  2 siblings, 1 reply; 72+ messages in thread
From: Dario Faggioli @ 2014-09-03 14:59 UTC (permalink / raw)
  To: George Dunlap
  Cc: Ian Campbell, Sisu Xi, Stefano Stabellini, Ian Jackson,
	xen-devel, Meng Xu, Meng Xu, Jan Beulich, Chao Wang, Chong Li,
	Dagaen Golomb


[-- Attachment #1.1: Type: text/plain, Size: 1703 bytes --]

On mer, 2014-09-03 at 15:20 +0100, George Dunlap wrote:
> So if I'm reading this right, you set the information for vcpus one
> vcpu at a time, but you want to read the whole lot out all at once?
> 
> I don't like the inconsistency.  It would be better if you did the
> same thing each direction:  Either pass in an array with info about
> the vcpus, or just read the vcpu information one-by-one.
> 
I agree. One by one (in both the reading and writing direction) was what I
suggested, but others (I think it was Andrew) pointed out that, when
hypercalls are involved, batching is always a good thing. And in fact
(although maybe not in hot paths) it's quite likely that one wants to
retrieve the parameters for all the vcpus (e.g., that is what xl does), so
that's why, IIRC, Meng went for the batching approach.

I also continue to think that, in the case of this scheduler, a good API
would include both calls to set and get one vcpu at a time *AND* calls
to set and get all of them.

I'm fine with this version only having one of the two alternatives (and
about adding the other one later), but I agree with George it should be
consistent.

If we like batching, then batching it is, but I think in that case we
need a way to change the parameters of only one vcpu (perhaps treat
budget == 0 && period == 0 in the i-th element of the array as "don't
touch vcpu #i"?).
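
Something like this on the hypervisor side, just to sketch what I mean
(field names taken from the domctl in this patch; purely illustrative):

    for ( i = 0; i < op->u.rt.nr_vcpus; i++ )
    {
        struct xen_domctl_sched_rt_params local;

        if ( copy_from_guest_offset(&local, op->u.rt.vcpu, i, 1) )
            return -EFAULT;
        if ( local.period == 0 && local.budget == 0 )
            continue;                /* "don't touch vcpu #i" */
        /* ...validate and update vcpu i's period and budget... */
    }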

Regards,
Dario

-- 
<<This happens because I choose it to happen!>> (Raistlin Majere)
-----------------------------------------------------------------
Dario Faggioli, Ph.D, http://about.me/dario.faggioli
Senior Software Engineer, Citrix Systems R&D Ltd., Cambridge (UK)


[-- Attachment #1.2: This is a digitally signed message part --]
[-- Type: application/pgp-signature, Size: 181 bytes --]

[-- Attachment #2: Type: text/plain, Size: 126 bytes --]

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 72+ messages in thread

* Re: [PATCH v1 1/4] xen: add real time scheduler rt
  2014-09-03 14:20   ` George Dunlap
  2014-09-03 14:45     ` Jan Beulich
  2014-09-03 14:59     ` Dario Faggioli
@ 2014-09-03 15:13     ` Meng Xu
  2014-09-03 16:06       ` George Dunlap
  2 siblings, 1 reply; 72+ messages in thread
From: Meng Xu @ 2014-09-03 15:13 UTC (permalink / raw)
  To: George Dunlap
  Cc: Ian Campbell, Sisu Xi, Stefano Stabellini, Dario Faggioli,
	Ian Jackson, xen-devel, Meng Xu, Jan Beulich, Chao Wang,
	Chong Li, Dagaen Golomb


[-- Attachment #1.1: Type: text/plain, Size: 5256 bytes --]

Hi George,


2014-09-03 10:20 GMT-04:00 George Dunlap <George.Dunlap@eu.citrix.com>:

> On Sun, Aug 24, 2014 at 11:58 PM, Meng Xu <mengxu@cis.upenn.edu> wrote:
> > diff --git a/xen/include/public/domctl.h b/xen/include/public/domctl.h
> > index 5b11bbf..27d01c1 100644
> > --- a/xen/include/public/domctl.h
> > +++ b/xen/include/public/domctl.h
> > @@ -339,6 +339,19 @@ struct xen_domctl_max_vcpus {
> >  typedef struct xen_domctl_max_vcpus xen_domctl_max_vcpus_t;
> >  DEFINE_XEN_GUEST_HANDLE(xen_domctl_max_vcpus_t);
> >
> > +/*
> > + * This structure is used to pass to rt scheduler from a
> > + * privileged domain to Xen
> > + */
> > +struct xen_domctl_sched_rt_params {
> > +    /* get vcpus' info */
> > +    uint64_t period; /* s_time_t type */
> > +    uint64_t budget;
> > +    uint16_t index;
> > +    uint16_t padding[3];
>
> Why the padding?
>
>
I did this because of Jan's comment "Also, you need to pad the structure to
a multiple of 8 bytes, or its layout will differ between 32- and 64-bit
(tool stack) callers." I think what he said makes sense, so I added the
padding here. :-)

Here is the link: http://marc.info/?l=xen-devel&m=140661680931179&w=2

> > +};
> > +typedef struct xen_domctl_sched_rt_params xen_domctl_sched_rt_params_t;
> > +DEFINE_XEN_GUEST_HANDLE(xen_domctl_sched_rt_params_t);
> >
> >  /* XEN_DOMCTL_scheduler_op */
> >  /* Scheduler types. */
> > @@ -346,9 +359,12 @@ DEFINE_XEN_GUEST_HANDLE(xen_domctl_max_vcpus_t);
> >  #define XEN_SCHEDULER_CREDIT   5
> >  #define XEN_SCHEDULER_CREDIT2  6
> >  #define XEN_SCHEDULER_ARINC653 7
> > +#define XEN_SCHEDULER_RT_DS    8
> > +
> >  /* Set or get info? */
> > -#define XEN_DOMCTL_SCHEDOP_putinfo 0
> > -#define XEN_DOMCTL_SCHEDOP_getinfo 1
> > +#define XEN_DOMCTL_SCHEDOP_putinfo      0
> > +#define XEN_DOMCTL_SCHEDOP_getinfo      1
> > +#define XEN_DOMCTL_SCHEDOP_getnumvcpus   2
> >  struct xen_domctl_scheduler_op {
> >      uint32_t sched_id;  /* XEN_SCHEDULER_* */
> >      uint32_t cmd;       /* XEN_DOMCTL_SCHEDOP_* */
> > @@ -367,6 +383,16 @@ struct xen_domctl_scheduler_op {
> >          struct xen_domctl_sched_credit2 {
> >              uint16_t weight;
> >          } credit2;
> > +        struct xen_domctl_sched_rt{
> > +            /* get vcpus' params */
> > +            XEN_GUEST_HANDLE_64(xen_domctl_sched_rt_params_t) vcpu;
> > +            uint16_t nr_vcpus;
> > +            /* set one vcpu's params */
> > +            uint16_t vcpu_index;
> > +            uint16_t padding[2];
>
> And again, why the padding?  This isn't a performance-critical bit of
> code: you can safely let the compiler deal with adding padding to the
> structure or managing mis-aligned reads.  Or if it really matters to
> you, you can re-order the elements of the array so that they're
> aligned naturally (e.g., by putting period and budget before
> nr_vcpus).
>

I agree that this is not a performance-critical bit of code, and I can do
it either way, as you suggested.
If I remove the padding, the structure won't be a multiple of 8 bytes and
its layout will differ between 32- and 64-bit (tool stack) callers. If
that's fine, I'm totally fine with it; we just need consensus. ;-)
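
For instance, the re-ordering you suggested would give something roughly
like this (just a sketch of the layout, not a final proposal):

    struct xen_domctl_sched_rt {
        /* 64-bit fields first, so everything is naturally aligned */
        uint64_t period;
        uint64_t budget;
        XEN_GUEST_HANDLE_64(xen_domctl_sched_rt_params_t) vcpu;
        uint16_t nr_vcpus;
        uint16_t vcpu_index;
        uint32_t padding;   /* keeps the size a multiple of 8 bytes */
    };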


>
> > +            uint64_t period;
> > +            uint64_t budget;
> > +        } rt;
>
> So if I'm reading this right, you set the information for vcpus one
> vcpu at a time, but you want to read the whole lot out all at once?


​Yes. ​



> I don't like the inconsistency.  It would be better if you did the
> same thing each direction:  Either pass in an array with info about
> the vcpus, or just read the vcpu information one-by-one.


I think it's a better idea to pass in an array with information about the
vcpus to get/set their parameters. I only need to change the code related
to setting a vcpu's parameters, and I have a question:
when we set a vcpu's parameters by using an array, we have two choices:

a) Create an array with a single vcpu element, and specify the index of the
vcpu to modify. The concern with this method is that we only use one
element of the array, so is it a good idea to use an array with only one
element?
b) Create an array with all vcpus of this domain, modify the parameters of
the vcpus users want to change, and then bounce the array to the hypervisor
to reset these vcpus' parameters. The concern with this method is that we
don't need any other vcpu's information to set a specific vcpu's
parameters; bouncing the whole array with all vcpus' information seems
expensive and unnecessary.

Do you have any suggestion/advice/preference on this?

I don't really like the idea of reading the vcpus' information one by
one. :-) If a domain has many vcpus, say 12, we would issue 12 hypercalls
to get all the vcpus' information, whereas a single hypercall can get
everything we want. Reading one by one would simplify the implementation,
but the extra hypercalls cause more overhead.

Thank you very much!

​Meng​​

-----------
Meng Xu
PhD Student in Computer and Information Science
University of Pennsylvania

[-- Attachment #1.2: Type: text/html, Size: 8417 bytes --]

[-- Attachment #2: Type: text/plain, Size: 126 bytes --]

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 72+ messages in thread

* Re: [PATCH v1 1/4] xen: add real time scheduler rt
  2014-09-03 14:59     ` Dario Faggioli
@ 2014-09-03 15:27       ` Meng Xu
  2014-09-03 15:46         ` Dario Faggioli
  0 siblings, 1 reply; 72+ messages in thread
From: Meng Xu @ 2014-09-03 15:27 UTC (permalink / raw)
  To: Dario Faggioli
  Cc: Ian Campbell, Sisu Xi, Stefano Stabellini, George Dunlap,
	Ian Jackson, xen-devel, Meng Xu, Jan Beulich, Chao Wang,
	Chong Li, Dagaen Golomb


[-- Attachment #1.1: Type: text/plain, Size: 2684 bytes --]

2014-09-03 10:59 GMT-04:00 Dario Faggioli <dario.faggioli@citrix.com>:

> On mer, 2014-09-03 at 15:20 +0100, George Dunlap wrote:
> > So if I'm reading this right, you set the information for vcpus one
> > vcpu at a time, but you want to read the whole lot out all at once?
> >
> > I don't like the inconsistency.  It would be better if you did the
> > same thing each direction:  Either pass in an array with info about
> > the vcpus, or just read the vcpu information one-by-one.
> >
> I agree. One by one (in both reading and writing direction) was what I
> suggested, but others (I think it was Andrew) pointed out that, when
> hypercall are involved, batching is always a good thing. And in fact
> (although maybe not in hot paths) it's quite likely that one wants to
> retrieve the parameters for all the vcpus (e.g., it is like that as far
> as xl is concerned), so that's why, IIRC, Meng went for the batching
> approach.
>

​Yes. We did discuss this before. Just now, I tried to find the link to the
thread but couldn't. :-(
​


>
> I also continue to think that, in the case of this scheduler, a good API
> would include both calls to set and get one vcpu at a time *AND* calls
> to set and get all of them.
>

Right now it can get all vcpus' parameters and set one vcpu's parameters;
the other two approaches are also not hard to implement.


>
> I'm fine with this version only having one of the two alternatives (and
> about adding the other one later), but I agree with George it should be
> consistent.
>
> If we like batching, then batching it is, but I think in that case we
> need a way to change the parameters of only one vcpu (perhaps treat
> budget == 0 && period == 0 in the i-th element of the array as "don't
> touch vcpu #i"?).
>

Actually there are three ways to implement this (if we go with batching):
1) As Dario mentioned, use period == 0 && budget == 0 to indicate that this
vcpu's parameters should not be changed;
2) We can first bounce out all vcpus' information, change the entries the
user specified, bounce the modified array back to the hypervisor, and have
the hypervisor set all vcpus' parameters based on the modified array; if a
vcpu's entry was not changed, its parameters won't change;
3) We only create an array that contains the vcpus to be modified; after we
bounce the array to the hypervisor, the hypervisor can use each entry's
vcpu index to decide which vcpu's parameters should be modified.

​Which method do you guys prefer?​

​Thanks,​

​Meng​



-----------
Meng Xu
PhD Student in Computer and Information Science
University of Pennsylvania

[-- Attachment #1.2: Type: text/html, Size: 4240 bytes --]

[-- Attachment #2: Type: text/plain, Size: 126 bytes --]

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 72+ messages in thread

* Re: [PATCH v1 3/4] libxl: add rt scheduler
  2014-08-24 22:58 ` [PATCH v1 3/4] libxl: " Meng Xu
  2014-08-25 13:17   ` Wei Liu
@ 2014-09-03 15:33   ` George Dunlap
  2014-09-03 20:52     ` Meng Xu
  2014-09-04 14:27     ` George Dunlap
  2014-09-05 10:21   ` Dario Faggioli
  2 siblings, 2 replies; 72+ messages in thread
From: George Dunlap @ 2014-09-03 15:33 UTC (permalink / raw)
  To: Meng Xu
  Cc: Ian Campbell, Sisu Xi, Stefano Stabellini, Dario Faggioli,
	Ian Jackson, xen-devel, Meng Xu, Jan Beulich, Chao Wang,
	Chong Li, Dagaen Golomb

On Sun, Aug 24, 2014 at 11:58 PM, Meng Xu <mengxu@cis.upenn.edu> wrote:
> Add libxl functions to set/get domain's parameters for rt scheduler
>
> Signed-off-by: Sisu Xi <xisisu@gmail.com>
> Signed-off-by: Meng Xu <mengxu@cis.upenn.edu>
> ---
>  tools/libxl/libxl.c         |  139 +++++++++++++++++++++++++++++++++++++++++++
>  tools/libxl/libxl.h         |    7 +++
>  tools/libxl/libxl_types.idl |   15 +++++
>  3 files changed, 161 insertions(+)
>
> diff --git a/tools/libxl/libxl.c b/tools/libxl/libxl.c
> index 3526539..440e8df31 100644
> --- a/tools/libxl/libxl.c
> +++ b/tools/libxl/libxl.c
> @@ -5154,6 +5154,139 @@ static int sched_sedf_domain_set(libxl__gc *gc, uint32_t domid,
>      return 0;
>  }
>
> +static int sched_rt_domain_get(libxl__gc *gc, uint32_t domid,
> +                               libxl_domain_sched_params *scinfo)
> +{
> +    struct xen_domctl_sched_rt_params* sdom;
> +    uint16_t num_vcpus;
> +    int rc, i;
> +
> +    rc = xc_sched_rt_domain_get_num_vcpus(CTX->xch, domid, &num_vcpus);
> +    if (rc != 0) {
> +        LOGE(ERROR, "getting num_vcpus of domain sched rt");
> +        return ERROR_FAIL;
> +    }
> +
> +    /* FIXME: can malloc be used in libxl? seems it was used in the file */
> +    sdom = (struct xen_domctl_sched_rt_params *)
> +            malloc( sizeof(struct xen_domctl_sched_rt_params) * num_vcpus );

There are two ways to do allocation; one which will be garbage
collected automatically (so you don't need to call free on it), and
one which returns something which the caller has to remember to free.

You can call libxl__calloc(gc, num_vcpus, sizeof(struct
xen_domctl_sched_rt_params)) with the garbage collector for this one;
then you won't need to free it below, as it will be freed by the
GC_FREE at the end of libxl_domain_params_get().  Then...

> +    if ( !sdom ){
> +        LOGE(ERROR, "Allocate sdom array fails\n");
> +        return ERROR_INVAL;
> +    }
> +
> +    rc = xc_sched_rt_domain_get(CTX->xch, domid, sdom, num_vcpus);
> +    if (rc != 0) {
> +        LOGE(ERROR, "getting domain sched rt");
> +        return ERROR_FAIL;
> +    }
> +
> +    /* FIXME: how to guarantee libxl_*_dispose be called exactly once? */
> +    libxl_domain_sched_params_init(scinfo);
> +
> +    scinfo->rt.num_vcpus = num_vcpus;
> +    scinfo->sched = LIBXL_SCHEDULER_RT;
> +    /* FIXME: can malloc be used in libxl? seems it was used in the file */
> +    scinfo->rt.vcpus = (libxl_vcpu *)
> +                       malloc( sizeof(libxl_vcpu) * scinfo->rt.num_vcpus );

...for this one, call libxl__calloc(NOGC, scinfo->rt.num_vcpus,
sizeof(libxl_vcpu)).  NOGC will cause the values not to be freed by
GC_FREE (so they should be freed by the caller).

And again, as Wei said, you don't need to check the return value for
either one, as libxl will handle memory allocation errors.

However, it looks like libxl_list_domain() managed to make the libxl
struct look exactly like the libxc struct, so that you don't need to
do this allocate-and-copy thing.  Could you try to arrange for that to
be the case for libxl_rt_vcpu?
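
In other words, the two allocations above would look roughly like this
(a sketch only):

    sdom = libxl__calloc(gc, num_vcpus,
                         sizeof(struct xen_domctl_sched_rt_params));
    /* no NULL check needed; freed by GC_FREE at the end of the entry point */

    scinfo->rt.vcpus = libxl__calloc(NOGC, scinfo->rt.num_vcpus,
                                     sizeof(libxl_vcpu));
    /* NOGC: ownership passes to the caller, who disposes of scinfo */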

> +/*
> + * Sanity check of the scinfo parameters
> + * return 0 if all values are valid
> + * return 1 if one param is default value
> + * return 2 if the target vcpu's index, period or budget is out of range
> + */

These should be checked in the hypervisor, not the toolstack.  The
hypervisor can return -EINVAL, and libxl can pass that error message
on to the caller.

So this function should go away; but in general, just as a style note,
you should avoid "magic constants" like this.  Ideally you'd re-use
some of the LIBXL error codes; but if that won't work, you should make
#define's with a suitable descriptive name.
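
In the hypervisor that check would boil down to something like this (a
sketch only, reusing the bounds from your libxl function; where exactly it
lives in sched_rt.c is up to you):

    if ( vcpu_index >= d->max_vcpus ||
         period < 1 || period > SCHED_RT_VCPU_PERIOD_MAX ||
         budget < 1 || budget > SCHED_RT_VCPU_BUDGET_MAX )
        return -EINVAL;  /* libxc/libxl just propagate this to the caller */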

> +static int sched_rt_domain_set_validate_params(libxl__gc *gc,
> +                                               const libxl_domain_sched_params *scinfo,
> +                                               const uint16_t num_vcpus)
> +{
> +    int vcpu_index = scinfo->rt.vcpu_index;
> +
> +    if (vcpu_index == LIBXL_DOMAIN_SCHED_PARAM_VCPU_DEFAULT ||
> +        scinfo->rt.period == LIBXL_DOMAIN_SCHED_PARAM_PERIOD_DEFAULT ||
> +        scinfo->rt.budget == LIBXL_DOMAIN_SCHED_PARAM_BUDGET_DEFAULT)
> +    {
> +        return 1;
> +    }
> +
> +    if (vcpu_index < 0 || vcpu_index > num_vcpus)
> +    {
> +        LOG(ERROR, "VCPU index is not set or out of range, "
> +                    "valid values are within range from 0 to %d", num_vcpus);
> +        return 2;
> +    }
> +
> +    if (scinfo->rt.period < 1 ||
> +        scinfo->rt.period > SCHED_RT_VCPU_PERIOD_MAX)
> +    {
> +        LOG(ERROR, "VCPU period is not set or out of range, "
> +                    "valid values are within range from 0 to %lu", SCHED_RT_VCPU_PERIOD_MAX);
> +        return 2;
> +    }
> +
> +    if (scinfo->rt.budget < 1 ||
> +        scinfo->rt.budget > SCHED_RT_VCPU_BUDGET_MAX)
> +    {
> +        LOG(ERROR, "VCPU budget is not set or out of range, "
> +                    "valid values are within range from 0 to %lu", SCHED_RT_VCPU_BUDGET_MAX);
> +        return 2;
> +    }
> +
> +    return 0;
> +
> +}
> +
> +static int sched_rt_domain_set(libxl__gc *gc, uint32_t domid,
> +                               const libxl_domain_sched_params *scinfo)
> +{
> +    struct xen_domctl_sched_rt_params sdom;
> +    uint16_t num_vcpus;
> +    int rc;
> +
> +    rc = xc_sched_rt_domain_get_num_vcpus(CTX->xch, domid, &num_vcpus);
> +    if (rc != 0) {
> +        LOGE(ERROR, "getting domain sched rt");
> +        return ERROR_FAIL;
> +    }
> +
> +    rc = sched_rt_domain_set_validate_params(gc, scinfo, num_vcpus);
> +    if (rc == 2)
> +        return ERROR_INVAL;
> +    if (rc == 1)
> +        return 0;

Er, one of the parameters is the default value, and so you don't set
any of them?

> @@ -303,6 +304,19 @@ libxl_domain_restore_params = Struct("domain_restore_params", [
>      ("checkpointed_stream", integer),
>      ])
>
> +libxl_rt_vcpu = Struct("vcpu",[
> +    ("period",       uint64, {'init_val': 'LIBXL_DOMAIN_SCHED_PARAM_PERIOD_DEFAULT'}),
> +    ("budget",       uint64, {'init_val': 'LIBXL_DOMAIN_SCHED_PARAM_BUDGET_DEFAULT'}),
> +    ("index",        integer, {'init_val': 'LIBXL_DOMAIN_SCHED_PARAM_VCPU_INDEX_DEFAULT'}),
> +    ])
> +
> +libxl_domain_sched_rt_params = Struct("domain_sched_rt_params",[
> +    ("vcpus",        Array(libxl_rt_vcpu, "num_vcpus")),
> +    ("period",       uint64, {'init_val': 'LIBXL_DOMAIN_SCHED_PARAM_PERIOD_DEFAULT'}),
> +    ("budget",       uint64, {'init_val': 'LIBXL_DOMAIN_SCHED_PARAM_BUDGET_DEFAULT'}),
> +    ("vcpu_index",   integer, {'init_val': 'LIBXL_DOMAIN_SCHED_PARAM_VCPU_INDEX_DEFAULT'}),
> +    ])
> +
>  libxl_domain_sched_params = Struct("domain_sched_params",[
>      ("sched",        libxl_scheduler),
>      ("weight",       integer, {'init_val': 'LIBXL_DOMAIN_SCHED_PARAM_WEIGHT_DEFAULT'}),
> @@ -311,6 +325,7 @@ libxl_domain_sched_params = Struct("domain_sched_params",[
>      ("slice",        integer, {'init_val': 'LIBXL_DOMAIN_SCHED_PARAM_SLICE_DEFAULT'}),
>      ("latency",      integer, {'init_val': 'LIBXL_DOMAIN_SCHED_PARAM_LATENCY_DEFAULT'}),
>      ("extratime",    integer, {'init_val': 'LIBXL_DOMAIN_SCHED_PARAM_EXTRATIME_DEFAULT'}),
> +    ("rt",           libxl_domain_sched_rt_params),
>      ])

While the domctl interface is not stable, the libxl interface *is*
stable, so we definitely need to think carefully about what we want
this to look like.

Let me give that a think. :-)

 -George

^ permalink raw reply	[flat|nested] 72+ messages in thread

* Re: [PATCH v1 1/4] xen: add real time scheduler rt
  2014-09-03 15:27       ` Meng Xu
@ 2014-09-03 15:46         ` Dario Faggioli
  2014-09-03 17:13           ` George Dunlap
  0 siblings, 1 reply; 72+ messages in thread
From: Dario Faggioli @ 2014-09-03 15:46 UTC (permalink / raw)
  To: Meng Xu
  Cc: Ian Campbell, Sisu Xi, Stefano Stabellini, George Dunlap,
	Ian Jackson, xen-devel, Meng Xu, Jan Beulich, Chao Wang,
	Chong Li, Dagaen Golomb


[-- Attachment #1.1: Type: text/plain, Size: 2109 bytes --]

On mer, 2014-09-03 at 11:27 -0400, Meng Xu wrote:


> ​Actually it has three ways to implement this (if we go with
> batching):
> 1) As Dario mentioned, use period = 0 & budget=0 to indicate that this
> vcpu's parameters should not be changed;
> 2)​ We can first bounce out all VCPU's information, change the vcpus'
> information users specified, bounce the modified array back to
> hypervisor; and hypervisor set all vcpus' information based on the
> modified array. If a vcpu is not changed, its vcpu's information won't
> change, so its parameters won't change; 
>
That would mean issuing a hypercall to get the array of vcpu params, then
changing the elements corresponding to the vcpu(s) you want to update, and
then issuing the actual vcpus_set() hypercall.

Then, in Xen, you'll always change the parameters of _all_ the vcpus,
with some of the new values being exactly equal to the old ones.

This is viable, but quite unpleasant, especially the hypervisor part,
where you risk disturbing vcpus minding their own (possibly real-time)
business for no reason. If you go for it, you should at least avoid
this by recognizing, inside Xen, that the parameters are unchanged and
not messing with the vcpu.

Even with that avoided, you still need a large array, as for 1), and,
this time, you bounce it both up and down, so wrt 1) this looks like a
lose-lose to me.

> 3) We only create arrays that contain the vcpus to be modified. After we
> bounce the array to the hypervisor, the hypervisor can use the vcpu's index
> to decide which vcpu's parameters should be modified.
>
This is an option. I like it less than 1), but I can see that, with 1),
changing the parameters of 1 vcpu of a 32-vcpu domain means pushing
down to Xen quite a bit of 0-s. :-O

So, my preference is: "not 2". :-D

Regards,
Dario

-- 
<<This happens because I choose it to happen!>> (Raistlin Majere)
-----------------------------------------------------------------
Dario Faggioli, Ph.D, http://about.me/dario.faggioli
Senior Software Engineer, Citrix Systems R&D Ltd., Cambridge (UK)



* Re: [PATCH v1 4/4] xl: introduce rt scheduler
  2014-08-24 22:58 ` [PATCH v1 4/4] xl: introduce " Meng Xu
  2014-08-25 13:31   ` Wei Liu
@ 2014-09-03 15:52   ` George Dunlap
  2014-09-03 22:28     ` Meng Xu
  1 sibling, 1 reply; 72+ messages in thread
From: George Dunlap @ 2014-09-03 15:52 UTC (permalink / raw)
  To: Meng Xu
  Cc: Ian Campbell, Sisu Xi, Stefano Stabellini, Dario Faggioli,
	Ian Jackson, xen-devel, Meng Xu, Jan Beulich, Chao Wang,
	Chong Li, Dagaen Golomb

On Sun, Aug 24, 2014 at 11:58 PM, Meng Xu <mengxu@cis.upenn.edu> wrote:
> Add xl command for rt scheduler

The main thing about this is that you probably want to have an
interface where you can set the parameters for multiple vcpus at once;
and that's probably the case regardless of whether we end up making
the hypercall able to batch vcpus or do them one-at-a-time.

 -George


* Re: [PATCH v1 1/4] xen: add real time scheduler rt
  2014-09-03 15:13     ` Meng Xu
@ 2014-09-03 16:06       ` George Dunlap
  2014-09-03 16:57         ` Dario Faggioli
  2014-09-04  2:11         ` Meng Xu
  0 siblings, 2 replies; 72+ messages in thread
From: George Dunlap @ 2014-09-03 16:06 UTC (permalink / raw)
  To: Meng Xu
  Cc: Ian Campbell, Sisu Xi, Stefano Stabellini, Dario Faggioli,
	Ian Jackson, xen-devel, Meng Xu, Jan Beulich, Chao Wang,
	Chong Li, Dagaen Golomb

On Wed, Sep 3, 2014 at 4:13 PM, Meng Xu <xumengpanda@gmail.com> wrote:
> Hi George,
>
>
> 2014-09-03 10:20 GMT-04:00 George Dunlap <George.Dunlap@eu.citrix.com>:
>
>> On Sun, Aug 24, 2014 at 11:58 PM, Meng Xu <mengxu@cis.upenn.edu> wrote:
>> > diff --git a/xen/include/public/domctl.h b/xen/include/public/domctl.h
>> > index 5b11bbf..27d01c1 100644
>> > --- a/xen/include/public/domctl.h
>> > +++ b/xen/include/public/domctl.h
>> > @@ -339,6 +339,19 @@ struct xen_domctl_max_vcpus {
>> >  typedef struct xen_domctl_max_vcpus xen_domctl_max_vcpus_t;
>> >  DEFINE_XEN_GUEST_HANDLE(xen_domctl_max_vcpus_t);
>> >
>> > +/*
>> > + * This structure is used to pass to rt scheduler from a
>> > + * privileged domain to Xen
>> > + */
>> > +struct xen_domctl_sched_rt_params {
>> > +    /* get vcpus' info */
>> > +    uint64_t period; /* s_time_t type */
>> > +    uint64_t budget;
>> > +    uint16_t index;
>> > +    uint16_t padding[3];
>>
>> Why the padding?
>>
>
> I did this because of Jan's comment "Also, you need to pad the structure to
> a multiple of 8 bytes, or
> its layout will differ between 32- and 64-bit (tool stack) callers." I think
> > what he said makes sense, so I added the padding here. :-)
>
> Here is the link: http://marc.info/?l=xen-devel&m=140661680931179&w=2

Right. :-)  I personally prefer to handle that by re-arranging the
elements rather than adding padding, unless absolutely necessary.  In
this case that shouldn't be too hard, particularly once we pare the
interface down so we only have one interface (either all one vcpu at a
time, or all batched vcpus).
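
(As a purely illustrative aside, not something decided in this thread: once the
interface always passes a full per-vcpu array, the vcpu index can be implied by
the position in the array, and the per-vcpu element reduces to two uint64_t
fields, which is already a multiple of 8 bytes for both 32- and 64-bit callers,
with no explicit padding. Sketch with a hypothetical struct name:)

    /* Hypothetical element layout: no index field, no explicit padding. */
    struct xen_domctl_sched_rt_vcpu {
        uint64_t period;
        uint64_t budget;
    };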

> I think it's a better idea to
>  pass in an array with information about vcpus to get/set vcpus'
> information.
>
> I only need to change the code related to setting a vcpu's information.
> I have a question:
> When we set a vcpu's information by using an array, we have two choices:
>
> a) just create an array with one vcpu element, and specify the index of the
> vcpu to modify; the concern with this method is that we only use one element
> of this array, so is it a good idea to use an array with only one element?
> b) create an array with all vcpus of this domain, modify the parameters of
> the vcpus the user wants to change, and then bounce the array to the hypervisor
> to reset these vcpus' parameters. The concern with this method is that we don't
> need any other vcpus' information to set a specific vcpu's parameters.
> Bouncing the whole array with all vcpus' information seems expensive and
> unnecessary?
>
> Do you have any suggestion/advice/preference on this?
>
> I don't really like the idea of reading the vcpus' information
> one-by-one. :-) If a domain has many vcpus, say 12 vcpus, we will issue 12
> hypercalls to get all vcpus' information for this domain. Because we only
> need to issue one hypercall to get all the information we want, the extra
> hypercalls cause more overhead. This does simplify the implementation, but
> may cause more overhead.

For convenience for users, I think it's definitely the case that libxl
should provide an interface to get and set all the vcpu parameters at
once.  Then it can either batch them all into a single hypercall (if
that's what we decide), or it can make the individual calls for each
vcpu.

But I don't really expect this to be a performance-critical operation.
Twelve hypercalls 1000 times a second is obviously way too many; but
12 hypercalls even once a second isn't really a big deal.

The main reason I would think to batch the hypercalls is for
consistency: it seems like you may want to change the period / budget
of vcpus atomically, rather than setting one, possibly having dom0
de-scheduled for a few hundred milliseconds, and then setting another.
Same thing goes for reading: I would think you would want a consistent
"snapshot" of some existing state, rather than having the possibility
of reading half the state, then having someone change it, and then
reading the other half.
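
(Purely as an illustration of the "all at once" shape being discussed; these
function names do not exist in libxl or in the patch, they are invented here:)

    /* Hypothetical libxl-level calls that get/set every vcpu in one go. */
    int libxl_domain_sched_rt_params_get_all(libxl_ctx *ctx, uint32_t domid,
                                             libxl_rt_vcpu *vcpus,
                                             int *num_vcpus);
    int libxl_domain_sched_rt_params_set_all(libxl_ctx *ctx, uint32_t domid,
                                             const libxl_rt_vcpu *vcpus,
                                             int num_vcpus);

Whether these internally issue one batched hypercall or a loop of per-vcpu
hypercalls would then be an implementation detail hidden from the caller.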

Re the per-vcpu settings, though: Is it really that common for RT
domains to want different parameters for different vcpus?  Are these
parameters exposed to the guest in any way, so that it can make more
reasonable decisions as to where to run what kinds of workloads?

 -George


* Re: [PATCH v1 1/4] xen: add real time scheduler rt
  2014-09-03 16:06       ` George Dunlap
@ 2014-09-03 16:57         ` Dario Faggioli
  2014-09-03 17:18           ` George Dunlap
  2014-09-04  2:11         ` Meng Xu
  1 sibling, 1 reply; 72+ messages in thread
From: Dario Faggioli @ 2014-09-03 16:57 UTC (permalink / raw)
  To: George Dunlap
  Cc: Ian Campbell, Sisu Xi, Stefano Stabellini, Ian Jackson,
	xen-devel, Meng Xu, Meng Xu, Jan Beulich, Chao Wang, Chong Li,
	Dagaen Golomb



On Wed, 2014-09-03 at 17:06 +0100, George Dunlap wrote:
> On Wed, Sep 3, 2014 at 4:13 PM, Meng Xu <xumengpanda@gmail.com> wrote:
> > Hi George,
> >
> >
> > 2014-09-03 10:20 GMT-04:00 George Dunlap <George.Dunlap@eu.citrix.com>:
> >
> >> On Sun, Aug 24, 2014 at 11:58 PM, Meng Xu <mengxu@cis.upenn.edu> wrote:
> >> > diff --git a/xen/include/public/domctl.h b/xen/include/public/domctl.h
> >> > index 5b11bbf..27d01c1 100644
> >> > --- a/xen/include/public/domctl.h
> >> > +++ b/xen/include/public/domctl.h
> >> > @@ -339,6 +339,19 @@ struct xen_domctl_max_vcpus {
> >> >  typedef struct xen_domctl_max_vcpus xen_domctl_max_vcpus_t;
> >> >  DEFINE_XEN_GUEST_HANDLE(xen_domctl_max_vcpus_t);
> >> >
> >> > +/*
> >> > + * This structure is used to pass to rt scheduler from a
> >> > + * privileged domain to Xen
> >> > + */
> >> > +struct xen_domctl_sched_rt_params {
> >> > +    /* get vcpus' info */
> >> > +    uint64_t period; /* s_time_t type */
> >> > +    uint64_t budget;
> >> > +    uint16_t index;
> >> > +    uint16_t padding[3];
> >>
> >> Why the padding?
> >>
> >
> > I did this because of Jan's comment "Also, you need to pad the structure to
> > a multiple of 8 bytes, or
> > its layout will differ between 32- and 64-bit (tool stack) callers." I think
> > what he said make sense so I added the padding here. :-)
> >
> > Here is the link: http://marc.info/?l=xen-devel&m=140661680931179&w=2
> 
> Right. :-)  I personally prefer to handle that by re-arranging the
> elements rather than adding padding, unless absolutely necessary.  In
> this case that shouldn't be too hard, particularly once we pare the
> interface down so we only have one interface (either all one vcpu at a
> time, or all batched vcpus).
> 
> > I think it's a better idea to
> >  pass in an array with information about vcpus to get/set vcpus'
> > information.
> >
> > I only need to change the code related to setting a vcpu's information.
> > I have a question:
> > When we set a vcpu's information by using an array, we have two choices:
> >
> > a) just create an array with one vcpu element, and specify the index of the
> > vcpu to modify; The concern to this method is that we only uses one element
> > of this array, so is it a good idea to use an array with only one element?
> > b) create an array with all vcpus of this domain, modify the parameters of
> > the vcpu users want to change, and then bounce the array to hypervisor to
> > reset these vcpus' parameters. The concern to this method is that we don't
> > need any other vcpus' information to set a specific vcpu's parameters.
> > Bouncing the whole array with all vcpus information seems expensive and
> > unnecessary?
> >
> > Do you have any suggestion/advice/preference on this?
> >
> > I don't really like about the idea of reading the vcpu's information
> > one-by-one. :-) If a domain has many vcpus, say 12 vcpus, we will issue 12
> > hypercalls to get all vcpus' information of this domain. Because we only
> > need to issue one hypercall to get all information we want, the extra
> > hypercalls causes more overhead. This did simplify the implementation, but
> > may cause more overhead.
> 
> For convenience for users, I think it's definitely the case that libxl
> should provide an interface to get and set all the vcpu parameters at
> once.  Then it can either batch them all into a single hypercall (if
> that's what we decide), or it can make the individual calls for each
> vcpu.
> 
Indeed.

> The main reason I would think to batch the hypercalls is for
> consistency: it seems like you may want to change the period / budget
> of vcpus atomically, rather than setting one, possibly having dom0
> de-scheduled for a few hundred milliseconds, and then setting another.
> Same thing goes for reading: I would think you would want a consistent
> "snapshot" of some existing state, rather than having the possibility
> of reading half the state, then having someone change it, and then
> reading the other half.
> 
That is actually the reason why I'd have both things. A "change this
one" variant is handy if one actually has to change only one vcpu, or a
few, and does not mind the non-atomicity.

The batched variant, for both overhead and atomicity reasons.

> Re the per-vcpu settings, though: Is it really that common for RT
> domains to want different parameters for different vcpus?  
>
Whether it's common is hard to say, but yes, it has to be possible.

For instance, I can put, in an SMP guest, two real-time applications
with different timing requirements, and pin each one to a different
(v)cpu (I mean pin *inside* the guest). At this point, I'd like for each
vcpu to have a set of RT scheduling parameters, at the Xen level, that
matches the timing requirements of what's running inside.

This may not look so typical in a server/cloud environment, but can
happen (at least in my experience) in a mobile/embedded env.

> Are these
> parameters exposed to the guest in any way, so that it can make more
> reasonable decisions as to where to run what kinds of workloads?
> 
Not right now, AFAICS, but forms of 'scheduling paravirtualization', or
in general this kind of interaction/communication, could be very useful
in real-time virtualization, so we may want to support that in the future.

In any case, even without that in place right now, I think different
parameters for different vcpus is certainly something we want from an RT
scheduler.

Regards,
Dario

-- 
<<This happens because I choose it to happen!>> (Raistlin Majere)
-----------------------------------------------------------------
Dario Faggioli, Ph.D, http://about.me/dario.faggioli
Senior Software Engineer, Citrix Systems R&D Ltd., Cambridge (UK)



* Re: [PATCH v1 1/4] xen: add real time scheduler rt
  2014-09-03 15:46         ` Dario Faggioli
@ 2014-09-03 17:13           ` George Dunlap
  0 siblings, 0 replies; 72+ messages in thread
From: George Dunlap @ 2014-09-03 17:13 UTC (permalink / raw)
  To: Dario Faggioli
  Cc: Ian Campbell, Sisu Xi, Stefano Stabellini, Ian Jackson,
	xen-devel, Meng Xu, Meng Xu, Jan Beulich, Chao Wang, Chong Li,
	Dagaen Golomb

On Wed, Sep 3, 2014 at 4:46 PM, Dario Faggioli
<dario.faggioli@citrix.com> wrote:
>> 3) We only create arrays that have the vcpus to be modified. After we
>> bound the array to hypervisor, hypervisor can use the vcpu's index to
>> decide which vcpu's parameters should be modified.
>>
> This is an option. I like it less than 1), but I can see that, with 1),
> changing the parameters of 1 vcpu of a 32 vcpus domain means pushing
> down to Xen quite a bit of 0-s,. :-O

I guess the real question is how often we expect to be updating the
parameters of a single vcpu, vs just setting all of the parameters
(e.g., during domain creation).

If we expect a lot of tweaking from user-space tools, then we should
go with 3.  If we expect people to set parameters and then leave them
alone, we should go with 1.

 -George


* Re: [PATCH v1 1/4] xen: add real time scheduler rt
  2014-09-03 16:57         ` Dario Faggioli
@ 2014-09-03 17:18           ` George Dunlap
  2014-09-04  2:15             ` Meng Xu
  2014-09-04 14:27             ` Dario Faggioli
  0 siblings, 2 replies; 72+ messages in thread
From: George Dunlap @ 2014-09-03 17:18 UTC (permalink / raw)
  To: Dario Faggioli
  Cc: Ian Campbell, Sisu Xi, Stefano Stabellini, Ian Jackson,
	xen-devel, Meng Xu, Meng Xu, Jan Beulich, Chao Wang, Chong Li,
	Dagaen Golomb

On Wed, Sep 3, 2014 at 5:57 PM, Dario Faggioli
<dario.faggioli@citrix.com> wrote:
>> Re the per-vcpu settings, though: Is it really that common for RT
>> domains to want different parameters for different vcpus?
>>
> Whether it's common it is hard to say, but yes, it has to be possible.
>
> For instance, I can put, in an SMP guest, two real-time applications
> with different timing requirements, and pin each one to a different
> (v)cpu (I mean pin *inside* the guest). At this point, I'd like for each
> vcpu to have a set of RT scheduling parameters, at the Xen level, that
> matches the timing requirements of what's running inside.
>
> This may not look so typical in a server/cloud environment, but can
> happen (at least in my experience) in a mobile/embedded env.

But to play devil's advocate for a minute here: couldn't you just put
them in two different single-vcpu VMs then?

>> Are these
>> parameters exposed to the guest in any way, so that it can make more
>> reasonable decisions as to where to run what kinds of workloads?
>>
> Not right now, AFAICS, but forms of 'scheduling paravirtualization', or
> in general this kind of interaction/communication could be very useful
> in real-time virtualization, so we may want to support that in future.
>
> In any case, even without that in place right now, I think different
> parameters for different vcpus is certainly something we want from an RT
> scheduler.

Yeah, the "expose parameters to guests" was just thinking out loud
about what would be useful in the future.  I think possibly allowing a
VM to change its period (while keeping the budget / period ratio the
same) might make sense as well; that way you could have an RT
"appliance" VM that you could just pop onto a system and let it
configure itself, without the user having to do more than give it
basic parameters.

 -George


* Re: [PATCH v1 3/4] libxl: add rt scheduler
  2014-09-03 15:33   ` George Dunlap
@ 2014-09-03 20:52     ` Meng Xu
  2014-09-04 14:27     ` George Dunlap
  1 sibling, 0 replies; 72+ messages in thread
From: Meng Xu @ 2014-09-03 20:52 UTC (permalink / raw)
  To: George Dunlap
  Cc: Ian Campbell, Sisu Xi, Stefano Stabellini, Dario Faggioli,
	Ian Jackson, xen-devel, Meng Xu, Jan Beulich, Chao Wang,
	Chong Li, Dagaen Golomb



Hi George,


2014-09-03 11:33 GMT-04:00 George Dunlap <George.Dunlap@eu.citrix.com>:

> On Sun, Aug 24, 2014 at 11:58 PM, Meng Xu <mengxu@cis.upenn.edu> wrote:
> > Add libxl functions to set/get domain's parameters for rt scheduler
> >
> > Signed-off-by: Sisu Xi <xisisu@gmail.com>
> > Signed-off-by: Meng Xu <mengxu@cis.upenn.edu>
> > ---
> >  tools/libxl/libxl.c         |  139
> +++++++++++++++++++++++++++++++++++++++++++
> >  tools/libxl/libxl.h         |    7 +++
> >  tools/libxl/libxl_types.idl |   15 +++++
> >  3 files changed, 161 insertions(+)
> >
> > diff --git a/tools/libxl/libxl.c b/tools/libxl/libxl.c
> > index 3526539..440e8df31 100644
> > --- a/tools/libxl/libxl.c
> > +++ b/tools/libxl/libxl.c
> > @@ -5154,6 +5154,139 @@ static int sched_sedf_domain_set(libxl__gc *gc,
> uint32_t domid,
> >      return 0;
> >  }
> >
> > +static int sched_rt_domain_get(libxl__gc *gc, uint32_t domid,
> > +                               libxl_domain_sched_params *scinfo)
> > +{
> > +    struct xen_domctl_sched_rt_params* sdom;
> > +    uint16_t num_vcpus;
> > +    int rc, i;
> > +
> > +    rc = xc_sched_rt_domain_get_num_vcpus(CTX->xch, domid, &num_vcpus);
> > +    if (rc != 0) {
> > +        LOGE(ERROR, "getting num_vcpus of domain sched rt");
> > +        return ERROR_FAIL;
> > +    }
> > +
> > +    /* FIXME: can malloc be used in libxl? seems it was used in the
> file */
> > +    sdom = (struct xen_domctl_sched_rt_params *)
> > +            malloc( sizeof(struct xen_domctl_sched_rt_params) *
> num_vcpus );
>
> There are two ways to do allocation; one which will be garbage
> collected automatically (so you don't need to call free on it), and
> one which returns something which the caller has to remember to free.
>
> You can call libxl__calloc(gc, num_vcpus, sizeof(struct
> xen_domctl_sched_rt_params)) with the garbage collector for this one;
> then you won't need to free it below, as it will be freed by the
> GC_FREE at the end of libxl_domain_params_get().  Then...
>

Thank you very much for your suggestion! I modified the code as you suggested
after Wei pointed this out. The next patch will use the GC
mechanism to handle the memory allocation.


>
> > +    if ( !sdom ){
> > +        LOGE(ERROR, "Allocate sdom array fails\n");
> > +        return ERROR_INVAL;
> > +    }
> > +
> > +    rc = xc_sched_rt_domain_get(CTX->xch, domid, sdom, num_vcpus);
> > +    if (rc != 0) {
> > +        LOGE(ERROR, "getting domain sched rt");
> > +        return ERROR_FAIL;
> > +    }
> > +
> > +    /* FIXME: how to guarantee libxl_*_dispose be called exactly once?
> */
> > +    libxl_domain_sched_params_init(scinfo);
> > +
> > +    scinfo->rt.num_vcpus = num_vcpus;
> > +    scinfo->sched = LIBXL_SCHEDULER_RT;
> > +    /* FIXME: can malloc be used in libxl? seems it was used in the
> file */
> > +    scinfo->rt.vcpus = (libxl_vcpu *)
> > +                       malloc( sizeof(libxl_vcpu) *
> scinfo->rt.num_vcpus );
>
> ...for this one, call libxl__calloc(NOGC, scinfo->rt.num_vcpus,
> sizeof(libxl_vcpu)).  NOGC will cause the values not to be freed by
> GC_FREE (so they should be freed by the caller).
>
> And again, as Wei said, you don't need to check the return value for
> either one, as libxl will handle memory allocation errors.
>
Yes, the next release will change this as you and Wei said. (I have actually
already changed it for the next release. :-))
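
(For reference, a minimal sketch of the allocation style George describes, using
the names from the patch; this is illustrative only, not the final code:)

    /* Temporary libxc array: allocated from the gc, so it is freed
     * automatically by GC_FREE at the end of the libxl entry point. */
    sdom = libxl__calloc(gc, num_vcpus, sizeof(*sdom));

    /* Array handed back to the caller: allocated with NOGC, so it survives
     * GC_FREE and must be freed by the caller (libxl_*_dispose()). */
    scinfo->rt.vcpus = libxl__calloc(NOGC, num_vcpus,
                                     sizeof(*scinfo->rt.vcpus));
    scinfo->rt.num_vcpus = num_vcpus;

    /* No NULL checks needed: libxl's allocators handle allocation
     * failure internally. */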


>
> However, it looks like libxl_list_domain() managed to make the libxl
> struct look exactly like the libxc struct, so that you don't need to
> do this allocate-and-copy thing.  Could you try to arrange for that to
> be the case for libxl_rt_vcpu?
>

Do you mean that I should pass the libxl_rt_vcpu structure as the parameter to
the function xc_sched_rt_domain_get(CTX->xch, domid, sdom, num_vcpus),
i.e., replace sdom with the libxl_rt_vcpu data, so that I can avoid
copying from the libxc sdom data to libxl_rt_vcpu?

If so, I think it's doable and I will change it.



>
> > +/*
> > + * Sanity check of the scinfo parameters
> > + * return 0 if all values are valid
> > + * return 1 if one param is default value
> > + * return 2 if the target vcpu's index, period or budget is out of range
> > + */
>
> These should be checked in the hypervisor, not the toolstack.  The
> hypervisor can return -EINVAL, and libxl can pass that error message
> on to the caller.
>
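
(Roughly, the hypervisor-side shape George is describing would be something like
the following; the macro names are borrowed from the quoted patch and the exact
check is only a sketch:)

    /* Validate in Xen and let libxl simply propagate the error. */
    if ( param->period < 1 || param->period > SCHED_RT_VCPU_PERIOD_MAX ||
         param->budget < 1 || param->budget > SCHED_RT_VCPU_BUDGET_MAX )
        return -EINVAL;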

I saw that the toolstack also checks the validity of the new parameters for
the credit and credit2 schedulers. That's why I check the validity of the
parameters in the toolstack.
For example, sched_credit_domain_set() in
tools/libxl/libxl.c does the check as follows:

    if (scinfo->weight != LIBXL_DOMAIN_SCHED_PARAM_WEIGHT_DEFAULT) {
        if (scinfo->weight < 1 || scinfo->weight > 65535) {
            LOG(ERROR, "Cpu weight out of range, "
                "valid values are within range from 1 to 65535");
            return ERROR_INVAL;
        }
        sdom.weight = scinfo->weight;
    }

Just to confirm: should I not follow what credit does in libxl.c, and instead
check the validity of the parameters in the hypervisor as you suggested? Please
bear with me on this question, because I want to make sure I'm heading in
the correct direction. :-)

>
> So this function should go away; but in general, just as a style note,
> you should avoid "magic constants" like this.  Ideally you'd re-use
> some of the LIBXL error codes; but if that won't work, you should make
> #define's with a suitable descriptive name.
>

Got it! Thank you very much! This is my bad and I will change it. :-(
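
(For instance, if the helper were kept, the kind of descriptive names George
means might look like this; the names below are made up for illustration:)

    #define RT_VALIDATE_USE_DEFAULTS  1  /* a parameter was left at its default */
    #define RT_VALIDATE_OUT_OF_RANGE  2  /* index, period or budget out of range */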



>
> > +static int sched_rt_domain_set_validate_params(libxl__gc *gc,
> > +                                               const
> libxl_domain_sched_params *scinfo,
> > +                                               const uint16_t num_vcpus)
> > +{
> > +    int vcpu_index = scinfo->rt.vcpu_index;
> > +
> > +    if (vcpu_index == LIBXL_DOMAIN_SCHED_PARAM_VCPU_DEFAULT ||
> > +        scinfo->rt.period == LIBXL_DOMAIN_SCHED_PARAM_PERIOD_DEFAULT ||
> > +        scinfo->rt.budget == LIBXL_DOMAIN_SCHED_PARAM_BUDGET_DEFAULT)
> > +    {
> > +        return 1;
> > +    }
> > +
> > +    if (vcpu_index < 0 || vcpu_index > num_vcpus)
> > +    {
> > +        LOG(ERROR, "VCPU index is not set or out of range, "
> > +                    "valid values are within range from 0 to %d",
> num_vcpus);
> > +        return 2;
> > +    }
> > +
> > +    if (scinfo->rt.period < 1 ||
> > +        scinfo->rt.period > SCHED_RT_VCPU_PERIOD_MAX)
> > +    {
> > +        LOG(ERROR, "VCPU period is not set or out of range, "
> > +                    "valid values are within range from 0 to %lu",
> SCHED_RT_VCPU_PERIOD_MAX);
> > +        return 2;
> > +    }
> > +
> > +    if (scinfo->rt.budget < 1 ||
> > +        scinfo->rt.budget > SCHED_RT_VCPU_BUDGET_MAX)
> > +    {
> > +        LOG(ERROR, "VCPU budget is not set or out of range, "
> > +                    "valid values are within range from 0 to %lu",
> SCHED_RT_VCPU_BUDGET_MAX);
> > +        return 2;
> > +    }
> > +
> > +    return 0;
> > +
> > +}
> > +
> > +static int sched_rt_domain_set(libxl__gc *gc, uint32_t domid,
> > +                               const libxl_domain_sched_params *scinfo)
> > +{
> > +    struct xen_domctl_sched_rt_params sdom;
> > +    uint16_t num_vcpus;
> > +    int rc;
> > +
> > +    rc = xc_sched_rt_domain_get_num_vcpus(CTX->xch, domid, &num_vcpus);
> > +    if (rc != 0) {
> > +        LOGE(ERROR, "getting domain sched rt");
> > +        return ERROR_FAIL;
> > +    }
> > +
> > +    rc = sched_rt_domain_set_validate_params(gc, scinfo, num_vcpus);
> > +    if (rc == 2)
> > +        return ERROR_INVAL;
> > +    if (rc == 1)
> > +        return 0;
>
> Er, one of the parameters is the default value, and so you don't set
> any of them?
>

Hmm, yes, users should specify the vcpu index, the new period, and the new
budget to set a vcpu's parameters. They have to give us the index of the
vcpu to set; otherwise, we don't know which vcpu the new parameters apply
to. :-)

We can definitely allow users to specify only the period or only the budget, to
set just that one value. If you think that's a better way to do it, I can
modify it. :-)



>
> > @@ -303,6 +304,19 @@ libxl_domain_restore_params =
> Struct("domain_restore_params", [
> >      ("checkpointed_stream", integer),
> >      ])
> >
> > +libxl_rt_vcpu = Struct("vcpu",[
> > +    ("period",       uint64, {'init_val':
> 'LIBXL_DOMAIN_SCHED_PARAM_PERIOD_DEFAULT'}),
> > +    ("budget",       uint64, {'init_val':
> 'LIBXL_DOMAIN_SCHED_PARAM_BUDGET_DEFAULT'}),
> > +    ("index",        integer, {'init_val':
> 'LIBXL_DOMAIN_SCHED_PARAM_VCPU_INDEX_DEFAULT'}),
> > +    ])
> > +
> > +libxl_domain_sched_rt_params = Struct("domain_sched_rt_params",[
> > +    ("vcpus",        Array(libxl_rt_vcpu, "num_vcpus")),
> > +    ("period",       uint64, {'init_val':
> 'LIBXL_DOMAIN_SCHED_PARAM_PERIOD_DEFAULT'}),
> > +    ("budget",       uint64, {'init_val':
> 'LIBXL_DOMAIN_SCHED_PARAM_BUDGET_DEFAULT'}),
> > +    ("vcpu_index",   integer, {'init_val':
> 'LIBXL_DOMAIN_SCHED_PARAM_VCPU_INDEX_DEFAULT'}),
> > +    ])
> > +
> >  libxl_domain_sched_params = Struct("domain_sched_params",[
> >      ("sched",        libxl_scheduler),
> >      ("weight",       integer, {'init_val':
> 'LIBXL_DOMAIN_SCHED_PARAM_WEIGHT_DEFAULT'}),
> > @@ -311,6 +325,7 @@ libxl_domain_sched_params =
> Struct("domain_sched_params",[
> >      ("slice",        integer, {'init_val':
> 'LIBXL_DOMAIN_SCHED_PARAM_SLICE_DEFAULT'}),
> >      ("latency",      integer, {'init_val':
> 'LIBXL_DOMAIN_SCHED_PARAM_LATENCY_DEFAULT'}),
> >      ("extratime",    integer, {'init_val':
> 'LIBXL_DOMAIN_SCHED_PARAM_EXTRATIME_DEFAULT'}),
> > +    ("rt",           libxl_domain_sched_rt_params),
> >      ])
>
> While the domctl interface is not stable, the libxl interface *is*
> stable, so we definitely need to think carefully about what we want
> this to look like.
>
> Let me give that a think. :-)
>

Sure! I totally agree! When everyone agrees on the interface, I can
modify it accordingly very quickly. The difficult
thing is reaching consensus on what the interface should look like :-P

Thank you very much for your comments and suggestions!


Meng


-----------
Meng Xu
PhD Student in Computer and Information Science
University of Pennsylvania


* Re: [PATCH v1 4/4] xl: introduce rt scheduler
  2014-09-03 15:52   ` George Dunlap
@ 2014-09-03 22:28     ` Meng Xu
  2014-09-05  9:40       ` Dario Faggioli
  0 siblings, 1 reply; 72+ messages in thread
From: Meng Xu @ 2014-09-03 22:28 UTC (permalink / raw)
  To: George Dunlap
  Cc: Ian Campbell, Sisu Xi, Stefano Stabellini, Chenyang Lu,
	Dario Faggioli, Ian Jackson, xen-devel, Linh Thi Xuan Phan,
	Meng Xu, Jan Beulich, Chao Wang, Chong Li, Dagaen Golomb



Hi George,

2014-09-03 11:52 GMT-04:00 George Dunlap <George.Dunlap@eu.citrix.com>:

> On Sun, Aug 24, 2014 at 11:58 PM, Meng Xu <mengxu@cis.upenn.edu> wrote:
> > Add xl command for rt scheduler
>
> The main thing about this is that you probably want to have an
> interface where you can set the parameters for multiple vcpus at once;
> and that's probably the case regardless of whether we end up making
> the hypercall able to batch vcpus or do them one-at-a-time.
>

I'm totally fine with allowing users to set the parameters for multiple
vcpus at once.

My question is:
When users set multiple vcpus at once, what should the command look like?
For example, if they want to set two vcpus' parameters for dom 1 at once
(to be specific, set vcpu 0's period to 20 and budget to 10, and set vcpu
1's period to 30 and budget to 20), should they use a command like "xl
sched-rt -d 1 -v 0 -p 20 -b 10 -v 1 -p 30 -b 20"?
When several vcpus are set at once, the command will be very long. Is that
OK?

Thanks,

Meng


-- 


-----------
Meng Xu
PhD Student in Computer and Information Science
University of Pennsylvania


* Re: [PATCH v1 1/4] xen: add real time scheduler rt
  2014-09-03 16:06       ` George Dunlap
  2014-09-03 16:57         ` Dario Faggioli
@ 2014-09-04  2:11         ` Meng Xu
  2014-09-04 11:00           ` Dario Faggioli
  2014-09-04 13:03           ` George Dunlap
  1 sibling, 2 replies; 72+ messages in thread
From: Meng Xu @ 2014-09-04  2:11 UTC (permalink / raw)
  To: George Dunlap
  Cc: Ian Campbell, Sisu Xi, Stefano Stabellini, Chenyang Lu,
	Dario Faggioli, Ian Jackson, xen-devel, Linh Thi Xuan Phan,
	Meng Xu, Jan Beulich, Chao Wang, Chong Li, Dagaen Golomb



2014-09-03 12:06 GMT-04:00 George Dunlap <George.Dunlap@eu.citrix.com>:

> On Wed, Sep 3, 2014 at 4:13 PM, Meng Xu <xumengpanda@gmail.com> wrote:
> > Hi George,
> >
> >
> > 2014-09-03 10:20 GMT-04:00 George Dunlap <George.Dunlap@eu.citrix.com>:
> >
> >> On Sun, Aug 24, 2014 at 11:58 PM, Meng Xu <mengxu@cis.upenn.edu> wrote:
> >> > diff --git a/xen/include/public/domctl.h b/xen/include/public/domctl.h
> >> > index 5b11bbf..27d01c1 100644
> >> > --- a/xen/include/public/domctl.h
> >> > +++ b/xen/include/public/domctl.h
> >> > @@ -339,6 +339,19 @@ struct xen_domctl_max_vcpus {
> >> >  typedef struct xen_domctl_max_vcpus xen_domctl_max_vcpus_t;
> >> >  DEFINE_XEN_GUEST_HANDLE(xen_domctl_max_vcpus_t);
> >> >
> >> > +/*
> >> > + * This structure is used to pass to rt scheduler from a
> >> > + * privileged domain to Xen
> >> > + */
> >> > +struct xen_domctl_sched_rt_params {
> >> > +    /* get vcpus' info */
> >> > +    uint64_t period; /* s_time_t type */
> >> > +    uint64_t budget;
> >> > +    uint16_t index;
> >> > +    uint16_t padding[3];
> >>
> >> Why the padding?
> >>
> >
> > I did this because of Jan's comment "Also, you need to pad the structure
> to
> > a multiple of 8 bytes, or
> > its layout will differ between 32- and 64-bit (tool stack) callers." I
> think
> > what he said make sense so I added the padding here. :-)
> >
> > Here is the link: http://marc.info/?l=xen-devel&m=140661680931179&w=2
>
> Right. :-)  I personally prefer to handle that by re-arranging the
> elements rather than adding padding, unless absolutely necessary.  In
> this case that shouldn't be too hard, particularly once we pare the
> interface down so we only have one interface (either all one vcpu at a
> time, or all batched vcpus).
>

I agree. When we settle on the interface, I will implement it. At that
point, I would prefer to handle the layout by rearranging the elements unless
padding is really necessary.

So let's first settle on the approach for setting/getting vcpus (either
one vcpu at a time, or all vcpus batched).

What do you guys think about the approach for setting/getting vcpus? Which
approach do you prefer? Can we vote on it? (Or maybe there is some
other way to reach agreement? :-))

Meng

-----------
Meng Xu
PhD Student in Computer and Information Science
University of Pennsylvania


* Re: [PATCH v1 1/4] xen: add real time scheduler rt
  2014-09-03 17:18           ` George Dunlap
@ 2014-09-04  2:15             ` Meng Xu
  2014-09-04 14:27             ` Dario Faggioli
  1 sibling, 0 replies; 72+ messages in thread
From: Meng Xu @ 2014-09-04  2:15 UTC (permalink / raw)
  To: George Dunlap
  Cc: Ian Campbell, Sisu Xi, Stefano Stabellini, Dario Faggioli,
	Ian Jackson, xen-devel, Meng Xu, Jan Beulich, Chao Wang,
	Chong Li, Dagaen Golomb



2014-09-03 13:18 GMT-04:00 George Dunlap <George.Dunlap@eu.citrix.com>:

> On Wed, Sep 3, 2014 at 5:57 PM, Dario Faggioli
> <dario.faggioli@citrix.com> wrote:
> >> Re the per-vcpu settings, though: Is it really that common for RT
> >> domains to want different parameters for different vcpus?
> >>
> > Whether it's common it is hard to say, but yes, it has to be possible.
> >
> > For instance, I can put, in an SMP guest, two real-time applications
> > with different timing requirements, and pin each one to a different
> > (v)cpu (I mean pin *inside* the guest). At this point, I'd like for each
> > vcpu to have a set of RT scheduling parameters, at the Xen level, that
> > matches the timing requirements of what's running inside.
> >
> > This may not look so typical in a server/cloud environment, but can
> > happen (at least in my experience) in a mobile/embedded env.
>
> But to play devil's advocate for a minute here: couldn't you just put
> them in two different single-vcpu VMs then?
>

Not really. What if these two applications (or threads) want to
communicate with each other? For example, one thread senses some data
and sends it to the other thread, which does the control algorithm's computation.

Well, they can still share information between VMs, but the overhead would be
larger than sharing within the same domain.

Meng


-----------
Meng Xu
PhD Student in Computer and Information Science
University of Pennsylvania


* Re: [PATCH v1 1/4] xen: add real time scheduler rt
  2014-09-04  2:11         ` Meng Xu
@ 2014-09-04 11:00           ` Dario Faggioli
  2014-09-04 13:03           ` George Dunlap
  1 sibling, 0 replies; 72+ messages in thread
From: Dario Faggioli @ 2014-09-04 11:00 UTC (permalink / raw)
  To: Meng Xu
  Cc: Ian Campbell, Sisu Xi, Stefano Stabellini, George Dunlap,
	Chenyang Lu, Ian Jackson, xen-devel, Linh Thi Xuan Phan, Meng Xu,
	Jan Beulich, Chao Wang, Chong Li, Dagaen Golomb



On Wed, 2014-09-03 at 22:11 -0400, Meng Xu wrote:


> So let's first settle on the approach for setting/getting vcpus
> (either one vcpu at a time, or all vcpus batched).
> 
> 
> What do you guys think about the approach for setting/getting vcpus?
> Which approach do you prefer? Can we vote on it? (Or maybe there is
> some other way to reach agreement? :-))
> 
I'm not a fan of voting for this kind of thing. :-)

I think we were close to nailing it down in previous messages within this
thread. George said, in
<CAFLBxZaV9kUtc0=Ew3yO0BvR6EvLsHjK2pEOj6yB4eFpwG8big@mail.gmail.com>

 <<I guess the real question is how often we expect to be updating the
   parameters of a single vcpu, vs just setting all of the parameters
   (e.g., during domain creation).

   If we expect a lot of tweaking from user-space tools, then we should
   go with 3.  If we expect people to set parameters and then leave them
   alone, we should go with 1.>>

(where 3 is the array with a variable number of elements, i.e., only
relative to the affected vcpus, and 1 is the full array, with 0-s [or
whatever else] for 'don't touch this')

It's hard to be sure, but I'm leaning toward the full array (so 1, in
Meng's list in that email). In fact, I expect parameters to be set for
all the vcpus at, or immediately after, domain creation and, if
something has to change at some point, it's quite likely that it will
involve most (if not all) the vcpus (possibly, to different params, of
course). I also do not expect that to happen too frequently.

1 also looks to me, although potentially inefficient, rather easier to
use and implement. E.g., imagine a toolstack --different from xl (and
perhaps even different from libxl)-- which has to live in a Dom0 with
limited support for dynamic memory allocation (are we thinking embedded
or not :-P). With 1, it can just prepare a static array and go with it,
while 3 may cause problems.

So, yes, I think I'd go for 1.

Regards,
Dario

-- 
<<This happens because I choose it to happen!>> (Raistlin Majere)
-----------------------------------------------------------------
Dario Faggioli, Ph.D, http://about.me/dario.faggioli
Senior Software Engineer, Citrix Systems R&D Ltd., Cambridge (UK)



* Re: [PATCH v1 1/4] xen: add real time scheduler rt
  2014-09-04  2:11         ` Meng Xu
  2014-09-04 11:00           ` Dario Faggioli
@ 2014-09-04 13:03           ` George Dunlap
  2014-09-04 14:00             ` Meng Xu
  1 sibling, 1 reply; 72+ messages in thread
From: George Dunlap @ 2014-09-04 13:03 UTC (permalink / raw)
  To: Meng Xu
  Cc: Ian Campbell, Sisu Xi, Stefano Stabellini, Chenyang Lu,
	Dario Faggioli, Ian Jackson, xen-devel, Linh Thi Xuan Phan,
	Meng Xu, Jan Beulich, Chao Wang, Chong Li, Dagaen Golomb

On Thu, Sep 4, 2014 at 3:11 AM, Meng Xu <xumengpanda@gmail.com> wrote:
>
> What do you guys think about the approach of setting/getting vcpus? which
> approach do you prefer? Can we vote for it? (or maybe there exist some other
> ways to reach agreement? :-))

Well, the standard thing is for us to try to reach a consensus if
possible.  If there's not a consensus, I as the scheduler maintainer
would get to / have to decide; Keir or Jan might overrule me if they
thought I was being completely unreasonable, but that would be a
pretty exceptional case. :-)

On Thu, Sep 4, 2014 at 12:00 PM, Dario Faggioli
<dario.faggioli@citrix.com> wrote:
> It's hard to be sure, but I'm leaning toward the full array (so 1, in
> Meng's list in that email). In fact, I expect parameters to be set for
> all the vcpus at, or immediately after, domain creation and, if
> something has to change at some point, it's quite likely that it will
> involve most (if not all) the vcpus (possibly, to different params, of
> course). I also do not expect for that to happen too frequently.

That would have been my guess -- that "set up once and don't change"
was the common case.

So let's go with #1:
* Always pass an array of all vcpus
* But have some magic values which mean "don't change this one".

 -George


* Re: [PATCH v1 1/4] xen: add real time scheduler rt
  2014-09-04 13:03           ` George Dunlap
@ 2014-09-04 14:00             ` Meng Xu
  0 siblings, 0 replies; 72+ messages in thread
From: Meng Xu @ 2014-09-04 14:00 UTC (permalink / raw)
  To: George Dunlap
  Cc: Ian Campbell, Sisu Xi, Stefano Stabellini, Chenyang Lu,
	Dario Faggioli, Ian Jackson, xen-devel, Linh Thi Xuan Phan,
	Meng Xu, Jan Beulich, Chao Wang, Chong Li, Dagaen Golomb



Hi George and Dario,


2014-09-04 9:03 GMT-04:00 George Dunlap <George.Dunlap@eu.citrix.com>:

> On Thu, Sep 4, 2014 at 3:11 AM, Meng Xu <xumengpanda@gmail.com> wrote:
> >
> > What do you guys think about the approach of setting/getting vcpus? which
> > approach do you prefer? Can we vote for it? (or maybe there exist some
> other
> > ways to reach agreement? :-))
>
> Well, the standard thing is for us to try to reach a consensus if
> possible.  If there's not a consensus, I as the scheduler maintainer
> would get to / have to decide; Keir or Jan might overrule me if they
> thought I was being completely unreasonable, but that would be a
> pretty exceptional case. :-)
>

I see. This is also a great idea! :-) Next time, when I encounter such
issues, I will summarize the strengths and weaknesses of each approach and let
you guys decide. :-)



>
> On Thu, Sep 4, 2014 at 12:00 PM, Dario Faggioli
> <dario.faggioli@citrix.com> wrote:
> > It's hard to be sure, but I'm leaning toward the full array (so 1, in
> > Meng's list in that email). In fact, I expect parameters to be set for
> > all the vcpus at, or immediately after, domain creation and, if
> > something has to change at some point, it's quite likely that it will
> > involve most (if not all) the vcpus (possibly, to different params, of
> > course). I also do not expect for that to happen too frequently.
>
> That would have been my guess -- that "set up once and don't change"
> was the common case.
>
> So let's go with #1:
> * Always pass an array of all vcpus
> * But have some magic values which mean "don't change this one".
>
>
Great! I will do that! :-)

Meng


-----------
Meng Xu
PhD Student in Computer and Information Science
University of Pennsylvania


* Re: [PATCH v1 1/4] xen: add real time scheduler rt
  2014-09-03 17:18           ` George Dunlap
  2014-09-04  2:15             ` Meng Xu
@ 2014-09-04 14:27             ` Dario Faggioli
  2014-09-04 15:30               ` Meng Xu
  1 sibling, 1 reply; 72+ messages in thread
From: Dario Faggioli @ 2014-09-04 14:27 UTC (permalink / raw)
  To: George Dunlap
  Cc: Ian Campbell, Sisu Xi, Stefano Stabellini, Ian Jackson,
	xen-devel, Meng Xu, Meng Xu, Jan Beulich, Chao Wang, Chong Li,
	Dagaen Golomb



On Wed, 2014-09-03 at 18:18 +0100, George Dunlap wrote:
> On Wed, Sep 3, 2014 at 5:57 PM, Dario Faggioli
> <dario.faggioli@citrix.com> wrote:

> > For instance, I can put, in an SMP guest, two real-time applications
> > with different timing requirements, and pin each one to a different
> > (v)cpu (I mean pin *inside* the guest). At this point, I'd like for each
> > vcpu to have a set of RT scheduling parameters, at the Xen level, that
> > matches the timing requirements of what's running inside.
> >
> > This may not look so typical in a server/cloud environment, but can
> > happen (at least in my experience) in a mobile/embedded env.
> 
> But to play devil's advocate for a minute here: 
>
Hehe, please, be my guest! :-D :-D

> couldn't you just put
> them in two different single-vcpu VMs then?
> 
Sure I can! Or not. :-)

What I mean is that, yes, for a whole bunch of people, "let's just use 2
VMs" can be an answer. For others, it just can't, e.g., as Meng said, if
there are communication constraints or, perhaps, even just if one has
the application implemented like that already, running on a 2-way
baremetal system, and splitting would mean rewriting, instead of just
putting it inside a VM, as usually happens with virtualization.

All this to say that, even if there exist (as always?) workarounds, this
is a feature I really --at some point, if not now-- would like to
support.

One more reason for that, that I at least wanted to cite, comes from the
so-called "real-time theory". Yeah, yeah, I know, academia people are
just crazy (right Meng! :-P), but from time to time, something that
makes sense comes out of there (ehm... like Xen? :-P).

So, you'd have to bear with me if I don't have all the details of the
involved math fresh in my mind (perhaps Meng can help here, if anyone is
interested), and if I reference a couple of research papers.
The key point is there are theoretically sound and mathematically proven
reasons why, if you have a 2-vcpu VM and you want to give it 140%
pCPU bandwidth, it may be a very bad idea to give 70% to each of its
vcpus.

The papers are these two here:
[1] A Hierarchical Multiprocessor Bandwidth Reservation Scheme
    with Timing Guarantees
    http://www.cs.unc.edu/~anderson/papers/ecrts08c.pdf
[2] Hierarchical Scheduling Framework for Virtual Clustering of
    Multiprocessors
    http://cps.kaist.ac.kr/papers/08ecrts-virtual-clustering.pdf

They (especially the first one) don't mention virtualization explicitly.
Rather, they talk about something called 'hierarchical scheduling'. Just
think of the Xen scheduler as tier 0, and the DomU scheduler as tier 1, of
the hierarchy, and there you have the analogy!

I find Example 2 (pages 3 and 4) in [1] particularly well suited to
explaining the basic idea, while I guess Meng is a lot more familiar with
[2] (it comes right from his institution! :-) )

It all comes from the fact that execution (of a real-time task inside a
VM) is sequential. Suppose I have that 2-vcpu VM, to which I need to
assign 140% of pCPU bandwidth, and inside of which there can be one or
more real-time tasks, together with some other _non_ real-time
activities.
The times that I have more than one rt task active at the same time, inside
the VM, they can exploit the parallelism the 2 vcpus provide them, so
(to a certain extent) no big deal about the single vcpus' parameters. The
times I have *only* one rt task active, it either runs on vcpu1 or on
vcpu2. Well, if the two vcpus have 70% pCPU bandwidth each, it is more
likely for them to be scheduled together, with the rt task not being
able to exploit the parallelism, to the point that it may (depending
on its own bandwidth demand) miss its deadlines. If I give 100% to
vcpu1 and 40% to vcpu2, it is less likely for them to be scheduled
together (or, if you want, they will be scheduled together for less
time), and more likely for the rt task alone to fulfill its timing
constraints.
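
To make that concrete with one purely illustrative set of numbers (chosen here
for the sake of argument, and ignoring server period/phasing effects):

    single rt task in the VM : C = 8 ms of work every T = 10 ms (utilization 0.8)
    70% + 70% split          : whichever vcpu the task is on supplies at most
                               about 7 ms per 10 ms  ->  7 < 8, deadlines can be missed
    100% + 40% split         : pin the task to vcpu1, which can supply the full
                               8 ms per 10 ms  ->  the deadline can be met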

I can try to add more details and numbers to the example, if required/if
someone is interested.

Forgive the long digression, I could not resist. :-D

All this being said: if this scheduler goes into Xen 4.5 (which I think it
should), I'm fine with "taking it easy", especially at the libxl level,
where we can't change things later, and, for example, with leaving the
per-vcpu parameters support as a future development (perhaps for Xen 4.6). I'm
just saying that I think it is an important part, that I'm happy the
scheduling algorithm and the hypervisor interface support it, and that I
want it to be available _at_some_point_. :-)

> > In any case, even without that in place right now, I think different
> > parameters for different vcpus is certainly something we want from an RT
> > scheduler.
> 
> Yeah, the "expose parameters to guests" was just thinking out loud
> about what would be useful in the future.  
>
Yep.

> I think possibly allowing a
> VM to change its period (while keeping the budget / period ratio the
> same) might make sense as well; that way you could have an RT
> "appliance" VM that you could just pop onto a system and let it
> configure itself, without the user having to do more than give it
> basic parameters.
> 
Indeed. Definitely interesting as future work!

Thanks and Regards,
Dario

-- 
<<This happens because I choose it to happen!>> (Raistlin Majere)
-----------------------------------------------------------------
Dario Faggioli, Ph.D, http://about.me/dario.faggioli
Senior Software Engineer, Citrix Systems R&D Ltd., Cambridge (UK)



* Re: [PATCH v1 3/4] libxl: add rt scheduler
  2014-09-03 15:33   ` George Dunlap
  2014-09-03 20:52     ` Meng Xu
@ 2014-09-04 14:27     ` George Dunlap
  2014-09-04 14:45       ` Dario Faggioli
  2014-09-04 14:47       ` Meng Xu
  1 sibling, 2 replies; 72+ messages in thread
From: George Dunlap @ 2014-09-04 14:27 UTC (permalink / raw)
  To: Meng Xu
  Cc: Ian Campbell, Sisu Xi, Stefano Stabellini, Dario Faggioli,
	Ian Jackson, xen-devel, Meng Xu, Jan Beulich, Chao Wang,
	Chong Li, Dagaen Golomb

On Wed, Sep 3, 2014 at 4:33 PM, George Dunlap
<George.Dunlap@eu.citrix.com> wrote:
>
> While the domctl interface is not stable, the libxl interface *is*
> stable, so we definitely need to think carefully about what we want
> this to look like.
>
> Let me give that a think. :-)

OK, so we had a chat about this at our team meeting today, and here is
what we came up with.

The feature freeze for 4.5 is next Wednesday.

The core scheduler is in good enough shape to be checked in as an
"experimental" mode, so it would be really nice to be able to get this
checked in.

The DOMCTL interface isn't stable so we can change that if we need to;
however, the libxl interface *is* stable.

The current libxl scheduler parameter interface assumes one set of
parameters per domain; it's not yet setup for per-vcpu parameters.  It
is unlikely that we would be able to converge on a new interface by
next week.

So the suggestion was this: For the moment, use the existing libxl
interface on a per-domain basis.  Internally, this will set all vcpus
to the same values.  This will allow us to check in a useable version
of the scheduler for people to test and improve.  Then for 4.6 we can
start working on a suitable libxl interface for setting per-vcpu
scheduling parameters.
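
(In code terms, the 4.5 plan boils down to something like the following inside
libxl, before issuing the hypercall; the names are only illustrative:)

    /* Fan the single per-domain period/budget out to every vcpu. */
    for (i = 0; i < num_vcpus; i++) {
        sdom[i].period = scinfo->period;
        sdom[i].budget = scinfo->budget;
    }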

Dario / Ian, did I miss anything?

Meng / &c, does that sound reasonable?

 -George


* Re: [PATCH v1 3/4] libxl: add rt scheduler
  2014-09-04 14:27     ` George Dunlap
@ 2014-09-04 14:45       ` Dario Faggioli
  2014-09-04 14:47       ` Meng Xu
  1 sibling, 0 replies; 72+ messages in thread
From: Dario Faggioli @ 2014-09-04 14:45 UTC (permalink / raw)
  To: George Dunlap
  Cc: Ian Campbell, Sisu Xi, Stefano Stabellini, Ian Jackson,
	xen-devel, Meng Xu, Meng Xu, Jan Beulich, Chao Wang, Chong Li,
	Dagaen Golomb



On Thu, 2014-09-04 at 15:27 +0100, George Dunlap wrote:
> So the suggestion was this: For the moment, use the existing libxl
> interface on a per-domain basis.  Internally, this will set all vcpus
> to the same values.  This will allow us to check in a useable version
> of the scheduler for people to test and improve.  Then for 4.6 we can
> start working on a suitable libxl interface for setting per-vcpu
> scheduling parameters.
> 
> Dario / Ian, did I miss anything?
> 
Not that I can spot.

And let me say, here too, that I am fully ok with this plan.

Regards,
Dario

-- 
<<This happens because I choose it to happen!>> (Raistlin Majere)
-----------------------------------------------------------------
Dario Faggioli, Ph.D, http://about.me/dario.faggioli
Senior Software Engineer, Citrix Systems R&D Ltd., Cambridge (UK)



* Re: [PATCH v1 3/4] libxl: add rt scheduler
  2014-09-04 14:27     ` George Dunlap
  2014-09-04 14:45       ` Dario Faggioli
@ 2014-09-04 14:47       ` Meng Xu
  2014-09-04 14:51         ` George Dunlap
  2014-09-04 15:25         ` Dario Faggioli
  1 sibling, 2 replies; 72+ messages in thread
From: Meng Xu @ 2014-09-04 14:47 UTC (permalink / raw)
  To: George Dunlap
  Cc: Ian Campbell, Sisu Xi, Stefano Stabellini, Dario Faggioli,
	Ian Jackson, xen-devel, Meng Xu, Jan Beulich, Chao Wang,
	Chong Li, Dagaen Golomb



Hi George,


2014-09-04 10:27 GMT-04:00 George Dunlap <George.Dunlap@eu.citrix.com>:

> On Wed, Sep 3, 2014 at 4:33 PM, George Dunlap
> <George.Dunlap@eu.citrix.com> wrote:
> >
> > While the domctl interface is not stable, the libxl interface *is*
> > stable, so we definitely need to think carefully about what we want
> > this to look like.
> >
> > Let me give that a think. :-)
>
> OK, so we had a chat about this at our team meeting today, and here is
> what we came up with.
>
> The feature freeze for 4.5 is next Wednesday.
>
> The core scheduler is in good enough shape to be checked in as an
> "experimental" mode, so it would be really nice to be able to get this
> checked in.
>
> The DOMCTL interface isn't stable so we can change that if we need to;
> however, the libxl interface *is* stable.
>
> The current libxl scheduler parameter interface assumes one set of
> parameters per domain; it's not yet setup for per-vcpu parameters.  It
> is unlikely that we would be able to converge on a new interface by
> next week.
>
> So the suggestion was this: For the moment, use the existing libxl
> interface on a per-domain basis.  Internally, this will set all vcpus
> to the same values.  This will allow us to check in a useable version
> of the scheduler for people to test and improve.  Then for 4.6 we can
> start working on a suitable libxl interface for setting per-vcpu
> scheduling parameters.
>
> Dario / Ian, did I miss anything?
>
> Meng / &c, does that sound reasonable?
>

I have a question as to the user interface.
For 4.5, we only allow users to set all vcpus to the same values (I'm
totally fine with that);
but what about the get function? When users issue the command "xl sched-rt",
how should we display the parameters of the vcpus? We just give the "period",
"budget" and "#VCPU" for a domain? I'm fine with this display for 4.5.

However, my concern is: in 4.6, when we allow vcpus to have different
parameters and need to display every vcpu's parameters, how should we
display them when users use the command "xl sched-rt"? When vcpus have
different periods and budgets, we cannot display them the way we did in 4.5. :-(

It's just a thought, just in case we overlook it. :-)

Meng



-- 


-----------
Meng Xu
PhD Student in Computer and Information Science
University of Pennsylvania


* Re: [PATCH v1 3/4] libxl: add rt scheduler
  2014-09-04 14:47       ` Meng Xu
@ 2014-09-04 14:51         ` George Dunlap
  2014-09-04 15:07           ` Meng Xu
  2014-09-04 15:25         ` Dario Faggioli
  1 sibling, 1 reply; 72+ messages in thread
From: George Dunlap @ 2014-09-04 14:51 UTC (permalink / raw)
  To: Meng Xu
  Cc: Ian Jackson, Ian Campbell, Sisu Xi, Stefano Stabellini,
	Dario Faggioli, Ian Jackson, xen-devel, Meng Xu, Jan Beulich,
	Chao Wang, Chong Li, Dagaen Golomb



On 09/04/2014 03:47 PM, Meng Xu wrote:
> Hi George,
>
>
> 2014-09-04 10:27 GMT-04:00 George Dunlap <George.Dunlap@eu.citrix.com 
> <mailto:George.Dunlap@eu.citrix.com>>:
>
>     On Wed, Sep 3, 2014 at 4:33 PM, George Dunlap
>     <George.Dunlap@eu.citrix.com <mailto:George.Dunlap@eu.citrix.com>>
>     wrote:
>     >
>     > While the domctl interface is not stable, the libxl interface *is*
>     > stable, so we definitely need to think carefully about what we want
>     > this to look like.
>     >
>     > Let me give that a think. :-)
>
>     OK, so we had a chat about this at our team meeting today, and here is
>     what we came up with.
>
>     The feature freeze for 4.5 is next Wednesday.
>
>     The core scheduler is in good enough shape to be checked in as an
>     "experimental" mode, so it would be really nice to be able to get this
>     checked in.
>
>     The DOMCTL interface isn't stable so we can change that if we need to;
>     however, the libxl interface *is* stable.
>
>     The current libxl scheduler parameter interface assumes one set of
>     parameters per domain; it's not yet setup for per-vcpu parameters.  It
>     is unlikely that we would be able to converge on a new interface by
>     next week.
>
>     So the suggestion was this: For the moment, use the existing libxl
>     interface on a per-domain basis.  Internally, this will set all vcpus
>     to the same values.  This will allow us to check in a useable version
>     of the scheduler for people to test and improve.  Then for 4.6 we can
>     start working on a suitable libxl interface for setting per-vcpu
>     scheduling parameters.
>
>     Dario / Ian, did I miss anything?
>
>     Meng / &c, does that sound reasonable?
>
>
> I have a question as to the user interface.
> For 4.5, we only allow users to set all vcpus to the same values (I'm 
> totally fine with it.);
> But how about the get function? When users issue the command "xl 
> sched-rt", how should we display the parameters of vcpus? We just give 
> the "period", "budget" and "#VCPU" for a domain? I'm fine with this 
> display for 4.5.
>
> However ,my concerns is: In 4.6, when we allow vcpus to have different 
> parameters and need to display every vcpu's parameters, how should we 
> display when users use command "xl sched-rt"? When vcpus have 
> different period and budget, we cannot display like what we did in 4.5 
> then. :-(
>
> It's just my thought, just in case we neglect it. :-)

I think the xl interface doesn't have quite the same consistency 
guarantees as the libxl interface.  For now, I think just make it print 
one budget / period for the domain; and we can change it later.

I'll ping Ian Jackson to make sure he's OK with that.

  -George

[-- Attachment #1.2: Type: text/html, Size: 5913 bytes --]

[-- Attachment #2: Type: text/plain, Size: 126 bytes --]

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 72+ messages in thread

* Re: [PATCH v1 3/4] libxl: add rt scheduler
  2014-09-04 14:51         ` George Dunlap
@ 2014-09-04 15:07           ` Meng Xu
  2014-09-04 15:44             ` Dario Faggioli
  0 siblings, 1 reply; 72+ messages in thread
From: Meng Xu @ 2014-09-04 15:07 UTC (permalink / raw)
  To: George Dunlap
  Cc: Ian Jackson, Ian Campbell, Sisu Xi, Stefano Stabellini,
	Dario Faggioli, Ian Jackson, xen-devel, Meng Xu, Jan Beulich,
	Chao Wang, Chong Li, Dagaen Golomb


[-- Attachment #1.1: Type: text/plain, Size: 3081 bytes --]

2014-09-04 10:51 GMT-04:00 George Dunlap <george.dunlap@eu.citrix.com>:

>  On 09/04/2014 03:47 PM, Meng Xu wrote:
>
>  Hi George,
>
>
> 2014-09-04 10:27 GMT-04:00 George Dunlap <George.Dunlap@eu.citrix.com>:
>
>> On Wed, Sep 3, 2014 at 4:33 PM, George Dunlap
>> <George.Dunlap@eu.citrix.com> wrote:
>> >
>> > While the domctl interface is not stable, the libxl interface *is*
>> > stable, so we definitely need to think carefully about what we want
>> > this to look like.
>> >
>> > Let me give that a think. :-)
>>
>>  OK, so we had a chat about this at our team meeting today, and here is
>> what we came up with.
>>
>> The feature freeze for 4.5 is next Wednesday.
>>
>> The core scheduler is in good enough shape to be checked in as an
>> "experimental" mode, so it would be really nice to be able to get this
>> checked in.
>>
>> The DOMCTL interface isn't stable so we can change that if we need to;
>> however, the libxl interface *is* stable.
>>
>> The current libxl scheduler parameter interface assumes one set of
>> parameters per domain; it's not yet setup for per-vcpu parameters.  It
>> is unlikely that we would be able to converge on a new interface by
>> next week.
>>
>> So the suggestion was this: For the moment, use the existing libxl
>> interface on a per-domain basis.  Internally, this will set all vcpus
>> to the same values.  This will allow us to check in a useable version
>> of the scheduler for people to test and improve.  Then for 4.6 we can
>> start working on a suitable libxl interface for setting per-vcpu
>> scheduling parameters.
>>
>> Dario / Ian, did I miss anything?
>>
>> Meng / &c, does that sound reasonable?
>>
>
>  I have a question as to the user interface.
>   For 4.5, we only allow users to set all vcpus to the same values (I'm
> totally fine with it.);
>  But how about the get function? When users issue the command "xl
> sched-rt", how should we display the parameters of vcpus? We just give the
> "period", "budget" and "#VCPU" for a domain? I'm fine with this display for
> 4.5.
>
>   However ,my concerns is: In 4.6, when we allow vcpus to have different
> parameters and need to display every vcpu's parameters, how should we
> display when users use command "xl sched-rt"? When vcpus have different
> period and budget, we cannot display like what we did in 4.5 then. :-(
>
>   It's just my thought, just in case we neglect it. :-)
>
>
> I think the xl interface doesn't have quite the same consistency
> guarantees as the libxl interface.  For now, I think just make it print one
> budget / period for the domain; and we can change it later.
>
>
I see. I'm totally ok with the decision! :-)
So I will only use the existing libxl interface, without adding an array to
it, to set/get the vcpus' parameters of a domain. Am I right? (Just to
confirm I understand correctly, and then I will go implement it. :-) )

Thanks,

Meng


-----------
Meng Xu
PhD Student in Computer and Information Science
University of Pennsylvania

[-- Attachment #1.2: Type: text/html, Size: 6590 bytes --]

[-- Attachment #2: Type: text/plain, Size: 126 bytes --]

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 72+ messages in thread

* Re: [PATCH v1 3/4] libxl: add rt scheduler
  2014-09-04 14:47       ` Meng Xu
  2014-09-04 14:51         ` George Dunlap
@ 2014-09-04 15:25         ` Dario Faggioli
  1 sibling, 0 replies; 72+ messages in thread
From: Dario Faggioli @ 2014-09-04 15:25 UTC (permalink / raw)
  To: Meng Xu
  Cc: Ian Campbell, Sisu Xi, Stefano Stabellini, George Dunlap,
	Ian Jackson, xen-devel, Meng Xu, Jan Beulich, Chao Wang,
	Chong Li, Dagaen Golomb


[-- Attachment #1.1: Type: text/plain, Size: 2465 bytes --]

On gio, 2014-09-04 at 10:47 -0400, Meng Xu wrote:

> 2014-09-04 10:27 GMT-04:00 George Dunlap
> <George.Dunlap@eu.citrix.com>:

>         
>         So the suggestion was this: For the moment, use the existing
>         libxl
>         interface on a per-domain basis.  Internally, this will set
>         all vcpus
>         to the same values.  This will allow us to check in a useable
>         version
>         of the scheduler for people to test and improve.  Then for 4.6
>         we can
>         start working on a suitable libxl interface for setting
>         per-vcpu
>         scheduling parameters.
> 

> I have a question as to the user interface.
> For 4.5, we only allow users to set all vcpus to the same values (I'm
> totally fine with it.); 
>
Right.

> But how about the get function? When users issue the command "xl
> sched-rt", how should we display the parameters of vcpus? We just give
> the "period", "budget" and "#VCPU" for a domain? I'm fine with this
> display for 4.5.
> 
xl builds on top of libxl. If setting and getting per-vcpu values is not
possible from libxl, it won't be possible from xl either.

I'd say printing just one set of params, the one that applies to all
the vcpus of the domain, is fine for 4.5. So, from xl, you'll get
something similar to this:

# xl sched-credit
Cpupool Pool-0: tslice=30ms ratelimit=1000us
Name                                ID Weight  Cap
Domain-0                             0    256    0

> However ,my concerns is: In 4.6, when we allow vcpus to have different
> parameters and need to display every vcpu's parameters, how should we
> display when users use command "xl sched-rt"? When vcpus have
> different period and budget, we cannot display like what we did in 4.5
> then. :-(
> 
It is the libxl API that has stability constraints, not the output of the
xl sub-commands.

Of course, it's not very nice to turn something completely upside down.
But given the fact we're accepting the new scheduler as an experimental
feature, and the fact that `xl sched-rt' will be a new command being
introduced in 4.5, I don't think changing its output in 4.6 would be a
problem.

Regards,
Dario

-- 
<<This happens because I choose it to happen!>> (Raistlin Majere)
-----------------------------------------------------------------
Dario Faggioli, Ph.D, http://about.me/dario.faggioli
Senior Software Engineer, Citrix Systems R&D Ltd., Cambridge (UK)


[-- Attachment #1.2: This is a digitally signed message part --]
[-- Type: application/pgp-signature, Size: 181 bytes --]

[-- Attachment #2: Type: text/plain, Size: 126 bytes --]

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 72+ messages in thread

* Re: [PATCH v1 1/4] xen: add real time scheduler rt
  2014-09-04 14:27             ` Dario Faggioli
@ 2014-09-04 15:30               ` Meng Xu
  2014-09-05  9:36                 ` Dario Faggioli
  0 siblings, 1 reply; 72+ messages in thread
From: Meng Xu @ 2014-09-04 15:30 UTC (permalink / raw)
  To: Dario Faggioli
  Cc: Ian Campbell, Sisu Xi, Stefano Stabellini, George Dunlap,
	Chenyang Lu, Ian Jackson, xen-devel, Linh Thi Xuan Phan, Meng Xu,
	Jan Beulich, Chao Wang, Chong Li, Dagaen Golomb


[-- Attachment #1.1: Type: text/plain, Size: 1720 bytes --]

2014-09-04 10:27 GMT-04:00 Dario Faggioli <dario.faggioli@citrix.com>:

> On mer, 2014-09-03 at 18:18 +0100, George Dunlap wrote:
> > On Wed, Sep 3, 2014 at 5:57 PM, Dario Faggioli
> > <dario.faggioli@citrix.com> wrote:
>
> > > For instance, I can put, in an SMP guest, two real-time applications
> > > with different timing requirements, and pin each one to a different
> > > (v)cpu (I mean pin *inside* the guest). At this point, I'd like for
> each
> > > vcpu to have a set of RT scheduling parameters, at the Xen level, that
> > > matches the timing requirements of what's running inside.
> > >
> > > This may not look so typical in a server/cloud environment, but can
> > > happen (at least in my experience) in a mobile/embedded env.
> >
> > But to play devil's advocate for a minute here:
> >
> Hehe, please, be my guest! :-D :-D
>
> > couldn't you just put
> > them in two different single-vcpu VMs then?
> >
>

Well, let me give a simpler example:
Suppose we have three tasks in one VM, and each task has period 4ms and
budget 6ms (its utilization is 2/3). If all three tasks start execution at
the same time, we can use two full-capacity vcpus (200% of cpu capacity) to
schedule them.
However, if you instead use two VMs, each with one full-capacity vcpu (100%
of cpu capacity), we cannot schedule these three tasks: two tasks of
utilization 2/3 do not fit on a single 100% vcpu, and a task cannot (well,
at least not easily) migrate from one VM to another.

This is just a simple example; we could of course construct a similar
example in which the vcpus are not full-capacity vcpus. :-)

Meng



-----------
Meng Xu
PhD Student in Computer and Information Science
University of Pennsylvania

[-- Attachment #1.2: Type: text/html, Size: 2718 bytes --]

[-- Attachment #2: Type: text/plain, Size: 126 bytes --]

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 72+ messages in thread

* Re: [PATCH v1 3/4] libxl: add rt scheduler
  2014-09-04 15:07           ` Meng Xu
@ 2014-09-04 15:44             ` Dario Faggioli
  2014-09-04 15:55               ` George Dunlap
  0 siblings, 1 reply; 72+ messages in thread
From: Dario Faggioli @ 2014-09-04 15:44 UTC (permalink / raw)
  To: Meng Xu
  Cc: Ian Campbell, Sisu Xi, Stefano Stabellini, George Dunlap,
	Chong Li, Ian Jackson, xen-devel, Meng Xu, Jan Beulich,
	Chao Wang, Ian Jackson, Dagaen Golomb


[-- Attachment #1.1: Type: text/plain, Size: 4832 bytes --]

On gio, 2014-09-04 at 11:07 -0400, Meng Xu wrote:

> 2014-09-04 10:51 GMT-04:00 George Dunlap
> <george.dunlap@eu.citrix.com>:
>         On 09/04/2014 03:47 PM, Meng Xu wrote:
>         
>         > Hi George,
>         > 
>         > 
>         > 2014-09-04 10:27 GMT-04:00 George Dunlap
>         > <George.Dunlap@eu.citrix.com>:
>         >         On Wed, Sep 3, 2014 at 4:33 PM, George Dunlap
>         >         <George.Dunlap@eu.citrix.com> wrote:
>         >         >
>         >         > While the domctl interface is not stable, the
>         >         libxl interface *is*
>         >         > stable, so we definitely need to think carefully
>         >         about what we want
>         >         > this to look like.
>         >         >
>         >         > Let me give that a think. :-)
>         >         
>         >         
>         >         OK, so we had a chat about this at our team meeting
>         >         today, and here is
>         >         what we came up with.
>         >         
>         >         The feature freeze for 4.5 is next Wednesday.
>         >         
>         >         The core scheduler is in good enough shape to be
>         >         checked in as an
>         >         "experimental" mode, so it would be really nice to
>         >         be able to get this
>         >         checked in.
>         >         
>         >         The DOMCTL interface isn't stable so we can change
>         >         that if we need to;
>         >         however, the libxl interface *is* stable.
>         >         
>         >         The current libxl scheduler parameter interface
>         >         assumes one set of
>         >         parameters per domain; it's not yet setup for
>         >         per-vcpu parameters.  It
>         >         is unlikely that we would be able to converge on a
>         >         new interface by
>         >         next week.
>         >         
>         >         So the suggestion was this: For the moment, use the
>         >         existing libxl
>         >         interface on a per-domain basis.  Internally, this
>         >         will set all vcpus
>         >         to the same values.  This will allow us to check in
>         >         a useable version
>         >         of the scheduler for people to test and improve.
>         >         Then for 4.6 we can
>         >         start working on a suitable libxl interface for
>         >         setting per-vcpu
>         >         scheduling parameters.
>         >         
>         >         Dario / Ian, did I miss anything?
>         >         
>         >         Meng / &c, does that sound reasonable?
>         > 
>         > 
>         > I have a question as to the user interface.
>         > For 4.5, we only allow users to set all vcpus to the same
>         > values (I'm totally fine with it.); 
>         > But how about the get function? When users issue the command
>         > "xl sched-rt", how should we display the parameters of
>         > vcpus? We just give the "period", "budget" and "#VCPU" for a
>         > domain? I'm fine with this display for 4.5.
>         > 
>         > 
>         > However ,my concerns is: In 4.6, when we allow vcpus to have
>         > different parameters and need to display every vcpu's
>         > parameters, how should we display when users use command "xl
>         > sched-rt"? When vcpus have different period and budget, we
>         > cannot display like what we did in 4.5 then. :-(
>         > 
>         > 
>         > It's just my thought, just in case we neglect it. :-)
>         
>         
>         I think the xl interface doesn't have quite the same
>         consistency guarantees as the libxl interface.  For now, I
>         think just make it print one budget / period for the domain;
>         and we can change it later.
>         
>         
> 
> 
> ​I see. I'm totally ​ok with the decision! :-)
> ​So I will only use the existing libxl interface without adding an
> array to it, to set/get the vcpus' parameters of a domain. Am I right?
>
Yep, no array. You just add 'period' and 'budget' fields _inside_
libxl_domain_sched_params, without putting them inside any wrapping
struct, union or array.

Then, in the implementation, you just take those two and apply them to
all the domain's vcpus (by using the array-based interface we agreed
upon for the hypervisor).

Regards,
Dario

-- 
<<This happens because I choose it to happen!>> (Raistlin Majere)
-----------------------------------------------------------------
Dario Faggioli, Ph.D, http://about.me/dario.faggioli
Senior Software Engineer, Citrix Systems R&D Ltd., Cambridge (UK)


[-- Attachment #1.2: This is a digitally signed message part --]
[-- Type: application/pgp-signature, Size: 181 bytes --]

[-- Attachment #2: Type: text/plain, Size: 126 bytes --]

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 72+ messages in thread

* Re: [PATCH v1 3/4] libxl: add rt scheduler
  2014-09-04 15:44             ` Dario Faggioli
@ 2014-09-04 15:55               ` George Dunlap
  2014-09-04 16:12                 ` Meng Xu
  0 siblings, 1 reply; 72+ messages in thread
From: George Dunlap @ 2014-09-04 15:55 UTC (permalink / raw)
  To: Dario Faggioli
  Cc: Ian Campbell, Sisu Xi, Stefano Stabellini, Ian Jackson,
	Ian Jackson, xen-devel, Meng Xu, Meng Xu, Jan Beulich, Chao Wang,
	Chong Li, Dagaen Golomb

On Thu, Sep 4, 2014 at 4:44 PM, Dario Faggioli
<dario.faggioli@citrix.com> wrote:
> On gio, 2014-09-04 at 11:07 -0400, Meng Xu wrote:
>
>> 2014-09-04 10:51 GMT-04:00 George Dunlap
>> <george.dunlap@eu.citrix.com>:
>>         On 09/04/2014 03:47 PM, Meng Xu wrote:
>>
>>         > Hi George,
>>         >
>>         >
>>         > 2014-09-04 10:27 GMT-04:00 George Dunlap
>>         > <George.Dunlap@eu.citrix.com>:
>>         >         On Wed, Sep 3, 2014 at 4:33 PM, George Dunlap
>>         >         <George.Dunlap@eu.citrix.com> wrote:
>>         >         >
>>         >         > While the domctl interface is not stable, the
>>         >         libxl interface *is*
>>         >         > stable, so we definitely need to think carefully
>>         >         about what we want
>>         >         > this to look like.
>>         >         >
>>         >         > Let me give that a think. :-)
>>         >
>>         >
>>         >         OK, so we had a chat about this at our team meeting
>>         >         today, and here is
>>         >         what we came up with.
>>         >
>>         >         The feature freeze for 4.5 is next Wednesday.
>>         >
>>         >         The core scheduler is in good enough shape to be
>>         >         checked in as an
>>         >         "experimental" mode, so it would be really nice to
>>         >         be able to get this
>>         >         checked in.
>>         >
>>         >         The DOMCTL interface isn't stable so we can change
>>         >         that if we need to;
>>         >         however, the libxl interface *is* stable.
>>         >
>>         >         The current libxl scheduler parameter interface
>>         >         assumes one set of
>>         >         parameters per domain; it's not yet setup for
>>         >         per-vcpu parameters.  It
>>         >         is unlikely that we would be able to converge on a
>>         >         new interface by
>>         >         next week.
>>         >
>>         >         So the suggestion was this: For the moment, use the
>>         >         existing libxl
>>         >         interface on a per-domain basis.  Internally, this
>>         >         will set all vcpus
>>         >         to the same values.  This will allow us to check in
>>         >         a useable version
>>         >         of the scheduler for people to test and improve.
>>         >         Then for 4.6 we can
>>         >         start working on a suitable libxl interface for
>>         >         setting per-vcpu
>>         >         scheduling parameters.
>>         >
>>         >         Dario / Ian, did I miss anything?
>>         >
>>         >         Meng / &c, does that sound reasonable?
>>         >
>>         >
>>         > I have a question as to the user interface.
>>         > For 4.5, we only allow users to set all vcpus to the same
>>         > values (I'm totally fine with it.);
>>         > But how about the get function? When users issue the command
>>         > "xl sched-rt", how should we display the parameters of
>>         > vcpus? We just give the "period", "budget" and "#VCPU" for a
>>         > domain? I'm fine with this display for 4.5.
>>         >
>>         >
>>         > However ,my concerns is: In 4.6, when we allow vcpus to have
>>         > different parameters and need to display every vcpu's
>>         > parameters, how should we display when users use command "xl
>>         > sched-rt"? When vcpus have different period and budget, we
>>         > cannot display like what we did in 4.5 then. :-(
>>         >
>>         >
>>         > It's just my thought, just in case we neglect it. :-)
>>
>>
>>         I think the xl interface doesn't have quite the same
>>         consistency guarantees as the libxl interface.  For now, I
>>         think just make it print one budget / period for the domain;
>>         and we can change it later.
>>
>>
>>
>>
>> I see. I'm totally ok with the decision! :-)
>> So I will only use the existing libxl interface without adding an
>> array to it, to set/get the vcpus' parameters of a domain. Am I right?
>>
> Yep, no array. You just add a 'period' and a 'budget' fields _inside_
> libxl_domain_sched_params, without putting them inside any wrapping
> struct, union or array.

Except that you don't need to add a "period" field, since there's
already one there (for the SEDF scheduler).

We could re-use the "slice" field instead of adding "budget", but I
think probably for clarity adding "budget" is better (although I'm
open to other opinions on that one).
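
Concretely, I guess that would just mean adding something like this to
libxl_domain_sched_params in libxl_types.idl (the exact type and default
name here are only illustrative, of course):

    libxl_domain_sched_params = Struct("domain_sched_params",[
        ...
        ("budget",       integer, {'init_val': 'LIBXL_DOMAIN_SCHED_PARAM_BUDGET_DEFAULT'}),
        ])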

 -George

^ permalink raw reply	[flat|nested] 72+ messages in thread

* Re: [PATCH v1 3/4] libxl: add rt scheduler
  2014-09-04 15:55               ` George Dunlap
@ 2014-09-04 16:12                 ` Meng Xu
  2014-09-05  9:19                   ` Dario Faggioli
  0 siblings, 1 reply; 72+ messages in thread
From: Meng Xu @ 2014-09-04 16:12 UTC (permalink / raw)
  To: George Dunlap
  Cc: Ian Jackson, Ian Campbell, Sisu Xi, Stefano Stabellini,
	Chenyang Lu, Dario Faggioli, Ian Jackson, xen-devel,
	Linh Thi Xuan Phan, Meng Xu, Jan Beulich, Chao Wang, Chong Li,
	Dagaen Golomb


[-- Attachment #1.1: Type: text/plain, Size: 5320 bytes --]

Hi George and Dario,


2014-09-04 11:55 GMT-04:00 George Dunlap <George.Dunlap@eu.citrix.com>:

> On Thu, Sep 4, 2014 at 4:44 PM, Dario Faggioli
> <dario.faggioli@citrix.com> wrote:
> > On gio, 2014-09-04 at 11:07 -0400, Meng Xu wrote:
> >
> >> 2014-09-04 10:51 GMT-04:00 George Dunlap
> >> <george.dunlap@eu.citrix.com>:
> >>         On 09/04/2014 03:47 PM, Meng Xu wrote:
> >>
> >>         > Hi George,
> >>         >
> >>         >
> >>         > 2014-09-04 10:27 GMT-04:00 George Dunlap
> >>         > <George.Dunlap@eu.citrix.com>:
> >>         >         On Wed, Sep 3, 2014 at 4:33 PM, George Dunlap
> >>         >         <George.Dunlap@eu.citrix.com> wrote:
> >>         >         >
> >>         >         > While the domctl interface is not stable, the
> >>         >         libxl interface *is*
> >>         >         > stable, so we definitely need to think carefully
> >>         >         about what we want
> >>         >         > this to look like.
> >>         >         >
> >>         >         > Let me give that a think. :-)
> >>         >
> >>         >
> >>         >         OK, so we had a chat about this at our team meeting
> >>         >         today, and here is
> >>         >         what we came up with.
> >>         >
> >>         >         The feature freeze for 4.5 is next Wednesday.
> >>         >
> >>         >         The core scheduler is in good enough shape to be
> >>         >         checked in as an
> >>         >         "experimental" mode, so it would be really nice to
> >>         >         be able to get this
> >>         >         checked in.
> >>         >
> >>         >         The DOMCTL interface isn't stable so we can change
> >>         >         that if we need to;
> >>         >         however, the libxl interface *is* stable.
> >>         >
> >>         >         The current libxl scheduler parameter interface
> >>         >         assumes one set of
> >>         >         parameters per domain; it's not yet setup for
> >>         >         per-vcpu parameters.  It
> >>         >         is unlikely that we would be able to converge on a
> >>         >         new interface by
> >>         >         next week.
> >>         >
> >>         >         So the suggestion was this: For the moment, use the
> >>         >         existing libxl
> >>         >         interface on a per-domain basis.  Internally, this
> >>         >         will set all vcpus
> >>         >         to the same values.  This will allow us to check in
> >>         >         a useable version
> >>         >         of the scheduler for people to test and improve.
> >>         >         Then for 4.6 we can
> >>         >         start working on a suitable libxl interface for
> >>         >         setting per-vcpu
> >>         >         scheduling parameters.
> >>         >
> >>         >         Dario / Ian, did I miss anything?
> >>         >
> >>         >         Meng / &c, does that sound reasonable?
> >>         >
> >>         >
> >>         > I have a question as to the user interface.
> >>         > For 4.5, we only allow users to set all vcpus to the same
> >>         > values (I'm totally fine with it.);
> >>         > But how about the get function? When users issue the command
> >>         > "xl sched-rt", how should we display the parameters of
> >>         > vcpus? We just give the "period", "budget" and "#VCPU" for a
> >>         > domain? I'm fine with this display for 4.5.
> >>         >
> >>         >
> >>         > However ,my concerns is: In 4.6, when we allow vcpus to have
> >>         > different parameters and need to display every vcpu's
> >>         > parameters, how should we display when users use command "xl
> >>         > sched-rt"? When vcpus have different period and budget, we
> >>         > cannot display like what we did in 4.5 then. :-(
> >>         >
> >>         >
> >>         > It's just my thought, just in case we neglect it. :-)
> >>
> >>
> >>         I think the xl interface doesn't have quite the same
> >>         consistency guarantees as the libxl interface.  For now, I
> >>         think just make it print one budget / period for the domain;
> >>         and we can change it later.
> >>
> >>
> >>
> >>
> >> I see. I'm totally ok with the decision! :-)
> >> So I will only use the existing libxl interface without adding an
> >> array to it, to set/get the vcpus' parameters of a domain. Am I right?
> >>
> > Yep, no array. You just add a 'period' and a 'budget' fields _inside_
> > libxl_domain_sched_params, without putting them inside any wrapping
> > struct, union or array.
>
> Except that you don't need to add a "period" field, since there's
> already one there (for the SEDF scheduler).
>
> We could re-use the "slice" field instead of adding "budget", but I
> think probably for clarity adding "budget" is better (although I'm
> open to other opinions on that one).
>

I like the idea of adding "budget" because it's much clearer.

As to the other implementation details, I think I got it and will do that
now. :-)

Thanks,

Meng

-- 


-----------
Meng Xu
PhD Student in Computer and Information Science
University of Pennsylvania

[-- Attachment #1.2: Type: text/html, Size: 8095 bytes --]

[-- Attachment #2: Type: text/plain, Size: 126 bytes --]

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 72+ messages in thread

* Re: [PATCH v1 3/4] libxl: add rt scheduler
  2014-09-04 16:12                 ` Meng Xu
@ 2014-09-05  9:19                   ` Dario Faggioli
  0 siblings, 0 replies; 72+ messages in thread
From: Dario Faggioli @ 2014-09-05  9:19 UTC (permalink / raw)
  To: Meng Xu
  Cc: Ian Campbell, Sisu Xi, Stefano Stabellini, George Dunlap,
	Chenyang Lu, Ian Jackson, Ian Jackson, xen-devel,
	Linh Thi Xuan Phan, Meng Xu, Jan Beulich, Chao Wang, Chong Li,
	Dagaen Golomb


[-- Attachment #1.1: Type: text/plain, Size: 1454 bytes --]

On gio, 2014-09-04 at 12:12 -0400, Meng Xu wrote:

> 2014-09-04 11:55 GMT-04:00 George Dunlap 
>         
>         >> I see. I'm totally ok with the decision! :-)
>         >> So I will only use the existing libxl interface without
>         adding an
>         >> array to it, to set/get the vcpus' parameters of a domain.
>         Am I right?
>         >>
>         > Yep, no array. You just add a 'period' and a 'budget' fields
>         _inside_
>         > libxl_domain_sched_params, without putting them inside any
>         wrapping
>         > struct, union or array.
>         
>         
>         Except that you don't need to add a "period" field, since
>         there's
>         already one there (for the SEDF scheduler).
>         
Right, I overlooked that, sorry. :-P

>         We could re-use the "slice" field instead of adding "budget",
>         but I
>         think probably for clarity adding "budget" is better (although
>         I'm
>         open to other opinions on that one).
> 
> 
> ​I like the idea of adding "budget" because it's much clearer. ​
> 
I'm fine with adding a "budget" parameter too.

Regards,
Dario

-- 
<<This happens because I choose it to happen!>> (Raistlin Majere)
-----------------------------------------------------------------
Dario Faggioli, Ph.D, http://about.me/dario.faggioli
Senior Software Engineer, Citrix Systems R&D Ltd., Cambridge (UK)


[-- Attachment #1.2: This is a digitally signed message part --]
[-- Type: application/pgp-signature, Size: 181 bytes --]

[-- Attachment #2: Type: text/plain, Size: 126 bytes --]

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 72+ messages in thread

* Re: [PATCH v1 1/4] xen: add real time scheduler rt
  2014-09-04 15:30               ` Meng Xu
@ 2014-09-05  9:36                 ` Dario Faggioli
  2014-09-05 15:06                   ` Meng Xu
  0 siblings, 1 reply; 72+ messages in thread
From: Dario Faggioli @ 2014-09-05  9:36 UTC (permalink / raw)
  To: Meng Xu
  Cc: Ian Campbell, Sisu Xi, Stefano Stabellini, George Dunlap,
	Chenyang Lu, Ian Jackson, xen-devel, Linh Thi Xuan Phan, Meng Xu,
	Jan Beulich, Chao Wang, Chong Li, Dagaen Golomb


[-- Attachment #1.1: Type: text/plain, Size: 3225 bytes --]

On gio, 2014-09-04 at 11:30 -0400, Meng Xu wrote:
> 
> 
> 2014-09-04 10:27 GMT-04:00 Dario Faggioli <dario.faggioli@citrix.com>:

>         > > For instance, I can put, in an SMP guest, two real-time
>         applications
>         > > with different timing requirements, and pin each one to a
>         different
>         > > (v)cpu (I mean pin *inside* the guest). At this point, I'd
>         like for each
>         > > vcpu to have a set of RT scheduling parameters, at the Xen
>         level, that
>         > > matches the timing requirements of what's running inside.
>         > >
>         > > This may not look so typical in a server/cloud
>         environment, but can
>         > > happen (at least in my experience) in a mobile/embedded
>         env.
>         >
>         > But to play devil's advocate for a minute here:
>         >
>         
>         Hehe, please, be my guest! :-D :-D
>         
>         > couldn't you just put
>         > them in two different single-vcpu VMs then?
>         >
>         
> 
> 
> ​Well, let me give a simpler example:
> Suppose we have three tasks in one VM, each task has period 4ms and
> budget ​6ms (its utilization is 2/3).
>
You mean budget=4ms and period=6ms, don't you? :-)

>  If all these three tasks starts execution at the same time, we can
> use two full-capacity vcpus (200% capacity cpu resource) to schedule
> these three tasks. 
> However if you want to get two VMs, each of which has a full capacity
> vcpu (100% capacity cpu), we cannot schedule these three tasks,
> because one tasks cannot (well, at least very hard) migrate from one
> VM to another. 
> 
But... In this case, in the former configuration (1 VM with 2 vcpus),
each vcpu would (or at least can) have the same bandwidth of 100%, i.e.,
the same parameters... or am I missing something?

What we're trying to assess here is the usefulness of being able to set
_different_ parameters (and hence a different pcpu bandwidth) for each
vcpu.

Also, it looks like you're assuming there is a real-time scheduler inside
the VM, which may or may not be the case.

> ​This is just a simple example, we could of course have an example
> like this but the vcpus are not full-capacity vcpus. :-)
> 
Yeah, well, perhaps it's a bit too simple. :-D

Don't get me wrong, I still think per-vcpu params are something we
really want; it's just the example I'm not sure I'm getting/liking.

I still think the example of multiple, concurrent and strictly related
activities having different timing requirements is a really sensible
one. In fact, in that case, especially if one does not have a real-time
scheduler inside the guest, mapping those requirements onto the Xen
scheduler is the easiest (only?) way to port the app from bare metal to
a virtual machine!

Regards,
Dario

PS. BTW, Meng, can you use plain text emails when sending to the list?

-- 
<<This happens because I choose it to happen!>> (Raistlin Majere)
-----------------------------------------------------------------
Dario Faggioli, Ph.D, http://about.me/dario.faggioli
Senior Software Engineer, Citrix Systems R&D Ltd., Cambridge (UK)


[-- Attachment #1.2: This is a digitally signed message part --]
[-- Type: application/pgp-signature, Size: 181 bytes --]

[-- Attachment #2: Type: text/plain, Size: 126 bytes --]

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 72+ messages in thread

* Re: [PATCH v1 4/4] xl: introduce rt scheduler
  2014-09-03 22:28     ` Meng Xu
@ 2014-09-05  9:40       ` Dario Faggioli
  2014-09-05 14:43         ` Meng Xu
  0 siblings, 1 reply; 72+ messages in thread
From: Dario Faggioli @ 2014-09-05  9:40 UTC (permalink / raw)
  To: Meng Xu
  Cc: Ian Campbell, Sisu Xi, Stefano Stabellini, George Dunlap,
	Chenyang Lu, Ian Jackson, xen-devel, Linh Thi Xuan Phan, Meng Xu,
	Jan Beulich, Chao Wang, Chong Li, Dagaen Golomb


[-- Attachment #1.1: Type: text/plain, Size: 1635 bytes --]

On mer, 2014-09-03 at 18:28 -0400, Meng Xu wrote:

> 2014-09-03 11:52 GMT-04:00 George Dunlap
> <George.Dunlap@eu.citrix.com>:

>         The main thing about this is that you probably want to have an
>         interface where you can set the parameters for multiple vcpus
>         at once;
>         and that's probably the case regardless of whether we end up
>         making
>         the hypercall able to batch vcpus or do them one-at-a-time.
> 
> 
> ​I'm totally fine with allowing users to set the parameters for
> multiple vcpus at once. 
> 
> 
> My question is:
> When users set multiple vcpus at once, how the command looks like? 
> For example, if they want to set two vcpus' parameters for dom 1's at
> once (to be specific, set vcpu 0's period to 20 and budget to 10, and
> set vcpu 1's period to 30 and budget to 20.),  they should use the
> command like "xl sched-rt -d 1 -v 0 -p 20 -b 10 -v 1 -p 30 -b 20"? 
>
So, since we agreed about only allowing, *for* *now*, per-domain
parameter setting, this is not a requirement anymore, is it?

I think we can just stick to a pretty basic implementation of `xl
sched-rt', acting similarly to its credit, credit2 (and sedf)
counterparts.

How to allow a user to set the parameters for multiple vcpus at the
same time from xl has become a Xen 4.6 issue. :-)

Regards,
Dario

-- 
<<This happens because I choose it to happen!>> (Raistlin Majere)
-----------------------------------------------------------------
Dario Faggioli, Ph.D, http://about.me/dario.faggioli
Senior Software Engineer, Citrix Systems R&D Ltd., Cambridge (UK)


[-- Attachment #1.2: This is a digitally signed message part --]
[-- Type: application/pgp-signature, Size: 181 bytes --]

[-- Attachment #2: Type: text/plain, Size: 126 bytes --]

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 72+ messages in thread

* Re: [PATCH v1 1/4] xen: add real time scheduler rt
  2014-09-03 13:40   ` George Dunlap
  2014-09-03 14:11     ` Meng Xu
@ 2014-09-05  9:46     ` Dario Faggioli
  1 sibling, 0 replies; 72+ messages in thread
From: Dario Faggioli @ 2014-09-05  9:46 UTC (permalink / raw)
  To: George Dunlap
  Cc: Ian Campbell, Sisu Xi, Stefano Stabellini, Ian Jackson,
	xen-devel, Meng Xu, Meng Xu, Jan Beulich, chaowang, Chong Li,
	dgolomb


[-- Attachment #1.1: Type: text/plain, Size: 2449 bytes --]

On mer, 2014-09-03 at 14:40 +0100, George Dunlap wrote:
> On Sun, Aug 24, 2014 at 11:58 PM, Meng Xu <mengxu@cis.upenn.edu> wrote:
> > This scheduler follows the pre-emptive Global EDF theory in real-time field.
> > Each VCPU can have a dedicated period and budget.
> > While scheduled, a VCPU burns its budget.
> > A VCPU has its budget replenished at the beginning of each of its periods;
> > The VCPU discards its unused budget at the end of each of its periods.
> > If a VCPU runs out of budget in a period, it has to wait until next period.
> > The mechanism of how to burn a VCPU's budget depends on the server mechanism
> > implemented for each VCPU.
> >
> > Server mechanism: a VCPU is implemented as a deferable server.
> > When a VCPU is scheduled to execute on a PCPU, its budget is continuously
> > burned.
> >
> > Priority scheme: Preemptive Global Earliest Deadline First (gEDF).
> > At any scheduling point, the VCPU with earliest deadline has highest
> > priority.
> >
> > Queue scheme: A global Runqueue for each CPU pool.
> > The Runqueue holds all runnable VCPUs.
> > VCPUs in the Runqueue are divided into two parts: with and without budget.
> > At each part, VCPUs are sorted based on gEDF priority scheme.
> >
> > Scheduling quantum: 1 ms;
> >
> > Note: cpumask and cpupool is supported.
> >
> > This is still in the development phase.
> 
> You should probably take this out now that you've removed the RFC. :-)
> 
Should he?

I mean, AFAIUI, we are accepting (well, at least, that's the plan :-) )
the scheduler in an 'experimental' status.

When we merged ARM support for the first time, I remember there were
some similar claims around the code and docs (it was API/ABI/something
stability at the time, I think), and the same, IIRC, for PVH.

Credit2 still warns its users like this, on boot:

    printk("Initializing Credit2 scheduler\n" \
           " WARNING: This is experimental software in development.\n" \
           " Use at your own risk.\n");

So, probably "still in the development phase" may be a bit strong, but
I'd be inclined have something like that somewhere.
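
For the rt scheduler, that could simply be something like this in its init
function (just a sketch; the exact wording is of course up to Meng):

    printk("Initializing rt scheduler\n"
           " WARNING: This is experimental software in development.\n"
           " Use at your own risk.\n");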

Thoughts?

Regards,
Dario

-- 
<<This happens because I choose it to happen!>> (Raistlin Majere)
-----------------------------------------------------------------
Dario Faggioli, Ph.D, http://about.me/dario.faggioli
Senior Software Engineer, Citrix Systems R&D Ltd., Cambridge (UK)


[-- Attachment #1.2: This is a digitally signed message part --]
[-- Type: application/pgp-signature, Size: 181 bytes --]

[-- Attachment #2: Type: text/plain, Size: 126 bytes --]

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 72+ messages in thread

* Re: [PATCH v1 3/4] libxl: add rt scheduler
  2014-08-24 22:58 ` [PATCH v1 3/4] libxl: " Meng Xu
  2014-08-25 13:17   ` Wei Liu
  2014-09-03 15:33   ` George Dunlap
@ 2014-09-05 10:21   ` Dario Faggioli
  2014-09-05 15:45     ` Meng Xu
  2 siblings, 1 reply; 72+ messages in thread
From: Dario Faggioli @ 2014-09-05 10:21 UTC (permalink / raw)
  To: Meng Xu
  Cc: ian.campbell, xisisu, stefano.stabellini, george.dunlap,
	ian.jackson, xen-devel, xumengpanda, JBeulich, chaowang,
	lichong659, dgolomb


[-- Attachment #1.1: Type: text/plain, Size: 7474 bytes --]

On dom, 2014-08-24 at 18:58 -0400, Meng Xu wrote:
> --- a/tools/libxl/libxl.c
> +++ b/tools/libxl/libxl.c
> @@ -5154,6 +5154,139 @@ static int sched_sedf_domain_set(libxl__gc *gc, uint32_t domid,
>      return 0;
>  }
>  
> +static int sched_rt_domain_get(libxl__gc *gc, uint32_t domid,
> +                               libxl_domain_sched_params *scinfo)
> +{
> +    struct xen_domctl_sched_rt_params* sdom;
> +    uint16_t num_vcpus;
> +    int rc, i;
> +
> +    rc = xc_sched_rt_domain_get_num_vcpus(CTX->xch, domid, &num_vcpus);
> +    if (rc != 0) {
> +        LOGE(ERROR, "getting num_vcpus of domain sched rt");
> +        return ERROR_FAIL;
> +    }
>
As George pointed out already, you can get the num_vcpus via
xc_domain_getinfo().

I agree with George's review about the rest of this function
(appropriately updated to take what we decided about the interface into
account :-) ).

> +#define SCHED_RT_VCPU_PERIOD_MAX    31536000000000 /* one year in microsecond*/
> +#define SCHED_RT_VCPU_BUDGET_MAX    SCHED_RT_VCPU_PERIOD_MAX
> +
This may be me not remembering correctly the outcome of a preceding
discussion... did we say we were going with _RT_ or with something like
_RTDS_ ?

ISTR the latter...

Also, macros like INT_MAX, UINT_MAX, etc., or << and ~ "tricks" are,
IMO, preferable to open-coding the value.

Finally, I wonder whether these would be better placed in some header,
closer to the declarations of period and budget (where their type is also
visible) and, as a nice side effect of that, available to libxl callers
as well.
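
Just to show what I mean (the names and the actual bound being purely
illustrative here):

    #define SCHED_RT_VCPU_PERIOD_MAX    (~(uint64_t)0)  /* or UINT64_MAX, etc. */
    #define SCHED_RT_VCPU_BUDGET_MAX    SCHED_RT_VCPU_PERIOD_MAX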

> +/*
> + * Sanity check of the scinfo parameters
> + * return 0 if all values are valid
> + * return 1 if one param is default value
> + * return 2 if the target vcpu's index, period or budget is out of range
> + */
> +static int sched_rt_domain_set_validate_params(libxl__gc *gc,
> +                                               const libxl_domain_sched_params *scinfo,
> +                                               const uint16_t num_vcpus)
> +{
> +    int vcpu_index = scinfo->rt.vcpu_index;
> +
As per the low level interface (Xen and libxc) there should be no need
for any vcpu_index anymore, right?

I'm just double checking, as the discussion was --as it should, on these
things-- long and involved :-D

> +    if (vcpu_index < 0 || vcpu_index > num_vcpus)
> +    {
> +        LOG(ERROR, "VCPU index is not set or out of range, "
> +                    "valid values are within range from 0 to %d", num_vcpus);
> +        return 2;
> +    }
> +
> +    if (scinfo->rt.period < 1 ||
> +        scinfo->rt.period > SCHED_RT_VCPU_PERIOD_MAX)
> +    {
> +        LOG(ERROR, "VCPU period is not set or out of range, "
> +                    "valid values are within range from 0 to %lu", SCHED_RT_VCPU_PERIOD_MAX);
> +        return 2;
> +    }
> +
> +    if (scinfo->rt.budget < 1 ||
> +        scinfo->rt.budget > SCHED_RT_VCPU_BUDGET_MAX)
> +    {
> +        LOG(ERROR, "VCPU budget is not set or out of range, "
> +                    "valid values are within range from 0 to %lu", SCHED_RT_VCPU_BUDGET_MAX);
> +        return 2;
> +    }
> +
I think some basic sanity checks are fine here. E.g., you may
also add a (budget <= period) check. However, as George said already:
 - make sure you limit these to the kind of checks based on info the 
   toolstack should have, and defer the rest to the hypervisor
 - if something goes wrong, make sure to always return libxl error codes
   (typically, ERROR_INVAL), as sched_credit_domain_set() does.

Oh and, just double checking again, I think we decided to handle
(budget ==0 && period == 0) in a special way, so remember that! :-P

It's personal taste, I guess, but I think you don't really need a
helper function for this; it can live in sched_rt_domain_set.
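
Roughly, I mean something like this directly in sched_rt_domain_set() (just
a sketch, using the field names from this version of the patch; the exact
meaning of the 0/0 case is whatever we agreed upon):

    /* the special case we discussed: both unset, leave the params alone */
    if (scinfo->rt.period == 0 && scinfo->rt.budget == 0)
        return 0;

    if (scinfo->rt.period < 1 || scinfo->rt.period > SCHED_RT_VCPU_PERIOD_MAX ||
        scinfo->rt.budget < 1 || scinfo->rt.budget > scinfo->rt.period)
    {
        LOG(ERROR, "period and/or budget out of range (budget must be <= period)");
        return ERROR_INVAL;
    }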

> +static int sched_rt_domain_set(libxl__gc *gc, uint32_t domid,
> +                               const libxl_domain_sched_params *scinfo)
> +{
> +    struct xen_domctl_sched_rt_params sdom;
> +    uint16_t num_vcpus;
> +    int rc;
> + 
> +    rc = xc_sched_rt_domain_get_num_vcpus(CTX->xch, domid, &num_vcpus);
> +    if (rc != 0) {
> +        LOGE(ERROR, "getting domain sched rt");
> +        return ERROR_FAIL;
> +    }
> +    
> +    rc = sched_rt_domain_set_validate_params(gc, scinfo, num_vcpus);
> +    if (rc == 2)
> +        return ERROR_INVAL;
> +    if (rc == 1)
> +        return 0;
> +    if (rc == 0)
> +    {
> +        sdom.index = scinfo->rt.vcpu_index;
> +        sdom.period = scinfo->rt.period;
> +        sdom.budget = scinfo->rt.budget;
> +    }
> +
So, I see that you are actually returning libxl error codes, good. Well,
again, just put the code above directly here, instead of having to
define and then interpret ad-hoc error handling logic. :-)

> @@ -5177,6 +5310,9 @@ int libxl_domain_sched_params_set(libxl_ctx *ctx, uint32_t domid,
>      case LIBXL_SCHEDULER_ARINC653:
>          ret=sched_arinc653_domain_set(gc, domid, scinfo);
>          break;
> +    case LIBXL_SCHEDULER_RT:
> +        ret=sched_rt_domain_set(gc, domid, scinfo);
> +        break;
Again, I seriously think I remember we agreed upon _SCHEDULER_RTDS (or
_RT_DS) as the name of this thing. Am I wrong?

> --- a/tools/libxl/libxl_types.idl
> +++ b/tools/libxl/libxl_types.idl
> @@ -153,6 +153,7 @@ libxl_scheduler = Enumeration("scheduler", [
>      (5, "credit"),
>      (6, "credit2"),
>      (7, "arinc653"),
> +    (8, "rt"),
rtds? rt_ds?

> @@ -303,6 +304,19 @@ libxl_domain_restore_params = Struct("domain_restore_params", [
>      ("checkpointed_stream", integer),
>      ])
>  
> +libxl_rt_vcpu = Struct("vcpu",[
> +    ("period",       uint64, {'init_val': 'LIBXL_DOMAIN_SCHED_PARAM_PERIOD_DEFAULT'}),
> +    ("budget",       uint64, {'init_val': 'LIBXL_DOMAIN_SCHED_PARAM_BUDGET_DEFAULT'}),
> +    ("index",        integer, {'init_val': 'LIBXL_DOMAIN_SCHED_PARAM_VCPU_INDEX_DEFAULT'}),
> +    ])
> +
> +libxl_domain_sched_rt_params = Struct("domain_sched_rt_params",[
> +    ("vcpus",        Array(libxl_rt_vcpu, "num_vcpus")),
> +    ("period",       uint64, {'init_val': 'LIBXL_DOMAIN_SCHED_PARAM_PERIOD_DEFAULT'}),
> +    ("budget",       uint64, {'init_val': 'LIBXL_DOMAIN_SCHED_PARAM_BUDGET_DEFAULT'}),
> +    ("vcpu_index",   integer, {'init_val': 'LIBXL_DOMAIN_SCHED_PARAM_VCPU_INDEX_DEFAULT'}),
> +    ])
> +
>  libxl_domain_sched_params = Struct("domain_sched_params",[
>      ("sched",        libxl_scheduler),
>      ("weight",       integer, {'init_val': 'LIBXL_DOMAIN_SCHED_PARAM_WEIGHT_DEFAULT'}),
> @@ -311,6 +325,7 @@ libxl_domain_sched_params = Struct("domain_sched_params",[
>      ("slice",        integer, {'init_val': 'LIBXL_DOMAIN_SCHED_PARAM_SLICE_DEFAULT'}),
>      ("latency",      integer, {'init_val': 'LIBXL_DOMAIN_SCHED_PARAM_LATENCY_DEFAULT'}),
>      ("extratime",    integer, {'init_val': 'LIBXL_DOMAIN_SCHED_PARAM_EXTRATIME_DEFAULT'}),
> +    ("rt",           libxl_domain_sched_rt_params),
>      ])
And about this part, we discussed and agreed already in the other
thread. :-)

Regards,
Dario

-- 
<<This happens because I choose it to happen!>> (Raistlin Majere)
-----------------------------------------------------------------
Dario Faggioli, Ph.D, http://about.me/dario.faggioli
Senior Software Engineer, Citrix Systems R&D Ltd., Cambridge (UK)


[-- Attachment #1.2: This is a digitally signed message part --]
[-- Type: application/pgp-signature, Size: 181 bytes --]

[-- Attachment #2: Type: text/plain, Size: 126 bytes --]

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 72+ messages in thread

* Re: [PATCH v1 2/4] libxc: add rt scheduler
  2014-08-24 22:58 ` [PATCH v1 2/4] libxc: add rt scheduler Meng Xu
@ 2014-09-05 10:34   ` Dario Faggioli
  2014-09-05 17:17     ` Meng Xu
  0 siblings, 1 reply; 72+ messages in thread
From: Dario Faggioli @ 2014-09-05 10:34 UTC (permalink / raw)
  To: Meng Xu
  Cc: ian.campbell, xisisu, stefano.stabellini, george.dunlap,
	ian.jackson, xen-devel, xumengpanda, JBeulich, chaowang,
	lichong659, dgolomb


[-- Attachment #1.1: Type: text/plain, Size: 3333 bytes --]

On dom, 2014-08-24 at 18:58 -0400, Meng Xu wrote:

> --- /dev/null
> +++ b/tools/libxc/xc_rt.c
> @@ -0,0 +1,90 @@
> +/****************************************************************************
> + *
> + *        File: xc_rt.c
> + *      Author: Sisu Xi 
> + *              Meng Xu
> + *
> + * Description: XC Interface to the rt scheduler
> + *
> + * This library is free software; you can redistribute it and/or
> + * modify it under the terms of the GNU Lesser General Public
> + * License as published by the Free Software Foundation;
> + * version 2.1 of the License.
> + *
> + * This library is distributed in the hope that it will be useful,
> + * but WITHOUT ANY WARRANTY; without even the implied warranty of
> + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
> + * Lesser General Public License for more details.
> + *
> + * You should have received a copy of the GNU Lesser General Public
> + * License along with this library; if not, write to the Free Software
> + * Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA  02110-1301  USA
> + */
> +
> +#include "xc_private.h"
> +
> +int xc_sched_rt_domain_set(xc_interface *xch,
> +                           uint32_t domid,
> +                           struct xen_domctl_sched_rt_params *sdom)
> +{
> +    int rc;
> +    DECLARE_DOMCTL;
> +
> +    domctl.cmd = XEN_DOMCTL_scheduler_op;
> +    domctl.domain = (domid_t) domid;
> +    domctl.u.scheduler_op.sched_id = XEN_SCHEDULER_RT_DS;
>
Aha! So, I was not dreaming about the whole RT_DS thing! :-D

Perhaps we had that discussion about the low level side of things, then.
Well, I think that, if the name of the scheduler is RT_DS, it
should be that for libxl and xl as well.

After all, the reason why we're calling it RT_DS in Xen is that we want to
be able to add other RT_FOO, RT_BAR algorithms/schedulers in the future. If
that happens, we'll need a way to reference them from the higher layers of
the toolstack as well. When we have RT_DS and RT_CBS in Xen, and just RT in
libxl, which one will the RT in libxl refer to?

So, just push the RT_DS thing all the way up to libxl and xl. As we said
for the Xen part, you can keep the source filenames _rt.c, but functions
and defines need to be specific.

So, for instance, this file can continue being xc_rt.c, but this
function needs to be called xc_sched_rtds_domain_set()
(or .._rt_ds_domain_..).
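
I.e., just the prototype (keeping the existing params struct name here,
purely for illustration):

    int xc_sched_rtds_domain_set(xc_interface *xch,
                                 uint32_t domid,
                                 struct xen_domctl_sched_rt_params *sdom);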

> +    domctl.u.scheduler_op.cmd = XEN_DOMCTL_SCHEDOP_putinfo;
> +    domctl.u.scheduler_op.u.rt.vcpu_index = sdom->index;
> +    domctl.u.scheduler_op.u.rt.period = sdom->period;
> +    domctl.u.scheduler_op.u.rt.budget = sdom->budget;
> +
> +    rc = do_domctl(xch, &domctl);
> +
> +    return rc;
> +}
> +
These functions are going to change quite a bit in the next version, due to
the interface changes we agreed upon. It'd then be quite pointless to
add many more comments but, overall, the hcall wrapping, the bouncing
logic, and everything I can see here look fine to me.

Regards,
Dario

-- 
<<This happens because I choose it to happen!>> (Raistlin Majere)
-----------------------------------------------------------------
Dario Faggioli, Ph.D, http://about.me/dario.faggioli
Senior Software Engineer, Citrix Systems R&D Ltd., Cambridge (UK)


[-- Attachment #1.2: This is a digitally signed message part --]
[-- Type: application/pgp-signature, Size: 181 bytes --]

[-- Attachment #2: Type: text/plain, Size: 126 bytes --]

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 72+ messages in thread

* Re: [PATCH v1 4/4] xl: introduce rt scheduler
  2014-09-05  9:40       ` Dario Faggioli
@ 2014-09-05 14:43         ` Meng Xu
  0 siblings, 0 replies; 72+ messages in thread
From: Meng Xu @ 2014-09-05 14:43 UTC (permalink / raw)
  To: Dario Faggioli
  Cc: Ian Campbell, Sisu Xi, Stefano Stabellini, George Dunlap,
	Chenyang Lu, Ian Jackson, xen-devel, Linh Thi Xuan Phan, Meng Xu,
	Jan Beulich, Chao Wang, Chong Li, Dagaen Golomb


[-- Attachment #1.1: Type: text/plain, Size: 2460 bytes --]

Hi Dario,


2014-09-05 5:40 GMT-04:00 Dario Faggioli <dario.faggioli@citrix.com>:

> On mer, 2014-09-03 at 18:28 -0400, Meng Xu wrote:
>
> > 2014-09-03 11:52 GMT-04:00 George Dunlap
> > <George.Dunlap@eu.citrix.com>:
>
> >         The main thing about this is that you probably want to have an
> >         interface where you can set the parameters for multiple vcpus
> >         at once;
> >         and that's probably the case regardless of whether we end up
> >         making
> >         the hypercall able to batch vcpus or do them one-at-a-time.
> >
> >
> > ​I'm totally fine with allowing users to set the parameters for
> > multiple vcpus at once.
> >
> >
> > My question is:
> > When users set multiple vcpus at once, how the command looks like?
> > For example, if they want to set two vcpus' parameters for dom 1's at
> > once (to be specific, set vcpu 0's period to 20 and budget to 10, and
> > set vcpu 1's period to 30 and budget to 20.),  they should use the
> > command like "xl sched-rt -d 1 -v 0 -p 20 -b 10 -v 1 -p 30 -b 20"?
> >
> So, since we agreed about only allowing, *for* *now*, per-domain
> parameter setting, this is not a requirement anymore, is it?
>
>
Right! I did some happy coding yesterday and removed the "array" parts
that were required to set/get each vcpu's parameters, and I should be able
to send a new patch today or tomorrow.



> I think we can just stick to a pretty basic implementation of `xl
> sched-rt', acting similarly to its credit, credit2 (and sedf)
> counterparts.
>
>
Right! Now the 'xl sched-rt' output is:
/* Note: Domain-0 has four vcpus; each vcpu has period 20000us and budget
8000us */

# xl sched-rt
Cpupool Pool-0: sched=EDF
Name                                ID    Period    Budget
Domain-0                             0     20000      8000

We can change a domain's parameters as follows:

# xl sched-rt -d Domain-0 -p 40000 -b 20000

/* This will set each vcpu of domain 0 to have period 40000us and budget
20000us */

Is this ok? Or maybe we can discuss this after I send out the simplified
version?



> How to allow an user to set the parameters for multiple vcpus at the
> same time from xl has become a Xen 4.6 issue. :-)
>

Sure! That makes sense and makes this not urgent. :-P

Meng



-----------
Meng Xu
PhD Student in Computer and Information Science
University of Pennsylvania


* Re: [PATCH v1 1/4] xen: add real time scheduler rt
  2014-09-05  9:36                 ` Dario Faggioli
@ 2014-09-05 15:06                   ` Meng Xu
  2014-09-05 15:09                     ` Dario Faggioli
  0 siblings, 1 reply; 72+ messages in thread
From: Meng Xu @ 2014-09-05 15:06 UTC (permalink / raw)
  To: Dario Faggioli
  Cc: Ian Campbell, Sisu Xi, Stefano Stabellini, George Dunlap,
	Chenyang Lu, Ian Jackson, xen-devel, Linh Thi Xuan Phan, Meng Xu,
	Jan Beulich, Chao Wang, Chong Li, Dagaen Golomb

Hi Dario,

2014-09-05 5:36 GMT-04:00 Dario Faggioli <dario.faggioli@citrix.com>:
> On gio, 2014-09-04 at 11:30 -0400, Meng Xu wrote:
>>
>>
>> 2014-09-04 10:27 GMT-04:00 Dario Faggioli <dario.faggioli@citrix.com>:
>
>>         > > For instance, I can put, in an SMP guest, two real-time
>>         applications
>>         > > with different timing requirements, and pin each one to a
>>         different
>>         > > (v)cpu (I mean pin *inside* the guest). At this point, I'd
>>         like for each
>>         > > vcpu to have a set of RT scheduling parameters, at the Xen
>>         level, that
>>         > > matches the timing requirements of what's running inside.
>>         > >
>>         > > This may not look so typical in a server/cloud
>>         environment, but can
>>         > > happen (at least in my experience) in a mobile/embedded
>>         env.
>>         >
>>         > But to play devil's advocate for a minute here:
>>         >
>>
>>         Hehe, please, be my guest! :-D :-D
>>
>>         > couldn't you just put
>>         > them in two different single-vcpu VMs then?
>>         >
>>
>>
>>
>> Well, let me give a simpler example:
>> Suppose we have three tasks in one VM, each task has period 4ms and
>> budget 6ms (its utilization is 2/3).
>>
> You mean budget=4ms and period=6ms, don't you? :-)

Right. My mistake. Thank you for the correction! :-)

>
>>  If all these three tasks starts execution at the same time, we can
>> use two full-capacity vcpus (200% capacity cpu resource) to schedule
>> these three tasks.
>> However if you want to get two VMs, each of which has a full capacity
>> vcpu (100% capacity cpu), we cannot schedule these three tasks,
>> because one tasks cannot (well, at least very hard) migrate from one
>> VM to another.
>>
> But... In this case, in the former configuration (1 VM with 2 vcpus),
> each vcpu would (or at least can) have the same bandwidth of 100%, i.e.,
> the same parameters... or am I missing something?
>
> What we're trying to assess here, is the usefulness of the possibility of
> setting _different_ parameters (and hence different pcpu bandwidth) for
> each vcpu.

I see. I tried to use the simple example to show why it is not
always a good idea to spread programs across several single-vcpu VMs.
(This is also the reason why global scheduling is better than
partitioned scheduling in many cases.) The simple example I made is
not a good one to show the usefulness of the possibility of
setting _different_ parameters for each vcpu. The example you raised is
the good one. :-)

>
> Also, it looks like you're assuming to have a real-time scheduler inside
> the VM, which may or may not be the case.
>
>> This is just a simple example, we could of course have an example
>> like this but the vcpus are not full-capacity vcpus. :-)
>>
> Yeah, well, perhaps it's a bit too simple. :-D
>
> Don't get me wrong, I continue thinking per-vcpu params is something we
> really want, it's just the example I'm not sure I'm getting/liking.

Sorry for the confusion. My example aims for a different goal as I
explained above. :-P

>
> I still think the example of multiple, concurrent and strictly related
> activities having different timing requirements to be a really sensible
> one. In fact, in that case, especially if one does not have a real-time
> scheduler inside the guest, mapping those requirements on the Xen
> scheduler is the easier (only?) way to port the app from baremetal to
> virtual machine!

Right! I think this may be the easiest one, if they don't have a
real-time scheduler inside guest domains.

>
> PS. BTW, Meng, can you use plain text emails when sending to the list?

Ah. It seems that I have used non-plain-text emails for a long time and
they must have "tortured" you for a long time. I'm really sorry for that,
since plain text is a rule for the mailing list. :-( Thank you very much
for letting me know.
If this one is not the plain text email, please let me know. (I
checked it's plain text by sending email to myself, but just in case.
:-) )

Thank you again for your advice!

Meng

-----------
Meng Xu
PhD Student in Computer and Information Science
University of Pennsylvania


* Re: [PATCH v1 1/4] xen: add real time scheduler rt
  2014-09-05 15:06                   ` Meng Xu
@ 2014-09-05 15:09                     ` Dario Faggioli
  0 siblings, 0 replies; 72+ messages in thread
From: Dario Faggioli @ 2014-09-05 15:09 UTC (permalink / raw)
  To: Meng Xu
  Cc: Ian Campbell, Sisu Xi, Stefano Stabellini, George Dunlap,
	Chenyang Lu, Ian Jackson, xen-devel, Linh Thi Xuan Phan, Meng Xu,
	Jan Beulich, Chao Wang, Chong Li, Dagaen Golomb



On ven, 2014-09-05 at 11:06 -0400, Meng Xu wrote:

> >> 2014-09-04 10:27 GMT-04:00 Dario Faggioli <dario.faggioli@citrix.com>:

> > PS. BTW, Meng, can you use plain text emails when sending to the list?
> 
> Ah. It seems that I have used non-plain-text emails for a long time and
> they must have "tortured" you for a long time. I'm really sorry for that,
> since plain text is a rule for the mailing list. :-( Thank you very much
> for letting me know.
>
No big deal, since you seem to have got it now.

> If this one is not the plain text email, please let me know. (I
> checked it's plain text by sending email to myself, but just in case.
> :-) )
> 
This one is perfect. :-)

Regards,
Dario

-- 
<<This happens because I choose it to happen!>> (Raistlin Majere)
-----------------------------------------------------------------
Dario Faggioli, Ph.D, http://about.me/dario.faggioli
Senior Software Engineer, Citrix Systems R&D Ltd., Cambridge (UK)



* Re: [PATCH v1 3/4] libxl: add rt scheduler
  2014-09-05 10:21   ` Dario Faggioli
@ 2014-09-05 15:45     ` Meng Xu
  2014-09-05 17:41       ` Dario Faggioli
  0 siblings, 1 reply; 72+ messages in thread
From: Meng Xu @ 2014-09-05 15:45 UTC (permalink / raw)
  To: Dario Faggioli
  Cc: Ian Campbell, Sisu Xi, Stefano Stabellini, George Dunlap,
	Ian Jackson, xen-devel, Meng Xu, Jan Beulich, Chao Wang,
	Chong Li, Dagaen Golomb

Hi Dario,

2014-09-05 6:21 GMT-04:00 Dario Faggioli <dario.faggioli@citrix.com>:
> On dom, 2014-08-24 at 18:58 -0400, Meng Xu wrote:
>> --- a/tools/libxl/libxl.c
>> +++ b/tools/libxl/libxl.c
>> @@ -5154,6 +5154,139 @@ static int sched_sedf_domain_set(libxl__gc *gc, uint32_t domid,
>>      return 0;
>>  }
>>
>> +static int sched_rt_domain_get(libxl__gc *gc, uint32_t domid,
>> +                               libxl_domain_sched_params *scinfo)
>> +{
>> +    struct xen_domctl_sched_rt_params* sdom;
>> +    uint16_t num_vcpus;
>> +    int rc, i;
>> +
>> +    rc = xc_sched_rt_domain_get_num_vcpus(CTX->xch, domid, &num_vcpus);
>> +    if (rc != 0) {
>> +        LOGE(ERROR, "getting num_vcpus of domain sched rt");
>> +        return ERROR_FAIL;
>> +    }
>>
> As George pointed out already, you can get the num_vcpus via
> xc_domain_getinfo().
>
> I agree with George's review about the rest of this function
> (appropriately updated to take what we decided about the interface into
> account :-) ).
>
>> +#define SCHED_RT_VCPU_PERIOD_MAX    31536000000000 /* one year in microsecond*/
>> +#define SCHED_RT_VCPU_BUDGET_MAX    SCHED_RT_VCPU_PERIOD_MAX
>> +
> This may be me not remembering correctly the outcome of a preceding
> discussion... did we say we were going with _RT_ or with something like
> _RTDS_ ?
>
> ISTR the latter...

So I also need to change all RT to RT_DS in the tool stack, except
keeping the command 'xl sched-rt'? If so, I will do that.

>
> Also, macros like INT_MAX, UINT_MAX, etc., or << and ~ "tricks" are,
> IMO, preferrable to the open coding of the value.

What does open coding of the value mean? Do you mean (2^32-1) instead
of 4294967295?
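
(Illustration only, not part of the exchange: "open coding" means writing the
bare literal; the alternative being suggested is deriving it from named
quantities, e.g. something like

    #define SCHED_RT_VCPU_PERIOD_MAX  (365ULL * 24 * 60 * 60 * 1000000) /* one year in us */

which evaluates to the same 31536000000000.)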

>
> Finally, I wonder whether these would better live in some headers,
> closer to the declaration of period and budget (where their type is also
> visible) and, as a nice side effect of that, available to libxl callers
> as well.

I actually considered the possibility of adding it to
xen/include/public/domctl.h, but other schedulers do not have such
range macros in domctl.h, so I'm not sure whether it would cause an
inconsistency.

>
>> +/*
>> + * Sanity check of the scinfo parameters
>> + * return 0 if all values are valid
>> + * return 1 if one param is default value
>> + * return 2 if the target vcpu's index, period or budget is out of range
>> + */
>> +static int sched_rt_domain_set_validate_params(libxl__gc *gc,
>> +                                               const libxl_domain_sched_params *scinfo,
>> +                                               const uint16_t num_vcpus)
>> +{
>> +    int vcpu_index = scinfo->rt.vcpu_index;
>> +
> As per the low level interface (Xen and libxc) there should be no need
> for any vcpu_index anymore, right?
>
> I'm just double checking, as the discussion was --as it should, on these
> things-- long and involved :-D
>
>> +    if (vcpu_index < 0 || vcpu_index > num_vcpus)
>> +    {
>> +        LOG(ERROR, "VCPU index is not set or out of range, "
>> +                    "valid values are within range from 0 to %d", num_vcpus);
>> +        return 2;
>> +    }
>> +
>> +    if (scinfo->rt.period < 1 ||
>> +        scinfo->rt.period > SCHED_RT_VCPU_PERIOD_MAX)
>> +    {
>> +        LOG(ERROR, "VCPU period is not set or out of range, "
>> +                    "valid values are within range from 0 to %lu", SCHED_RT_VCPU_PERIOD_MAX);
>> +        return 2;
>> +    }
>> +
>> +    if (scinfo->rt.budget < 1 ||
>> +        scinfo->rt.budget > SCHED_RT_VCPU_BUDGET_MAX)
>> +    {
>> +        LOG(ERROR, "VCPU budget is not set or out of range, "
>> +                    "valid values are within range from 0 to %lu", SCHED_RT_VCPU_BUDGET_MAX);
>> +        return 2;
>> +    }
>> +
> I think some basics sanity checking are fine done here. E.g., you may
> also add a (budget <= period) check. However, as George said already:
>  - make sure you limit these to the kind of checks based on info the
>    toolstack should have, and defer the rest to the hypervisor
>  - if something goes wrong, make sure to always return libxl error codes
>    (typically, ERROR_INVAL), as sched_credit_domain_set() does.
>
> Oh and, just double checking again, I think we decided to handle
> (budget ==0 && period == 0) in a special way, so remember that! :-P
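
(For illustration, a minimal sketch of such a basic check, using only names
already in the quoted code and libxl's usual error convention:

    if ( scinfo->rt.period < 1 || scinfo->rt.budget < 1 ||
         scinfo->rt.budget > scinfo->rt.period )
    {
        LOG(ERROR, "budget must be in [1, period]");
        return ERROR_INVAL;
    }

anything subtler is left to the hypervisor, as suggested above.)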

Yes. Now I don't use any array in the toolstack or the hypervisor to
set/get a domain's parameters, so we don't use (budget == 0 && period == 0)
to indicate that a vcpu is unchanged. (Whenever a user sets a domain's
parameters, we set all of that domain's vcpus' parameters, so we don't
need such an implication for the 4.5 release. :-P) The array only comes
back when we allow users to set/get a specific vcpu's parameters and the
vcpus have different periods and budgets.

As to the 4.5 release: since we decided not to have the functionality of
setting/getting each vcpu's parameters, and the current toolstack
implementation is aimed at setting/getting each vcpu's parameters, the
code change could be large (but the resulting 4.5 toolstack code will be
very similar to the toolstack code of the existing schedulers). I think I
will send out a version soon to let you guys have a look. What do you
think?

>
> It's personal taste, I guess, but I think you don't really need an
> helper function for this, and it can leave in sched_rt_domain_set.

Right!

>
>> +static int sched_rt_domain_set(libxl__gc *gc, uint32_t domid,
>> +                               const libxl_domain_sched_params *scinfo)
>> +{
>> +    struct xen_domctl_sched_rt_params sdom;
>> +    uint16_t num_vcpus;
>> +    int rc;
>> +
>> +    rc = xc_sched_rt_domain_get_num_vcpus(CTX->xch, domid, &num_vcpus);
>> +    if (rc != 0) {
>> +        LOGE(ERROR, "getting domain sched rt");
>> +        return ERROR_FAIL;
>> +    }
>> +
>> +    rc = sched_rt_domain_set_validate_params(gc, scinfo, num_vcpus);
>> +    if (rc == 2)
>> +        return ERROR_INVAL;
>> +    if (rc == 1)
>> +        return 0;
>> +    if (rc == 0)
>> +    {
>> +        sdom.index = scinfo->rt.vcpu_index;
>> +        sdom.period = scinfo->rt.period;
>> +        sdom.budget = scinfo->rt.budget;
>> +    }
>> +
> So, I see that you are actually returning libxl error codes, good. Well,
> again, just put the code above directly here, instead of having to
> define and then interpret an ad-hoc error handling logic. :-)

Removed the ad-hoc error code. :-)

>
>> @@ -5177,6 +5310,9 @@ int libxl_domain_sched_params_set(libxl_ctx *ctx, uint32_t domid,
>>      case LIBXL_SCHEDULER_ARINC653:
>>          ret=sched_arinc653_domain_set(gc, domid, scinfo);
>>          break;
>> +    case LIBXL_SCHEDULER_RT:
>> +        ret=sched_rt_domain_set(gc, domid, scinfo);
>> +        break;
> Again, I seriously think I remember we agreed upon _SCHEDULER_RTDS (or
> _RT_DS) as thee name of this thing. Am I wrong?
>

Right! We agreed on RT_DS. I changed the hypervisor code and didn't
realize I also needed to change the tool stack part. Will change it in
the next patch.

>> --- a/tools/libxl/libxl_types.idl
>> +++ b/tools/libxl/libxl_types.idl
>> @@ -153,6 +153,7 @@ libxl_scheduler = Enumeration("scheduler", [
>>      (5, "credit"),
>>      (6, "credit2"),
>>      (7, "arinc653"),
>> +    (8, "rt"),
> rtds? rt_ds?
>
>> @@ -303,6 +304,19 @@ libxl_domain_restore_params = Struct("domain_restore_params", [
>>      ("checkpointed_stream", integer),
>>      ])
>>
>> +libxl_rt_vcpu = Struct("vcpu",[
>> +    ("period",       uint64, {'init_val': 'LIBXL_DOMAIN_SCHED_PARAM_PERIOD_DEFAULT'}),
>> +    ("budget",       uint64, {'init_val': 'LIBXL_DOMAIN_SCHED_PARAM_BUDGET_DEFAULT'}),
>> +    ("index",        integer, {'init_val': 'LIBXL_DOMAIN_SCHED_PARAM_VCPU_INDEX_DEFAULT'}),
>> +    ])
>> +
>> +libxl_domain_sched_rt_params = Struct("domain_sched_rt_params",[
>> +    ("vcpus",        Array(libxl_rt_vcpu, "num_vcpus")),
>> +    ("period",       uint64, {'init_val': 'LIBXL_DOMAIN_SCHED_PARAM_PERIOD_DEFAULT'}),
>> +    ("budget",       uint64, {'init_val': 'LIBXL_DOMAIN_SCHED_PARAM_BUDGET_DEFAULT'}),
>> +    ("vcpu_index",   integer, {'init_val': 'LIBXL_DOMAIN_SCHED_PARAM_VCPU_INDEX_DEFAULT'}),
>> +    ])
>> +
>>  libxl_domain_sched_params = Struct("domain_sched_params",[
>>      ("sched",        libxl_scheduler),
>>      ("weight",       integer, {'init_val': 'LIBXL_DOMAIN_SCHED_PARAM_WEIGHT_DEFAULT'}),
>> @@ -311,6 +325,7 @@ libxl_domain_sched_params = Struct("domain_sched_params",[
>>      ("slice",        integer, {'init_val': 'LIBXL_DOMAIN_SCHED_PARAM_SLICE_DEFAULT'}),
>>      ("latency",      integer, {'init_val': 'LIBXL_DOMAIN_SCHED_PARAM_LATENCY_DEFAULT'}),
>>      ("extratime",    integer, {'init_val': 'LIBXL_DOMAIN_SCHED_PARAM_EXTRATIME_DEFAULT'}),
>> +    ("rt",           libxl_domain_sched_rt_params),
>>      ])
> And about this part, we discussed and agreed already in the other
> thread. :-)
>
Yes. Modified for the next patch.

Thank you very much!

Meng


-----------
Meng Xu
PhD Student in Computer and Information Science
University of Pennsylvania


* Re: [PATCH v1 2/4] libxc: add rt scheduler
  2014-09-05 10:34   ` Dario Faggioli
@ 2014-09-05 17:17     ` Meng Xu
  2014-09-05 17:50       ` Dario Faggioli
  0 siblings, 1 reply; 72+ messages in thread
From: Meng Xu @ 2014-09-05 17:17 UTC (permalink / raw)
  To: Dario Faggioli
  Cc: Ian Campbell, Sisu Xi, Stefano Stabellini, George Dunlap,
	Ian Jackson, xen-devel, Meng Xu, Jan Beulich, Chao Wang,
	Chong Li, Dagaen Golomb

2014-09-05 6:34 GMT-04:00 Dario Faggioli <dario.faggioli@citrix.com>:
> On dom, 2014-08-24 at 18:58 -0400, Meng Xu wrote:
>
>> --- /dev/null
>> +++ b/tools/libxc/xc_rt.c
>> @@ -0,0 +1,90 @@
>> +/****************************************************************************
>> + *
>> + *        File: xc_rt.c
>> + *      Author: Sisu Xi
>> + *              Meng Xu
>> + *
>> + * Description: XC Interface to the rt scheduler
>> + *
>> + * This library is free software; you can redistribute it and/or
>> + * modify it under the terms of the GNU Lesser General Public
>> + * License as published by the Free Software Foundation;
>> + * version 2.1 of the License.
>> + *
>> + * This library is distributed in the hope that it will be useful,
>> + * but WITHOUT ANY WARRANTY; without even the implied warranty of
>> + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
>> + * Lesser General Public License for more details.
>> + *
>> + * You should have received a copy of the GNU Lesser General Public
>> + * License along with this library; if not, write to the Free Software
>> + * Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA  02110-1301  USA
>> + */
>> +
>> +#include "xc_private.h"
>> +
>> +int xc_sched_rt_domain_set(xc_interface *xch,
>> +                           uint32_t domid,
>> +                           struct xen_domctl_sched_rt_params *sdom)
>> +{
>> +    int rc;
>> +    DECLARE_DOMCTL;
>> +
>> +    domctl.cmd = XEN_DOMCTL_scheduler_op;
>> +    domctl.domain = (domid_t) domid;
>> +    domctl.u.scheduler_op.sched_id = XEN_SCHEDULER_RT_DS;
>>
> Aha! So, I was not dreaming about the whole RT_DS thing! :-D
>
> Perhaps then had that discussion about the low level side of things
> then. Well, I think that, if the name of the scheduler is RT_DS, it
> should be that for libxl and xl as well.
>
> After all, the reason why we're calling RT_DS in Xen, is that we want to
> be able to add others RT_FOO, RT_BAR algorithm/schedulers, in future. If
> that will happen, we'll need a way to reference them from the higher
> layer of the toolstack as well. When we'll have RT_DS and RT_CBS in Xen,
> and RT in libxl, to which one the RT in libxl will refer?
>
> So, just push the RT_DS thing all the way up to libxl and xl. As we said
> for the Xen part, you can keep the source filenames _rt.c, but functions
> and defines needs to be specific.
>
> So, for instance, this file can continue being xc_rt.c, but this
> function needs to be called xc_sched_rtds_domain_set()
> (or .._rt_ds_domain_..)

As to the function name, if I change it to ..._rt_ds_domain_... in the
xc_rt.c file, do I need to change it in the libxl and xl files as well? In
addition, do I need to change the names of the rt_* functions in
xen/common/sched_rt.c in the hypervisor? This seems like a lot of (well,
easy) change, but we need some consensus to avoid changing it back
later. :-P

>
>> +    domctl.u.scheduler_op.cmd = XEN_DOMCTL_SCHEDOP_putinfo;
>> +    domctl.u.scheduler_op.u.rt.vcpu_index = sdom->index;
>> +    domctl.u.scheduler_op.u.rt.period = sdom->period;
>> +    domctl.u.scheduler_op.u.rt.budget = sdom->budget;
>> +
>> +    rc = do_domctl(xch, &domctl);
>> +
>> +    return rc;
>> +}
>> +
> These functions are going to change quite a bit in next version, due to
> the interface changes we agreed upon. It'd then be quite pointless to
> put much more comments, but, overall, the hcall wrapping, the bouncing
> logic, and everything I can see here looks fine to me.
>

I will send the next version soon.

-----------
Meng Xu
PhD Student in Computer and Information Science
University of Pennsylvania


* Re: [PATCH v1 1/4] xen: add real time scheduler rt
  2014-08-24 22:58 ` [PATCH v1 1/4] xen: add real time scheduler rt Meng Xu
                     ` (2 preceding siblings ...)
  2014-09-03 14:20   ` George Dunlap
@ 2014-09-05 17:17   ` Dario Faggioli
  2014-09-07  3:56     ` Meng Xu
  3 siblings, 1 reply; 72+ messages in thread
From: Dario Faggioli @ 2014-09-05 17:17 UTC (permalink / raw)
  To: Meng Xu
  Cc: ian.campbell, xisisu, stefano.stabellini, george.dunlap,
	ian.jackson, xen-devel, xumengpanda, JBeulich, chaowang,
	lichong659, dgolomb



On dom, 2014-08-24 at 18:58 -0400, Meng Xu wrote:
> The mechanism of how to burn a VCPU's budget depends on the server mechanism
> implemented for each VCPU.
> 
> Server mechanism: a VCPU is implemented as a deferable server.
> When a VCPU is scheduled to execute on a PCPU, its budget is continuously
> burned.
> 
Right now, we only support one 'server mechanism', i.e., the one
introduced in this very patch. I appreciate that you're trying to
highlight the fact that the budget burning and replenishment logic is
flexible and easy to change or extend with another one but, as of now, I
fear this is more confusing than anything else.

So, I'd say, just kill the first two lines above, avoid mentioning
'Server mechanism:' and go ahead and describe how the budget is burned.
It will be when we add another policy that we'll explain the analogies
and the differences; for now it's rather pointless.

> Priority scheme: Preemptive Global Earliest Deadline First (gEDF).
> At any scheduling point, the VCPU with earliest deadline has highest
> priority.
> 
Same here. Put this paragraph above, when you first mention EDF, as a
quick explanation of what EDF is (e.g., below the first line "This
scheduler follows the pre-emptive Global EDF theory in real-time
field."). Again, since for now we support only that priority scheme, no
point in hinting at other ones.

Oh, and one thing that may be good to have here, is a few words about
how the budget and period of a vcpu relate to its scheduling deadline.
In fact, as the changelog stands, one reads about a vcpu having a period
and a budget, and being scheduled according to its deadline, without any
indication of how it is given one! :-)

It's evident to you and me, because we know the algorithm very well, but
may be useful to someone looking at it for the first time and trying to
understand what's happening.
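
A minimal sketch of that relation (illustration only, using the patch's
field names but not its literal code, and assuming period/budget are kept
in the same unit NOW() returns): the deadline always sits at the end of the
vcpu's current period, and the budget is refilled whenever the deadline is
moved forward.

    static void rt_update_deadline_sketch(struct rt_vcpu *svc, s_time_t now)
    {
        if ( svc->cur_deadline > now )
            return;                           /* still inside the current period */
        do
            svc->cur_deadline += svc->period; /* deadline = end of current period */
        while ( svc->cur_deadline <= now );
        svc->cur_budget = svc->budget;        /* full budget for the new period */
    }

EDF then simply runs, among the runnable vcpus, the one with the smallest
cur_deadline.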

> Scheduling quantum: 1 ms;
> 
What does this mean? I mean, what effect does this have on the
algorithm, as it's described above? It's the granularity of budget
accounting, right? Either state it, or avoid mentioning the information
at all.

> This is still in the development phase.
> 
Ditto (in another message) about this.

> --- /dev/null
> +++ b/xen/common/sched_rt.c
> @@ -0,0 +1,1205 @@
> +/******************************************************************************
> + * Preemptive Global Earliest Deadline First  (EDF) scheduler for Xen
> + * EDF scheduling is one of most popular real-time scheduling algorithm used in
> + * embedded field.
> + *
Is it? :-)

It's true it has existed since 1973, and it's a very effective, efficient
and, despite its age, advanced one, but I'm not sure how broadly it is
adopted in practice.

Actually, that's what makes this very cool: we're going to be among the
few platforms that ship it! :-P

> +/*
> + * Design:
> + *
> + * This scheduler follows the Preemptive Global EDF theory in real-time field.
> + * Each VCPU can have a dedicated period and budget. 
> + * While scheduled, a VCPU burns its budget.
> + * A VCPU has its budget replenished at the beginning of each of its periods;
> + * The VCPU discards its unused budget at the end of each of its periods.
> + * If a VCPU runs out of budget in a period, it has to wait until next period.
> + * The mechanism of how to burn a VCPU's budget depends on the server mechanism
> + * implemented for each VCPU.
> + *
> + * Server mechanism: a VCPU is implemented as a deferable server.
> + * When a VCPU has a task running on it, its budget is continuously burned;
> + * When a VCPU has no task but with budget left, its budget is preserved.
> + *
> + * Priority scheme: Preemptive Global Earliest Deadline First (gEDF).
> + * At any scheduling point, the VCPU with earliest deadline has highest priority.
> + *
> + * Queue scheme: A global runqueue for each CPU pool. 
> + * The runqueue holds all runnable VCPUs. 
> + * VCPUs in the runqueue are divided into two parts: with and without remaining budget. 
> + * At each part, VCPUs are sorted based on EDF priority scheme.
> + *
> + * Scheduling quanta: 1 ms; but accounting the budget is in microsecond.
> + *
> + * Note: cpumask and cpupool is supported.
> + */
> +
Of course, all I said about the changelog, applies here too.

> +/*
> + * Locking:
> + * Just like credit2, a global system lock is used to protect the RunQ.
> + * The global lock is referenced by schedule_data.schedule_lock from all physical cpus.
> + *
Well, credit2 has one RunQ per socket (or so it should, bugs [that I'm
trying to fix] apart). In any case, why referencing it, just go ahead
describing your solution. :-)

> +#define RT_DEFAULT_PERIOD     (MICROSECS(10))
> +#define RT_DEFAULT_BUDGET     (MICROSECS(4))
> +
RT_DS? Maybe not so important for now, though, as it's entirely an
implementation detail.

> +/*
> + * Virtual CPU
> + */
> +struct rt_vcpu {
> +    struct list_head runq_elem; /* On the runqueue list */
> +    struct list_head sdom_elem; /* On the domain VCPU list */
> +
> +    /* Up-pointers */
> +    struct rt_dom *sdom;
> +    struct vcpu *vcpu;
> +
> +    /* VCPU parameters, in milliseconds */
> +    s_time_t period;
> +    s_time_t budget;
> +
> +    /* VCPU current infomation in nanosecond */
> +    long cur_budget;             /* current budget */
>
Perhaps we said this already, in which case, remind me why cur_budget is
a long and not an s_time_t as the other params? It's a time value...

> +/*
> + * Useful inline functions
> + */
> +static inline struct rt_private *RT_PRIV(const struct scheduler *_ops)
> +{
> +    return _ops->sched_data;
> +}
> +
> +static inline struct rt_vcpu *RT_VCPU(const struct vcpu *_vcpu)
> +{
> +    return _vcpu->sched_priv;
> +}
> +
> +static inline struct rt_dom *RT_DOM(const struct domain *_dom)
> +{
> +    return _dom->sched_priv;
> +}
> +
> +static inline struct list_head *RUNQ(const struct scheduler *_ops)
> +{
> +    return &RT_PRIV(_ops)->runq;
> +}
> +
I see what's happening, and I remember the suggestion of using static
inline-s, with which I agree. At that point, however, I'd have the
function names lower-cased.

It's probably not a big deal, and I don't think we have any official
saying about this, but it looks more consistent to me.

Oh and, also, since they're functions now, no need for the '_' in
arguments' names.

> +//#define RT_PRIV(_ops)     ((struct rt_private *)((_ops)->sched_data))
> +//#define RT_VCPU(_vcpu)    ((struct rt_vcpu *)(_vcpu)->sched_priv)
> +//#define RT_DOM(_dom)      ((struct rt_dom *)(_dom)->sched_priv)
> +//#define RUNQ(_ops)              (&RT_PRIV(_ops)->runq)
> +
These obviously need to go away. :-)

> +/*
> + * RunQueue helper functions
> + */
> +static int
> +__vcpu_on_runq(const struct rt_vcpu *svc)
> +{
> +   return !list_empty(&svc->runq_elem);
> +}
> +
> +static struct rt_vcpu *
> +__runq_elem(struct list_head *elem)
> +{
> +    return list_entry(elem, struct rt_vcpu, runq_elem);
> +}
> +
> +/*
> + * Debug related code, dump vcpu/cpu information
> + */
> +static void
> +rt_dump_vcpu(const struct rt_vcpu *svc)
> +{
> +    char cpustr[1024];
> +
> +    ASSERT(svc != NULL);
> +    /* flag vcpu */
> +    if( svc->sdom == NULL )
> +        return;
> +
> +    cpumask_scnprintf(cpustr, sizeof(cpustr), svc->vcpu->cpu_hard_affinity);
> +    printk("[%5d.%-2u] cpu %u, (%"PRId64", %"PRId64"), cur_b=%"PRId64" cur_d=%"PRId64" last_start=%"PRId64" onR=%d runnable=%d cpu_hard_affinity=%s ",
>
long line.

Also, we have PRI_stime (look in xen/include/xen/time.h).

> +            svc->vcpu->domain->domain_id,
> +            svc->vcpu->vcpu_id,
> +            svc->vcpu->processor,
> +            svc->period,
> +            svc->budget,
> +            svc->cur_budget,
> +            svc->cur_deadline,
> +            svc->last_start,
> +            __vcpu_on_runq(svc),
> +            vcpu_runnable(svc->vcpu),
> +            cpustr);
> +    memset(cpustr, 0, sizeof(cpustr));
> +    cpumask_scnprintf(cpustr, sizeof(cpustr), cpupool_scheduler_cpumask(svc->vcpu->domain->cpupool));
here too.

> +    printk("cpupool=%s\n", cpustr);
> +
> +    /* TRACE */
> +    {
> +        struct {
> +            unsigned dom:16,vcpu:16;
> +            unsigned processor;
> +            unsigned cur_budget_lo, cur_budget_hi, cur_deadline_lo, cur_deadline_hi;
> +            unsigned is_vcpu_on_runq:16,is_vcpu_runnable:16;
> +        } d;
> +        d.dom = svc->vcpu->domain->domain_id;
> +        d.vcpu = svc->vcpu->vcpu_id;
> +        d.processor = svc->vcpu->processor;
> +        d.cur_budget_lo = (unsigned) svc->cur_budget;
> +        d.cur_budget_hi = (unsigned) (svc->cur_budget >> 32);
> +        d.cur_deadline_lo = (unsigned) svc->cur_deadline;
> +        d.cur_deadline_hi = (unsigned) (svc->cur_deadline >> 32);
> +        d.is_vcpu_on_runq = __vcpu_on_runq(svc);
> +        d.is_vcpu_runnable = vcpu_runnable(svc->vcpu);
> +        trace_var(TRC_RT_VCPU_DUMP, 1,
> +                  sizeof(d),
> +                  (unsigned char *)&d);
> +    }
Not too big a deal, but is it really useful to trace vcpu dumps? How?

> +/*
> + * should not need lock here. only showing stuff 
> + */
> +static void
> +rt_dump(const struct scheduler *ops)
> +{
> +    struct list_head *iter_sdom, *iter_svc, *runq, *iter;
> +    struct rt_private *prv = RT_PRIV(ops);
> +    struct rt_vcpu *svc;
> +    unsigned int cpu = 0;
> +    unsigned int loop = 0;
> +
> +    printtime();
> +    printk("Priority Scheme: EDF\n");
> +
No value in printing this.

> +/*
> + * Insert a vcpu in the RunQ based on vcpus' deadline: 
> + * EDF schedule policy: vcpu with smaller deadline has higher priority;
> + * The vcpu svc to be inserted will be inserted just before the very first 
> + * vcpu iter_svc in the Runqueue whose deadline is equal or larger than 
> + * svc's deadline.
> + */
>
/*
 * Insert svc in the RunQ according to EDF: vcpus with smaller
 * deadlines go first.
 */

Or, if you want, something like (possibly turn into proper English :-P)
"a vcpu with a smaller deadline goes before one with an higher
deadline", but I'd avoid mentioning 'priority'.

Also, what follows that part is not adding much, and it's evident from
the code.

> +static void
> +__runq_insert(const struct scheduler *ops, struct rt_vcpu *svc)
> +{
> +    struct rt_private *prv = RT_PRIV(ops);
> +    struct list_head *runq = RUNQ(ops);
> +    struct list_head *iter;
> +    spinlock_t *schedule_lock;
> +    
> +    schedule_lock = per_cpu(schedule_data, svc->vcpu->processor).schedule_lock;
> +    ASSERT( spin_is_locked(schedule_lock) );
> +    
> +    /* Debug only */
> +    if ( __vcpu_on_runq(svc) )
> +    {
> +        rt_dump(ops);
> +    }
> +    ASSERT( !__vcpu_on_runq(svc) );
> +
The 'Debug only' comment and the subsequent if must be killed, of
course! the ASSERT is already offering all the necessary debugging
help. :-)

> +    /* svc still has budget */
> +    if ( svc->cur_budget > 0 ) 
> +    {
> +        list_for_each(iter, runq) 
> +        {
> +            struct rt_vcpu * iter_svc = __runq_elem(iter);
> +            if ( iter_svc->cur_budget == 0 ||
> +                 svc->cur_deadline <= iter_svc->cur_deadline )
> +                    break;
> +         }
> +        list_add_tail(&svc->runq_elem, iter);
> +     }
> +    else 
> +    { /* svc has no budget */
>
I'd say put the comment on its own line, but I think you can well kill
it.
> +        list_add(&svc->runq_elem, &prv->flag_vcpu->runq_elem);
>
Mmm... flag_vcpu, eh? I missed it above where it's declared, but I
really don't like the name. 'depleted_vcpus'? Or something else (sorry,
not very creative in this very moment :-))... but flag_vcpu, I found it
confusing.

> +/* 
> + * point per_cpu spinlock to the global system lock; all cpu have same global system lock 
long line.

> + */
> +static void *
> +rt_alloc_pdata(const struct scheduler *ops, int cpu)
> +{
> +    struct rt_private *prv = RT_PRIV(ops);
> +
> +    cpumask_set_cpu(cpu, &prv->cpus);
> +
> +    per_cpu(schedule_data, cpu).schedule_lock = &prv->lock;
> +
> +    printtime();
> +    printk("%s total cpus: %d", __func__, cpumask_weight(&prv->cpus));
> +    /* same as credit2, not a bogus pointer */
> +    return (void *)1;
>
Can you put in the comment the reason why it is ok to do this, without
referencing credit2?

> +static void *
> +rt_alloc_domdata(const struct scheduler *ops, struct domain *dom)
> +{
> +    unsigned long flags;
> +    struct rt_dom *sdom;
> +    struct rt_private * prv = RT_PRIV(ops);
> +
> +    printtime();
> +    printk("dom=%d\n", dom->domain_id);
> +
I don't think we want to spam the console at each allocation,
deallocation, initialization and destruction.

Put in a tracepoint, if you want this information to be available while
debugging, but please, avoid the printk.

> +    sdom = xzalloc(struct rt_dom);
> +    if ( sdom == NULL ) 
> +    {
> +        printk("%s, xzalloc failed\n", __func__);
> +        return NULL;
> +    }
> +
> +    INIT_LIST_HEAD(&sdom->vcpu);
> +    INIT_LIST_HEAD(&sdom->sdom_elem);
> +    sdom->dom = dom;
> +
> +    /* spinlock here to insert the dom */
> +    spin_lock_irqsave(&prv->lock, flags);
> +    list_add_tail(&sdom->sdom_elem, &(prv->sdom));
> +    spin_unlock_irqrestore(&prv->lock, flags);
> +
> +    return sdom;
> +}
> +
> +static void
> +rt_free_domdata(const struct scheduler *ops, void *data)
> +{
> +    unsigned long flags;
> +    struct rt_dom *sdom = data;
> +    struct rt_private *prv = RT_PRIV(ops);
> +
> +    printtime();
> +    printk("dom=%d\n", sdom->dom->domain_id);
> +
Ditto.

> +    spin_lock_irqsave(&prv->lock, flags);
> +    list_del_init(&sdom->sdom_elem);
> +    spin_unlock_irqrestore(&prv->lock, flags);
> +    xfree(data);
> +}
> +
> +static int
> +rt_dom_init(const struct scheduler *ops, struct domain *dom)
> +{
> +    struct rt_dom *sdom;
> +
> +    printtime();
> +    printk("dom=%d\n", dom->domain_id);
> +
And here.

> +    /* IDLE Domain does not link on rt_private */
> +    if ( is_idle_domain(dom) ) 
> +        return 0;
> +
> +    sdom = rt_alloc_domdata(ops, dom);
> +    if ( sdom == NULL ) 
> +    {
> +        printk("%s, failed\n", __func__);
> +        return -ENOMEM;
> +    }
> +    dom->sched_priv = sdom;
> +
> +    return 0;
> +}
> +
> +static void
> +rt_dom_destroy(const struct scheduler *ops, struct domain *dom)
> +{
> +    printtime();
> +    printk("dom=%d\n", dom->domain_id);
> +
And here too.

> +    rt_free_domdata(ops, RT_DOM(dom));
> +}
> +
> +static void *
> +rt_alloc_vdata(const struct scheduler *ops, struct vcpu *vc, void *dd)
> +{
> +    struct rt_vcpu *svc;
> +    s_time_t now = NOW();
> +    long count;
> +
> +    /* Allocate per-VCPU info */
> +    svc = xzalloc(struct rt_vcpu);
> +    if ( svc == NULL ) 
> +    {
> +        printk("%s, xzalloc failed\n", __func__);
> +        return NULL;
> +    }
> +
> +    INIT_LIST_HEAD(&svc->runq_elem);
> +    INIT_LIST_HEAD(&svc->sdom_elem);
> +    svc->flags = 0U;
> +    svc->sdom = dd;
> +    svc->vcpu = vc;
> +    svc->last_start = 0;            /* init last_start is 0 */
> +
As it is, the comment is not very useful... we can see you're
initializing it to zero. If there's something about why you do it
that you think is worth mentioning, go ahead; otherwise, kill it.

> +    svc->period = RT_DEFAULT_PERIOD;
> +    if ( !is_idle_vcpu(vc) )
> +        svc->budget = RT_DEFAULT_BUDGET;
> +
> +    count = (now/MICROSECS(svc->period)) + 1;
> +    /* sync all VCPU's start time to 0 */
> +    svc->cur_deadline += count * MICROSECS(svc->period);
> +
why "+="? What value do you expect cur_deadline to hold at this time?

Also, unless I'm missing something, or doing the math wrong, what you
want to do here is place the deadline one period ahead of NOW().

In my head, this is just:

    svc->cur_deadline = NOW() + svc->period;

What is it that fiddling with count is buying you that I don't see?

About time units, can we do the conversions, once and for all, when the
parameters are assigned to the domain/vcpu and, here inside the core of
the scheduler, only deal with quantities homogeneous with NOW() (since
this scheduler needs to compare and add to what NOW() returns a lot)?

However fast it is these days, or will become in future hardware, to do
a multiplication, I don't see the point of playing all this *1000 and/or
MICROSECS() game down here (and even less in more hot paths).

> +    svc->cur_budget = svc->budget*1000; /* counting in microseconds level */
>
Another perfect example. :-)
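
To make the suggestion concrete (a sketch only, reusing names from the
quoted code): do the conversion once, where the parameters are assigned, so
that everything the scheduler core handles is already homogeneous with
NOW().

    /* e.g. at parameter-set time: */
    svc->period = MICROSECS(op->u.rt.period);   /* stored as s_time_t, like NOW() */
    svc->budget = MICROSECS(op->u.rt.budget);

    /* ...and the hot paths need no *1000 or MICROSECS() games at all: */
    svc->cur_deadline = now + svc->period;
    svc->cur_budget   = svc->budget;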

> +    /* Debug only: dump new vcpu's info */
> +    printtime();
> +    rt_dump_vcpu(svc);
> +
As said before, this has to go, quite possibly by means of replacing it
by some tracing.

> +    return svc;
> +}
> +
> +static void
> +rt_free_vdata(const struct scheduler *ops, void *priv)
> +{
> +    struct rt_vcpu *svc = priv;
> +
> +    /* Debug only: dump freed vcpu's info */
> +    printtime();
> +    rt_dump_vcpu(svc);
>
To be ditched.

> +    xfree(svc);
> +}
> +
> +/*
> + * TODO: Do we need to add vc to the new Runqueue?
> + * This function is called in sched_move_domain() in schedule.c
> + * When move a domain to a new cpupool, 
> + * may have to add vc to the Runqueue of the new cpupool
> + */
> +static void
> +rt_vcpu_insert(const struct scheduler *ops, struct vcpu *vc)
> +{
> +    struct rt_vcpu *svc = RT_VCPU(vc);
> +
> +    /* Debug only: dump info of vcpu to insert */
> +    printtime();
> +    rt_dump_vcpu(svc);
> +
Ditch.

> +    /* not addlocate idle vcpu to dom vcpu list */
> +    if ( is_idle_vcpu(vc) )
> +        return;
> +
> +    list_add_tail(&svc->sdom_elem, &svc->sdom->vcpu);   /* add to dom vcpu list */
>
Comment above, to prevent the line from becoming too long.

> +}
> +
> +/*
> + * TODO: same as rt_vcpu_insert()
> + */
> +static void
> +rt_vcpu_remove(const struct scheduler *ops, struct vcpu *vc)
> +{
> +    struct rt_vcpu * const svc = RT_VCPU(vc);
> +    struct rt_dom * const sdom = svc->sdom;
> +
> +    printtime();
> +    rt_dump_vcpu(svc);
> +
> +    BUG_ON( sdom == NULL );
> +    BUG_ON( __vcpu_on_runq(svc) );
> +
> +    if ( !is_idle_vcpu(vc) ) 
> +        list_del_init(&svc->sdom_elem);
> +}
> +
I remember commenting about these two functions already, during the RFC
phase. It does not look like they've changed much in response to that
commenting.

Can you please re-look at
http://lists.xen.org/archives/html/xen-devel/2014-07/msg01624.html , and
either reply/explain, or cope with that? In summary, what I was asking
was whether adding/removing the vcpu to the list of vcpus of a domain
could be done somewhere or somehow else, as doing it like this means,
for some code-path, removing and then re-adding it right away.

Looking more carefully, I do see that credit2 does the same thing but,
for one, that doesn't make it perfect (I still don't like it :-P), and
for two, credit2 does a bunch of other stuff there, that you don't.

Looking even more carefully, this seems to be related to the TODO you
put (which may be a new addition wrt RFCv2, I can't remember). So, yes,
I seriously think you should take care of the fact that, when this
function is called for moving a domain from a cpupool to another, you
need to move the vcpus from the old cpupool's runqueue to the new one's
runqueue.

Have you tested doing such operation (moving domains between cpupools)?
Because, with the code looking like this, it seems a bit unlikely...

Finally, the insert_vcpu hook is called from sched_init_vcpu() too,
still in schedule.c. And in fact, all the other schedulers use this
function to put a vcpu on its runqueue for the first time. Once you do
this, you get fine behaviour, cpupool-wise, almost for free. You
apparently don't. Why is that? If you'd go for a similar approach, you
will get the same benefit. Where is it that you insert a vcpu in the
RunQ for the first time? Again, why there and not here?

> +/* 
> + * Pick a valid CPU for the vcpu vc
> + * Valid CPU of a vcpu is intesection of vcpu's affinity and available cpus
>
long line again. Please, fix all of these, even the ones I may be
missing in this review.

> + */
> +static int
> +rt_cpu_pick(const struct scheduler *ops, struct vcpu *vc)
> +{
> +    cpumask_t cpus;
> +    cpumask_t *online;
> +    int cpu;
> +    struct rt_private * prv = RT_PRIV(ops);
> +
> +    online = cpupool_scheduler_cpumask(vc->domain->cpupool);
> +    cpumask_and(&cpus, &prv->cpus, online);
> +    cpumask_and(&cpus, &cpus, vc->cpu_hard_affinity);
> +
> +    cpu = cpumask_test_cpu(vc->processor, &cpus)
> +            ? vc->processor 
> +            : cpumask_cycle(vc->processor, &cpus);
> +    ASSERT( !cpumask_empty(&cpus) && cpumask_test_cpu(cpu, &cpus) );
> +
> +    return cpu;
> +}
> +
> +/*
> + * Burn budget at microsecond level. 
> + */
> +static void
> +burn_budgets(const struct scheduler *ops, struct rt_vcpu *svc, s_time_t now) 
> +{
> +    s_time_t delta;
> +    long count = 0;
> +
> +    /* don't burn budget for idle VCPU */
> +    if ( is_idle_vcpu(svc->vcpu) ) 
> +    {
> +        return;
> +    }
> +
> +    /* first time called for this svc, update last_start */
> +    if ( svc->last_start == 0 ) 
> +    {
> +        svc->last_start = now;
> +        return;
> +    }
> +
Should not last_start be set to NOW() for the first time that the vcpu is
selected for execution, rather than the first time this function is
called on the vcpu itself?

> +    /*
> +     * update deadline info: When deadline is in the past,
> +     * it need to be updated to the deadline of the current period,
> +     * and replenish the budget 
> +     */
> +    delta = now - svc->cur_deadline;
> +    if ( delta >= 0 ) 
> +    {
> +        count = ( delta/MICROSECS(svc->period) ) + 1;
> +        svc->cur_deadline += count * MICROSECS(svc->period);
> +        svc->cur_budget = svc->budget * 1000;
> +
Ditto, about time units and conversions.

I also am sure I said before that I prefer an approach like:

    while ( svc->cur_deadline < now )
        svc->cur_deadline += svc->period;
    svc->cur_budget = svc->budget;

Not necessarily for speed reasons, but because it's a lot easier to read
and understand.

> +        /* TRACE */
> +        {
> +            struct {
> +                unsigned dom:16,vcpu:16;
> +                unsigned cur_budget_lo, cur_budget_hi;
> +            } d;
> +            d.dom = svc->vcpu->domain->domain_id;
> +            d.vcpu = svc->vcpu->vcpu_id;
> +            d.cur_budget_lo = (unsigned) svc->cur_budget;
> +            d.cur_budget_hi = (unsigned) (svc->cur_budget >> 32);
> +            trace_var(TRC_RT_BUDGET_REPLENISH, 1,
> +                      sizeof(d),
> +                      (unsigned char *) &d);
> +        }
> +
> +        return;
> +    }
> +
> +    /* burn at nanoseconds level */
>
The function's doc comment says 'microseconds'. Please, agree with
yourself. :-D :-D

> +    delta = now - svc->last_start;
> +    /* 
> +     * delta < 0 only happens in nested virtualization;
> +     * TODO: how should we handle delta < 0 in a better way? */
>
Ah, nice to hear you tested this in nested virt! What config is that,
Xen on top of VMWare (I'm assuming this from what I've seen on the
rt-xen mailing list)?

BTW, comment style: '*/' goes to newline.

> +    if ( delta < 0 ) 
> +    {
> +        printk("%s, ATTENTION: now is behind last_start! delta = %ld for ",
> +                __func__, delta);
> +        rt_dump_vcpu(svc);
> +        svc->last_start = now;  /* update last_start */
> +        svc->cur_budget = 0;   /* FIXME: should we recover like this? */
> +        return;
> +    }
> +
Bha, this just should not happen, and if it does, it's either a bug or
a hardware problem, isn't it? I don't think you should do much more
than this. I'd remove the rt_dump_vcpu() but, in this case, I'm fine
with the printk(), as it's something that really should not happen, and
we want to inform the sysadmin that something has gone very bad. :-)

Long lines, BTW.

> +    if ( svc->cur_budget == 0 ) 
> +        return;
> +
> +    svc->cur_budget -= delta;
> +    if ( svc->cur_budget < 0 ) 
> +        svc->cur_budget = 0;
> +
I see you're not dealing with overruns (where by dealing I mean trying to
compensate). That's fine for now, let's leave this as a future
development.

> +/* 
> + * RunQ is sorted. Pick first one within cpumask. If no one, return NULL
> + * lock is grabbed before calling this function 
> + */
> +static struct rt_vcpu *
> +__runq_pick(const struct scheduler *ops, cpumask_t mask)
> +{
> +    struct list_head *runq = RUNQ(ops);
> +    struct list_head *iter;
> +    struct rt_vcpu *svc = NULL;
> +    struct rt_vcpu *iter_svc = NULL;
> +    cpumask_t cpu_common;
> +    cpumask_t *online;
> +    struct rt_private * prv = RT_PRIV(ops);
> +
> +    list_for_each(iter, runq) 
> +    {
> +        iter_svc = __runq_elem(iter);
> +
> +        /* flag vcpu */
> +        if(iter_svc->sdom == NULL)
> +            break;
> +
> +        /* mask is intersection of cpu_hard_affinity and cpupool and priv->cpus */
> +        online = cpupool_scheduler_cpumask(iter_svc->vcpu->domain->cpupool);
> +        cpumask_and(&cpu_common, online, &prv->cpus);
> +        cpumask_and(&cpu_common, &cpu_common, iter_svc->vcpu->cpu_hard_affinity);
> +        cpumask_and(&cpu_common, &mask, &cpu_common);
> +        if ( cpumask_empty(&cpu_common) )
> +            continue;
> +
I remember asking this in
http://lists.xen.org/archives/html/xen-devel/2014-07/msg01624.html:

"What's in priv->cpus, BTW? How is it different form the cpupool online
mask (returned in 'online' by cpupool_scheduler_cpumask() )?"

while I don't remember having received an answer, and I see it's still
here in the code. Am I missing something? If not, can you explain?

> +/*
> + * Update vcpu's budget and sort runq by insert the modifed vcpu back to runq
> + * lock is grabbed before calling this function 
> + */
> +static void
> +__repl_update(const struct scheduler *ops, s_time_t now)
> +{
> +    struct list_head *runq = RUNQ(ops);
> +    struct list_head *iter;
> +    struct list_head *tmp;
> +    struct rt_vcpu *svc = NULL;
> +
> +    s_time_t diff;
> +    long count;
> +
> +    list_for_each_safe(iter, tmp, runq) 
> +    {
> +        svc = __runq_elem(iter);
> +
> +        /* not update flag_vcpu's budget */
> +        if(svc->sdom == NULL)
> +            continue;
> +
> +        diff = now - svc->cur_deadline;
> +        if ( diff > 0 ) 
> +        {
> +            count = (diff/MICROSECS(svc->period)) + 1;
> +            svc->cur_deadline += count * MICROSECS(svc->period);
> +            svc->cur_budget = svc->budget * 1000;
> +            __runq_remove(svc);
> +            __runq_insert(ops, svc);
> +        }
> +    }
> +}
> +
Ok. As said when reviewing the RFC for this, I see a lot of room for
optimization here (event based replenishments, using timers). However,
I'd be fine with such optimizations happening later on, e.g., if this
makes it into 4.5, in the 4.6 cycle.

All I said before about time units and convertions applies here too.
Actually, as I also already said in
http://lists.xen.org/archives/html/xen-devel/2014-07/msg01624.html ,
I think you can wrap this functionality in a helper function, and call
that from here and from all the other places that right now do the same
things.
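
A sketch of that consolidation, reusing the helper sketched earlier in this
mail (rt_update_deadline_sketch() is an assumed name, not code from the
patch):

    /* in burn_budgets(), __repl_update() and rt_vcpu_wake(), instead of
     * repeating the count/MICROSECS() arithmetic: */
    rt_update_deadline_sketch(svc, now);

__repl_update() would then still do its __runq_remove()/__runq_insert()
pair, so the RunQ stays sorted on the new deadlines.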

> +/* 
> + * schedule function for rt scheduler.
> + * The lock is already grabbed in schedule.c, no need to lock here 
> + */
> +static struct task_slice
> +rt_schedule(const struct scheduler *ops, s_time_t now, bool_t tasklet_work_scheduled)
> +{
> +    const int cpu = smp_processor_id();
> +    struct rt_private * prv = RT_PRIV(ops);
> +    struct rt_vcpu * const scurr = RT_VCPU(current);
> +    struct rt_vcpu * snext = NULL;
> +    struct task_slice ret = { .migrated = 0 };
> +
> +    /* clear ticked bit now that we've been scheduled */
                tickled

> +/*
> + * Pick a vcpu on a cpu to kick out to place the running candidate
> + * Called by wake() and context_saved()
> + * We have a running candidate here, the kick logic is:
> + * Among all the cpus that are within the cpu affinity
> + * 1) if the new->cpu is idle, kick it. This could benefit cache hit
> + * 2) if there are any idle vcpu, kick it.
> + * 3) now all pcpus are busy, among all the running vcpus, pick lowest priority one
> + *    if snext has higher priority, kick it.
> + *
> + * TODO:
> + * 1) what if these two vcpus belongs to the same domain?
> + *    replace a vcpu belonging to the same domain introduces more overhead
> + *
> + * lock is grabbed before calling this function 
> + */
> +static void
> +runq_tickle(const struct scheduler *ops, struct rt_vcpu *new)
> +{
> +    struct rt_private * prv = RT_PRIV(ops);
> +    struct rt_vcpu * latest_deadline_vcpu = NULL;    /* lowest priority scheduled */
> +    struct rt_vcpu * iter_svc;
> +    struct vcpu * iter_vc;
> +    int cpu = 0, cpu_to_tickle = 0;
> +    cpumask_t not_tickled;
> +    cpumask_t *online;
> +
> +    if ( new == NULL || is_idle_vcpu(new->vcpu) ) 
> +        return;
> +
> +    online = cpupool_scheduler_cpumask(new->vcpu->domain->cpupool);
> +    cpumask_and(&not_tickled, online, &prv->cpus);
> +    cpumask_and(&not_tickled, &not_tickled, new->vcpu->cpu_hard_affinity);
> +    cpumask_andnot(&not_tickled, &not_tickled, &prv->tickled);
> +
What I said about prv->cpus applies here too...

> +/* 
> + * Should always wake up runnable vcpu, put it back to RunQ. 
> + * Check priority to raise interrupt 
> + * The lock is already grabbed in schedule.c, no need to lock here 
> + * TODO: what if these two vcpus belongs to the same domain? 
> + */
> +static void
> +rt_vcpu_wake(const struct scheduler *ops, struct vcpu *vc)
> +{
> +    struct rt_vcpu * const svc = RT_VCPU(vc);
> +    s_time_t diff;
> +    s_time_t now = NOW();
> +    long count = 0;
> +    struct rt_private * prv = RT_PRIV(ops);
> +    struct rt_vcpu * snext = NULL;        /* highest priority on RunQ */
> +
> +    BUG_ON( is_idle_vcpu(vc) );
> +
> +    if ( unlikely(curr_on_cpu(vc->processor) == vc) ) 
> +        return;
> +
> +    /* on RunQ, just update info is ok */
> +    if ( unlikely(__vcpu_on_runq(svc)) ) 
> +        return;
> +
> +    /* If context hasn't been saved for this vcpu yet, we can't put it on
> +     * the Runqueue. Instead, we set a flag so that it will be put on the Runqueue
> +     * After the context has been saved. */
> +    if ( unlikely(test_bit(__RT_scheduled, &svc->flags)) ) 
> +    {
> +        set_bit(__RT_delayed_runq_add, &svc->flags);
> +        return;
> +    }
> +
> +    /* update deadline info */
> +    diff = now - svc->cur_deadline;
> +    if ( diff >= 0 ) 
> +    {
> +        count = ( diff/MICROSECS(svc->period) ) + 1;
> +        svc->cur_deadline += count * MICROSECS(svc->period);
> +        svc->cur_budget = svc->budget * 1000;
> +    }
> +
Time units and helper function.

> +    __runq_insert(ops, svc);
> +    __repl_update(ops, now);
> +    snext = __runq_pick(ops, prv->cpus);    /* pick snext from ALL valid cpus */
> +    runq_tickle(ops, snext);
> +
> +    return;
> +}

> --- a/xen/common/schedule.c
> +++ b/xen/common/schedule.c
> @@ -69,6 +69,7 @@ static const struct scheduler *schedulers[] = {
>      &sched_credit_def,
>      &sched_credit2_def,
>      &sched_arinc653_def,
> +    &sched_rt_def,
>
rt_ds?

So, overall, I think this is in a reasonably good state. There are some
issues, but mostly not critical stuff, easy to solve in another round or
two.

One thing I feel like asking is, this time, if you can make sure to
address all the points being raised during the review... I know it's
easy to miss/forget things, especially in long and complex pieces of
code, like in this patch, but please, try not to. :-)

Thanks and Regards,
Dario

-- 
<<This happens because I choose it to happen!>> (Raistlin Majere)
-----------------------------------------------------------------
Dario Faggioli, Ph.D, http://about.me/dario.faggioli
Senior Software Engineer, Citrix Systems R&D Ltd., Cambridge (UK)



* Re: [PATCH v1 3/4] libxl: add rt scheduler
  2014-09-05 15:45     ` Meng Xu
@ 2014-09-05 17:41       ` Dario Faggioli
  0 siblings, 0 replies; 72+ messages in thread
From: Dario Faggioli @ 2014-09-05 17:41 UTC (permalink / raw)
  To: Meng Xu
  Cc: Ian Campbell, Sisu Xi, Stefano Stabellini, George Dunlap,
	Ian Jackson, xen-devel, Meng Xu, Jan Beulich, Chao Wang,
	Chong Li, Dagaen Golomb



On ven, 2014-09-05 at 11:45 -0400, Meng Xu wrote:
> 2014-09-05 6:21 GMT-04:00 Dario Faggioli <dario.faggioli@citrix.com>:
> > On dom, 2014-08-24 at 18:58 -0400, Meng Xu wrote:
> >> --- a/tools/libxl/libxl.c
> >> +++ b/tools/libxl/libxl.c
> >> @@ -5154,6 +5154,139 @@ static int sched_sedf_domain_set(libxl__gc *gc, uint32_t domid,
> >>      return 0;
> >>  }
> >>
> >> +static int sched_rt_domain_get(libxl__gc *gc, uint32_t domid,
> >> +                               libxl_domain_sched_params *scinfo)
> >> +{
> >> +    struct xen_domctl_sched_rt_params* sdom;
> >> +    uint16_t num_vcpus;
> >> +    int rc, i;
> >> +
> >> +    rc = xc_sched_rt_domain_get_num_vcpus(CTX->xch, domid, &num_vcpus);
> >> +    if (rc != 0) {
> >> +        LOGE(ERROR, "getting num_vcpus of domain sched rt");
> >> +        return ERROR_FAIL;
> >> +    }
> >>
> > As George pointed out already, you can get the num_vcpus via
> > xc_domain_getinfo().
> >
> > I agree with George's review about the rest of this function
> > (appropriately updated to take what we decided about the interface into
> > account :-) ).
> >
> >> +#define SCHED_RT_VCPU_PERIOD_MAX    31536000000000 /* one year in microsecond*/
> >> +#define SCHED_RT_VCPU_BUDGET_MAX    SCHED_RT_VCPU_PERIOD_MAX
> >> +
> > This may be me not remembering correctly the outcome of a preceding
> > discussion... did we say we were going with _RT_ or with something like
> > _RTDS_ ?
> >
> > ISTR the latter...
> 
> So I also need to change all RT to RT_DS in the tool stack, except
> keeping the command 'xl sched-rt'? If so, I will do that.
> 
As I said in another email, the same reasons that called for renaming
inside Xen apply to the other layers too, I think.

I'm not sure about `xl sched-rt', but yes, I think that would be fine
like that. For one, xl interface/commands are not set in stone.
Moreover, when/if we have more policies, I think it could be fine to
add a new 'policy' parameter to `xl sched-rt', rather than adding
another `xl sched-xxx' thing.

But then again, we can decide this later.

> >
> > Also, macros like INT_MAX, UINT_MAX, etc., or << and ~ "tricks" are,
> > IMO, preferrable to the open coding of the value.
> 
> What does open coding of the value mean? Do you mean (2^32-1) instead
> of 4294967295?
> 
I mean use INT_MAX, UINT_MAX, or similar, if possible. :-)
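
Something like this is all I mean (a totally made-up example, just to
show the idea):

    #define FOO_MAX    UINT_MAX          /* rather than 4294967295 */
    #define BAR_LIMIT  (~(1ULL << 63))   /* rather than spelling out the decimal */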

> >
> > Finally, I wonder whether these would better live in some headers,
> > closer to the declaration of period and budget (where their type is also
> > visible) and, as a nice side effect of that, available to libxl callers
> > as well.
> 
> I actually considered the possibility of adding it to
> xen/include/public/domctl.h, but other schedulers do not have such
> range macro in the domctl.h, so I'm not sure if it will cause
> inconsistence?
> 
This is toolstack stuff, libxl stuff, to be more precise, so the public
header I'm mentioning is something like tools/libxl/libxl*.h.

> Yes. Now I didn't use any array in toolstack or kernel to set/get
> domain's parameters. So we don't use (budget == 0 && period == 0) to
> indicate the vcpus not changed. (Whenever user set a domain's vcpu's
> parameters, we set all vcpus' parameters of this domain, so we don't
> need such implication for 4.5 release. :-P) The array comes "only"
> when we allow users to set/get a specific vcpu's parameters and vcpus
> have different periods and budgets.
> 
True. I was thinking of something different, but hey, let's see the new
version with the new interface implemented and comment on that! :-)

regards,
Dario

-- 
<<This happens because I choose it to happen!>> (Raistlin Majere)
-----------------------------------------------------------------
Dario Faggioli, Ph.D, http://about.me/dario.faggioli
Senior Software Engineer, Citrix Systems R&D Ltd., Cambridge (UK)



* Re: [PATCH v1 2/4] libxc: add rt scheduler
  2014-09-05 17:17     ` Meng Xu
@ 2014-09-05 17:50       ` Dario Faggioli
  0 siblings, 0 replies; 72+ messages in thread
From: Dario Faggioli @ 2014-09-05 17:50 UTC (permalink / raw)
  To: Meng Xu
  Cc: Ian Campbell, Sisu Xi, Stefano Stabellini, George Dunlap,
	Ian Jackson, xen-devel, Meng Xu, Jan Beulich, Chao Wang,
	Chong Li, Dagaen Golomb



On ven, 2014-09-05 at 13:17 -0400, Meng Xu wrote:
> 2014-09-05 6:34 GMT-04:00 Dario Faggioli <dario.faggioli@citrix.com>:

> > So, for instance, this file can continue being xc_rt.c, but this
> > function needs to be called xc_sched_rtds_domain_set()
> > (or .._rt_ds_domain_..)
> 
> As to the function name, if I change it to ..._rt_ds_domain_... in
> xc_rt.c file, do I need to change it in libxl and xl files? In
> addition, do I need to change the name rt_* functions in
> xen/common/sched_rt.c in the hypervisor? This seems a lot of (well
> easy) change, but we need some consensus to avoid changing it back
> later. :-P
>
I see your point, and I'm actually not so sure... Would appreciate
George's and others' view here.

IIRC, we decided for RT_DS for two reasons:
 * make it more clear what kind of real-time scheduler this is, among
   all the existing real-time schedulers
 * make it possible to add more and different RT_xxx schedulers without
   having to rename this one and/or cause confusion

I think both these arguments are valid for whatever component of the Xen
architecture we are dealing with. We decided to keep the filename
sched_rt.c (and xc_rt.c, etc), because it is at least quite likely that
these other real-time scheduling policies, if ever implemented, could
all live there.

About the function names, I would apply the same approach. Most of them,
I think, should be named after the scheduler they deal with. Some of
them may be general enough to just be called bla_rt_foo(), but it's
really hard to judge now.

However, since this also is about interfaces, I'd really appreciate
seeing what others think.

Thanks and Regards,
Dario

-- 
<<This happens because I choose it to happen!>> (Raistlin Majere)
-----------------------------------------------------------------
Dario Faggioli, Ph.D, http://about.me/dario.faggioli
Senior Software Engineer, Citrix Systems R&D Ltd., Cambridge (UK)



* Re: [PATCH v1 1/4] xen: add real time scheduler rt
  2014-09-05 17:17   ` Dario Faggioli
@ 2014-09-07  3:56     ` Meng Xu
  2014-09-08 10:33       ` Dario Faggioli
  0 siblings, 1 reply; 72+ messages in thread
From: Meng Xu @ 2014-09-07  3:56 UTC (permalink / raw)
  To: Dario Faggioli
  Cc: Ian Campbell, Sisu Xi, Stefano Stabellini, George Dunlap,
	Chenyang Lu, Ian Jackson, xen-devel, Linh Thi Xuan Phan, Meng Xu,
	Jan Beulich, Chao Wang, Chong Li, Dagaen Golomb

Hi Dario,

I have modified the code based on your comments in the previous email.
Since there are a lot of points to modify, if I don't reply to one
point, it means I agree with your comments and have changed it. :-)

>
>> +/*
>> + * Useful inline functions
>> + */
>> +static inline struct rt_private *RT_PRIV(const struct scheduler *_ops)
>> +{
>> +    return _ops->sched_data;
>> +}
>> +
>> +static inline struct rt_vcpu *RT_VCPU(const struct vcpu *_vcpu)
>> +{
>> +    return _vcpu->sched_priv;
>> +}
>> +
>> +static inline struct rt_dom *RT_DOM(const struct domain *_dom)
>> +{
>> +    return _dom->sched_priv;
>> +}
>> +
>> +static inline struct list_head *RUNQ(const struct scheduler *_ops)
>> +{
>> +    return &RT_PRIV(_ops)->runq;
>> +}
>> +
> I see what's happening, and I remember the suggestion of using static
> inline-s, with which I agree. At that point, however, I'd have the
> function names lower case-ed.
>
> It's probably not a big deal, and I don't think we have any official
> saying about this, but it looks more consistent to me.

Because other schedulers use the upper-case names (because they are
macros instead of static inlines), in order to keep the style
consistent with other schedulers, I think I will keep the names in
upper-case. I can send another patch set to change all other
schedulers' macro definitions of these functions to static inlines,
and then change the names to lower-case. What do you think?

>> +/*
>> + * Debug related code, dump vcpu/cpu information
>> + */
>> +static void
>> +rt_dump_vcpu(const struct rt_vcpu *svc)
>> +{
>> +    char cpustr[1024];
>> +
>> +    ASSERT(svc != NULL);
>> +    /* flag vcpu */
>> +    if( svc->sdom == NULL )
>> +        return;
>> +
>> +    cpumask_scnprintf(cpustr, sizeof(cpustr), svc->vcpu->cpu_hard_affinity);
>> +    printk("[%5d.%-2u] cpu %u, (%"PRId64", %"PRId64"), cur_b=%"PRId64" cur_d=%"PRId64" last_start=%"PRId64" onR=%d runnable=%d cpu_hard_affinity=%s ",
>>
> long line.
>
> Also, we have PRI_stime (look in xen/include/xen/time.h).
>
>> +            svc->vcpu->domain->domain_id,
>> +            svc->vcpu->vcpu_id,
>> +            svc->vcpu->processor,
>> +            svc->period,
>> +            svc->budget,
>> +            svc->cur_budget,
>> +            svc->cur_deadline,
>> +            svc->last_start,
>> +            __vcpu_on_runq(svc),
>> +            vcpu_runnable(svc->vcpu),
>> +            cpustr);
>> +    memset(cpustr, 0, sizeof(cpustr));
>> +    cpumask_scnprintf(cpustr, sizeof(cpustr), cpupool_scheduler_cpumask(svc->vcpu->domain->cpupool));
> here too.
>
>> +    printk("cpupool=%s\n", cpustr);
>> +
>> +    /* TRACE */
>> +    {
>> +        struct {
>> +            unsigned dom:16,vcpu:16;
>> +            unsigned processor;
>> +            unsigned cur_budget_lo, cur_budget_hi, cur_deadline_lo, cur_deadline_hi;
>> +            unsigned is_vcpu_on_runq:16,is_vcpu_runnable:16;
>> +        } d;
>> +        d.dom = svc->vcpu->domain->domain_id;
>> +        d.vcpu = svc->vcpu->vcpu_id;
>> +        d.processor = svc->vcpu->processor;
>> +        d.cur_budget_lo = (unsigned) svc->cur_budget;
>> +        d.cur_budget_hi = (unsigned) (svc->cur_budget >> 32);
>> +        d.cur_deadline_lo = (unsigned) svc->cur_deadline;
>> +        d.cur_deadline_hi = (unsigned) (svc->cur_deadline >> 32);
>> +        d.is_vcpu_on_runq = __vcpu_on_runq(svc);
>> +        d.is_vcpu_runnable = vcpu_runnable(svc->vcpu);
>> +        trace_var(TRC_RT_VCPU_DUMP, 1,
>> +                  sizeof(d),
>> +                  (unsigned char *)&d);
>> +    }
> Not a too big deal, but is it really useful to trace vcpu dumps? How?

Actually, this function is called in other functions in this file.
Whenever we want to look into the details of a vcpu (for example, when
we insert a vcpu into the RunQ), we call this function and it will trace
the vcpu information. I extracted this as a function to avoid writing
this trace again and again in all of those functions, such as
rt_alloc_vdata, rt_free_vdata, burn_budgets.

In addition, we use xl sched-rt to dump the RunQ information to check
that the RunQ is properly sorted, for debugging reasons.

When it is upstreamed, I will change this function to static inline,
remove the printk in this function and leave the trace there.

>> +        list_add(&svc->runq_elem, &prv->flag_vcpu->runq_elem);
>>
> Mmm... flag_vcpu, eh? I missed it above where it's declared, but I
> really don't like the name. 'depleted_vcpus'? Or something else (sorry,
> not very creative in this very moment :-))... but flag_vcpu, I found it
> confusing.

It's declared in struct rt_private and initialized in the rt_init
function.

The reason why I need this vcpu is:
The RunQ has two parts: the first part holds the vcpus with budget and
is sorted based on the priority of the vcpus. The second part holds the
vcpus without budget and is not sorted (because we won't schedule a
depleted vcpu, we don't need to sort them). The flag_vcpu is the dummy
vcpu that separates the two parts, so that we don't have to scan the
whole RunQ when we insert a vcpu without budget into the RunQ.
(If you remember, you suggested splitting the RunQ into two queues:
one keeps the vcpus with budget and one keeps the vcpus without budget;
this is my implementation of your suggestion. It virtually splits the
RunQ. If I had a separate depleted queue, I would need to add some more
functions, like depleted_queue_insert, etc., which adds much more code
than this current implementation. :-P)
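
To make it concrete, the insertion logic is roughly like this (a
simplified sketch, not the exact code in the patch):

    /* 'runq' is prv->runq; the flag vcpu is the only element with
     * sdom == NULL */
    if ( svc->cur_budget <= 0 )
    {
        /* depleted: park it right after the flag vcpu, no sorting needed */
        list_add(&svc->runq_elem, &prv->flag_vcpu->runq_elem);
    }
    else
    {
        struct list_head *iter;

        /* with budget: walk only the first part (it ends at the flag
         * vcpu) and insert in EDF order */
        list_for_each ( iter, runq )
        {
            struct rt_vcpu *iter_svc = __runq_elem(iter);

            if ( iter_svc->sdom == NULL ||
                 svc->cur_deadline < iter_svc->cur_deadline )
                break;
        }
        list_add_tail(&svc->runq_elem, iter);
    }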

>
>> +/*
>> + * point per_cpu spinlock to the global system lock; all cpu have same global system lock
> long line.
>
>> + */
>> +static void *
>> +rt_alloc_pdata(const struct scheduler *ops, int cpu)
>> +{
>> +    struct rt_private *prv = RT_PRIV(ops);
>> +
>> +    cpumask_set_cpu(cpu, &prv->cpus);
>> +
>> +    per_cpu(schedule_data, cpu).schedule_lock = &prv->lock;
>> +
>> +    printtime();
>> +    printk("%s total cpus: %d", __func__, cpumask_weight(&prv->cpus));
>> +    /* same as credit2, not a bogus pointer */
>> +    return (void *)1;
>>
> Can you put in the comment the reason why it is ok to do this, without
> referencing credit2?

Explained in the comment in the next patch. In schedule.c, they use
the return value to check if this function correctly allocated the
pdata, by checking if it returns 1. (Well, this is inconsistent with
other code, which uses 0 to indicate success.) The function definition
needs to return a void *, so we have to cast the 1 to void *. That's
the reason. :-P

>> +static void *
>> +rt_alloc_vdata(const struct scheduler *ops, struct vcpu *vc, void *dd)
>> +{
>> +    struct rt_vcpu *svc;
>> +    s_time_t now = NOW();
>> +    long count;
>> +
>> +    /* Allocate per-VCPU info */
>> +    svc = xzalloc(struct rt_vcpu);
>> +    if ( svc == NULL )
>> +    {
>> +        printk("%s, xzalloc failed\n", __func__);
>> +        return NULL;
>> +    }
>> +
>> +    INIT_LIST_HEAD(&svc->runq_elem);
>> +    INIT_LIST_HEAD(&svc->sdom_elem);
>> +    svc->flags = 0U;
>> +    svc->sdom = dd;
>> +    svc->vcpu = vc;
>> +    svc->last_start = 0;            /* init last_start is 0 */
>> +
> As it is, the comment is not very useful... we can see you're
> initializing it to zero. If there's something about the why you do it
> that you think it's worth mentioning, go ahead, otherwise, kill it.
>
>> +    svc->period = RT_DEFAULT_PERIOD;
>> +    if ( !is_idle_vcpu(vc) )
>> +        svc->budget = RT_DEFAULT_BUDGET;
>> +
>> +    count = (now/MICROSECS(svc->period)) + 1;
>> +    /* sync all VCPU's start time to 0 */
>> +    svc->cur_deadline += count * MICROSECS(svc->period);
>> +
> why "+="? What value do you expect cur_deadline to hold at this time?

It should be =. This does not cause a bug, because it's in
rt_alloc_vdata() and svc->cur_deadline is 0 there. But "+=" is not
correct logically. I modified it. Thank you!

>
> Also, unless I'm missing something, or doing the math wrong, what you
> want to do here is place the deadline one period ahead of NOW().

Ah, no. I think you are thinking about the CBS server mechanism. (Now
I start to use some notation from the real-time academic field.) For
the deferrable server, if the release offset of an implicit-deadline
deferrable vcpu is O_i, its deadlines will be O_i + p_i * k, where k is
a natural number. So we want to set the deadline to the end of the
period in which NOW() falls.

>
> In my head, this is just:
>
>     svc->cur_deadline = NOW() + svc->period;
>
> What is it that fiddling with count is buying you that I don't see?

So this calculation is incorrect. :-P

>
>> +    /* Debug only: dump new vcpu's info */
>> +    printtime();
>> +    rt_dump_vcpu(svc);
>> +
> As said before, this has to go, quite possibly by means of replacing it
> by some tracing.

When no more other comments to this patch set, I will make
rt_dump_vcpu as a static inline function and it only has the tracing
inside. Is that OK? Now I just want to use some dump to help me with
some quick debug. :-)

>> +/*
>> + * TODO: same as rt_vcpu_insert()
>> + */
>> +static void
>> +rt_vcpu_remove(const struct scheduler *ops, struct vcpu *vc)
>> +{
>> +    struct rt_vcpu * const svc = RT_VCPU(vc);
>> +    struct rt_dom * const sdom = svc->sdom;
>> +
>> +    printtime();
>> +    rt_dump_vcpu(svc);
>> +
>> +    BUG_ON( sdom == NULL );
>> +    BUG_ON( __vcpu_on_runq(svc) );
>> +
>> +    if ( !is_idle_vcpu(vc) )
>> +        list_del_init(&svc->sdom_elem);
>> +}
>> +
> I remember commenting about these two functions already, during RFC
> phase. It does not look like they've changed much, in response to that
> commenting.
>
> Can you please re-look at
> http://lists.xen.org/archives/html/xen-devel/2014-07/msg01624.html , and
> either reply/explain, or cope with that? In summary, what I was asking
> was whether adding/removing the vcpu to the list of vcpus of a domain
> could be done somewhere or somehow else, as doing it like this means,
> for some code-path, removing and then re-adding it right away.
>
> Looking more carefully, I do see that credit2 does the same thing but,
> for one, that doesn't make it perfect (I still don't like it :-P), and
> for two, credit2 does a bunch of other stuff there, that you don't.
>
> Looking even more carefully, this seems to be related to the TODO you
> put (which may be a new addition wrt RFCv2, I can't remember). So, yes,
> I seriously think you should take care of the fact that, when this
> function is called for moving a domain from a cpupool to another, you
> need to move the vcpus from the old cpupool's runqueue to the new one's
> runqueue.
>
> Have you tested doing such operation (moving domains between cpupools)?
> Because, with the code looking like this, it seems a bit unlikely...

Hmm, I actually did test this operation. I migrated a dom from Pool-0
to the newly created cpupool 'test' with the credit scheduler. That's
also the scenario I showed in the cover page of this patch set. :-)
But I didn't try to migrate the domU back to Pool-0 and check if its
vcpus are inserted back into the RunQ.

Now I have added the code for removing the vcpu from the RunQ, and
tested that it is removed from the RunQ when I move the domU to the new
cpupool.
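
In code, the change is basically this (a sketch of the v2 direction,
not the final code):

    static void
    rt_vcpu_remove(const struct scheduler *ops, struct vcpu *vc)
    {
        struct rt_vcpu * const svc = RT_VCPU(vc);

        /* new: take the vcpu off the (old pool's) RunQ, if it is there */
        if ( __vcpu_on_runq(svc) )
            __runq_remove(svc);

        if ( !is_idle_vcpu(vc) )
            list_del_init(&svc->sdom_elem);
    }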

As to removing the vcpu from the domain's vcpu list: I'm not operating
on the vcpu list of the structure domain{}, which is used by other
components in Xen. Actually, we have a scheduler-specific private
structure rt_vcpu, which uses its field runq_elem to link the vcpu to
the RunQ of the scheduler; we also have a scheduler-specific private
structure rt_dom to record the scheduler-specific data for the domain
(it's defined in sched_rt.c). Inside rt_dom, there is a vcpu list to
record the scheduler-specific vcpus of this domain. When we move a
domU from one cpupool to another cpupool (say, with the rt scheduler),
the domU's scheduler-specific vcpus should be moved to the destination
cpupool's RunQ. So we need to remove the scheduler-specific vcpu from
the source cpupool's RunQ by using this rt_vcpu_remove() and then
insert it into the dest. cpupool's RunQ by using rt_vcpu_insert().

Does this make sense to you?

>
> Finally, the insert_vcpu hook is called from sched_init_vcpu() too,
> still in schedule.c. And in fact, all the other scheduler use this
> function to put a vcpu on its runqueue for the first time. Once you do
> this, you a fine behaviour, cpupool-wise, almost for free. You
> apparently don't. Why is that? If you'd go for a similar approach, you
> will get the same benefit. Where is it that you insert a vcpu in the
> RunQ for the first time? Again, why there and not here?

Added the runq_insert() statement in this function to add the vcpu to
the RunQ now.
In the old patch, the vcpu was inserted in the wake-up function. :-(

>> +     * update deadline info: When deadline is in the past,
>> +     * it need to be updated to the deadline of the current period,
>> +     * and replenish the budget
>> +     */
>> +    delta = now - svc->cur_deadline;
>> +    if ( delta >= 0 )
>> +    {
>> +        count = ( delta/MICROSECS(svc->period) ) + 1;
>> +        svc->cur_deadline += count * MICROSECS(svc->period);
>> +        svc->cur_budget = svc->budget * 1000;
>> +
> Ditto, about time units and conversions.
>
> I also am sure I said before that I prefer an approach like:
>
>     while ( svc->cur_deadline < now )
> >         svc->cur_deadline += svc->period;

I explained this in the previous comment. :-) NOW() could be several
periods past the current deadline. And this is a deferrable server. :-)
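
For instance (numbers made up just for illustration): with a period of
10ms, cur_deadline = 30ms and NOW() = 57ms, we get
count = (27ms / 10ms) + 1 = 3, so the new deadline is
30ms + 3 * 10ms = 60ms, i.e. the end of the period that 57ms falls in,
and the budget is replenished once (not three times).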

>> +    delta = now - svc->last_start;
>> +    /*
>> +     * delta < 0 only happens in nested virtualization;
>> +     * TODO: how should we handle delta < 0 in a better way? */
>>
> Ah, nice to hear you tested this in nested virt? Whay config is that,
> Xen on top of VMWare (I'm assuming this from what I've seen on the
> rt-xen mailing list).

I run Xen in VirtualBox on a MacBook Air. VirtualBox needs Intel VT-x
enabled in the VM's configuration. Actually, I developed the code in
VirtualBox, which can reboot much faster. After it worked well in
VirtualBox, I tested it on the bare machine.

>
>> +    if ( delta < 0 )
>> +    {
>> +        printk("%s, ATTENTION: now is behind last_start! delta = %ld for ",
>> +                __func__, delta);
>> +        rt_dump_vcpu(svc);
>> +        svc->last_start = now;  /* update last_start */
>> +        svc->cur_budget = 0;   /* FIXME: should we recover like this? */
>> +        return;
>> +    }
>> +
> Bha, this just should not happen, and if it does, it's either a bug or
> an hardware problem, isn't it? I don't think you should do much more
> than this. I'd remove the rt_dump_vcpu() but, in this case, I'm fine
> with the printk(), as it's something that really should not happen, and
> we want to inform sysadmin that something has gone very bad. :-)

When I run Xen in VirtualBox and configure 4 (virtual) cores for the
VM, these 4 cores are four threads on my host machine (i.e., the
MacBook Air). I'm guessing that, if the four virtual cores do not have
good (or timely) time synchronization, then when a vcpu migrates to
another virtual core, the current time might be behind its last_start
time.

This never happens when I run Xen on the bare machine, but it happens
when I run Xen inside VirtualBox on the MacBook Air.


>> +    if ( svc->cur_budget == 0 )
>> +        return;
>> +
>> +    svc->cur_budget -= delta;
>> +    if ( svc->cur_budget < 0 )
>> +        svc->cur_budget = 0;
>> +
> I see you're not dealing with overruns (where with dealing I mean try to
> compensate). That's fine for now, let's leave this as a future
> development.
>

Sure.

>> +/*
>> + * RunQ is sorted. Pick first one within cpumask. If no one, return NULL
>> + * lock is grabbed before calling this function
>> + */
>> +static struct rt_vcpu *
>> +__runq_pick(const struct scheduler *ops, cpumask_t mask)
>> +{
>> +    struct list_head *runq = RUNQ(ops);
>> +    struct list_head *iter;
>> +    struct rt_vcpu *svc = NULL;
>> +    struct rt_vcpu *iter_svc = NULL;
>> +    cpumask_t cpu_common;
>> +    cpumask_t *online;
>> +    struct rt_private * prv = RT_PRIV(ops);
>> +
>> +    list_for_each(iter, runq)
>> +    {
>> +        iter_svc = __runq_elem(iter);
>> +
>> +        /* flag vcpu */
>> +        if(iter_svc->sdom == NULL)
>> +            break;
>> +
>> +        /* mask is intersection of cpu_hard_affinity and cpupool and priv->cpus */
>> +        online = cpupool_scheduler_cpumask(iter_svc->vcpu->domain->cpupool);
>> +        cpumask_and(&cpu_common, online, &prv->cpus);
>> +        cpumask_and(&cpu_common, &cpu_common, iter_svc->vcpu->cpu_hard_affinity);
>> +        cpumask_and(&cpu_common, &mask, &cpu_common);
>> +        if ( cpumask_empty(&cpu_common) )
>> +            continue;
>> +
> I remember asking this in
> http://lists.xen.org/archives/html/xen-devel/2014-07/msg01624.html:
>
> "What's in priv->cpus, BTW? How is it different form the cpupool online
> mask (returned in 'online' by cpupool_scheduler_cpumask() )?"
>
> while I don't remember having received an answer, and I see it's still.
> here in the code. Am I missing something? If not, can you explain?

prv->cpus is the set of online cpus for the scheduler. It should have
the same value as cpupool_scheduler_cpumask().

Should I remove this? (I plan to release the next version this weekend;
I will remove this once you confirm. :-))


Thank you very much for your time and review!
Hope I have addressed all of them, except the ones I asked about in
this email. (I think I solved all of them now. :-P )

Thanks,

Meng

-----------
Meng Xu
PhD Student in Computer and Information Science
University of Pennsylvania


* Re: [PATCH v1 1/4] xen: add real time scheduler rt
  2014-09-07  3:56     ` Meng Xu
@ 2014-09-08 10:33       ` Dario Faggioli
  2014-09-09 13:43         ` Meng Xu
  0 siblings, 1 reply; 72+ messages in thread
From: Dario Faggioli @ 2014-09-08 10:33 UTC (permalink / raw)
  To: Meng Xu
  Cc: Ian Campbell, Sisu Xi, Stefano Stabellini, George Dunlap,
	Chenyang Lu, Ian Jackson, xen-devel, Linh Thi Xuan Phan, Meng Xu,
	Jan Beulich, Chao Wang, Chong Li, Dagaen Golomb



On sab, 2014-09-06 at 23:56 -0400, Meng Xu wrote:
> Hi Dario,
> 
> I have modified the code based on your comments in the previous email.
>
Ok. :-)

Let me reply to some of your replies, then I'll go looking at the new
version of the series.

> >> +/*
> >> + * Useful inline functions
> >> + */
> >> +static inline struct rt_private *RT_PRIV(const struct scheduler *_ops)
> >> +{
> >> +    return _ops->sched_data;
> >> +}
> >> +
> >> +static inline struct rt_vcpu *RT_VCPU(const struct vcpu *_vcpu)
> >> +{
> >> +    return _vcpu->sched_priv;
> >> +}
> >> +
> >> +static inline struct rt_dom *RT_DOM(const struct domain *_dom)
> >> +{
> >> +    return _dom->sched_priv;
> >> +}
> >> +
> >> +static inline struct list_head *RUNQ(const struct scheduler *_ops)
> >> +{
> >> +    return &RT_PRIV(_ops)->runq;
> >> +}
> >> +
> > I see what's happening, and I remember the suggestion of using static
> > inline-s, with which I agree. At that point, however, I'd have the
> > function names lower case-ed.
> >
> > It's probably not a big deal, and I don't think we have any official
> > saying about this, but it looks more consistent to me.
> 
> Because other schedulers uses the upper-case name, (because they are
> macros instead of static inline,) in order to keep the style
> consistent with other schedulers, I think I will keep the name in
> upper-case. 
>
All upper case is to stress the fact that these are macros. I find that
making the reader aware of something like this (i.e., whether he's
dealing with a macro or a function) is much more important than
consistency between different and unrelated source files.

I recommend lower casing these, although, I guess it's George's, and
maybe Jan's, call to actually decide.

George?

> I can send another patch set to change all other
> schedulers'  macro definition of these functions to static inline, and
> then change the name to lower-case. What do you think?
> 
No need to do that.

> >> +    printk("cpupool=%s\n", cpustr);
> >> +
> >> +    /* TRACE */
> >> +    {
> >> +        struct {
> >> +            unsigned dom:16,vcpu:16;
> >> +            unsigned processor;
> >> +            unsigned cur_budget_lo, cur_budget_hi, cur_deadline_lo, cur_deadline_hi;
> >> +            unsigned is_vcpu_on_runq:16,is_vcpu_runnable:16;
> >> +        } d;
> >> +        d.dom = svc->vcpu->domain->domain_id;
> >> +        d.vcpu = svc->vcpu->vcpu_id;
> >> +        d.processor = svc->vcpu->processor;
> >> +        d.cur_budget_lo = (unsigned) svc->cur_budget;
> >> +        d.cur_budget_hi = (unsigned) (svc->cur_budget >> 32);
> >> +        d.cur_deadline_lo = (unsigned) svc->cur_deadline;
> >> +        d.cur_deadline_hi = (unsigned) (svc->cur_deadline >> 32);
> >> +        d.is_vcpu_on_runq = __vcpu_on_runq(svc);
> >> +        d.is_vcpu_runnable = vcpu_runnable(svc->vcpu);
> >> +        trace_var(TRC_RT_VCPU_DUMP, 1,
> >> +                  sizeof(d),
> >> +                  (unsigned char *)&d);
> >> +    }
> > Not a too big deal, but is it really useful to trace vcpu dumps? How?
> 
> Actually, this function is called in other functions in this file.
>
Yeah, I saw that. And IIRC, I asked to remove pretty much all of the
calls of this function from around the file, except the one from the
handling of the debug key, which is actually meant for dumping this info.

> Whenever we want to look into the details of a vcpu (for example, when
> we insert a vcpu to the RunQ), we call this function and it will trace
> the vcpu information. 
>
Which does not make sense. It does make sense (I still find it very
'chatty', though) for development and debugging purposes, but there
should not be anything like this in an upstreamable patch series.

Also, you're filling up the trace with TRC_RT_VCPU_DUMP events, while
what you're doing is not actually that, but rather allocating a new
vcpu, deallocating one, etc.

If you need to see these events, add specific tracing events, like
you're doing already for a bunch of other ones. You've done the right
thing (according to me, at least), introducing those tracing events and
tracing points. Broaden it, if you need, instead of "abusing" dumping
vcpus. :-)

> I extract this as a function to avoid writing
> this trace again and again in all of those functions, such as
> rt_alloc_vdata, rt_free_vdata, burn_budgets.
> 
Ditto. BTW, burning budget seems to me to deserve its own trace point,
as I am suggesting above, rather than using this function. Doing the
same everywhere you think it's needed is TRT(^TM).
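
E.g., for budget burning, something like this (just a sketch, mirroring
the TRACE block you already have, with a hypothetical TRC_RT_BUDGET_BURN
event) would do:

    /* TRACE */
    {
        struct {
            unsigned dom:16, vcpu:16;
            unsigned cur_budget_lo, cur_budget_hi;
            int delta;
        } d;
        d.dom = svc->vcpu->domain->domain_id;
        d.vcpu = svc->vcpu->vcpu_id;
        d.cur_budget_lo = (unsigned) svc->cur_budget;
        d.cur_budget_hi = (unsigned) (svc->cur_budget >> 32);
        d.delta = delta;
        trace_var(TRC_RT_BUDGET_BURN, 1, sizeof(d), (unsigned char *)&d);
    }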

> In addition, we use xl sched-rt to dump the RunQ information to check
> the RunQ is properly sorted for debug reason.
> 
> When it is upstreamed, I will change this function to static inline,
> remove the printf in this function and leave the trace there.
> 
This makes even less sense! :-O

Dumping info via printk() is a valid means of gathering debug info,
supported by all the other schedulers, and by other Xen subsystems as
well. That happens when a debug key is sent to the Xen console, and
that absolutely needs to stay.

What you need to do is not removing the printk. It's rather removing
the trace point from within here, and avoiding calling this function
from anywhere other than the debug key handling.

Also, from a methodology perspective, there is no "when it's upstream I
will change ...". What you send to xen-devel should always be the code
that, if properly acked, gets committed upstream as it is, without any
further modification, either right before or right after that.

If you need some more aggressive debugging for your day-to-day
development (which is fine, I often do need something like that :-D),
what you can do is have it in a separate patch, at the bottom of the
series. Then, when sending the patches in, you just do not include that
one, and you're done. This is how I do it, at least, and it works for
me. Both plain git and tools like stgit or quilt/guilt make it very
easy to deal with this workflow.


> >> +        list_add(&svc->runq_elem, &prv->flag_vcpu->runq_elem);
> >>
> > Mmm... flag_vcpu, eh? I missed it above where it's declared, but I
> > really don't like the name. 'depleted_vcpus'? Or something else (sorry,
> > not very creative in this very moment :-))... but flag_vcpu, I found it
> > confusing.
> 
> It's declared in struct rt_private. It's inited in rt_init function.
> 
> The reason why I need this vcpu is:
> The RunQ has two parts, the first part is the vcpus with budget and
> the first part is sorted based on priority of vcpus. The second part
> [...]
>
Wow, wo, woow... hold your horses! :-P :-P

I saw where it lives, and I understood what it's for. All I was asking
was to change the name of the variable. :-)

> >> + */
> >> +static void *
> >> +rt_alloc_pdata(const struct scheduler *ops, int cpu)
> >> +{
> >> +    struct rt_private *prv = RT_PRIV(ops);
> >> +
> >> +    cpumask_set_cpu(cpu, &prv->cpus);
> >> +
> >> +    per_cpu(schedule_data, cpu).schedule_lock = &prv->lock;
> >> +
> >> +    printtime();
> >> +    printk("%s total cpus: %d", __func__, cpumask_weight(&prv->cpus));
> >> +    /* same as credit2, not a bogus pointer */
> >> +    return (void *)1;
> >>
> > Can you put in the comment the reason why it is ok to do this, without
> > referencing credit2?
> 
> Explained in the comment in next patch. In schedule.c, they use the
> return value to check if this function correctly allocate pdata by
> checking if it returns 1. (Well, this is inconsistent with other code
> which uses 0 to indicate success.) The function definition needs
> return a void *, so we have to cast the 1 to void *. That's the
> reason. :-P
> 
Same here. I do see the reason. My point was: "Either state the reason
in the comment, or remove it." The purpose of a comment like this should
be to explain _why_ something is done in a certain way. Now, about this
code, either one sees (or is able to quickly find out) the reason for
the (void*)1 by herself, in which case the comment is useless anyway.
OTOH, if one needs help in understanding that, you're not helping him
much by saying <<dude, this is fine, credit2 does the same thing!>>,
i.e., the comment could have been useful in this case, but it's useless
again.

So, if I had to choose between a useless comment and no comment, I'd
go for the latter. And that's what I'm saying: make it a useful comment
for someone different than you (or even for you, e.g., in 5 years from
now :-P), or kill it. :-)

> >> +    svc->period = RT_DEFAULT_PERIOD;
> >> +    if ( !is_idle_vcpu(vc) )
> >> +        svc->budget = RT_DEFAULT_BUDGET;
> >> +
> >> +    count = (now/MICROSECS(svc->period)) + 1;
> >> +    /* sync all VCPU's start time to 0 */
> >> +    svc->cur_deadline += count * MICROSECS(svc->period);
> >> +
> > why "+="? What value do you expect cur_deadline to hold at this time?
> 
> It should be =. This does not cause a bug because it's in the
> rt_alloc_vdata() and the svc->cur_deadline is 0.  But "+=" is not
> correct in logic. I modified it. Thank you!
> 
I appreciate it does not harm, but it's harder to read, so thanks for
changing it.

> >
> > Also, unless I'm missing something, or doing the math wrong, what you
> > want to do here is place the deadline one period ahead of NOW().
> 
> Ah, no. I think you are thinking about the CBS server mechanisms. (Now
> I start to use some notations in real-time academic field) For the
> deferrable server, if the release offset of a implicit-deadline
> deferrable vcpu is O_i, its deadline will be O_i + p_i * k, where k is
> a natural number. So we want to set deadline to the end of the period
> during which NOW() is fall in.
> 
Mmm... ok, yes, I have to admit I probably was not considering that. So,
for the first time you're setting the deadline, which is what's
happening here, I don't think I see a quick way to avoid the div+1 to
emulate the ceiling.

It probably can still be avoided in other places, though. In fact, in
those cases, you're advancing the deadline (perhaps more than one time)
from a previous one, and that should manage to get you to the end of
the right period, shouldn't it? Anyway, I guess I'll comment more (if
I find it necessary) about this directly on v2.

I'm still convinced about the time conversion part of what I said about
this code, though.
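
IOW, what I keep suggesting is something like this (just a sketch of the
idea, with hypothetical variable names): convert once, at the point where
the parameters enter the scheduler, e.g.

    /* wherever period/budget get set, in whatever unit they arrive */
    svc->period = MICROSECS(period_in_us);
    svc->budget = MICROSECS(budget_in_us);

so that the hot paths can then simply do

    svc->cur_deadline += svc->period;
    svc->cur_budget = svc->budget;

without any MICROSECS() or '* 1000' scattered around them.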

> >> +    /* Debug only: dump new vcpu's info */
> >> +    printtime();
> >> +    rt_dump_vcpu(svc);
> >> +
> > As said before, this has to go, quite possibly by means of replacing it
> > by some tracing.
> 
> When no more other comments to this patch set, I will make
> rt_dump_vcpu as a static inline function and it only has the tracing
> inside. Is that OK? Now I just want to use some dump to help me with
> some quick debug. :-)
> 
Please don't. Do ditch all the occurrences of this, and add tracing if
required.

Personally, I think that in most of the places where you are now calling
rt_dump_vcpu(), you don't even need to add tracing. Perhaps do what I
said above about keeping a private debug patch. In this spot here, I
wouldn't see it bad to have a trace point, and one with its own specific
event and data, as setting the first deadline is quite important an
event to see in a trace, if you what to understand how things works/are
going.

> > Can you please re-look at
> > http://lists.xen.org/archives/html/xen-devel/2014-07/msg01624.html , and
> > either reply/explain, or cope with that? In summary, what I was asking
> > was whether adding/removing the vcpu to the list of vcpus of a domain
> > could be done somewhere or somehow else, as doing it like this means,
> > for some code-path, removing and then re-adding it right away.
> >
> > Looking more carefully, I do see that credit2 does the same thing but,
> > for one, that doesn't make it perfect (I still don't like it :-P), and
> > for two, credit2 does a bunch of other stuff there, that you don't.
> >
> > Looking even more carefully, this seems to be related to the TODO you
> > put (which may be a new addition wrt RFCv2, I can't remember). So, yes,
> > I seriously think you should take care of the fact that, when this
> > function is called for moving a domain from a cpupool to another, you
> > need to move the vcpus from the old cpupool's runqueue to the new one's
> > runqueue.
> >
> > Have you tested doing such operation (moving domains between cpupools)?
> > Because, with the code looking like this, it seems a bit unlikely...
> 
> Hmm, I actually did test this operations. I migrate a dom from Pool-o
> to the newly created cpupool test with credit scheduler. That's also
> the scenario I showed in the cover page of this patch set. :-)
>
Yes, I saw it. TBF, you only have part of what allows one to tell
whether things are properly working after the migration; e.g., the
output of `xl vcpu-list', to see where the vcpus are actually running,
is missing.

In any case, working or not, I think the right thing to do is having
some runq management bits in these functions.

> Now I add the code of removing the vcpu from the RunQ, and tested it
> was removed from the RunQ when I move the domU to the new cpupool.
> 
Perfect, thanks. Let's comment and try this then.

> As to remove the vcpu from the domain's vcpu list. I'm not operating
> the vcpu list of the structure domain{}, which are used by other
> components in Xen. 
>
I know.

> Actually, we have a scheduler-specific private
> structure rt_vcpu which uses its field runq_elem to link the vcpu to
> the RunQ of the scheduler; 
>
I know.

> We also have a scheduler-specific private
> structure rt_dom to record the scheduler-specific data for the domain
> (it's defined in sched_rt.c). 
>
I know.

> Inside the rt_dom, it has a vcpu list to
> record the scheduler-specific vcpus of this domain. 
>
I know.

> When we move a
> domU from a cpupool to another cpupool (say, with rt scheduler), the
> domU's scheduler-specific vcpu should be moved to the destination
> cpupool's RunQ. So we need to remove the scheduler-specific vcpu from
> the source cpupool's RunQ by using this rt_vcpu_remove() and then
> insert it to the dest. cpupool's RunQ by using the rt_vcpu_insert().
> 
I know. And I think this is exactly what you are *NOT* doing in v1.
Since you often look at what credit2 does, you can see it includes a
call to runq_assign(), the equivalent of which is what you're missing.

> > Finally, the insert_vcpu hook is called from sched_init_vcpu() too,
> > still in schedule.c. And in fact, all the other scheduler use this
> > function to put a vcpu on its runqueue for the first time. Once you do
> > this, you a fine behaviour, cpupool-wise, almost for free. You
> > apparently don't. Why is that? If you'd go for a similar approach, you
> > will get the same benefit. Where is it that you insert a vcpu in the
> > RunQ for the first time? Again, why there and not here?
> 
> Added the runq_insert() statement in this function to add the vcpu to
> the RunQ now.
> In the old patch, the vcpu is inserted in the wake-up function. :-(
> 
Right, which may be what makes things (at least partially) work when
cpupools are involved as well.

> >> +     * update deadline info: When deadline is in the past,
> >> +     * it need to be updated to the deadline of the current period,
> >> +     * and replenish the budget
> >> +     */
> >> +    delta = now - svc->cur_deadline;
> >> +    if ( delta >= 0 )
> >> +    {
> >> +        count = ( delta/MICROSECS(svc->period) ) + 1;
> >> +        svc->cur_deadline += count * MICROSECS(svc->period);
> >> +        svc->cur_budget = svc->budget * 1000;
> >> +
> > Ditto, about time units and conversions.
> >
> > I also am sure I said before that I prefer an approach like:
> >
> >     while ( svc->cur_deadline < now )
> >         svc->cur_deadline += svc->period;
> 
> I explained in the previous comment. :-) Now could be  several periods
> late after the current deadline. And this is deferrable server. :-)
> 
Are you sure this is not ok this time? As I said, I agree the count
thing is right when assigning the first deadline. However:
 1) I know it can be several periods away, that's the purpose of the 
    while()
 2) advancing in steps of period, starting from the last set deadline, 
    should get you to the end of the right period.

Or am I missing something else? :-)
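
E.g., picking some numbers (made up, just for the sake of the example):
with a period of 10 and the old deadline at 30, if now is 57 the while()
walks 30 -> 40 -> 50 -> 60, and the count-based formula gives
30 + (27/10 + 1) * 10 = 60 as well. So they land on the same deadline;
the while() just trades the division for (usually very few) iterations.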

> >> +    if ( delta < 0 )
> >> +    {
> >> +        printk("%s, ATTENTION: now is behind last_start! delta = %ld for ",
> >> +                __func__, delta);
> >> +        rt_dump_vcpu(svc);
> >> +        svc->last_start = now;  /* update last_start */
> >> +        svc->cur_budget = 0;   /* FIXME: should we recover like this? */
> >> +        return;
> >> +    }
> >> +
> > Bha, this just should not happen, and if it does, it's either a bug or
> > an hardware problem, isn't it? I don't think you should do much more
> > than this. I'd remove the rt_dump_vcpu() but, in this case, I'm fine
> > with the printk(), as it's something that really should not happen, and
> > we want to inform sysadmin that something has gone very bad. :-)
> 
> When I run Xen in virtualBox and configure 4 (virtual) cores to the VM
> of the virtualBox, these 4 cores are four threads for my host machine
> (i.e., Macbookair). I'm guessing if the four virtual cores do not have
> a good (or timely) time synchronization, when a vcpu migrates to
> another virtual core, the time might be late after its last_start
> time.
> 
That's likely what happens, yes.

> > "What's in priv->cpus, BTW? How is it different form the cpupool online
> > mask (returned in 'online' by cpupool_scheduler_cpumask() )?"
> >
> > while I don't remember having received an answer, and I see it's still.
> > here in the code. Am I missing something? If not, can you explain?
> 
> prv->cpus is the online cpus for the scheduler.  It should have the
> same value with the cpupool_scheduler_cpumask().
> 
Should?

> Should I remove this? (I plan to release the next version this
> weekend, I will remove this after receiving the  command. :-))
> 
Well, you tell me. :-) Is there a particular reason why you're keeping
the same information in two places? If there's one, explain it to us. If
there isn't, well... :-)

> Thank you very much for your time and review!
> Hope I have solved all of them, except the ones I asked in the email.
> (I think I solved all of them now. :-P )
> 
Thanks to you for taking care of them.

I'll have a look at v2 and let you know.

Regards,
Dario

-- 
<<This happens because I choose it to happen!>> (Raistlin Majere)
-----------------------------------------------------------------
Dario Faggioli, Ph.D, http://about.me/dario.faggioli
Senior Software Engineer, Citrix Systems R&D Ltd., Cambridge (UK)



* Re: [PATCH v1 1/4] xen: add real time scheduler rt
  2014-09-08 10:33       ` Dario Faggioli
@ 2014-09-09 13:43         ` Meng Xu
  0 siblings, 0 replies; 72+ messages in thread
From: Meng Xu @ 2014-09-09 13:43 UTC (permalink / raw)
  To: Dario Faggioli
  Cc: Ian Campbell, Sisu Xi, Stefano Stabellini, George Dunlap,
	Chenyang Lu, Ian Jackson, xen-devel, Linh Thi Xuan Phan, Meng Xu,
	Jan Beulich, Chao Wang, Chong Li, Dagaen Golomb

2014-09-08 6:33 GMT-04:00 Dario Faggioli <dario.faggioli@citrix.com>:
> On sab, 2014-09-06 at 23:56 -0400, Meng Xu wrote:
>> Hi Dario,
>>
>> I have modified the code based on your comments in the previous email.
>>
> Ok. :-)
>
> Let me reply to some of your replies, then I'll go looking at the new
> version of the series.
>
>> >> +/*
>> >> + * Useful inline functions
>> >> + */
>> >> +static inline struct rt_private *RT_PRIV(const struct scheduler *_ops)
>> >> +{
>> >> +    return _ops->sched_data;
>> >> +}
>> >> +
>> >> +static inline struct rt_vcpu *RT_VCPU(const struct vcpu *_vcpu)
>> >> +{
>> >> +    return _vcpu->sched_priv;
>> >> +}
>> >> +
>> >> +static inline struct rt_dom *RT_DOM(const struct domain *_dom)
>> >> +{
>> >> +    return _dom->sched_priv;
>> >> +}
>> >> +
>> >> +static inline struct list_head *RUNQ(const struct scheduler *_ops)
>> >> +{
>> >> +    return &RT_PRIV(_ops)->runq;
>> >> +}
>> >> +
>> > I see what's happening, and I remember the suggestion of using static
>> > inline-s, with which I agree. At that point, however, I'd have the
>> > function names lower case-ed.
>> >
>> > It's probably not a big deal, and I don't think we have any official
>> > saying about this, but it looks more consistent to me.
>>
>> Because other schedulers uses the upper-case name, (because they are
>> macros instead of static inline,) in order to keep the style
>> consistent with other schedulers, I think I will keep the name in
>> upper-case.
>>
> All upper case is to stress the fact that these are macro. I find that
> making the reader aware of something like this (i.e., whether he's
> dealing with a macro or a function), is much more important than
> consistency between different and unrelated source files.
>
> I recommend lower casing these, although, I guess it's George's, and
> maybe Jan's, call to actually decide.
>
> George?

I saw George's reply. Now I will change them to lower-case.
>
>> I can send another patch set to change all other
>> schedulers'  macro definition of these functions to static inline, and
>> then change the name to lower-case. What do you think?
>>
> No need to do that.

Roger.

>
>> >> +    printk("cpupool=%s\n", cpustr);
>> >> +
>> >> +    /* TRACE */
>> >> +    {
>> >> +        struct {
>> >> +            unsigned dom:16,vcpu:16;
>> >> +            unsigned processor;
>> >> +            unsigned cur_budget_lo, cur_budget_hi, cur_deadline_lo, cur_deadline_hi;
>> >> +            unsigned is_vcpu_on_runq:16,is_vcpu_runnable:16;
>> >> +        } d;
>> >> +        d.dom = svc->vcpu->domain->domain_id;
>> >> +        d.vcpu = svc->vcpu->vcpu_id;
>> >> +        d.processor = svc->vcpu->processor;
>> >> +        d.cur_budget_lo = (unsigned) svc->cur_budget;
>> >> +        d.cur_budget_hi = (unsigned) (svc->cur_budget >> 32);
>> >> +        d.cur_deadline_lo = (unsigned) svc->cur_deadline;
>> >> +        d.cur_deadline_hi = (unsigned) (svc->cur_deadline >> 32);
>> >> +        d.is_vcpu_on_runq = __vcpu_on_runq(svc);
>> >> +        d.is_vcpu_runnable = vcpu_runnable(svc->vcpu);
>> >> +        trace_var(TRC_RT_VCPU_DUMP, 1,
>> >> +                  sizeof(d),
>> >> +                  (unsigned char *)&d);
>> >> +    }
>> > Not a too big deal, but is it really useful to trace vcpu dumps? How?
>>
>> Actually, this function is called in other functions in this file.
>>
> Yeah, I saw that. And IIRC, I asked to remove pretty much all of the
> calls of this function from around the file, except then one from the
> handling of the debug key that is actually meant at dumping these info.

I see. Then I will remove them.

>
>> Whenever we want to look into the details of a vcpu (for example, when
>> we insert a vcpu to the RunQ), we call this function and it will trace
>> the vcpu information.
>>
> Which does not make sense. It does (I still find it very 'chatty',
> though), for developing and debugging reasons, but there should not be
> anything like this in an upstreamable patch series.
>
> Also, you're filling up the trace of TRC_RT_VCPU_DUMP events while what
> you're doing is not actually that, but rather, allocating a new vcpu,
> deallocating, etc.
>
> If you need to see these events, add specific tracing events, like
> you're doing already for a bunch of other ones. You've done the right
> thing (according to me, at least), introducing those tracing events and
> tracing points. Broaden it, if you need, instead of "abusing" dumping
> vcpus. :-)

I see. Then I will remove these dumps and add more trace events.

>
>> I extract this as a function to avoid writing
>> this trace again and again in all of those functions, such as
>> rt_alloc_vdata, rt_free_vdata, burn_budgets.
>>
> Ditto. BTW, burning budget seems to me to have its own trace point, as I
> am suggesting above, and not using this function. Doing the same
> everywhere you think you need is TRT(^TM).

OK. Now I understand and will remove those dumps.

>
>> In addition, we use xl sched-rt to dump the RunQ information to check
>> the RunQ is properly sorted for debug reason.
>>
>> When it is upstreamed, I will change this function to static inline,
>> remove the printf in this function and leave the trace there.
>>
> This makes even less sense! :-O
>
> Dumping info via printk() is a valid mean of gathering debug info,
> supported by all other schedulers, and by other Xen subsystem as well.
> That happens whey a debug key is sent to the Xen console, and that
> absolutely needs to stay.
>
> What you need to do is not removing the printk. It's rather removing the
> trace point from within here, and avoiding calling this function from
> other place than debug key handling.
>
> Also, from a methodology perspective, there is no "when upstream I will
> change ...". What you send to xen-devel should always be the code, that,
> if properly acked, gets committed upstream as it is, without any further
> modification, either right before or right after that.

Thank you very much for correcting my incorrect perspective. :-)

>
> If you need some more aggressive debugging for your day-to-day
> development (which is fine, I often do need something like that :-D),
> what you can do is have it in a separate patch, at the bottom of the
> series. Then, when sending the patches in, you just do not include that
> one, and you're done. This is how I do it, at least, and it works for
> me. Both plain git and tools like stgit or quilt/guilt makes it very
> easy to deal with this workflow.

This is a very good approach! I will do that!

>
>
>> >> +        list_add(&svc->runq_elem, &prv->flag_vcpu->runq_elem);
>> >>
>> > Mmm... flag_vcpu, eh? I missed it above where it's declared, but I
>> > really don't like the name. 'depleted_vcpus'? Or something else (sorry,
>> > not very creative in this very moment :-))... but flag_vcpu, I found it
>> > confusing.
>>
>> It's declared in struct rt_private. It's inited in rt_init function.
>>
>> The reason why I need this vcpu is:
>> The RunQ has two parts, the first part is the vcpus with budget and
>> the first part is sorted based on priority of vcpus. The second part
>> [...]
>>
> Wow, wo, woow... hold your horses! :-P :-P
>
> I saw where it lives, and I understood what's it for. All I was asking
> was to change the name of the variable. :-)

Haha. :-P

I also saw George's comment on this. It seems that both you and George
have the idea of using two separate queues instead of one RunQ with a
flag vcpu. Using two separate queues is easier for developers to
understand. So I will change it to use two separate queues and add
some simple helper functions.
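
Something like this is what I'm thinking of (just a sketch; the names
are placeholders and will follow the lower-case convention we agreed
on):

    /* in struct rt_private: a second list, only for depleted vcpus */
    struct list_head depletedq;

    static inline struct list_head *rt_depletedq(const struct scheduler *ops)
    {
        return &rt_priv(ops)->depletedq;
    }

    /* no sorting needed here: depleted vcpus are never picked to run,
     * they are only scanned for replenishment */
    static void
    __depletedq_insert(const struct scheduler *ops, struct rt_vcpu *svc)
    {
        list_add_tail(&svc->runq_elem, rt_depletedq(ops));
    }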

>
>> >> + */
>> >> +static void *
>> >> +rt_alloc_pdata(const struct scheduler *ops, int cpu)
>> >> +{
>> >> +    struct rt_private *prv = RT_PRIV(ops);
>> >> +
>> >> +    cpumask_set_cpu(cpu, &prv->cpus);
>> >> +
>> >> +    per_cpu(schedule_data, cpu).schedule_lock = &prv->lock;
>> >> +
>> >> +    printtime();
>> >> +    printk("%s total cpus: %d", __func__, cpumask_weight(&prv->cpus));
>> >> +    /* same as credit2, not a bogus pointer */
>> >> +    return (void *)1;
>> >>
>> > Can you put in the comment the reason why it is ok to do this, without
>> > referencing credit2?
>>
>> Explained in the comment in the next patch. In schedule.c, they use the
>> return value to check whether this function correctly allocated pdata by
>> checking if it returns 1. (Well, this is inconsistent with other code
>> which uses 0 to indicate success.) The function definition needs to
>> return a void *, so we have to cast the 1 to void *. That's the
>> reason. :-P
>>
> Same here. I do see the reason. My point was: "Either state the reason
> in the comment, or remove it." The purpose of a comment like this should
> be to explain _why_ something is done in a certain way. Now, about this
> code, either one sees (or is able to quickly find out) the reason for
> the (void*)1, by herself, in which case the comment is useless anyway.
> OTOH, if one needs help in understanding that, you're not helping him
> much by saying <<dude, this is fine, credit2 does the same thing!>>,
> i.e., the comment could have been useful in this case, but it's useless
> again.
>
> So, if I had to choose between a useless comment and no comment, I'd
> go for the latter. And that's what I'm saying: make it a useful comment
> for someone other than you (or even for you, e.g., in 5 years from
> now :-P), or kill it. :-)

I see the point! This is a very useful guideline for adding comments! :-)
In the version 2 patch set I sent, I added the comment
/* 1 indicates alloc. succeed in schedule.c */. Is this ok? (I think this
could be useful for other developers because it tells them how the
return value will be used.)
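
For reference, my understanding of the caller side is roughly the
following (a simplified sketch of the idea, not the literal schedule.c
code):

    /* simplified: how schedule.c uses the return value of alloc_pdata */
    sd->sched_priv = ops.alloc_pdata(&ops, cpu);
    if ( sd->sched_priv == NULL )
        return -ENOMEM;
    /*
     * rt has no real per-pcpu data to allocate, so returning the
     * non-NULL token (void *)1 is just a way of reporting success.
     */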

>
>> >> +    svc->period = RT_DEFAULT_PERIOD;
>> >> +    if ( !is_idle_vcpu(vc) )
>> >> +        svc->budget = RT_DEFAULT_BUDGET;
>> >> +
>> >> +    count = (now/MICROSECS(svc->period)) + 1;
>> >> +    /* sync all VCPU's start time to 0 */
>> >> +    svc->cur_deadline += count * MICROSECS(svc->period);
>> >> +
>> > why "+="? What value do you expect cur_deadline to hold at this time?
>>
>> It should be =. This does not cause a bug because it's in the
>> rt_alloc_vdata() and the svc->cur_deadline is 0.  But "+=" is not
>> correct in logic. I modified it. Thank you!
>>
> I appreciate it does not harm, but it's harder to read, so thanks for
> changing it.
>
>> >
>> > Also, unless I'm missing something, or doing the math wrong, what you
>> > want to do here is place the deadline one period ahead of NOW().
>>
>> Ah, no. I think you are thinking about the CBS server mechanism. (Now
>> I start to use some notations from the real-time academic field.) For the
>> deferrable server, if the release offset of an implicit-deadline
>> deferrable vcpu is O_i, its deadline will be O_i + p_i * k, where k is
>> a natural number. So we want to set the deadline to the end of the period
>> in which NOW() falls.
>>
> Mmm... ok, yes, I have to admit I probably was not considering that. So,
> for the first time you're setting the deadline, which is what's
> happening here, I don't think I see a quick way to avoid the div+1, to
> emulate the ceiling.
>
> It probably can still be avoided in other places, though. In fact, in
> that case, you're advancing the deadline (perhaps more than once)
> from a previous one, and that should manage to get you to the end of
> the right period, shouldn't it? Anyway, I guess I'll comment more (if
> I'll find it necessary) about this directly on v2.
>
> I'm still convinced about the time conversion part of what I said about
> this code, though.

Sure! I will avoid using the division when advancing the deadline.
As to the time conversion part, all code in the hypervisor is in nanoseconds.
The only place we do a conversion is in the function rt_dom_cntl,
when we set/get vcpus' parameters (since the toolstack uses us as the time
unit and the hypervisor uses ns).
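
Concretely, in rt_dom_cntl the only conversions would look roughly like
this (the op->u.rt field names are placeholders; the point is just where
the us <-> ns conversion happens):

    /* toolstack speaks microseconds, the hypervisor keeps nanoseconds */
    svc->period = MICROSECS(op->u.rt.period);      /* us -> ns */
    svc->budget = MICROSECS(op->u.rt.budget);      /* us -> ns */

    /* ...and back when reporting parameters to the toolstack */
    op->u.rt.period = svc->period / MICROSECS(1);  /* ns -> us */
    op->u.rt.budget = svc->budget / MICROSECS(1);  /* ns -> us */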

>
>> >> +     * update deadline info: When deadline is in the past,
>> >> +     * it need to be updated to the deadline of the current period,
>> >> +     * and replenish the budget
>> >> +     */
>> >> +    delta = now - svc->cur_deadline;
>> >> +    if ( delta >= 0 )
>> >> +    {
>> >> +        count = ( delta/MICROSECS(svc->period) ) + 1;
>> >> +        svc->cur_deadline += count * MICROSECS(svc->period);
>> >> +        svc->cur_budget = svc->budget * 1000;
>> >> +
>> > Ditto, about time units and conversions.
>> >
>> > I also am sure I said before that I prefer an approach like:
>> >
>> >     while ( svc->cur_deadline < now )
>> >         svc->cur_deadline += svc->period;
>>
>> I explained in the previous comment. :-) NOW() could be several periods
>> past the current deadline. And this is a deferrable server. :-)
>>
> Are you sure this is not ok this time? As I said, I agree the count
> thing is right when assigning the first deadline. However:
>  1) I know it can be several periods away, that's the purpose of the
>     while()
>  2) advancing in steps of period, starting from the last set deadline,
>     should get you to the end of the right period.
>
> Or am I missing something else? :-)

Yes, you are right! I will change it to a while loop instead of
using the division, roughly as below. :-P
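
I.e., roughly (assuming cur_deadline, period and budget are all kept in
ns by then):

    /* only reached when the deadline is in the past, i.e. now >= cur_deadline */
    do
        svc->cur_deadline += svc->period;
    while ( svc->cur_deadline <= now );
    svc->cur_budget = svc->budget;    /* replenish budget for the new period */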


>
>> > "What's in priv->cpus, BTW? How is it different form the cpupool online
>> > mask (returned in 'online' by cpupool_scheduler_cpumask() )?"
>> >
> while I don't remember having received an answer, and I see it's still
>> > here in the code. Am I missing something? If not, can you explain?
>>
>> prv->cpus is the set of online cpus used by the scheduler.  It should have the
>> same value as cpupool_scheduler_cpumask().
>>
> Should?
>
>> Should I remove this? (I plan to release the next version this
>> weekend; I will remove this once you confirm. :-))
>>
> Well, you tell me. :-) Is there a particular reason why you're keeping
> the same information in two places? If there's one, explain it to us. If
> there isn't, well... :-)

Will remove in version 3. :-)

Thank you again!

Best,

Meng


-----------
Meng Xu
PhD Student in Computer and Information Science
University of Pennsylvania


end of thread

Thread overview: 72+ messages
2014-08-24 22:58 Introduce rt real-time scheduler for Xen Meng Xu
2014-08-24 22:58 ` [PATCH v1 1/4] xen: add real time scheduler rt Meng Xu
2014-08-26 14:27   ` Jan Beulich
2014-08-27  2:07     ` Meng Xu
2014-08-27  6:26       ` Jan Beulich
2014-08-27 14:28         ` Meng Xu
2014-08-27 15:04           ` Jan Beulich
2014-08-28 16:06             ` Meng Xu
2014-08-29  9:05               ` Jan Beulich
2014-08-29 19:35                 ` Meng Xu
2014-09-03 14:08                 ` George Dunlap
2014-09-03 14:24                   ` Meng Xu
2014-09-03 14:35                     ` Dario Faggioli
2014-09-03 13:40   ` George Dunlap
2014-09-03 14:11     ` Meng Xu
2014-09-03 14:15       ` George Dunlap
2014-09-03 14:35         ` Meng Xu
2014-09-05  9:46     ` Dario Faggioli
2014-09-03 14:20   ` George Dunlap
2014-09-03 14:45     ` Jan Beulich
2014-09-03 14:59     ` Dario Faggioli
2014-09-03 15:27       ` Meng Xu
2014-09-03 15:46         ` Dario Faggioli
2014-09-03 17:13           ` George Dunlap
2014-09-03 15:13     ` Meng Xu
2014-09-03 16:06       ` George Dunlap
2014-09-03 16:57         ` Dario Faggioli
2014-09-03 17:18           ` George Dunlap
2014-09-04  2:15             ` Meng Xu
2014-09-04 14:27             ` Dario Faggioli
2014-09-04 15:30               ` Meng Xu
2014-09-05  9:36                 ` Dario Faggioli
2014-09-05 15:06                   ` Meng Xu
2014-09-05 15:09                     ` Dario Faggioli
2014-09-04  2:11         ` Meng Xu
2014-09-04 11:00           ` Dario Faggioli
2014-09-04 13:03           ` George Dunlap
2014-09-04 14:00             ` Meng Xu
2014-09-05 17:17   ` Dario Faggioli
2014-09-07  3:56     ` Meng Xu
2014-09-08 10:33       ` Dario Faggioli
2014-09-09 13:43         ` Meng Xu
2014-08-24 22:58 ` [PATCH v1 2/4] libxc: add rt scheduler Meng Xu
2014-09-05 10:34   ` Dario Faggioli
2014-09-05 17:17     ` Meng Xu
2014-09-05 17:50       ` Dario Faggioli
2014-08-24 22:58 ` [PATCH v1 3/4] libxl: " Meng Xu
2014-08-25 13:17   ` Wei Liu
2014-08-25 15:55     ` Meng Xu
2014-08-26  9:51       ` Wei Liu
2014-09-03 15:33   ` George Dunlap
2014-09-03 20:52     ` Meng Xu
2014-09-04 14:27     ` George Dunlap
2014-09-04 14:45       ` Dario Faggioli
2014-09-04 14:47       ` Meng Xu
2014-09-04 14:51         ` George Dunlap
2014-09-04 15:07           ` Meng Xu
2014-09-04 15:44             ` Dario Faggioli
2014-09-04 15:55               ` George Dunlap
2014-09-04 16:12                 ` Meng Xu
2014-09-05  9:19                   ` Dario Faggioli
2014-09-04 15:25         ` Dario Faggioli
2014-09-05 10:21   ` Dario Faggioli
2014-09-05 15:45     ` Meng Xu
2014-09-05 17:41       ` Dario Faggioli
2014-08-24 22:58 ` [PATCH v1 4/4] xl: introduce " Meng Xu
2014-08-25 13:31   ` Wei Liu
2014-08-25 16:12     ` Meng Xu
2014-09-03 15:52   ` George Dunlap
2014-09-03 22:28     ` Meng Xu
2014-09-05  9:40       ` Dario Faggioli
2014-09-05 14:43         ` Meng Xu
