On Fri, 2015-06-12 at 15:48 -0500, Chong Li wrote:
> If no more feedbacks, let me summarize the design for the next version.
> 
> For "get" operations, we will implement the following features:
> 
> 1) Use " xl sched-rtds -v all " to output the per-dom parameters of
> all domains. And use, e.g., " xl sched-rtds -d vm1 -v all ", to output
> the per-dom parameters of one specific domain. 
>
Well, I said already that I think the distinction between per-dom and
per-vcpu parameters is meaningless, if done this way.

A parameter is either per-domain or per-vcpu, no matter how the user try
to set or get it. In RTDS, all parameters are per-domain now and, with
your work, all of them are becoming per-vcpu, and that's ok. But then,
per-dom parameters should just no longer exist.

This IMO should look as follows:

 # sched-rtds -d vm1 --> (a) outputs nothing (no per-domain params)
                         (b) outputs params for all vcpus
 # sched-rtds -d vm1 -v 2 --> outputs params for vcpu 2
 # sched-rtds -d vm1 -v all --> outputs params for all vcpus

 # sched-rtds -d vm1 -b 100 -p 1000 --> (c) errors out, as there's no 
                                            per-dom param!
                                        (d) sets params for all vcpus
 # sched-rtds -d vm1 -v all -b 100 -p 1000 --> sets params for all vcpus
 # sched-rtds -d vm1 -v 2 -b 100 -p 1000 --> sets params for vcpu 2

If, OTOH, e.g. in another scheduler, there are both kind of parameters,
then each set should be handled with its own cli params and library
interface.

For instance, let's assume that, for Credit1 and/or, we'll want to have
(implement, in case of Credit2) per-vcpu caps, but leave the weight
per-dom (which is not that unlikely). At the xl level, that should IMO
look as follows:

 # sched-credit2 -d vm1 --> (e) outputs per-dom params only (i.e., the
                                weight)
                            (f) outputs per-dom params (weight) and the 
                                per-vcpu params (the cap) for all vcpus
 # sched-credit2 -d vm1 -v 2 --> outputs cap of vcpu 2
 # sched-credit2 -d vm1 -v all --> outputs caps of all vcpus

 # sched-credit2 -d vm1 -w 128 --> set the weight to 128
 # sched-credit2 -d vm1 -v 2 -c 25 --> set the cap to 25% for vcpu 2
 # sched-credit2 -d vm1 -v 2 -w 128 --> errors out, weight is per-dom!
 # sched-credit2 -d vm1 -v all -c 25 --> set the cap to 25% for all  
                                         vcpus
 # sched-credit2 -d vm1 -c 25 --> (g) errors out, cap is per-vcpu!
                                  (h) sets the cap to 25% for all vcpus

And, for consistency, I'd go either with (a), (c), (e) and (g) OR with
(b), (d), (f) and (h).

> When a domain (say vm1)
> has vcpus with different scheduling parameters but meanwhile the user
> uses "xl sched-rtds -d vm1 -v all " to show the per-dom parameters,
> the actual output result is just the parameters of vcpu with ID=0
> (which is pointless, and should be made clear to the users).
> 
No, this just does not make any sense to me, even if you tell it to the
user. This is especially true at the xl level, where we can do pretty
much what we want, by calling the proper libxl functions.

> These two kinds of "get" operations would be implemented through
> libxl_domain_sched_params_get() and other domain-related functions (no
> changes to all these functions).
> 
This is the most tricky part, IMO, because of the stability
requirements, and I reiterate my request for feedback from George and
the tools' maintainers.

My view is the one I stated above (with all the differences due to the
fact that we are in a library and not in a command line tool). But
still, I think that the distinction between per-vcpu and per-domain
parameters should hit the library rather explicitly.

> We need some new data structures to support per-vcpu operations (for
> all schedulers, not just RTDS).
> 
> 1) In libxl, we will introduce:
> 
> libxl_vcpu_sched_params = Struct("vcpu_sched_params",[
>     ("vcpuid",       integer, { xxx some init val xxx}),
>     ("weight",       integer, {'init_val': 'LIBXL_PARAM_WEIGHT_DEFAULT'}),
>     ("cap",          integer, {'init_val': 'LIBXL_PARAM_CAP_DEFAULT'}),
>     ("period",       integer, {'init_val': 'LIBXL_PARAM_PERIOD_DEFAULT'}),
>     ("slice",        integer, {'init_val': 'LIBXL_PARAM_SLICE_DEFAULT'}),
>     ("latency",      integer, {'init_val': 'LIBXL_PARAM_LATENCY_DEFAULT'}),
>     ("extratime",    integer, {'init_val': 'LIBXL_PARAM_EXTRATIME_DEFAULT'}),
>     ("budget",       integer, {'init_val': 'LIBXL_PARAM_BUDGET_DEFAULT'}),
>     ])
> 
> libxl_sched_params = Struct("sched_params",[
>     ("sched",        libxl_scheduler),
>     ("vcpus",        Array(libxl_sched_params, "num_vcpus")),
>     ])
> 
This would allow, at some point, to turn all the existing scheduling
parameters of all the existing schedulers into per-vcpu parameters. This
may actually be a good thing, as, for instance (sticking to the example
above) if at some point we decide to also support per-vcpu weights, in
Credit1 and Credit2, the API won't have to change.

But then, in the implementation, you'll have, e.g., to deal with the
situation where someone tries to set the weight of a vcpu in Credit1
(and error out). Whether you do it in libxl, or rely on Xen telling
libxl that such thing is forbidden and having libxl propagate the error,
it's probably not a great deal. I'd recommend putting sanity checks in
libxl, though.

You'll have to do something similar for per-domain parameters in any
case, st least for the get side. In fact, we just can't touch:

libxl_domain_sched_params = Struct("domain_sched_params",[
    ("sched",        libxl_scheduler),
    ("weight",       integer, {'init_val': 'LIBXL_DOMAIN_SCHED_PARAM_WEIGHT_DEFAULT'}),
    ("cap",          integer, {'init_val': 'LIBXL_DOMAIN_SCHED_PARAM_CAP_DEFAULT'}),
    ("period",       integer, {'init_val': 'LIBXL_DOMAIN_SCHED_PARAM_PERIOD_DEFAULT'}),
    ("slice",        integer, {'init_val': 'LIBXL_DOMAIN_SCHED_PARAM_SLICE_DEFAULT'}),
    ("latency",      integer, {'init_val': 'LIBXL_DOMAIN_SCHED_PARAM_LATENCY_DEFAULT'}),
    ("extratime",    integer, {'init_val': 'LIBXL_DOMAIN_SCHED_PARAM_EXTRATIME_DEFAULT'}),
    ("budget",       integer, {'init_val': 'LIBXL_DOMAIN_SCHED_PARAM_BUDGET_DEFAULT'}),
    ])

So, if for the set case, you can decide to loop through all the vcpus
_inside_ libxl, when one calls libxl_domain_sched_params_get() under
RTDS, where there are no per-domain parameters, and since you can't
return an array of per-vcpu parameters, you need to error out. This is a
change in the behavior of libxl_domain_sched_params_get(), and it's what
I'm less sure of about this interface I'm proposing, but it's how I'd do
it.

The alternative would be to avoid exposing per-domain only parameters in
the new libxl_vcpu_sched_params struct (which would mean, as far as your
work is concerned, only putting RTDS stuff in there). That would make
the API 'cleaner', for now, perhaps, but would require breaking it again
in future (e.g., it's rather likely that we will want per-vcpu caps in
Credit 1 and 2, sooner rather than later).

So, yes, I'd go for what you're showing above, but handle it as I
explained. Thoughts?

> 2) In xen, we will introduce:
> 
> struct xen_domctl_scheduler_op {
>     uint32_t sched_id;  /* XEN_SCHEDULER_* */
>     uint32_t cmd;       /* XEN_DOMCTL_SCHEDOP_* */
>     union {
>         xen_domctl_schedparam_t d;
>         struct {
>             XEN_GUEST_HANDLE_64(xen_domctl_schedparam_vcpu_t) vcpus;
>             uint16_t nr_vcpus;
>         } v;
>     } u;
> };
> typedef struct xen_domctl_scheduler_op xen_domctl_scheduler_op_t;
> DEFINE_XEN_GUEST_HANDLE(xen_domctl_scheduler_op_t);
> 
Again, this allows for all parameters of all schedulers to be per-vcpu.
It may look too broad, considering that, for now, only a small subset of
them are, but it makes sense to me for this interface to be generic...
Then, inside the scheduling and the parameter handling code, you'll
enforce the proper semantic.

> Please correct me if something is wrong.
> 
BTW, thanks for this summary... It's important that we agree on what we
want, in order to avoid re-doing things too many times! :-)

What I tried to describe is what I think fits best, let me know if
there's something that is not clear.

Maintainers, your views?

Thanks and Regards,
Dario

-- 
<<This happens because I choose it to happen!>> (Raistlin Majere)
-----------------------------------------------------------------
Dario Faggioli, Ph.D, http://about.me/dario.faggioli
Senior Software Engineer, Citrix Systems R&D Ltd., Cambridge (UK)