On Tue, 2018-04-03 at 22:25 +0530, Praveen Kumar wrote:
> The patch optimized the sorted credit2 runq from simple linked list
> to rb-tree implementation. This way we will gain performance and
> scalability when the number of vCPUs are huge.
>
> Signed-off-by: Praveen Kumar
> --- a/xen/common/sched_credit2.c
> +++ b/xen/common/sched_credit2.c
> @@ -600,6 +601,29 @@ static inline bool has_cap(const struct csched2_vcpu *svc)
>      return svc->budget != STIME_MAX;
>  }
>
> +static void runq_insert_rb(struct rb_root *root,
> +                           struct csched2_vcpu *svc,
> +                           int *pos)
>
I'd call this rb_insert() or rb_runq_insert(), but that's taste (and
it's certainly a minor thing).

> +{
> +    struct csched2_vcpu *entry = NULL;
> +    struct rb_node **node = &root->rb_node;
> +    struct rb_node *parent = NULL;
> +
> +    while (*node) {
> +        parent = *node;
> +        entry = rb_entry(parent, struct csched2_vcpu, runq_elem);
> +        // Check if we are maintaining the sorted
>
Pointless comment. I'd leave the line blank (but that's taste again).

> +        if ( svc->credit < entry->credit )
> +            node = &parent->rb_left;
> +        else
> +            node = &parent->rb_right;
> +
> +        (*pos)++;
> +    }
> +    rb_link_node(&svc->runq_elem, parent, node);
> +    rb_insert_color(&svc->runq_elem, root);
> +}
>
Wait, where's the part where we cache which element is the one with the
highest credits? (And the same applies to the tree-removal function, of
course.)

In fact, we need a field for storing such a cache in the runqueue data
structure as well, and we need to keep it updated.

Linux (recently) added an rb-tree variant that does this internally,
see cd9e61ed1eebb "rbtree: cache leftmost node internally", and how
such a variant is used, e.g., 2161573ecd693 "sched/deadline: replace
earliest dl and rq leftmost caching".

So, I'd say that we either import that from there, or do the caching of
the leftmost element explicitly where needed.
Note, however, that Linux's variant caches the leftmost node, because
they always use rb-trees for structures where the head of the queue is
the element with the _smallest_ key. In our case here, we want the
queue ordered in descending credit order, so it must be thought through
carefully whether or not we could use the new variant (maybe "just"
inverting the binary search tree relationship), or whether we'd need
another one that caches the rightmost node.

I would suggest we do not try to use the rb_*_cached() functions, and
cache the rightmost node explicitly in runqueue_data.

> @@ -3201,17 +3225,21 @@ csched2_runtime(const struct scheduler *ops, int cpu,
>       * 2) If there's someone waiting whose credit is positive,
>       *    run until your credit ~= his.
>       */
> -    if ( ! list_empty(runq) )
> +    if ( ! RB_EMPTY_ROOT(runq) )
>      {
> -        struct csched2_vcpu *swait = runq_elem(runq->next);
> +        // Find the left most element, which is the most probable candidate
> +        struct rb_node *node = rb_first(runq);
>
Err... is it? Isn't the leftmost element the one with the _least_
credits? It looks to me that we want rb_last().

And in any case, we don't want to have to traverse the tree to get the
runnable vcpu with the highest credit; we want it available in O(1)
time. That's why we want to cache it.

> +        // TODO Can we take rb_next ?
> +        //struct rb_node *node = &rb_next(root->rb_node);
> +
>
What do you mean here?

> +        struct csched2_vcpu *swait = runq_elem(node);
>          if ( ! is_idle_vcpu(swait->vcpu)
>               && swait->credit > 0 )
>          {
>              rt_credit = snext->credit - swait->credit;
>          }
>      }
> @@ -3345,9 +3373,8 @@ runq_candidate(struct csched2_runqueue_data *rqd,
>          snext = csched2_vcpu(idle_vcpu[cpu]);
>
>  check_runq:
> -    list_for_each_safe( iter, temp, &rqd->runq )
> -    {
> -        struct csched2_vcpu * svc = list_entry(iter, struct csched2_vcpu, runq_elem);
> +    for (iter = rb_first(&rqd->runq); iter != NULL; iter = rb_next(iter)) {
> +        struct csched2_vcpu * svc = rb_entry(iter, struct csched2_vcpu, runq_elem);
>
Same as above.
I don't think this is where we want to start from. And no matter
whether we want to start from rb_first() or rb_last(), we certainly
don't want to have to traverse the tree to get there.

> @@ -3761,8 +3789,8 @@ csched2_dump(const struct scheduler *ops)
>              dump_pcpu(ops, j);
>
>      printk("RUNQ:\n");
> -    list_for_each( iter, runq )
> -    {
> +
> +    for (iter = rb_first(runq); iter != NULL; iter = rb_next(iter)) {
>
And the same applies here as well. I think that, if we want the
runqueue printed in the proper order, we need to start from the
rightmost node, and use rb_prev().

Oh, and about caching: I'd say it is okay if you want to turn this into
a series, the first patch of which does not have it, and you introduce
it in the second patch.

Regards,
Dario
--
<> (Raistlin Majere)
-----------------------------------------------------------------
Dario Faggioli, Ph.D, http://about.me/dario.faggioli
Software Engineer @ SUSE https://www.suse.com/