On Tue, 2018-04-03 at 22:25 +0530, Praveen Kumar wrote:
> The patch optimized the sorted credit2 runq from simple linked list
> to rb-tree implementation. This way we will gain performance and
> scalability when the number of vCPUs are huge.
>
> Signed-off-by: Praveen Kumar
> --- a/xen/common/sched_credit2.c
> +++ b/xen/common/sched_credit2.c
> @@ -600,6 +601,29 @@ static inline bool has_cap(const struct csched2_vcpu *svc)
>      return svc->budget != STIME_MAX;
>  }
>
> +static void runq_insert_rb(struct rb_root *root,
> +                           struct csched2_vcpu *svc,
> +                           int *pos)
>
I'd call this rb_insert() or rb_runq_insert(), but that's taste (and
it's certainly a minor thing).

> +{
> +    struct csched2_vcpu *entry = NULL;
> +    struct rb_node **node = &root->rb_node;
> +    struct rb_node *parent = NULL;
> +
> +    while (*node) {
> +        parent = *node;
> +        entry = rb_entry(parent, struct csched2_vcpu, runq_elem);
> +        // Check if we are maintaining the sorted
>
Pointless comment. I'd leave the line blank (but that's taste again).

> +        if ( svc->credit < entry->credit )
> +            node = &parent->rb_left;
> +        else
> +            node = &parent->rb_right;
> +
> +        (*pos)++;
> +    }
> +    rb_link_node(&svc->runq_elem, parent, node);
> +    rb_insert_color(&svc->runq_elem, root);
> +}
>
Wait, where's the part where we cache which element is the one with the
highest credits? (And the same applies to the tree-removal function, of
course.)

In fact, we need a field for storing such a cache in the runqueue data
structure as well, and we need to keep it updated.

Linux (recently) added an rb-tree variant that does this internally,
see cd9e61ed1eebb "rbtree: cache leftmost node internally", and how
such a variant is used, e.g., 2161573ecd693 "sched/deadline: replace
earliest dl and rq leftmost caching".

So, I'd say that we either import that from there, or do the caching of
the leftmost element explicitly where needed.
Note, however, that Linux's variant caches the leftmost node, because
they always use rb-trees for structures where the head of the queue is
the element with the _smallest_ key. In our case here, we want the
queue ordered in descending credit order, so it must be thought through
carefully whether or not we could use the new variant (maybe "just"
inverting the binary search tree relationship), or whether we'd need
another one that caches the rightmost node.

I would suggest we do not try to use the rb_*_cached() functions, and
cache the rightmost node explicitly in runqueue_data.

> @@ -3201,17 +3225,21 @@ csched2_runtime(const struct scheduler *ops, int cpu,
>       * 2) If there's someone waiting whose credit is positive,
>       *    run until your credit ~= his.
>       */
> -    if ( ! list_empty(runq) )
> +    if ( ! RB_EMPTY_ROOT(runq) )
>      {
> -        struct csched2_vcpu *swait = runq_elem(runq->next);
> +        // Find the left most element, which is the most probable candidate
> +        struct rb_node *node = rb_first(runq);
>
Err... is it? Isn't the leftmost element the one with the _least_
credits? It looks to me that we want rb_last().

And in any case, we don't want to have to traverse the tree to get the
runnable vcpu with the highest credit; we want it available in O(1)
time. That's why we want to cache it.

> +        // TODO Can we take rb_next ?
> +        //struct rb_node *node = &rb_next(root->rb_node);
> +
>
What do you mean here?

> +        struct csched2_vcpu *swait = runq_elem(node);
>          if ( ! is_idle_vcpu(swait->vcpu)
>               && swait->credit > 0 )
>          {
>              rt_credit = snext->credit - swait->credit;
>          }
>      }
> @@ -3345,9 +3373,8 @@ runq_candidate(struct csched2_runqueue_data *rqd,
>          snext = csched2_vcpu(idle_vcpu[cpu]);
>
>  check_runq:
> -    list_for_each_safe( iter, temp, &rqd->runq )
> -    {
> -        struct csched2_vcpu * svc = list_entry(iter, struct csched2_vcpu, runq_elem);
> +    for (iter = rb_first(&rqd->runq); iter != NULL; iter = rb_next(iter)) {
> +        struct csched2_vcpu * svc = rb_entry(iter, struct csched2_vcpu, runq_elem);
>
Same as above.
I don't think this is where we want to start from. And no matter
whether we want to start from rb_first() or rb_last(), we certainly
don't want to have to traverse the tree to get there.

> @@ -3761,8 +3789,8 @@ csched2_dump(const struct scheduler *ops)
>              dump_pcpu(ops, j);
>
>      printk("RUNQ:\n");
> -    list_for_each( iter, runq )
> -    {
> +
> +    for (iter = rb_first(runq); iter != NULL; iter = rb_next(iter)) {
>
And the same applies here as well. I think that, if we want the
runqueue printed in the proper order, we need to start from the
rightmost node, and use rb_prev().

Oh, and about caching: I'd say it is okay if you want to turn this into
a series, the first patch of which does not have it, and you introduce
it in the second patch.

Regards,
Dario
--
<> (Raistlin Majere)
-----------------------------------------------------------------
Dario Faggioli, Ph.D, http://about.me/dario.faggioli
Software Engineer @ SUSE https://www.suse.com/