* [RFC PATCH 0/1] xen: credit2: rb-tree for runqueues
@ 2018-04-03 16:55 Praveen Kumar
From: Praveen Kumar @ 2018-04-03 16:55 UTC
  To: xen-devel; +Cc: george.dunlap, Praveen Kumar, dfaggioli

Hi All,

This is continued work on using an rb-tree for the Credit2 runqueue, as
discussed in previous conversations:

https://lists.xen.org/archives/html/xen-devel/2017-06/msg01968.html

The patch converts the Credit2 runqueue from a sorted linked list to an
rb-tree. This should improve performance and scalability when the number
of vCPUs is large.

Please provide your comments on the changes.

Further, I would appreciate assistance running the patch on high-end
machines with a large number of CPU cores. This would help stress the
code changes and uncover any regressions at scale.
Please also suggest any specific test scenarios that should be covered.

Thanks in advance.

Regards,

~Praveen.

Praveen Kumar (1):
  xen: credit2: rb-tree for runqueues

 xen/common/sched_credit2.c | 94 ++++++++++++++++++++++++++++++----------------
 1 file changed, 61 insertions(+), 33 deletions(-)

-- 
2.13.6



* [RFC PATCH 1/1] xen: credit2: rb-tree for runqueues
@ 2018-04-03 16:55 ` Praveen Kumar
From: Praveen Kumar @ 2018-04-03 16:55 UTC
  To: xen-devel; +Cc: george.dunlap, Praveen Kumar, dfaggioli

The patch converts the sorted credit2 runq from a simple linked list to
an rb-tree implementation. This improves performance and scalability
when the number of vCPUs is large.

Signed-off-by: Praveen Kumar <kpraveen.lkml@gmail.com>
---
 xen/common/sched_credit2.c | 94 ++++++++++++++++++++++++++++++----------------
 1 file changed, 61 insertions(+), 33 deletions(-)

diff --git a/xen/common/sched_credit2.c b/xen/common/sched_credit2.c
index 9a3e71f1c8..3802c2888f 100644
--- a/xen/common/sched_credit2.c
+++ b/xen/common/sched_credit2.c
@@ -25,6 +25,7 @@
 #include <xen/trace.h>
 #include <xen/cpu.h>
 #include <xen/keyhandler.h>
+#include <xen/rbtree.h>
 
 /* Meant only for helping developers during debugging. */
 /* #define d2printk printk */
@@ -471,7 +472,7 @@ custom_param("credit2_runqueue", parse_credit2_runqueue);
 struct csched2_runqueue_data {
     spinlock_t lock;           /* Lock for this runqueue                     */
 
-    struct list_head runq;     /* Ordered list of runnable vms               */
+    struct rb_root runq;       /* Runqueue is an rbtree                      */
     int id;                    /* ID of this runqueue (-1 if invalid)        */
 
     int load;                  /* Instantaneous load (num of non-idle vcpus) */
@@ -536,7 +537,7 @@ struct csched2_vcpu {
     s_time_t load_last_update;         /* Last time average was updated       */
     s_time_t avgload;                  /* Decaying queue load                 */
 
-    struct list_head runq_elem;        /* On the runqueue (rqd->runq)         */
+    struct rb_node runq_elem;          /* On the runqueue (rqd->runq)         */
     struct list_head parked_elem;      /* On the parked_vcpus list            */
     struct list_head rqd_elem;         /* On csched2_runqueue_data's svc list */
     struct csched2_runqueue_data *migrate_rqd; /* Pre-determined migr. target */
@@ -600,6 +601,29 @@ static inline bool has_cap(const struct csched2_vcpu *svc)
     return svc->budget != STIME_MAX;
 }
 
+static void runq_insert_rb(struct rb_root *root,
+                           struct csched2_vcpu *svc,
+                           int *pos)
+{
+    struct csched2_vcpu *entry = NULL;
+    struct rb_node **node = &root->rb_node;
+    struct rb_node *parent = NULL;
+
+    while (*node) {
+        parent = *node;
+        entry = rb_entry(parent, struct csched2_vcpu, runq_elem);
+        // Check if we are maintaining the sorted
+        if ( svc->credit < entry->credit )
+            node = &parent->rb_left;
+        else
+            node = &parent->rb_right;
+
+        (*pos)++;
+    }
+    rb_link_node(&svc->runq_elem, parent, node);
+    rb_insert_color(&svc->runq_elem, root);
+}
+
 /*
  * Hyperthreading (SMT) support.
  *
@@ -793,12 +817,12 @@ static s_time_t c2t(struct csched2_runqueue_data *rqd, s_time_t credit, struct c
 
 static inline int vcpu_on_runq(struct csched2_vcpu *svc)
 {
-    return !list_empty(&svc->runq_elem);
+    return !RB_EMPTY_NODE(&svc->runq_elem);
 }
 
-static inline struct csched2_vcpu * runq_elem(struct list_head *elem)
+static inline struct csched2_vcpu * runq_elem(struct rb_node *elem)
 {
-    return list_entry(elem, struct csched2_vcpu, runq_elem);
+    return rb_entry(elem, struct csched2_vcpu, runq_elem);
 }
 
 static void activate_runqueue(struct csched2_private *prv, int rqi)
@@ -812,7 +836,7 @@ static void activate_runqueue(struct csched2_private *prv, int rqi)
     rqd->max_weight = 1;
     rqd->id = rqi;
     INIT_LIST_HEAD(&rqd->svc);
-    INIT_LIST_HEAD(&rqd->runq);
+    rqd->runq = RB_ROOT;
     spin_lock_init(&rqd->lock);
 
     __cpumask_set_cpu(rqi, &prv->active_queues);
@@ -1272,9 +1296,8 @@ update_load(const struct scheduler *ops,
 static void
 runq_insert(const struct scheduler *ops, struct csched2_vcpu *svc)
 {
-    struct list_head *iter;
     unsigned int cpu = svc->vcpu->processor;
-    struct list_head * runq = &c2rqd(ops, cpu)->runq;
+    struct rb_root *runq = &c2rqd(ops, cpu)->runq;
     int pos = 0;
 
     ASSERT(spin_is_locked(per_cpu(schedule_data, cpu).schedule_lock));
@@ -1287,16 +1310,7 @@ runq_insert(const struct scheduler *ops, struct csched2_vcpu *svc)
     ASSERT(!svc->vcpu->is_running);
     ASSERT(!(svc->flags & CSFLAG_scheduled));
 
-    list_for_each( iter, runq )
-    {
-        struct csched2_vcpu * iter_svc = runq_elem(iter);
-
-        if ( svc->credit > iter_svc->credit )
-            break;
-
-        pos++;
-    }
-    list_add_tail(&svc->runq_elem, iter);
+    runq_insert_rb(runq, svc, &pos);
 
     if ( unlikely(tb_init_done) )
     {
@@ -1315,8 +1329,11 @@ runq_insert(const struct scheduler *ops, struct csched2_vcpu *svc)
 
 static inline void runq_remove(struct csched2_vcpu *svc)
 {
+    // TODO This assert might not be required, as we always have a check before
+    // calling this API
     ASSERT(vcpu_on_runq(svc));
-    list_del_init(&svc->runq_elem);
+    rb_erase(&svc->runq_elem, &svc->rqd->runq);
+    RB_CLEAR_NODE(&svc->runq_elem);
 }
 
 void burn_credits(struct csched2_runqueue_data *rqd, struct csched2_vcpu *, s_time_t);
@@ -1785,6 +1802,7 @@ static void park_vcpu(struct csched2_vcpu *svc)
      * In both cases, we also add it to the list of parked vCPUs of the domain.
      */
     __set_bit(_VPF_parked, &v->pause_flags);
+
     if ( vcpu_on_runq(svc) )
     {
         runq_remove(svc);
@@ -2037,7 +2055,7 @@ csched2_alloc_vdata(const struct scheduler *ops, struct vcpu *vc, void *dd)
         return NULL;
 
     INIT_LIST_HEAD(&svc->rqd_elem);
-    INIT_LIST_HEAD(&svc->runq_elem);
+    RB_CLEAR_NODE(&svc->runq_elem);
 
     svc->sdom = dd;
     svc->vcpu = vc;
@@ -2083,6 +2101,8 @@ csched2_vcpu_sleep(const struct scheduler *ops, struct vcpu *vc)
     }
     else if ( vcpu_on_runq(svc) )
     {
+        printk(XENLOG_WARNING "%s : %d : vcpu on runq rb !\n", __func__, __LINE__);
+
         ASSERT(svc->rqd == c2rqd(ops, vc->processor));
         update_load(ops, svc->rqd, svc, -1, NOW());
         runq_remove(svc);
@@ -2110,6 +2130,7 @@ csched2_vcpu_wake(const struct scheduler *ops, struct vcpu *vc)
 
     if ( unlikely(vcpu_on_runq(svc)) )
     {
+        printk(XENLOG_WARNING "%s : %d : vcpu on runq rb !\n", __func__, __LINE__);
         SCHED_STAT_CRANK(vcpu_wake_onrunq);
         goto out;
     }
@@ -2501,6 +2522,7 @@ static void migrate(const struct scheduler *ops,
         /* It's not running; just move it */
         if ( vcpu_on_runq(svc) )
         {
+            printk(XENLOG_WARNING "%s : %d : vcpu on runq rb !\n", __func__, __LINE__);
             runq_remove(svc);
             update_load(ops, svc->rqd, NULL, -1, now);
             on_runq = 1;
@@ -2759,8 +2781,10 @@ csched2_vcpu_migrate(
     if ( unlikely(!cpumask_test_cpu(new_cpu, cpupool_domain_cpumask(d))) )
     {
         ASSERT(system_state == SYS_STATE_suspend);
+
         if ( vcpu_on_runq(svc) )
         {
+            printk(XENLOG_WARNING "%s : %d : vcpu on runq rb !\n", __func__, __LINE__);
             runq_remove(svc);
             update_load(ops, svc->rqd, NULL, -1, now);
         }
@@ -3103,7 +3127,7 @@ csched2_vcpu_insert(const struct scheduler *ops, struct vcpu *vc)
     spinlock_t *lock;
 
     ASSERT(!is_idle_vcpu(vc));
-    ASSERT(list_empty(&svc->runq_elem));
+    ASSERT(RB_EMPTY_NODE(&svc->runq_elem));
 
     /* csched2_cpu_pick() expects the pcpu lock to be held */
     lock = vcpu_schedule_lock_irq(vc);
@@ -3141,7 +3165,7 @@ csched2_vcpu_remove(const struct scheduler *ops, struct vcpu *vc)
     spinlock_t *lock;
 
     ASSERT(!is_idle_vcpu(vc));
-    ASSERT(list_empty(&svc->runq_elem));
+    ASSERT(RB_EMPTY_NODE(&svc->runq_elem));
 
     SCHED_STAT_CRANK(vcpu_remove);
 
@@ -3163,7 +3187,7 @@ csched2_runtime(const struct scheduler *ops, int cpu,
     s_time_t time, min_time;
     int rt_credit; /* Proposed runtime measured in credits */
     struct csched2_runqueue_data *rqd = c2rqd(ops, cpu);
-    struct list_head *runq = &rqd->runq;
+    struct rb_root *runq = &rqd->runq;
     struct csched2_private *prv = csched2_priv(ops);
 
     /*
@@ -3201,17 +3225,21 @@ csched2_runtime(const struct scheduler *ops, int cpu,
      * 2) If there's someone waiting whose credit is positive,
      *    run until your credit ~= his.
      */
-    if ( ! list_empty(runq) )
+    if ( ! RB_EMPTY_ROOT(runq) )
     {
-        struct csched2_vcpu *swait = runq_elem(runq->next);
+        // Find the left most element, which is the most probable candidate
+        struct rb_node *node = rb_first(runq);
 
+        // TODO Can we take rb_next ?
+        //struct rb_node *node = &rb_next(root->rb_node);
+
+        struct csched2_vcpu *swait = runq_elem(node);
         if ( ! is_idle_vcpu(swait->vcpu)
              && swait->credit > 0 )
         {
             rt_credit = snext->credit - swait->credit;
         }
     }
-
     /*
      * The next guy on the runqueue may actually have a higher credit,
      * if we've tried to avoid migrating him from a different cpu.
@@ -3260,7 +3288,7 @@ runq_candidate(struct csched2_runqueue_data *rqd,
                int cpu, s_time_t now,
                unsigned int *skipped)
 {
-    struct list_head *iter, *temp;
+    struct rb_node *iter = NULL;
     struct csched2_vcpu *snext = NULL;
     struct csched2_private *prv = csched2_priv(per_cpu(scheduler, cpu));
     bool yield = false, soft_aff_preempt = false;
@@ -3345,9 +3373,8 @@ runq_candidate(struct csched2_runqueue_data *rqd,
         snext = csched2_vcpu(idle_vcpu[cpu]);
 
  check_runq:
-    list_for_each_safe( iter, temp, &rqd->runq )
-    {
-        struct csched2_vcpu * svc = list_entry(iter, struct csched2_vcpu, runq_elem);
+    for (iter = rb_first(&rqd->runq); iter != NULL; iter = rb_next(iter)) {
+        struct csched2_vcpu * svc = rb_entry(iter, struct csched2_vcpu, runq_elem);
 
         if ( unlikely(tb_init_done) )
         {
@@ -3749,7 +3776,8 @@ csched2_dump(const struct scheduler *ops)
     for_each_cpu(i, &prv->active_queues)
     {
         struct csched2_runqueue_data *rqd = prv->rqd + i;
-        struct list_head *iter, *runq = &rqd->runq;
+        struct rb_root *runq = &rqd->runq;
+        struct rb_node *iter;
         int loop = 0;
 
         /* We need the lock to scan the runqueue. */
@@ -3761,8 +3789,8 @@ csched2_dump(const struct scheduler *ops)
             dump_pcpu(ops, j);
 
         printk("RUNQ:\n");
-        list_for_each( iter, runq )
-        {
+
+        for (iter = rb_first(runq); iter != NULL; iter = rb_next(iter)) {
             struct csched2_vcpu *svc = runq_elem(iter);
 
             if ( svc )
-- 
2.13.6



* Re: [RFC PATCH 1/1] xen: credit2: rb-tree for runqueues
@ 2018-04-17 10:46   ` Dario Faggioli
From: Dario Faggioli @ 2018-04-17 10:46 UTC
  To: Praveen Kumar, xen-devel; +Cc: george.dunlap


On Tue, 2018-04-03 at 22:25 +0530, Praveen Kumar wrote:
> The patch optimized the sorted credit2 runq from simple linked list
> to
> rb-tree implementation. This way we will gain performance and
> scalability when the number of vCPUs are huge.
> 
> Signed-off-by: Praveen Kumar <kpraveen.lkml@gmail.com>

> --- a/xen/common/sched_credit2.c
> +++ b/xen/common/sched_credit2.c
> @@ -600,6 +601,29 @@ static inline bool has_cap(const struct
> csched2_vcpu *svc)
>      return svc->budget != STIME_MAX;
>  }
>  
> +static void runq_insert_rb(struct rb_root *root,
> +                           struct csched2_vcpu *svc,
> +                           int *pos)
>
I'd call this rb_insert() or rb_runq_insert(), but that's taste (and
it's certainly a minor thing).

> +{
> +    struct csched2_vcpu *entry = NULL;
> +    struct rb_node **node = &root->rb_node;
> +    struct rb_node *parent = NULL;
> +
> +    while (*node) {
> +        parent = *node;
> +        entry = rb_entry(parent, struct csched2_vcpu, runq_elem);
> +        // Check if we are maintaining the sorted
>
Pointless comment. I'd leave the line blank (but that's taste again).

> +        if ( svc->credit < entry->credit )
> +            node = &parent->rb_left;
> +        else
> +            node = &parent->rb_right;
> +
> +        (*pos)++;
> +    }
> +    rb_link_node(&svc->runq_elem, parent, node);
> +    rb_insert_color(&svc->runq_elem, root);
> +}
>
Wait, where's the part where we cache which element is the one with the
highest credits? (And the same applies to the tree-removal function, of
course.)

In fact, we need a field for storing such a cache in the runqueue data
structure as well, and we need to keep it updated.

Linux (recently) added an rb-tree variant that does this internally, see
cd9e61ed1eebb "rbtree: cache leftmost node internally", and how such a
variant is used, e.g. 2161573ecd693 "sched/deadline: replace earliest
dl and rq leftmost caching".
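
From memory (so please double-check against the actual Linux code), the
cached variant is used more or less like this:

    struct rb_root_cached runq = RB_ROOT_CACHED;

    /* 'leftmost' is computed during the descent: true iff we never
     * went right, i.e., the new node becomes the cached head. */
    rb_link_node(&svc->runq_elem, parent, node);
    rb_insert_color_cached(&svc->runq_elem, &runq, leftmost);

    rb_erase_cached(&svc->runq_elem, &runq);

    head = rb_first_cached(&runq);   /* O(1), no tree walk */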

So, I'd say that we either import that from there, or do the caching of
the leftmost element explicitly where needed. Note, however, that
Linux's variant caches leftmost, because they always use rb-trees for
structures where the head of the queue is the element with the
_smallest_ key.

In our case here, we want the queue ordered in descending credit order,
so we must think carefully about whether or not we could use the new
variant (maybe "just" inverting the binary search tree relationship),
or we'd need another one that caches rightmost.

I would suggest we do not try to use the rb_*_cached() functions, and
cache rightmost explicitly in runqueue_data.
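
Just to be concrete, here's a rough, untested sketch of what I mean
(all the names here are invented, this is not working code):

    struct csched2_runqueue_data {
        ...
        struct rb_root runq;            /* runqueue, ordered by credit  */
        struct rb_node *runq_rightmost; /* cached head (highest credit) */
        ...
    };

    /* Insert: if we never descend left, the new node is the new rightmost. */
    static void rb_runq_insert(struct rb_root *root, struct csched2_vcpu *svc,
                               struct rb_node **rightmost)
    {
        struct rb_node **node = &root->rb_node, *parent = NULL;
        bool new_rightmost = true;

        while ( *node )
        {
            struct csched2_vcpu *entry;

            parent = *node;
            entry = rb_entry(parent, struct csched2_vcpu, runq_elem);
            if ( svc->credit < entry->credit )
            {
                node = &parent->rb_left;
                new_rightmost = false;
            }
            else
                node = &parent->rb_right;
        }
        rb_link_node(&svc->runq_elem, parent, node);
        rb_insert_color(&svc->runq_elem, root);
        if ( new_rightmost )
            *rightmost = &svc->runq_elem;
    }

    /* Removal: if we erase the cached node, its predecessor becomes the
     * new rightmost; rb_prev() must be called before rb_erase(). */
    if ( rqd->runq_rightmost == &svc->runq_elem )
        rqd->runq_rightmost = rb_prev(&svc->runq_elem);
    rb_erase(&svc->runq_elem, &rqd->runq);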

> @@ -3201,17 +3225,21 @@ csched2_runtime(const struct scheduler *ops,
> int cpu,
>       * 2) If there's someone waiting whose credit is positive,
>       *    run until your credit ~= his.
>       */
> -    if ( ! list_empty(runq) )
> +    if ( ! RB_EMPTY_ROOT(runq) )
>      {
> -        struct csched2_vcpu *swait = runq_elem(runq->next);
> +        // Find the left most element, which is the most probable
> candidate
> +        struct rb_node *node = rb_first(runq);
> 
Err... is it? Isn't the leftmost element the one with the _least_
credits? It looks to me that we want rb_last().

And IAC, we don't want to have to traverse the tree to get the runnable
vcpu with the highest credit, we want it available in O(1) time.

That's why we want to cache it.
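
I.e., with the (hypothetical) cached field from the sketch above, this
becomes just:

    /* head of the runqueue in O(1) (non-NULL, as the tree isn't empty) */
    struct csched2_vcpu *swait = runq_elem(rqd->runq_rightmost);

rather than an O(log n) walk down the right spine with rb_last().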

> +        // TODO Can we take rb_next ?
> +        //struct rb_node *node = &rb_next(root->rb_node);
> +
What do you mean here?

> +        struct csched2_vcpu *swait = runq_elem(node);
>          if ( ! is_idle_vcpu(swait->vcpu)
>               && swait->credit > 0 )
>          {
>              rt_credit = snext->credit - swait->credit;
>          }
>      }

> @@ -3345,9 +3373,8 @@ runq_candidate(struct csched2_runqueue_data
> *rqd,
>          snext = csched2_vcpu(idle_vcpu[cpu]);
>  
>   check_runq:
> -    list_for_each_safe( iter, temp, &rqd->runq )
> -    {
> -        struct csched2_vcpu * svc = list_entry(iter, struct
> csched2_vcpu, runq_elem);
> +    for (iter = rb_first(&rqd->runq); iter != NULL; iter =
> rb_next(iter)) {
> +        struct csched2_vcpu * svc = rb_entry(iter, struct
> csched2_vcpu, runq_elem);
>
Same as above. I don't think this is where we want to start from. And
no matter whether we want to start from rb_first() or rb_last(), we
certainly don't want to have to traverse the tree to get there.

> @@ -3761,8 +3789,8 @@ csched2_dump(const struct scheduler *ops)
>              dump_pcpu(ops, j);
>  
>          printk("RUNQ:\n");
> -        list_for_each( iter, runq )
> -        {
> +
> +        for (iter = rb_first(runq); iter != NULL; iter =
> rb_next(iter)) {
>
And the same applies here as well. I think that, if we want the
runqueue printed in the proper order, we need to start from the
rightmost, and use rb_prev().
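
Something like this, I mean (untested, with runq_elem() as defined in
the patch):

    for ( iter = rb_last(runq); iter != NULL; iter = rb_prev(iter) )
    {
        struct csched2_vcpu *svc = runq_elem(iter);
        /* ... print svc as before ... */
    }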

Oh, and about caching, I'd say it is okay if you want to turn this into
a series, the first patch of which does not have it, and you introduce
it in the second patch.

Regards,
Dario
-- 
<<This happens because I choose it to happen!>> (Raistlin Majere)
-----------------------------------------------------------------
Dario Faggioli, Ph.D, http://about.me/dario.faggioli
Software Engineer @ SUSE https://www.suse.com/


* Re: [RFC PATCH 1/1] xen: credit2: rb-tree for runqueues
@ 2018-04-24  9:00     ` Praveen Kumar
From: Praveen Kumar @ 2018-04-24 9:00 UTC
  To: Dario Faggioli, xen-devel; +Cc: george.dunlap

Hi Dario,

On Tuesday 17 April 2018 04:16 PM, Dario Faggioli wrote:
> On Tue, 2018-04-03 at 22:25 +0530, Praveen Kumar wrote:
>> The patch optimized the sorted credit2 runq from simple linked list
>> to
>> rb-tree implementation. This way we will gain performance and
>> scalability when the number of vCPUs are huge.
>>
>> Signed-off-by: Praveen Kumar <kpraveen.lkml@gmail.com>
>> --- a/xen/common/sched_credit2.c
>> +++ b/xen/common/sched_credit2.c
>> @@ -600,6 +601,29 @@ static inline bool has_cap(const struct
>> csched2_vcpu *svc)
>>       return svc->budget != STIME_MAX;
>>   }
>>   
>> +static void runq_insert_rb(struct rb_root *root,
>> +                           struct csched2_vcpu *svc,
>> +                           int *pos)
>>
> I'd call this rb_insert() or rb_runq_insert(), but that's taste (and
> it's certainly a minor thing).
Sure, I would prefer rb_runq_insert().
>> +{
>> +    struct csched2_vcpu *entry = NULL;
>> +    struct rb_node **node = &root->rb_node;
>> +    struct rb_node *parent = NULL;
>> +
>> +    while (*node) {
>> +        parent = *node;
>> +        entry = rb_entry(parent, struct csched2_vcpu, runq_elem);
>> +        // Check if we are maintaining the sorted
>>
> Pointless comment. I'd leave the line blank (but that's taste again).
Will remove this.
>> +        if ( svc->credit < entry->credit )
>> +            node = &parent->rb_left;
>> +        else
>> +            node = &parent->rb_right;
>> +
>> +        (*pos)++;
>> +    }
>> +    rb_link_node(&svc->runq_elem, parent, node);
>> +    rb_insert_color(&svc->runq_elem, root);
>> +}
>>
> Wait, where's the part where we cache which element is the one with the
> highest credits? (And the same applies to the tree-removal function, of
> course.)
>
> In fact, we need a field for storing such a cache in the runqueue data
> structure as well, and we need to keep it updated.
>
> Linux (recently) added an rb-tree variant that do this internally, see
> cd9e61ed1eebb "rbtree: cache leftmost node internally", and how such a
> variant is used, e.g. 2161573ecd693 "sched/deadline: replace earliest
> dl and rq leftmost caching".
I had thought of caching the leftmost node as is done in Linux's
rb_tree, but took the easier route of getting things working first,
deferring the port of the Linux rb_tree caching variant to a follow-up
patch.
> So, I'd say that we either import that from there, or do the caching of
> leftmost element explicitly where needed. Note that, however, that
> Linux's variant caches leftmost, because they always use rb-trees for
> structures where the head of the queue is the element with the
> _smallest_ key.
>
> In our case here, we want the queue ordered in descending credit order,
> so it must be thought carefully whether or not we could use the new
> variant (maybe "just" inverting the binary search tree relationship),
> or we'd need another one that caches righmost.
>
> I would suggest we do not try to use the rb_*_cached() functions, and
> cache rightmost explicitly in runqueue_data.

Ok, that sounds better. I will introduce an entry for the rightmost
element to be cached in runqueue_data.
Also, let's port the Linux rb_tree cache variant as well (we may use it
in the future).
>> @@ -3201,17 +3225,21 @@ csched2_runtime(const struct scheduler *ops,
>> int cpu,
>>        * 2) If there's someone waiting whose credit is positive,
>>        *    run until your credit ~= his.
>>        */
>> -    if ( ! list_empty(runq) )
>> +    if ( ! RB_EMPTY_ROOT(runq) )
>>       {
>> -        struct csched2_vcpu *swait = runq_elem(runq->next);
>> +        // Find the left most element, which is the most probable
>> candidate
>> +        struct rb_node *node = rb_first(runq);
>>
> Err... is it? Isn't the leftmost element the one with the _least_
> credits? It looks to me that we want rb_last().
>
> And IAC, we don't want to have to traverse the tree to get the runnable
> vcpu with the highest credit, we want it available in O(1) time.
>
> That's why we want to cache it.
Yes, it looks like an error. Will update the patch in v2.
>> +        // TODO Can we take rb_next ?
>> +        //struct rb_node *node = &rb_next(root->rb_node);
>> +
> What do you mean here?
>
This won't matter if we do caching.
>> +        struct csched2_vcpu *swait = runq_elem(node);
>>           if ( ! is_idle_vcpu(swait->vcpu)
>>                && swait->credit > 0 )
>>           {
>>               rt_credit = snext->credit - swait->credit;
>>           }
>>       }
>> @@ -3345,9 +3373,8 @@ runq_candidate(struct csched2_runqueue_data
>> *rqd,
>>           snext = csched2_vcpu(idle_vcpu[cpu]);
>>   
>>    check_runq:
>> -    list_for_each_safe( iter, temp, &rqd->runq )
>> -    {
>> -        struct csched2_vcpu * svc = list_entry(iter, struct
>> csched2_vcpu, runq_elem);
>> +    for (iter = rb_first(&rqd->runq); iter != NULL; iter =
>> rb_next(iter)) {
>> +        struct csched2_vcpu * svc = rb_entry(iter, struct
>> csched2_vcpu, runq_elem);
>>
> Same as above. I don't think this is from where we want to start. And
> no matter whether we want to start from rb_first() or rb_last(), we
> certainly don't want to have to traverse the tree to get to there.
>
>> @@ -3761,8 +3789,8 @@ csched2_dump(const struct scheduler *ops)
>>               dump_pcpu(ops, j);
>>   
>>           printk("RUNQ:\n");
>> -        list_for_each( iter, runq )
>> -        {
>> +
>> +        for (iter = rb_first(runq); iter != NULL; iter =
>> rb_next(iter)) {
>>
> And the same applies here as well. I think that, if we want the
> runqueue printed in the proper order, we need to start from the
> righmost, and use rb_prev().
>
> Oh, and about caching, I'd say it is okay if you want to turn this into
> a series, the first patch of which does not have it, and you introduce
> it in the second patch.
>
Sure, let me make it a series of 3 patches: first, the Linux porting;
second, the rb_tree changes without caching; and third, caching the
rightmost node.

Regards,

~Praveen.



* Re: [RFC PATCH 1/1] xen: credit2: rb-tree for runqueues
@ 2018-04-24 11:03       ` Dario Faggioli
From: Dario Faggioli @ 2018-04-24 11:03 UTC
  To: Praveen Kumar, xen-devel; +Cc: george.dunlap


On Tue, 2018-04-24 at 14:30 +0530, Praveen Kumar wrote:
> Hi Dario,
> 
Hi!

> On Tuesday 17 April 2018 04:16 PM, Dario Faggioli wrote:
> > On Tue, 2018-04-03 at 22:25 +0530, Praveen Kumar wrote:
> > > 
> > > +        if ( svc->credit < entry->credit )
> > > +            node = &parent->rb_left;
> > > +        else
> > > +            node = &parent->rb_right;
> > > +
> > > +        (*pos)++;
> > > +    }
> > > +    rb_link_node(&svc->runq_elem, parent, node);
> > > +    rb_insert_color(&svc->runq_elem, root);
> > > +}
> > > 
> > Wait, where's the part where we cache which element is the one with
> > the
> > highest credits? (And the same applies to the tree-removal
> > function, of
> > course.)
> > 
> > In fact, we need a field for storing such a cache in the runqueue
> > data
> > structure as well, and we need to keep it updated.
> 
> I thought of caching the left most node as done in rb_tree, but I 
> thought of taking an easy way to have things working and delaying
> the 
> Linux rb_tree caching variant to be ported in next patch or so.
>
That is fine, as long as the fact that you are not doing it right now,
but are planning to do it in another patch, is stated somewhere (e.g.,
the cover letter or a changelog).

> > I would suggest we do not try to use the rb_*_cached() functions,
> > and
> > cache rightmost explicitly in runqueue_data.
> 
> Ok, that sounds better, I will introduce an entry for rightmost
> element 
> to be cached in runqueue_data
> Also, lets port the Linux rb_tree cache variant as well ( probably
> in 
> future we may use that ).
>
I'm not sure about this last part. I mean, I can see other uses of rb-
trees, TBH, and ones where such caching would be useful. Still, I'd do
the port when we actually decide to use the new functionality (or
when, e.g., we run into issues retro-fitting a Linux fix, etc).

> > Err... is it? Isn't the leftmost element the one with the _least_
> > credits? It looks to me that we want rb_last().
> > 
> > And IAC, we don't want to have to traverse the tree to get the
> > runnable
> > vcpu with the highest credit, we want it available in O(1) time.
> > 
> > That's why we want to cache it.
> 
> Yes, it looks like an error. Will update the patch in v2.
>
Right. So, I think the main problem with this patch was this one, i.e.,
the fact that the runqueue was sorted in the wrong order.

Then there is the lack of caching, for O(1) access to the head of the
runqueue itself. As said, it is ok for that to come in its own patch of
this series. It is also ok if this comes as a later patch, as long as
that is clearly stated.
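
(Purely as an illustration of the ordering point: if we went with
leftmost caching instead, the fix would be to invert the comparison in
the insert path, so that the tree is kept in descending credit order:

    /* hypothetical: leftmost node = highest credit */
    if ( svc->credit > entry->credit )
        node = &parent->rb_left;
    else
        node = &parent->rb_right;

With the current ascending order, it's the rightmost that must be
cached, as discussed.)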

> Sure, let me have 3 series, first; Linux porting , second; rb_tree 
> changes which doesn't have caching and third; have rightmost node
> cached.
> 
I'd actually skip doing the porting right now... Although, in this
case, it's not really my call, and others may have a different opinion.

Regards,
Dario
-- 
<<This happens because I choose it to happen!>> (Raistlin Majere)
-----------------------------------------------------------------
Dario Faggioli, Ph.D, http://about.me/dario.faggioli
Software Engineer @ SUSE https://www.suse.com/


* Re: [RFC PATCH 1/1] xen: credit2: rb-tree for runqueues
@ 2018-04-24 13:34         ` Praveen Kumar
From: Praveen Kumar @ 2018-04-24 13:34 UTC
  To: Dario Faggioli, xen-devel; +Cc: george.dunlap

On Tuesday 24 April 2018 04:33 PM, Dario Faggioli wrote:
> On Tue, 2018-04-24 at 14:30 +0530, Praveen Kumar wrote:
>> Hi Dario,
>>
> Hi!
> 
>> On Tuesday 17 April 2018 04:16 PM, Dario Faggioli wrote:
>>> On Tue, 2018-04-03 at 22:25 +0530, Praveen Kumar wrote:
>>>>
>>>> +        if ( svc->credit < entry->credit )
>>>> +            node = &parent->rb_left;
>>>> +        else
>>>> +            node = &parent->rb_right;
>>>> +
>>>> +        (*pos)++;
>>>> +    }
>>>> +    rb_link_node(&svc->runq_elem, parent, node);
>>>> +    rb_insert_color(&svc->runq_elem, root);
>>>> +}
>>>>
>>> Wait, where's the part where we cache which element is the one with
>>> the
>>> highest credits? (And the same applies to the tree-removal
>>> function, of
>>> course.)
>>>
>>> In fact, we need a field for storing such a cache in the runqueue
>>> data
>>> structure as well, and we need to keep it updated.
>>
>> I thought of caching the left most node as done in rb_tree, but I
>> thought of taking an easy way to have things working and delaying
>> the
>> Linux rb_tree caching variant to be ported in next patch or so.
>>
> That is fine, as soon as the fact that you are not doing it right now,
> but planning to do it in another patch is stated somewhere (e.g., cover
> letter or a changelog).
> 
>>> I would suggest we do not try to use the rb_*_cached() functions,
>>> and
>>> cache rightmost explicitly in runqueue_data.
>>
>> Ok, that sounds better, I will introduce an entry for rightmost
>> element
>> to be cached in runqueue_data
>> Also, lets port the Linux rb_tree cache variant as well ( probably
>> in
>> future we may use that ).
>>
> I'm not sure about this last part. I mean, I can see other uses of rb-
> trees, TBH, and ones where such caching would be useful. Still, I'll do
> the port when we actually decide to use the new functionallity (or
> when, e.g., we run into issues retro-fitting a Linux fix, etc).
> 

Sure, I am fine with that too.

>>> Err... is it? Isn't the leftmost element the one with the _least_
>>> credits? It looks to me that we want rb_last().
>>>
>>> And IAC, we don't want to have to traverse the tree to get the
>>> runnable
>>> vcpu with the highest credit, we want it available in O(1) time.
>>>
>>> That's why we want to cache it.
>>
>> Yes, it looks like an error. Will update the patch in v2.
>>
> Right. So, I think the main problem with this patch was this one, i.e.,
> the fact that the runqueue was sorted in the wrong order.
> 
> Then there is the lack of caching, for O(1) access to the head of the
> runqueue itself. As said, it is ok for that to come in its own patch of
> this series. It is also ok if this comes as a later patch, as soon as
> that is clearly stated.
> 
Ok. I am working on that.

>> Sure, let me have 3 series, first; Linux porting , second; rb_tree
>> changes which doesn't have caching and third; have rightmost node
>> cached.
>>
> I'd actually skip doing the porting right now... Although, in this
> case, it's not really my call, and others may have different a opinion.
> 
Sure, I am fine with not including the porting work as part of this
patch. We can take it up as a separate activity, as you mentioned
above.

Regards,

~Praveen.

