All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH v2 2/2] sched_credit2.c: runqueue_per_core code
@ 2015-03-13 18:11 Uma Sharma
  2015-03-13 18:29 ` Andrew Cooper
                   ` (2 more replies)
  0 siblings, 3 replies; 25+ messages in thread
From: Uma Sharma @ 2015-03-13 18:11 UTC (permalink / raw)
  To: xen-devel; +Cc: dario.faggioli, George.Dunlap, JBeulich

This patch do the following things:
-Insertion of runqueue_per_core code
-Boot paarmeter creation to select runqueue
-Update of xen-command-line.markdown
 
Signed-off-by : Uma Sharma <uma.sharma523@gmail.com>
---
 docs/misc/xen-command-line.markdown |  7 +++++++
 xen/common/sched_credit2.c          | 31 ++++++++++++++++++++++++-------
 2 files changed, 31 insertions(+), 7 deletions(-)

diff --git a/docs/misc/xen-command-line.markdown b/docs/misc/xen-command-line.markdown
index 63871cb..12e5c01 100644
--- a/docs/misc/xen-command-line.markdown
+++ b/docs/misc/xen-command-line.markdown
@@ -460,6 +460,13 @@ combination with the `low_crashinfo` command line option.
 ### credit2\_load\_window\_shift
 > `= <integer>`
 
+### credit2\_runqueue
+> `= core | socket`
+
+> Default: `credit2_runqueue = core`
+
+Choose the credit2 runqueue.
+
 ### dbgp
 > `= ehci[ <integer> | @pci<bus>:<slot>.<func> ]`
 
diff --git a/xen/common/sched_credit2.c b/xen/common/sched_credit2.c
index ad0a5d4..2067b3b 100644
--- a/xen/common/sched_credit2.c
+++ b/xen/common/sched_credit2.c
@@ -85,8 +85,8 @@
  * to a small value, and a fixed credit is added to everyone.
  *
  * The plan is for all cores that share an L2 will share the same
- * runqueue.  At the moment, there is one global runqueue for all
- * cores.
+ * runqueue.  At the moment, the code allows the user to choose runqueue
+ * to be used. Default used core runqueue.
  */
 
 /*
@@ -161,10 +161,16 @@
  */
 #define __CSFLAG_runq_migrate_request 3
 #define CSFLAG_runq_migrate_request (1<<__CSFLAG_runq_migrate_request)
-
+/* CREDIT2_OPT_RUNQUEUE: Used to define the runqueue used
+ */
+#define CREDIT2_OPT_RUNQUEUE_CORE 1
+#define CREDIT2_OPT_RUNQUEUE_SOCKET 2
 
 int opt_migrate_resist=500;
 integer_param("sched_credit2_migrate_resist", opt_migrate_resist);
+char __initdata opt_credit2_runqueue_string[10] = "core";
+string_param("credit2_runqueue", opt_credit2_runqueue_string);
+int opt_credit2_runqueue = CREDIT2_OPT_RUNQUEUE_CORE;
 
 /*
  * Useful macros
@@ -1940,10 +1946,10 @@ static void init_pcpu(const struct scheduler *ops, int cpu)
 
     /* Figure out which runqueue to put it in */
     /* NB: cpu 0 doesn't get a STARTING callback, so we hard-code it to runqueue 0. */
-    if ( cpu == 0 )
-        rqi = 0;
+    if ( opt_credit2_runqueue == CREDIT2_OPT_RUNQUEUE_SOCKET )
+        rqi = cpu ? cpu_to_socket(cpu) : boot_cpu_to_socket();
     else
-        rqi = cpu_to_socket(cpu);
+        rqi = cpu ? cpu_to_core(cpu) : boot_cpu_to_core();
 
     if ( rqi < 0 )
     {
@@ -1988,7 +1994,7 @@ csched2_alloc_pdata(const struct scheduler *ops, int cpu)
 {
     /* Check to see if the cpu is online yet */
     /* Note: cpu 0 doesn't get a STARTING callback */
-    if ( cpu == 0 || cpu_to_socket(cpu) >= 0 )
+    if ( cpu == 0 || cpu_to_socket(cpu) >= 0 || cpu_to_core(cpu) >= 0 )
         init_pcpu(ops, cpu);
     else
         printk("%s: cpu %d not online yet, deferring initializatgion\n",
@@ -2109,6 +2115,17 @@ csched2_init(struct scheduler *ops)
         opt_load_window_shift = LOADAVG_WINDOW_SHIFT_MIN;
     }
 
+    /* Defines the runqueue used. */
+    if ( !strcmp(opt_credit2_runqueue_string, "socket") )
+        opt_credit2_runqueue = CREDIT2_OPT_RUNQUEUE_SOCKET;
+    else if ( strcmp(opt_credit2_runqueue_string, "core") )
+        printk("WARNING, unrecognized credit2_runqueue option %s, using core\n",
+                opt_credit2_runqueue_string);
+
+    printk("Runqueue: Using %s\n",
+            opt_credit2_runqueue == CREDIT2_OPT_RUNQUEUE_CORE ? "core" :
+            opt_credit2_runqueue_string);
+
     /* Basically no CPU information is available at this point; just
      * set up basic structures, and a callback when the CPU info is
      * available. */
-- 
1.9.1

^ permalink raw reply related	[flat|nested] 25+ messages in thread

* Re: [PATCH v2 2/2] sched_credit2.c: runqueue_per_core code
  2015-03-13 18:11 [PATCH v2 2/2] sched_credit2.c: runqueue_per_core code Uma Sharma
@ 2015-03-13 18:29 ` Andrew Cooper
  2015-03-13 19:13   ` George Dunlap
  2015-03-16 12:45 ` Jan Beulich
  2015-03-16 12:49 ` Jan Beulich
  2 siblings, 1 reply; 25+ messages in thread
From: Andrew Cooper @ 2015-03-13 18:29 UTC (permalink / raw)
  To: Uma Sharma, xen-devel; +Cc: dario.faggioli, George.Dunlap, JBeulich

On 13/03/15 18:11, Uma Sharma wrote:
> This patch do the following things:
> -Insertion of runqueue_per_core code
> -Boot paarmeter creation to select runqueue
> -Update of xen-command-line.markdown
>  
> Signed-off-by : Uma Sharma <uma.sharma523@gmail.com>
> ---
>  docs/misc/xen-command-line.markdown |  7 +++++++
>  xen/common/sched_credit2.c          | 31 ++++++++++++++++++++++++-------
>  2 files changed, 31 insertions(+), 7 deletions(-)
>
> diff --git a/docs/misc/xen-command-line.markdown b/docs/misc/xen-command-line.markdown
> index 63871cb..12e5c01 100644
> --- a/docs/misc/xen-command-line.markdown
> +++ b/docs/misc/xen-command-line.markdown
> @@ -460,6 +460,13 @@ combination with the `low_crashinfo` command line option.
>  ### credit2\_load\_window\_shift
>  > `= <integer>`
>  
> +### credit2\_runqueue
> +> `= core | socket`
> +
> +> Default: `credit2_runqueue = core`
> +
> +Choose the credit2 runqueue.

This is not a helpful description.  I can work out it allows me to
choose the credit2 runqueue simply given the parameters name.

This description should explain what `core` and `socket` mean, and what
behavioural different it has for the scheduler.

> +
>  ### dbgp
>  > `= ehci[ <integer> | @pci<bus>:<slot>.<func> ]`
>  
> diff --git a/xen/common/sched_credit2.c b/xen/common/sched_credit2.c
> index ad0a5d4..2067b3b 100644
> --- a/xen/common/sched_credit2.c
> +++ b/xen/common/sched_credit2.c
> @@ -85,8 +85,8 @@
>   * to a small value, and a fixed credit is added to everyone.
>   *
>   * The plan is for all cores that share an L2 will share the same
> - * runqueue.  At the moment, there is one global runqueue for all
> - * cores.
> + * runqueue.  At the moment, the code allows the user to choose runqueue
> + * to be used. Default used core runqueue.
>   */
>  
>  /*
> @@ -161,10 +161,16 @@
>   */
>  #define __CSFLAG_runq_migrate_request 3
>  #define CSFLAG_runq_migrate_request (1<<__CSFLAG_runq_migrate_request)
> -
> +/* CREDIT2_OPT_RUNQUEUE: Used to define the runqueue used
> + */

Xen coding style is either

/* $TEXT */

or

/*
 * $TEXT
 */

I suggest the former.

> +#define CREDIT2_OPT_RUNQUEUE_CORE 1
> +#define CREDIT2_OPT_RUNQUEUE_SOCKET 2

You can drop _OPT out of the name.  It serves only to make the constant
longer.

Also, this should probably be an enum as the exact numbers themselves
are not interesting.

>  
>  int opt_migrate_resist=500;
>  integer_param("sched_credit2_migrate_resist", opt_migrate_resist);
> +char __initdata opt_credit2_runqueue_string[10] = "core";
> +string_param("credit2_runqueue", opt_credit2_runqueue_string);
> +int opt_credit2_runqueue = CREDIT2_OPT_RUNQUEUE_CORE;
>  
>  /*
>   * Useful macros
> @@ -1940,10 +1946,10 @@ static void init_pcpu(const struct scheduler *ops, int cpu)
>  
>      /* Figure out which runqueue to put it in */
>      /* NB: cpu 0 doesn't get a STARTING callback, so we hard-code it to runqueue 0. */
> -    if ( cpu == 0 )
> -        rqi = 0;
> +    if ( opt_credit2_runqueue == CREDIT2_OPT_RUNQUEUE_SOCKET )
> +        rqi = cpu ? cpu_to_socket(cpu) : boot_cpu_to_socket();

This conditional is bogus.  If cpu0 is offlined and re-onlined, it must
use cpu_to_core()

This entire hunk should probably be

rqi = (opt_credit2_runqueue == CREDIT2_OPT_RUNQUEUE_SOCKET) ?
cpu_to_socket(cpu) : cpu_to_core(cpu);

(with suitable alignment)

~Andrew

>      else
> -        rqi = cpu_to_socket(cpu);
> +        rqi = cpu ? cpu_to_core(cpu) : boot_cpu_to_core();
>  
>      if ( rqi < 0 )
>      {
> @@ -1988,7 +1994,7 @@ csched2_alloc_pdata(const struct scheduler *ops, int cpu)
>  {
>      /* Check to see if the cpu is online yet */
>      /* Note: cpu 0 doesn't get a STARTING callback */
> -    if ( cpu == 0 || cpu_to_socket(cpu) >= 0 )
> +    if ( cpu == 0 || cpu_to_socket(cpu) >= 0 || cpu_to_core(cpu) >= 0 )
>          init_pcpu(ops, cpu);
>      else
>          printk("%s: cpu %d not online yet, deferring initializatgion\n",
> @@ -2109,6 +2115,17 @@ csched2_init(struct scheduler *ops)
>          opt_load_window_shift = LOADAVG_WINDOW_SHIFT_MIN;
>      }
>  
> +    /* Defines the runqueue used. */
> +    if ( !strcmp(opt_credit2_runqueue_string, "socket") )
> +        opt_credit2_runqueue = CREDIT2_OPT_RUNQUEUE_SOCKET;
> +    else if ( strcmp(opt_credit2_runqueue_string, "core") )
> +        printk("WARNING, unrecognized credit2_runqueue option %s, using core\n",
> +                opt_credit2_runqueue_string);
> +
> +    printk("Runqueue: Using %s\n",
> +            opt_credit2_runqueue == CREDIT2_OPT_RUNQUEUE_CORE ? "core" :
> +            opt_credit2_runqueue_string);
> +
>      /* Basically no CPU information is available at this point; just
>       * set up basic structures, and a callback when the CPU info is
>       * available. */

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH v2 2/2] sched_credit2.c: runqueue_per_core code
  2015-03-13 18:29 ` Andrew Cooper
@ 2015-03-13 19:13   ` George Dunlap
  2015-03-16 12:48     ` Jan Beulich
  0 siblings, 1 reply; 25+ messages in thread
From: George Dunlap @ 2015-03-13 19:13 UTC (permalink / raw)
  To: Andrew Cooper, Uma Sharma, xen-devel
  Cc: dario.faggioli, George.Dunlap, JBeulich

On 03/13/2015 06:29 PM, Andrew Cooper wrote:

>> +#define CREDIT2_OPT_RUNQUEUE_CORE 1
>> +#define CREDIT2_OPT_RUNQUEUE_SOCKET 2
> 
> You can drop _OPT out of the name.  It serves only to make the constant
> longer.

I suggested _OPT so that nobody would be confused into thinking they
were used in the core mechanism.

> Also, this should probably be an enum as the exact numbers themselves
> are not interesting.

I also suggested #defines, since there are only 2, and the rest of the
code doesn't really use enums.

> 
>>  
>>  int opt_migrate_resist=500;
>>  integer_param("sched_credit2_migrate_resist", opt_migrate_resist);
>> +char __initdata opt_credit2_runqueue_string[10] = "core";
>> +string_param("credit2_runqueue", opt_credit2_runqueue_string);
>> +int opt_credit2_runqueue = CREDIT2_OPT_RUNQUEUE_CORE;
>>  
>>  /*
>>   * Useful macros
>> @@ -1940,10 +1946,10 @@ static void init_pcpu(const struct scheduler *ops, int cpu)
>>  
>>      /* Figure out which runqueue to put it in */
>>      /* NB: cpu 0 doesn't get a STARTING callback, so we hard-code it to runqueue 0. */
>> -    if ( cpu == 0 )
>> -        rqi = 0;
>> +    if ( opt_credit2_runqueue == CREDIT2_OPT_RUNQUEUE_SOCKET )
>> +        rqi = cpu ? cpu_to_socket(cpu) : boot_cpu_to_socket();
> 
> This conditional is bogus.  If cpu0 is offlined and re-onlined, it must
> use cpu_to_core()
> 
> This entire hunk should probably be
> 
> rqi = (opt_credit2_runqueue == CREDIT2_OPT_RUNQUEUE_SOCKET) ?
> cpu_to_socket(cpu) : cpu_to_core(cpu);
> 
> (with suitable alignment)

You're ignoring the fact that she's following suit from existing code;
and that that code is there for a reason: When this is first called for
cpu 0, cpu_to_socket() (and cpu_to_core()) return garbage since they
haven't been initialized yet.

That is something that needs to be fixed, but it's not Uma's job to fix it.

 -George

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH v2 2/2] sched_credit2.c: runqueue_per_core code
  2015-03-13 18:11 [PATCH v2 2/2] sched_credit2.c: runqueue_per_core code Uma Sharma
  2015-03-13 18:29 ` Andrew Cooper
@ 2015-03-16 12:45 ` Jan Beulich
  2015-03-16 12:49 ` Jan Beulich
  2 siblings, 0 replies; 25+ messages in thread
From: Jan Beulich @ 2015-03-16 12:45 UTC (permalink / raw)
  To: Uma Sharma; +Cc: dario.faggioli, George.Dunlap, xen-devel

> @@ -161,10 +161,16 @@
>   */
>  #define __CSFLAG_runq_migrate_request 3
>  #define CSFLAG_runq_migrate_request (1<<__CSFLAG_runq_migrate_request)
> -
> +/* CREDIT2_OPT_RUNQUEUE: Used to define the runqueue used
> + */
> +#define CREDIT2_OPT_RUNQUEUE_CORE 1
> +#define CREDIT2_OPT_RUNQUEUE_SOCKET 2
>  
>  int opt_migrate_resist=500;
>  integer_param("sched_credit2_migrate_resist", opt_migrate_resist);
> +char __initdata opt_credit2_runqueue_string[10] = "core";
> +string_param("credit2_runqueue", opt_credit2_runqueue_string);
> +int opt_credit2_runqueue = CREDIT2_OPT_RUNQUEUE_CORE;

static (as said before - please don't have people repeat comments
given on earlier versions). Also placing opt_credit2_runqueue_string[]
into .init.data is incompatible with accessing it ...

> @@ -2109,6 +2115,17 @@ csched2_init(struct scheduler *ops)
>          opt_load_window_shift = LOADAVG_WINDOW_SHIFT_MIN;
>      }
>  
> +    /* Defines the runqueue used. */
> +    if ( !strcmp(opt_credit2_runqueue_string, "socket") )
> +        opt_credit2_runqueue = CREDIT2_OPT_RUNQUEUE_SOCKET;
> +    else if ( strcmp(opt_credit2_runqueue_string, "core") )
> +        printk("WARNING, unrecognized credit2_runqueue option %s, using core\n",
> +                opt_credit2_runqueue_string);

... here. Rather than simply dropping the __initdata, making this
a custom_param() would seem the right course of action (allowing
the char array to be dropped altogether).

Jan

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH v2 2/2] sched_credit2.c: runqueue_per_core code
  2015-03-13 19:13   ` George Dunlap
@ 2015-03-16 12:48     ` Jan Beulich
  2015-03-16 12:51       ` George Dunlap
  0 siblings, 1 reply; 25+ messages in thread
From: Jan Beulich @ 2015-03-16 12:48 UTC (permalink / raw)
  To: George Dunlap
  Cc: Andrew Cooper, dario.faggioli, xen-devel, George.Dunlap, Uma Sharma

>>> On 13.03.15 at 20:13, <george.dunlap@eu.citrix.com> wrote:
> On 03/13/2015 06:29 PM, Andrew Cooper wrote:
>>> @@ -1940,10 +1946,10 @@ static void init_pcpu(const struct scheduler *ops, int cpu)
>>>  
>>>      /* Figure out which runqueue to put it in */
>>>      /* NB: cpu 0 doesn't get a STARTING callback, so we hard-code it to runqueue 0. */
>>> -    if ( cpu == 0 )
>>> -        rqi = 0;
>>> +    if ( opt_credit2_runqueue == CREDIT2_OPT_RUNQUEUE_SOCKET )
>>> +        rqi = cpu ? cpu_to_socket(cpu) : boot_cpu_to_socket();
>> 
>> This conditional is bogus.  If cpu0 is offlined and re-onlined, it must
>> use cpu_to_core()
>> 
>> This entire hunk should probably be
>> 
>> rqi = (opt_credit2_runqueue == CREDIT2_OPT_RUNQUEUE_SOCKET) ?
>> cpu_to_socket(cpu) : cpu_to_core(cpu);
>> 
>> (with suitable alignment)
> 
> You're ignoring the fact that she's following suit from existing code;
> and that that code is there for a reason: When this is first called for
> cpu 0, cpu_to_socket() (and cpu_to_core()) return garbage since they
> haven't been initialized yet.
> 
> That is something that needs to be fixed, but it's not Uma's job to fix it.

Them returning garbage isn't what needs fixing. Instead the code
here should use a different condition to check whether this is the
boot CPU (e.g. looking at system_state). And that can very well be
done directly in this patch.

Jan

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH v2 2/2] sched_credit2.c: runqueue_per_core code
  2015-03-13 18:11 [PATCH v2 2/2] sched_credit2.c: runqueue_per_core code Uma Sharma
  2015-03-13 18:29 ` Andrew Cooper
  2015-03-16 12:45 ` Jan Beulich
@ 2015-03-16 12:49 ` Jan Beulich
  2 siblings, 0 replies; 25+ messages in thread
From: Jan Beulich @ 2015-03-16 12:49 UTC (permalink / raw)
  To: Uma Sharma; +Cc: dario.faggioli, George.Dunlap, xen-devel

>>> On 13.03.15 at 19:11, <uma.sharma523@gmail.com> wrote:
> @@ -1940,10 +1946,10 @@ static void init_pcpu(const struct scheduler *ops, int cpu)
>  
>      /* Figure out which runqueue to put it in */
>      /* NB: cpu 0 doesn't get a STARTING callback, so we hard-code it to runqueue 0. */
> -    if ( cpu == 0 )
> -        rqi = 0;
> +    if ( opt_credit2_runqueue == CREDIT2_OPT_RUNQUEUE_SOCKET )
> +        rqi = cpu ? cpu_to_socket(cpu) : boot_cpu_to_socket();
>      else
> -        rqi = cpu_to_socket(cpu);
> +        rqi = cpu ? cpu_to_core(cpu) : boot_cpu_to_core();

Oh, and btw - does this build on ARM at all?

Jan

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH v2 2/2] sched_credit2.c: runqueue_per_core code
  2015-03-16 12:48     ` Jan Beulich
@ 2015-03-16 12:51       ` George Dunlap
  2015-03-16 12:56         ` Jan Beulich
  0 siblings, 1 reply; 25+ messages in thread
From: George Dunlap @ 2015-03-16 12:51 UTC (permalink / raw)
  To: Jan Beulich
  Cc: Andrew Cooper, dario.faggioli, xen-devel, George.Dunlap, Uma Sharma

On 03/16/2015 12:48 PM, Jan Beulich wrote:
>>>> On 13.03.15 at 20:13, <george.dunlap@eu.citrix.com> wrote:
>> On 03/13/2015 06:29 PM, Andrew Cooper wrote:
>>>> @@ -1940,10 +1946,10 @@ static void init_pcpu(const struct scheduler *ops, int cpu)
>>>>  
>>>>      /* Figure out which runqueue to put it in */
>>>>      /* NB: cpu 0 doesn't get a STARTING callback, so we hard-code it to runqueue 0. */
>>>> -    if ( cpu == 0 )
>>>> -        rqi = 0;
>>>> +    if ( opt_credit2_runqueue == CREDIT2_OPT_RUNQUEUE_SOCKET )
>>>> +        rqi = cpu ? cpu_to_socket(cpu) : boot_cpu_to_socket();
>>>
>>> This conditional is bogus.  If cpu0 is offlined and re-onlined, it must
>>> use cpu_to_core()
>>>
>>> This entire hunk should probably be
>>>
>>> rqi = (opt_credit2_runqueue == CREDIT2_OPT_RUNQUEUE_SOCKET) ?
>>> cpu_to_socket(cpu) : cpu_to_core(cpu);
>>>
>>> (with suitable alignment)
>>
>> You're ignoring the fact that she's following suit from existing code;
>> and that that code is there for a reason: When this is first called for
>> cpu 0, cpu_to_socket() (and cpu_to_core()) return garbage since they
>> haven't been initialized yet.
>>
>> That is something that needs to be fixed, but it's not Uma's job to fix it.
> 
> Them returning garbage isn't what needs fixing. Instead the code
> here should use a different condition to check whether this is the
> boot CPU (e.g. looking at system_state). And that can very well be
> done directly in this patch.

What do you suggest, then?

 -George

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH v2 2/2] sched_credit2.c: runqueue_per_core code
  2015-03-16 12:51       ` George Dunlap
@ 2015-03-16 12:56         ` Jan Beulich
  2015-03-16 13:26           ` Dario Faggioli
  2015-03-17 18:18           ` Dario Faggioli
  0 siblings, 2 replies; 25+ messages in thread
From: Jan Beulich @ 2015-03-16 12:56 UTC (permalink / raw)
  To: George Dunlap
  Cc: Andrew Cooper, dario.faggioli, xen-devel, George.Dunlap, Uma Sharma

>>> On 16.03.15 at 13:51, <george.dunlap@eu.citrix.com> wrote:
> On 03/16/2015 12:48 PM, Jan Beulich wrote:
>>>>> On 13.03.15 at 20:13, <george.dunlap@eu.citrix.com> wrote:
>>> On 03/13/2015 06:29 PM, Andrew Cooper wrote:
>>>>> @@ -1940,10 +1946,10 @@ static void init_pcpu(const struct scheduler *ops, 
> int cpu)
>>>>>  
>>>>>      /* Figure out which runqueue to put it in */
>>>>>      /* NB: cpu 0 doesn't get a STARTING callback, so we hard-code it to 
> runqueue 0. */
>>>>> -    if ( cpu == 0 )
>>>>> -        rqi = 0;
>>>>> +    if ( opt_credit2_runqueue == CREDIT2_OPT_RUNQUEUE_SOCKET )
>>>>> +        rqi = cpu ? cpu_to_socket(cpu) : boot_cpu_to_socket();
>>>>
>>>> This conditional is bogus.  If cpu0 is offlined and re-onlined, it must
>>>> use cpu_to_core()
>>>>
>>>> This entire hunk should probably be
>>>>
>>>> rqi = (opt_credit2_runqueue == CREDIT2_OPT_RUNQUEUE_SOCKET) ?
>>>> cpu_to_socket(cpu) : cpu_to_core(cpu);
>>>>
>>>> (with suitable alignment)
>>>
>>> You're ignoring the fact that she's following suit from existing code;
>>> and that that code is there for a reason: When this is first called for
>>> cpu 0, cpu_to_socket() (and cpu_to_core()) return garbage since they
>>> haven't been initialized yet.
>>>
>>> That is something that needs to be fixed, but it's not Uma's job to fix it.
>> 
>> Them returning garbage isn't what needs fixing. Instead the code
>> here should use a different condition to check whether this is the
>> boot CPU (e.g. looking at system_state). And that can very well be
>> done directly in this patch.
> 
> What do you suggest, then?

My preferred solution would be, as said, to leverage system_state.
Provided the state to look for is consistent between x86 and ARM.

Jan

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH v2 2/2] sched_credit2.c: runqueue_per_core code
  2015-03-16 12:56         ` Jan Beulich
@ 2015-03-16 13:26           ` Dario Faggioli
  2015-03-17 18:18           ` Dario Faggioli
  1 sibling, 0 replies; 25+ messages in thread
From: Dario Faggioli @ 2015-03-16 13:26 UTC (permalink / raw)
  To: JBeulich; +Cc: Andrew Cooper, xen-devel, George Dunlap, uma.sharma523


[-- Attachment #1.1: Type: text/plain, Size: 673 bytes --]

On Mon, 2015-03-16 at 12:56 +0000, Jan Beulich wrote:
> >>> On 16.03.15 at 13:51, <george.dunlap@eu.citrix.com> wrote:
> > On 03/16/2015 12:48 PM, Jan Beulich wrote:
>  
> >> Them returning garbage isn't what needs fixing. Instead the code
> >> here should use a different condition to check whether this is the
> >> boot CPU (e.g. looking at system_state). And that can very well be
> >> done directly in this patch.
> > 
> > What do you suggest, then?
> 
> My preferred solution would be, as said, to leverage system_state.
> Provided the state to look for is consistent between x86 and ARM.
> 
Ok, I'll look into using that.

Thanks and Regards,
Dario

[-- Attachment #1.2: This is a digitally signed message part --]
[-- Type: application/pgp-signature, Size: 181 bytes --]

[-- Attachment #2: Type: text/plain, Size: 126 bytes --]

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH v2 2/2] sched_credit2.c: runqueue_per_core code
  2015-03-16 12:56         ` Jan Beulich
  2015-03-16 13:26           ` Dario Faggioli
@ 2015-03-17 18:18           ` Dario Faggioli
  2015-03-18  7:56             ` Jan Beulich
  1 sibling, 1 reply; 25+ messages in thread
From: Dario Faggioli @ 2015-03-17 18:18 UTC (permalink / raw)
  To: JBeulich; +Cc: Andrew Cooper, xen-devel, George Dunlap, uma.sharma523


[-- Attachment #1.1.1: Type: text/plain, Size: 2045 bytes --]

On Mon, 2015-03-16 at 12:56 +0000, Jan Beulich wrote:
> >>> On 16.03.15 at 13:51, <george.dunlap@eu.citrix.com> wrote:
> > On 03/16/2015 12:48 PM, Jan Beulich wrote:

> >> Them returning garbage isn't what needs fixing. Instead the code
> >> here should use a different condition to check whether this is the
> >> boot CPU (e.g. looking at system_state). And that can very well be
> >> done directly in this patch.
> > 
> > What do you suggest, then?
> 
> My preferred solution would be, as said, to leverage system_state.
> Provided the state to look for is consistent between x86 and ARM.
> 
Would something like this make sense?

diff --git a/xen/common/sched_credit2.c b/xen/common/sched_credit2.c
index cfca5a7..2f2aa73 100644
--- a/xen/common/sched_credit2.c
+++ b/xen/common/sched_credit2.c
@@ -1936,12 +1936,8 @@ static void init_pcpu(const struct scheduler *ops, int cpu)
     }
 
     /* Figure out which runqueue to put it in */
-    rqi = 0;
-
-    /* Figure out which runqueue to put it in */
-    /* NB: cpu 0 doesn't get a STARTING callback, so we hard-code it to runqueue 0. */
-    if ( cpu == 0 )
-        rqi = 0;
+    if ( system_state == SYS_STATE_boot )
+        rqi = boot_cpu_to_socket(cpu);
     else
         rqi = cpu_to_socket(cpu);
 
@@ -1986,9 +1982,13 @@ static void init_pcpu(const struct scheduler *ops, int cpu)
 static void *
 csched2_alloc_pdata(const struct scheduler *ops, int cpu)
 {
-    /* Check to see if the cpu is online yet */
-    /* Note: cpu 0 doesn't get a STARTING callback */
-    if ( cpu == 0 || cpu_to_socket(cpu) >= 0 )
+    /*
+     * Actual initialization is deferred to when the pCPU will be
+     * online, via a STARTING callback. The only exception is
+     * the boot cpu, which does not get such a notification, and
+     * hence needs to be taken care of here.
+     */
+    if ( system_state == SYS_STATE_boot )
         init_pcpu(ops, cpu);
     else
         printk("%s: cpu %d not online yet, deferring initializatgion\n",


[-- Attachment #1.1.2: credit2-system-state.patch --]
[-- Type: text/x-patch, Size: 1358 bytes --]

diff --git a/xen/common/sched_credit2.c b/xen/common/sched_credit2.c
index cfca5a7..2f2aa73 100644
--- a/xen/common/sched_credit2.c
+++ b/xen/common/sched_credit2.c
@@ -1936,12 +1936,8 @@ static void init_pcpu(const struct scheduler *ops, int cpu)
     }
 
     /* Figure out which runqueue to put it in */
-    rqi = 0;
-
-    /* Figure out which runqueue to put it in */
-    /* NB: cpu 0 doesn't get a STARTING callback, so we hard-code it to runqueue 0. */
-    if ( cpu == 0 )
-        rqi = 0;
+    if ( system_state == SYS_STATE_boot )
+        rqi = boot_cpu_to_socket(cpu);
     else
         rqi = cpu_to_socket(cpu);
 
@@ -1986,9 +1982,13 @@ static void init_pcpu(const struct scheduler *ops, int cpu)
 static void *
 csched2_alloc_pdata(const struct scheduler *ops, int cpu)
 {
-    /* Check to see if the cpu is online yet */
-    /* Note: cpu 0 doesn't get a STARTING callback */
-    if ( cpu == 0 || cpu_to_socket(cpu) >= 0 )
+    /*
+     * Actual initialization is deferred to when the pCPU will be
+     * online, via a STARTING callback. The only exception is
+     * the boot cpu, which does not get such a notification, and
+     * hence needs to be taken care of here.
+     */
+    if ( system_state == SYS_STATE_boot )
         init_pcpu(ops, cpu);
     else
         printk("%s: cpu %d not online yet, deferring initializatgion\n",

[-- Attachment #1.2: This is a digitally signed message part --]
[-- Type: application/pgp-signature, Size: 181 bytes --]

[-- Attachment #2: Type: text/plain, Size: 126 bytes --]

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

^ permalink raw reply related	[flat|nested] 25+ messages in thread

* Re: [PATCH v2 2/2] sched_credit2.c: runqueue_per_core code
  2015-03-17 18:18           ` Dario Faggioli
@ 2015-03-18  7:56             ` Jan Beulich
  2015-03-18  8:53               ` Dario Faggioli
  0 siblings, 1 reply; 25+ messages in thread
From: Jan Beulich @ 2015-03-18  7:56 UTC (permalink / raw)
  To: Dario Faggioli; +Cc: Andrew Cooper, xen-devel, George Dunlap, uma.sharma523

>>> On 17.03.15 at 19:18, <dario.faggioli@citrix.com> wrote:
> On Mon, 2015-03-16 at 12:56 +0000, Jan Beulich wrote:
>> >>> On 16.03.15 at 13:51, <george.dunlap@eu.citrix.com> wrote:
>> > On 03/16/2015 12:48 PM, Jan Beulich wrote:
>> My preferred solution would be, as said, to leverage system_state.
>> Provided the state to look for is consistent between x86 and ARM.
>> 
> Would something like this make sense?

Yes, albeit ...

> --- a/xen/common/sched_credit2.c
> +++ b/xen/common/sched_credit2.c
> @@ -1936,12 +1936,8 @@ static void init_pcpu(const struct scheduler *ops, int cpu)
>      }
>  
>      /* Figure out which runqueue to put it in */
> -    rqi = 0;
> -
> -    /* Figure out which runqueue to put it in */
> -    /* NB: cpu 0 doesn't get a STARTING callback, so we hard-code it to runqueue 0. */
> -    if ( cpu == 0 )
> -        rqi = 0;
> +    if ( system_state == SYS_STATE_boot )

... I'd suggest using <= SYS_STATE_boot or < SYS_STATE_smp_boot.

> @@ -1986,9 +1982,13 @@ static void init_pcpu(const struct scheduler *ops, int cpu)
>  static void *
>  csched2_alloc_pdata(const struct scheduler *ops, int cpu)
>  {
> -    /* Check to see if the cpu is online yet */
> -    /* Note: cpu 0 doesn't get a STARTING callback */
> -    if ( cpu == 0 || cpu_to_socket(cpu) >= 0 )
> +    /*
> +     * Actual initialization is deferred to when the pCPU will be
> +     * online, via a STARTING callback. The only exception is
> +     * the boot cpu, which does not get such a notification, and
> +     * hence needs to be taken care of here.
> +     */
> +    if ( system_state == SYS_STATE_boot )
>          init_pcpu(ops, cpu);

Same here, plus the new condition at the first glance isn't matching
the old one, but perhaps that's intentional.

Jan

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH v2 2/2] sched_credit2.c: runqueue_per_core code
  2015-03-18  7:56             ` Jan Beulich
@ 2015-03-18  8:53               ` Dario Faggioli
  2015-03-18 15:26                 ` George Dunlap
  0 siblings, 1 reply; 25+ messages in thread
From: Dario Faggioli @ 2015-03-18  8:53 UTC (permalink / raw)
  To: JBeulich; +Cc: Andrew Cooper, xen-devel, George Dunlap, uma.sharma523


[-- Attachment #1.1: Type: text/plain, Size: 2744 bytes --]

On Wed, 2015-03-18 at 07:56 +0000, Jan Beulich wrote:
> >>> On 17.03.15 at 19:18, <dario.faggioli@citrix.com> wrote:

> > --- a/xen/common/sched_credit2.c
> > +++ b/xen/common/sched_credit2.c
> > @@ -1936,12 +1936,8 @@ static void init_pcpu(const struct scheduler *ops, int cpu)
> >      }
> >  
> >      /* Figure out which runqueue to put it in */
> > -    rqi = 0;
> > -
> > -    /* Figure out which runqueue to put it in */
> > -    /* NB: cpu 0 doesn't get a STARTING callback, so we hard-code it to runqueue 0. */
> > -    if ( cpu == 0 )
> > -        rqi = 0;
> > +    if ( system_state == SYS_STATE_boot )
> 
> ... I'd suggest using <= SYS_STATE_boot or < SYS_STATE_smp_boot.
> 
Ok.

> > @@ -1986,9 +1982,13 @@ static void init_pcpu(const struct scheduler *ops, int cpu)
> >  static void *
> >  csched2_alloc_pdata(const struct scheduler *ops, int cpu)
> >  {
> > -    /* Check to see if the cpu is online yet */
> > -    /* Note: cpu 0 doesn't get a STARTING callback */
> > -    if ( cpu == 0 || cpu_to_socket(cpu) >= 0 )
> > +    /*
> > +     * Actual initialization is deferred to when the pCPU will be
> > +     * online, via a STARTING callback. The only exception is
> > +     * the boot cpu, which does not get such a notification, and
> > +     * hence needs to be taken care of here.
> > +     */
> > +    if ( system_state == SYS_STATE_boot )
> >          init_pcpu(ops, cpu);
> 
> Same here, plus the new condition at the first glance isn't matching
> the old one, but perhaps that's intentional.
> 
It is intentional, and that is why we're changing it! :-) Let me try to
explain:

The idea, in both old and new code, is to call init_pcpu() either:
 a) on the boot cpu, during boot
 b) on non-boot cpu, *only* if they are online.

The issue is that, for assessing b), it checks whether
cpu_to_socket(cpu) returns >=0 and, if it does, it assumes the cpu is
online. That is not true, as cpu_data.phys_proc_id (which is what
cpu_to_socket() go reading) is 0 even before any onlining and topolog
identification has happened (it's a global).

Therefore, all the pcpus are initialized right away, and all are
assigned to runqueue 0, as returned by cpu_to_socket() at this stage,
which is clearly wrong.

In fact, this is the reason why, previous versions of this took the
approach of initializing things such as cpu_to_socket() returned -1
before topology identification.

In the new version, as you suggested, I'm using system_state to figure
out whether we are dealing with the boot cpu, and that's it. :-)

Hope this clarifies things... If yes, I'll make sure to put a similar
explanation in the changelog, when sending the patch officially.

Regards,
Dario

[-- Attachment #1.2: This is a digitally signed message part --]
[-- Type: application/pgp-signature, Size: 181 bytes --]

[-- Attachment #2: Type: text/plain, Size: 126 bytes --]

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH v2 2/2] sched_credit2.c: runqueue_per_core code
  2015-03-18  8:53               ` Dario Faggioli
@ 2015-03-18 15:26                 ` George Dunlap
  2015-03-18 15:59                   ` Jan Beulich
  2015-03-18 16:49                   ` Dario Faggioli
  0 siblings, 2 replies; 25+ messages in thread
From: George Dunlap @ 2015-03-18 15:26 UTC (permalink / raw)
  To: Dario Faggioli, JBeulich
  Cc: Andrew Cooper, xen-devel, George Dunlap, uma.sharma523

On 03/18/2015 08:53 AM, Dario Faggioli wrote:
>>> @@ -1986,9 +1982,13 @@ static void init_pcpu(const struct scheduler *ops, int cpu)
>>>  static void *
>>>  csched2_alloc_pdata(const struct scheduler *ops, int cpu)
>>>  {
>>> -    /* Check to see if the cpu is online yet */
>>> -    /* Note: cpu 0 doesn't get a STARTING callback */
>>> -    if ( cpu == 0 || cpu_to_socket(cpu) >= 0 )
>>> +    /*
>>> +     * Actual initialization is deferred to when the pCPU will be
>>> +     * online, via a STARTING callback. The only exception is
>>> +     * the boot cpu, which does not get such a notification, and
>>> +     * hence needs to be taken care of here.
>>> +     */
>>> +    if ( system_state == SYS_STATE_boot )
>>>          init_pcpu(ops, cpu);

I think this will break adding a cpu to a cpupool after boot, won't it?

Don't we want effectively, "if ( is_boot_cpu() ||
is_cpu_topology_set_up() )"?

is_cpu_topology_set_up() could be "system_state >= SYS_STATE_smp_boot" I
suppose?

Is "cpu == 0" the best test for "is_boot_cpu()", or is there another test?

>>
>> Same here, plus the new condition at the first glance isn't matching
>> the old one, but perhaps that's intentional.
>>
> It is intentional, and that is why we're changing it! :-) Let me try to
> explain:
> 
> The idea, in both old and new code, is to call init_pcpu() either:
>  a) on the boot cpu, during boot
>  b) on non-boot cpu, *only* if they are online.
> 
> The issue is that, for assessing b), it checks whether
> cpu_to_socket(cpu) returns >=0 and, if it does, it assumes the cpu is
> online. That is not true, as cpu_data.phys_proc_id (which is what
> cpu_to_socket() go reading) is 0 even before any onlining and topolog
> identification has happened (it's a global).
> 
> Therefore, all the pcpus are initialized right away, and all are
> assigned to runqueue 0, as returned by cpu_to_socket() at this stage,
> which is clearly wrong.
> 
> In fact, this is the reason why, previous versions of this took the
> approach of initializing things such as cpu_to_socket() returned -1
> before topology identification.
> 
> In the new version, as you suggested, I'm using system_state to figure
> out whether we are dealing with the boot cpu, and that's it. :-)
> 
> Hope this clarifies things... If yes, I'll make sure to put a similar
> explanation in the changelog, when sending the patch officially.

So let's step back for a minute, and let me see if I can describe the
situation.

Unlike the other schedulers, credit2 needs to know the topology of a
cpus when setting it up, so that we can place it in the proper runqueue.

Setting up the per-cpu structures would normally happen in alloc_pdata.
 This is called in three different ways:

* During boot, on the boot processer, alloc_pdata is called from
scheduler_init().

* During boot, on the secondary processors, alloc_pdata is called from
a CPU_UP_PREPARE callback, which happens just before the cpu is actually
brought up.

* When switching a cpu to a new cpupool after boot, alloc_pdata is also
called from cpu_schedule_switch().

The "normal" place the cpu topology information can be found is in
global the cpu_data[] array, typically accessed by the cpu_to_socket()
or cpu_to_core() macros.

This topology information is written to cpu_data[] by
smp_store_cpu_info().  For the boot cpu, it happens in
smp_prepare_cpus();  For secondary cpus, it's happens in
start_secondary(), which is the code run on the cpu itself as it's being
brought up.

Unfortunately, at the moment, both of these places are after the
respective places where the alloc pdata is called for those cpus.
Flattening the entire x86 setup at the moment you'd see:
  alloc_pdata for boot cpu
  smp_store_cpu_info for boot cpu
  for each secondary cpu
    alloc_pdata for secondary cpu
    smp_store_cpu_info for secondary cpu

scheduler_init() is called before smp_prepare_cpus() in part because of
a dependency chain elsewhere: we cannot set up the idle domain until
scheduler_init() is called; and other places further on in the
initialization but before setting up cpu topology information assume
that the idle domain has been set up.

I haven't actually tracked down this dependency yet; nor have I explored
the possibility of populating cpu_data[] for the boot cpu earier than
it's done now.

I'm not sure exactly why we call alloc_pdata on CPU_UP_PREPARE rather
than CPU_STARTING -- whether that's just what seemed most sensible when
cpupools were created, or whether there's a dependency somewhere between
those two points.

I *think* that there probably cannot be a dependency: no cpu boot code
can (or should) depend on the cpu having been initialized, because it
doesn't happen when you're booting credit2; and no scheduler code in
alloc_pdata can (or should) depend on being in a pre-initialized state,
because it also happens when assigning an already-initialized cpu to a
cpupool.

At the moment we deal with the fact that the topology information for a
CPU is not available on CPU_UP_PREPARE by setting a callback that will
call init_pcpu() on the CPU_STARTING action.

The boot cpu will not get such a callback; but information about the
topology of the boot cpu *is* available in boot_cpu_data[].

We can use system_state to detect whether we've been called from
scheduler_init (system_state < SYS_STATE_smp_boot), or whether we've
been called after the whole thing has been done (system_state >=
SYS_STATE_active).  But while system_state == SYS_STATE_smp_boot (which
is where both of the normal schedule callback and the csched2-specific
callback happen at the moment), we can't determine whether a given cpu's
cpu_data[] has been initialized yet.

So one thing we could do is this:

* In csched2_alloc_pdata(), call init_pcpu() if system_state <=
SYS_STATE_smp_boot (to catch scheduler_init) or system_state >=
SYS_STATE_active (to catch cpupool callbacks).

* Keep the credit2-specific callback on CPU_STARTING.  In
csched2_cpu_starting(), call init_pcpu().  (Not sure if this handles cpu
offlining / onlining properly or not.)

* In init_pcpu(), if system_state < SYS_STATE_smp_boot, assume that
we've been called from scheduler_init and use boot_cpu_data[].
Otherwise, assume that the calls from the CPU_UP_PREPARE callback have
been filtered out, and that we're being called either from CPU_STARTING
or from cpu_schedlue_switch(), and use cpu_data[].

Another thing we could explore is this:

* Change scheduler.c to call alloc_pdata on CPU_STARTING rather than
CPU_UP_PREPARE, and get rid of the csched2-specific callback.

* In csched2_alloc_pdata, always call init_pcpu().  (Or perhaps just
rename init_pcpu() to csched2_alloc_pdata().)

* In init_pcpu, if system_state < SYS_STATE_smp_boot, assume that we've
been called from scheduler_init and use boot_cpu_data[]; otherwise,
assume that we're being called either from CPU_STARTING, or from
cpu_schedule_switch(), and use cpu_data[].

If this would work, I think it would be a lot better.  It would remove
the credit2-specific callback, and it would mean we wouldn't need an
additional test to filter out

In both cases there's a slight risk in using system_state to determine
whether to rely on cpu_data[] or not, because there's actually a window
for each processor after system_state == SYS_STATE_smp_boot where
cpu_data[] is *not* initialized, but it's not obvious from looking at
the data itself.  If the callback mechanisms ever change order with the
cpu_data[] initializations in the future, we risk a situation where
credit2 silently regresses to using a single massive runqueue.

So I think that at least for debug builds, we should make some way to
determine whether a given cpu_data[] has been initialized, so that at
least we an ASSERT() that it has.  Setting the values to BAD_APICID
before system_state == SYS_STATE_smp_boot would be one way to do that;
having an "initialized" flag in the structure (which defaults to zero)
would be another.

 -George

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH v2 2/2] sched_credit2.c: runqueue_per_core code
  2015-03-18 15:26                 ` George Dunlap
@ 2015-03-18 15:59                   ` Jan Beulich
  2015-03-18 16:08                     ` George Dunlap
  2015-03-18 16:49                   ` Dario Faggioli
  1 sibling, 1 reply; 25+ messages in thread
From: Jan Beulich @ 2015-03-18 15:59 UTC (permalink / raw)
  To: George Dunlap
  Cc: Andrew Cooper, Dario Faggioli, xen-devel, George.Dunlap, uma.sharma523

>>> On 18.03.15 at 16:26, <george.dunlap@eu.citrix.com> wrote:
> In both cases there's a slight risk in using system_state to determine
> whether to rely on cpu_data[] or not, because there's actually a window
> for each processor after system_state == SYS_STATE_smp_boot where
> cpu_data[] is *not* initialized, but it's not obvious from looking at
> the data itself.  If the callback mechanisms ever change order with the
> cpu_data[] initializations in the future, we risk a situation where
> credit2 silently regresses to using a single massive runqueue.

But such an ordering change is extremely unlikely, since the
CPU_STARTING notification specifically tells you that the given
CPU is now ready for normal use, which includes it having its
topology related data set up.

Jan

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH v2 2/2] sched_credit2.c: runqueue_per_core code
  2015-03-18 15:59                   ` Jan Beulich
@ 2015-03-18 16:08                     ` George Dunlap
  2015-03-18 16:30                       ` Dario Faggioli
  0 siblings, 1 reply; 25+ messages in thread
From: George Dunlap @ 2015-03-18 16:08 UTC (permalink / raw)
  To: Jan Beulich
  Cc: Andrew Cooper, Dario Faggioli, xen-devel, George.Dunlap, uma.sharma523

On 03/18/2015 03:59 PM, Jan Beulich wrote:
>>>> On 18.03.15 at 16:26, <george.dunlap@eu.citrix.com> wrote:
>> In both cases there's a slight risk in using system_state to determine
>> whether to rely on cpu_data[] or not, because there's actually a window
>> for each processor after system_state == SYS_STATE_smp_boot where
>> cpu_data[] is *not* initialized, but it's not obvious from looking at
>> the data itself.  If the callback mechanisms ever change order with the
>> cpu_data[] initializations in the future, we risk a situation where
>> credit2 silently regresses to using a single massive runqueue.
> 
> But such an ordering change is extremely unlikely, since the
> CPU_STARTING notification specifically tells you that the given
> CPU is now ready for normal use, which includes it having its
> topology related data set up.

I didn't mean so much that CPU_STARTING might change meaning, but that
someone looking only at schedule.c might think that it would make sense
to call alloc_pdata() somewhere else, before cpu_data[] had been
populated.  Even if they look at init_pcpu() in credit2, they're
unlikely to know that cpu_to_socket() isn't valid until after
boot_secondary() has happened; and if they try it and boot it,
everything will seem to work perfectly for credit2 -- except that there
will be a single global runqueue.

If you, Dario and I don't happen to catch it -- or don't happen to
remember the exact details of the constraints -- nobody may notice until
a year later.

This is the point of having ASSERTs -- so that we don't have to worry
about remembering and catching all these crazy constraints.

 -George

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH v2 2/2] sched_credit2.c: runqueue_per_core code
  2015-03-18 16:08                     ` George Dunlap
@ 2015-03-18 16:30                       ` Dario Faggioli
  0 siblings, 0 replies; 25+ messages in thread
From: Dario Faggioli @ 2015-03-18 16:30 UTC (permalink / raw)
  Cc: Andrew Cooper, xen-devel, George Dunlap, JBeulich, uma.sharma523


[-- Attachment #1.1: Type: text/plain, Size: 1910 bytes --]

On Wed, 2015-03-18 at 16:08 +0000, George Dunlap wrote:
> On 03/18/2015 03:59 PM, Jan Beulich wrote:
> >>>> On 18.03.15 at 16:26, <george.dunlap@eu.citrix.com> wrote:
> >> In both cases there's a slight risk in using system_state to determine
> >> whether to rely on cpu_data[] or not, because there's actually a window
> >> for each processor after system_state == SYS_STATE_smp_boot where
> >> cpu_data[] is *not* initialized, but it's not obvious from looking at
> >> the data itself.  If the callback mechanisms ever change order with the
> >> cpu_data[] initializations in the future, we risk a situation where
> >> credit2 silently regresses to using a single massive runqueue.
> > 
> > But such an ordering change is extremely unlikely, since the
> > CPU_STARTING notification specifically tells you that the given
> > CPU is now ready for normal use, which includes it having its
> > topology related data set up.

> Even if they look at init_pcpu() in credit2, they're
> unlikely to know that cpu_to_socket() isn't valid until after
> boot_secondary() has happened; and if they try it and boot it,
> everything will seem to work perfectly for credit2 -- except that there
> will be a single global runqueue.
> 
Agreed... and I'm still replying to your (George's) email, but just
wanted to say that, for this, having system_state referenced from
Credit2's code would help making it clear the fact that code has
dependencies with the boot process, wouldn't it?

> If you, Dario and I don't happen to catch it -- or don't happen to
> remember the exact details of the constraints -- nobody may notice until
> a year later.
> 
:-)

> This is the point of having ASSERTs -- so that we don't have to worry
> about remembering and catching all these crazy constraints.
> 
I'm all for finding a way for ASSERT()ing something meaningful to this
effect.

Regards,
Dario

[-- Attachment #1.2: This is a digitally signed message part --]
[-- Type: application/pgp-signature, Size: 181 bytes --]

[-- Attachment #2: Type: text/plain, Size: 126 bytes --]

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH v2 2/2] sched_credit2.c: runqueue_per_core code
  2015-03-18 15:26                 ` George Dunlap
  2015-03-18 15:59                   ` Jan Beulich
@ 2015-03-18 16:49                   ` Dario Faggioli
  2015-03-18 17:05                     ` George Dunlap
  1 sibling, 1 reply; 25+ messages in thread
From: Dario Faggioli @ 2015-03-18 16:49 UTC (permalink / raw)
  Cc: Andrew Cooper, xen-devel, George Dunlap, JBeulich, uma.sharma523


[-- Attachment #1.1: Type: text/plain, Size: 8955 bytes --]

On Wed, 2015-03-18 at 15:26 +0000, George Dunlap wrote:
> On 03/18/2015 08:53 AM, Dario Faggioli wrote:
> >>> @@ -1986,9 +1982,13 @@ static void init_pcpu(const struct scheduler *ops, int cpu)
> >>>  static void *
> >>>  csched2_alloc_pdata(const struct scheduler *ops, int cpu)
> >>>  {
> >>> -    /* Check to see if the cpu is online yet */
> >>> -    /* Note: cpu 0 doesn't get a STARTING callback */
> >>> -    if ( cpu == 0 || cpu_to_socket(cpu) >= 0 )
> >>> +    /*
> >>> +     * Actual initialization is deferred to when the pCPU will be
> >>> +     * online, via a STARTING callback. The only exception is
> >>> +     * the boot cpu, which does not get such a notification, and
> >>> +     * hence needs to be taken care of here.
> >>> +     */
> >>> +    if ( system_state == SYS_STATE_boot )
> >>>          init_pcpu(ops, cpu);
> 
> I think this will break adding a cpu to a cpupool after boot, won't it?
> 
Oh, yeah, that. Yes, I knew it, and probably should have mentioned it...
I assumed it was clear enough that I was asking an opinion about the
shown use of system_state, as opposed to using a particular init value
of cpu_data.phys_proc_id to the same effect, during the boot process.

> Don't we want effectively, "if ( is_boot_cpu() ||
> is_cpu_topology_set_up() )"?
> 
> is_cpu_topology_set_up() could be "system_state >= SYS_STATE_smp_boot" I
> suppose?
> 
That is what I was thinking. Alternatively, I thought we can keep the
'<= SYS_STATE_boot' part for recognizing that the call comes from within
the boot process, and actually do tweak the initialization/assignment
logic of phys_proc_id, as it was also mentioned previously in the thread
(and as you also mention below, IIUIC), to make it better represent
whether topology info are valid or not.

> Is "cpu == 0" the best test for "is_boot_cpu()", or is there another test?
> 
It is a valid test for that, right now, AFAICT, but Jan would like for
it to stop being one. :-D

> So let's step back for a minute, and let me see if I can describe the
> situation.
> 
Yeah... thanks for this, it's been extremely helpful, IMO!

> Unlike the other schedulers, credit2 needs to know the topology of a
> cpus when setting it up, so that we can place it in the proper runqueue.
> 
> Setting up the per-cpu structures would normally happen in alloc_pdata.
>  This is called in three different ways:
> 
> * During boot, on the boot processer, alloc_pdata is called from
> scheduler_init().
> 
> * During boot, on the secondary processors, alloc_pdata is called from
> a CPU_UP_PREPARE callback, which happens just before the cpu is actually
> brought up.
> 
> * When switching a cpu to a new cpupool after boot, alloc_pdata is also
> called from cpu_schedule_switch().
> 
> The "normal" place the cpu topology information can be found is in
> global the cpu_data[] array, typically accessed by the cpu_to_socket()
> or cpu_to_core() macros.
> 
> This topology information is written to cpu_data[] by
> smp_store_cpu_info().  For the boot cpu, it happens in
> smp_prepare_cpus();  For secondary cpus, it's happens in
> start_secondary(), which is the code run on the cpu itself as it's being
> brought up.
> 
> Unfortunately, at the moment, both of these places are after the
> respective places where the alloc pdata is called for those cpus.
> Flattening the entire x86 setup at the moment you'd see:
>   alloc_pdata for boot cpu
>   smp_store_cpu_info for boot cpu
>   for each secondary cpu
>     alloc_pdata for secondary cpu
>     smp_store_cpu_info for secondary cpu
> 
> scheduler_init() is called before smp_prepare_cpus() in part because of
> a dependency chain elsewhere: we cannot set up the idle domain until
> scheduler_init() is called; and other places further on in the
> initialization but before setting up cpu topology information assume
> that the idle domain has been set up.
> 
However, as you say yourself below, there is boot_cpu_data, which
stashes the info we need for the boot cpu. It gets filled after
init_idle_domain()->scheduler_init() right now, but I don't see a reason
why it can't be pulled up a bit (and from my tests, it seems to work).

That happens during system_state==SYS_STATE_boot, which also means
system_state<SYS_STATE_smp_boot. This means that, as far as scheduling
initialization is concerned, if (system_state < SYS_STATE_smp_boot) we
can and should use boot_cpu_data from the scheduler code, because it's
for that cpu that we're being called.

I do agree that this may look fragile, as it depends on the relative
positioning of identify_cpu() wrt scheduler_init() wrt system_state
values, but I think we can find something to ASSERT() about (from
Credit2's code).

> I haven't actually tracked down this dependency yet; nor have I explored
> the possibility of populating cpu_data[] for the boot cpu earier than
> it's done now.
> 
That looks like what boot_cpu_data is for, AFAICT.

> I'm not sure exactly why we call alloc_pdata on CPU_UP_PREPARE rather
> than CPU_STARTING -- whether that's just what seemed most sensible when
> cpupools were created, or whether there's a dependency somewhere between
> those two points.
> 
Me neither, this may be worth a try.

> At the moment we deal with the fact that the topology information for a
> CPU is not available on CPU_UP_PREPARE by setting a callback that will
> call init_pcpu() on the CPU_STARTING action.
> 
> The boot cpu will not get such a callback; but information about the
> topology of the boot cpu *is* available in boot_cpu_data[].
> 
Exactly.

> We can use system_state to detect whether we've been called from
> scheduler_init (system_state < SYS_STATE_smp_boot), or whether we've
> been called after the whole thing has been done (system_state >=
> SYS_STATE_active).  
>
Exactly again. :-)

> But while system_state == SYS_STATE_smp_boot (which
> is where both of the normal schedule callback and the csched2-specific
> callback happen at the moment), we can't determine whether a given cpu's
> cpu_data[] has been initialized yet.
> 
But we know this from the fact that, if system_state is
SYS_STATE_smp_boot and not SYS_STATE_boot, we are being called from the
callback, don't we?

> So one thing we could do is this:
> 
> * In csched2_alloc_pdata(), call init_pcpu() if system_state <=
> SYS_STATE_smp_boot (to catch scheduler_init) or system_state >=
> SYS_STATE_active (to catch cpupool callbacks).
> 
> * Keep the credit2-specific callback on CPU_STARTING.  In
> csched2_cpu_starting(), call init_pcpu().  (Not sure if this handles cpu
> offlining / onlining properly or not.)
> 
> * In init_pcpu(), if system_state < SYS_STATE_smp_boot, assume that
> we've been called from scheduler_init and use boot_cpu_data[].
> Otherwise, assume that the calls from the CPU_UP_PREPARE callback have
> been filtered out, and that we're being called either from CPU_STARTING
> or from cpu_schedlue_switch(), and use cpu_data[].
> 
Oh, right, so after all, we at least sort of are on the same page: this
is exactly what I was suggesting/wanted to do. :-)

> Another thing we could explore is this:
> 
> * Change scheduler.c to call alloc_pdata on CPU_STARTING rather than
> CPU_UP_PREPARE, and get rid of the csched2-specific callback.
> 
> * In csched2_alloc_pdata, always call init_pcpu().  (Or perhaps just
> rename init_pcpu() to csched2_alloc_pdata().)
> 
> * In init_pcpu, if system_state < SYS_STATE_smp_boot, assume that we've
> been called from scheduler_init and use boot_cpu_data[]; otherwise,
> assume that we're being called either from CPU_STARTING, or from
> cpu_schedule_switch(), and use cpu_data[].
> 
Right too, I think.

> If this would work, I think it would be a lot better.  It would remove
> the credit2-specific callback, and it would mean we wouldn't need an
> additional test to filter out
> 
I see. Ok, I guess I can give this a try. If it does not explode,
because some weird dependency we can't see right now, we can either try
harder to track it, or use the other solution outlined above as a
fallback.

> In both cases there's a slight risk in using system_state to determine
> whether to rely on cpu_data[] or not, because there's actually a window
> for each processor after system_state == SYS_STATE_smp_boot where
> cpu_data[] is *not* initialized, but it's not obvious from looking at
> the data itself.  If the callback mechanisms ever change order with the
> cpu_data[] initializations in the future, we risk a situation where
> credit2 silently regresses to using a single massive runqueue.
> 
> So I think that at least for debug builds, we should make some way to
> determine whether a given cpu_data[] has been initialized, so that at
> least we an ASSERT() that it has.
>
Yep, agreed too. :-)

Thanks again for this and Regards,
Dario

[-- Attachment #1.2: This is a digitally signed message part --]
[-- Type: application/pgp-signature, Size: 181 bytes --]

[-- Attachment #2: Type: text/plain, Size: 126 bytes --]

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH v2 2/2] sched_credit2.c: runqueue_per_core code
  2015-03-18 16:49                   ` Dario Faggioli
@ 2015-03-18 17:05                     ` George Dunlap
  2015-03-19 10:03                       ` Dario Faggioli
  0 siblings, 1 reply; 25+ messages in thread
From: George Dunlap @ 2015-03-18 17:05 UTC (permalink / raw)
  To: Dario Faggioli, George Dunlap
  Cc: Andrew Cooper, xen-devel, JBeulich, uma.sharma523

On 03/18/2015 04:49 PM, Dario Faggioli wrote:
>> scheduler_init() is called before smp_prepare_cpus() in part because of
>> a dependency chain elsewhere: we cannot set up the idle domain until
>> scheduler_init() is called; and other places further on in the
>> initialization but before setting up cpu topology information assume
>> that the idle domain has been set up.
>>
> However, as you say yourself below, there is boot_cpu_data, which
> stashes the info we need for the boot cpu. It gets filled after
> init_idle_domain()->scheduler_init() right now, but I don't see a reason
> why it can't be pulled up a bit (and from my tests, it seems to work).

Well yes, my goal was first to primarily talk about the state of the
world and get all the facts and constraints out there; and then after
that to talk about what we can / could do to fix it. :-)

I do think we want to minimize the special casing as much as possible;
and I think it makes sense for setup to be the one trying to sort out
constraints, rather than the callees: i.e., for setup to say, "I know
scheduler_init may need the topology information for this cpu to be in
cpu_data[], so I'm going to make sure that's in place before I call it".

So if it's possible to arrange for cpu_data[] to be populated -- at
least with the topology information -- for each cpu before alloc_pdata
happens, I think that's ideal.

>> If this would work, I think it would be a lot better.  It would remove
>> the credit2-specific callback, and it would mean we wouldn't need an
>> additional test to filter out
>>
> I see. Ok, I guess I can give this a try. If it does not explode,
> because some weird dependency we can't see right now, we can either try
> harder to track it, or use the other solution outlined above as a
> fallback.

If you could look into this, that would be awesome. :-)

 -George

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH v2 2/2] sched_credit2.c: runqueue_per_core code
  2015-03-18 17:05                     ` George Dunlap
@ 2015-03-19 10:03                       ` Dario Faggioli
  2015-03-19 10:50                         ` Jan Beulich
  0 siblings, 1 reply; 25+ messages in thread
From: Dario Faggioli @ 2015-03-19 10:03 UTC (permalink / raw)
  Cc: Andrew Cooper, xen-devel, George Dunlap, JBeulich, uma.sharma523


[-- Attachment #1.1.1: Type: text/plain, Size: 9070 bytes --]

On Wed, 2015-03-18 at 17:05 +0000, George Dunlap wrote:
> On 03/18/2015 04:49 PM, Dario Faggioli wrote:

> >> If this would work, I think it would be a lot better.  It would remove
> >> the credit2-specific callback, and it would mean we wouldn't need an
> >> additional test to filter out
> >>
> > I see. Ok, I guess I can give this a try. If it does not explode,
> > because some weird dependency we can't see right now, we can either try
> > harder to track it, or use the other solution outlined above as a
> > fallback.
> 
> If you could look into this, that would be awesome. :-)
> 
So, I did look into this. In summary, I think it could work, but some
rework is required. Here's my findings in some more details.

So, with the quick-&-dirty(TM) patch attached to this email, Credit2
works-a-charm:

root@Zhaman:~# xl  dmesg |grep runqueue
(XEN) Adding cpu 0 to runqueue 0
(XEN)  First cpu on runqueue, activating
(XEN) Adding cpu 1 to runqueue 1
(XEN)  First cpu on runqueue, activating
(XEN) Adding cpu 2 to runqueue 1
(XEN) Adding cpu 3 to runqueue 1
(XEN) Adding cpu 4 to runqueue 1
(XEN) Adding cpu 5 to runqueue 1
(XEN) Adding cpu 6 to runqueue 1
(XEN) Adding cpu 7 to runqueue 1
(XEN) Adding cpu 8 to runqueue 0
(XEN) Adding cpu 9 to runqueue 0
(XEN) Adding cpu 10 to runqueue 0
(XEN) Adding cpu 11 to runqueue 0
(XEN) Adding cpu 12 to runqueue 0
(XEN) Adding cpu 13 to runqueue 0
(XEN) Adding cpu 14 to runqueue 0
(XEN) Adding cpu 15 to runqueue 0

(Yes, cpu0 is assigned to the wrong runqueue, but that is because I'm
ignoring the system_state <==> boot_cpu_data thing for now, so let's
also ignore this, it'll be fixed).

So, things are working, less complexity (i.e., no ad-hoc callback for
credit2), less code (i.e., lots of '-' in the patch), outside is warm
and sunny, spring is coming, birds are singing, etc... :-D :-D

Unfortunately, the patch also makes Credit1 go splat like this:

(XEN) Xen call trace:
(XEN)    [<ffff82d08012d14c>] check_lock+0x37/0x3b
(XEN)    [<ffff82d08012d17b>] _spin_lock+0x11/0x54
(XEN)    [<ffff82d08013498f>] xmem_pool_alloc+0x11c/0x46c
(XEN)    [<ffff82d0801350a6>] _xmalloc+0x106/0x316
(XEN)    [<ffff82d0801352c7>] _xzalloc+0x11/0x32
(XEN)    [<ffff82d08011f5ae>] csched_alloc_pdata+0x47/0x1c6
(XEN)    [<ffff82d0801293d6>] cpu_schedule_callback+0x5a/0xcc
(XEN)    [<ffff82d08011ab8a>] notifier_call_chain+0x67/0x87
(XEN)    [<ffff82d08010169a>] notify_cpu_starting+0x1c/0x24
(XEN)    [<ffff82d080189f5b>] start_secondary+0x203/0x256
(XEN) 
(XEN) 
(XEN) ****************************************
(XEN) Panic on CPU 1:
(XEN) Xen BUG at spinlock.c:48
(XEN) ****************************************

(...and, suddenly, birds shut up :-( )

That is because pool->lock in xmem_pool_alloc() wants to find IRQs
enabled when acquired, as it's been acquired with IRQs enabled before,
and we don't mix IRQ enabled and disabled spinlocks (that's what the BUG
at spinlock:48 is for).

The difference between before and after the patch is that, bebore,
csched_alloc_pdata() is called during CPU_UP_PREPARE, after, during
CPU_STARTING. More specifically, CPU_UP_PREPARE callbacks are invoked as
a consequence of cpu_up() being called in a loop in __start_xen(), and
it itself calls notifier_call_chain(..CPU_UP_PREPARE..). This means they
run:
 - on the boot cpu;
 - with interrupt enabled (see how the call to local_irq_enable() in
   __start_xen() is *before* the for_each_present_cpu() { cpu_up(i); }
   loop).

OTOH, CPU_STARTING callbacks run:
 - on the cpu being brought up;
 - with interrupt disabled (see how the call to local_irq_enable(), in
   start_secondary(), is *after* the invocation of
   notify_cpu_starting()).

Here we are. And the reason why things works ok in Credit2, is that
csched2_alloc_pdata() doesn't really allocate anything! In fact, in
general, handling alloc_pdata() during CPU_STARTING would mean that we
can't allocate any memory which, given the name of the function, would
look rather odd. :-)

Nevertheless I see the value of doing so, and hence I think what we
could do would be to introduce a new hook in the scheduler interface,
called .init_pdata or .init_pcpu, and, in sched_*.c, split the
allocation and the initialization parts. The former will be handled
during CPU_UP_PREPARE, when allocation is possible, the latter during
CPU_STARTING, when we have more info available to perform actual
initializations.

For Credit2, alloc_pdata() wouldn't even need to exist, and all the work
will be done in init_pcpu(), so less complexity than now, and no need
for the ad-hoc starting callback inside sched_credit2.c. For Credit1,
both will be required, but the added complexity in sched_credit.c
doesn't look too bad at all to me.

Of couse, I still have to code it up, and see whether this plays well
with cpupools, cpu on/offlining, etc., but I'd like to hear whether you
like the idea, before getting to do that. :-)

Let me know what you think.

Regards,
Dario
---
 sched_credit2.c |   63 ++------------------------------------------------------
 schedule.c      |   10 +++++---
 2 files changed, 9 insertions(+), 64 deletions(-)
---
diff --git a/xen/common/sched_credit2.c b/xen/common/sched_credit2.c
index ad0a5d4..ac7a7f2 100644
--- a/xen/common/sched_credit2.c
+++ b/xen/common/sched_credit2.c
@@ -1918,7 +1918,8 @@ static void deactivate_runqueue(struct csched2_private *prv, int rqi)
     cpumask_clear_cpu(rqi, &prv->active_queues);
 }
 
-static void init_pcpu(const struct scheduler *ops, int cpu)
+static void *
+csched2_alloc_pdata(const struct scheduler *ops, int cpu)
 {
     int rqi;
     unsigned long flags;
@@ -1932,7 +1933,7 @@ static void init_pcpu(const struct scheduler *ops, int cpu)
     {
         printk("%s: Strange, cpu %d already initialized!\n", __func__, cpu);
         spin_unlock_irqrestore(&prv->lock, flags);
-        return;
+        return NULL;
     }
 
     /* Figure out which runqueue to put it in */
@@ -1980,20 +1981,6 @@ static void init_pcpu(const struct scheduler *ops, int cpu)
 
     spin_unlock_irqrestore(&prv->lock, flags);
 
-    return;
-}
-
-static void *
-csched2_alloc_pdata(const struct scheduler *ops, int cpu)
-{
-    /* Check to see if the cpu is online yet */
-    /* Note: cpu 0 doesn't get a STARTING callback */
-    if ( cpu == 0 || cpu_to_socket(cpu) >= 0 )
-        init_pcpu(ops, cpu);
-    else
-        printk("%s: cpu %d not online yet, deferring initializatgion\n",
-               __func__, cpu);
-
     return (void *)1;
 }
 
@@ -2046,49 +2033,6 @@ csched2_free_pdata(const struct scheduler *ops, void *pcpu, int cpu)
 }
 
 static int
-csched2_cpu_starting(int cpu)
-{
-    struct scheduler *ops;
-
-    /* Hope this is safe from cpupools switching things around. :-) */
-    ops = per_cpu(scheduler, cpu);
-
-    if ( ops->alloc_pdata == csched2_alloc_pdata )
-        init_pcpu(ops, cpu);
-
-    return NOTIFY_DONE;
-}
-
-static int cpu_credit2_callback(
-    struct notifier_block *nfb, unsigned long action, void *hcpu)
-{
-    unsigned int cpu = (unsigned long)hcpu;
-    int rc = 0;
-
-    switch ( action )
-    {
-    case CPU_STARTING:
-        csched2_cpu_starting(cpu);
-        break;
-    default:
-        break;
-    }
-
-    return !rc ? NOTIFY_DONE : notifier_from_errno(rc);
-}
-
-static struct notifier_block cpu_credit2_nfb = {
-    .notifier_call = cpu_credit2_callback
-};
-
-static int
-csched2_global_init(void)
-{
-    register_cpu_notifier(&cpu_credit2_nfb);
-    return 0;
-}
-
-static int
 csched2_init(struct scheduler *ops)
 {
     int i;
@@ -2168,7 +2112,6 @@ const struct scheduler sched_credit2_def = {
 
     .dump_cpu_state = csched2_dump_pcpu,
     .dump_settings  = csched2_dump,
-    .global_init    = csched2_global_init,
     .init           = csched2_init,
     .deinit         = csched2_deinit,
     .alloc_vdata    = csched2_alloc_vdata,
diff --git a/xen/common/schedule.c b/xen/common/schedule.c
index ef79847..5e7cdc9 100644
--- a/xen/common/schedule.c
+++ b/xen/common/schedule.c
@@ -1327,10 +1327,6 @@ static int cpu_schedule_up(unsigned int cpu)
     if ( idle_vcpu[cpu] == NULL )
         return -ENOMEM;
 
-    if ( (ops.alloc_pdata != NULL) &&
-         ((sd->sched_priv = ops.alloc_pdata(&ops, cpu)) == NULL) )
-        return -ENOMEM;
-
     return 0;
 }
 
@@ -1348,10 +1344,16 @@ static int cpu_schedule_callback(
     struct notifier_block *nfb, unsigned long action, void *hcpu)
 {
     unsigned int cpu = (unsigned long)hcpu;
+    struct schedule_data *sd = &per_cpu(schedule_data, cpu);
     int rc = 0;
 
     switch ( action )
     {
+    case CPU_STARTING:
+        if ( (ops.alloc_pdata != NULL) &&
+             ((sd->sched_priv = ops.alloc_pdata(&ops, cpu)) == NULL) )
+            return -ENOMEM;
+        break;
     case CPU_UP_PREPARE:
         rc = cpu_schedule_up(cpu);
         break;


[-- Attachment #1.1.2: xen-sched-cpustarting.patch --]
[-- Type: text/x-patch, Size: 3695 bytes --]

diff --git a/xen/common/sched_credit2.c b/xen/common/sched_credit2.c
index ad0a5d4..ac7a7f2 100644
--- a/xen/common/sched_credit2.c
+++ b/xen/common/sched_credit2.c
@@ -1918,7 +1918,8 @@ static void deactivate_runqueue(struct csched2_private *prv, int rqi)
     cpumask_clear_cpu(rqi, &prv->active_queues);
 }
 
-static void init_pcpu(const struct scheduler *ops, int cpu)
+static void *
+csched2_alloc_pdata(const struct scheduler *ops, int cpu)
 {
     int rqi;
     unsigned long flags;
@@ -1932,7 +1933,7 @@ static void init_pcpu(const struct scheduler *ops, int cpu)
     {
         printk("%s: Strange, cpu %d already initialized!\n", __func__, cpu);
         spin_unlock_irqrestore(&prv->lock, flags);
-        return;
+        return NULL;
     }
 
     /* Figure out which runqueue to put it in */
@@ -1980,20 +1981,6 @@ static void init_pcpu(const struct scheduler *ops, int cpu)
 
     spin_unlock_irqrestore(&prv->lock, flags);
 
-    return;
-}
-
-static void *
-csched2_alloc_pdata(const struct scheduler *ops, int cpu)
-{
-    /* Check to see if the cpu is online yet */
-    /* Note: cpu 0 doesn't get a STARTING callback */
-    if ( cpu == 0 || cpu_to_socket(cpu) >= 0 )
-        init_pcpu(ops, cpu);
-    else
-        printk("%s: cpu %d not online yet, deferring initializatgion\n",
-               __func__, cpu);
-
     return (void *)1;
 }
 
@@ -2046,49 +2033,6 @@ csched2_free_pdata(const struct scheduler *ops, void *pcpu, int cpu)
 }
 
 static int
-csched2_cpu_starting(int cpu)
-{
-    struct scheduler *ops;
-
-    /* Hope this is safe from cpupools switching things around. :-) */
-    ops = per_cpu(scheduler, cpu);
-
-    if ( ops->alloc_pdata == csched2_alloc_pdata )
-        init_pcpu(ops, cpu);
-
-    return NOTIFY_DONE;
-}
-
-static int cpu_credit2_callback(
-    struct notifier_block *nfb, unsigned long action, void *hcpu)
-{
-    unsigned int cpu = (unsigned long)hcpu;
-    int rc = 0;
-
-    switch ( action )
-    {
-    case CPU_STARTING:
-        csched2_cpu_starting(cpu);
-        break;
-    default:
-        break;
-    }
-
-    return !rc ? NOTIFY_DONE : notifier_from_errno(rc);
-}
-
-static struct notifier_block cpu_credit2_nfb = {
-    .notifier_call = cpu_credit2_callback
-};
-
-static int
-csched2_global_init(void)
-{
-    register_cpu_notifier(&cpu_credit2_nfb);
-    return 0;
-}
-
-static int
 csched2_init(struct scheduler *ops)
 {
     int i;
@@ -2168,7 +2112,6 @@ const struct scheduler sched_credit2_def = {
 
     .dump_cpu_state = csched2_dump_pcpu,
     .dump_settings  = csched2_dump,
-    .global_init    = csched2_global_init,
     .init           = csched2_init,
     .deinit         = csched2_deinit,
     .alloc_vdata    = csched2_alloc_vdata,
diff --git a/xen/common/schedule.c b/xen/common/schedule.c
index ef79847..5e7cdc9 100644
--- a/xen/common/schedule.c
+++ b/xen/common/schedule.c
@@ -1327,10 +1327,6 @@ static int cpu_schedule_up(unsigned int cpu)
     if ( idle_vcpu[cpu] == NULL )
         return -ENOMEM;
 
-    if ( (ops.alloc_pdata != NULL) &&
-         ((sd->sched_priv = ops.alloc_pdata(&ops, cpu)) == NULL) )
-        return -ENOMEM;
-
     return 0;
 }
 
@@ -1348,10 +1344,16 @@ static int cpu_schedule_callback(
     struct notifier_block *nfb, unsigned long action, void *hcpu)
 {
     unsigned int cpu = (unsigned long)hcpu;
+    struct schedule_data *sd = &per_cpu(schedule_data, cpu);
     int rc = 0;
 
     switch ( action )
     {
+    case CPU_STARTING:
+        if ( (ops.alloc_pdata != NULL) &&
+             ((sd->sched_priv = ops.alloc_pdata(&ops, cpu)) == NULL) )
+            return -ENOMEM;
+        break;
     case CPU_UP_PREPARE:
         rc = cpu_schedule_up(cpu);
         break;

[-- Attachment #1.2: This is a digitally signed message part --]
[-- Type: application/pgp-signature, Size: 181 bytes --]

[-- Attachment #2: Type: text/plain, Size: 126 bytes --]

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

^ permalink raw reply related	[flat|nested] 25+ messages in thread

* Re: [PATCH v2 2/2] sched_credit2.c: runqueue_per_core code
  2015-03-19 10:03                       ` Dario Faggioli
@ 2015-03-19 10:50                         ` Jan Beulich
  2015-03-19 11:23                           ` Dario Faggioli
  2015-03-19 11:40                           ` George Dunlap
  0 siblings, 2 replies; 25+ messages in thread
From: Jan Beulich @ 2015-03-19 10:50 UTC (permalink / raw)
  To: Dario Faggioli; +Cc: AndrewCooper, xen-devel, George Dunlap, uma.sharma523

>>> On 19.03.15 at 11:03, <dario.faggioli@citrix.com> wrote:
> OTOH, CPU_STARTING callbacks run:
>  - on the cpu being brought up;
>  - with interrupt disabled (see how the call to local_irq_enable(), in
>    start_secondary(), is *after* the invocation of
>    notify_cpu_starting()).
> 
> Here we are. And the reason why things works ok in Credit2, is that
> csched2_alloc_pdata() doesn't really allocate anything! In fact, in
> general, handling alloc_pdata() during CPU_STARTING would mean that we
> can't allocate any memory which, given the name of the function, would
> look rather odd. :-)
> 
> Nevertheless I see the value of doing so, and hence I think what we
> could do would be to introduce a new hook in the scheduler interface,
> called .init_pdata or .init_pcpu, and, in sched_*.c, split the
> allocation and the initialization parts. The former will be handled
> during CPU_UP_PREPARE, when allocation is possible, the latter during
> CPU_STARTING, when we have more info available to perform actual
> initializations.

Another alternative would be a new CPU_ALIVE (name subject to
change) notification after interrupts got enabled. That would (as
a follow-up cleanup) also allow the MTRR and microcode setup on
the CPU to no longer need explicit calls (which look reversed
anyway - surely we should update microcode before fiddling with
MTRRs).

Jan

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH v2 2/2] sched_credit2.c: runqueue_per_core code
  2015-03-19 10:50                         ` Jan Beulich
@ 2015-03-19 11:23                           ` Dario Faggioli
  2015-03-19 11:40                           ` George Dunlap
  1 sibling, 0 replies; 25+ messages in thread
From: Dario Faggioli @ 2015-03-19 11:23 UTC (permalink / raw)
  To: JBeulich; +Cc: Andrew Cooper, xen-devel, George Dunlap, uma.sharma523


[-- Attachment #1.1: Type: text/plain, Size: 2624 bytes --]

On Thu, 2015-03-19 at 10:50 +0000, Jan Beulich wrote:
> >>> On 19.03.15 at 11:03, <dario.faggioli@citrix.com> wrote:
> > OTOH, CPU_STARTING callbacks run:
> >  - on the cpu being brought up;
> >  - with interrupt disabled (see how the call to local_irq_enable(), in
> >    start_secondary(), is *after* the invocation of
> >    notify_cpu_starting()).
> > 
> > Here we are. And the reason why things works ok in Credit2, is that
> > csched2_alloc_pdata() doesn't really allocate anything! In fact, in
> > general, handling alloc_pdata() during CPU_STARTING would mean that we
> > can't allocate any memory which, given the name of the function, would
> > look rather odd. :-)
> > 
> > Nevertheless I see the value of doing so, and hence I think what we
> > could do would be to introduce a new hook in the scheduler interface,
> > called .init_pdata or .init_pcpu, and, in sched_*.c, split the
> > allocation and the initialization parts. The former will be handled
> > during CPU_UP_PREPARE, when allocation is possible, the latter during
> > CPU_STARTING, when we have more info available to perform actual
> > initializations.
> 
> Another alternative would be a new CPU_ALIVE (name subject to
> change) notification after interrupts got enabled. 
>
I'd be certainly up to try this too, if it is considered better.

BTW, according to my greps, there aren't much users of the CPU_STARTING
callback... Credit2 really seems to be the only one:

[dario@Solace xen.git] $ grep CPU_STARTING xen/* -R
xen/common/sched_credit2.c:    case CPU_STARTING:
xen/common/cpu.c:        &cpu_chain, CPU_STARTING, hcpu, NULL);
xen/include/xen/cpu.h: *  CPU_UP_PREPARE -> CPU_STARTING -> CPU_ONLINE -- successful CPU up
xen/include/xen/cpu.h:/* CPU_STARTING: CPU nearly online. Runs on new CPU, irqs still disabled. */
xen/include/xen/cpu.h:#define CPU_STARTING     (0x0003 | NOTIFY_FORWARD)
xen/include/xen/cpu.h:/* From arch code, send CPU_STARTING notification. */

With my proposal, it will still be used by the scheduler code only,
although (potentially) by all schedulers, not by just Credit2.

If we bring in CPU_ALIVE, and we move _all_ scheduler related pCPU init
code (both allocation and actual initialization) in there, CPU_STARTING
will become unused.

I'm not sure this means we could then remove it, or either move it below
IRQ enabling and just use it (instead of adding another). In fact, it
perhaps would still be good to have the chance, in future, to execute
code on the cpu with IRQ disabled, or not? In any case, I thought this
was worth mentioning.

Regards,
Dario

[-- Attachment #1.2: This is a digitally signed message part --]
[-- Type: application/pgp-signature, Size: 181 bytes --]

[-- Attachment #2: Type: text/plain, Size: 126 bytes --]

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH v2 2/2] sched_credit2.c: runqueue_per_core code
  2015-03-19 10:50                         ` Jan Beulich
  2015-03-19 11:23                           ` Dario Faggioli
@ 2015-03-19 11:40                           ` George Dunlap
  2015-03-19 12:29                             ` Dario Faggioli
  1 sibling, 1 reply; 25+ messages in thread
From: George Dunlap @ 2015-03-19 11:40 UTC (permalink / raw)
  To: Jan Beulich, Dario Faggioli
  Cc: AndrewCooper, xen-devel, George Dunlap, uma.sharma523

On 03/19/2015 10:50 AM, Jan Beulich wrote:
>>>> On 19.03.15 at 11:03, <dario.faggioli@citrix.com> wrote:
>> OTOH, CPU_STARTING callbacks run:
>>  - on the cpu being brought up;
>>  - with interrupt disabled (see how the call to local_irq_enable(), in
>>    start_secondary(), is *after* the invocation of
>>    notify_cpu_starting()).
>>
>> Here we are. And the reason why things works ok in Credit2, is that
>> csched2_alloc_pdata() doesn't really allocate anything! In fact, in
>> general, handling alloc_pdata() during CPU_STARTING would mean that we
>> can't allocate any memory which, given the name of the function, would
>> look rather odd. :-)
>>
>> Nevertheless I see the value of doing so, and hence I think what we
>> could do would be to introduce a new hook in the scheduler interface,
>> called .init_pdata or .init_pcpu, and, in sched_*.c, split the
>> allocation and the initialization parts. The former will be handled
>> during CPU_UP_PREPARE, when allocation is possible, the latter during
>> CPU_STARTING, when we have more info available to perform actual
>> initializations.
> 
> Another alternative would be a new CPU_ALIVE (name subject to
> change) notification after interrupts got enabled. That would (as
> a follow-up cleanup) also allow the MTRR and microcode setup on
> the CPU to no longer need explicit calls (which look reversed
> anyway - surely we should update microcode before fiddling with
> MTRRs).

local_irq_enable() happens after setting the cpu as online in
cpu_online_map; not having the scheduler ready to actually schedule on
it at that time seems like it's asking for trouble.

/me pokes around and thinks some more...

 -George

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH v2 2/2] sched_credit2.c: runqueue_per_core code
  2015-03-19 11:40                           ` George Dunlap
@ 2015-03-19 12:29                             ` Dario Faggioli
  2015-03-19 12:35                               ` George Dunlap
  0 siblings, 1 reply; 25+ messages in thread
From: Dario Faggioli @ 2015-03-19 12:29 UTC (permalink / raw)
  Cc: Andrew Cooper, xen-devel, George Dunlap, JBeulich, uma.sharma523


[-- Attachment #1.1: Type: text/plain, Size: 1377 bytes --]

On Thu, 2015-03-19 at 11:40 +0000, George Dunlap wrote:
> On 03/19/2015 10:50 AM, Jan Beulich wrote:
> >>>> On 19.03.15 at 11:03, <dario.faggioli@citrix.com> wrote:

> >> Nevertheless I see the value of doing so, and hence I think what we
> >> could do would be to introduce a new hook in the scheduler interface,
> >> called .init_pdata or .init_pcpu, and, in sched_*.c, split the
> >> allocation and the initialization parts. The former will be handled
> >> during CPU_UP_PREPARE, when allocation is possible, the latter during
> >> CPU_STARTING, when we have more info available to perform actual
> >> initializations.
> > 
> > Another alternative would be a new CPU_ALIVE (name subject to
> > change) notification after interrupts got enabled. That would (as
> > a follow-up cleanup) also allow the MTRR and microcode setup on
> > the CPU to no longer need explicit calls (which look reversed
> > anyway - surely we should update microcode before fiddling with
> > MTRRs).
> 
> local_irq_enable() happens after setting the cpu as online in
> cpu_online_map; not having the scheduler ready to actually schedule on
> it at that time seems like it's asking for trouble.
> 
Right.

> /me pokes around and thinks some more...
> 
So, if I can ask, how about my idea of splitting alloc_ and init_ parts
of pCPU initialization ? :-)

Regards,
Dario

[-- Attachment #1.2: This is a digitally signed message part --]
[-- Type: application/pgp-signature, Size: 181 bytes --]

[-- Attachment #2: Type: text/plain, Size: 126 bytes --]

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH v2 2/2] sched_credit2.c: runqueue_per_core code
  2015-03-19 12:29                             ` Dario Faggioli
@ 2015-03-19 12:35                               ` George Dunlap
  2015-03-19 13:00                                 ` Dario Faggioli
  0 siblings, 1 reply; 25+ messages in thread
From: George Dunlap @ 2015-03-19 12:35 UTC (permalink / raw)
  To: Dario Faggioli, George Dunlap
  Cc: Andrew Cooper, xen-devel, JBeulich, uma.sharma523

On 03/19/2015 12:29 PM, Dario Faggioli wrote:
> On Thu, 2015-03-19 at 11:40 +0000, George Dunlap wrote:
>> On 03/19/2015 10:50 AM, Jan Beulich wrote:
>>>>>> On 19.03.15 at 11:03, <dario.faggioli@citrix.com> wrote:
> 
>>>> Nevertheless I see the value of doing so, and hence I think what we
>>>> could do would be to introduce a new hook in the scheduler interface,
>>>> called .init_pdata or .init_pcpu, and, in sched_*.c, split the
>>>> allocation and the initialization parts. The former will be handled
>>>> during CPU_UP_PREPARE, when allocation is possible, the latter during
>>>> CPU_STARTING, when we have more info available to perform actual
>>>> initializations.
>>>
>>> Another alternative would be a new CPU_ALIVE (name subject to
>>> change) notification after interrupts got enabled. That would (as
>>> a follow-up cleanup) also allow the MTRR and microcode setup on
>>> the CPU to no longer need explicit calls (which look reversed
>>> anyway - surely we should update microcode before fiddling with
>>> MTRRs).
>>
>> local_irq_enable() happens after setting the cpu as online in
>> cpu_online_map; not having the scheduler ready to actually schedule on
>> it at that time seems like it's asking for trouble.
>>
> Right.
> 
>> /me pokes around and thinks some more...
>>
> So, if I can ask, how about my idea of splitting alloc_ and init_ parts
> of pCPU initialization ? :-)

Architecturally, from some points of view it makes sense -- actually
having "alloc_pdata" mean "initialize the cpu" is a bit weird; from
other points of view, it would be nicer not to multiply callbacks and
make the interface more complicated.

But from a practical point of view, this path is already more work than
I was expecting it to be, so I don't think we should spend *too* much
time looking for alternatives.  If that seems like the best option at
the moment, then I'm fine with it.

 -George

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH v2 2/2] sched_credit2.c: runqueue_per_core code
  2015-03-19 12:35                               ` George Dunlap
@ 2015-03-19 13:00                                 ` Dario Faggioli
  0 siblings, 0 replies; 25+ messages in thread
From: Dario Faggioli @ 2015-03-19 13:00 UTC (permalink / raw)
  Cc: Andrew Cooper, xen-devel, George Dunlap, JBeulich, uma.sharma523


[-- Attachment #1.1: Type: text/plain, Size: 1561 bytes --]

On Thu, 2015-03-19 at 12:35 +0000, George Dunlap wrote:
> On 03/19/2015 12:29 PM, Dario Faggioli wrote:

> > So, if I can ask, how about my idea of splitting alloc_ and init_ parts
> > of pCPU initialization ? :-)
> 
> Architecturally, from some points of view it makes sense
>
It does, doesn't it? That's why I like it: it fixes the issue we have,
but in an architecturally sensible and non hackish way, IMO.

> from
> other points of view, it would be nicer not to multiply callbacks and
> make the interface more complicated.
> 
I see what you mean, and I agree. Well, from an arithmetic point of
view, this will allow us to get rid of the .global_init hook, so the
numbers of hook will be unchanged! ;-P

Jokes apart, complexity has to be added to solve the issue, it's either
this patch or the one from earlier in the thread which checked
system_state==SYS_STATE_boot in csched2_alloc_pdata  (with the added >=
SYS_STATE_active for the cpupool case, of course).

As you said in the first place, reducing dependencies, or at least
making them easier to track, it's of some value, and I think the
alloc_/init_ splitting approach goes in that direction.

> But from a practical point of view, this path is already more work than
> I was expecting it to be, so I don't think we should spend *too* much
> time looking for alternatives.  If that seems like the best option at
> the moment, then I'm fine with it.
> 
Great. I'll put a proper series together, and let's see how it'll look
like. :-)

Thanks and Regards,
Dario

[-- Attachment #1.2: This is a digitally signed message part --]
[-- Type: application/pgp-signature, Size: 181 bytes --]

[-- Attachment #2: Type: text/plain, Size: 126 bytes --]

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 25+ messages in thread

end of thread, other threads:[~2015-03-19 13:00 UTC | newest]

Thread overview: 25+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2015-03-13 18:11 [PATCH v2 2/2] sched_credit2.c: runqueue_per_core code Uma Sharma
2015-03-13 18:29 ` Andrew Cooper
2015-03-13 19:13   ` George Dunlap
2015-03-16 12:48     ` Jan Beulich
2015-03-16 12:51       ` George Dunlap
2015-03-16 12:56         ` Jan Beulich
2015-03-16 13:26           ` Dario Faggioli
2015-03-17 18:18           ` Dario Faggioli
2015-03-18  7:56             ` Jan Beulich
2015-03-18  8:53               ` Dario Faggioli
2015-03-18 15:26                 ` George Dunlap
2015-03-18 15:59                   ` Jan Beulich
2015-03-18 16:08                     ` George Dunlap
2015-03-18 16:30                       ` Dario Faggioli
2015-03-18 16:49                   ` Dario Faggioli
2015-03-18 17:05                     ` George Dunlap
2015-03-19 10:03                       ` Dario Faggioli
2015-03-19 10:50                         ` Jan Beulich
2015-03-19 11:23                           ` Dario Faggioli
2015-03-19 11:40                           ` George Dunlap
2015-03-19 12:29                             ` Dario Faggioli
2015-03-19 12:35                               ` George Dunlap
2015-03-19 13:00                                 ` Dario Faggioli
2015-03-16 12:45 ` Jan Beulich
2015-03-16 12:49 ` Jan Beulich

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.