* [PATCH 0/2] Credit2: fix per-socket runqueue setup
@ 2014-08-22 17:15 Dario Faggioli
  2014-08-22 17:15 ` [PATCH 1/2] x86: during boot, anticipate identifying the boot cpu Dario Faggioli
                   ` (2 more replies)
  0 siblings, 3 replies; 14+ messages in thread
From: Dario Faggioli @ 2014-08-22 17:15 UTC (permalink / raw)
  To: xen-devel; +Cc: george.dunlap, Andrew Cooper, keir, Jan Beulich

Hi everyone,

Code is already there in the credit2 scheduler for setting up one runqueue per
socket found in the system, despite the fact that the comments in
sched_credit2.c still report this (well, something like this) as a TODO
item.

However, such code is not functional. In fact, on the following host:

cpu_topology           :
cpu:    core    socket     node
  0:       0        1        0
  1:       0        1        0
  2:       1        1        0
  3:       1        1        0
  4:       2        1        0
  5:       2        1        0
  6:       8        1        0
  7:       8        1        0
  8:       9        1        0
  9:       9        1        0
 10:      10        1        0
 11:      10        1        0
 12:       0        0        1
 13:       0        0        1
 14:       1        0        1
 15:       1        0        1
 16:       2        0        1
 17:       2        0        1
 18:       8        0        1
 19:       8        0        1
 20:       9        0        1
 21:       9        0        1
 22:      10        0        1
 23:      10        0        1
numa_info              :
node:    memsize    memfree    distances
   0:     18432      17488      10,20
   1:     19200      17932      20,10

What I see during boot and scheduler initialization is as follows (for all the
24 pCPUs):

(XEN) init_pcpu: Strange, cpu 1 already initialized!
(XEN) init_pcpu: Strange, cpu 2 already initialized!
...
(XEN) init_pcpu: Strange, cpu 23 already initialized!

Also, once the system booted, here's what happened:
root@tg03:~# xl dmesg |grep -i runqueue
(XEN) Adding cpu 0 to runqueue 0
(XEN)  First cpu on runqueue, activating
(XEN) Adding cpu 1 to runqueue 0
(XEN) Adding cpu 2 to runqueue 0
...
(XEN) Adding cpu 23 to runqueue 0

I.e., only one runqueue was created, despite the host having 2 sockets, and all
the pCPUs were assigned to it.

This series fixes that by doing the following three things:
- move the identification of the topology of the boot CPU to _before_ the
  scheduler's initialization. This is necessary to make sure that, when
  credit2 tries to figure out which socket CPU 0 is on, it finds the correct
  info (patch 1);
- properly initialize the cpuinfo_x86 data structure, and more specifically the
  phys_proc_id field, which holds the socket ID, to something invalid (-1). In
  fact, having that field set to 0 in all elements would lead credit2 to think
  that the pCPUs have already been initialized, and that they are all on
  socket 0 (patch 1);
- use the info made available by early boot CPU identification in credit2
  (patch 2).
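The second point is the crux of the bug: with a zero-initialized cpu_data
array, a CPU whose topology has not been identified yet is indistinguishable
from one that really sits on socket 0. A minimal sketch of the two cases (a
toy model, not the actual Xen structures; only the one relevant field is
kept):

```c
#include <assert.h>

#define NR_CPUS 24
#define INVALID_SOCKET (-1)

/* Toy model: only the field that matters for this bug. */
struct cpuinfo {
    int phys_proc_id;   /* socket ID; note that 0 is a *valid* socket */
};

/* A plain C global array is zero-initialized (.bss), so every CPU
 * "reports" socket 0 before identification has actually run... */
static struct cpuinfo cpu_data_zeroed[NR_CPUS];

/* ...whereas marking the field invalid up front (GNU range designated
 * initializer, as patch 1 does) keeps the two cases apart. */
static struct cpuinfo cpu_data_marked[NR_CPUS] =
    { [0 ... NR_CPUS - 1] = { .phys_proc_id = INVALID_SOCKET } };

static int socket_of(const struct cpuinfo *data, int cpu)
{
    return data[cpu].phys_proc_id;
}
```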

After this series is applied, no more "Strange, cpu X already initialized!"
messages can be found in the logs. Also, here's how the scheduler is being set up:

root@tg03:~# xl dmesg |grep -i runqueue
(XEN) Adding cpu 0 to runqueue 1
(XEN)  First cpu on runqueue, activating
(XEN) Adding cpu 1 to runqueue 1
(XEN) Adding cpu 2 to runqueue 1
(XEN) Adding cpu 3 to runqueue 1
(XEN) Adding cpu 4 to runqueue 1
(XEN) Adding cpu 5 to runqueue 1
(XEN) Adding cpu 6 to runqueue 1
(XEN) Adding cpu 7 to runqueue 1
(XEN) Adding cpu 8 to runqueue 1
(XEN) Adding cpu 9 to runqueue 1
(XEN) Adding cpu 10 to runqueue 1
(XEN) Adding cpu 11 to runqueue 1
(XEN) Adding cpu 12 to runqueue 0
(XEN)  First cpu on runqueue, activating
(XEN) Adding cpu 13 to runqueue 0
(XEN) Adding cpu 14 to runqueue 0
(XEN) Adding cpu 15 to runqueue 0
(XEN) Adding cpu 16 to runqueue 0
(XEN) Adding cpu 17 to runqueue 0
(XEN) Adding cpu 18 to runqueue 0
(XEN) Adding cpu 19 to runqueue 0
(XEN) Adding cpu 20 to runqueue 0
(XEN) Adding cpu 21 to runqueue 0
(XEN) Adding cpu 22 to runqueue 0
(XEN) Adding cpu 23 to runqueue 0

Which makes a lot more sense. :-)

Regards,
Dario

---
Dario Faggioli (2):
      x86: during boot, anticipate identifying the boot cpu
      sched: credit2: use boot CPU info for CPU #0


 xen/arch/x86/setup.c            |    8 ++++++--
 xen/arch/x86/smpboot.c          |    3 ++-
 xen/common/sched_credit2.c      |   12 +++++-------
 xen/include/asm-x86/processor.h |    6 ++++--
 4 files changed, 17 insertions(+), 12 deletions(-)

--
<<This happens because I choose it to happen!>> (Raistlin Majere)
-----------------------------------------------------------------
Dario Faggioli, Ph.D, http://about.me/dario.faggioli
Senior Software Engineer, Citrix Systems R&D Ltd., Cambridge (UK)

^ permalink raw reply	[flat|nested] 14+ messages in thread

* [PATCH 1/2] x86: during boot, anticipate identifying the boot cpu
  2014-08-22 17:15 [PATCH 0/2] Credit2: fix per-socket runqueue setup Dario Faggioli
@ 2014-08-22 17:15 ` Dario Faggioli
  2014-08-22 17:28   ` Andrew Cooper
  2014-08-25  8:39   ` Jan Beulich
  2014-08-22 17:15 ` [PATCH 2/2] sched: credit2: use boot CPU info for CPU #0 Dario Faggioli
  2014-08-25  8:31 ` [PATCH 0/2] Credit2: fix per-socket runqueue setup Jan Beulich
  2 siblings, 2 replies; 14+ messages in thread
From: Dario Faggioli @ 2014-08-22 17:15 UTC (permalink / raw)
  To: xen-devel; +Cc: george.dunlap, Andrew Cooper, keir, Jan Beulich

and provide helpers to access the core and socket IDs
resulting from that identification phase.

Also, initialize the socket ID of all the cpus to something
invalid (-1), rather than leaving it as 0, which is at risk
of being confused with "this CPU is on socket 0".

All this is right now relevant to credit2 only, but may be
useful in other schedulers, or elsewhere in general.

Signed-off-by: Dario Faggioli <dario.faggioli@citrix.com>
---
 xen/arch/x86/setup.c            |    8 ++++++--
 xen/arch/x86/smpboot.c          |    3 ++-
 xen/include/asm-x86/processor.h |    6 ++++--
 3 files changed, 12 insertions(+), 5 deletions(-)

diff --git a/xen/arch/x86/setup.c b/xen/arch/x86/setup.c
index 6a814cd..f39fdf0 100644
--- a/xen/arch/x86/setup.c
+++ b/xen/arch/x86/setup.c
@@ -1262,6 +1262,12 @@ void __init noreturn __start_xen(unsigned long mbi_p)
 
     timer_init();
 
+    /*
+     * Identify the boot CPU, in case the scheduler initialization
+     * needs to know about it (e.g., topology, etc.)
+     */
+    identify_cpu(&boot_cpu_data);
+
     init_idle_domain();
 
     trap_init();
@@ -1272,8 +1278,6 @@ void __init noreturn __start_xen(unsigned long mbi_p)
 
     arch_init_memory();
 
-    identify_cpu(&boot_cpu_data);
-
     if ( cpu_has_fxsr )
         set_in_cr4(X86_CR4_OSFXSR);
     if ( cpu_has_xmm )
diff --git a/xen/arch/x86/smpboot.c b/xen/arch/x86/smpboot.c
index 84f2d25..16a7474 100644
--- a/xen/arch/x86/smpboot.c
+++ b/xen/arch/x86/smpboot.c
@@ -59,7 +59,8 @@ DEFINE_PER_CPU_READ_MOSTLY(cpumask_var_t, cpu_core_mask);
 cpumask_t cpu_online_map __read_mostly;
 EXPORT_SYMBOL(cpu_online_map);
 
-struct cpuinfo_x86 cpu_data[NR_CPUS];
+struct cpuinfo_x86 cpu_data[NR_CPUS] =
+	{ [0 ... NR_CPUS-1] = { .phys_proc_id=-1 } };
 
 u32 x86_cpu_to_apicid[NR_CPUS] __read_mostly =
 	{ [0 ... NR_CPUS-1] = BAD_APICID };
diff --git a/xen/include/asm-x86/processor.h b/xen/include/asm-x86/processor.h
index a156e01..76d47de 100644
--- a/xen/include/asm-x86/processor.h
+++ b/xen/include/asm-x86/processor.h
@@ -213,8 +213,10 @@ extern void detect_extended_topology(struct cpuinfo_x86 *c);
 
 extern void detect_ht(struct cpuinfo_x86 *c);
 
-#define cpu_to_core(_cpu)   (cpu_data[_cpu].cpu_core_id)
-#define cpu_to_socket(_cpu) (cpu_data[_cpu].phys_proc_id)
+#define cpu_to_core(_cpu)    (cpu_data[_cpu].cpu_core_id)
+#define cpu_to_socket(_cpu)  (cpu_data[_cpu].phys_proc_id)
+#define boot_cpu_to_core()   (boot_cpu_data.phys_core_id)
+#define boot_cpu_to_socket() (boot_cpu_data.phys_proc_id)
 
 unsigned int apicid_to_socket(unsigned int);

^ permalink raw reply related	[flat|nested] 14+ messages in thread

* [PATCH 2/2] sched: credit2: use boot CPU info for CPU #0
  2014-08-22 17:15 [PATCH 0/2] Credit2: fix per-socket runqueue setup Dario Faggioli
  2014-08-22 17:15 ` [PATCH 1/2] x86: during boot, anticipate identifying the boot cpu Dario Faggioli
@ 2014-08-22 17:15 ` Dario Faggioli
  2014-08-25  8:41   ` Jan Beulich
  2014-08-25  8:31 ` [PATCH 0/2] Credit2: fix per-socket runqueue setup Jan Beulich
  2 siblings, 1 reply; 14+ messages in thread
From: Dario Faggioli @ 2014-08-22 17:15 UTC (permalink / raw)
  To: xen-devel; +Cc: george.dunlap, Andrew Cooper, keir, Jan Beulich

when deciding, during pCPU initialization inside the scheduler,
which runqueue it should be associated with.

Thanks to the fact that topology identification of the boot
CPU now happens earlier than the scheduler initialization, this
fixes the bug that CPU 0 was always being associated with
runqueue 0, no matter what the actual topology is.
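The resulting runqueue choice can be sketched as follows (hypothetical
stand-ins for the real accessors; the socket table mirrors the 2-socket host
from the cover letter, truncated to 4 CPUs):

```c
#include <assert.h>

/* Stand-ins for the real topology accessors (hypothetical). On the host
 * from the cover letter, the low-numbered CPUs sit on socket 1. */
static const int socket_of[4] = { 1, 1, 0, 0 };
static int cpu_to_socket(int cpu)   { return socket_of[cpu]; }
static int boot_cpu_to_socket(void) { return 1; }

/* After the patch: one runqueue per socket. CPU 0 gets no STARTING
 * callback, so its socket comes from the boot CPU data rather than
 * from a hard-coded runqueue 0. */
static int pick_runqueue(int cpu)
{
    return (cpu == 0) ? boot_cpu_to_socket() : cpu_to_socket(cpu);
}
```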

Signed-off-by: Dario Faggioli <dario.faggioli@citrix.com>
---
 xen/common/sched_credit2.c |   12 +++++-------
 1 file changed, 5 insertions(+), 7 deletions(-)

diff --git a/xen/common/sched_credit2.c b/xen/common/sched_credit2.c
index 1ca521b..1bcd6c0 100644
--- a/xen/common/sched_credit2.c
+++ b/xen/common/sched_credit2.c
@@ -54,7 +54,6 @@
  * + Immediate bug-fixes
  *  - Do per-runqueue, grab proper lock for dump debugkey
  * + Multiple sockets
- *  - Detect cpu layout and make runqueue map, one per L2 (make_runq_map())
  *  - Simple load balancer / runqueue assignment
  *  - Runqueue load measurement
  *  - Load-based load balancer
@@ -85,8 +84,8 @@
  * to a small value, and a fixed credit is added to everyone.
  *
  * The plan is for all cores that share an L2 will share the same
- * runqueue.  At the moment, there is one global runqueue for all
- * cores.
+ * runqueue.  At the moment, there is one runqueue for each physical
+ * socket.
  */
 
 /*
@@ -1935,13 +1934,12 @@ static void init_pcpu(const struct scheduler *ops, int cpu)
         return;
     }
 
-    /* Figure out which runqueue to put it in */
+    /* Figure out which runqueue to put cpu in */
     rqi = 0;
 
-    /* Figure out which runqueue to put it in */
-    /* NB: cpu 0 doesn't get a STARTING callback, so we hard-code it to runqueue 0. */
+    /* cpu 0 doesn't get a STARTING callback, so use boot CPU data for it */
     if ( cpu == 0 )
-        rqi = 0;
+        rqi = boot_cpu_to_socket();
     else
         rqi = cpu_to_socket(cpu);

^ permalink raw reply related	[flat|nested] 14+ messages in thread

* Re: [PATCH 1/2] x86: during boot, anticipate identifying the boot cpu
  2014-08-22 17:15 ` [PATCH 1/2] x86: during boot, anticipate identifying the boot cpu Dario Faggioli
@ 2014-08-22 17:28   ` Andrew Cooper
  2014-08-22 18:40     ` Dario Faggioli
  2014-08-25  8:35     ` Jan Beulich
  2014-08-25  8:39   ` Jan Beulich
  1 sibling, 2 replies; 14+ messages in thread
From: Andrew Cooper @ 2014-08-22 17:28 UTC (permalink / raw)
  To: Dario Faggioli, xen-devel; +Cc: george.dunlap, keir, Jan Beulich

On 22/08/14 18:15, Dario Faggioli wrote:
> and provide helpers to access the core and socket IDs,
> resulting from that identification phase.
>
> Also, initialize the socket ID of all the cpus to something
> invalid (-1), rather than leaving it as 0, which is at risk
> of being confused with "this CPU is on socket 0".
>
> All this is right now relevant to credit2 only, but may be
> useful in other schedulers, or elsewhere in general.
>
> Signed-off-by: Dario Faggioli <dario.faggioli@citrix.com>
> ---
>  xen/arch/x86/setup.c            |    8 ++++++--
>  xen/arch/x86/smpboot.c          |    3 ++-
>  xen/include/asm-x86/processor.h |    6 ++++--
>  3 files changed, 12 insertions(+), 5 deletions(-)
>
> diff --git a/xen/arch/x86/setup.c b/xen/arch/x86/setup.c
> index 6a814cd..f39fdf0 100644
> --- a/xen/arch/x86/setup.c
> +++ b/xen/arch/x86/setup.c
> @@ -1262,6 +1262,12 @@ void __init noreturn __start_xen(unsigned long mbi_p)
>  
>      timer_init();
>  
> +    /*
> +     * Identify the boot CPU, in case the scheduler initialization
> +     * needs to know about it (e.g., topology, etc.)
> +     */
> +    identify_cpu(&boot_cpu_data);
> +
>      init_idle_domain();
>  
>      trap_init();
> @@ -1272,8 +1278,6 @@ void __init noreturn __start_xen(unsigned long mbi_p)
>  
>      arch_init_memory();
>  
> -    identify_cpu(&boot_cpu_data);
> -
>      if ( cpu_has_fxsr )
>          set_in_cr4(X86_CR4_OSFXSR);
>      if ( cpu_has_xmm )
> diff --git a/xen/arch/x86/smpboot.c b/xen/arch/x86/smpboot.c
> index 84f2d25..16a7474 100644
> --- a/xen/arch/x86/smpboot.c
> +++ b/xen/arch/x86/smpboot.c
> @@ -59,7 +59,8 @@ DEFINE_PER_CPU_READ_MOSTLY(cpumask_var_t, cpu_core_mask);
>  cpumask_t cpu_online_map __read_mostly;
>  EXPORT_SYMBOL(cpu_online_map);
>  
> -struct cpuinfo_x86 cpu_data[NR_CPUS];
> +struct cpuinfo_x86 cpu_data[NR_CPUS] =
> +	{ [0 ... NR_CPUS-1] = { .phys_proc_id=-1 } };

This moves a huge chunk of data from .bss to .data.  Can it not be fixed
by enumerating topology on APs before setting scheduling up?

>  
>  u32 x86_cpu_to_apicid[NR_CPUS] __read_mostly =
>  	{ [0 ... NR_CPUS-1] = BAD_APICID };
> diff --git a/xen/include/asm-x86/processor.h b/xen/include/asm-x86/processor.h
> index a156e01..76d47de 100644
> --- a/xen/include/asm-x86/processor.h
> +++ b/xen/include/asm-x86/processor.h
> @@ -213,8 +213,10 @@ extern void detect_extended_topology(struct cpuinfo_x86 *c);
>  
>  extern void detect_ht(struct cpuinfo_x86 *c);
>  
> -#define cpu_to_core(_cpu)   (cpu_data[_cpu].cpu_core_id)
> -#define cpu_to_socket(_cpu) (cpu_data[_cpu].phys_proc_id)
> +#define cpu_to_core(_cpu)    (cpu_data[_cpu].cpu_core_id)
> +#define cpu_to_socket(_cpu)  (cpu_data[_cpu].phys_proc_id)
> +#define boot_cpu_to_core()   (boot_cpu_data.phys_core_id)
> +#define boot_cpu_to_socket() (boot_cpu_data.phys_proc_id)

This change is pointless.  Anything explicitly referencing the bsp
should just use boot_cpu_data directly, rather than obscuring the IDs
behind macros like this.

~Andrew

>  
>  unsigned int apicid_to_socket(unsigned int);
>  
>

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH 1/2] x86: during boot, anticipate identifying the boot cpu
  2014-08-22 17:28   ` Andrew Cooper
@ 2014-08-22 18:40     ` Dario Faggioli
  2014-08-25  8:35     ` Jan Beulich
  1 sibling, 0 replies; 14+ messages in thread
From: Dario Faggioli @ 2014-08-22 18:40 UTC (permalink / raw)
  To: Andrew Cooper, xen-devel; +Cc: Keir (Xen.org), George Dunlap, Jan Beulich


[-- Attachment #1.1: Type: text/plain, Size: 3602 bytes --]

Hi,

Thanks Andrew for the quick review, and sorry for the HTML and for top posting.

Just letting you know that I'm on vacation next week. I'll look at the comments when I'm back on Sept 1st.

Regards, Dario




[-- Attachment #1.2: Type: text/html, Size: 5595 bytes --]

[-- Attachment #2: Type: text/plain, Size: 126 bytes --]

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH 0/2] Credit2: fix per-socket runqueue setup
  2014-08-22 17:15 [PATCH 0/2] Credit2: fix per-socket runqueue setup Dario Faggioli
  2014-08-22 17:15 ` [PATCH 1/2] x86: during boot, anticipate identifying the boot cpu Dario Faggioli
  2014-08-22 17:15 ` [PATCH 2/2] sched: credit2: use boot CPU info for CPU #0 Dario Faggioli
@ 2014-08-25  8:31 ` Jan Beulich
  2014-09-01 13:59   ` George Dunlap
  2 siblings, 1 reply; 14+ messages in thread
From: Jan Beulich @ 2014-08-25  8:31 UTC (permalink / raw)
  To: Dario Faggioli; +Cc: george.dunlap, Andrew Cooper, keir, xen-devel

>>> On 22.08.14 at 19:15, <dario.faggioli@citrix.com> wrote:
> root@tg03:~# xl dmesg |grep -i runqueue
> (XEN) Adding cpu 0 to runqueue 1
> (XEN)  First cpu on runqueue, activating
> (XEN) Adding cpu 1 to runqueue 1
> (XEN) Adding cpu 2 to runqueue 1
> (XEN) Adding cpu 3 to runqueue 1
> (XEN) Adding cpu 4 to runqueue 1
> (XEN) Adding cpu 5 to runqueue 1
> (XEN) Adding cpu 6 to runqueue 1
> (XEN) Adding cpu 7 to runqueue 1
> (XEN) Adding cpu 8 to runqueue 1
> (XEN) Adding cpu 9 to runqueue 1
> (XEN) Adding cpu 10 to runqueue 1
> (XEN) Adding cpu 11 to runqueue 1
> (XEN) Adding cpu 12 to runqueue 0
> (XEN)  First cpu on runqueue, activating
> (XEN) Adding cpu 13 to runqueue 0
> (XEN) Adding cpu 14 to runqueue 0
> (XEN) Adding cpu 15 to runqueue 0
> (XEN) Adding cpu 16 to runqueue 0
> (XEN) Adding cpu 17 to runqueue 0
> (XEN) Adding cpu 18 to runqueue 0
> (XEN) Adding cpu 19 to runqueue 0
> (XEN) Adding cpu 20 to runqueue 0
> (XEN) Adding cpu 21 to runqueue 0
> (XEN) Adding cpu 22 to runqueue 0
> (XEN) Adding cpu 23 to runqueue 0
> 
> Which makes a lot more sense. :-)

But it looks suspicious that the low numbered CPUs get assigned to
runqueue 1. Is there an explanation for this, or are surprises to be
expected on larger than dual-socket systems?

Jan

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH 1/2] x86: during boot, anticipate identifying the boot cpu
  2014-08-22 17:28   ` Andrew Cooper
  2014-08-22 18:40     ` Dario Faggioli
@ 2014-08-25  8:35     ` Jan Beulich
  1 sibling, 0 replies; 14+ messages in thread
From: Jan Beulich @ 2014-08-25  8:35 UTC (permalink / raw)
  To: Andrew Cooper, Dario Faggioli, xen-devel; +Cc: george.dunlap, keir

>>> On 22.08.14 at 19:28, <andrew.cooper3@citrix.com> wrote:
> On 22/08/14 18:15, Dario Faggioli wrote:
>> --- a/xen/arch/x86/smpboot.c
>> +++ b/xen/arch/x86/smpboot.c
>> @@ -59,7 +59,8 @@ DEFINE_PER_CPU_READ_MOSTLY(cpumask_var_t, cpu_core_mask);
>>  cpumask_t cpu_online_map __read_mostly;
>>  EXPORT_SYMBOL(cpu_online_map);
>>  
>> -struct cpuinfo_x86 cpu_data[NR_CPUS];
>> +struct cpuinfo_x86 cpu_data[NR_CPUS] =
>> +	{ [0 ... NR_CPUS-1] = { .phys_proc_id=-1 } };
> 
> This moves a huge chunk of data from .bss to .data.  Can it not be fixed
> by enumerating topology on APs before setting scheduling up?

Yeah, I was intending to ask the same thing. Just set .phys_proc_id
to -1 early during CPU bringup.
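The alternative being suggested could look roughly like this (hypothetical
function name; the point is simply to do the marking from code early in each
CPU's bringup path, keeping cpu_data zero-initialized in .bss):

```c
#include <assert.h>

#define NR_CPUS 24

struct cpuinfo {
    int phys_proc_id;   /* socket; -1 = not identified yet */
};

/* Stays in .bss: no large static initializer needed. */
static struct cpuinfo cpu_data[NR_CPUS];

/* Called early in a CPU's bringup, before anything can consume the
 * (still meaningless) zeroed topology data. */
static void early_mark_unidentified(int cpu)
{
    cpu_data[cpu].phys_proc_id = -1;
}
```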

>> --- a/xen/include/asm-x86/processor.h
>> +++ b/xen/include/asm-x86/processor.h
>> @@ -213,8 +213,10 @@ extern void detect_extended_topology(struct cpuinfo_x86 *c);
>>  
>>  extern void detect_ht(struct cpuinfo_x86 *c);
>>  
>> -#define cpu_to_core(_cpu)   (cpu_data[_cpu].cpu_core_id)
>> -#define cpu_to_socket(_cpu) (cpu_data[_cpu].phys_proc_id)
>> +#define cpu_to_core(_cpu)    (cpu_data[_cpu].cpu_core_id)
>> +#define cpu_to_socket(_cpu)  (cpu_data[_cpu].phys_proc_id)
>> +#define boot_cpu_to_core()   (boot_cpu_data.phys_core_id)
>> +#define boot_cpu_to_socket() (boot_cpu_data.phys_proc_id)
> 
> This change is pointless.  Anything explicitly referencing the bsp
> should just use boot_cpu_data directly, rather than obscuring the IDs
> behind macros like this.

If the purpose is to use these in arch-independent code, I think
the "obscuring" actually makes sense.

Jan

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH 1/2] x86: during boot, anticipate identifying the boot cpu
  2014-08-22 17:15 ` [PATCH 1/2] x86: during boot, anticipate identifying the boot cpu Dario Faggioli
  2014-08-22 17:28   ` Andrew Cooper
@ 2014-08-25  8:39   ` Jan Beulich
  2014-09-01 15:12     ` George Dunlap
  1 sibling, 1 reply; 14+ messages in thread
From: Jan Beulich @ 2014-08-25  8:39 UTC (permalink / raw)
  To: Dario Faggioli; +Cc: george.dunlap, Andrew Cooper, keir, xen-devel

>>> On 22.08.14 at 19:15, <dario.faggioli@citrix.com> wrote:
> --- a/xen/arch/x86/setup.c
> +++ b/xen/arch/x86/setup.c
> @@ -1262,6 +1262,12 @@ void __init noreturn __start_xen(unsigned long mbi_p)
>  
>      timer_init();
>  
> +    /*
> +     * Identify the boot CPU, in case the scheduler initialization
> +     * needs to know about it (e.g., topology, etc.)
> +     */
> +    identify_cpu(&boot_cpu_data);
> +
>      init_idle_domain();
>  
>      trap_init();
> @@ -1272,8 +1278,6 @@ void __init noreturn __start_xen(unsigned long mbi_p)
>  
>      arch_init_memory();
>  
> -    identify_cpu(&boot_cpu_data);
> -
>      if ( cpu_has_fxsr )
>          set_in_cr4(X86_CR4_OSFXSR);
>      if ( cpu_has_xmm )

I'm not sure about this part: It currently makes quite a bit of sense
to have identify_cpu() immediately before explicit users of the
gathered data (all code following up to the alternative_instructions()
call). Perhaps if you move identify_cpu(), all the others should get
moved too?

Jan

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH 2/2] sched: credit2: use boot CPU info for CPU #0
  2014-08-22 17:15 ` [PATCH 2/2] sched: credit2: use boot CPU info for CPU #0 Dario Faggioli
@ 2014-08-25  8:41   ` Jan Beulich
  0 siblings, 0 replies; 14+ messages in thread
From: Jan Beulich @ 2014-08-25  8:41 UTC (permalink / raw)
  To: Dario Faggioli; +Cc: george.dunlap, Andrew Cooper, keir, xen-devel

>>> On 22.08.14 at 19:15, <dario.faggioli@citrix.com> wrote:
> -    /* Figure out which runqueue to put it in */
> -    /* NB: cpu 0 doesn't get a STARTING callback, so we hard-code it to runqueue 0. */
> +    /* cpu 0 doesn't get a STARTING callback, so use boot CPU data for it */
>      if ( cpu == 0 )
> -        rqi = 0;
> +        rqi = boot_cpu_to_socket();

A change like this should also convert the "cpu == 0" condition to
something guaranteeing this is the boot CPU (both for the [unlikely]
case of the boot CPU being other than CPU 0, and for the [more
likely] case of CPU 0 having been brought down and then being
brought back up again).
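A sketch of what such a check could mean (hypothetical names: `boot_cpu_id`
stands for whatever identifier the platform records at startup, and the
socket lookup is a stub; neither is the real Xen API):

```c
#include <assert.h>

static unsigned int boot_cpu_id;    /* recorded once at startup; often,
                                     * but not necessarily, 0 */

static int is_boot_cpu(unsigned int cpu)
{
    return cpu == boot_cpu_id;
}

/* Stub socket lookup for illustration only. */
static int cpu_to_socket(unsigned int cpu)
{
    return cpu < 2 ? 1 : 0;
}

/* The special case now survives a boot CPU other than CPU 0, and does
 * not wrongly trigger when CPU 0 is offlined and brought back up. */
static int pick_runqueue(unsigned int cpu, int boot_socket)
{
    return is_boot_cpu(cpu) ? boot_socket : cpu_to_socket(cpu);
}
```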

Jan

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH 0/2] Credit2: fix per-socket runqueue setup
  2014-08-25  8:31 ` [PATCH 0/2] Credit2: fix per-socket runqueue setup Jan Beulich
@ 2014-09-01 13:59   ` George Dunlap
  2014-09-02 16:46     ` Dario Faggioli
  0 siblings, 1 reply; 14+ messages in thread
From: George Dunlap @ 2014-09-01 13:59 UTC (permalink / raw)
  To: Jan Beulich, Dario Faggioli; +Cc: Andrew Cooper, keir, xen-devel

On 08/25/2014 09:31 AM, Jan Beulich wrote:
>>>> On 22.08.14 at 19:15, <dario.faggioli@citrix.com> wrote:
>> root@tg03:~# xl dmesg |grep -i runqueue
>> (XEN) Adding cpu 0 to runqueue 1
>> (XEN)  First cpu on runqueue, activating
>> (XEN) Adding cpu 1 to runqueue 1
>> (XEN) Adding cpu 2 to runqueue 1
>> (XEN) Adding cpu 3 to runqueue 1
>> (XEN) Adding cpu 4 to runqueue 1
>> (XEN) Adding cpu 5 to runqueue 1
>> (XEN) Adding cpu 6 to runqueue 1
>> (XEN) Adding cpu 7 to runqueue 1
>> (XEN) Adding cpu 8 to runqueue 1
>> (XEN) Adding cpu 9 to runqueue 1
>> (XEN) Adding cpu 10 to runqueue 1
>> (XEN) Adding cpu 11 to runqueue 1
>> (XEN) Adding cpu 12 to runqueue 0
>> (XEN)  First cpu on runqueue, activating
>> (XEN) Adding cpu 13 to runqueue 0
>> (XEN) Adding cpu 14 to runqueue 0
>> (XEN) Adding cpu 15 to runqueue 0
>> (XEN) Adding cpu 16 to runqueue 0
>> (XEN) Adding cpu 17 to runqueue 0
>> (XEN) Adding cpu 18 to runqueue 0
>> (XEN) Adding cpu 19 to runqueue 0
>> (XEN) Adding cpu 20 to runqueue 0
>> (XEN) Adding cpu 21 to runqueue 0
>> (XEN) Adding cpu 22 to runqueue 0
>> (XEN) Adding cpu 23 to runqueue 0
>>
>> Which makes a lot more sense. :-)
> But it looks suspicious that the low numbered CPUs get assigned to
> runqueue 1. Is there an explanation for this, or are surprises to be
> expected on larger than dual-socket systems?

Well the explanation is most likely from the cpu_topology info from the 
cover letter (0/2): On his machine, cpus 0-11 are on socket 1, and cpus 
12-23 are on socket 0.  Why that's the topology reported (I presume in 
ACPI?) I'm not sure.

  -George

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH 1/2] x86: during boot, anticipate identifying the boot cpu
  2014-08-25  8:39   ` Jan Beulich
@ 2014-09-01 15:12     ` George Dunlap
  2014-09-01 15:24       ` Jan Beulich
  0 siblings, 1 reply; 14+ messages in thread
From: George Dunlap @ 2014-09-01 15:12 UTC (permalink / raw)
  To: Jan Beulich, Dario Faggioli; +Cc: Andrew Cooper, keir, xen-devel

On 08/25/2014 09:39 AM, Jan Beulich wrote:
>>>> On 22.08.14 at 19:15, <dario.faggioli@citrix.com> wrote:
>> --- a/xen/arch/x86/setup.c
>> +++ b/xen/arch/x86/setup.c
>> @@ -1262,6 +1262,12 @@ void __init noreturn __start_xen(unsigned long mbi_p)
>>   
>>       timer_init();
>>   
>> +    /*
>> +     * Identify the boot CPU, in case the scheduler initialization
>> +     * needs to know about it (e.g., topology, etc.)
>> +     */
>> +    identify_cpu(&boot_cpu_data);
>> +
>>       init_idle_domain();
>>   
>>       trap_init();
>> @@ -1272,8 +1278,6 @@ void __init noreturn __start_xen(unsigned long mbi_p)
>>   
>>       arch_init_memory();
>>   
>> -    identify_cpu(&boot_cpu_data);
>> -
>>       if ( cpu_has_fxsr )
>>           set_in_cr4(X86_CR4_OSFXSR);
>>       if ( cpu_has_xmm )
> I'm not sure about this part: It currently makes quite a bit of sense
> to have identify_cpu() immediately before explicit users of the
> gathered data (all code following up to the alternative_instructions()
> call). Perhaps if you move identify_cpu(), all the others should get
> moved too?

Or possibly, move init_idle_domain() until after?  I took a quick look 
through the functions called between init_idle_domain() and 
identify_cpu(), and didn't see any dependencies on the idle domain or 
scheduling functions.  It's quite possible I missed something though.

Overall I think this approach looks good, and I agree with the comments 
already given.

  -George

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH 1/2] x86: during boot, anticipate identifying the boot cpu
  2014-09-01 15:12     ` George Dunlap
@ 2014-09-01 15:24       ` Jan Beulich
  0 siblings, 0 replies; 14+ messages in thread
From: Jan Beulich @ 2014-09-01 15:24 UTC (permalink / raw)
  To: Dario Faggioli, George Dunlap; +Cc: Andrew Cooper, keir, xen-devel

>>> On 01.09.14 at 17:12, <george.dunlap@eu.citrix.com> wrote:
> On 08/25/2014 09:39 AM, Jan Beulich wrote:
>>>>> On 22.08.14 at 19:15, <dario.faggioli@citrix.com> wrote:
>>> --- a/xen/arch/x86/setup.c
>>> +++ b/xen/arch/x86/setup.c
>>> @@ -1262,6 +1262,12 @@ void __init noreturn __start_xen(unsigned long mbi_p)
>>>   
>>>       timer_init();
>>>   
>>> +    /*
>>> +     * Identify the boot CPU, in case the scheduler initialization
>>> +     * needs to know about it (e.g., topology, etc.)
>>> +     */
>>> +    identify_cpu(&boot_cpu_data);
>>> +
>>>       init_idle_domain();
>>>   
>>>       trap_init();
>>> @@ -1272,8 +1278,6 @@ void __init noreturn __start_xen(unsigned long mbi_p)
>>>   
>>>       arch_init_memory();
>>>   
>>> -    identify_cpu(&boot_cpu_data);
>>> -
>>>       if ( cpu_has_fxsr )
>>>           set_in_cr4(X86_CR4_OSFXSR);
>>>       if ( cpu_has_xmm )
>> I'm not sure about this part: It currently makes quite a bit of sense
>> to have identify_cpu() immediately before explicit users of the
>> gathered data (all code following up to the alternative_instructions()
>> call). Perhaps if you move identify_cpu(), all the others should get
>> moved too?
> 
> Or possibly, move init_idle_domain() until after?  I took a quick look 
> through the functions called between init_idle_domain() and 
> identify_cpu(), and didn't see any dependencies on the idle domain or 
> scheduling functions.  It's quite possible I missed something though.

If it wasn't scheduler_init() setting up the idle domain, moving it
further down would have been an option; I'd rather not defer
set_current() more than necessary, namely e.g. past rcu_init()
(even if currently there may not be an actual dependency).

Jan

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH 0/2] Credit2: fix per-socket runqueue setup
  2014-09-01 13:59   ` George Dunlap
@ 2014-09-02 16:46     ` Dario Faggioli
  2014-09-03 10:00       ` George Dunlap
  0 siblings, 1 reply; 14+ messages in thread
From: Dario Faggioli @ 2014-09-02 16:46 UTC (permalink / raw)
  To: George Dunlap; +Cc: Andrew Cooper, keir, Jan Beulich, xen-devel


[-- Attachment #1.1: Type: text/plain, Size: 4205 bytes --]

On lun, 2014-09-01 at 14:59 +0100, George Dunlap wrote:
> On 08/25/2014 09:31 AM, Jan Beulich wrote:
> >>>> On 22.08.14 at 19:15, <dario.faggioli@citrix.com> wrote:
> >> root@tg03:~# xl dmesg |grep -i runqueue
> >> (XEN) Adding cpu 0 to runqueue 1
> >> (XEN)  First cpu on runqueue, activating
> >> (XEN) Adding cpu 1 to runqueue 1
> >> (XEN) Adding cpu 2 to runqueue 1
> >> (XEN) Adding cpu 3 to runqueue 1
> >> (XEN) Adding cpu 4 to runqueue 1
> >> (XEN) Adding cpu 5 to runqueue 1
> >> (XEN) Adding cpu 6 to runqueue 1
> >> (XEN) Adding cpu 7 to runqueue 1
> >> (XEN) Adding cpu 8 to runqueue 1
> >> (XEN) Adding cpu 9 to runqueue 1
> >> (XEN) Adding cpu 10 to runqueue 1
> >> (XEN) Adding cpu 11 to runqueue 1
> >> (XEN) Adding cpu 12 to runqueue 0
> >> (XEN)  First cpu on runqueue, activating
> >> (XEN) Adding cpu 13 to runqueue 0
> >> (XEN) Adding cpu 14 to runqueue 0
> >> (XEN) Adding cpu 15 to runqueue 0
> >> (XEN) Adding cpu 16 to runqueue 0
> >> (XEN) Adding cpu 17 to runqueue 0
> >> (XEN) Adding cpu 18 to runqueue 0
> >> (XEN) Adding cpu 19 to runqueue 0
> >> (XEN) Adding cpu 20 to runqueue 0
> >> (XEN) Adding cpu 21 to runqueue 0
> >> (XEN) Adding cpu 22 to runqueue 0
> >> (XEN) Adding cpu 23 to runqueue 0
> >>
> >> Which makes a lot more sense. :-)
> > But it looks suspicious that the low numbered CPUs get assigned to
> > runqueue 1. Is there an explanation for this, or are surprises to be
> > expected on larger than dual-socket systems?
> 
Not sure what kind of surprises you're thinking of, but I have a big box
handy. I'll test the new version of the series on it, and report what
happens.

> Well the explanation is most likely from the cpu_topology info from the 
> cover letter (0/2): On his machine, cpus 0-11 are on socket 1, and cpus 
> 12-23 are on socket 0.  
>
Exactly, here it is again, coming from `xl info -n'.

cpu_topology           :
cpu:    core    socket     node
  0:       0        1        0
  1:       0        1        0
  2:       1        1        0
  3:       1        1        0
  4:       2        1        0
  5:       2        1        0
  6:       8        1        0
  7:       8        1        0
  8:       9        1        0
  9:       9        1        0
 10:      10        1        0
 11:      10        1        0
 12:       0        0        1
 13:       0        0        1
 14:       1        0        1
 15:       1        0        1
 16:       2        0        1
 17:       2        0        1
 18:       8        0        1
 19:       8        0        1
 20:       9        0        1
 21:       9        0        1
 22:      10        0        1
 23:      10        0        1
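Given that topology, the runqueue numbering is just the socket id, which is why the low-numbered cpus land on runqueue 1. A toy sketch of the per-socket grouping (hypothetical Python for illustration, not the actual Xen code):

```python
# cpus 0-11 sit on socket 1, cpus 12-23 on socket 0 (as in the table above);
# with per-socket runqueues, each cpu's runqueue id is simply its socket id.
topology = {cpu: 1 for cpu in range(12)}
topology.update({cpu: 0 for cpu in range(12, 24)})

runqueues = {}
for cpu, socket in topology.items():
    runqueues.setdefault(socket, []).append(cpu)

print(runqueues[1])  # cpus 0-11, matching the "Adding cpu N to runqueue 1" log
print(runqueues[0])  # cpus 12-23
```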

> Why that's the topology reported (I presume in 
> ACPI?) I'm not sure.
> 
Me neither. BTW, on baremetal, here's what I see:
root@tg03:~# numactl --hardware
available: 2 nodes (0-1)
node 0 cpus: 0 2 4 6 8 10 12 14 16 18 20 22
node 0 size: 18432 MB
node 0 free: 17927 MB
node 1 cpus: 1 3 5 7 9 11 13 15 17 19 21 23
node 1 size: 18419 MB
node 1 free: 17926 MB
node distances:
node   0   1 
  0:  10  20 
  1:  20  10 

Also:
root@tg03:~# for i in `seq 0 23`;do echo "CPU$i is on socket `cat /sys/bus/cpu/devices/cpu$i/topology/physical_package_id`";done
CPU0 is on socket 1
CPU1 is on socket 0
CPU2 is on socket 1
CPU3 is on socket 0
CPU4 is on socket 1
CPU5 is on socket 0
CPU6 is on socket 1
CPU7 is on socket 0
CPU8 is on socket 1
CPU9 is on socket 0
CPU10 is on socket 1
CPU11 is on socket 0
CPU12 is on socket 1
CPU13 is on socket 0
CPU14 is on socket 1
CPU15 is on socket 0
CPU16 is on socket 1
CPU17 is on socket 0
CPU18 is on socket 1
CPU19 is on socket 0
CPU20 is on socket 1
CPU21 is on socket 0
CPU22 is on socket 1
CPU23 is on socket 0

I've noticed this before but, TBH, I never dug into the cause of the
discrepancy between us and Linux.

Regards,
Dario

-- 
<<This happens because I choose it to happen!>> (Raistlin Majere)
-----------------------------------------------------------------
Dario Faggioli, Ph.D, http://about.me/dario.faggioli
Senior Software Engineer, Citrix Systems R&D Ltd., Cambridge (UK)




* Re: [PATCH 0/2] Credit2: fix per-socket runqueue setup
  2014-09-02 16:46     ` Dario Faggioli
@ 2014-09-03 10:00       ` George Dunlap
  0 siblings, 0 replies; 14+ messages in thread
From: George Dunlap @ 2014-09-03 10:00 UTC (permalink / raw)
  To: Dario Faggioli; +Cc: Andrew Cooper, keir, Jan Beulich, xen-devel

On 09/02/2014 05:46 PM, Dario Faggioli wrote:
> Me neither. BTW, on baremetal, here's what I see:
> root@tg03:~# numactl --hardware
> available: 2 nodes (0-1)
> node 0 cpus: 0 2 4 6 8 10 12 14 16 18 20 22
> node 0 size: 18432 MB
> node 0 free: 17927 MB
> node 1 cpus: 1 3 5 7 9 11 13 15 17 19 21 23
> node 1 size: 18419 MB
> node 1 free: 17926 MB
> node distances:
> node   0   1
>    0:  10  20
>    1:  20  10
>
> Also:
> root@tg03:~# for i in `seq 0 23`;do echo "CPU$i is on socket `cat /sys/bus/cpu/devices/cpu$i/topology/physical_package_id`";done
> CPU0 is on socket 1
> CPU1 is on socket 0
> CPU2 is on socket 1
> CPU3 is on socket 0
> CPU4 is on socket 1
> CPU5 is on socket 0
> CPU6 is on socket 1
> CPU7 is on socket 0
> CPU8 is on socket 1
> CPU9 is on socket 0
> CPU10 is on socket 1
> CPU11 is on socket 0
> CPU12 is on socket 1
> CPU13 is on socket 0
> CPU14 is on socket 1
> CPU15 is on socket 0
> CPU16 is on socket 1
> CPU17 is on socket 0
> CPU18 is on socket 1
> CPU19 is on socket 0
> CPU20 is on socket 1
> CPU21 is on socket 0
> CPU22 is on socket 1
> CPU23 is on socket 0
>
> I've noticed this before, but, TBH, I never dug the cause of the
> discrepancy between us and Linux.

I remember that at some point Xen purposely re-enumerated the cpu numbers 
so that they would have a more sensible arrangement -- i.e., so that 
logical cpus on the same thread / core / socket would be grouped 
together consecutively.

As you can see here, though, cpu 0 is still on socket 1 (which is 
probably why Xen keeps cpu 0 on socket 1 in its re-enumeration).
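A minimal sketch of that grouping idea (hypothetical Python for illustration, not Xen's actual renumbering, which among other things keeps the boot cpu's socket first):

```python
# Firmware enumeration interleaves sockets, as in Dario's baremetal
# output above: even cpu ids on socket 1, odd cpu ids on socket 0.
fw_socket = {cpu: (1 if cpu % 2 == 0 else 0) for cpu in range(24)}

# Renumber so that cpus sharing a socket become consecutive: a stable
# sort by socket id groups all socket-0 cpus, then all socket-1 cpus.
grouped = sorted(fw_socket, key=lambda c: fw_socket[c])

# After renumbering, sockets are contiguous runs instead of interleaved.
print([fw_socket[c] for c in grouped])
```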

  -George


Thread overview: 14+ messages
2014-08-22 17:15 [PATCH 0/2] Credit2: fix per-socket runqueue setup Dario Faggioli
2014-08-22 17:15 ` [PATCH 1/2] x86: during boot, anticipate identifying the boot cpu Dario Faggioli
2014-08-22 17:28   ` Andrew Cooper
2014-08-22 18:40     ` Dario Faggioli
2014-08-25  8:35     ` Jan Beulich
2014-08-25  8:39   ` Jan Beulich
2014-09-01 15:12     ` George Dunlap
2014-09-01 15:24       ` Jan Beulich
2014-08-22 17:15 ` [PATCH 2/2] sched: credit2: use boot CPU info for CPU #0 Dario Faggioli
2014-08-25  8:41   ` Jan Beulich
2014-08-25  8:31 ` [PATCH 0/2] Credit2: fix per-socket runqueue setup Jan Beulich
2014-09-01 13:59   ` George Dunlap
2014-09-02 16:46     ` Dario Faggioli
2014-09-03 10:00       ` George Dunlap
