* [tip:sched/core] sched/fair: Fix group power_orig computation
@ 2013-09-12 18:05 tip-bot for Peter Zijlstra
  2013-09-12 23:21 ` Michael Neuling
  2013-11-12 10:55 ` Srikar Dronamraju
  0 siblings, 2 replies; 32+ messages in thread
From: tip-bot for Peter Zijlstra @ 2013-09-12 18:05 UTC (permalink / raw)
  To: linux-tip-commits; +Cc: linux-kernel, hpa, mingo, peterz, tglx, mikey

Commit-ID:  863bffc80898b8df295ebac111af2335ec05f85d
Gitweb:     http://git.kernel.org/tip/863bffc80898b8df295ebac111af2335ec05f85d
Author:     Peter Zijlstra <peterz@infradead.org>
AuthorDate: Wed, 28 Aug 2013 11:44:39 +0200
Committer:  Ingo Molnar <mingo@kernel.org>
CommitDate: Thu, 12 Sep 2013 19:14:43 +0200

sched/fair: Fix group power_orig computation

When looking at the code I noticed we don't actually compute
sgp->power_orig correctly for groups, fix that.

Currently the only consumer of that value is fix_small_capacity()
which is only used on POWER7+ and that code excludes this case by
being limited to SD_SHARE_CPUPOWER which is only ever set on the SMT
domain which must be the lowest domain and this has singleton groups.

So nothing should be affected by this change.

Cc: Michael Neuling <mikey@neuling.org>
Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/n/tip-db2pe0vxwunv37plc7onnugj@git.kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
---
 kernel/sched/fair.c | 16 +++++++++++-----
 1 file changed, 11 insertions(+), 5 deletions(-)

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index f9f4385..baba313 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -4450,7 +4450,7 @@ void update_group_power(struct sched_domain *sd, int cpu)
 {
 	struct sched_domain *child = sd->child;
 	struct sched_group *group, *sdg = sd->groups;
-	unsigned long power;
+	unsigned long power, power_orig;
 	unsigned long interval;
 
 	interval = msecs_to_jiffies(sd->balance_interval);
@@ -4462,7 +4462,7 @@ void update_group_power(struct sched_domain *sd, int cpu)
 		return;
 	}
 
-	power = 0;
+	power_orig = power = 0;
 
 	if (child->flags & SD_OVERLAP) {
 		/*
@@ -4470,8 +4470,12 @@ void update_group_power(struct sched_domain *sd, int cpu)
 		 * span the current group.
 		 */
 
-		for_each_cpu(cpu, sched_group_cpus(sdg))
-			power += power_of(cpu);
+		for_each_cpu(cpu, sched_group_cpus(sdg)) {
+			struct sched_group *sg = cpu_rq(cpu)->sd->groups;
+
+			power_orig += sg->sgp->power_orig;
+			power += sg->sgp->power;
+		}
 	} else  {
 		/*
 		 * !SD_OVERLAP domains can assume that child groups
@@ -4480,12 +4484,14 @@ void update_group_power(struct sched_domain *sd, int cpu)
 
 		group = child->groups;
 		do {
+			power_orig += group->sgp->power_orig;
 			power += group->sgp->power;
 			group = group->next;
 		} while (group != child->groups);
 	}
 
-	sdg->sgp->power_orig = sdg->sgp->power = power;
+	sdg->sgp->power_orig = power_orig;
+	sdg->sgp->power = power;
 }
 
 /*


* Re: [tip:sched/core] sched/fair: Fix group power_orig computation
  2013-09-12 18:05 [tip:sched/core] sched/fair: Fix group power_orig computation tip-bot for Peter Zijlstra
@ 2013-09-12 23:21 ` Michael Neuling
  2013-11-12 10:55 ` Srikar Dronamraju
  1 sibling, 0 replies; 32+ messages in thread
From: Michael Neuling @ 2013-09-12 23:21 UTC (permalink / raw)
  To: mingo, hpa, linux-kernel, peterz, tglx, tip-bot for Peter Zijlstra
  Cc: linux-tip-commits

tip-bot for Peter Zijlstra <tipbot@zytor.com> wrote:

> Commit-ID:  863bffc80898b8df295ebac111af2335ec05f85d
> Gitweb:     http://git.kernel.org/tip/863bffc80898b8df295ebac111af2335ec05f85d
> Author:     Peter Zijlstra <peterz@infradead.org>
> AuthorDate: Wed, 28 Aug 2013 11:44:39 +0200
> Committer:  Ingo Molnar <mingo@kernel.org>
> CommitDate: Thu, 12 Sep 2013 19:14:43 +0200
> 
> sched/fair: Fix group power_orig computation
> 
> When looking at the code I noticed we don't actually compute
> sgp->power_orig correctly for groups, fix that.
> 
> Currently the only consumer of that value is fix_small_capacity()
> which is only used on POWER7+ and that code excludes this case by
> being limited to SD_SHARE_CPUPOWER which is only ever set on the SMT
> domain which must be the lowest domain and this has singleton groups.
> 
> So nothing should be affected by this change.
> 
> Cc: Michael Neuling <mikey@neuling.org>
> Signed-off-by: Peter Zijlstra <peterz@infradead.org>
> Link: http://lkml.kernel.org/n/tip-db2pe0vxwunv37plc7onnugj@git.kernel.org
> Signed-off-by: Ingo Molnar <mingo@kernel.org>

FWIW, this doesn't seem to break POWER7.  

Thanks!
Mikey

> ---
>  kernel/sched/fair.c | 16 +++++++++++-----
>  1 file changed, 11 insertions(+), 5 deletions(-)
> 
> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> index f9f4385..baba313 100644
> --- a/kernel/sched/fair.c
> +++ b/kernel/sched/fair.c
> @@ -4450,7 +4450,7 @@ void update_group_power(struct sched_domain *sd, int cpu)
>  {
>  	struct sched_domain *child = sd->child;
>  	struct sched_group *group, *sdg = sd->groups;
> -	unsigned long power;
> +	unsigned long power, power_orig;
>  	unsigned long interval;
>  
>  	interval = msecs_to_jiffies(sd->balance_interval);
> @@ -4462,7 +4462,7 @@ void update_group_power(struct sched_domain *sd, int cpu)
>  		return;
>  	}
>  
> -	power = 0;
> +	power_orig = power = 0;
>  
>  	if (child->flags & SD_OVERLAP) {
>  		/*
> @@ -4470,8 +4470,12 @@ void update_group_power(struct sched_domain *sd, int cpu)
>  		 * span the current group.
>  		 */
>  
> -		for_each_cpu(cpu, sched_group_cpus(sdg))
> -			power += power_of(cpu);
> +		for_each_cpu(cpu, sched_group_cpus(sdg)) {
> +			struct sched_group *sg = cpu_rq(cpu)->sd->groups;
> +
> +			power_orig += sg->sgp->power_orig;
> +			power += sg->sgp->power;
> +		}
>  	} else  {
>  		/*
>  		 * !SD_OVERLAP domains can assume that child groups
> @@ -4480,12 +4484,14 @@ void update_group_power(struct sched_domain *sd, int cpu)
>  
>  		group = child->groups;
>  		do {
> +			power_orig += group->sgp->power_orig;
>  			power += group->sgp->power;
>  			group = group->next;
>  		} while (group != child->groups);
>  	}
>  
> -	sdg->sgp->power_orig = sdg->sgp->power = power;
> +	sdg->sgp->power_orig = power_orig;
> +	sdg->sgp->power = power;
>  }
>  
>  /*
> 


* Re: [tip:sched/core] sched/fair: Fix group power_orig computation
  2013-09-12 18:05 [tip:sched/core] sched/fair: Fix group power_orig computation tip-bot for Peter Zijlstra
  2013-09-12 23:21 ` Michael Neuling
@ 2013-11-12 10:55 ` Srikar Dronamraju
  2013-11-12 11:57   ` Peter Zijlstra
  1 sibling, 1 reply; 32+ messages in thread
From: Srikar Dronamraju @ 2013-11-12 10:55 UTC (permalink / raw)
  To: mingo, hpa, linux-kernel, peterz, tglx, mikey; +Cc: linux-tip-commits


With Commit-id 863bffc80898 (sched/fair: Fix group power_orig computation),
which is part of the latest tip/master, NUMA machines may fail to boot on
both powerpc and x86_64.

On powerpc

[    0.710162] Unable to handle kernel paging request for data at address 0x00000010
[    0.710170] Faulting instruction address: 0xc0000000000f3db4
[    0.710177] Oops: Kernel access of bad area, sig: 11 [#1]
[    0.710182] SMP NR_CPUS=1024 NUMA pSeries
[    0.710190] Modules linked in:
[    0.710199] CPU: 53 PID: 1 Comm: swapper/53 Not tainted 3.12.0-tip_master+ #1
[    0.710205] task: c000001713980000 ti: c000001713a00000 task.ti: c000001713a00000
[    0.710211] NIP: c0000000000f3db4 LR: c0000000000f3de0 CTR: 0000000000000000
[    0.710217] REGS: c000001713a036d0 TRAP: 0300   Not tainted  (3.12.0-tip_master+)
[    0.710223] MSR: 8000000000009032 <SF,EE,ME,IR,DR,RI>  CR: 48002044  XER: 20000003
[    0.710243] SOFTE: 1
[    0.710246] CFAR: c00000000000908c
[    0.710250] DAR: 0000000000000010, DSISR: 40000000
[    0.710254]
GPR00: c000000000e85e00 c000001713a03950 c0000000015a7a58 0000000000000030
GPR04: 0000000000000030 0000000000000000 0000000000000400 000000001dcd6500
GPR08: 0000000000000000 0000000000000000 0000000000000000 c00000000160c958
GPR12: 0000000000000000 c000000007edb980 c0000007e2e374a8 c000000001607a58
GPR16: c000001ef38f8f58 0000000000000180 c000000001770258 0000000000000001
GPR20: 000000000000005f c000001ef38f8f40 0000000000000001 c0000000014b9acf
GPR24: 0000000000000000 c000001712884a40 c000000001607a58 c000000000e87a58
GPR28: c00000000160fc54 c000001712884a58 0000000000000000 0000000000000000
[    0.710349] NIP [c0000000000f3db4] .update_group_power+0xd4/0x2e0
[    0.710355] LR [c0000000000f3de0] .update_group_power+0x100/0x2e0
[    0.710360] Call Trace:
[    0.710364] [c000001713a03950] [c0000000000f3d28] .update_group_power+0x48/0x2e0 (unreliable)
[    0.710375] [c000001713a03a00] [c0000000000ed73c] .build_sched_domains+0xadc/0xd90
[    0.710385] [c000001713a03b70] [c000000000bf39b0] .sched_init_smp+0x528/0x66c
[    0.710394] [c000001713a03ce0] [c000000000bd46a8] .kernel_init_freeable+0x200/0x398
[    0.710405] [c000001713a03db0] [c00000000000bc04] .kernel_init+0x24/0x140
[    0.710413] [c000001713a03e30] [c00000000000a16c] .ret_from_kernel_thread+0x5c/0x70
[    0.710419] Instruction dump:
[    0.710424] 3bb90018 3bc00000 3be00000 3860ffff 3f62ff8e 48000034 60000000 397a4f00
[    0.710439] 381be3a8 7d2b482a 7d204a14 e9290950 <e9290010> e9290010 81690004 80090008
[    0.710465] ---[ end trace b5091a0959b24fe3 ]---
[    0.710469]
[    2.710530] Kernel panic - not syncing: Fatal exception
[    2.710943] Rebooting in 10 seconds..

On x86_64

CPU: 0 PID: 1 Comm: swapper/0 Not tainted 3.12.0-tip_v312+ #5
Hardware name: IBM System x3750 M4 -[8722C1A]-/00D1432, BIOS -[KOE116JUS-1.10]- 09/25/2012
task: ffff8810373a6040 ti: ffff8810373a8000 task.ti: ffff8810373a8000
RIP: 0010:[<ffffffff8108aa53>]  [<ffffffff8108aa53>] update_group_power+0xa3/0x130
RSP: 0000:ffff8810373a9db8  EFLAGS: 00010283
RAX: 0000000000000008 RBX: 0000000000000000 RCX: 0000000000000000
RDX: 0000000000000000 RSI: 0000000000000040 RDI: 0000000000000040
RBP: ffff8810373a9de8 R08: ffff88203632f818 R09: 0000000000000000
R10: 0000000000000001 R11: 0000000000000001 R12: 0000000000000000
R13: 00000000001d3cc0 R14: ffff88203632f818 R15: ffff88203632f800
FS:  0000000000000000(0000) GS:ffff88103de00000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000000000000010 CR3: 0000000001a0b000 CR4: 00000000000407f0
Stack:
 ffff8810373a9dd8 ffff881035aa0c00 ffff88203632f800 ffff8820362fa800
 0000000000000008 0000000000000008 ffff8810373a9e58 ffffffff8108627b
 ffff8810373a9e18 ffffffff8129fcf8 00000000373a76e0 000000000000003f
Call Trace:
 [<ffffffff8108627b>] build_sched_domains+0x37b/0x3f0
 [<ffffffff8129fcf8>] ? alloc_cpumask_var_node+0x58/0x80
 [<ffffffff81d59783>] sched_init_smp+0x8f/0x13c
 [<ffffffff81d3db78>] kernel_init_freeable+0x27b/0x303
 [<ffffffff81593c6e>] ? kernel_init+0xe/0xf0
 [<ffffffff810a0ead>] ? trace_hardirqs_on_caller+0xfd/0x1c0
 [<ffffffff81593c60>] ? rest_init+0xd0/0xd0
 [<ffffffff81593c6e>] kernel_init+0xe/0xf0
 [<ffffffff815a9eac>] ret_from_fork+0x7c/0xb0
 [<ffffffff81593c60>] ? rest_init+0xd0/0xd0
Code: ad 00 4d 8d 77 18 45 31 e4 31 db b8 ff ff ff ff eb 2d 66 0f 1f 44 00 00 48 63 d0 48 8b 14 d5 c0 9d b4 81 49 8b 94 15 50 09 00 00 <48> 8b 52 10 48 8b 52 10 8b 4a 08 8b 52 04 49 01 cc 48 01 d3 83
RIP  [<ffffffff8108aa53>] update_group_power+0xa3/0x130
 RSP <ffff8810373a9db8>
CR2: 0000000000000010
---[ end trace cd8cb7fb261d7bea ]---
Kernel panic - not syncing: Fatal exception

This can be fixed by the simple check below.

-- 
Thanks and Regards
Srikar Dronamraju

-------->8---------------------------------------------
From bfc5dced04472c6c499aa3c6773ddef42d83fefc Mon Sep 17 00:00:00 2001
From: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
Date: Tue, 12 Nov 2013 03:05:31 -0500
Subject: [PATCH] sched: Check sched_domain before computing group power.

After Commit-id 863bffc80898 (sched/fair: Fix group power_orig
computation), we might end up computing group power before the
sched_domain for a cpu is updated.

Check for rq->sd before updating group power.

Signed-off-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
---
 kernel/sched/fair.c |    7 ++++++-
 1 files changed, 6 insertions(+), 1 deletions(-)

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index df77c60..f86f704 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -5354,8 +5354,13 @@ void update_group_power(struct sched_domain *sd, int cpu)
 		 */
 
 		for_each_cpu(cpu, sched_group_cpus(sdg)) {
-			struct sched_group *sg = cpu_rq(cpu)->sd->groups;
+			struct rq *rq = cpu_rq(cpu);
+			struct sched_group *sg;
 
+			if (!rq->sd)
+				continue;
+
+			sg = rq->sd->groups;
 			power_orig += sg->sgp->power_orig;
 			power += sg->sgp->power;
 		}
-- 
1.7.1



* Re: [tip:sched/core] sched/fair: Fix group power_orig computation
  2013-11-12 10:55 ` Srikar Dronamraju
@ 2013-11-12 11:57   ` Peter Zijlstra
  2013-11-12 16:41     ` [PATCH v2] sched: Check sched_domain before computing group power Srikar Dronamraju
  0 siblings, 1 reply; 32+ messages in thread
From: Peter Zijlstra @ 2013-11-12 11:57 UTC (permalink / raw)
  To: Srikar Dronamraju
  Cc: mingo, hpa, linux-kernel, tglx, mikey, linux-tip-commits

On Tue, Nov 12, 2013 at 04:25:47PM +0530, Srikar Dronamraju wrote:
> From: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
> Date: Tue, 12 Nov 2013 03:05:31 -0500
> Subject: [PATCH] sched: Check sched_domain before computing group power.
> 
> After Commit-id 863bffc80898 (sched/fair: Fix group power_orig
> computation), we might end up computing group power before the
> sched_domain for a cpu is updated.
> 
> Check for rq->sd before updating group power.
> 
> Signed-off-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com>

Thanks! My head hurts a bit from going through the overlap init paths
but this does indeed revert to the previous behaviour.

I'm not entirely sure if either is fully correct, but given that we
initialize sgp->power to some 'reasonable' default, we can rely on
runtime updates to correct any funnies.

> ---
>  kernel/sched/fair.c |    7 ++++++-
>  1 files changed, 6 insertions(+), 1 deletions(-)
> 
> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> index df77c60..f86f704 100644
> --- a/kernel/sched/fair.c
> +++ b/kernel/sched/fair.c
> @@ -5354,8 +5354,13 @@ void update_group_power(struct sched_domain *sd, int cpu)
>  		 */
>  
>  		for_each_cpu(cpu, sched_group_cpus(sdg)) {
> -			struct sched_group *sg = cpu_rq(cpu)->sd->groups;
> +			struct rq *rq = cpu_rq(cpu);
> +			struct sched_group *sg;
>  
> +			if (!rq->sd)
> +				continue;
> +
> +			sg = rq->sd->groups;
>  			power_orig += sg->sgp->power_orig;
>  			power += sg->sgp->power;
>  		}
> -- 
> 1.7.1
> 


* [PATCH v2] sched: Check sched_domain before computing group power.
  2013-11-12 11:57   ` Peter Zijlstra
@ 2013-11-12 16:41     ` Srikar Dronamraju
  2013-11-12 17:03       ` Peter Zijlstra
                         ` (2 more replies)
  0 siblings, 3 replies; 32+ messages in thread
From: Srikar Dronamraju @ 2013-11-12 16:41 UTC (permalink / raw)
  To: Peter Zijlstra; +Cc: mingo, hpa, linux-kernel, tglx, mikey, linux-tip-commits

After Commit-id 863bffc80898 (sched/fair: Fix group power_orig
computation), we might end up computing group power before the
sched_domain for a cpu is updated.

Update with cpu_power, if rq->sd is not yet updated.

Signed-off-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
---
Changelog since v1: Fix divide by zero errors that can result because
power/power_orig was set to 0.

 kernel/sched/fair.c |   10 +++++++++-
 1 files changed, 9 insertions(+), 1 deletions(-)

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index df77c60..8d92853 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -5354,8 +5354,16 @@ void update_group_power(struct sched_domain *sd, int cpu)
 		 */
 
 		for_each_cpu(cpu, sched_group_cpus(sdg)) {
-			struct sched_group *sg = cpu_rq(cpu)->sd->groups;
+			struct rq *rq = cpu_rq(cpu);
+			struct sched_group *sg;
 
+			if (!rq->sd) {
+				power_orig += power_of(cpu);
+				power += power_of(cpu);
+				continue;
+			}
+
+			sg = rq->sd->groups;
 			power_orig += sg->sgp->power_orig;
 			power += sg->sgp->power;
 		}
-- 
1.7.1



* Re: [PATCH v2] sched: Check sched_domain before computing group power.
  2013-11-12 16:41     ` [PATCH v2] sched: Check sched_domain before computing group power Srikar Dronamraju
@ 2013-11-12 17:03       ` Peter Zijlstra
  2013-11-12 17:15         ` Srikar Dronamraju
       [not found]       ` <CAM4v1pNMn=5oZDiX3fUp9uPkZTPJgk=vEKEjevzvpwn=PjTzXg@mail.gmail.com>
  2013-11-13 15:17       ` Peter Zijlstra
  2 siblings, 1 reply; 32+ messages in thread
From: Peter Zijlstra @ 2013-11-12 17:03 UTC (permalink / raw)
  To: Srikar Dronamraju
  Cc: mingo, hpa, linux-kernel, tglx, mikey, linux-tip-commits

On Tue, Nov 12, 2013 at 10:11:26PM +0530, Srikar Dronamraju wrote:
> After Commit-id 863bffc80898 (sched/fair: Fix group power_orig
> computation), we might end up computing group power before the
> sched_domain for a cpu is updated.
> 
> Update with cpu_power, if rq->sd is not yet updated.
> 
> Signed-off-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
> ---
> Changelog since v1: Fix divide by zero errors that can result because
> power/power_orig was set to 0.

Hurm.. can you provide the actual topology of the machine that triggers
this? My brain hurts trying to think through the weird cases of this
code.


* Re: [PATCH v2] sched: Check sched_domain before computing group power.
  2013-11-12 17:03       ` Peter Zijlstra
@ 2013-11-12 17:15         ` Srikar Dronamraju
  2013-11-12 17:55           ` Peter Zijlstra
  0 siblings, 1 reply; 32+ messages in thread
From: Srikar Dronamraju @ 2013-11-12 17:15 UTC (permalink / raw)
  To: Peter Zijlstra; +Cc: mingo, hpa, linux-kernel, tglx, mikey, linux-tip-commits

> 
> Hurm.. can you provide the actual topology of the machine that triggers
> this? My brain hurts trying to think through the weird cases of this
> code.
> 

Hope this helps. Please let me know if you were looking for PDF output.

Machine (251GB)

  NUMANode P#0 (63GB)
    Socket P#0
      L3 (20MB)
      Per core: L2 (256KB), L1d (32KB), L1i (32KB)
      Core  P#0   P#1   P#2   P#3   P#4   P#5   P#6   P#7
      PU    P#0   P#1   P#2   P#3   P#4   P#5   P#6   P#7
      PU    P#32  P#33  P#34  P#35  P#36  P#37  P#38  P#39

  NUMANode P#1 (63GB)
    Socket P#1
      L3 (20MB)
      Per core: L2 (256KB), L1d (32KB), L1i (32KB)
      Core  P#0   P#1   P#2   P#3   P#4   P#5   P#6   P#7
      PU    P#8   P#9   P#10  P#11  P#12  P#13  P#14  P#15
      PU    P#40  P#41  P#42  P#43  P#44  P#45  P#46  P#47

  NUMANode P#2 (63GB)
    Socket P#2
      L3 (20MB)
      Per core: L2 (256KB), L1d (32KB), L1i (32KB)
      Core  P#0   P#1   P#2   P#3   P#4   P#5   P#6   P#7
      PU    P#16  P#17  P#18  P#19  P#20  P#21  P#22  P#23
      PU    P#48  P#49  P#50  P#51  P#52  P#53  P#54  P#55

  NUMANode P#3 (62GB)
    Socket P#3
      L3 (20MB)
      Per core: L2 (256KB), L1d (32KB), L1i (32KB)
      Core  P#0   P#1   P#2   P#3   P#4   P#5   P#6   P#7
      PU    P#24  P#25  P#26  P#27  P#28  P#29  P#30  P#31
      PU    P#56  P#57  P#58  P#59  P#60  P#61  P#62  P#63

  PCI 1000:005b
    sda sdb sdc sdd sde sdf sdg sdh
  PCI 19a2:0710
    eth0
  PCI 19a2:0710
    eth1
  PCI 19a2:0710
    eth2
  PCI 19a2:0710
    eth3
  PCI 102b:0534
  PCI 8086:1d02
  PCI 1000:0073

Host: kong.in.ibm.com
Indexes: physical
Date: Tuesday 12 November 2013 10:38:18 PM IST


* Re: [PATCH v2] sched: Check sched_domain before computing group power.
  2013-11-12 17:15         ` Srikar Dronamraju
@ 2013-11-12 17:55           ` Peter Zijlstra
  2013-11-13  5:55             ` Srikar Dronamraju
  0 siblings, 1 reply; 32+ messages in thread
From: Peter Zijlstra @ 2013-11-12 17:55 UTC (permalink / raw)
  To: Srikar Dronamraju
  Cc: mingo, hpa, linux-kernel, tglx, mikey, linux-tip-commits

On Tue, Nov 12, 2013 at 10:45:07PM +0530, Srikar Dronamraju wrote:
> > 
> > Hurm.. can you provide the actual topology of the machine that triggers
> > this? My brain hurts trying to think through the weird cases of this
> > code.
> > 
> 
> Hope this helps. Please do let me know if you were looking for pdf output.

PDFs go into /dev/null..

The output below misses the interesting bit: the node distance table.
A complete sched_debug domain print would also be useful.


* Re: [PATCH v2] sched: Check sched_domain before computing group power.
  2013-11-12 17:55           ` Peter Zijlstra
@ 2013-11-13  5:55             ` Srikar Dronamraju
  0 siblings, 0 replies; 32+ messages in thread
From: Srikar Dronamraju @ 2013-11-13  5:55 UTC (permalink / raw)
  To: Peter Zijlstra; +Cc: mingo, hpa, linux-kernel, tglx, mikey

* Peter Zijlstra <peterz@infradead.org> [2013-11-12 18:55:54]:

> On Tue, Nov 12, 2013 at 10:45:07PM +0530, Srikar Dronamraju wrote:
> > > 
> > > Hurm.. can you provide the actual topology of the machine that triggers
> > > this? My brain hurts trying to think through the weird cases of this
> > > code.
> > > 
> > 
> > Hope this helps. Please do let me know if you were looking for pdf output.
> 
> PDFs go into /dev/null..
> 
> The output below misses the interesting bit: the node distance table.
> A complete sched_debug domain print would also be useful.
> 

available: 4 nodes (0-3)
node 0 cpus: 0 1 2 3 4 5 6 7 32 33 34 35 36 37 38 39
node 0 size: 64191 MB
node 0 free: 63194 MB
node 1 cpus: 8 9 10 11 12 13 14 15 40 41 42 43 44 45 46 47
node 1 size: 64481 MB
node 1 free: 63515 MB
node 2 cpus: 16 17 18 19 20 21 22 23 48 49 50 51 52 53 54 55
node 2 size: 64481 MB
node 2 free: 63536 MB
node 3 cpus: 24 25 26 27 28 29 30 31 56 57 58 59 60 61 62 63
node 3 size: 63968 MB
node 3 free: 62981 MB
node distances:
node   0   1   2   3 
  0:  10  11  11  12 
  1:  11  10  12  11 
  2:  11  12  10  11 
  3:  12  11  11  10 




x86: Booting SMP configuration:
.... node  #0, CPUs:        #1  #2  #3  #4  #5  #6  #7
.... node  #1, CPUs:    #8  #9 #10 #11 #12 #13 #14 #15
.... node  #2, CPUs:   #16 #17 #18 #19 #20 #21 #22 #23
.... node  #3, CPUs:   #24 #25 #26 #27 #28 #29 #30 #31
.... node  #0, CPUs:   #32 #33 #34 #35 #36 #37 #38 #39
.... node  #1, CPUs:   #40 #41 #42 #43 #44 #45 #46 #47
.... node  #2, CPUs:   #48 #49 #50 #51 #52 #53 #54 #55
.... node  #3, CPUs:   #56 #57 #58 #59 #60 #61 #62 #63
x86: Booted up 4 nodes, 64 CPUs
smpboot: Total of 64 processors activated (308393.92 BogoMIPS)
CPU0 attaching sched-domain:
 domain 0: span 0,32 level SIBLING
  groups: 0 (cpu_power = 588) 32 (cpu_power = 588)
  domain 1: span 0-7,32-39 level MC
   groups: 0,32 (cpu_power = 1176) 1,33 (cpu_power = 1178) 2,34 (cpu_power = 1178) 3,35 (cpu_power = 1178) 4,36 (cpu_power = 1178) 5,37 (cpu_power = 1178) 6,38 (cpu_power = 1178) 7,39 (cpu_power = 1176)
   domain 2: span 0-23,32-55 level NUMA
    groups: 0-7,32-39 (cpu_power = 9420) 8-15,40-47 (cpu_power = 9408) 16-23,48-55 (cpu_power = 9408)
    domain 3: span 0-63 level NUMA
     groups: 0-23,32-55 (cpu_power = 28236) 8-31,40-63 (cpu_power = 28224)
CPU1 attaching sched-domain:
 domain 0: span 1,33 level SIBLING
  groups: 1 (cpu_power = 589) 33 (cpu_power = 589)
  domain 1: span 0-7,32-39 level MC
   groups: 1,33 (cpu_power = 1178) 2,34 (cpu_power = 1178) 3,35 (cpu_power = 1178) 4,36 (cpu_power = 1178) 5,37 (cpu_power = 1178) 6,38 (cpu_power = 1178) 7,39 (cpu_power = 1176) 0,32 (cpu_power = 1176)
   domain 2: span 0-23,32-55 level NUMA
    groups: 0-7,32-39 (cpu_power = 9420) 8-15,40-47 (cpu_power = 9408) 16-23,48-55 (cpu_power = 9408)
    domain 3: span 0-63 level NUMA
     groups: 0-23,32-55 (cpu_power = 28236) 8-31,40-63 (cpu_power = 28224)
CPU2 attaching sched-domain:
 domain 0: span 2,34 level SIBLING
  groups: 2 (cpu_power = 589) 34 (cpu_power = 589)
  domain 1: span 0-7,32-39 level MC
   groups: 2,34 (cpu_power = 1178) 3,35 (cpu_power = 1178) 4,36 (cpu_power = 1178) 5,37 (cpu_power = 1178) 6,38 (cpu_power = 1178) 7,39 (cpu_power = 1176) 0,32 (cpu_power = 1176) 1,33 (cpu_power = 1178)
   domain 2: span 0-23,32-55 level NUMA
    groups: 0-7,32-39 (cpu_power = 9420) 8-15,40-47 (cpu_power = 9408) 16-23,48-55 (cpu_power = 9408)
    domain 3: span 0-63 level NUMA
     groups: 0-23,32-55 (cpu_power = 28236) 8-31,40-63 (cpu_power = 28224)
CPU3 attaching sched-domain:
 domain 0: span 3,35 level SIBLING
  groups: 3 (cpu_power = 589) 35 (cpu_power = 589)
  domain 1: span 0-7,32-39 level MC
   groups: 3,35 (cpu_power = 1178) 4,36 (cpu_power = 1178) 5,37 (cpu_power = 1178) 6,38 (cpu_power = 1178) 7,39 (cpu_power = 1176) 0,32 (cpu_power = 1176) 1,33 (cpu_power = 1178) 2,34 (cpu_power = 1178)
   domain 2: span 0-23,32-55 level NUMA
    groups: 0-7,32-39 (cpu_power = 9420) 8-15,40-47 (cpu_power = 9408) 16-23,48-55 (cpu_power = 9408)
    domain 3: span 0-63 level NUMA
     groups: 0-23,32-55 (cpu_power = 28236) 8-31,40-63 (cpu_power = 28224)
CPU4 attaching sched-domain:
 domain 0: span 4,36 level SIBLING
  groups: 4 (cpu_power = 589) 36 (cpu_power = 589)
  domain 1: span 0-7,32-39 level MC
   groups: 4,36 (cpu_power = 1178) 5,37 (cpu_power = 1178) 6,38 (cpu_power = 1178) 7,39 (cpu_power = 1176) 0,32 (cpu_power = 1176) 1,33 (cpu_power = 1178) 2,34 (cpu_power = 1178) 3,35 (cpu_power = 1178)
   domain 2: span 0-23,32-55 level NUMA
    groups: 0-7,32-39 (cpu_power = 9420) 8-15,40-47 (cpu_power = 9408) 16-23,48-55 (cpu_power = 9408)
    domain 3: span 0-63 level NUMA
     groups: 0-23,32-55 (cpu_power = 28236) 8-31,40-63 (cpu_power = 28224)
CPU5 attaching sched-domain:
 domain 0: span 5,37 level SIBLING
  groups: 5 (cpu_power = 589) 37 (cpu_power = 589)
  domain 1: span 0-7,32-39 level MC
   groups: 5,37 (cpu_power = 1178) 6,38 (cpu_power = 1178) 7,39 (cpu_power = 1176) 0,32 (cpu_power = 1176) 1,33 (cpu_power = 1178) 2,34 (cpu_power = 1178) 3,35 (cpu_power = 1178) 4,36 (cpu_power = 1178)
   domain 2: span 0-23,32-55 level NUMA
    groups: 0-7,32-39 (cpu_power = 9420) 8-15,40-47 (cpu_power = 9408) 16-23,48-55 (cpu_power = 9408)
    domain 3: span 0-63 level NUMA
     groups: 0-23,32-55 (cpu_power = 28236) 8-31,40-63 (cpu_power = 28224)
CPU6 attaching sched-domain:
 domain 0: span 6,38 level SIBLING
  groups: 6 (cpu_power = 589) 38 (cpu_power = 589)
  domain 1: span 0-7,32-39 level MC
   groups: 6,38 (cpu_power = 1178) 7,39 (cpu_power = 1176) 0,32 (cpu_power = 1176) 1,33 (cpu_power = 1178) 2,34 (cpu_power = 1178) 3,35 (cpu_power = 1178) 4,36 (cpu_power = 1178) 5,37 (cpu_power = 1178)
   domain 2: span 0-23,32-55 level NUMA
    groups: 0-7,32-39 (cpu_power = 9420) 8-15,40-47 (cpu_power = 9408) 16-23,48-55 (cpu_power = 9408)
    domain 3: span 0-63 level NUMA
     groups: 0-23,32-55 (cpu_power = 28236) 8-31,40-63 (cpu_power = 28224)
CPU7 attaching sched-domain:
 domain 0: span 7,39 level SIBLING
  groups: 7 (cpu_power = 588) 39 (cpu_power = 588)
  domain 1: span 0-7,32-39 level MC
   groups: 7,39 (cpu_power = 1176) 0,32 (cpu_power = 1176) 1,33 (cpu_power = 1178) 2,34 (cpu_power = 1178) 3,35 (cpu_power = 1178) 4,36 (cpu_power = 1178) 5,37 (cpu_power = 1178) 6,38 (cpu_power = 1178)
   domain 2: span 0-23,32-55 level NUMA
    groups: 0-7,32-39 (cpu_power = 9420) 8-15,40-47 (cpu_power = 9408) 16-23,48-55 (cpu_power = 9408)
    domain 3: span 0-63 level NUMA
     groups: 0-23,32-55 (cpu_power = 28236) 8-31,40-63 (cpu_power = 28224)
CPU8 attaching sched-domain:
 domain 0: span 8,40 level SIBLING
  groups: 8 (cpu_power = 588) 40 (cpu_power = 588)
  domain 1: span 8-15,40-47 level MC
   groups: 8,40 (cpu_power = 1176) 9,41 (cpu_power = 1176) 10,42 (cpu_power = 1176) 11,43 (cpu_power = 1176) 12,44 (cpu_power = 1176) 13,45 (cpu_power = 1176) 14,46 (cpu_power = 1176) 15,47 (cpu_power = 1176)
   domain 2: span 0-15,24-47,56-63 level NUMA
    groups: 8-15,40-47 (cpu_power = 9408) 24-31,56-63 (cpu_power = 9408) 0-7,32-39 (cpu_power = 9420)
    domain 3: span 0-63 level NUMA
     groups: 8-31,40-63 (cpu_power = 28224) 0-23,32-55 (cpu_power = 28236)
CPU9 attaching sched-domain:
 domain 0: span 9,41 level SIBLING
  groups: 9 (cpu_power = 588) 41 (cpu_power = 588)
  domain 1: span 8-15,40-47 level MC
   groups: 9,41 (cpu_power = 1176) 10,42 (cpu_power = 1176) 11,43 (cpu_power = 1176) 12,44 (cpu_power = 1176) 13,45 (cpu_power = 1176) 14,46 (cpu_power = 1176) 15,47 (cpu_power = 1176) 8,40 (cpu_power = 1176)
   domain 2: span 0-15,24-47,56-63 level NUMA
    groups: 8-15,40-47 (cpu_power = 9408) 24-31,56-63 (cpu_power = 9408) 0-7,32-39 (cpu_power = 9420)
    domain 3: span 0-63 level NUMA
     groups: 0-23,32-55 (cpu_power = 28236) 8-31,40-63 (cpu_power = 28224)
CPU10 attaching sched-domain:
 domain 0: span 10,42 level SIBLING
  groups: 10 (cpu_power = 588) 42 (cpu_power = 588)
  domain 1: span 8-15,40-47 level MC
   groups: 10,42 (cpu_power = 1176) 11,43 (cpu_power = 1176) 12,44 (cpu_power = 1176) 13,45 (cpu_power = 1176) 14,46 (cpu_power = 1176) 15,47 (cpu_power = 1176) 8,40 (cpu_power = 1176) 9,41 (cpu_power = 1176)
   domain 2: span 0-15,24-47,56-63 level NUMA
    groups: 8-15,40-47 (cpu_power = 9408) 24-31,56-63 (cpu_power = 9408) 0-7,32-39 (cpu_power = 9420)
    domain 3: span 0-63 level NUMA
     groups: 0-23,32-55 (cpu_power = 28236) 8-31,40-63 (cpu_power = 28224)
CPU11 attaching sched-domain:
 domain 0: span 11,43 level SIBLING
  groups: 11 (cpu_power = 588) 43 (cpu_power = 588)
  domain 1: span 8-15,40-47 level MC
   groups: 11,43 (cpu_power = 1176) 12,44 (cpu_power = 1176) 13,45 (cpu_power = 1176) 14,46 (cpu_power = 1176) 15,47 (cpu_power = 1176) 8,40 (cpu_power = 1176) 9,41 (cpu_power = 1176) 10,42 (cpu_power = 1176)
   domain 2: span 0-15,24-47,56-63 level NUMA
    groups: 8-15,40-47 (cpu_power = 9408) 24-31,56-63 (cpu_power = 9408) 0-7,32-39 (cpu_power = 9420)
    domain 3: span 0-63 level NUMA
     groups: 0-23,32-55 (cpu_power = 28236) 8-31,40-63 (cpu_power = 28224)
CPU12 attaching sched-domain:
 domain 0: span 12,44 level SIBLING
  groups: 12 (cpu_power = 588) 44 (cpu_power = 588)
  domain 1: span 8-15,40-47 level MC
   groups: 12,44 (cpu_power = 1176) 13,45 (cpu_power = 1176) 14,46 (cpu_power = 1176) 15,47 (cpu_power = 1176) 8,40 (cpu_power = 1176) 9,41 (cpu_power = 1176) 10,42 (cpu_power = 1176) 11,43 (cpu_power = 1176)
   domain 2: span 0-15,24-47,56-63 level NUMA
    groups: 8-15,40-47 (cpu_power = 9408) 24-31,56-63 (cpu_power = 9408) 0-7,32-39 (cpu_power = 9420)
    domain 3: span 0-63 level NUMA
     groups: 0-23,32-55 (cpu_power = 28236) 8-31,40-63 (cpu_power = 28224)
CPU13 attaching sched-domain:
 domain 0: span 13,45 level SIBLING
  groups: 13 (cpu_power = 588) 45 (cpu_power = 588)
  domain 1: span 8-15,40-47 level MC
   groups: 13,45 (cpu_power = 1176) 14,46 (cpu_power = 1176) 15,47 (cpu_power = 1176) 8,40 (cpu_power = 1176) 9,41 (cpu_power = 1176) 10,42 (cpu_power = 1176) 11,43 (cpu_power = 1176) 12,44 (cpu_power = 1176)
   domain 2: span 0-15,24-47,56-63 level NUMA
    groups: 8-15,40-47 (cpu_power = 9408) 24-31,56-63 (cpu_power = 9408) 0-7,32-39 (cpu_power = 9420)
    domain 3: span 0-63 level NUMA
     groups: 0-23,32-55 (cpu_power = 28236) 8-31,40-63 (cpu_power = 28224)
CPU14 attaching sched-domain:
 domain 0: span 14,46 level SIBLING
  groups: 14 (cpu_power = 588) 46 (cpu_power = 588)
  domain 1: span 8-15,40-47 level MC
   groups: 14,46 (cpu_power = 1176) 15,47 (cpu_power = 1176) 8,40 (cpu_power = 1176) 9,41 (cpu_power = 1176) 10,42 (cpu_power = 1176) 11,43 (cpu_power = 1176) 12,44 (cpu_power = 1176) 13,45 (cpu_power = 1176)
   domain 2: span 0-15,24-47,56-63 level NUMA
    groups: 8-15,40-47 (cpu_power = 9408) 24-31,56-63 (cpu_power = 9408) 0-7,32-39 (cpu_power = 9420)
    domain 3: span 0-63 level NUMA
     groups: 0-23,32-55 (cpu_power = 28236) 8-31,40-63 (cpu_power = 28224)
CPU15 attaching sched-domain:
 domain 0: span 15,47 level SIBLING
  groups: 15 (cpu_power = 588) 47 (cpu_power = 588)
  domain 1: span 8-15,40-47 level MC
   groups: 15,47 (cpu_power = 1176) 8,40 (cpu_power = 1176) 9,41 (cpu_power = 1176) 10,42 (cpu_power = 1176) 11,43 (cpu_power = 1176) 12,44 (cpu_power = 1176) 13,45 (cpu_power = 1176) 14,46 (cpu_power = 1176)
   domain 2: span 0-15,24-47,56-63 level NUMA
    groups: 8-15,40-47 (cpu_power = 9408) 24-31,56-63 (cpu_power = 9408) 0-7,32-39 (cpu_power = 9420)
    domain 3: span 0-63 level NUMA
     groups: 0-23,32-55 (cpu_power = 28236) 8-31,40-63 (cpu_power = 28224)
CPU16 attaching sched-domain:
 domain 0: span 16,48 level SIBLING
  groups: 16 (cpu_power = 588) 48 (cpu_power = 588)
  domain 1: span 16-23,48-55 level MC
   groups: 16,48 (cpu_power = 1176) 17,49 (cpu_power = 1176) 18,50 (cpu_power = 1176) 19,51 (cpu_power = 1176) 20,52 (cpu_power = 1176) 21,53 (cpu_power = 1176) 22,54 (cpu_power = 1176) 23,55 (cpu_power = 1176)
   domain 2: span 0-7,16-39,48-63 level NUMA
    groups: 16-23,48-55 (cpu_power = 9408) 24-31,56-63 (cpu_power = 9408) 0-7,32-39 (cpu_power = 9420)
    domain 3: span 0-63 level NUMA
     groups: 0-23,32-55 (cpu_power = 28236) 8-31,40-63 (cpu_power = 28224)
CPU17 attaching sched-domain:
 domain 0: span 17,49 level SIBLING
  groups: 17 (cpu_power = 588) 49 (cpu_power = 588)
  domain 1: span 16-23,48-55 level MC
   groups: 17,49 (cpu_power = 1176) 18,50 (cpu_power = 1176) 19,51 (cpu_power = 1176) 20,52 (cpu_power = 1176) 21,53 (cpu_power = 1176) 22,54 (cpu_power = 1176) 23,55 (cpu_power = 1176) 16,48 (cpu_power = 1176)
   domain 2: span 0-7,16-39,48-63 level NUMA
    groups: 16-23,48-55 (cpu_power = 9408) 24-31,56-63 (cpu_power = 9408) 0-7,32-39 (cpu_power = 9420)
    domain 3: span 0-63 level NUMA
     groups: 0-23,32-55 (cpu_power = 28236) 8-31,40-63 (cpu_power = 28224)
CPU18 attaching sched-domain:
 domain 0: span 18,50 level SIBLING
  groups: 18 (cpu_power = 588) 50 (cpu_power = 588)
  domain 1: span 16-23,48-55 level MC
   groups: 18,50 (cpu_power = 1176) 19,51 (cpu_power = 1176) 20,52 (cpu_power = 1176) 21,53 (cpu_power = 1176) 22,54 (cpu_power = 1176) 23,55 (cpu_power = 1176) 16,48 (cpu_power = 1176) 17,49 (cpu_power = 1176)
   domain 2: span 0-7,16-39,48-63 level NUMA
    groups: 16-23,48-55 (cpu_power = 9408) 24-31,56-63 (cpu_power = 9408) 0-7,32-39 (cpu_power = 9420)
    domain 3: span 0-63 level NUMA
     groups: 0-23,32-55 (cpu_power = 28236) 8-31,40-63 (cpu_power = 28224)
CPU19 attaching sched-domain:
 domain 0: span 19,51 level SIBLING
  groups: 19 (cpu_power = 588) 51 (cpu_power = 588)
  domain 1: span 16-23,48-55 level MC
   groups: 19,51 (cpu_power = 1176) 20,52 (cpu_power = 1176) 21,53 (cpu_power = 1176) 22,54 (cpu_power = 1176) 23,55 (cpu_power = 1176) 16,48 (cpu_power = 1176) 17,49 (cpu_power = 1176) 18,50 (cpu_power = 1176)
   domain 2: span 0-7,16-39,48-63 level NUMA
    groups: 16-23,48-55 (cpu_power = 9408) 24-31,56-63 (cpu_power = 9408) 0-7,32-39 (cpu_power = 9420)
    domain 3: span 0-63 level NUMA
     groups: 0-23,32-55 (cpu_power = 28236) 8-31,40-63 (cpu_power = 28224)
CPU20 attaching sched-domain:
 domain 0: span 20,52 level SIBLING
  groups: 20 (cpu_power = 588) 52 (cpu_power = 588)
  domain 1: span 16-23,48-55 level MC
   groups: 20,52 (cpu_power = 1176) 21,53 (cpu_power = 1176) 22,54 (cpu_power = 1176) 23,55 (cpu_power = 1176) 16,48 (cpu_power = 1176) 17,49 (cpu_power = 1176) 18,50 (cpu_power = 1176) 19,51 (cpu_power = 1176)
   domain 2: span 0-7,16-39,48-63 level NUMA
    groups: 16-23,48-55 (cpu_power = 9408) 24-31,56-63 (cpu_power = 9408) 0-7,32-39 (cpu_power = 9420)
    domain 3: span 0-63 level NUMA
     groups: 0-23,32-55 (cpu_power = 28236) 8-31,40-63 (cpu_power = 28224)
CPU21 attaching sched-domain:
 domain 0: span 21,53 level SIBLING
  groups: 21 (cpu_power = 588) 53 (cpu_power = 588)
  domain 1: span 16-23,48-55 level MC
   groups: 21,53 (cpu_power = 1176) 22,54 (cpu_power = 1176) 23,55 (cpu_power = 1176) 16,48 (cpu_power = 1176) 17,49 (cpu_power = 1176) 18,50 (cpu_power = 1176) 19,51 (cpu_power = 1176) 20,52 (cpu_power = 1176)
   domain 2: span 0-7,16-39,48-63 level NUMA
    groups: 16-23,48-55 (cpu_power = 9408) 24-31,56-63 (cpu_power = 9408) 0-7,32-39 (cpu_power = 9420)
    domain 3: span 0-63 level NUMA
     groups: 0-23,32-55 (cpu_power = 28236) 8-31,40-63 (cpu_power = 28224)
CPU22 attaching sched-domain:
 domain 0: span 22,54 level SIBLING
  groups: 22 (cpu_power = 588) 54 (cpu_power = 588)
  domain 1: span 16-23,48-55 level MC
   groups: 22,54 (cpu_power = 1176) 23,55 (cpu_power = 1176) 16,48 (cpu_power = 1176) 17,49 (cpu_power = 1176) 18,50 (cpu_power = 1176) 19,51 (cpu_power = 1176) 20,52 (cpu_power = 1176) 21,53 (cpu_power = 1176)
   domain 2: span 0-7,16-39,48-63 level NUMA
    groups: 16-23,48-55 (cpu_power = 9408) 24-31,56-63 (cpu_power = 9408) 0-7,32-39 (cpu_power = 9420)
    domain 3: span 0-63 level NUMA
     groups: 0-23,32-55 (cpu_power = 28236) 8-31,40-63 (cpu_power = 28224)
CPU23 attaching sched-domain:
 domain 0: span 23,55 level SIBLING
  groups: 23 (cpu_power = 588) 55 (cpu_power = 588)
  domain 1: span 16-23,48-55 level MC
   groups: 23,55 (cpu_power = 1176) 16,48 (cpu_power = 1176) 17,49 (cpu_power = 1176) 18,50 (cpu_power = 1176) 19,51 (cpu_power = 1176) 20,52 (cpu_power = 1176) 21,53 (cpu_power = 1176) 22,54 (cpu_power = 1176)
   domain 2: span 0-7,16-39,48-63 level NUMA
    groups: 16-23,48-55 (cpu_power = 9408) 24-31,56-63 (cpu_power = 9408) 0-7,32-39 (cpu_power = 9420)
    domain 3: span 0-63 level NUMA
     groups: 0-23,32-55 (cpu_power = 28236) 8-31,40-63 (cpu_power = 28224)
CPU24 attaching sched-domain:
 domain 0: span 24,56 level SIBLING
  groups: 24 (cpu_power = 588) 56 (cpu_power = 588)
  domain 1: span 24-31,56-63 level MC
   groups: 24,56 (cpu_power = 1176) 25,57 (cpu_power = 1176) 26,58 (cpu_power = 1176) 27,59 (cpu_power = 1176) 28,60 (cpu_power = 1176) 29,61 (cpu_power = 1176) 30,62 (cpu_power = 1176) 31,63 (cpu_power = 1176)
   domain 2: span 8-31,40-63 level NUMA
    groups: 24-31,56-63 (cpu_power = 9408) 8-15,40-47 (cpu_power = 9408) 16-23,48-55 (cpu_power = 9408)
    domain 3: span 0-63 level NUMA
     groups: 8-31,40-63 (cpu_power = 28224) 0-23,32-55 (cpu_power = 28236)
CPU25 attaching sched-domain:
 domain 0: span 25,57 level SIBLING
  groups: 25 (cpu_power = 588) 57 (cpu_power = 588)
  domain 1: span 24-31,56-63 level MC
   groups: 25,57 (cpu_power = 1176) 26,58 (cpu_power = 1176) 27,59 (cpu_power = 1176) 28,60 (cpu_power = 1176) 29,61 (cpu_power = 1176) 30,62 (cpu_power = 1176) 31,63 (cpu_power = 1176) 24,56 (cpu_power = 1176)
   domain 2: span 8-31,40-63 level NUMA
    groups: 24-31,56-63 (cpu_power = 9408) 8-15,40-47 (cpu_power = 9408) 16-23,48-55 (cpu_power = 9408)
    domain 3: span 0-63 level NUMA
     groups: 8-31,40-63 (cpu_power = 28224) 0-23,32-55 (cpu_power = 28236)
CPU26 attaching sched-domain:
 domain 0: span 26,58 level SIBLING
  groups: 26 (cpu_power = 588) 58 (cpu_power = 588)
  domain 1: span 24-31,56-63 level MC
   groups: 26,58 (cpu_power = 1176) 27,59 (cpu_power = 1176) 28,60 (cpu_power = 1176) 29,61 (cpu_power = 1176) 30,62 (cpu_power = 1176) 31,63 (cpu_power = 1176) 24,56 (cpu_power = 1176) 25,57 (cpu_power = 1176)
   domain 2: span 8-31,40-63 level NUMA
    groups: 24-31,56-63 (cpu_power = 9408) 8-15,40-47 (cpu_power = 9408) 16-23,48-55 (cpu_power = 9408)
    domain 3: span 0-63 level NUMA
     groups: 8-31,40-63 (cpu_power = 28224) 0-23,32-55 (cpu_power = 28236)
CPU27 attaching sched-domain:
 domain 0: span 27,59 level SIBLING
  groups: 27 (cpu_power = 588) 59 (cpu_power = 588)
  domain 1: span 24-31,56-63 level MC
   groups: 27,59 (cpu_power = 1176) 28,60 (cpu_power = 1176) 29,61 (cpu_power = 1176) 30,62 (cpu_power = 1176) 31,63 (cpu_power = 1176) 24,56 (cpu_power = 1176) 25,57 (cpu_power = 1176) 26,58 (cpu_power = 1176)
   domain 2: span 8-31,40-63 level NUMA
    groups: 24-31,56-63 (cpu_power = 9408) 8-15,40-47 (cpu_power = 9408) 16-23,48-55 (cpu_power = 9408)
    domain 3: span 0-63 level NUMA
     groups: 8-31,40-63 (cpu_power = 28224) 0-23,32-55 (cpu_power = 28236)
CPU28 attaching sched-domain:
 domain 0: span 28,60 level SIBLING
  groups: 28 (cpu_power = 588) 60 (cpu_power = 588)
  domain 1: span 24-31,56-63 level MC
   groups: 28,60 (cpu_power = 1176) 29,61 (cpu_power = 1176) 30,62 (cpu_power = 1176) 31,63 (cpu_power = 1176) 24,56 (cpu_power = 1176) 25,57 (cpu_power = 1176) 26,58 (cpu_power = 1176) 27,59 (cpu_power = 1176)
   domain 2: span 8-31,40-63 level NUMA
    groups: 24-31,56-63 (cpu_power = 9408) 8-15,40-47 (cpu_power = 9408) 16-23,48-55 (cpu_power = 9408)
    domain 3: span 0-63 level NUMA
     groups: 8-31,40-63 (cpu_power = 28224) 0-23,32-55 (cpu_power = 28236)
CPU29 attaching sched-domain:
 domain 0: span 29,61 level SIBLING
  groups: 29 (cpu_power = 588) 61 (cpu_power = 588)
  domain 1: span 24-31,56-63 level MC
   groups: 29,61 (cpu_power = 1176) 30,62 (cpu_power = 1176) 31,63 (cpu_power = 1176) 24,56 (cpu_power = 1176) 25,57 (cpu_power = 1176) 26,58 (cpu_power = 1176) 27,59 (cpu_power = 1176) 28,60 (cpu_power = 1176)
   domain 2: span 8-31,40-63 level NUMA
    groups: 24-31,56-63 (cpu_power = 9408) 8-15,40-47 (cpu_power = 9408) 16-23,48-55 (cpu_power = 9408)
    domain 3: span 0-63 level NUMA
     groups: 8-31,40-63 (cpu_power = 28224) 0-23,32-55 (cpu_power = 28236)
CPU30 attaching sched-domain:
 domain 0: span 30,62 level SIBLING
  groups: 30 (cpu_power = 588) 62 (cpu_power = 588)
  domain 1: span 24-31,56-63 level MC
   groups: 30,62 (cpu_power = 1176) 31,63 (cpu_power = 1176) 24,56 (cpu_power = 1176) 25,57 (cpu_power = 1176) 26,58 (cpu_power = 1176) 27,59 (cpu_power = 1176) 28,60 (cpu_power = 1176) 29,61 (cpu_power = 1176)
   domain 2: span 8-31,40-63 level NUMA
    groups: 24-31,56-63 (cpu_power = 9408) 8-15,40-47 (cpu_power = 9408) 16-23,48-55 (cpu_power = 9408)
    domain 3: span 0-63 level NUMA
     groups: 8-31,40-63 (cpu_power = 28224) 0-23,32-55 (cpu_power = 28236)
CPU31 attaching sched-domain:
 domain 0: span 31,63 level SIBLING
  groups: 31 (cpu_power = 588) 63 (cpu_power = 588)
  domain 1: span 24-31,56-63 level MC
   groups: 31,63 (cpu_power = 1176) 24,56 (cpu_power = 1176) 25,57 (cpu_power = 1176) 26,58 (cpu_power = 1176) 27,59 (cpu_power = 1176) 28,60 (cpu_power = 1176) 29,61 (cpu_power = 1176) 30,62 (cpu_power = 1176)
   domain 2: span 8-31,40-63 level NUMA
    groups: 24-31,56-63 (cpu_power = 9408) 8-15,40-47 (cpu_power = 9408) 16-23,48-55 (cpu_power = 9408)
    domain 3: span 0-63 level NUMA
     groups: 8-31,40-63 (cpu_power = 28224) 0-23,32-55 (cpu_power = 28236)
CPU32 attaching sched-domain:
 domain 0: span 0,32 level SIBLING
  groups: 32 (cpu_power = 588) 0 (cpu_power = 588)
  domain 1: span 0-7,32-39 level MC
   groups: 0,32 (cpu_power = 1176) 1,33 (cpu_power = 1178) 2,34 (cpu_power = 1178) 3,35 (cpu_power = 1178) 4,36 (cpu_power = 1178) 5,37 (cpu_power = 1178) 6,38 (cpu_power = 1178) 7,39 (cpu_power = 1176)
   domain 2: span 0-23,32-55 level NUMA
    groups: 0-7,32-39 (cpu_power = 9420) 8-15,40-47 (cpu_power = 9408) 16-23,48-55 (cpu_power = 9408)
    domain 3: span 0-63 level NUMA
     groups: 0-23,32-55 (cpu_power = 28236) 8-31,40-63 (cpu_power = 28224)
CPU33 attaching sched-domain:
 domain 0: span 1,33 level SIBLING
  groups: 33 (cpu_power = 589) 1 (cpu_power = 589)
  domain 1: span 0-7,32-39 level MC
   groups: 1,33 (cpu_power = 1178) 2,34 (cpu_power = 1178) 3,35 (cpu_power = 1178) 4,36 (cpu_power = 1178) 5,37 (cpu_power = 1178) 6,38 (cpu_power = 1178) 7,39 (cpu_power = 1176) 0,32 (cpu_power = 1176)
   domain 2: span 0-23,32-55 level NUMA
    groups: 0-7,32-39 (cpu_power = 9420) 8-15,40-47 (cpu_power = 9408) 16-23,48-55 (cpu_power = 9408)
    domain 3: span 0-63 level NUMA
     groups: 0-23,32-55 (cpu_power = 28236) 8-31,40-63 (cpu_power = 28224)
CPU34 attaching sched-domain:
 domain 0: span 2,34 level SIBLING
  groups: 34 (cpu_power = 589) 2 (cpu_power = 589)
  domain 1: span 0-7,32-39 level MC
   groups: 2,34 (cpu_power = 1178) 3,35 (cpu_power = 1178) 4,36 (cpu_power = 1178) 5,37 (cpu_power = 1178) 6,38 (cpu_power = 1178) 7,39 (cpu_power = 1176) 0,32 (cpu_power = 1176) 1,33 (cpu_power = 1178)
   domain 2: span 0-23,32-55 level NUMA
    groups: 0-7,32-39 (cpu_power = 9420) 8-15,40-47 (cpu_power = 9408) 16-23,48-55 (cpu_power = 9408)
    domain 3: span 0-63 level NUMA
     groups: 0-23,32-55 (cpu_power = 28236) 8-31,40-63 (cpu_power = 28224)
CPU35 attaching sched-domain:
 domain 0: span 3,35 level SIBLING
  groups: 35 (cpu_power = 589) 3 (cpu_power = 589)
  domain 1: span 0-7,32-39 level MC
   groups: 3,35 (cpu_power = 1178) 4,36 (cpu_power = 1178) 5,37 (cpu_power = 1178) 6,38 (cpu_power = 1178) 7,39 (cpu_power = 1176) 0,32 (cpu_power = 1176) 1,33 (cpu_power = 1178) 2,34 (cpu_power = 1178)
   domain 2: span 0-23,32-55 level NUMA
    groups: 0-7,32-39 (cpu_power = 9420) 8-15,40-47 (cpu_power = 9408) 16-23,48-55 (cpu_power = 9408)
    domain 3: span 0-63 level NUMA
     groups: 0-23,32-55 (cpu_power = 28236) 8-31,40-63 (cpu_power = 28224)
CPU36 attaching sched-domain:
 domain 0: span 4,36 level SIBLING
  groups: 36 (cpu_power = 589) 4 (cpu_power = 589)
  domain 1: span 0-7,32-39 level MC
   groups: 4,36 (cpu_power = 1178) 5,37 (cpu_power = 1178) 6,38 (cpu_power = 1178) 7,39 (cpu_power = 1176) 0,32 (cpu_power = 1176) 1,33 (cpu_power = 1178) 2,34 (cpu_power = 1178) 3,35 (cpu_power = 1178)
   domain 2: span 0-23,32-55 level NUMA
    groups: 0-7,32-39 (cpu_power = 9420) 8-15,40-47 (cpu_power = 9408) 16-23,48-55 (cpu_power = 9408)
    domain 3: span 0-63 level NUMA
     groups: 0-23,32-55 (cpu_power = 28236) 8-31,40-63 (cpu_power = 28224)
CPU37 attaching sched-domain:
 domain 0: span 5,37 level SIBLING
  groups: 37 (cpu_power = 589) 5 (cpu_power = 589)
  domain 1: span 0-7,32-39 level MC
   groups: 5,37 (cpu_power = 1178) 6,38 (cpu_power = 1178) 7,39 (cpu_power = 1176) 0,32 (cpu_power = 1176) 1,33 (cpu_power = 1178) 2,34 (cpu_power = 1178) 3,35 (cpu_power = 1178) 4,36 (cpu_power = 1178)
   domain 2: span 0-23,32-55 level NUMA
    groups: 0-7,32-39 (cpu_power = 9420) 8-15,40-47 (cpu_power = 9408) 16-23,48-55 (cpu_power = 9408)
    domain 3: span 0-63 level NUMA
     groups: 0-23,32-55 (cpu_power = 28236) 8-31,40-63 (cpu_power = 28224)
CPU38 attaching sched-domain:
 domain 0: span 6,38 level SIBLING
  groups: 38 (cpu_power = 589) 6 (cpu_power = 589)
  domain 1: span 0-7,32-39 level MC
   groups: 6,38 (cpu_power = 1178) 7,39 (cpu_power = 1176) 0,32 (cpu_power = 1176) 1,33 (cpu_power = 1178) 2,34 (cpu_power = 1178) 3,35 (cpu_power = 1178) 4,36 (cpu_power = 1178) 5,37 (cpu_power = 1178)
   domain 2: span 0-23,32-55 level NUMA
    groups: 0-7,32-39 (cpu_power = 9420) 8-15,40-47 (cpu_power = 9408) 16-23,48-55 (cpu_power = 9408)
    domain 3: span 0-63 level NUMA
     groups: 0-23,32-55 (cpu_power = 28236) 8-31,40-63 (cpu_power = 28224)
CPU39 attaching sched-domain:
 domain 0: span 7,39 level SIBLING
  groups: 39 (cpu_power = 588) 7 (cpu_power = 588)
  domain 1: span 0-7,32-39 level MC
   groups: 7,39 (cpu_power = 1176) 0,32 (cpu_power = 1176) 1,33 (cpu_power = 1178) 2,34 (cpu_power = 1178) 3,35 (cpu_power = 1178) 4,36 (cpu_power = 1178) 5,37 (cpu_power = 1178) 6,38 (cpu_power = 1178)
   domain 2: span 0-23,32-55 level NUMA
    groups: 0-7,32-39 (cpu_power = 9420) 8-15,40-47 (cpu_power = 9408) 16-23,48-55 (cpu_power = 9408)
    domain 3: span 0-63 level NUMA
     groups: 0-23,32-55 (cpu_power = 28236) 8-31,40-63 (cpu_power = 28224)
CPU40 attaching sched-domain:
 domain 0: span 8,40 level SIBLING
  groups: 40 (cpu_power = 588) 8 (cpu_power = 588)
  domain 1: span 8-15,40-47 level MC
   groups: 8,40 (cpu_power = 1176) 9,41 (cpu_power = 1176) 10,42 (cpu_power = 1176) 11,43 (cpu_power = 1176) 12,44 (cpu_power = 1176) 13,45 (cpu_power = 1176) 14,46 (cpu_power = 1176) 15,47 (cpu_power = 1176)
   domain 2: span 0-15,24-47,56-63 level NUMA
    groups: 8-15,40-47 (cpu_power = 9408) 24-31,56-63 (cpu_power = 9408) 0-7,32-39 (cpu_power = 9420)
    domain 3: span 0-63 level NUMA
     groups: 0-23,32-55 (cpu_power = 28236) 8-31,40-63 (cpu_power = 28224)
CPU41 attaching sched-domain:
 domain 0: span 9,41 level SIBLING
  groups: 41 (cpu_power = 588) 9 (cpu_power = 588)
  domain 1: span 8-15,40-47 level MC
   groups: 9,41 (cpu_power = 1176) 10,42 (cpu_power = 1176) 11,43 (cpu_power = 1176) 12,44 (cpu_power = 1176) 13,45 (cpu_power = 1176) 14,46 (cpu_power = 1176) 15,47 (cpu_power = 1176) 8,40 (cpu_power = 1176)
   domain 2: span 0-15,24-47,56-63 level NUMA
    groups: 8-15,40-47 (cpu_power = 9408) 24-31,56-63 (cpu_power = 9408) 0-7,32-39 (cpu_power = 9420)
    domain 3: span 0-63 level NUMA
     groups: 0-23,32-55 (cpu_power = 28236) 8-31,40-63 (cpu_power = 28224)
CPU42 attaching sched-domain:
 domain 0: span 10,42 level SIBLING
  groups: 42 (cpu_power = 588) 10 (cpu_power = 588)
  domain 1: span 8-15,40-47 level MC
   groups: 10,42 (cpu_power = 1176) 11,43 (cpu_power = 1176) 12,44 (cpu_power = 1176) 13,45 (cpu_power = 1176) 14,46 (cpu_power = 1176) 15,47 (cpu_power = 1176) 8,40 (cpu_power = 1176) 9,41 (cpu_power = 1176)
   domain 2: span 0-15,24-47,56-63 level NUMA
    groups: 8-15,40-47 (cpu_power = 9408) 24-31,56-63 (cpu_power = 9408) 0-7,32-39 (cpu_power = 9420)
    domain 3: span 0-63 level NUMA
     groups: 0-23,32-55 (cpu_power = 28236) 8-31,40-63 (cpu_power = 28224)
CPU43 attaching sched-domain:
 domain 0: span 11,43 level SIBLING
  groups: 43 (cpu_power = 588) 11 (cpu_power = 588)
  domain 1: span 8-15,40-47 level MC
   groups: 11,43 (cpu_power = 1176) 12,44 (cpu_power = 1176) 13,45 (cpu_power = 1176) 14,46 (cpu_power = 1176) 15,47 (cpu_power = 1176) 8,40 (cpu_power = 1176) 9,41 (cpu_power = 1176) 10,42 (cpu_power = 1176)
   domain 2: span 0-15,24-47,56-63 level NUMA
    groups: 8-15,40-47 (cpu_power = 9408) 24-31,56-63 (cpu_power = 9408) 0-7,32-39 (cpu_power = 9420)
    domain 3: span 0-63 level NUMA
     groups: 0-23,32-55 (cpu_power = 28236) 8-31,40-63 (cpu_power = 28224)
CPU44 attaching sched-domain:
 domain 0: span 12,44 level SIBLING
  groups: 44 (cpu_power = 588) 12 (cpu_power = 588)
  domain 1: span 8-15,40-47 level MC
   groups: 12,44 (cpu_power = 1176) 13,45 (cpu_power = 1176) 14,46 (cpu_power = 1176) 15,47 (cpu_power = 1176) 8,40 (cpu_power = 1176) 9,41 (cpu_power = 1176) 10,42 (cpu_power = 1176) 11,43 (cpu_power = 1176)
   domain 2: span 0-15,24-47,56-63 level NUMA
    groups: 8-15,40-47 (cpu_power = 9408) 24-31,56-63 (cpu_power = 9408) 0-7,32-39 (cpu_power = 9420)
    domain 3: span 0-63 level NUMA
     groups: 0-23,32-55 (cpu_power = 28236) 8-31,40-63 (cpu_power = 28224)
CPU45 attaching sched-domain:
 domain 0: span 13,45 level SIBLING
  groups: 45 (cpu_power = 588) 13 (cpu_power = 588)
  domain 1: span 8-15,40-47 level MC
   groups: 13,45 (cpu_power = 1176) 14,46 (cpu_power = 1176) 15,47 (cpu_power = 1176) 8,40 (cpu_power = 1176) 9,41 (cpu_power = 1176) 10,42 (cpu_power = 1176) 11,43 (cpu_power = 1176) 12,44 (cpu_power = 1176)
   domain 2: span 0-15,24-47,56-63 level NUMA
    groups: 8-15,40-47 (cpu_power = 9408) 24-31,56-63 (cpu_power = 9408) 0-7,32-39 (cpu_power = 9420)
    domain 3: span 0-63 level NUMA
     groups: 0-23,32-55 (cpu_power = 28236) 8-31,40-63 (cpu_power = 28224)
CPU46 attaching sched-domain:
 domain 0: span 14,46 level SIBLING
  groups: 46 (cpu_power = 588) 14 (cpu_power = 588)
  domain 1: span 8-15,40-47 level MC
   groups: 14,46 (cpu_power = 1176) 15,47 (cpu_power = 1176) 8,40 (cpu_power = 1176) 9,41 (cpu_power = 1176) 10,42 (cpu_power = 1176) 11,43 (cpu_power = 1176) 12,44 (cpu_power = 1176) 13,45 (cpu_power = 1176)
   domain 2: span 0-15,24-47,56-63 level NUMA
    groups: 8-15,40-47 (cpu_power = 9408) 24-31,56-63 (cpu_power = 9408) 0-7,32-39 (cpu_power = 9420)
    domain 3: span 0-63 level NUMA
     groups: 0-23,32-55 (cpu_power = 28236) 8-31,40-63 (cpu_power = 28224)
CPU47 attaching sched-domain:
 domain 0: span 15,47 level SIBLING
  groups: 47 (cpu_power = 588) 15 (cpu_power = 588)
  domain 1: span 8-15,40-47 level MC
   groups: 15,47 (cpu_power = 1176) 8,40 (cpu_power = 1176) 9,41 (cpu_power = 1176) 10,42 (cpu_power = 1176) 11,43 (cpu_power = 1176) 12,44 (cpu_power = 1176) 13,45 (cpu_power = 1176) 14,46 (cpu_power = 1176)
   domain 2: span 0-15,24-47,56-63 level NUMA
    groups: 8-15,40-47 (cpu_power = 9408) 24-31,56-63 (cpu_power = 9408) 0-7,32-39 (cpu_power = 9420)
    domain 3: span 0-63 level NUMA
     groups: 0-23,32-55 (cpu_power = 28236) 8-31,40-63 (cpu_power = 28224)
CPU48 attaching sched-domain:
 domain 0: span 16,48 level SIBLING
  groups: 48 (cpu_power = 588) 16 (cpu_power = 588)
  domain 1: span 16-23,48-55 level MC
   groups: 16,48 (cpu_power = 1176) 17,49 (cpu_power = 1176) 18,50 (cpu_power = 1176) 19,51 (cpu_power = 1176) 20,52 (cpu_power = 1176) 21,53 (cpu_power = 1176) 22,54 (cpu_power = 1176) 23,55 (cpu_power = 1176)
   domain 2: span 0-7,16-39,48-63 level NUMA
    groups: 16-23,48-55 (cpu_power = 9408) 24-31,56-63 (cpu_power = 9408) 0-7,32-39 (cpu_power = 9420)
    domain 3: span 0-63 level NUMA
     groups: 0-23,32-55 (cpu_power = 28236) 8-31,40-63 (cpu_power = 28224)
CPU49 attaching sched-domain:
 domain 0: span 17,49 level SIBLING
  groups: 49 (cpu_power = 588) 17 (cpu_power = 588)
  domain 1: span 16-23,48-55 level MC
   groups: 17,49 (cpu_power = 1176) 18,50 (cpu_power = 1176) 19,51 (cpu_power = 1176) 20,52 (cpu_power = 1176) 21,53 (cpu_power = 1176) 22,54 (cpu_power = 1176) 23,55 (cpu_power = 1176) 16,48 (cpu_power = 1176)
   domain 2: span 0-7,16-39,48-63 level NUMA
    groups: 16-23,48-55 (cpu_power = 9408) 24-31,56-63 (cpu_power = 9408) 0-7,32-39 (cpu_power = 9420)
    domain 3: span 0-63 level NUMA
     groups: 0-23,32-55 (cpu_power = 28236) 8-31,40-63 (cpu_power = 28224)
CPU50 attaching sched-domain:
 domain 0: span 18,50 level SIBLING
  groups: 50 (cpu_power = 588) 18 (cpu_power = 588)
  domain 1: span 16-23,48-55 level MC
   groups: 18,50 (cpu_power = 1176) 19,51 (cpu_power = 1176) 20,52 (cpu_power = 1176) 21,53 (cpu_power = 1176) 22,54 (cpu_power = 1176) 23,55 (cpu_power = 1176) 16,48 (cpu_power = 1176) 17,49 (cpu_power = 1176)
   domain 2: span 0-7,16-39,48-63 level NUMA
    groups: 16-23,48-55 (cpu_power = 9408) 24-31,56-63 (cpu_power = 9408) 0-7,32-39 (cpu_power = 9420)
    domain 3: span 0-63 level NUMA
     groups: 0-23,32-55 (cpu_power = 28236) 8-31,40-63 (cpu_power = 28224)
CPU51 attaching sched-domain:
 domain 0: span 19,51 level SIBLING
  groups: 51 (cpu_power = 588) 19 (cpu_power = 588)
  domain 1: span 16-23,48-55 level MC
   groups: 19,51 (cpu_power = 1176) 20,52 (cpu_power = 1176) 21,53 (cpu_power = 1176) 22,54 (cpu_power = 1176) 23,55 (cpu_power = 1176) 16,48 (cpu_power = 1176) 17,49 (cpu_power = 1176) 18,50 (cpu_power = 1176)
   domain 2: span 0-7,16-39,48-63 level NUMA
    groups: 16-23,48-55 (cpu_power = 9408) 24-31,56-63 (cpu_power = 9408) 0-7,32-39 (cpu_power = 9420)
    domain 3: span 0-63 level NUMA
     groups: 0-23,32-55 (cpu_power = 28236) 8-31,40-63 (cpu_power = 28224)
CPU52 attaching sched-domain:
 domain 0: span 20,52 level SIBLING
  groups: 52 (cpu_power = 588) 20 (cpu_power = 588)
  domain 1: span 16-23,48-55 level MC
   groups: 20,52 (cpu_power = 1176) 21,53 (cpu_power = 1176) 22,54 (cpu_power = 1176) 23,55 (cpu_power = 1176) 16,48 (cpu_power = 1176) 17,49 (cpu_power = 1176) 18,50 (cpu_power = 1176) 19,51 (cpu_power = 1176)
   domain 2: span 0-7,16-39,48-63 level NUMA
    groups: 16-23,48-55 (cpu_power = 9408) 24-31,56-63 (cpu_power = 9408) 0-7,32-39 (cpu_power = 9420)
    domain 3: span 0-63 level NUMA
     groups: 0-23,32-55 (cpu_power = 28236) 8-31,40-63 (cpu_power = 28224)
CPU53 attaching sched-domain:
 domain 0: span 21,53 level SIBLING
  groups: 53 (cpu_power = 588) 21 (cpu_power = 588)
  domain 1: span 16-23,48-55 level MC
   groups: 21,53 (cpu_power = 1176) 22,54 (cpu_power = 1176) 23,55 (cpu_power = 1176) 16,48 (cpu_power = 1176) 17,49 (cpu_power = 1176) 18,50 (cpu_power = 1176) 19,51 (cpu_power = 1176) 20,52 (cpu_power = 1176)
   domain 2: span 0-7,16-39,48-63 level NUMA
    groups: 16-23,48-55 (cpu_power = 9408) 24-31,56-63 (cpu_power = 9408) 0-7,32-39 (cpu_power = 9420)
    domain 3: span 0-63 level NUMA
     groups: 0-23,32-55 (cpu_power = 28236) 8-31,40-63 (cpu_power = 28224)
CPU54 attaching sched-domain:
 domain 0: span 22,54 level SIBLING
  groups: 54 (cpu_power = 588) 22 (cpu_power = 588)
  domain 1: span 16-23,48-55 level MC
   groups: 22,54 (cpu_power = 1176) 23,55 (cpu_power = 1176) 16,48 (cpu_power = 1176) 17,49 (cpu_power = 1176) 18,50 (cpu_power = 1176) 19,51 (cpu_power = 1176) 20,52 (cpu_power = 1176) 21,53 (cpu_power = 1176)
   domain 2: span 0-7,16-39,48-63 level NUMA
    groups: 16-23,48-55 (cpu_power = 9408) 24-31,56-63 (cpu_power = 9408) 0-7,32-39 (cpu_power = 9420)
    domain 3: span 0-63 level NUMA
     groups: 0-23,32-55 (cpu_power = 28236) 8-31,40-63 (cpu_power = 28224)
CPU55 attaching sched-domain:
 domain 0: span 23,55 level SIBLING
  groups: 55 (cpu_power = 588) 23 (cpu_power = 588)
  domain 1: span 16-23,48-55 level MC
   groups: 23,55 (cpu_power = 1176) 16,48 (cpu_power = 1176) 17,49 (cpu_power = 1176) 18,50 (cpu_power = 1176) 19,51 (cpu_power = 1176) 20,52 (cpu_power = 1176) 21,53 (cpu_power = 1176) 22,54 (cpu_power = 1176)
   domain 2: span 0-7,16-39,48-63 level NUMA
    groups: 16-23,48-55 (cpu_power = 9408) 24-31,56-63 (cpu_power = 9408) 0-7,32-39 (cpu_power = 9420)
    domain 3: span 0-63 level NUMA
     groups: 0-23,32-55 (cpu_power = 28236) 8-31,40-63 (cpu_power = 28224)
CPU56 attaching sched-domain:
 domain 0: span 24,56 level SIBLING
  groups: 56 (cpu_power = 588) 24 (cpu_power = 588)
  domain 1: span 24-31,56-63 level MC
   groups: 24,56 (cpu_power = 1176) 25,57 (cpu_power = 1176) 26,58 (cpu_power = 1176) 27,59 (cpu_power = 1176) 28,60 (cpu_power = 1176) 29,61 (cpu_power = 1176) 30,62 (cpu_power = 1176) 31,63 (cpu_power = 1176)
   domain 2: span 8-31,40-63 level NUMA
    groups: 24-31,56-63 (cpu_power = 9408) 8-15,40-47 (cpu_power = 9408) 16-23,48-55 (cpu_power = 9408)
    domain 3: span 0-63 level NUMA
     groups: 8-31,40-63 (cpu_power = 28224) 0-23,32-55 (cpu_power = 28236)
CPU57 attaching sched-domain:
 domain 0: span 25,57 level SIBLING
  groups: 57 (cpu_power = 588) 25 (cpu_power = 588)
  domain 1: span 24-31,56-63 level MC
   groups: 25,57 (cpu_power = 1176) 26,58 (cpu_power = 1176) 27,59 (cpu_power = 1176) 28,60 (cpu_power = 1176) 29,61 (cpu_power = 1176) 30,62 (cpu_power = 1176) 31,63 (cpu_power = 1176) 24,56 (cpu_power = 1176)
   domain 2: span 8-31,40-63 level NUMA
    groups: 24-31,56-63 (cpu_power = 9408) 8-15,40-47 (cpu_power = 9408) 16-23,48-55 (cpu_power = 9408)
    domain 3: span 0-63 level NUMA
     groups: 8-31,40-63 (cpu_power = 28224) 0-23,32-55 (cpu_power = 28236)
CPU58 attaching sched-domain:
 domain 0: span 26,58 level SIBLING
  groups: 58 (cpu_power = 588) 26 (cpu_power = 588)
  domain 1: span 24-31,56-63 level MC
   groups: 26,58 (cpu_power = 1176) 27,59 (cpu_power = 1176) 28,60 (cpu_power = 1176) 29,61 (cpu_power = 1176) 30,62 (cpu_power = 1176) 31,63 (cpu_power = 1176) 24,56 (cpu_power = 1176) 25,57 (cpu_power = 1176)
   domain 2: span 8-31,40-63 level NUMA
    groups: 24-31,56-63 (cpu_power = 9408) 8-15,40-47 (cpu_power = 9408) 16-23,48-55 (cpu_power = 9408)
    domain 3: span 0-63 level NUMA
     groups: 8-31,40-63 (cpu_power = 28224) 0-23,32-55 (cpu_power = 28236)
CPU59 attaching sched-domain:
 domain 0: span 27,59 level SIBLING
  groups: 59 (cpu_power = 588) 27 (cpu_power = 588)
  domain 1: span 24-31,56-63 level MC
   groups: 27,59 (cpu_power = 1176) 28,60 (cpu_power = 1176) 29,61 (cpu_power = 1176) 30,62 (cpu_power = 1176) 31,63 (cpu_power = 1176) 24,56 (cpu_power = 1176) 25,57 (cpu_power = 1176) 26,58 (cpu_power = 1176)
   domain 2: span 8-31,40-63 level NUMA
    groups: 24-31,56-63 (cpu_power = 9408) 8-15,40-47 (cpu_power = 9408) 16-23,48-55 (cpu_power = 9408)
    domain 3: span 0-63 level NUMA
     groups: 8-31,40-63 (cpu_power = 28224) 0-23,32-55 (cpu_power = 28236)
CPU60 attaching sched-domain:
 domain 0: span 28,60 level SIBLING
  groups: 60 (cpu_power = 588) 28 (cpu_power = 588)
  domain 1: span 24-31,56-63 level MC
   groups: 28,60 (cpu_power = 1176) 29,61 (cpu_power = 1176) 30,62 (cpu_power = 1176) 31,63 (cpu_power = 1176) 24,56 (cpu_power = 1176) 25,57 (cpu_power = 1176) 26,58 (cpu_power = 1176) 27,59 (cpu_power = 1176)
   domain 2: span 8-31,40-63 level NUMA
    groups: 24-31,56-63 (cpu_power = 9408) 8-15,40-47 (cpu_power = 9408) 16-23,48-55 (cpu_power = 9408)
    domain 3: span 0-63 level NUMA
     groups: 8-31,40-63 (cpu_power = 28224) 0-23,32-55 (cpu_power = 28236)
CPU61 attaching sched-domain:
 domain 0: span 29,61 level SIBLING
  groups: 61 (cpu_power = 588) 29 (cpu_power = 588)
  domain 1: span 24-31,56-63 level MC
   groups: 29,61 (cpu_power = 1176) 30,62 (cpu_power = 1176) 31,63 (cpu_power = 1176) 24,56 (cpu_power = 1176) 25,57 (cpu_power = 1176) 26,58 (cpu_power = 1176) 27,59 (cpu_power = 1176) 28,60 (cpu_power = 1176)
   domain 2: span 8-31,40-63 level NUMA
    groups: 24-31,56-63 (cpu_power = 9408) 8-15,40-47 (cpu_power = 9408) 16-23,48-55 (cpu_power = 9408)
    domain 3: span 0-63 level NUMA
     groups: 8-31,40-63 (cpu_power = 28224) 0-23,32-55 (cpu_power = 28236)
CPU62 attaching sched-domain:
 domain 0: span 30,62 level SIBLING
  groups: 62 (cpu_power = 588) 30 (cpu_power = 588)
  domain 1: span 24-31,56-63 level MC
   groups: 30,62 (cpu_power = 1176) 31,63 (cpu_power = 1176) 24,56 (cpu_power = 1176) 25,57 (cpu_power = 1176) 26,58 (cpu_power = 1176) 27,59 (cpu_power = 1176) 28,60 (cpu_power = 1176) 29,61 (cpu_power = 1176)
   domain 2: span 8-31,40-63 level NUMA
    groups: 24-31,56-63 (cpu_power = 9408) 8-15,40-47 (cpu_power = 9408) 16-23,48-55 (cpu_power = 9408)
    domain 3: span 0-63 level NUMA
     groups: 8-31,40-63 (cpu_power = 28224) 0-23,32-55 (cpu_power = 28236)
CPU63 attaching sched-domain:
 domain 0: span 31,63 level SIBLING
  groups: 63 (cpu_power = 588) 31 (cpu_power = 588)
  domain 1: span 24-31,56-63 level MC
   groups: 31,63 (cpu_power = 1176) 24,56 (cpu_power = 1176) 25,57 (cpu_power = 1176) 26,58 (cpu_power = 1176) 27,59 (cpu_power = 1176) 28,60 (cpu_power = 1176) 29,61 (cpu_power = 1176) 30,62 (cpu_power = 1176)
   domain 2: span 8-31,40-63 level NUMA
    groups: 24-31,56-63 (cpu_power = 9408) 8-15,40-47 (cpu_power = 9408) 16-23,48-55 (cpu_power = 9408)
    domain 3: span 0-63 level NUMA
     groups: 8-31,40-63 (cpu_power = 28224) 0-23,32-55 (cpu_power = 28236)
-- 
Thanks and Regards
Srikar Dronamraju


^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH v2] sched: Check sched_domain before computing group power.
       [not found]       ` <CAM4v1pNMn=5oZDiX3fUp9uPkZTPJgk=vEKEjevzvpwn=PjTzXg@mail.gmail.com>
@ 2013-11-13 11:23         ` Srikar Dronamraju
  2013-11-14  6:06           ` Preeti U Murthy
  0 siblings, 1 reply; 32+ messages in thread
From: Srikar Dronamraju @ 2013-11-13 11:23 UTC (permalink / raw)
  To: Preeti Murthy
  Cc: Peter Zijlstra, mingo, hpa, linux-kernel, Thomas Gleixner, mikey,
	linux-tip-commits, Preeti U Murthy

* Preeti Murthy <preeti.lkml@gmail.com> [2013-11-13 16:22:37]:

> Hi Srikar,
> 
> update_group_power() is called only during load balancing during
> update_sg_lb_stats().
> Load balancing begins at the base domain of the CPU,rq(cpu)->sd. This is
> checked for
> NULL. So how can update_group_power() be called in a scenario where the
> base domain
> of the CPU is not initialized? I say 'initialized' since you check for NULL
> on rq(cpu)->sd.
> 

update_group_power() also gets called from init_sched_groups_power().
And if you look at the oops message, we know that the oops happens from
that path. In build_sched_domains(), we do cpu_attach_domain(), which
updates rq->sd, after the call to init_sched_groups_power(). So by the
time init_sched_groups_power() is called, rq->sd is not yet initialized.

We only hit the oops case when sd->flags has SD_OVERLAP set.
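
To make the ordering concrete, here is a minimal userspace sketch of the
three steps described above. The toy_* types and the function name are
hypothetical stand-ins, not the kernel's structures; only the order of
the steps is taken from this thread.

```c
#include <stddef.h>

/* Hypothetical stand-ins; the real structures live in kernel/sched/. */
struct toy_sd { int level; };
struct toy_rq { struct toy_sd *sd; };   /* rq->sd: the published base domain */

/*
 * Replays the three steps in order and reports whether rq->sd was
 * already set when the group-power step ran (1 = yes, 0 = no).
 */
static int sd_seen_by_power_init(struct toy_rq *rq, struct toy_sd *built)
{
	int seen;

	rq->sd = NULL;              /* nothing attached yet */

	/* 1. domains and groups are built; 'built' is still private */

	/* 2. init_sched_groups_power() -> update_group_power() runs here */
	seen = (rq->sd != NULL);

	/* 3. cpu_attach_domain() finally publishes the domain */
	rq->sd = built;

	return seen;
}
```

In this model the function returns 0: any code in step 2 that follows
rq->sd sees NULL, which matches the oops path.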



> In the changelog, you say 'updated'. Are you saying that it has a stale
> value?

I said, "before the sched_domain for a cpu is updated", so it's either
not yet updated or has a stale value. As I said earlier in this mail, the
initialization happens after we do an update_group_power().

> Please do elaborate on how you observed this.
> 

Does this clarify?

-- 
Thanks and Regards
Srikar Dronamraju


^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH v2] sched: Check sched_domain before computing group power.
  2013-11-12 16:41     ` [PATCH v2] sched: Check sched_domain before computing group power Srikar Dronamraju
  2013-11-12 17:03       ` Peter Zijlstra
       [not found]       ` <CAM4v1pNMn=5oZDiX3fUp9uPkZTPJgk=vEKEjevzvpwn=PjTzXg@mail.gmail.com>
@ 2013-11-13 15:17       ` Peter Zijlstra
  2013-11-14 10:50         ` Srikar Dronamraju
  2013-11-19 19:15         ` [tip:sched/urgent] " tip-bot for Srikar Dronamraju
  2 siblings, 2 replies; 32+ messages in thread
From: Peter Zijlstra @ 2013-11-13 15:17 UTC (permalink / raw)
  To: Srikar Dronamraju
  Cc: mingo, hpa, linux-kernel, tglx, mikey, linux-tip-commits

On Tue, Nov 12, 2013 at 10:11:26PM +0530, Srikar Dronamraju wrote:
> After Commit-id 863bffc80898 (sched/fair: Fix group power_orig
> computation), we might end up computing group power before the
> sched_domain for a cpu is updated.
> 
> Update with cpu_power, if rq->sd is not yet updated.
> 
> Signed-off-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
> ---
> Changelog since v1: Fix divide by zero errors that can result because
> power/power_orig was set to 0.

Duh yes!, init_sched_groups_power() is called before we attach the
actual domains, so the above will _always_ fail for the
build_sched_domains() case.

I was a bit puzzled how we could ever have 0 since surely at least the
current cpu should have some !0 contribution, but no!

---
Subject: sched: Check sched_domain before computing group power
From: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
Date: Tue, 12 Nov 2013 22:11:26 +0530

After Commit-id 863bffc80898 (sched/fair: Fix group power_orig
computation), we can dereference rq->sd before it is set.

Fix this by falling back to power_of() in this case and add a comment
explaining things.

Cc: mikey@neuling.org
Cc: mingo@kernel.org
Cc: hpa@zytor.com
Cc: tglx@linutronix.de
Signed-off-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
[peterz: added comment and tweaked patch]
Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/20131112164126.GF2559@linux.vnet.ibm.com
---
 kernel/sched/fair.c |   27 ++++++++++++++++++++++++---
 1 file changed, 24 insertions(+), 3 deletions(-)

--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -5354,10 +5354,31 @@ void update_group_power(struct sched_dom
 		 */
 
 		for_each_cpu(cpu, sched_group_cpus(sdg)) {
-			struct sched_group *sg = cpu_rq(cpu)->sd->groups;
+			struct sched_group_power *sgp;
+			struct rq *rq = cpu_rq(cpu);
 
-			power_orig += sg->sgp->power_orig;
-			power += sg->sgp->power;
+			/*
+			 * build_sched_domains() -> init_sched_groups_power()
+			 * gets here before we've attached the domains to the
+			 * runqueues.
+			 *
+			 * Use power_of(), which is set irrespective of domains
+			 * in update_cpu_power().
+			 *
+			 * This avoids power/power_orig from being 0 and
+			 * causing divide-by-zero issues on boot.
+			 *
+			 * Runtime updates will correct power_orig.
+			 */
+			if (!rq->sd) {
+				power_orig += power_of(cpu);
+				power += power_of(cpu);
+				continue;
+			}
+
+			sgp = rq->sd->groups->sgp;
+			power_orig += sgp->power_orig;
+			power += sgp->power;
 		}
 	} else  {
 		/*
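
The fallback in the hunk above can be tried outside the kernel with a
small sketch. Everything named toy_* is hypothetical, and the cpu_power
field stands in for what power_of(cpu) returns (assumed here to be the
per-runqueue cpu_power value); this is an illustration of the idea, not
the kernel code.

```c
#include <stddef.h>

/* Hypothetical stand-ins for the structures named in the patch. */
struct toy_sgp   { unsigned long power, power_orig; };
struct toy_group { struct toy_sgp *sgp; };
struct toy_sd    { struct toy_group *groups; };
struct toy_rq    { struct toy_sd *sd; unsigned long cpu_power; };

struct toy_totals { unsigned long power, power_orig; };

/* Sum group power over 'nr' runqueues, tolerating unattached domains. */
static struct toy_totals sum_group_power(const struct toy_rq *rqs, size_t nr)
{
	struct toy_totals t = { 0, 0 };

	for (size_t i = 0; i < nr; i++) {
		const struct toy_rq *rq = &rqs[i];

		if (!rq->sd) {
			/* No domain attached yet: use the per-CPU value. */
			t.power_orig += rq->cpu_power;
			t.power += rq->cpu_power;
			continue;
		}

		t.power_orig += rq->sd->groups->sgp->power_orig;
		t.power += rq->sd->groups->sgp->power;
	}

	return t;
}
```

With every rq->sd still NULL, as during the initial build, both totals
become the sum of the per-CPU values and so stay non-zero as long as
each CPU's own value is non-zero.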

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH v2] sched: Check sched_domain before computing group power.
  2013-11-13 11:23         ` Srikar Dronamraju
@ 2013-11-14  6:06           ` Preeti U Murthy
  2013-11-14  8:30             ` Peter Zijlstra
  0 siblings, 1 reply; 32+ messages in thread
From: Preeti U Murthy @ 2013-11-14  6:06 UTC (permalink / raw)
  To: Srikar Dronamraju, Peter Zijlstra
  Cc: Preeti Murthy, mingo, hpa, linux-kernel, Thomas Gleixner, mikey,
	linux-tip-commits

Hi,

On 11/13/2013 04:53 PM, Srikar Dronamraju wrote:
> * Preeti Murthy <preeti.lkml@gmail.com> [2013-11-13 16:22:37]:
> 
>> Hi Srikar,
>>
>> update_group_power() is called only during load balancing during
>> update_sg_lb_stats().
>> Load balancing begins at the base domain of the CPU,rq(cpu)->sd. This is
>> checked for
>> NULL. So how can update_group_power() be called in a scenario where the
>> base domain
>> of the CPU is not initialized? I say 'initialized' since you check for NULL
>> on rq(cpu)->sd.
>>
> 
> update_group_power() also gets called from init_sched_groups_power().
> And if you look at the oops message, we know that the oops happens from
> that path. In build_sched_domains(), we do cpu_attach_domain(), which
> updates rq->sd, after the call to init_sched_groups_power(). So by the
> time init_sched_groups_power() is called, rq->sd is not yet initialized.
>
> We only hit the oops case when sd->flags has SD_OVERLAP set.
> 
> 
> 
>> In the changelog, you say 'updated'. Are you saying that it has a stale
>> value?
> 
> I said, "before the sched_domain for a cpu is updated", so it's either
> not yet updated or has a stale value. As I said earlier in this mail, the
> initialization happens after we do an update_group_power().
> 
>> Please do elaborate on how you observed this.
>>
> 
> Does this clarify?

Yes it clarifies, thank you.

However I was thinking that a better fix would be to reorder the way we call
update_group_power() and cpu_attach_domain(). Why do we need to do
update_group_power() of the groups of the sched domains that would probably
degenerate in cpu_attach_domain()? So it seemed best to move update_group_power()
to after cpu_attach_domain() so that it saves unnecessary iterations over
sched domains which could degenerate, and it fixes the issue that you have brought out
as well. See below for the patch:

-------------------------------------------------------------------------------

sched: Update power of sched groups after sched domains have been attached to CPUs

From: Preeti U Murthy <preeti@linux.vnet.ibm.com>

Avoid iterating unnecessarily over the sched domains which could potentially
degenerate,  while updating sched groups' power. This can be done by moving
the call to init_sched_groups_power() to after cpu_attach_domain(), when the
possibility of degenerating sched domains is examined and appropriately sched
domains are degenerated.

But claim_allocations(), which NULLs the struct sd_data members for
each sched domain, should iterate over all the initially built sched domains.
So move claim_allocations() to the loop where we build sched groups for each
domain. We do not need to reference sd_data after the sched domains and sched
groups have been built.

Another use of this re-ordering is with reference to the commit
"sched/fair: Fix group power_orig computation". After this commit,
we end up dereferencing cpu_rq(cpu)->sd in update_group_power(). This would
lead to a NULL pointer dereference, since this pointer is set only after the
call to update_group_power() in build_sched_domains() during initialization
of sched domains per CPU. The below change prevents this from occurring.

Signed-off-by: Preeti U. Murthy <preeti@linux.vnet.ibm.com>
---
 kernel/sched/core.c |   20 ++++++++++----------
 1 file changed, 10 insertions(+), 10 deletions(-)

diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index e6a6244..d9703ac 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -6211,17 +6211,7 @@ static int build_sched_domains(const struct cpumask *cpu_map,
 				if (build_sched_groups(sd, i))
 					goto error;
 			}
-		}
-	}
-
-	/* Calculate CPU power for physical packages and nodes */
-	for (i = nr_cpumask_bits-1; i >= 0; i--) {
-		if (!cpumask_test_cpu(i, cpu_map))
-			continue;
-
-		for (sd = *per_cpu_ptr(d.sd, i); sd; sd = sd->parent) {
 			claim_allocations(i, sd);
-			init_sched_groups_power(i, sd);
 		}
 	}
 
@@ -6233,6 +6223,16 @@ static int build_sched_domains(const struct cpumask *cpu_map,
 	}
 	rcu_read_unlock();
 
+	/* Calculate CPU power for physical packages and nodes */
+	for (i = nr_cpumask_bits-1; i >= 0; i--) {
+		if (!cpumask_test_cpu(i, cpu_map))
+			continue;
+
+		for (sd = *per_cpu_ptr(d.sd, i); sd; sd = sd->parent)
+			init_sched_groups_power(i, sd);
+	}
+
+
 	ret = 0;
 error:
 	__free_domain_allocs(&d, alloc_state, cpu_map);

> 
Regards
Preeti U. Murthy


^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH v2] sched: Check sched_domain before computing group power.
  2013-11-14  6:06           ` Preeti U Murthy
@ 2013-11-14  8:30             ` Peter Zijlstra
  2013-11-14  9:12               ` Preeti U Murthy
  0 siblings, 1 reply; 32+ messages in thread
From: Peter Zijlstra @ 2013-11-14  8:30 UTC (permalink / raw)
  To: Preeti U Murthy
  Cc: Srikar Dronamraju, Preeti Murthy, mingo, hpa, linux-kernel,
	Thomas Gleixner, mikey, linux-tip-commits

On Thu, Nov 14, 2013 at 11:36:27AM +0530, Preeti U Murthy wrote:
> However I was thinking that a better fix would be to reorder the way we call
> update_group_power() and cpu_attach_domain(). Why do we need to do
> update_group_power() of the groups of the sched domains that would probably
> degenerate in cpu_attach_domain()? So it seemed best to move update_group_power()
> to after cpu_attach_domain() so that it saves unnecessary iterations over
> sched domains which could degenerate, and it fixes the issue that you have brought out
> as well. See below for the patch:

So how is publishing the domain tree before we've set these values at
all going to help avoid the divide-by-zero problem?

Also it's just terribly bad form to publish something before you're done
with initialization.

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH v2] sched: Check sched_domain before computing group power.
  2013-11-14  8:30             ` Peter Zijlstra
@ 2013-11-14  9:12               ` Preeti U Murthy
  0 siblings, 0 replies; 32+ messages in thread
From: Preeti U Murthy @ 2013-11-14  9:12 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Srikar Dronamraju, Preeti Murthy, mingo, hpa, linux-kernel,
	Thomas Gleixner, mikey, linux-tip-commits

Hi Peter,

On 11/14/2013 02:00 PM, Peter Zijlstra wrote:
> On Thu, Nov 14, 2013 at 11:36:27AM +0530, Preeti U Murthy wrote:
>> However I was thinking that a better fix would be to reorder the way we call
>> update_group_power() and cpu_attach_domain(). Why do we need to do
>> update_group_power() of the groups of the sched domains that would probably
>> degenerate in cpu_attach_domain()? So it seemed best to move update_group_power()
>> to after cpu_attach_domain() so that it saves unnecessary iterations over
>> sched domains which could degenerate, and it fixes the issue that you have brought out
>> as well. See below for the patch:
> 
> So how is publishing the domain tree before we've set these values at
> all going to help avoid the divide-by-zero problem?

We are still doing initialization of cpu power and power_orig during
building of sched domains right? Except that it is being done after CPUs
have base domains attached to them.

But if you are talking about the check in sched_domain_debug_one() on whether
power_orig has been initialized, then yes, this patch fails. I am sorry
I overlooked the sched_domain_debug() checks in cpu_attach_domain().
> 
> Also it's just terribly bad form to publish something before you're done
> with initialization.

You are right. cpu_rq(cpu)->sd is going to be used by anyone intending
to iterate through the sched domains. By the time we publish this, every
parameter related to sched domains and groups needs to be initialized.
The fact that sched_domain_debug() is going to be called by
cpu_attach_domain() to do one final sanity check on all the parameters
of the sched domains further emphasizes that we cannot have anything
un-initialised at this stage. So clearly my patch is incorrect.
Please add my Reviewed-by to Srikar's patch.

Thanks.

Regards
Preeti U. Murthy
> 


^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH v2] sched: Check sched_domain before computing group power.
  2013-11-13 15:17       ` Peter Zijlstra
@ 2013-11-14 10:50         ` Srikar Dronamraju
  2013-11-14 11:15           ` Peter Zijlstra
  2013-11-19 19:15         ` [tip:sched/urgent] " tip-bot for Srikar Dronamraju
  1 sibling, 1 reply; 32+ messages in thread
From: Srikar Dronamraju @ 2013-11-14 10:50 UTC (permalink / raw)
  To: Peter Zijlstra; +Cc: mingo, hpa, linux-kernel, tglx, mikey, linux-tip-commits

> +			/*
> +			 * build_sched_domains() -> init_sched_groups_power()
> +			 * gets here before we've attached the domains to the
> +			 * runqueues.
> +			 *
> +			 * Use power_of(), which is set irrespective of domains
> +			 * in update_cpu_power().
> +			 *
> +			 * This avoids power/power_orig from being 0 and
> +			 * causing divide-by-zero issues on boot.
> +			 *
> +			 * Runtime updates will correct power_orig.
> +			 */
> +			if (!rq->sd) {

Because this condition is only true during boot up, I am now
wondering whether we should mark this as unlikely, i.e. if (unlikely(!rq->sd)) {

Peter,

Please let me know if you agree, and whether you want to spin a new
version.

> +				power_orig += power_of(cpu);
> +				power += power_of(cpu);
> +				continue;
> +			}
> +
> +			sgp = rq->sd->groups->sgp;
> +			power_orig += sgp->power_orig;
> +			power += sgp->power;
>  		}
>  	} else  {
>  		/*
> 

-- 
Thanks and Regards
Srikar Dronamraju


^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH v2] sched: Check sched_domain before computing group power.
  2013-11-14 10:50         ` Srikar Dronamraju
@ 2013-11-14 11:15           ` Peter Zijlstra
  0 siblings, 0 replies; 32+ messages in thread
From: Peter Zijlstra @ 2013-11-14 11:15 UTC (permalink / raw)
  To: Srikar Dronamraju
  Cc: mingo, hpa, linux-kernel, tglx, mikey, linux-tip-commits

On Thu, Nov 14, 2013 at 04:20:17PM +0530, Srikar Dronamraju wrote:
> > +			/*
> > +			 * build_sched_domains() -> init_sched_groups_power()
> > +			 * gets here before we've attached the domains to the
> > +			 * runqueues.
> > +			 *
> > +			 * Use power_of(), which is set irrespective of domains
> > +			 * in update_cpu_power().
> > +			 *
> > +			 * This avoids power/power_orig from being 0 and
> > +			 * causing divide-by-zero issues on boot.
> > +			 *
> > +			 * Runtime updates will correct power_orig.
> > +			 */
> > +			if (!rq->sd) {
> 
> Because this condition is only true during boot up, I am now
> wondering whether we should mark this as unlikely, i.e. if (unlikely(!rq->sd)) {

Makes sense, edited the patch.


^ permalink raw reply	[flat|nested] 32+ messages in thread

* [tip:sched/urgent] sched: Check sched_domain before computing group power
  2013-11-13 15:17       ` Peter Zijlstra
  2013-11-14 10:50         ` Srikar Dronamraju
@ 2013-11-19 19:15         ` tip-bot for Srikar Dronamraju
  2013-11-19 23:36           ` Yinghai Lu
  2013-11-28  2:57           ` David Rientjes
  1 sibling, 2 replies; 32+ messages in thread
From: tip-bot for Srikar Dronamraju @ 2013-11-19 19:15 UTC (permalink / raw)
  To: linux-tip-commits; +Cc: linux-kernel, hpa, mingo, peterz, srikar, tglx

Commit-ID:  9abf24d465180f5f2eb26a43545348262f16b771
Gitweb:     http://git.kernel.org/tip/9abf24d465180f5f2eb26a43545348262f16b771
Author:     Srikar Dronamraju <srikar@linux.vnet.ibm.com>
AuthorDate: Tue, 12 Nov 2013 22:11:26 +0530
Committer:  Ingo Molnar <mingo@kernel.org>
CommitDate: Tue, 19 Nov 2013 17:01:15 +0100

sched: Check sched_domain before computing group power

After commit 863bffc80898 ("sched/fair: Fix group power_orig
computation"), we can dereference rq->sd before it is set.

Fix this by falling back to power_of() in this case and add a comment
explaining things.

Signed-off-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
[ Added comment and tweaked patch. ]
Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Cc: mikey@neuling.org
Link: http://lkml.kernel.org/r/20131113151718.GN21461@twins.programming.kicks-ass.net
Signed-off-by: Ingo Molnar <mingo@kernel.org>
---
 kernel/sched/fair.c | 27 ++++++++++++++++++++++++---
 1 file changed, 24 insertions(+), 3 deletions(-)

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index e8b652e..fd773ad 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -5379,10 +5379,31 @@ void update_group_power(struct sched_domain *sd, int cpu)
 		 */
 
 		for_each_cpu(cpu, sched_group_cpus(sdg)) {
-			struct sched_group *sg = cpu_rq(cpu)->sd->groups;
+			struct sched_group_power *sgp;
+			struct rq *rq = cpu_rq(cpu);
 
-			power_orig += sg->sgp->power_orig;
-			power += sg->sgp->power;
+			/*
+			 * build_sched_domains() -> init_sched_groups_power()
+			 * gets here before we've attached the domains to the
+			 * runqueues.
+			 *
+			 * Use power_of(), which is set irrespective of domains
+			 * in update_cpu_power().
+			 *
+			 * This avoids power/power_orig from being 0 and
+			 * causing divide-by-zero issues on boot.
+			 *
+			 * Runtime updates will correct power_orig.
+			 */
+			if (unlikely(!rq->sd)) {
+				power_orig += power_of(cpu);
+				power += power_of(cpu);
+				continue;
+			}
+
+			sgp = rq->sd->groups->sgp;
+			power_orig += sgp->power_orig;
+			power += sgp->power;
 		}
 	} else  {
 		/*

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [tip:sched/urgent] sched: Check sched_domain before computing group power
  2013-11-19 19:15         ` [tip:sched/urgent] " tip-bot for Srikar Dronamraju
@ 2013-11-19 23:36           ` Yinghai Lu
  2013-11-21 15:03             ` Peter Zijlstra
  2013-11-28  2:57           ` David Rientjes
  1 sibling, 1 reply; 32+ messages in thread
From: Yinghai Lu @ 2013-11-19 23:36 UTC (permalink / raw)
  To: Ingo Molnar, H. Peter Anvin, Linux Kernel Mailing List, srikar,
	Peter Zijlstra, Thomas Gleixner
  Cc: linux-tip-commits

On Tue, Nov 19, 2013 at 11:15 AM, tip-bot for Srikar Dronamraju
<tipbot@zytor.com> wrote:
> Commit-ID:  9abf24d465180f5f2eb26a43545348262f16b771
> Gitweb:     http://git.kernel.org/tip/9abf24d465180f5f2eb26a43545348262f16b771
> Author:     Srikar Dronamraju <srikar@linux.vnet.ibm.com>
> AuthorDate: Tue, 12 Nov 2013 22:11:26 +0530
> Committer:  Ingo Molnar <mingo@kernel.org>
> CommitDate: Tue, 19 Nov 2013 17:01:15 +0100
>
> sched: Check sched_domain before computing group power
>
> After commit 863bffc80898 ("sched/fair: Fix group power_orig
> computation"), we can dereference rq->sd before it is set.
>
> Fix this by falling back to power_of() in this case and add a comment
> explaining things.
>
> Signed-off-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
> [ Added comment and tweaked patch. ]
> Signed-off-by: Peter Zijlstra <peterz@infradead.org>
> Cc: mikey@neuling.org
> Link: http://lkml.kernel.org/r/20131113151718.GN21461@twins.programming.kicks-ass.net
> Signed-off-by: Ingo Molnar <mingo@kernel.org>
> ---
>  kernel/sched/fair.c | 27 ++++++++++++++++++++++++---
>  1 file changed, 24 insertions(+), 3 deletions(-)
>
> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> index e8b652e..fd773ad 100644
> --- a/kernel/sched/fair.c
> +++ b/kernel/sched/fair.c
> @@ -5379,10 +5379,31 @@ void update_group_power(struct sched_domain *sd, int cpu)
>                  */
>
>                 for_each_cpu(cpu, sched_group_cpus(sdg)) {
> -                       struct sched_group *sg = cpu_rq(cpu)->sd->groups;
> +                       struct sched_group_power *sgp;
> +                       struct rq *rq = cpu_rq(cpu);
>
> -                       power_orig += sg->sgp->power_orig;
> -                       power += sg->sgp->power;
> +                       /*
> +                        * build_sched_domains() -> init_sched_groups_power()
> +                        * gets here before we've attached the domains to the
> +                        * runqueues.
> +                        *
> +                        * Use power_of(), which is set irrespective of domains
> +                        * in update_cpu_power().
> +                        *
> +                        * This avoids power/power_orig from being 0 and
> +                        * causing divide-by-zero issues on boot.
> +                        *
> +                        * Runtime updates will correct power_orig.
> +                        */
> +                       if (unlikely(!rq->sd)) {
> +                               power_orig += power_of(cpu);
> +                               power += power_of(cpu);
> +                               continue;
> +                       }
> +
> +                       sgp = rq->sd->groups->sgp;
> +                       power_orig += sgp->power_orig;
> +                       power += sgp->power;
>                 }
>         } else  {
>                 /*

This one seems to fix the NULL dereference in update_group_power(),

but I get the following on the current Linus tree plus tip/sched/urgent:

divide error: 0000 [#1]  SMP
[   28.190477] Modules linked in:
[   28.192012] CPU: 11 PID: 484 Comm: kworker/u324:0 Not tainted 3.12.0-yh-10487-g4b94e59-dirty #2044
[   28.210488] Hardware name: Oracle Corporation  Sun Fire
[   28.229877] task: ffff88ff25205140 ti: ffff88ff2520a000 task.ti: ffff88ff2520a000
[   28.236139] RIP: 0010:[<ffffffff810d9ff4>]  [<ffffffff810d9ff4>] find_busiest_group+0x2b4/0x8a0
[   28.252075] RSP: 0000:ffff88ff2520b9a8  EFLAGS: 00010046
[   28.269591] RAX: 0000000000013fff RBX: 00000000ffffffff RCX: 00000000000000a0
[   28.272977] RDX: 0000000000000000 RSI: 0000000000014000 RDI: 0000000000000050
[   28.291968] RBP: ffff88ff2520bb08 R08: 00000000000003b6 R09: 0000000000000000
[   28.309327] R10: 0000000000000000 R11: 0000000000000002 R12: ffff88ff2520ba90
[   28.314222] R13: ffff887f2491c000 R14: 0000000000014000 R15: ffff88ff2520bba0
[   28.331408] FS:  0000000000000000(0000) GS:ffff887f7d800000(0000) knlGS:0000000000000000
[   28.349333] CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
[   28.351524] CR2: 0000000000000168 CR3: 0000000002c14000 CR4: 00000000000007e0
[   28.370245] Stack:
[   28.371466]  000000002520b9b8 000000000000000b 0000000000000048 0000000000000000
[   28.389951]  ffff887f2491c018 ffff88ff2520ba20 000000000000003c 00000000000002b6
[   28.395085]  00000000000002b6 00000000000002b6 0000000000002df0 0000000100000001
[   28.412079] Call Trace:
[   28.413617]  [<ffffffff810da798>] load_balance+0x1b8/0x8c0
[   28.429692]  [<ffffffff810ec67b>] ? __lock_acquire+0xadb/0xce0
[   28.433037]  [<ffffffff810db3c1>] idle_balance+0x101/0x1c0
[   28.450328]  [<ffffffff810db304>] ? idle_balance+0x44/0x1c0
[   28.453420]  [<ffffffff8214a13b>] __schedule+0x2cb/0xa10
[   28.469847]  [<ffffffff810e66d8>] ? trace_hardirqs_off_caller+0x28/0x160
[   28.473782]  [<ffffffff810e681d>] ? trace_hardirqs_off+0xd/0x10
[   28.490654]  [<ffffffff810d1c14>] ? local_clock+0x34/0x60
[   28.493723]  [<ffffffff810b837b>] ? worker_thread+0x2db/0x370
[   28.510363]  [<ffffffff8214f450>] ? _raw_spin_unlock_irq+0x30/0x40
[   28.514002]  [<ffffffff8214a935>] schedule+0x65/0x70
[   28.530380]  [<ffffffff810b8380>] worker_thread+0x2e0/0x370
[   28.533450]  [<ffffffff810ea19d>] ? trace_hardirqs_on+0xd/0x10
[   28.550976]  [<ffffffff810b80a0>] ? manage_workers.isra.17+0x330/0x330
[   28.554356]  [<ffffffff810bf598>] kthread+0x108/0x110
[   28.571857]  [<ffffffff810bf490>] ? __init_kthread_worker+0x70/0x70
[   28.588961]  [<ffffffff82157cec>] ret_from_fork+0x7c/0xb0
[   28.592017]  [<ffffffff810bf490>] ? __init_kthread_worker+0x70/0x70
[   28.609904] Code: 89 85 b8 fe ff ff 49 8b 45 10 41 8b 7d 0c 44 8b 50 08 44 8b 70 04 89 f8 48 c1 e0 0a 45 89 d1 49 8d 44 01 ff 48 89 c2 48 c1 fa 3f <49> f7 f9 31 d2 49 89 c1 89 f8 44 89 f7 41 f7 f1 48 81 c7 00 02
[   28.641210] RIP  [<ffffffff810d9ff4>] find_busiest_group+0x2b4/0x8a0
[   28.650476]  RSP <ffff88ff2520b9a8>
[   28.651754] divide error: 0000 [#2] [   28.651762] ---[ end trace bcaaa28065586d41 ]---
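
A note on reading the dump above: RDI is 0x50, RAX is 0x13fff, which
equals (0x50 << 10) - 1, and R9 is 0. The faulting bytes <49> f7 f9
decode to idiv %r9, preceded by a shift left by 10 and an add of
(R9 - 1). That pattern would be consistent with a round-up division
whose divisor is 0. Which source line this corresponds to is an
assumption not confirmed in this thread; the sketch below only
reproduces the arithmetic, with hypothetical names.

```c
/* Assumption: the scale matches the shift by 10 seen in the Code: bytes. */
#define TOY_POWER_SCALE 1024L

/* Numerator of a round-up division: cpus * scale + divisor - 1. */
static long ceil_div_numerator(long cpus, long divisor)
{
	return cpus * TOY_POWER_SCALE + divisor - 1;
}

/*
 * Guarded round-up division. Returns -1 for a zero divisor, where the
 * unguarded version would raise a divide error on x86.
 */
static long ceil_div_guarded(long cpus, long divisor)
{
	if (divisor == 0)
		return -1;

	return ceil_div_numerator(cpus, divisor) / divisor;
}
```

With cpus = 0x50 and divisor = 0, the numerator is 0x13fff, the value
seen in RAX at the time of the fault.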

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [tip:sched/urgent] sched: Check sched_domain before computing group power
  2013-11-19 23:36           ` Yinghai Lu
@ 2013-11-21 15:03             ` Peter Zijlstra
  2013-11-21 17:22               ` Yinghai Lu
  0 siblings, 1 reply; 32+ messages in thread
From: Peter Zijlstra @ 2013-11-21 15:03 UTC (permalink / raw)
  To: Yinghai Lu
  Cc: Ingo Molnar, H. Peter Anvin, Linux Kernel Mailing List, srikar,
	Thomas Gleixner, linux-tip-commits

On Tue, Nov 19, 2013 at 03:36:12PM -0800, Yinghai Lu wrote:
> > --- a/kernel/sched/fair.c
> > +++ b/kernel/sched/fair.c
> > @@ -5379,10 +5379,31 @@ void update_group_power(struct sched_domain *sd, int cpu)
> >                  */
> >
> >                 for_each_cpu(cpu, sched_group_cpus(sdg)) {
> > -                       struct sched_group *sg = cpu_rq(cpu)->sd->groups;
> > +                       struct sched_group_power *sgp;
> > +                       struct rq *rq = cpu_rq(cpu);
> >
> > -                       power_orig += sg->sgp->power_orig;
> > -                       power += sg->sgp->power;
> > +                       /*
> > +                        * build_sched_domains() -> init_sched_groups_power()
> > +                        * gets here before we've attached the domains to the
> > +                        * runqueues.
> > +                        *
> > +                        * Use power_of(), which is set irrespective of domains
> > +                        * in update_cpu_power().
> > +                        *
> > +                        * This avoids power/power_orig from being 0 and
> > +                        * causing divide-by-zero issues on boot.
> > +                        *
> > +                        * Runtime updates will correct power_orig.
> > +                        */
> > +                       if (unlikely(!rq->sd)) {
> > +                               power_orig += power_of(cpu);
> > +                               power += power_of(cpu);
> > +                               continue;
> > +                       }
> > +
> > +                       sgp = rq->sd->groups->sgp;
> > +                       power_orig += sgp->power_orig;
> > +                       power += sgp->power;
> >                 }
> >         } else  {
> >                 /*
> 
> This one seems to fix the NULL dereference in update_group_power(),
>
> but I get the following on the current Linus tree plus tip/sched/urgent:
> 
> divide error: 0000 [#1]  SMP
> [   28.190477] Modules linked in:
> [   28.192012] CPU: 11 PID: 484 Comm: kworker/u324:0 Not tainted
> 3.12.0-yh-10487-g4b94e59-dirty #2044
> [   28.210488] Hardware name: Oracle Corporation  Sun Fire
> [   28.229877] task: ffff88ff25205140 ti: ffff88ff2520a000 task.ti:
> ffff88ff2520a000
> [   28.236139] RIP: 0010:[<ffffffff810d9ff4>]  [<ffffffff810d9ff4>]
> find_busiest_group+0x2b4/0x8a0

Hurmph.. what kind of hardware is that? and is there anything funny you
do to make it do this?

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [tip:sched/urgent] sched: Check sched_domain before computing group power
  2013-11-21 15:03             ` Peter Zijlstra
@ 2013-11-21 17:22               ` Yinghai Lu
  2013-11-21 22:03                 ` Yinghai Lu
  2013-11-22 12:07                 ` Peter Zijlstra
  0 siblings, 2 replies; 32+ messages in thread
From: Yinghai Lu @ 2013-11-21 17:22 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Ingo Molnar, H. Peter Anvin, Linux Kernel Mailing List, srikar,
	Thomas Gleixner, linux-tip-commits

On Thu, Nov 21, 2013 at 7:03 AM, Peter Zijlstra <peterz@infradead.org> wrote:
>>
>> This one seems to fix the NULL dereference in update_group_power(),
>>
>> but I get the following on the current Linus tree plus tip/sched/urgent:
>>
>> divide error: 0000 [#1]  SMP
>> [   28.190477] Modules linked in:
>> [   28.192012] CPU: 11 PID: 484 Comm: kworker/u324:0 Not tainted
>> 3.12.0-yh-10487-g4b94e59-dirty #2044
>> [   28.210488] Hardware name: Oracle Corporation  Sun Fire
>> [   28.229877] task: ffff88ff25205140 ti: ffff88ff2520a000 task.ti:
>> ffff88ff2520a000
>> [   28.236139] RIP: 0010:[<ffffffff810d9ff4>]  [<ffffffff810d9ff4>]
>> find_busiest_group+0x2b4/0x8a0
>
> Hurmph.. what kind of hardware is that? and is there anything funny you
> do to make it do this?

Intel Nehalem-EX or Westmere-EX 8-socket system.

I tried without my local patches; the problem is still there.

Yinghai

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [tip:sched/urgent] sched: Check sched_domain before computing group power
  2013-11-21 17:22               ` Yinghai Lu
@ 2013-11-21 22:03                 ` Yinghai Lu
  2013-11-28  3:02                   ` David Rientjes
  2013-11-22 12:07                 ` Peter Zijlstra
  1 sibling, 1 reply; 32+ messages in thread
From: Yinghai Lu @ 2013-11-21 22:03 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Ingo Molnar, H. Peter Anvin, Linux Kernel Mailing List, srikar,
	Thomas Gleixner, linux-tip-commits

On Thu, Nov 21, 2013 at 9:22 AM, Yinghai Lu <yinghai@kernel.org> wrote:
> On Thu, Nov 21, 2013 at 7:03 AM, Peter Zijlstra <peterz@infradead.org> wrote:
>>>
>>> This one seems to fix the NULL dereference in update_group_power(),
>>>
>>> but I get the following on the current Linus tree plus tip/sched/urgent:
>>>
>>> divide error: 0000 [#1]  SMP
>>> [   28.190477] Modules linked in:
>>> [   28.192012] CPU: 11 PID: 484 Comm: kworker/u324:0 Not tainted
>>> 3.12.0-yh-10487-g4b94e59-dirty #2044
>>> [   28.210488] Hardware name: Oracle Corporation  Sun Fire
>>> [   28.229877] task: ffff88ff25205140 ti: ffff88ff2520a000 task.ti:
>>> ffff88ff2520a000
>>> [   28.236139] RIP: 0010:[<ffffffff810d9ff4>]  [<ffffffff810d9ff4>]
>>> find_busiest_group+0x2b4/0x8a0
>>
>> Hurmph.. what kind of hardware is that? and is there anything funny you
>> do to make it do this?
>
> intel nehanem-ex or westmere-ex 8 sockets system.
>
> I tried without my local patches, the problem is still there.

The original oops, on Linus's tree:

[    8.952728] NMI watchdog: enabled on all CPUs, permanently consumes
one hw-PMU counter.
[    8.965697] BUG: unable to handle kernel NULL pointer dereference
at 0000000000000010
[    8.969495] IP: [<ffffffff810d7b53>] update_group_power+0x1d3/0x250
[    8.987159] PGD 0
[    8.989280] Oops: 0000 [#1] SMP
[    8.991686] Modules linked in:
[    8.993803] CPU: 0 PID: 1 Comm: swapper/0 Not tainted
3.12.0-yh-02845-g527d151 #2048
[    9.009175] Hardware name: Oracle Corporation  Sun Fire X4800 M2 /
   , BIOS 15013200    04/19/2012
[    9.028433] task: ffff883f24e28000 ti: ffff883f24e24000 task.ti:
ffff883f24e24000
[    9.033249] RIP: 0010:[<ffffffff810d7b53>]  [<ffffffff810d7b53>]
update_group_power+0x1d3/0x250
[    9.051193] RSP: 0000:ffff883f24e25d68  EFLAGS: 00010283
[    9.068162] RAX: 0000000000000001 RBX: 0000000000000000 RCX: 0000000000000000
[    9.071838] RDX: 0000000000000000 RSI: 00000000000000a0 RDI: 00000000000000a0
[    9.090260] RBP: ffff883f24e25d98 R08: ffff88ffc4891020 R09: 0000000000000000
[    9.107870] R10: ffff88ffc4890818 R11: 0000000000000001 R12: 00000000001d40c0
[    9.111527] R13: ffff88ffc4891018 R14: ffff88ffc4891000 R15: 0000000000000000
[    9.131279] FS:  0000000000000000(0000) GS:ffff883f7d600000(0000)
knlGS:0000000000000000
[    9.148870] CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
[    9.151914] CR2: 0000000000000010 CR3: 0000000002c14000 CR4: 00000000000007f0
[    9.168645] Stack:
[    9.169871]  ffff883f24e25d88 ffff88ffc4891000 ffff88ffc4870000
0000000000000001
[    9.188660]  0000000000000001 ffff883f23b0d400 ffff883f24e25e58
ffffffff810ce094
[    9.193232]  ffff883f24e25dd8 0000000000000246 0000000000000003
ffffffff000000a0
[    9.210992] Call Trace:
[    9.212524]  [<ffffffff810ce094>] build_sched_domains+0x6f4/0x980
[    9.229900]  [<ffffffff83042a2b>] sched_init_smp+0x95/0x146
[    9.233236]  [<ffffffff83023023>] kernel_init_freeable+0x148/0x259
[    9.250019]  [<ffffffff82121bce>] ? kernel_init+0xe/0x130
[    9.253356]  [<ffffffff82121bc0>] ? rest_init+0xd0/0xd0
[    9.268882]  [<ffffffff82121bce>] kernel_init+0xe/0x130
[    9.271661]  [<ffffffff8215176c>] ret_from_fork+0x7c/0xb0
[    9.288882]  [<ffffffff82121bc0>] ? rest_init+0xd0/0xd0
[    9.292476] Code: ff 31 db b8 ff ff ff ff 4d 8d 6e 18 eb 31 66 2e
0f 1f 84 00 00 00 00 00 48 63 d0 48 8b 14 d5 40 c4 e2 82 49 8b 94 14
08 09 00 00 <48> 8b 52 10 48 8b 52 10 8b 4a 08 8b 52 04 49 01 cf 48 01
d3 83
[    9.335669] RIP  [<ffffffff810d7b53>] update_group_power+0x1d3/0x250
[    9.348090]  RSP <ffff883f24e25d68>
[    9.350240] CR2: 0000000000000010
[    9.351803] ---[ end trace a21cca9ad6b48d40 ]---
[    9.367839] Kernel panic - not syncing: Attempted to kill init!
exitcode=0x00000009
[    9.367839]


* Re: [tip:sched/urgent] sched: Check sched_domain before computing group power
  2013-11-21 17:22               ` Yinghai Lu
  2013-11-21 22:03                 ` Yinghai Lu
@ 2013-11-22 12:07                 ` Peter Zijlstra
  2013-11-23  5:00                   ` Yinghai Lu
  1 sibling, 1 reply; 32+ messages in thread
From: Peter Zijlstra @ 2013-11-22 12:07 UTC (permalink / raw)
  To: Yinghai Lu
  Cc: Ingo Molnar, H. Peter Anvin, Linux Kernel Mailing List, srikar,
	Thomas Gleixner, linux-tip-commits

On Thu, Nov 21, 2013 at 09:22:24AM -0800, Yinghai Lu wrote:
> On Thu, Nov 21, 2013 at 7:03 AM, Peter Zijlstra <peterz@infradead.org> wrote:
> >>
> >> This one seems fix NULL reference in compute_group_power.
> >>
> >> but get following on current Linus tree plus tip/sched/urgent.
> >>
> >> divide error: 0000 [#1]  SMP
> >> [   28.190477] Modules linked in:
> >> [   28.192012] CPU: 11 PID: 484 Comm: kworker/u324:0 Not tainted
> >> 3.12.0-yh-10487-g4b94e59-dirty #2044
> >> [   28.210488] Hardware name: Oracle Corporation  Sun Fire
> >> [   28.229877] task: ffff88ff25205140 ti: ffff88ff2520a000 task.ti:
> >> ffff88ff2520a000
> >> [   28.236139] RIP: 0010:[<ffffffff810d9ff4>]  [<ffffffff810d9ff4>]
> >> find_busiest_group+0x2b4/0x8a0
> >
> > Hurmph.. what kind of hardware is that? and is there anything funny you
> > do to make it do this?
> 
> intel nehanem-ex or westmere-ex 8 sockets system.
> 
> I tried without my local patches, the problem is still there.

And I suppose a kernel before

  863bffc80898 ("sched/fair: Fix group power_orig computation")

works fine, eh?

I'll further assume that your RIP points to:

	sds.avg_load = (SCHED_POWER_SCALE * sds.total_load) / sds.total_pwr;

indicating that sds.total_pwr := 0.

update_sd_lb_stats() computes it like:

		sds->total_pwr += sgs->group_power;

which comes out of update_sg_lb_stats() like:

	sgs->group_power = group->sgp->power;

Which we compute in update_group_power() similarly to how we did before
863bffc80898.

Which leaves me a bit puzzled.
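
The chain of assignments traced above can be sketched as a small userspace C model. This is an illustration only: the struct names `sg_stats` and `sd_stats`, the helper names, and the explicit zero check are assumptions made for the sketch, not the kernel's code. The kernel line quoted above performs the division without a guard, which is why a zero `total_pwr` would show up as a divide error.

```c
#include <assert.h>
#include <stddef.h>

#define SCHED_POWER_SCALE 1024UL

/* Hypothetical, simplified stand-ins for the per-group and per-domain stats. */
struct sg_stats { unsigned long group_load; unsigned long group_power; };
struct sd_stats { unsigned long total_load; unsigned long total_pwr; };

/* Mirrors "sds->total_pwr += sgs->group_power" for each group in the domain. */
static void sd_accumulate(struct sd_stats *sds, const struct sg_stats *sgs, size_t n)
{
	for (size_t i = 0; i < n; i++) {
		sds->total_load += sgs[i].group_load;
		sds->total_pwr  += sgs[i].group_power;
	}
}

/*
 * Mirrors "avg_load = (SCHED_POWER_SCALE * total_load) / total_pwr".
 * Returns 0 on success, or -1 where the unguarded division would trap.
 * The guard is an assumption of this sketch, not something the kernel does.
 */
static int sd_avg_load(const struct sd_stats *sds, unsigned long *avg)
{
	assert(sds != NULL && avg != NULL);
	if (sds->total_pwr == 0)
		return -1;
	*avg = (SCHED_POWER_SCALE * sds->total_load) / sds->total_pwr;
	return 0;
}
```

In this model the division can only fail if every group in the domain reports a power of zero, which is the condition the reasoning above treats as implausible.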




* Re: [tip:sched/urgent] sched: Check sched_domain before computing group power
  2013-11-22 12:07                 ` Peter Zijlstra
@ 2013-11-23  5:00                   ` Yinghai Lu
  2013-11-23 18:53                     ` Peter Zijlstra
  0 siblings, 1 reply; 32+ messages in thread
From: Yinghai Lu @ 2013-11-23  5:00 UTC (permalink / raw)
  To: Peter Zijlstra, Ingo Molnar, Linus Torvalds
  Cc: Ingo Molnar, H. Peter Anvin, Linux Kernel Mailing List, srikar,
	Thomas Gleixner, linux-tip-commits

On Fri, Nov 22, 2013 at 4:07 AM, Peter Zijlstra <peterz@infradead.org> wrote:
> On Thu, Nov 21, 2013 at 09:22:24AM -0800, Yinghai Lu wrote:
>> On Thu, Nov 21, 2013 at 7:03 AM, Peter Zijlstra <peterz@infradead.org> wrote:
>> >>
>> >> This one seems fix NULL reference in compute_group_power.
>> >>
>> >> but get following on current Linus tree plus tip/sched/urgent.
>> >>
>> >> divide error: 0000 [#1]  SMP
>> >> [   28.190477] Modules linked in:
>> >> [   28.192012] CPU: 11 PID: 484 Comm: kworker/u324:0 Not tainted
>> >> 3.12.0-yh-10487-g4b94e59-dirty #2044
>> >> [   28.210488] Hardware name: Oracle Corporation  Sun Fire
>> >> [   28.229877] task: ffff88ff25205140 ti: ffff88ff2520a000 task.ti:
>> >> ffff88ff2520a000
>> >> [   28.236139] RIP: 0010:[<ffffffff810d9ff4>]  [<ffffffff810d9ff4>]
>> >> find_busiest_group+0x2b4/0x8a0
>> >
>> > Hurmph.. what kind of hardware is that? and is there anything funny you
>> > do to make it do this?
>>
>> intel nehanem-ex or westmere-ex 8 sockets system.
>>
>> I tried without my local patches, the problem is still there.
>
> And I suppose a kernel before
>
>   863bffc80898 ("sched/fair: Fix group power_orig computation")
>
> work fine, eh?
>
> I'll further assume that your RIP points to:
>
>         sds.avg_load = (SCHED_POWER_SCALE * sds.total_load) / sds.total_pwr;
>
> indicating that sds.total_pwr := 0.
>
> update_sd_lb_stats() computes it like:
>
>                 sds->total_pwr += sgs->group_power;
>
> which comes out of update_sg_lb_stats() like:
>
>         sgs->group_power = group->sgp->power;
>
> Which we compute in update_group_power() similarly to how we did before
> 863bffc80898.
>
> Which leaves me a bit puzzled.

Hi,

For Linus's tree, I need to revert
   commit-863bffc

For Linus's tree + sched/urgent, I need to revert
   commit-42eb088
   commit-9abf24d
   commit-863bffc

If I revert only commit-42eb088, the problem is still there.
If I revert only commit-9abf24d and commit-863bffc, the problem is still there.

I assume you need to dump sched/urgent
and revert commit-863bffc directly from Linus's tree.

Thanks

Yinghai


* Re: [tip:sched/urgent] sched: Check sched_domain before computing group power
  2013-11-23  5:00                   ` Yinghai Lu
@ 2013-11-23 18:53                     ` Peter Zijlstra
  0 siblings, 0 replies; 32+ messages in thread
From: Peter Zijlstra @ 2013-11-23 18:53 UTC (permalink / raw)
  To: Yinghai Lu
  Cc: Ingo Molnar, Linus Torvalds, Ingo Molnar, H. Peter Anvin,
	Linux Kernel Mailing List, srikar, Thomas Gleixner,
	linux-tip-commits

On Fri, Nov 22, 2013 at 09:00:54PM -0800, Yinghai Lu wrote:
> On Fri, Nov 22, 2013 at 4:07 AM, Peter Zijlstra <peterz@infradead.org> wrote:
> > And I suppose a kernel before
> >
> >   863bffc80898 ("sched/fair: Fix group power_orig computation")
> >
> > work fine, eh?
> >
> > I'll further assume that your RIP points to:
> >
> >         sds.avg_load = (SCHED_POWER_SCALE * sds.total_load) / sds.total_pwr;
> >
> > indicating that sds.total_pwr := 0.
> >
> > update_sd_lb_stats() computes it like:
> >
> >                 sds->total_pwr += sgs->group_power;
> >
> > which comes out of update_sg_lb_stats() like:
> >
> >         sgs->group_power = group->sgp->power;
> >
> > Which we compute in update_group_power() similarly to how we did before
> > 863bffc80898.
> >
> > Which leaves me a bit puzzled.
> 
> Hi,
> for linus tree i need to revert commit-863bffc.
>    commit-863bffc
> 
> for linus tree + sched/urgent, I need to revert
>    commit-42eb088
>    commit-9abf24d
>    commit-863bffc
> .
> If only revert commit-42eb088, still have problem.
> if only revert  commit-9abf24d, commit-863bffc, still have problem.
> 
> Assume you need to dump sched/urgent,
> and revert commit-863bffc directly from Linus's tree.

That doesn't answer any of the questions above and only raises more
questions.

I also will not revert until a little later, I really need to understand
this. My wsm-ep system boots just fine, so there's something funny
somewhere.

I also cannot see the difference between 863bffc^1 and 9abf24d.

Also, you mentioning 42eb088 is new; what does that have to do with
anything? You cannot revert that without also reverting 37dc6b50cee9,
but you don't mention that commit at all.





* Re: [tip:sched/urgent] sched:  Check sched_domain before computing group power
  2013-11-19 19:15         ` [tip:sched/urgent] " tip-bot for Srikar Dronamraju
  2013-11-19 23:36           ` Yinghai Lu
@ 2013-11-28  2:57           ` David Rientjes
  1 sibling, 0 replies; 32+ messages in thread
From: David Rientjes @ 2013-11-28  2:57 UTC (permalink / raw)
  To: mingo, hpa, linux-kernel, srikar, peterz, tglx; +Cc: linux-tip-commits

On Tue, 19 Nov 2013, tip-bot for Srikar Dronamraju wrote:

> Commit-ID:  9abf24d465180f5f2eb26a43545348262f16b771
> Gitweb:     http://git.kernel.org/tip/9abf24d465180f5f2eb26a43545348262f16b771
> Author:     Srikar Dronamraju <srikar@linux.vnet.ibm.com>
> AuthorDate: Tue, 12 Nov 2013 22:11:26 +0530
> Committer:  Ingo Molnar <mingo@kernel.org>
> CommitDate: Tue, 19 Nov 2013 17:01:15 +0100
> 
> sched: Check sched_domain before computing group power
> 
> After commit 863bffc80898 ("sched/fair: Fix group power_orig
> computation"), we can dereference rq->sd before it is set.
> 
> Fix this by falling back to power_of() in this case and add a comment
> explaining things.
> 
> Signed-off-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
> [ Added comment and tweaked patch. ]
> Signed-off-by: Peter Zijlstra <peterz@infradead.org>
> Cc: mikey@neuling.org
> Link: http://lkml.kernel.org/r/20131113151718.GN21461@twins.programming.kicks-ass.net
> Signed-off-by: Ingo Molnar <mingo@kernel.org>

Acked-by: David Rientjes <rientjes@google.com>

Fixes a boot failure for me, thanks.


* Re: [tip:sched/urgent] sched: Check sched_domain before computing group power
  2013-11-21 22:03                 ` Yinghai Lu
@ 2013-11-28  3:02                   ` David Rientjes
  2013-11-28  7:07                     ` Yinghai Lu
  0 siblings, 1 reply; 32+ messages in thread
From: David Rientjes @ 2013-11-28  3:02 UTC (permalink / raw)
  To: Yinghai Lu
  Cc: Peter Zijlstra, Ingo Molnar, H. Peter Anvin,
	Linux Kernel Mailing List, srikar, Thomas Gleixner,
	linux-tip-commits

On Thu, 21 Nov 2013, Yinghai Lu wrote:

> original one in linus's tree:
> 
> [    8.952728] NMI watchdog: enabled on all CPUs, permanently consumes
> one hw-PMU counter.
> [    8.965697] BUG: unable to handle kernel NULL pointer dereference
> at 0000000000000010
> [    8.969495] IP: [<ffffffff810d7b53>] update_group_power+0x1d3/0x250

This should have been fixed by Srikar's patch, no?


* Re: [tip:sched/urgent] sched: Check sched_domain before computing group power
  2013-11-28  3:02                   ` David Rientjes
@ 2013-11-28  7:07                     ` Yinghai Lu
  2013-11-28  9:38                       ` Peter Zijlstra
  2013-12-06  6:24                       ` Yinghai Lu
  0 siblings, 2 replies; 32+ messages in thread
From: Yinghai Lu @ 2013-11-28  7:07 UTC (permalink / raw)
  To: David Rientjes
  Cc: Peter Zijlstra, Ingo Molnar, H. Peter Anvin,
	Linux Kernel Mailing List, srikar, Thomas Gleixner,
	linux-tip-commits

On Wed, Nov 27, 2013 at 7:02 PM, David Rientjes <rientjes@google.com> wrote:
> On Thu, 21 Nov 2013, Yinghai Lu wrote:
>
>> original one in linus's tree:
>>
>> [    8.952728] NMI watchdog: enabled on all CPUs, permanently consumes
>> one hw-PMU counter.
>> [    8.965697] BUG: unable to handle kernel NULL pointer dereference
>> at 0000000000000010
>> [    8.969495] IP: [<ffffffff810d7b53>] update_group_power+0x1d3/0x250
>
> This should have been fixed by Srikar's patch, no?

Maybe not related: on another system, with Linus's tree + Srikar's patch, I now get:

[   33.546361] divide error: 0000 [#1]
SMP
[   33.589436] Modules linked in:
[   33.592869] CPU: 15 PID: 567 Comm: kworker/u482:0 Not tainted
3.13.0-rc1-yh-00324-gcf1be1c-dirty #10
[   33.603075] Hardware name: Oracle Corporation
[   33.609571] calling  ipc_ns_init+0x0/0x14 @ 1
[   33.609575] initcall ipc_ns_init+0x0/0x14 returned 0 after 0 usecs
[   33.609577] calling  init_mmap_min_addr+0x0/0x16 @ 1
[   33.609579] initcall init_mmap_min_addr+0x0/0x16 returned 0 after 0 usecs
[   33.609583] calling  init_cpufreq_transition_notifier_list+0x0/0x1b @ 1
[   33.609621] initcall init_cpufreq_transition_notifier_list+0x0/0x1b
returned 0 after 0 usecs
[   33.609624] calling  net_ns_init+0x0/0xfa @ 1
[   33.677194] task: ffff897c5ba5c8c0 ti: ffff897c5ba8e000 task.ti:
ffff897c5ba8e000
[   33.685558] RIP: 0010:[<ffffffff810dbf2c>]  [<ffffffff810dbf2c>]
find_busiest_group+0x2ac/0x880
[   33.695310] RSP: 0000:ffff897c5ba8f9a8  EFLAGS: 00010046
[   33.701253] RAX: 000000000001dfff RBX: 00000000ffffffff RCX: 000000000001e000
[   33.709226] RDX: 0000000000000000 RSI: 0000000000000078 RDI: 0000000000000000
[   33.717198] RBP: ffff897c5ba8fb08 R08: 0000000000000000 R09: 0000000000000000
[   33.725178] R10: 0000000000000000 R11: 000000000001e000 R12: ffff897c5ba8fa90
[   33.733156] R13: ffff897c5ad61d80 R14: 0000000000000000 R15: ffff897c5ba8fba0
[   33.741132] FS:  0000000000000000(0000) GS:ffff897d7c200000(0000)
knlGS:0000000000000000
[   33.750164] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[   33.756593] CR2: 0000000000000168 CR3: 0000000002a14000 CR4: 00000000001407e0
[   33.764571] Stack:
[   33.766822]  0000000000000000 0000000000000046 0000000000000048
0000000000000000
[   33.775141]  ffff897c5ad61d98 ffff897c5ba8fa20 0000000000000036
00000000000003ab
[   33.783461]  00000000000003ab 0000000000000139 00000000000044e8
0000000100000003
[   33.791789] Call Trace:
[   33.794549]  [<ffffffff810dc6c8>] load_balance+0x1c8/0x8d0
[   33.800701]  [<ffffffff810ee65b>] ? __lock_acquire+0xadb/0xce0
[   33.807222]  [<ffffffff810dd2d1>] idle_balance+0x101/0x1c0
[   33.813355]  [<ffffffff810dd214>] ? idle_balance+0x44/0x1c0
[   33.819618]  [<ffffffff8207a5bb>] __schedule+0x2cb/0xa10
[   33.825584]  [<ffffffff810e86c8>] ? trace_hardirqs_off_caller+0x28/0x160
[   33.833089]  [<ffffffff810e880d>] ? trace_hardirqs_off+0xd/0x10
[   33.839731]  [<ffffffff810d3b84>] ? local_clock+0x34/0x60
[   33.845788]  [<ffffffff810ba7bb>] ? worker_thread+0x2db/0x370
[   33.852241]  [<ffffffff8207f8a0>] ? _raw_spin_unlock_irq+0x30/0x40
[   33.859150]  [<ffffffff8207ad65>] schedule+0x65/0x70
[   33.864700]  [<ffffffff810ba7c0>] worker_thread+0x2e0/0x370
[   33.870932]  [<ffffffff810ec17d>] ? trace_hardirqs_on+0xd/0x10
[   33.877472]  [<ffffffff810ba4e0>] ? manage_workers.isra.17+0x330/0x330
[   33.884789]  [<ffffffff810c18c8>] kthread+0x108/0x110
[   33.890441]  [<ffffffff810c17c0>] ? __init_kthread_worker+0x70/0x70
[   33.897465]  [<ffffffff8208812c>] ret_from_fork+0x7c/0xb0
[   33.903504]  [<ffffffff810c17c0>] ? __init_kthread_worker+0x70/0x70
[   33.910508] Code: 89 85 b8 fe ff ff 49 8b 45 10 41 8b 75 0c 44 8b
50 08 44 8b 58 04 89 f0 48 c1 e0 0a 45 89 d1 49 8d 44 01 ff 48 89 c2
48 c1 fa 3f <49> f7 f9 31 d2 49 89 c1 89 f0 44 89 de 41 f7 f1 48 81 c6
00 02
[   33.932375] RIP  [<ffffffff810dbf2c>] find_busiest_group+0x2ac/0x880
[   33.939491]  RSP <ffff897c5ba8f9a8>
[   33.943418] ---[ end trace 7a833c0cac54cac8 ]---


* Re: [tip:sched/urgent] sched: Check sched_domain before computing group power
  2013-11-28  7:07                     ` Yinghai Lu
@ 2013-11-28  9:38                       ` Peter Zijlstra
  2013-11-28 20:23                         ` Yinghai Lu
  2013-12-06  6:24                       ` Yinghai Lu
  1 sibling, 1 reply; 32+ messages in thread
From: Peter Zijlstra @ 2013-11-28  9:38 UTC (permalink / raw)
  To: Yinghai Lu
  Cc: David Rientjes, Ingo Molnar, H. Peter Anvin,
	Linux Kernel Mailing List, srikar, Thomas Gleixner,
	linux-tip-commits

On Wed, Nov 27, 2013 at 11:07:04PM -0800, Yinghai Lu wrote:
> On Wed, Nov 27, 2013 at 7:02 PM, David Rientjes <rientjes@google.com> wrote:
> > On Thu, 21 Nov 2013, Yinghai Lu wrote:
> >
> >> original one in linus's tree:
> >>
> >> [    8.952728] NMI watchdog: enabled on all CPUs, permanently consumes
> >> one hw-PMU counter.
> >> [    8.965697] BUG: unable to handle kernel NULL pointer dereference
> >> at 0000000000000010
> >> [    8.969495] IP: [<ffffffff810d7b53>] update_group_power+0x1d3/0x250
> >
> > This should have been fixed by Srikar's patch, no?
> 
> maybe not related, now in another system, linus's tree + Srikar's patch.

And is this another such -EX system? And from what I can tell it's during
boot, right?

Totally weird.


* Re: [tip:sched/urgent] sched: Check sched_domain before computing group power
  2013-11-28  9:38                       ` Peter Zijlstra
@ 2013-11-28 20:23                         ` Yinghai Lu
  0 siblings, 0 replies; 32+ messages in thread
From: Yinghai Lu @ 2013-11-28 20:23 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: David Rientjes, Ingo Molnar, H. Peter Anvin,
	Linux Kernel Mailing List, srikar, Thomas Gleixner,
	linux-tip-commits

On Thu, Nov 28, 2013 at 1:38 AM, Peter Zijlstra <peterz@infradead.org> wrote:
> On Wed, Nov 27, 2013 at 11:07:04PM -0800, Yinghai Lu wrote:
>> On Wed, Nov 27, 2013 at 7:02 PM, David Rientjes <rientjes@google.com> wrote:
>> > On Thu, 21 Nov 2013, Yinghai Lu wrote:
>> >
>> >> original one in linus's tree:
>> >>
>> >> [    8.952728] NMI watchdog: enabled on all CPUs, permanently consumes
>> >> one hw-PMU counter.
>> >> [    8.965697] BUG: unable to handle kernel NULL pointer dereference
>> >> at 0000000000000010
>> >> [    8.969495] IP: [<ffffffff810d7b53>] update_group_power+0x1d3/0x250
>> >
>> > This should have been fixed by Srikar's patch, no?
>>
>> maybe not related, now in another system, linus's tree + Srikar's patch.
>
> And this another such -EX system? And from what I can tell its during
> boot, right?

Ivybridge EX, yes.


* Re: [tip:sched/urgent] sched: Check sched_domain before computing group power
  2013-11-28  7:07                     ` Yinghai Lu
  2013-11-28  9:38                       ` Peter Zijlstra
@ 2013-12-06  6:24                       ` Yinghai Lu
  2013-12-10 10:58                         ` Peter Zijlstra
  1 sibling, 1 reply; 32+ messages in thread
From: Yinghai Lu @ 2013-12-06  6:24 UTC (permalink / raw)
  To: David Rientjes, Peter Zijlstra
  Cc: Ingo Molnar, H. Peter Anvin, Linux Kernel Mailing List, srikar,
	Thomas Gleixner, linux-tip-commits

[-- Attachment #1: Type: text/plain, Size: 3883 bytes --]

On Wed, Nov 27, 2013 at 11:07 PM, Yinghai Lu <yinghai@kernel.org> wrote:
> On Wed, Nov 27, 2013 at 7:02 PM, David Rientjes <rientjes@google.com> wrote:

> maybe not related, now in another system, linus's tree + Srikar's patch.
>
> got
>
> [   33.546361] divide error: 0000 [#1]
> SMP
> [   33.589436] Modules linked in:
> [   33.592869] CPU: 15 PID: 567 Comm: kworker/u482:0 Not tainted
> 3.13.0-rc1-yh-00324-gcf1be1c-dirty #10
> [   33.603075] Hardware name: Oracle Corporation
> [   33.609571] calling  ipc_ns_init+0x0/0x14 @ 1
> [   33.609575] initcall ipc_ns_init+0x0/0x14 returned 0 after 0 usecs
> [   33.609577] calling  init_mmap_min_addr+0x0/0x16 @ 1
> [   33.609579] initcall init_mmap_min_addr+0x0/0x16 returned 0 after 0 usecs
> [   33.609583] calling  init_cpufreq_transition_notifier_list+0x0/0x1b @ 1
> [   33.609621] initcall init_cpufreq_transition_notifier_list+0x0/0x1b
> returned 0 after 0 usecs
> [   33.609624] calling  net_ns_init+0x0/0xfa @ 1
> [   33.677194] task: ffff897c5ba5c8c0 ti: ffff897c5ba8e000 task.ti:
> ffff897c5ba8e000
> [   33.685558] RIP: 0010:[<ffffffff810dbf2c>]  [<ffffffff810dbf2c>]
> find_busiest_group+0x2ac/0x880
> [   33.695310] RSP: 0000:ffff897c5ba8f9a8  EFLAGS: 00010046
> [   33.701253] RAX: 000000000001dfff RBX: 00000000ffffffff RCX: 000000000001e000
> [   33.709226] RDX: 0000000000000000 RSI: 0000000000000078 RDI: 0000000000000000
> [   33.717198] RBP: ffff897c5ba8fb08 R08: 0000000000000000 R09: 0000000000000000
> [   33.725178] R10: 0000000000000000 R11: 000000000001e000 R12: ffff897c5ba8fa90
> [   33.733156] R13: ffff897c5ad61d80 R14: 0000000000000000 R15: ffff897c5ba8fba0
> [   33.741132] FS:  0000000000000000(0000) GS:ffff897d7c200000(0000)
> knlGS:0000000000000000
> [   33.750164] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [   33.756593] CR2: 0000000000000168 CR3: 0000000002a14000 CR4: 00000000001407e0
> [   33.764571] Stack:
> [   33.766822]  0000000000000000 0000000000000046 0000000000000048
> 0000000000000000
> [   33.775141]  ffff897c5ad61d98 ffff897c5ba8fa20 0000000000000036
> 00000000000003ab
> [   33.783461]  00000000000003ab 0000000000000139 00000000000044e8
> 0000000100000003
> [   33.791789] Call Trace:
> [   33.794549]  [<ffffffff810dc6c8>] load_balance+0x1c8/0x8d0
> [   33.800701]  [<ffffffff810ee65b>] ? __lock_acquire+0xadb/0xce0
> [   33.807222]  [<ffffffff810dd2d1>] idle_balance+0x101/0x1c0
> [   33.813355]  [<ffffffff810dd214>] ? idle_balance+0x44/0x1c0
> [   33.819618]  [<ffffffff8207a5bb>] __schedule+0x2cb/0xa10
> [   33.825584]  [<ffffffff810e86c8>] ? trace_hardirqs_off_caller+0x28/0x160
> [   33.833089]  [<ffffffff810e880d>] ? trace_hardirqs_off+0xd/0x10
> [   33.839731]  [<ffffffff810d3b84>] ? local_clock+0x34/0x60
> [   33.845788]  [<ffffffff810ba7bb>] ? worker_thread+0x2db/0x370
> [   33.852241]  [<ffffffff8207f8a0>] ? _raw_spin_unlock_irq+0x30/0x40
> [   33.859150]  [<ffffffff8207ad65>] schedule+0x65/0x70
> [   33.864700]  [<ffffffff810ba7c0>] worker_thread+0x2e0/0x370
> [   33.870932]  [<ffffffff810ec17d>] ? trace_hardirqs_on+0xd/0x10
> [   33.877472]  [<ffffffff810ba4e0>] ? manage_workers.isra.17+0x330/0x330
> [   33.884789]  [<ffffffff810c18c8>] kthread+0x108/0x110
> [   33.890441]  [<ffffffff810c17c0>] ? __init_kthread_worker+0x70/0x70
> [   33.897465]  [<ffffffff8208812c>] ret_from_fork+0x7c/0xb0
> [   33.903504]  [<ffffffff810c17c0>] ? __init_kthread_worker+0x70/0x70
> [   33.910508] Code: 89 85 b8 fe ff ff 49 8b 45 10 41 8b 75 0c 44 8b
> 50 08 44 8b 58 04 89 f0 48 c1 e0 0a 45 89 d1 49 8d 44 01 ff 48 89 c2
> 48 c1 fa 3f <49> f7 f9 31 d2 49 89 c1 89 f0 44 89 de 41 f7 f1 48 81 c6
> 00 02
> [   33.932375] RIP  [<ffffffff810dbf2c>] find_busiest_group+0x2ac/0x880
> [   33.939491]  RSP <ffff897c5ba8f9a8>
> [   33.943418] ---[ end trace 7a833c0cac54cac8 ]---

Hi, PeterZ,

This divide-by-zero can be worked around with the attached patch.

Yinghai

[-- Attachment #2: sched_divide_by_zero_workaround.patch --]
[-- Type: text/x-patch, Size: 469 bytes --]

---
 kernel/sched/core.c |    3 +++
 1 file changed, 3 insertions(+)

Index: linux-2.6/kernel/sched/core.c
===================================================================
--- linux-2.6.orig/kernel/sched/core.c
+++ linux-2.6/kernel/sched/core.c
@@ -5737,6 +5737,9 @@ static int __sdt_alloc(const struct cpum
 			if (!sgp)
 				return -ENOMEM;
 
+			/* avoid divide-by-zero in sg_capacity() */
+			sgp->power_orig = 1;
+
 			*per_cpu_ptr(sdd->sgp, j) = sgp;
 		}
 	}


* Re: [tip:sched/urgent] sched: Check sched_domain before computing group power
  2013-12-06  6:24                       ` Yinghai Lu
@ 2013-12-10 10:58                         ` Peter Zijlstra
  2013-12-10 21:26                           ` Yinghai Lu
  0 siblings, 1 reply; 32+ messages in thread
From: Peter Zijlstra @ 2013-12-10 10:58 UTC (permalink / raw)
  To: Yinghai Lu
  Cc: David Rientjes, Ingo Molnar, H. Peter Anvin,
	Linux Kernel Mailing List, srikar, Thomas Gleixner,
	linux-tip-commits

On Thu, Dec 05, 2013 at 10:24:25PM -0800, Yinghai Lu wrote:
> ---
>  kernel/sched/core.c |    3 +++
>  1 file changed, 3 insertions(+)
> 
> Index: linux-2.6/kernel/sched/core.c
> ===================================================================
> --- linux-2.6.orig/kernel/sched/core.c
> +++ linux-2.6/kernel/sched/core.c
> @@ -5737,6 +5737,9 @@ static int __sdt_alloc(const struct cpum
>  			if (!sgp)
>  				return -ENOMEM;
>  
> +			/* avoid divide-by-zero in sg_capacity() */
> +			sgp->power_orig = 1;
> +
>  			*per_cpu_ptr(sdd->sgp, j) = sgp;
>  		}
>  	}


Ooh, sg_capacity() is generating the /0.. 

Does the below work too?

---
 kernel/sched/core.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index 87c3bc47d99d..40b185f5a3ec 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -5115,6 +5115,7 @@ build_overlap_sched_groups(struct sched_domain *sd, int cpu)
 		 * die on a /0 trap.
 		 */
 		sg->sgp->power = SCHED_POWER_SCALE * cpumask_weight(sg_span);
+		sg->sgp->power_orig = sg->sgp->power;
 
 		/*
 		 * Make sure the first group of this domain contains the


* Re: [tip:sched/urgent] sched: Check sched_domain before computing group power
  2013-12-10 10:58                         ` Peter Zijlstra
@ 2013-12-10 21:26                           ` Yinghai Lu
  0 siblings, 0 replies; 32+ messages in thread
From: Yinghai Lu @ 2013-12-10 21:26 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: David Rientjes, Ingo Molnar, H. Peter Anvin,
	Linux Kernel Mailing List, srikar, Thomas Gleixner,
	linux-tip-commits

On Tue, Dec 10, 2013 at 2:58 AM, Peter Zijlstra <peterz@infradead.org> wrote:

> Ooh, sg_capacity() is generating the /0..
>
> Does the below work too?
>
> ---
>  kernel/sched/core.c | 1 +
>  1 file changed, 1 insertion(+)
>
> diff --git a/kernel/sched/core.c b/kernel/sched/core.c
> index 87c3bc47d99d..40b185f5a3ec 100644
> --- a/kernel/sched/core.c
> +++ b/kernel/sched/core.c
> @@ -5115,6 +5115,7 @@ build_overlap_sched_groups(struct sched_domain *sd, int cpu)
>                  * die on a /0 trap.
>                  */
>                 sg->sgp->power = SCHED_POWER_SCALE * cpumask_weight(sg_span);
> +               sg->sgp->power_orig = sg->sgp->power;
>
>                 /*
>                  * Make sure the first group of this domain contains the


Yes, it works.
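
Why a zero power_orig traps in sg_capacity() can be sketched with a small userspace C model. This is a hedged illustration, not the kernel function: the struct `group_power`, the function `capacity_model()`, and the -1 return are hypothetical names and conventions chosen here, and the arithmetic is only loosely modeled on a round-up division of SCHED_POWER_SCALE * cpus by power_orig. The register dump in the divide-error oops earlier in the thread appears consistent with that shape: RSI is 0x78 (120), RAX is 0x1dfff (120 * 1024 - 1), and the divisor register R9 is 0.

```c
#include <assert.h>
#include <stddef.h>

#define SCHED_POWER_SCALE 1024U
#define DIV_ROUND_UP(n, d) (((n) + (d) - 1) / (d))

/* Hypothetical stand-in for the two sgp fields discussed in the thread. */
struct group_power {
	unsigned int power;
	unsigned int power_orig;
};

/*
 * Simplified model of a capacity estimate for a group of `cpus` CPUs.
 * Returns the estimated capacity, or -1 where an unguarded division
 * would trap (power_orig == 0, or no CPUs in the group).
 */
static int capacity_model(const struct group_power *sgp, unsigned int cpus)
{
	assert(sgp != NULL);
	if (sgp->power_orig == 0 || cpus == 0)
		return -1;

	/* Threads per core, rounded up; at least 1 given the checks above. */
	unsigned int smt = DIV_ROUND_UP(SCHED_POWER_SCALE * cpus, sgp->power_orig);
	unsigned int capacity = cpus / smt;

	/* Cap by the current power, rounded to the nearest whole CPU. */
	unsigned int by_power = (sgp->power + SCHED_POWER_SCALE / 2) / SCHED_POWER_SCALE;

	return (int)(capacity < by_power ? capacity : by_power);
}
```

With power set to SCHED_POWER_SCALE * 120 and power_orig left at 0, as an allocation that only initializes power would leave it, the model reports the trap case. Copying power into power_orig, as the one-line change does, gives a capacity of 120 for a 120-CPU group.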


end of thread, newest message: 2013-12-10 21:26 UTC

Thread overview: 32+ messages
2013-09-12 18:05 [tip:sched/core] sched/fair: Fix group power_orig computation tip-bot for Peter Zijlstra
2013-09-12 23:21 ` Michael Neuling
2013-11-12 10:55 ` Srikar Dronamraju
2013-11-12 11:57   ` Peter Zijlstra
2013-11-12 16:41     ` [PATCH v2] sched: Check sched_domain before computing group power Srikar Dronamraju
2013-11-12 17:03       ` Peter Zijlstra
2013-11-12 17:15         ` Srikar Dronamraju
2013-11-12 17:55           ` Peter Zijlstra
2013-11-13  5:55             ` Srikar Dronamraju
     [not found]       ` <CAM4v1pNMn=5oZDiX3fUp9uPkZTPJgk=vEKEjevzvpwn=PjTzXg@mail.gmail.com>
2013-11-13 11:23         ` Srikar Dronamraju
2013-11-14  6:06           ` Preeti U Murthy
2013-11-14  8:30             ` Peter Zijlstra
2013-11-14  9:12               ` Preeti U Murthy
2013-11-13 15:17       ` Peter Zijlstra
2013-11-14 10:50         ` Srikar Dronamraju
2013-11-14 11:15           ` Peter Zijlstra
2013-11-19 19:15         ` [tip:sched/urgent] " tip-bot for Srikar Dronamraju
2013-11-19 23:36           ` Yinghai Lu
2013-11-21 15:03             ` Peter Zijlstra
2013-11-21 17:22               ` Yinghai Lu
2013-11-21 22:03                 ` Yinghai Lu
2013-11-28  3:02                   ` David Rientjes
2013-11-28  7:07                     ` Yinghai Lu
2013-11-28  9:38                       ` Peter Zijlstra
2013-11-28 20:23                         ` Yinghai Lu
2013-12-06  6:24                       ` Yinghai Lu
2013-12-10 10:58                         ` Peter Zijlstra
2013-12-10 21:26                           ` Yinghai Lu
2013-11-22 12:07                 ` Peter Zijlstra
2013-11-23  5:00                   ` Yinghai Lu
2013-11-23 18:53                     ` Peter Zijlstra
2013-11-28  2:57           ` David Rientjes
