linux-kernel.vger.kernel.org archive mirror
* 2.6.17-mm5
@ 2006-07-02 23:27 Martin J. Bligh
  2006-07-02 23:41 ` 2.6.17-mm5 Andrew Morton
  0 siblings, 1 reply; 15+ messages in thread
From: Martin J. Bligh @ 2006-07-02 23:27 UTC (permalink / raw)
  To: akpm, linux-kernel

Panic on NUMA-Q (mm4 was fine). Presumably some new scheduler patch

divide error: 0000 [#1]
8K_STACKS SMP 
last sysfs file: 
Modules linked in:
CPU:    1
EIP:    0060:[<c0112b6e>]    Not tainted VLI
EFLAGS: 00010046   (2.6.17-mm5-autokern1 #1) 
EIP is at find_busiest_group+0x1a3/0x47c
eax: 00000000   ebx: 00000007   ecx: 00000000   edx: 00000000
esi: 00000000   edi: e75ff264   ebp: e7405ec8   esp: e7405e58
ds: 007b   es: 007b   ss: 0068
Process swapper (pid: 0, ti=e7404000 task=c13f8560 task.ti=e7404000)
Stack: e75ff264 00000010 c0119020 00000000 00000000 00000000 00000000 00000000 
       ffffffff 00000000 00000000 00000001 00000001 00000001 00000080 00000000 
       00000000 00000200 00000020 00000080 00000000 00000000 e75ff260 c1364960 
Call Trace:
 [<c0119020>] vprintk+0x5f/0x213
 [<c0112efb>] load_balance+0x54/0x1d6
 [<c011332d>] rebalance_tick+0xc5/0xe3
 [<c01137a3>] scheduler_tick+0x2cb/0x2d3
 [<c01215b4>] update_process_times+0x51/0x5d
 [<c010c224>] smp_apic_timer_interrupt+0x5a/0x61
 [<c0102d5b>] apic_timer_interrupt+0x1f/0x24
 [<c01006c0>] default_idle+0x0/0x59
 [<c01006f1>] default_idle+0x31/0x59
 [<c0100791>] cpu_idle+0x64/0x79
Code: 00 5b 83 f8 1f 89 c6 5f 0f 8e 63 ff ff ff 8b 45 e0 8b 55 e8 01 45 dc 8b 4a 08 89 c2 01 4d d4 c1 e2 07 89 d0 31 d2 89 ce c1 ee 07 <f7> f1 83 7d 9c 00 89 45 e0 74 17 89 45 d8 8b 55 e8 8b 4d a4 8b 
EIP: [<c0112b6e>] find_busiest_group+0x1a3/0x47c SS:ESP 0068:e7405e58



* Re: 2.6.17-mm5
  2006-07-02 23:27 2.6.17-mm5 Martin J. Bligh
@ 2006-07-02 23:41 ` Andrew Morton
  2006-07-03  5:25   ` [patch] sched: fix macro -> inline function conversion bug Ingo Molnar
  2006-07-03  8:23   ` 2.6.17-mm5 Andy Whitcroft
  0 siblings, 2 replies; 15+ messages in thread
From: Andrew Morton @ 2006-07-02 23:41 UTC (permalink / raw)
  To: Martin J. Bligh; +Cc: linux-kernel, Andy Whitcroft

On Sun, 02 Jul 2006 16:27:55 -0700
"Martin J. Bligh" <mbligh@mbligh.org> wrote:

> Panic on NUMA-Q (mm4 was fine). Presumably some new scheduler patch
> 
> divide error: 0000 [#1]
> 8K_STACKS SMP 
> last sysfs file: 
> Modules linked in:
> CPU:    1
> EIP:    0060:[<c0112b6e>]    Not tainted VLI
> EFLAGS: 00010046   (2.6.17-mm5-autokern1 #1) 
> EIP is at find_busiest_group+0x1a3/0x47c
> eax: 00000000   ebx: 00000007   ecx: 00000000   edx: 00000000
> esi: 00000000   edi: e75ff264   ebp: e7405ec8   esp: e7405e58
> ds: 007b   es: 007b   ss: 0068
> Process swapper (pid: 0, ti=e7404000 task=c13f8560 task.ti=e7404000)
> Stack: e75ff264 00000010 c0119020 00000000 00000000 00000000 00000000 00000000 
>        ffffffff 00000000 00000000 00000001 00000001 00000001 00000080 00000000 
>        00000000 00000200 00000020 00000080 00000000 00000000 e75ff260 c1364960 
> Call Trace:
>  [<c0119020>] vprintk+0x5f/0x213
>  [<c0112efb>] load_balance+0x54/0x1d6
>  [<c011332d>] rebalance_tick+0xc5/0xe3
>  [<c01137a3>] scheduler_tick+0x2cb/0x2d3
>  [<c01215b4>] update_process_times+0x51/0x5d
>  [<c010c224>] smp_apic_timer_interrupt+0x5a/0x61
>  [<c0102d5b>] apic_timer_interrupt+0x1f/0x24
>  [<c01006c0>] default_idle+0x0/0x59
>  [<c01006f1>] default_idle+0x31/0x59
>  [<c0100791>] cpu_idle+0x64/0x79
> Code: 00 5b 83 f8 1f 89 c6 5f 0f 8e 63 ff ff ff 8b 45 e0 8b 55 e8 01 45 dc 8b 4a 08 89 c2 01 4d d4 c1 e2 07 89 d0 31 d2 89 ce c1 ee 07 <f7> f1 83 7d 9c 00 89 45 e0 74 17 89 45 d8 8b 55 e8 8b 4d a4 8b 
> EIP: [<c0112b6e>] find_busiest_group+0x1a3/0x47c SS:ESP 0068:e7405e58

Yes, Andy's reporting that too.  I asked him to identify the file-n-line
and he ran away on me.



* [patch] sched: fix macro -> inline function conversion bug
  2006-07-02 23:41 ` 2.6.17-mm5 Andrew Morton
@ 2006-07-03  5:25   ` Ingo Molnar
  2006-07-03  5:42     ` Andrew Morton
  2006-07-03  8:23   ` 2.6.17-mm5 Andy Whitcroft
  1 sibling, 1 reply; 15+ messages in thread
From: Ingo Molnar @ 2006-07-03  5:25 UTC (permalink / raw)
  To: Andrew Morton; +Cc: Martin J. Bligh, linux-kernel, Andy Whitcroft


* Andrew Morton <akpm@osdl.org> wrote:

> On Sun, 02 Jul 2006 16:27:55 -0700
> "Martin J. Bligh" <mbligh@mbligh.org> wrote:
> 
> > Panic on NUMA-Q (mm4 was fine). Presumably some new scheduler patch
> > 
> > divide error: 0000 [#1]
> > 8K_STACKS SMP 
> > last sysfs file: 
> > Modules linked in:
> > CPU:    1
> > EIP:    0060:[<c0112b6e>]    Not tainted VLI
> > EFLAGS: 00010046   (2.6.17-mm5-autokern1 #1) 
> > EIP is at find_busiest_group+0x1a3/0x47c
> > eax: 00000000   ebx: 00000007   ecx: 00000000   edx: 00000000
> > esi: 00000000   edi: e75ff264   ebp: e7405ec8   esp: e7405e58
> > ds: 007b   es: 007b   ss: 0068
> > Process swapper (pid: 0, ti=e7404000 task=c13f8560 task.ti=e7404000)
> > Stack: e75ff264 00000010 c0119020 00000000 00000000 00000000 00000000 00000000 
> >        ffffffff 00000000 00000000 00000001 00000001 00000001 00000080 00000000 
> >        00000000 00000200 00000020 00000080 00000000 00000000 e75ff260 c1364960 
> > Call Trace:
> >  [<c0119020>] vprintk+0x5f/0x213
> >  [<c0112efb>] load_balance+0x54/0x1d6
> >  [<c011332d>] rebalance_tick+0xc5/0xe3
> >  [<c01137a3>] scheduler_tick+0x2cb/0x2d3
> >  [<c01215b4>] update_process_times+0x51/0x5d
> >  [<c010c224>] smp_apic_timer_interrupt+0x5a/0x61
> >  [<c0102d5b>] apic_timer_interrupt+0x1f/0x24
> >  [<c01006c0>] default_idle+0x0/0x59
> >  [<c01006f1>] default_idle+0x31/0x59
> >  [<c0100791>] cpu_idle+0x64/0x79
> > Code: 00 5b 83 f8 1f 89 c6 5f 0f 8e 63 ff ff ff 8b 45 e0 8b 55 e8 01 45 dc 8b 4a 08 89 c2 01 4d d4 c1 e2 07 89 d0 31 d2 89 ce c1 ee 07 <f7> f1 83 7d 9c 00 89 45 e0 74 17 89 45 d8 8b 55 e8 8b 4d a4 8b 
> > EIP: [<c0112b6e>] find_busiest_group+0x1a3/0x47c SS:ESP 0068:e7405e58
> 
> Yes, Andy's reporting that too.  I asked him to identify the 
> file-n-line and he ran away on me.

i checked the scheduler queue and nothing jumped out at me, except the 
cleanup bug fixed by the patch below. (which should be harmless in this 
particular case - nr_running should never be smaller than 0 or larger 
than ~4 billion. A fix is warranted nevertheless.)
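
To make the type issue concrete, here is a minimal user-space sketch (not the kernel source) that puts the two signatures side by side. The names minus_1_or_zero_int(), minus_1_or_zero_ulong() and check_minus_1_or_zero() are invented for this illustration. Converting an out-of-range unsigned long to int is implementation-defined in C; the usage note below assumes the usual two's-complement wraparound that GCC documents.

```c
#include <assert.h>

/* Pre-fix signature: an unsigned long argument is converted to int
 * at the call site, before the comparison runs. */
static inline int minus_1_or_zero_int(int n)
{
	return n > 0 ? n - 1 : 0;
}

/* Post-fix signature: argument and result keep the counter's width. */
static inline unsigned long minus_1_or_zero_ulong(unsigned long n)
{
	return n > 0 ? n - 1 : 0;
}

/* Spot checks: for small counts the two variants agree. */
void check_minus_1_or_zero(void)
{
	unsigned long small = 5;

	assert(minus_1_or_zero_int(small) == 4);
	assert(minus_1_or_zero_ulong(small) == 4UL);
	assert(minus_1_or_zero_ulong(0) == 0UL);
}
```

The variants only diverge once the count no longer fits in an int. With a 32-bit int, a value such as 3000000000 typically arrives as a negative number in the int variant, which then returns 0, while the unsigned long variant returns the value minus one.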

	Ingo

-------------->
Subject: sched: fix macro -> inline function conversion bug
From: Ingo Molnar <mingo@elte.hu>

nr_running is unsigned long, not int.

Signed-off-by: Ingo Molnar <mingo@elte.hu>
---
 kernel/sched.c |    2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

Index: linux/kernel/sched.c
===================================================================
--- linux.orig/kernel/sched.c
+++ linux/kernel/sched.c
@@ -2480,7 +2480,7 @@ find_busiest_queue(struct sched_group *g
  */
 #define MAX_PINNED_INTERVAL	512
 
-static inline int minus_1_or_zero(int n)
+static inline unsigned long minus_1_or_zero(unsigned long n)
 {
 	return n > 0 ? n - 1 : 0;
 }


* Re: [patch] sched: fix macro -> inline function conversion bug
  2006-07-03  5:25   ` [patch] sched: fix macro -> inline function conversion bug Ingo Molnar
@ 2006-07-03  5:42     ` Andrew Morton
  2006-07-03  6:03       ` Ingo Molnar
  2006-07-03  6:06       ` Peter Williams
  0 siblings, 2 replies; 15+ messages in thread
From: Andrew Morton @ 2006-07-03  5:42 UTC (permalink / raw)
  To: Ingo Molnar; +Cc: mbligh, linux-kernel, apw

On Mon, 3 Jul 2006 07:25:39 +0200
Ingo Molnar <mingo@elte.hu> wrote:

> 
> * Andrew Morton <akpm@osdl.org> wrote:
> 
> > On Sun, 02 Jul 2006 16:27:55 -0700
> > "Martin J. Bligh" <mbligh@mbligh.org> wrote:
> > 
> > > Panic on NUMA-Q (mm4 was fine). Presumably some new scheduler patch
> > > 
> > > divide error: 0000 [#1]
> > > 8K_STACKS SMP 
> > > last sysfs file: 
> > > Modules linked in:
> > > CPU:    1
> > > EIP:    0060:[<c0112b6e>]    Not tainted VLI
> > > EFLAGS: 00010046   (2.6.17-mm5-autokern1 #1) 
> > > EIP is at find_busiest_group+0x1a3/0x47c
> > > eax: 00000000   ebx: 00000007   ecx: 00000000   edx: 00000000
> > > esi: 00000000   edi: e75ff264   ebp: e7405ec8   esp: e7405e58
> > > ds: 007b   es: 007b   ss: 0068
> > > Process swapper (pid: 0, ti=e7404000 task=c13f8560 task.ti=e7404000)
> > > Stack: e75ff264 00000010 c0119020 00000000 00000000 00000000 00000000 00000000 
> > >        ffffffff 00000000 00000000 00000001 00000001 00000001 00000080 00000000 
> > >        00000000 00000200 00000020 00000080 00000000 00000000 e75ff260 c1364960 
> > > Call Trace:
> > >  [<c0119020>] vprintk+0x5f/0x213
> > >  [<c0112efb>] load_balance+0x54/0x1d6
> > >  [<c011332d>] rebalance_tick+0xc5/0xe3
> > >  [<c01137a3>] scheduler_tick+0x2cb/0x2d3
> > >  [<c01215b4>] update_process_times+0x51/0x5d
> > >  [<c010c224>] smp_apic_timer_interrupt+0x5a/0x61
> > >  [<c0102d5b>] apic_timer_interrupt+0x1f/0x24
> > >  [<c01006c0>] default_idle+0x0/0x59
> > >  [<c01006f1>] default_idle+0x31/0x59
> > >  [<c0100791>] cpu_idle+0x64/0x79
> > > Code: 00 5b 83 f8 1f 89 c6 5f 0f 8e 63 ff ff ff 8b 45 e0 8b 55 e8 01 45 dc 8b 4a 08 89 c2 01 4d d4 c1 e2 07 89 d0 31 d2 89 ce c1 ee 07 <f7> f1 83 7d 9c 00 89 45 e0 74 17 89 45 d8 8b 55 e8 8b 4d a4 8b 
> > > EIP: [<c0112b6e>] find_busiest_group+0x1a3/0x47c SS:ESP 0068:e7405e58
> > 
> > Yes, Andy's reporting that too.  I asked him to identify the 
> > file-n-line and he ran away on me.
> 
> i checked the scheduler queue and nothing jumped out at me, except the 
> cleanup bug fixed by the patch below. (which should be harmless in this 
> particular case - nr_running should never be smaller than 0 or larger 
> than ~4 billion. A fix is warranted nevertheless.)

Did you work out which divide is getting the div-by-zero?  I stared at it
a bit and wasn't sure - am getting wildly different code generation over
here.


* Re: [patch] sched: fix macro -> inline function conversion bug
  2006-07-03  5:42     ` Andrew Morton
@ 2006-07-03  6:03       ` Ingo Molnar
  2006-07-03  6:08         ` Ingo Molnar
  2006-07-03  6:06       ` Peter Williams
  1 sibling, 1 reply; 15+ messages in thread
From: Ingo Molnar @ 2006-07-03  6:03 UTC (permalink / raw)
  To: Andrew Morton; +Cc: mbligh, linux-kernel, apw


* Andrew Morton <akpm@osdl.org> wrote:

> > i checked the scheduler queue and nothing jumped out at me, except 
> > the cleanup bug fixed by the patch below. (which should be harmless 
> > in this particular case - nr_running should never be smaller than 0 
> > or larger than ~4 billion. A fix is warranted nevertheless.)
> 
> Did you work out which divide is getting the div-by-zero?  I stared 
> at it a bit and wasn't sure - am getting wildly different code 
> generation over here.

my bet is on sched-group-cpu-power-setup-cleanup.patch.

	Ingo


* Re: [patch] sched: fix macro -> inline function conversion bug
  2006-07-03  5:42     ` Andrew Morton
  2006-07-03  6:03       ` Ingo Molnar
@ 2006-07-03  6:06       ` Peter Williams
  1 sibling, 0 replies; 15+ messages in thread
From: Peter Williams @ 2006-07-03  6:06 UTC (permalink / raw)
  To: Andrew Morton; +Cc: Ingo Molnar, mbligh, linux-kernel, apw

Andrew Morton wrote:
> On Mon, 3 Jul 2006 07:25:39 +0200
> Ingo Molnar <mingo@elte.hu> wrote:
> 
>> * Andrew Morton <akpm@osdl.org> wrote:
>>
>>> On Sun, 02 Jul 2006 16:27:55 -0700
>>> "Martin J. Bligh" <mbligh@mbligh.org> wrote:
>>>
>>>> Panic on NUMA-Q (mm4 was fine). Presumably some new scheduler patch
>>>>
>>>> divide error: 0000 [#1]
>>>> 8K_STACKS SMP 
>>>> last sysfs file: 
>>>> Modules linked in:
>>>> CPU:    1
>>>> EIP:    0060:[<c0112b6e>]    Not tainted VLI
>>>> EFLAGS: 00010046   (2.6.17-mm5-autokern1 #1) 
>>>> EIP is at find_busiest_group+0x1a3/0x47c
>>>> eax: 00000000   ebx: 00000007   ecx: 00000000   edx: 00000000
>>>> esi: 00000000   edi: e75ff264   ebp: e7405ec8   esp: e7405e58
>>>> ds: 007b   es: 007b   ss: 0068
>>>> Process swapper (pid: 0, ti=e7404000 task=c13f8560 task.ti=e7404000)
>>>> Stack: e75ff264 00000010 c0119020 00000000 00000000 00000000 00000000 00000000 
>>>>        ffffffff 00000000 00000000 00000001 00000001 00000001 00000080 00000000 
>>>>        00000000 00000200 00000020 00000080 00000000 00000000 e75ff260 c1364960 
>>>> Call Trace:
>>>>  [<c0119020>] vprintk+0x5f/0x213
>>>>  [<c0112efb>] load_balance+0x54/0x1d6
>>>>  [<c011332d>] rebalance_tick+0xc5/0xe3
>>>>  [<c01137a3>] scheduler_tick+0x2cb/0x2d3
>>>>  [<c01215b4>] update_process_times+0x51/0x5d
>>>>  [<c010c224>] smp_apic_timer_interrupt+0x5a/0x61
>>>>  [<c0102d5b>] apic_timer_interrupt+0x1f/0x24
>>>>  [<c01006c0>] default_idle+0x0/0x59
>>>>  [<c01006f1>] default_idle+0x31/0x59
>>>>  [<c0100791>] cpu_idle+0x64/0x79
>>>> Code: 00 5b 83 f8 1f 89 c6 5f 0f 8e 63 ff ff ff 8b 45 e0 8b 55 e8 01 45 dc 8b 4a 08 89 c2 01 4d d4 c1 e2 07 89 d0 31 d2 89 ce c1 ee 07 <f7> f1 83 7d 9c 00 89 45 e0 74 17 89 45 d8 8b 55 e8 8b 4d a4 8b 
>>>> EIP: [<c0112b6e>] find_busiest_group+0x1a3/0x47c SS:ESP 0068:e7405e58
>>> Yes, Andy's reporting that too.  I asked him to identify the 
>>> file-n-line and he ran away on me.
>> i checked the scheduler queue and nothing jumped out at me, except the 
>> cleanup bug fixed by the patch below. (which should be harmless in this 
>> particular case - nr_running should never be smaller than 0 or larger 
>> than ~4 billion. A fix is warranted nevertheless.)
> 
> Did you work out which divide is getting the div-by-zero?  I stared at it
> a bit and wasn't sure - am getting wildly different code generation over
> here.

As far as I can see, all divides in find_busiest_group(), except those 
that rely on group->cpu_power being non-zero, are protected against 
divide by zero.  This suggests that the initialization of the scheduler 
group data is the place to look.
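
The unprotected pattern described here can be sketched in user space as follows. This is an illustration, not the kernel code: struct group, scaled_load() and the guard itself are assumptions made for the example, and SCHED_LOAD_SCALE is taken to be 128, which matches the shift by 7 (c1 e2 07) in the oops code bytes. The faulting bytes f7 f1 decode to "div %ecx", and the register dump shows ecx as zero. That is consistent with a zero cpu_power, though it does not prove it on its own.

```c
#include <assert.h>

#define SCHED_LOAD_SCALE 128UL	/* assumed value for this sketch */

/* Hypothetical stand-in for the scheduler group; only the field
 * that feeds the divide is modelled. */
struct group {
	unsigned long cpu_power;
};

/* Scale a raw load by the group's power.  When cpu_power was never
 * initialised, report failure through *ok instead of dividing. */
unsigned long scaled_load(const struct group *g, unsigned long load, int *ok)
{
	assert(g && ok);

	if (g->cpu_power == 0) {
		*ok = 0;	/* this guard is the sketch's addition */
		return 0;
	}
	*ok = 1;
	return (load * SCHED_LOAD_SCALE) / g->cpu_power;
}
```

A guard like this would only hide the symptom. The point being made in this message is that the value should never be zero once the groups have been set up.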

Peter
-- 
Peter Williams                                   pwil3058@bigpond.net.au

"Learning, n. The kind of ignorance distinguishing the studious."
  -- Ambrose Bierce


* Re: [patch] sched: fix macro -> inline function conversion bug
  2006-07-03  6:03       ` Ingo Molnar
@ 2006-07-03  6:08         ` Ingo Molnar
  2006-07-05 19:36           ` Siddha, Suresh B
  0 siblings, 1 reply; 15+ messages in thread
From: Ingo Molnar @ 2006-07-03  6:08 UTC (permalink / raw)
  To: Andrew Morton; +Cc: mbligh, linux-kernel, apw


* Ingo Molnar <mingo@elte.hu> wrote:

> > Did you work out which divide is getting the div-by-zero?  I stared 
> > at it a bit and wasn't sure - am getting wildly different code 
> > generation over here.
> 
> my bet is on sched-group-cpu-power-setup-cleanup.patch.

in particular, we don't seem to initialize ->cpu_power properly. Martin, 
does the patch below solve your crash?

	Ingo

-------------->
Subject: sched: group cpu power setup cleanup, fix
From: Ingo Molnar <mingo@elte.hu>

- fix missing initialization of ->cpu_power
- clean up the cleanup

Signed-off-by: Ingo Molnar <mingo@elte.hu>
---
 include/linux/sched.h |    2 +-
 kernel/sched.c        |    9 +++++++--
 2 files changed, 8 insertions(+), 3 deletions(-)

Index: linux/include/linux/sched.h
===================================================================
--- linux.orig/include/linux/sched.h
+++ linux/include/linux/sched.h
@@ -636,7 +636,7 @@ enum idle_type
 	((sched_mc_power_savings || sched_smt_power_savings) ?	\
 					SD_POWERSAVINGS_BALANCE : 0)
 
-#define test_sd_flag(sd, flag)	((sd && sd->flags & flag) ? 1 : 0)
+#define test_sd_flag(sd, flag)	((sd && (sd->flags & flag)) ? 1 : 0)
 
 
 struct sched_group {
Index: linux/kernel/sched.c
===================================================================
--- linux.orig/kernel/sched.c
+++ linux/kernel/sched.c
@@ -1292,7 +1292,7 @@ static int sched_balance_self(int cpu, i
 		cpu = new_cpu;
 nextlevel:
 		sd = sd->child;
-		if (sd && sd->flags & flag)
+		if (test_sd_flag(sd, flag))
 			goto nextlevel;
 		/* while loop will break here if sd == NULL */
 	}
@@ -6224,6 +6224,7 @@ static int cpu_to_allnodes_group(int cpu
 {
 	return cpu_to_node(cpu);
 }
+
 static void init_numa_sched_groups_power(struct sched_group *group_head)
 {
 	struct sched_group *sg = group_head;
@@ -6314,8 +6315,12 @@ static void init_sched_groups_power(int 
 	struct sched_domain *child;
 	struct sched_group *group;
 
-	if (!sd || !sd->groups || (cpu != first_cpu(sd->groups->cpumask)))
+	WARN_ON(!sd || !sd->groups);
+
+	if (cpu != first_cpu(sd->groups->cpumask)) {
+		sd->groups->cpu_power = SCHED_LOAD_SCALE;
 		return;
+	}
 
 	child = sd->child;
 


* Re: 2.6.17-mm5
  2006-07-02 23:41 ` 2.6.17-mm5 Andrew Morton
  2006-07-03  5:25   ` [patch] sched: fix macro -> inline function conversion bug Ingo Molnar
@ 2006-07-03  8:23   ` Andy Whitcroft
  2006-07-03 14:19     ` 2.6.17-mm5 Andy Whitcroft
  1 sibling, 1 reply; 15+ messages in thread
From: Andy Whitcroft @ 2006-07-03  8:23 UTC (permalink / raw)
  To: Andrew Morton; +Cc: Martin J. Bligh, linux-kernel

Andrew Morton wrote:
> On Sun, 02 Jul 2006 16:27:55 -0700
> "Martin J. Bligh" <mbligh@mbligh.org> wrote:
> 
> 
>>Panic on NUMA-Q (mm4 was fine). Presumably some new scheduler patch
>>
>>divide error: 0000 [#1]
>>8K_STACKS SMP 
>>last sysfs file: 
>>Modules linked in:
>>CPU:    1
>>EIP:    0060:[<c0112b6e>]    Not tainted VLI
>>EFLAGS: 00010046   (2.6.17-mm5-autokern1 #1) 
>>EIP is at find_busiest_group+0x1a3/0x47c
>>eax: 00000000   ebx: 00000007   ecx: 00000000   edx: 00000000
>>esi: 00000000   edi: e75ff264   ebp: e7405ec8   esp: e7405e58
>>ds: 007b   es: 007b   ss: 0068
>>Process swapper (pid: 0, ti=e7404000 task=c13f8560 task.ti=e7404000)
>>Stack: e75ff264 00000010 c0119020 00000000 00000000 00000000 00000000 00000000 
>>       ffffffff 00000000 00000000 00000001 00000001 00000001 00000080 00000000 
>>       00000000 00000200 00000020 00000080 00000000 00000000 e75ff260 c1364960 
>>Call Trace:
>> [<c0119020>] vprintk+0x5f/0x213
>> [<c0112efb>] load_balance+0x54/0x1d6
>> [<c011332d>] rebalance_tick+0xc5/0xe3
>> [<c01137a3>] scheduler_tick+0x2cb/0x2d3
>> [<c01215b4>] update_process_times+0x51/0x5d
>> [<c010c224>] smp_apic_timer_interrupt+0x5a/0x61
>> [<c0102d5b>] apic_timer_interrupt+0x1f/0x24
>> [<c01006c0>] default_idle+0x0/0x59
>> [<c01006f1>] default_idle+0x31/0x59
>> [<c0100791>] cpu_idle+0x64/0x79
>>Code: 00 5b 83 f8 1f 89 c6 5f 0f 8e 63 ff ff ff 8b 45 e0 8b 55 e8 01 45 dc 8b 4a 08 89 c2 01 4d d4 c1 e2 07 89 d0 31 d2 89 ce c1 ee 07 <f7> f1 83 7d 9c 00 89 45 e0 74 17 89 45 d8 8b 55 e8 8b 4d a4 8b 
>>EIP: [<c0112b6e>] find_busiest_group+0x1a3/0x47c SS:ESP 0068:e7405e58
> 
> 
> Yes, Andy's reporting that too.  I asked him to identify the file-n-line
> and he ran away on me.

I went away to debug it, but then had to skip out to a BBQ.  It's
definitely the cpu_power on the group being zero.

group->cpu_power ZERO => c3150920

/me gives Ingo's patch a spin.

-apw


* Re: 2.6.17-mm5
  2006-07-03  8:23   ` 2.6.17-mm5 Andy Whitcroft
@ 2006-07-03 14:19     ` Andy Whitcroft
  0 siblings, 0 replies; 15+ messages in thread
From: Andy Whitcroft @ 2006-07-03 14:19 UTC (permalink / raw)
  To: Suresh Siddha
  Cc: Andy Whitcroft, Andrew Morton, Martin J. Bligh, linux-kernel,
	Ingo Molnar

[-- Attachment #1: Type: text/plain, Size: 3284 bytes --]

Andy Whitcroft wrote:
> Andrew Morton wrote:
> 
>>On Sun, 02 Jul 2006 16:27:55 -0700
>>"Martin J. Bligh" <mbligh@mbligh.org> wrote:
>>
>>
>>
>>>Panic on NUMA-Q (mm4 was fine). Presumably some new scheduler patch
>>>
>>>divide error: 0000 [#1]
>>>8K_STACKS SMP 
>>>last sysfs file: 
>>>Modules linked in:
>>>CPU:    1
>>>EIP:    0060:[<c0112b6e>]    Not tainted VLI
>>>EFLAGS: 00010046   (2.6.17-mm5-autokern1 #1) 
>>>EIP is at find_busiest_group+0x1a3/0x47c
>>>eax: 00000000   ebx: 00000007   ecx: 00000000   edx: 00000000
>>>esi: 00000000   edi: e75ff264   ebp: e7405ec8   esp: e7405e58
>>>ds: 007b   es: 007b   ss: 0068
>>>Process swapper (pid: 0, ti=e7404000 task=c13f8560 task.ti=e7404000)
>>>Stack: e75ff264 00000010 c0119020 00000000 00000000 00000000 00000000 00000000 
>>>      ffffffff 00000000 00000000 00000001 00000001 00000001 00000080 00000000 
>>>      00000000 00000200 00000020 00000080 00000000 00000000 e75ff260 c1364960 
>>>Call Trace:
>>>[<c0119020>] vprintk+0x5f/0x213
>>>[<c0112efb>] load_balance+0x54/0x1d6
>>>[<c011332d>] rebalance_tick+0xc5/0xe3
>>>[<c01137a3>] scheduler_tick+0x2cb/0x2d3
>>>[<c01215b4>] update_process_times+0x51/0x5d
>>>[<c010c224>] smp_apic_timer_interrupt+0x5a/0x61
>>>[<c0102d5b>] apic_timer_interrupt+0x1f/0x24
>>>[<c01006c0>] default_idle+0x0/0x59
>>>[<c01006f1>] default_idle+0x31/0x59
>>>[<c0100791>] cpu_idle+0x64/0x79
>>>Code: 00 5b 83 f8 1f 89 c6 5f 0f 8e 63 ff ff ff 8b 45 e0 8b 55 e8 01 45 dc 8b 4a 08 89 c2 01 4d d4 c1 e2 07 89 d0 31 d2 89 ce c1 ee 07 <f7> f1 83 7d 9c 00 89 45 e0 74 17 89 45 d8 8b 55 e8 8b 4d a4 8b 
>>>EIP: [<c0112b6e>] find_busiest_group+0x1a3/0x47c SS:ESP 0068:e7405e58
>>
>>
>>Yes, Andy's reporting that too.  I asked him to identify the file-n-line
>>and he ran away on me.
> 
> 
> I went away to debug it, but then had to skip out to a BBQ.  It's
> definitely the cpu_power on the group being zero.
> 
> group->cpu_power ZERO => c3150920
> 
> /me gives Ingo's patch a spin.

OK.  That hasn't fixed it either.

Confirmed that it is caused by the patch below, backing it out sorts things:

    sched-group-cpu-power-setup-cleanup.patch

Did a fair bit of analysis of this problem, and it seems that the issue
is where we initialise the NUMA domains.  For each cpu we initialise a
domain spanning the whole machine, but ordered starting at the node in
which we start.  In the original code we used the following to
initialise each of these groups:

        for (i = 0; i < MAX_NUMNODES; i++)
                init_numa_sched_groups_power(sched_group_nodes[i]);

init_numa_sched_groups_power iterated over all of the groups within the
domain and added up power based on the physical packages.  Now we use:

        for_each_cpu_mask(i, *cpu_map) {
                sd = &per_cpu(node_domains, i);
                init_sched_groups_power(i, sd);
        }

init_sched_groups_power only thinks in terms of the first group of the
domain, which leaves the subsequent groups in the domain at power 0.
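
The difference between the two initialisation strategies can be modelled with a small ring of groups. This is a hedged sketch only: struct group, init_first_group_only(), init_all_groups() and count_zero_power() are hypothetical names, and a fixed SCHED_LOAD_SCALE (assumed to be 128) stands in for the per-package power that the real init_numa_sched_groups_power() adds up.

```c
#include <assert.h>
#include <stddef.h>

#define SCHED_LOAD_SCALE 128UL	/* assumed value for this sketch */

/* Hypothetical, simplified group: the groups of a domain form a ring. */
struct group {
	struct group *next;
	unsigned long cpu_power;
};

/* Models the new code path: only the first group receives a power. */
void init_first_group_only(struct group *head)
{
	assert(head);
	head->cpu_power = SCHED_LOAD_SCALE;
}

/* Models the original code path: walk the whole ring. */
void init_all_groups(struct group *head)
{
	struct group *sg = head;

	assert(head);
	do {
		sg->cpu_power = SCHED_LOAD_SCALE;
		sg = sg->next;
	} while (sg != head);
}

/* Count the groups whose power is still zero. */
size_t count_zero_power(const struct group *head)
{
	const struct group *sg = head;
	size_t zeros = 0;

	assert(head);
	do {
		if (sg->cpu_power == 0)
			zeros++;
		sg = sg->next;
	} while (sg != head);
	return zeros;
}
```

With a ring of three groups, the first strategy leaves two groups at power 0, and any later divide by their cpu_power would fault. The second strategy leaves none.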

I've tried reverting just that part of the change (as attached) and that
also seems to fix things.  However, are we correct to ignore the
subsequent groups in all the other cases?  I am also not sure whether
this changes the purpose of the patch.  It seems unlikely but ...

Suresh I guess I'll punt to you :).

-apw

[-- Attachment #2: sched-revert-numa-domain-init --]
[-- Type: text/plain, Size: 1025 bytes --]

sched revert numa domain init

It seems that the scheduler domains for the NUMA nodes aren't being
initialised correctly.  Their cpu_power is left at 0, leading to a
panic in load balancing.  It seems that just reverting this part of
the change below is enough to fix booting of NUMA systems.

   sched-group-cpu-power-setup-cleanup.patch

Not sure if this changes the purpose of that patch.

Signed-off-by: Andy Whitcroft <apw@shadowen.org>
---
 sched.c |    6 ++----
 1 files changed, 2 insertions(+), 4 deletions(-)
diff -upN reference/kernel/sched.c current/kernel/sched.c
--- reference/kernel/sched.c
+++ current/kernel/sched.c
@@ -6656,10 +6656,8 @@ printk(KERN_WARNING "init CPU domains\n"
 
 printk(KERN_WARNING "init NUMA domains\n");
 #ifdef CONFIG_NUMA
-	for_each_cpu_mask(i, *cpu_map) {
-		sd = &per_cpu(node_domains, i);
-		init_sched_groups_power(i, sd);
-	}
+        for (i = 0; i < MAX_NUMNODES; i++)
+		init_numa_sched_groups_power(sched_group_nodes[i]);
 
 	init_numa_sched_groups_power(sched_group_allnodes);
 #endif


* Re: [patch] sched: fix macro -> inline function conversion bug
  2006-07-03  6:08         ` Ingo Molnar
@ 2006-07-05 19:36           ` Siddha, Suresh B
  2006-07-05 20:02             ` Ingo Molnar
  2006-07-06  8:27             ` Andy Whitcroft
  0 siblings, 2 replies; 15+ messages in thread
From: Siddha, Suresh B @ 2006-07-05 19:36 UTC (permalink / raw)
  To: Ingo Molnar; +Cc: Andrew Morton, mbligh, linux-kernel, apw

Martin, Andy: Can you please try the appended patch on top of 2.6.17-mm5?

thanks,
suresh

On Mon, Jul 03, 2006 at 08:08:32AM +0200, Ingo Molnar wrote:
> 
> * Ingo Molnar <mingo@elte.hu> wrote:
> 
> > > Did you work out which divide is getting the div-by-zero?  I stared 
> > > at it a bit and wasn't sure - am getting wildly different code 
> > > generation over here.
> > 
> > my bet is on sched-group-cpu-power-setup-cleanup.patch.
> 
> in particular, we don't seem to initialize ->cpu_power properly. Martin, 
> does the patch below solve your crash?
> 
>  		sd = sd->child;
> -		if (sd && sd->flags & flag)
> +		if (test_sd_flag(sd, flag))

There is a bug in my patch. Appended patch fixes this.

> -	if (!sd || !sd->groups || (cpu != first_cpu(sd->groups->cpumask)))
> +	WARN_ON(!sd || !sd->groups);
> +
> +	if (cpu != first_cpu(sd->groups->cpumask)) {
> +		sd->groups->cpu_power = SCHED_LOAD_SCALE;
>  		return;

This is also not correct and will corrupt the cpu_power of some of the groups.
NUMA sched group setup is somewhat different from that of other domains like
HT and SMP. The appended patch has the correct fix.

--

- go back to original numa sched group power initialization
- fix the sched_balance_self code
- some cleanup as suggested by Ingo.

Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com>

--- linux-2.6.17mm5/kernel/sched.c~	2006-07-05 10:15:27.274721992 -0700
+++ linux-2.6.17mm5/kernel/sched.c	2006-07-05 10:34:01.072399008 -0700
@@ -1292,7 +1292,7 @@ static int sched_balance_self(int cpu, i
 		cpu = new_cpu;
 nextlevel:
 		sd = sd->child;
-		if (sd && sd->flags & flag)
+		if (sd && !(sd->flags & flag))
 			goto nextlevel;
 		/* while loop will break here if sd == NULL */
 	}
@@ -5534,7 +5534,7 @@ static void cpu_attach_domain(struct sch
 
 	if (sd && sd_degenerate(sd)) {
 		sd = sd->parent;
-		if(sd)
+		if (sd)
 			sd->child = NULL;
 	}
 
@@ -6224,6 +6224,7 @@ static int cpu_to_allnodes_group(int cpu
 {
 	return cpu_to_node(cpu);
 }
+
 static void init_numa_sched_groups_power(struct sched_group *group_head)
 {
 	struct sched_group *sg = group_head;
@@ -6314,7 +6315,9 @@ static void init_sched_groups_power(int 
 	struct sched_domain *child;
 	struct sched_group *group;
 
-	if (!sd || !sd->groups || (cpu != first_cpu(sd->groups->cpumask)))
+	WARN_ON(!sd || !sd->groups);
+
+	if (cpu != first_cpu(sd->groups->cpumask))
 		return;
 
 	child = sd->child;
@@ -6596,10 +6599,8 @@ static int build_sched_domains(const cpu
 	}
 
 #ifdef CONFIG_NUMA
-	for_each_cpu_mask(i, *cpu_map) {
-		sd = &per_cpu(node_domains, i);
-		init_sched_groups_power(i, sd);
-	}
+	for (i = 0; i < MAX_NUMNODES; i++)
+		init_numa_sched_groups_power(sched_group_nodes[i]);
 
 	init_numa_sched_groups_power(sched_group_allnodes);
 #endif
--- linux-2.6.17mm5/include/linux/sched.h~	2006-07-05 10:18:10.014981712 -0700
+++ linux-2.6.17mm5/include/linux/sched.h	2006-07-05 10:30:55.889551080 -0700
@@ -636,7 +636,7 @@ enum idle_type
 	((sched_mc_power_savings || sched_smt_power_savings) ?	\
 					SD_POWERSAVINGS_BALANCE : 0)
 
-#define test_sd_flag(sd, flag)	((sd && sd->flags & flag) ? 1 : 0)
+#define test_sd_flag(sd, flag)	((sd && (sd->flags & flag)) ? 1 : 0)
 
 
 struct sched_group {


* Re: [patch] sched: fix macro -> inline function conversion bug
  2006-07-05 19:36           ` Siddha, Suresh B
@ 2006-07-05 20:02             ` Ingo Molnar
  2006-07-05 21:09               ` Siddha, Suresh B
  2006-07-06  8:27             ` Andy Whitcroft
  1 sibling, 1 reply; 15+ messages in thread
From: Ingo Molnar @ 2006-07-05 20:02 UTC (permalink / raw)
  To: Siddha, Suresh B; +Cc: Andrew Morton, mbligh, linux-kernel, apw


* Siddha, Suresh B <suresh.b.siddha@intel.com> wrote:

> -		if (sd && sd->flags & flag)
> +		if (sd && !(sd->flags & flag))

use test_sd_flag() here, as i did in my fix patch.

> -#define test_sd_flag(sd, flag)	((sd && sd->flags & flag) ? 1 : 0)
> +#define test_sd_flag(sd, flag)	((sd && (sd->flags & flag)) ? 1 : 0)

remove the 'sd' check in test_sd_flag. In the other cases we know that 
there's an sd. (it's usually a sign of spaghetti code if tests like this 
include a check for the existence of the object checked)

	Ingo


* Re: [patch] sched: fix macro -> inline function conversion bug
  2006-07-05 20:02             ` Ingo Molnar
@ 2006-07-05 21:09               ` Siddha, Suresh B
  2006-07-05 21:17                 ` Ingo Molnar
  0 siblings, 1 reply; 15+ messages in thread
From: Siddha, Suresh B @ 2006-07-05 21:09 UTC (permalink / raw)
  To: Ingo Molnar; +Cc: Siddha, Suresh B, Andrew Morton, mbligh, linux-kernel, apw

On Wed, Jul 05, 2006 at 10:02:45PM +0200, Ingo Molnar wrote:
> 
> * Siddha, Suresh B <suresh.b.siddha@intel.com> wrote:
> 
> > -		if (sd && sd->flags & flag)
> > +		if (sd && !(sd->flags & flag))
> 
> use test_sd_flag() here, as i did in my fix patch.
> 
> > -#define test_sd_flag(sd, flag)	((sd && sd->flags & flag) ? 1 : 0)
> > +#define test_sd_flag(sd, flag)	((sd && (sd->flags & flag)) ? 1 : 0)
> 
> remove the 'sd' check in test_sd_flag. In the other cases we know that 
> there's an sd. (it's usually a sign of spaghetti code if tests like this 
> include a check for the existence of the object checked)

In other cases, we are passing sd->parent as the first argument to
test_sd_flag(). We know that there is a 'sd' but not sure about sd->parent or
sd->child.
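
A small sketch of why the NULL check matters when a neighbouring level is passed in. The cut-down struct domain and the helper parent_has_flag() are hypothetical. The macro has the same shape as the one in the patch, with its arguments additionally parenthesised for this example.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, cut-down domain with only the fields used here. */
struct domain {
	struct domain *parent;
	int flags;
};

/* Evaluates to 0 for a NULL domain, so callers may pass sd->parent
 * without checking it first. */
#define test_sd_flag(sd, flag)	(((sd) && ((sd)->flags & (flag))) ? 1 : 0)

/* sd itself is known to exist; its parent may not. */
int parent_has_flag(const struct domain *sd, int flag)
{
	assert(sd);
	return test_sd_flag(sd->parent, flag);
}
```

For a top-level domain, sd->parent is NULL and the macro yields 0 instead of dereferencing it.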

thanks,
suresh


* Re: [patch] sched: fix macro -> inline function conversion bug
  2006-07-05 21:09               ` Siddha, Suresh B
@ 2006-07-05 21:17                 ` Ingo Molnar
  2006-07-05 21:21                   ` Siddha, Suresh B
  0 siblings, 1 reply; 15+ messages in thread
From: Ingo Molnar @ 2006-07-05 21:17 UTC (permalink / raw)
  To: Siddha, Suresh B; +Cc: Andrew Morton, mbligh, linux-kernel, apw


* Siddha, Suresh B <suresh.b.siddha@intel.com> wrote:

> On Wed, Jul 05, 2006 at 10:02:45PM +0200, Ingo Molnar wrote:
> > 
> > * Siddha, Suresh B <suresh.b.siddha@intel.com> wrote:
> > 
> > > -		if (sd && sd->flags & flag)
> > > +		if (sd && !(sd->flags & flag))
> > 
> > use test_sd_flag() here, as i did in my fix patch.
> > 
> > > -#define test_sd_flag(sd, flag)	((sd && sd->flags & flag) ? 1 : 0)
> > > +#define test_sd_flag(sd, flag)	((sd && (sd->flags & flag)) ? 1 : 0)
> > 
> > remove the 'sd' check in test_sd_flag. In the other cases we know that 
> > there's an sd. (it's usually a sign of spaghetti code if tests like this 
> > include a check for the existence of the object checked)
> 
> In other cases, we are passing sd->parent as the first argument to 
> test_sd_flag(). We know that there is a 'sd' but not sure about 
> sd->parent or sd->child.

ok. But the first issue above should be fixed.

	Ingo


* Re: [patch] sched: fix macro -> inline function conversion bug
  2006-07-05 21:17                 ` Ingo Molnar
@ 2006-07-05 21:21                   ` Siddha, Suresh B
  0 siblings, 0 replies; 15+ messages in thread
From: Siddha, Suresh B @ 2006-07-05 21:21 UTC (permalink / raw)
  To: Ingo Molnar; +Cc: Siddha, Suresh B, Andrew Morton, mbligh, linux-kernel, apw

On Wed, Jul 05, 2006 at 11:17:02PM +0200, Ingo Molnar wrote:
> 
> * Siddha, Suresh B <suresh.b.siddha@intel.com> wrote:
> 
> > On Wed, Jul 05, 2006 at 10:02:45PM +0200, Ingo Molnar wrote:
> > > 
> > > * Siddha, Suresh B <suresh.b.siddha@intel.com> wrote:
> > > 
> > > > -		if (sd && sd->flags & flag)
> > > > +		if (sd && !(sd->flags & flag))
> > > 
> > > use test_sd_flag() here, as i did in my fix patch.
> > > 
> > > > -#define test_sd_flag(sd, flag)	((sd && sd->flags & flag) ? 1 : 0)
> > > > +#define test_sd_flag(sd, flag)	((sd && (sd->flags & flag)) ? 1 : 0)
> > > 
> > > remove the 'sd' check in test_sd_flag. In the other cases we know that 
> > > there's an sd. (it's usually a sign of spaghetti code if tests like this 
> > > include a check for the existence of the object checked)
> > 
> > In other cases, we are passing sd->parent as the first argument to 
> > test_sd_flag(). We know that there is a 'sd' but not sure about 
> > sd->parent or sd->child.
> 
> ok. But the first issue above should be fixed.

I can't simply change it to test_sd_flag(). In sched_balance_self(), paths for
sd == 0 and a 'flag' not set in sd->flags are different.

I can change that piece of code to (sd && !test_sd_flag(sd, flag)) though..
but that is not clean, right?
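
The two paths can be sketched as follows, under simplifying assumptions: struct domain is reduced to two fields and next_level() is a hypothetical helper, not a function from kernel/sched.c. A child level without the flag is skipped, whereas a NULL child ends the descent. A single test_sd_flag() result cannot tell these two cases apart.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, cut-down domain with only the fields used here. */
struct domain {
	struct domain *child;
	int flags;
};

/* Descend from sd, skipping child levels that lack `flag`.  Returns
 * the first child level that has the flag set, or NULL once the
 * hierarchy is exhausted. */
struct domain *next_level(struct domain *sd, int flag)
{
	assert(sd);
	do {
		sd = sd->child;
	} while (sd && !(sd->flags & flag));	/* flag clear: keep descending */
	return sd;
}
```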

thanks,
suresh


* Re: [patch] sched: fix macro -> inline function conversion bug
  2006-07-05 19:36           ` Siddha, Suresh B
  2006-07-05 20:02             ` Ingo Molnar
@ 2006-07-06  8:27             ` Andy Whitcroft
  1 sibling, 0 replies; 15+ messages in thread
From: Andy Whitcroft @ 2006-07-06  8:27 UTC (permalink / raw)
  To: Siddha, Suresh B; +Cc: Ingo Molnar, Andrew Morton, mbligh, linux-kernel

Siddha, Suresh B wrote:
> Martin, Andy: Can you please try the appended patch on top of 2.6.17-mm5?

Submitted, will let you know.

-apw
