linux-kernel.vger.kernel.org archive mirror
* sched isolcpus=1 related OOPS in 2.6.9
@ 2004-12-02 15:42 devik
  2004-12-03 16:28 ` devik
  0 siblings, 1 reply; 7+ messages in thread
From: devik @ 2004-12-02 15:42 UTC (permalink / raw)
  To: linux-kernel

Hello,

on a Soyo dual CPU PII/350 system I get an early
OOPS (even ksymdump can't save it) during CPU#1
initialization when I use cmdline isolcpus=1 to force
only CPU#0 use (I want to use affinity to select CPU#1).
The OOPS triggers every time I use isolcpus.

I traced the problem down to sched.c:1928 (find_busiest_group)
where group->cpu_power was zero (thus division by zero occurred).
In call trace it goes swapper->schedule()->........->find_busiest_group.
Important registers there: eax=ecx=edx=0, ebx!=0.
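
To make the failure concrete, here is a minimal user-space sketch of the
scaled-load computation. It is not the kernel source. It assumes that
find_busiest_group computes roughly load * SCHED_LOAD_SCALE / cpu_power
and that SCHED_LOAD_SCALE is 128; struct group_model and
scaled_group_load() are hypothetical names used only for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Assumption: value of SCHED_LOAD_SCALE in 2.6.9. */
#define SCHED_LOAD_SCALE 128UL

/* Hypothetical stand-in for the one sched_group field that matters here. */
struct group_model {
	unsigned long cpu_power;
};

/*
 * Scale a group's summed load by its cpu_power.
 * Returns false instead of dividing when cpu_power is 0; an unguarded
 * division at this point is what raises the divide error.
 */
bool scaled_group_load(const struct group_model *g,
		       unsigned long sum_load,
		       unsigned long *avg_load)
{
	assert(g != NULL && avg_load != NULL);

	if (g->cpu_power == 0)
		return false;

	*avg_load = (sum_load * SCHED_LOAD_SCALE) / g->cpu_power;
	return true;
}
```

With cpu_power = 128 a summed load of 256 scales to 256; with
cpu_power = 0 the function refuses to divide, which is the case the
isolated group hits.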

Config and vmlinux:
http://luxik.cdi.cz/~devik/files/isolcpus-oops/

Sorry, no oops text yet (I can't capture it via ksymoops or serial),
but I can provide further info (screen photo).
Can anyone at least point me where to look further?
(I found no general description of the group scheduling code,
so I'm lost in it.)

Thanks a lot,

-------------------------------
    Martin Devera aka devik
Linux kernel QoS/HTB maintainer
  http://luxik.cdi.cz/~devik/




* Re: sched isolcpus=1 related OOPS in 2.6.9
  2004-12-02 15:42 sched isolcpus=1 related OOPS in 2.6.9 devik
@ 2004-12-03 16:28 ` devik
  2004-12-03 17:18   ` Randy.Dunlap
  0 siblings, 1 reply; 7+ messages in thread
From: devik @ 2004-12-03 16:28 UTC (permalink / raw)
  To: linux-kernel

> only CPU#0 use (I want to use affinity to select CPU#1).
> The OOPS triggers every time I use isolcpus.
>
> I traced the problem down to sched.c:1928 (find_busiest_group)
> where group->cpu_power was zero (thus division by zero occurred).
> In call trace it goes swapper->schedule()->........->find_busiest_group.
> Important registers there: eax=ecx=edx=0, ebx!=0.

Well, I have more info. I set up the Bochs SMP emulator and hacked
printk to write to port 0xe9, which is then redirected to a file.
I also turned on sched_domains debugging. From the result (below)
it is clear that there is a bug in the isolated domains setup.

devik

<6>BIOS-provided physical RAM map:
 BIOS-e820: 0000000000000000 - 000000000009fc00 (usable)
 BIOS-e820: 0000000000100000 - 0000000002000000 (usable)
<5>32MB LOWMEM available.
<6>found SMP MP-table at 000fd0f0
<7>On node 0 totalpages: 8192
<7>  DMA zone: 4096 pages, LIFO batch:1
<7>  Normal zone: 4096 pages, LIFO batch:1
<7>  HighMem zone: 0 pages, LIFO batch:1
<6>DMI not present.
<3>ACPI: Unable to locate RSDP
<6>Intel MultiProcessor Specification v1.4
<6>    Virtual Wire compatibility mode.
<6>OEM ID: BOCHSCPU Product ID: 0.1          APIC at: 0xFEE00000
Processor #0 6:0 APIC version 17
Processor #1 6:0 APIC version 17
<6>I/O APIC #2 Version 17 at 0xFEC00000.
Enabling APIC mode:  Flat.  Using 1 I/O APICs
<6>Processors: 2
Built 1 zonelists
Kernel command line: BOOT_IMAGE=linux ro root=301 apic=debug noapic isolcpus=1
mapped APIC to ffffd000 (fee00000)
mapped IOAPIC to ffffc000 (fec00000)
<6>Initializing CPU#0
CPU 0 irqstacks, hard=c035b000 soft=c0359000
PID hash table entries: 256 (order: 8, 4096 bytes)
Detected 2.001 MHz processor.
<6>Using tsc for high-res timesource
Console: colour VGA+ 80x50
Dentry cache hash table entries: 8192 (order: 3, 32768 bytes)
Inode-cache hash table entries: 4096 (order: 2, 16384 bytes)
<6>Memory: 29372k/32768k available (1511k kernel code, 2960k reserved, 698k data, 168k init, 0k highmem)
Checking if this processor honours the WP bit even in supervisor mode... Ok.
<7>Calibrating delay loop... 8.19 BogoMIPS (lpj=4096)
Mount-cache hash table entries: 512 (order: 0, 4096 bytes)
<7>CPU: After generic identify, caps: 0180a379 00000000 00000000 00000000
<7>CPU: After vendor identify, caps:  0180a379 00000000 00000000 00000000
<7>CPU: After all inits, caps:        0180a379 00000000 00000000 00000040
<6>Enabling fast FPU save and restore... done.
<6>Checking 'hlt' instruction... OK.
CPU0: Intel Pentium III (Coppermine) stepping 03
per-CPU timeslice cutoff: 81.50 usecs.
task migration cache decay timeout: 1 msecs.
Getting VERSION: 170011
Getting VERSION: 170011
Getting ID: 0
Getting LVT0: 0
Getting LVT1: 0
enabled ExtINT on CPU#0
Booting processor 1/1 eip 2000
CPU 1 irqstacks, hard=c035c000 soft=c035a000
<6>Initializing CPU#1
masked ExtINT on CPU#1
<7>Calibrating delay loop... 8.19 BogoMIPS (lpj=4096)
<7>CPU: After generic identify, caps: 0180a379 00000000 00000000 00000000
<7>CPU: After vendor identify, caps:  0180a379 00000000 00000000 00000000
<7>CPU: After all inits, caps:        0180a379 00000000 00000000 00000040
CPU1: Intel Pentium III (Coppermine) stepping 03
<6>Total of 2 processors activated (16.38 BogoMIPS).
Using local APIC timer interrupts.
calibrating APIC timer ...
..... CPU clock speed is 1.0999 MHz.
..... host bus clock speed is 1.0999 MHz.
<6>checking TSC synchronization across 2 CPUs:
<6>CPU#0 had 0 usecs TSC skew, fixed it up.
<6>CPU#1 had 0 usecs TSC skew, fixed it up.
Brought up 2 CPUs
<6>Setting up cpu 1 isolated.
<7>CPU0:  online
<7> domain 0: span 3
<7>  groups: 1 2
<7>CPU1:  online
<7> domain 0: span 2
<7>ERROR domain->cpu_power not set
<7>  groups: 2
<1>divide error: 0000 [#1]
SMP
Modules linked in:
CPU:    0
EIP:    0060:[<c0116fd3>]    Not tainted VLI
EFLAGS: 00010046   (2.6.9imq)
EIP is at find_busiest_group+0x2b3/0x310
eax: 00000000   ebx: c10b2e74   ecx: 00000000   edx: 00000000
esi: c0360ee8   edi: c0351000   ebp: c10b2e84   esp: c10b2e38
ds: 007b   es: 007b   ss: 0068
Process swapper (pid: 1, threadinfo=c10b2000 task=c10b15a0)
Stack: c10b2e74 00000002 00000002 00004441 08bca3a6 c0351000 00000000 00000001
       00000080 00000080 00000080 00000000 00000000 c0360edc 00000000 00000002
       00000040 c1044940 00000001 c10b2eb8 c0117125 c1044940 00000000 c10b2ea8
Call Trace:
 [<c01071ff>] show_stack+0x7f/0xa0
 [<c01073ae>] show_registers+0x15e/0x1c0
 [<c01075c4>] die+0xf4/0x180
 [<c010775b>] do_divide_error+0x10b/0x130
 [<c0106ded>] error_code+0x2d/0x38
 [<c0117125>] load_balance+0x35/0x1a0
 [<c01175fa>] rebalance_tick+0xba/0xd0
 [<c0117732>] scheduler_tick+0x122/0x480
 [<c0124b85>] update_process_times+0x45/0x50
 [<c0111928>] smp_apic_timer_interrupt+0xf8/0x100
 [<c0106d52>] apic_timer_interrupt+0x1a/0x20
 [<c032d03a>] unpack_to_rootfs+0x17a/0x200
 [<c032d0eb>] populate_rootfs+0x2b/0x120
 [<c010058a>] init+0x8a/0x1e0
 [<c0104565>] kernel_thread_helper+0x5/0x10
Code: 00 0f 4d c2 83 f8 01 89 c1 7e ad 8b 4d d0 85 c9 0f 84 fe fd ff ff 8b 45 e0 01 45 dc 89 c2 8b 4e 08 01 4d d4 c1 e2 07 89 d0 31 d2 <f7> f1 8b 55 cc 85 d2 89 45 e0 75 1c 8b 45 e4 39 45 e0 76 09 89




* Re: sched isolcpus=1 related OOPS in 2.6.9
  2004-12-03 16:28 ` devik
@ 2004-12-03 17:18   ` Randy.Dunlap
  2004-12-03 17:46     ` devik
  2004-12-03 18:15     ` [PATCH] " devik
  0 siblings, 2 replies; 7+ messages in thread
From: Randy.Dunlap @ 2004-12-03 17:18 UTC (permalink / raw)
  To: devik; +Cc: linux-kernel, piggin

devik wrote:
>>only CPU#0 use (I want to use affinity to select CPU#1).
>>The OOPS triggers every time I use isolcpus.
>>
>>I traced the problem down to sched.c:1928 (find_busiest_group)
>>where group->cpu_power was zero (thus division by zero occurred).
>>In call trace it goes swapper->schedule()->........->find_busiest_group.
>>Important registers there: eax=ecx=edx=0, ebx!=0.
> 
> 
> Well, I have more info. I set up the Bochs SMP emulator and hacked
> printk to write to port 0xe9, which is then redirected to a file.
> I also turned on sched_domains debugging. From the result (below)
> it is clear that there is a bug in the isolated domains setup.

You are correct, of course.  If "isolcpus" is used, the isolated
cpu(s) (in <cpu_isolated_map>) are not init like the remaining
cpus are.

I don't know what's intended here... but it's not the divide by 0.



-- 
~Randy


* Re: sched isolcpus=1 related OOPS in 2.6.9
  2004-12-03 17:18   ` Randy.Dunlap
@ 2004-12-03 17:46     ` devik
  2004-12-03 18:15     ` [PATCH] " devik
  1 sibling, 0 replies; 7+ messages in thread
From: devik @ 2004-12-03 17:46 UTC (permalink / raw)
  To: Randy.Dunlap; +Cc: linux-kernel, piggin

> > Well, I have more info. I set up the Bochs SMP emulator and hacked
> > printk to write to port 0xe9, which is then redirected to a file.
> > I also turned on sched_domains debugging. From the result (below)
> > it is clear that there is a bug in the isolated domains setup.
>
> You are correct, of course.  If "isolcpus" is used, the isolated
> cpu(s) (in <cpu_isolated_map>) are not init like the remaining
> cpus are.

The problem seems worse to me. See:
Brought up 2 CPUs
<7>CPU0:  online
<7> domain 0: span 3
<7>  groups: 1[128] 2[0]
<7>ERROR groups don't span domain->span
<7>CPU1:  online
<7> domain 0: span 2
<7>ERROR domain->cpu_power not set
<7>  groups: 2[0]
<1>divide error: 0000 [#1]

I added code which dumps cpu_power for each group (in brackets)
and it seems that the power is computed only for the first group
(even in the regular, non-isolated domain).
Also the span for cpu0 should be 1 (it can be fixed by:
  sd->span = cpu_possible_map; => sd->span = cpu_default_map;
at line 4484).
Even then the group list is still bad. I'll dig more into it...
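
The masks in the debug output are hexadecimal CPU bitmaps, so "span 3"
covers CPU0 and CPU1 while "span 1" would cover CPU0 only. A small
sketch of that arithmetic, assuming cpu_default_map is the set of
possible CPUs minus the isolated ones (mask_t, default_map() and
print_span() are hypothetical helpers, not kernel API):

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical model of a cpumask: bit n set means CPU n is in the mask. */
typedef unsigned long mask_t;

/* CPUs left for normal balancing: possible CPUs minus isolated ones. */
mask_t default_map(mask_t possible, mask_t isolated)
{
	return possible & ~isolated;
}

/* Print a mask in hex, as the debug output does, followed by its CPUs. */
void print_span(const char *label, mask_t mask)
{
	assert(label != NULL);

	printf("%s: span %lx = CPUs", label, mask);
	for (unsigned int cpu = 0; cpu < 8 * sizeof(mask); cpu++)
		if (mask & (1UL << cpu))
			printf(" %u", cpu);
	printf("\n");
}
```

With possible = 0x3 and isolated = 0x2 (isolcpus=1), default_map()
gives 0x1, which is the span CPU0's domain should have.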

devik



* Re: [PATCH] sched isolcpus=1 related OOPS in 2.6.9
  2004-12-03 17:18   ` Randy.Dunlap
  2004-12-03 17:46     ` devik
@ 2004-12-03 18:15     ` devik
  2004-12-03 19:47       ` Dimitri Sivanich
  1 sibling, 1 reply; 7+ messages in thread
From: devik @ 2004-12-03 18:15 UTC (permalink / raw)
  To: Randy.Dunlap; +Cc: linux-kernel, piggin

[-- Attachment #1: Type: TEXT/PLAIN, Size: 483 bytes --]

> You are correct, of course.  If "isolcpus" is used, the isolated
> cpu(s) (in <cpu_isolated_map>) are not init like the remaining
> cpus are.
>
> I don't know what's intended here... but it's not the divide by 0.

A patch is attached which fixes the problems with isolated
domains for me. I hope it is correct :-) I will try it on
real SMP hardware when I am at work (for now it boots on Bochs).

enjoy,

    Martin Devera aka devik
Linux kernel QoS/HTB maintainer
  http://luxik.cdi.cz/~devik/

[-- Attachment #2: Type: TEXT/PLAIN, Size: 1224 bytes --]

--- linux-2.6.9/kernel/sched.c	Mon Oct 18 23:54:55 2004
+++ kernel/sched.c	Fri Dec  3 19:06:04 2004
@@ -4480,7 +4480,7 @@
 #ifdef CONFIG_NUMA
 		sd->span = nodemask;
 #else
-		sd->span = cpu_possible_map;
+		sd->span = cpu_default_map;
 #endif
 		sd->parent = p;
 		sd->groups = &sched_group_phys[group];
@@ -4512,11 +4512,14 @@
 
 	/* Set up isolated groups */
 	for_each_cpu_mask(i, cpu_isolated_map) {
+		int group;
 		cpumask_t mask;
 		cpus_clear(mask);
 		cpu_set(i, mask);
 		init_sched_build_groups(sched_group_isolated, mask,
 						&cpu_to_isolated_group);
+		group = cpu_to_isolated_group(i);
+		sched_group_isolated[group].cpu_power = SCHED_LOAD_SCALE;
 	}
 
 #ifdef CONFIG_NUMA
@@ -4532,7 +4535,7 @@
 						&cpu_to_phys_group);
 	}
 #else
-	init_sched_build_groups(sched_group_phys, cpu_possible_map,
+	init_sched_build_groups(sched_group_phys, cpu_default_map,
 							&cpu_to_phys_group);
 #endif
 
@@ -4634,7 +4637,7 @@
 				cpus_or(groupmask, groupmask, group->cpumask);
 
 				cpumask_scnprintf(str, NR_CPUS, group->cpumask);
-				printk(" %s", str);
+				printk(" %s[%ld]", str, group->cpu_power);
 
 				group = group->next;
 			} while (group != sd->groups);


* Re: [PATCH] sched isolcpus=1 related OOPS in 2.6.9
  2004-12-03 18:15     ` [PATCH] " devik
@ 2004-12-03 19:47       ` Dimitri Sivanich
  2004-12-03 20:21         ` devik
  0 siblings, 1 reply; 7+ messages in thread
From: Dimitri Sivanich @ 2004-12-03 19:47 UTC (permalink / raw)
  To: devik; +Cc: Randy.Dunlap, linux-kernel, piggin

On Fri, Dec 03, 2004 at 07:15:58PM +0100, devik wrote:
> A patch is attached which fixes problems with isolated
> domains for me. I hope it is correct :-) I will try on
Martin,

After a quick look, this patch looks OK (although I haven't had a chance to
test it yet).  I don't know what was intended with a default cpu_power
of 0, but I don't believe that a value of SCHED_LOAD_SCALE should negatively
affect the isolated domains (versus any other value).

Note that sched_init() does use SCHED_LOAD_SCALE as a default.



* Re: [PATCH] sched isolcpus=1 related OOPS in 2.6.9
  2004-12-03 19:47       ` Dimitri Sivanich
@ 2004-12-03 20:21         ` devik
  0 siblings, 0 replies; 7+ messages in thread
From: devik @ 2004-12-03 20:21 UTC (permalink / raw)
  To: Dimitri Sivanich; +Cc: Randy.Dunlap, linux-kernel, piggin

> After a quick look, this patch looks OK (although I haven't had a chance to
> test it yet).  I don't know what was intended with a default cpu_power
> of 0, but I don't believe that a value of SCHED_LOAD_SCALE should negatively
> affect the isolated domains (versus any other value).

Hello Dimitri,

thanks for checking. As I understand the code (it took me
5 hours, eh eh), only the relative sizes of cpu_power within one
domain matter. Thus in an isolated domain one can use any nonzero
value. Also, SCHED_LOAD_SCALE is probably OK in principle because
the value means "power of one typical CPU", AFAIK.
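
A minimal sketch of that reasoning, assuming the balancer picks the
busiest group by comparing load * SCHED_LOAD_SCALE / cpu_power across
the groups of one domain. struct group_model and busiest_group() are
hypothetical names, and integer rounding could change the result for
near-ties, so treat this as an illustration only:

```c
#include <assert.h>
#include <stddef.h>

/* Assumption: value of SCHED_LOAD_SCALE in 2.6.9. */
#define SCHED_LOAD_SCALE 128UL

/* Hypothetical model of one group: its summed load and its cpu_power. */
struct group_model {
	unsigned long load;
	unsigned long cpu_power;
};

/* Return the index of the group with the highest load per unit of power. */
size_t busiest_group(const struct group_model *groups, size_t n)
{
	size_t busiest = 0;
	unsigned long max_load = 0;

	for (size_t i = 0; i < n; i++) {
		unsigned long avg;

		/* A zero cpu_power is exactly the case that must not occur. */
		assert(groups[i].cpu_power != 0);
		avg = groups[i].load * SCHED_LOAD_SCALE / groups[i].cpu_power;
		if (avg > max_load) {
			max_load = avg;
			busiest = i;
		}
	}
	return busiest;
}
```

Multiplying every cpu_power in a domain by the same factor leaves the
chosen group unchanged in this model, and a domain with a single group
always yields index 0 for any nonzero cpu_power.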

I'm not sure who the official maintainer of the scheduler is, or
whether he will see/integrate the patch ...

devik

