linux-kernel.vger.kernel.org archive mirror
* Re: PROBLEM: APIC on a Pentium Classic SMP, 2.4.21-pre2 and 2.4.21-pre3 ksymoops
@ 2003-09-09 20:31 Mikael Pettersson
  2003-09-10 10:26 ` Maciej W. Rozycki
  0 siblings, 1 reply; 5+ messages in thread
From: Mikael Pettersson @ 2003-09-09 20:31 UTC (permalink / raw)
  To: mathieu.desnoyers; +Cc: linux-kernel, mingo

On Mon, 08 Sep 2003 19:22:17 -0400, Mathieu Desnoyers wrote:
>> >On kernel 2.4.21-pre2, there is a kernel oops before this, with a
>> >"Dereferencing NULL pointer".
>> 
>> You didn't run that through ksymoops and post it, so how is anyone
>> supposed to be able to debug it?
>
>As only the 2.4.21-pre2 and 2.4.21-pre3 kernels show this problem, I thought
>it had been corrected in 2.4.21-pre4. But as it may be useful in finding the
>problem, here is the ksymoops output for the 2.4.21-pre2 and 2.4.21-pre3
>kernels; the two are quite similar.
...
>Code;  c0115da7 <IO_APIC_get_PCI_irq_vector+17/130>
>00000000 <_EIP>:
>Code;  c0115da7 <IO_APIC_get_PCI_irq_vector+17/130>   <=====
>   0:   83 3c 90 ff               cmpl   $0xffffffff,(%eax,%edx,4)   <=====

Ok, that one is line 295 in io_apic.c. It bombs in 2.4.21-pre{2,3}
because mp_bus_id_to_pci_bus was changed from a static array to
a dynamically allocated array. On your machine, smp_read_mpc() in
mpparse.c doesn't get to the point where it allocates that array,
so the array is NULL in io_apic.c and you get an oops.
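
As an illustration of why the fault lands exactly at address 00000000,
here is a small user-space sketch of the address arithmetic behind
`cmpl $0xffffffff,(%eax,%edx,4)`. The helper names are hypothetical and
not kernel code; the register values are the ones shown in the posted
oops (eax = 00000000 as the array base, edx = 00000000 as the bus index).

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: address of base[index] for 4-byte elements,
 * which is what the (%eax,%edx,4) operand computes on i386. */
static uintptr_t element_address(uintptr_t base, uint32_t index)
{
    return base + (uintptr_t)index * 4u;
}

/* Registers from the posted oops: eax = 00000000, edx = 00000000. */
uintptr_t oops_fault_address(void)
{
    uintptr_t addr = element_address(0x00000000u, 0x00000000u);

    /* A NULL base with bus 0 gives the reported fault address 0. */
    assert(addr == 0);
    return addr;
}
```

With a NULL base, any small bus number would still give an address in
the first, unmapped page, so the access would presumably fault in the
same way.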

Fixing the oops is easy (see below), but the real problem is
that 2.4.21-pre2 apparently broke MP table parsing on your HW.
I suggest you sprinkle tracing printk()s in setup/smpboot/mpparse
and compare 2.4.20 (good) and later (bad) to see where things
start to diverge.

/Mikael

--- linux-2.4.21-pre2/arch/i386/kernel/io_apic.c.~1~	2003-09-09 21:27:39.000000000 +0200
+++ linux-2.4.21-pre2/arch/i386/kernel/io_apic.c	2003-09-09 22:17:02.464082064 +0200
@@ -292,7 +292,7 @@
 
 	Dprintk("querying PCI -> IRQ mapping bus:%d, slot:%d, pin:%d.\n",
 		bus, slot, pin);
-	if (mp_bus_id_to_pci_bus[bus] == -1) {
+	if ((mp_bus_id_to_pci_bus==NULL) || mp_bus_id_to_pci_bus[bus] == -1) {
 		printk(KERN_WARNING "PCI BIOS passed nonexistent PCI bus %d!\n", bus);
 		return -1;
 	}
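
For readers who want to see the guard in isolation, a minimal
user-space sketch of the same test follows. The function name
bus_is_known() is hypothetical, and the table is passed in as a
parameter rather than being the kernel's global. Like the patch, it
only avoids the NULL dereference; it does not make the table exist.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical model of the patched condition: a bus is unknown if the
 * table was never allocated or if its entry is still -1. */
static int bus_is_known(const int *table, int bus)
{
    assert(bus >= 0);   /* precondition of this sketch, not of the kernel */

    /* The NULL check must come first: || short-circuits, so table[bus]
     * is only evaluated when table is a valid pointer. */
    if (table == NULL || table[bus] == -1)
        return 0;
    return 1;
}
```

Called with a NULL table, bus_is_known() returns 0 for any bus, which
corresponds to the patched kernel printing the "nonexistent PCI bus"
warning instead of oopsing.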


* Re: PROBLEM: APIC on a Pentium Classic SMP, 2.4.21-pre2 and 2.4.21-pre3 ksymoops
  2003-09-09 20:31 PROBLEM: APIC on a Pentium Classic SMP, 2.4.21-pre2 and 2.4.21-pre3 ksymoops Mikael Pettersson
@ 2003-09-10 10:26 ` Maciej W. Rozycki
  2003-09-10 16:18   ` Mikael Pettersson
  0 siblings, 1 reply; 5+ messages in thread
From: Maciej W. Rozycki @ 2003-09-10 10:26 UTC (permalink / raw)
  To: Mikael Pettersson; +Cc: mathieu.desnoyers, linux-kernel, mingo

On Tue, 9 Sep 2003, Mikael Pettersson wrote:

> Ok, that one is line 295 in io_apic.c. It bombs in 2.4.21-pre{2,3}
> because mp_bus_id_to_pci_bus was changed from a static array to
> a dynamically allocated array. On your machine, smp_read_mpc() in
> mpparse.c doesn't get to the point where it allocates that array,
> so the array is NULL in io_apic.c and you get an oops.

 As I have already written, the system uses a default MP configuration.
smp_read_mpc() isn't called at all.  construct_default_ISA_mptable() is
used instead.

> Fixing the oops is easy (see below), but the real problem is
> that 2.4.21-pre2 apparently broke MP table parsing on your HW.
> I suggest you sprinkle tracing printk()s in setup/smpboot/mpparse
> and compare 2.4.20 (good) and later (bad) to see where things
> start to diverge.

 There is no need to -- the problem is already known.  Mikael, if you need
additional details on how default MP configurations work in our code, feel
free to ask.  Unfortunately, I likely won't be able to do any coding
and/or testing in this area before October.

-- 
+  Maciej W. Rozycki, Technical University of Gdansk, Poland   +
+--------------------------------------------------------------+
+        e-mail: macro@ds2.pg.gda.pl, PGP key available        +



* Re: PROBLEM: APIC on a Pentium Classic SMP, 2.4.21-pre2 and 2.4.21-pre3 ksymoops
  2003-09-10 10:26 ` Maciej W. Rozycki
@ 2003-09-10 16:18   ` Mikael Pettersson
  2003-09-10 16:58     ` Maciej W. Rozycki
  0 siblings, 1 reply; 5+ messages in thread
From: Mikael Pettersson @ 2003-09-10 16:18 UTC (permalink / raw)
  To: Maciej W. Rozycki; +Cc: mathieu.desnoyers, linux-kernel, mingo

Maciej W. Rozycki writes:
 > > Fixing the oops is easy (see below), but the real problem is
 > > that 2.4.21-pre2 apparently broke MP table parsing on your HW.
 > > I suggest you sprinkle tracing printk()s in setup/smpboot/mpparse
 > > and compare 2.4.20 (good) and later (bad) to see where things
 > > start to diverge.
 > 
 >  There is no need to -- the problem is already known.  Mikael, if you need
 > additional details on how default MP configurations work in our code, feel

I think I nailed it.

First I found one very strange thing in Mathieu's boot log:

--- mpbug-2.4.20	Wed Sep 10 17:19:05 2003
+++ mpbug-2.4.23-pre3	Wed Sep 10 17:18:44 2003
...
+DMI not present.
 Intel MultiProcessor Specification v1.1
 Virtual Wire compatibility mode.
 Default MP configuration #6

This means construct_default_ISA_mptable() still gets called.
Ok so far.

...
 ENABLING IO-APIC IRQs
 Setting 2 in the phys_id_present_map
 ...changing IO-APIC physical APIC ID to 2 ... ok.

smp_found_config is true, we're now in setup_IO_APIC()
and have completed setup_ioapic_ids_from_mpc(). Ok so far.

-init IO_APIC IRQs
-IO-APIC (apicid-pin) 2-0 not connected.

THIS IS BAD. setup_IO_APIC() calls setup_IO_APIC_IRQs(),
which starts by printk()ing the first line above.
This line is missing from the 2.4.23-pre3 dmesg log, which
seems like an impossibility.

At this point I was thinking "memory corruption",
and the following struck me:

What used to be arrays (mp_irqs[] etc) are now pointers to
memory which is sized and allocated by smp_read_mpc().
In the case when construct_default_ISA_mptable() is called,
smp_read_mpc() is _not_ called, the pointers never get initialised,
and reads and writes of these arrays end up in la-la land.

The fix would be to add allocation and initialisation of
these pointers at the start of construct_default_ISA_mptable().
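
A rough user-space sketch of that idea, under stated assumptions: the
names bus_table, alloc_bus_table() and construct_default_table_model()
and the size MODEL_MAX_BUSSES are all hypothetical, malloc() stands in
for whatever allocator the kernel code uses, and only one table is
modelled. It is not the actual patch.

```c
#include <assert.h>
#include <stdlib.h>

#define MODEL_MAX_BUSSES 4      /* hypothetical size, for illustration */

static int *bus_table;          /* NULL until some path allocates it */

/* Hypothetical shared helper: allocate once, mark every entry unused. */
static int alloc_bus_table(void)
{
    int i;

    if (bus_table != NULL)
        return 0;               /* already set up by another path */
    bus_table = malloc(MODEL_MAX_BUSSES * sizeof(*bus_table));
    if (bus_table == NULL)
        return -1;
    for (i = 0; i < MODEL_MAX_BUSSES; i++)
        bus_table[i] = -1;
    return 0;
}

/* Models the default-configuration path: allocate first, then fill in. */
static int construct_default_table_model(void)
{
    if (alloc_bus_table() != 0)
        return -1;
    bus_table[0] = 0;           /* e.g. a single bus with id 0 */
    return 0;
}
```

The point of the shared helper is that both the table-parsing path and
the default-configuration path would end up with valid, initialised
memory before anything reads or writes the tables.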

I'll prepare a patch doing this sometime tomorrow.

/Mikael


* Re: PROBLEM: APIC on a Pentium Classic SMP, 2.4.21-pre2 and 2.4.21-pre3 ksymoops
  2003-09-10 16:18   ` Mikael Pettersson
@ 2003-09-10 16:58     ` Maciej W. Rozycki
  0 siblings, 0 replies; 5+ messages in thread
From: Maciej W. Rozycki @ 2003-09-10 16:58 UTC (permalink / raw)
  To: Mikael Pettersson; +Cc: mathieu.desnoyers, linux-kernel, mingo

On Wed, 10 Sep 2003, Mikael Pettersson wrote:

> First I found one very strange thing in Mathieu's boot log:
> 
> --- mpbug-2.4.20	Wed Sep 10 17:19:05 2003
> +++ mpbug-2.4.23-pre3	Wed Sep 10 17:18:44 2003
> ...
> +DMI not present.
>  Intel MultiProcessor Specification v1.1
>  Virtual Wire compatibility mode.
>  Default MP configuration #6
> 
> This means construct_default_ISA_mptable() still gets called.
> Ok so far.

 Yep -- I've been aware of this.

> At this point I was thinking "memory corruption",
> and the following struck me:
> 
> What used to be arrays (mp_irqs[] etc) are now pointers to
> memory which is sized and allocated by smp_read_mpc().
> In the case when construct_default_ISA_mptable() is called,
> smp_read_mpc() is _not_ called, the pointers never get initialised,
> and reads and writes of these arrays end up in la-la land.

 Exactly.

> The fix would be to add allocation and initialisation of
> these pointers at the start of construct_default_ISA_mptable().

 Possibly -- I haven't thought about how to fix it yet.

> I'll prepare a patch doing this sometime tomorrow.

 Thanks a lot for taking care of it.

-- 
+  Maciej W. Rozycki, Technical University of Gdansk, Poland   +
+--------------------------------------------------------------+
+        e-mail: macro@ds2.pg.gda.pl, PGP key available        +



* Re: PROBLEM: APIC on a Pentium Classic SMP, 2.4.21-pre2 and 2.4.21-pre3 ksymoops
  2003-09-08  9:33 PROBLEM: APIC on a Pentium Classic SMP, kernel 2.4.21-pre5 to 2.4.23-pre3 Mikael Pettersson
@ 2003-09-08 23:22 ` Mathieu Desnoyers
  0 siblings, 0 replies; 5+ messages in thread
From: Mathieu Desnoyers @ 2003-09-08 23:22 UTC (permalink / raw)
  To: Mikael Pettersson; +Cc: linux-kernel, mathieu.desnoyers, mingo

> >On kernel 2.4.21-pre2, there is a kernel oops before this, with a
> >"Dereferencing NULL pointer".
> 
> You didn't run that through ksymoops and post it, so how is anyone
> supposed to be able to debug it?

As only the 2.4.21-pre2 and 2.4.21-pre3 kernels show this problem, I thought
it had been corrected in 2.4.21-pre4. But as it may be useful in finding the
problem, here is the ksymoops output for the 2.4.21-pre2 and 2.4.21-pre3
kernels; the two are quite similar.


-------------------------------------------------------------------------------
2.4.21-pre2 ksymoops

Unable to handle kernel NULL pointer dereference at virtual address 00000000
c0115da7
*pde = 00000000
Oops: 0000
CPU:    0
EIP:    0010:[<c0115da7>]    Not tainted
Using defaults from ksymoops -t elf32-i386 -a i386
EFLAGS: 00010246
eax: 00000000   ebx: c1163400   ecx: 00000000   edx: 00000000
esi: 00000010   edi: c116bfbb   ebp: 0008e000   esp: c116bf90
ds: 0018   es: 0018   ss: 0018
Process swapper (pid: 1, stackpage=c116b000)
Stack: 00000010 ffffffff c1163400 00000010 c116bfbb 0008e000 c02fe4c9 00000000 
       00000001 00000000 0008e000 c116a000 c02f7fc8 c0105000 0008e000 c02fdef9 
       c030c7c6 c116a000 c02f87fb c0105078 00010f00 c02f7fc8 c0105000 0008e000 
Call Trace:    [<c0105000>] [<c0105078>] [<c0105000>] [<c0107406>] [<c0105050>]
Code: 83 3c 90 ff 0f 84 f1 00 00 00 a1 c0 80 34 c0 31 ff 89 04 24 


>>EIP; c0115da7 <IO_APIC_get_PCI_irq_vector+17/130>   <=====

Trace; c0105000 <_stext+0/0>
Trace; c0105078 <init+28/180>
Trace; c0105000 <_stext+0/0>
Trace; c0107406 <kernel_thread+26/30>
Trace; c0105050 <init+0/180>

Code;  c0115da7 <IO_APIC_get_PCI_irq_vector+17/130>
00000000 <_EIP>:
Code;  c0115da7 <IO_APIC_get_PCI_irq_vector+17/130>   <=====
   0:   83 3c 90 ff               cmpl   $0xffffffff,(%eax,%edx,4)   <=====
Code;  c0115dab <IO_APIC_get_PCI_irq_vector+1b/130>
   4:   0f 84 f1 00 00 00         je     fb <_EIP+0xfb>
Code;  c0115db1 <IO_APIC_get_PCI_irq_vector+21/130>
   a:   a1 c0 80 34 c0            mov    0xc03480c0,%eax
Code;  c0115db6 <IO_APIC_get_PCI_irq_vector+26/130>
   f:   31 ff                     xor    %edi,%edi
Code;  c0115db8 <IO_APIC_get_PCI_irq_vector+28/130>
  11:   89 04 24                  mov    %eax,(%esp,1)

 <0>Kernel panic: Attempted to kill init!

-------------------------------------------------------------------------------
2.4.21-pre3 ksymoops

Unable to handle kernel NULL pointer dereference at virtual address 000000
c0115da7      
*pde = 00000000
Oops: 0000     
CPU:    0 
EIP:    0010:[<c0115da7>]    Not tainted
Using defaults from ksymoops -t elf32-i386 -a i386
EFLAGS: 00010246                        
eax: 00000000   ebx: c1163400   ecx: 00000000   edx: 00000000
esi: 00000010   edi: c116bfbb   ebp: 0008e000   esp: c116bf90
ds: 0018   es: 0018   ss: 0018                               
Process swapper (pid: 1, stackpage=c116b000)
Stack: 00000010 ffffffff c1163400 00000010 c116bfbb 0008e000 c02fe549 000 
       00000001 00000000 0008e000 c116a000 c02f7fc8 c0105000 0008e000 c02 
       c030c846 c116a000 c02f87fb c0105078 00010f00 c02f7fc8 c0105000 000 
Call Trace:    [<c0105000>] [<c0105078>] [<c0105000>] [<c0107406>] [<c010]
Code: 83 3c 90 ff 0f 84 f1 00 00 00 a1 c0 80 34 c0 31 ff 89 04 24 


>>EIP; c0115da7 <IO_APIC_get_PCI_irq_vector+17/130>   <=====

Trace; c0105000 <_stext+0/0>
Trace; c0105078 <init+28/180>
Trace; c0105000 <_stext+0/0>
Trace; c0107406 <kernel_thread+26/30>

Code;  c0115da7 <IO_APIC_get_PCI_irq_vector+17/130>
00000000 <_EIP>:
Code;  c0115da7 <IO_APIC_get_PCI_irq_vector+17/130>   <=====
   0:   83 3c 90 ff               cmpl   $0xffffffff,(%eax,%edx,4)   <=====
Code;  c0115dab <IO_APIC_get_PCI_irq_vector+1b/130>
   4:   0f 84 f1 00 00 00         je     fb <_EIP+0xfb>
Code;  c0115db1 <IO_APIC_get_PCI_irq_vector+21/130>
   a:   a1 c0 80 34 c0            mov    0xc03480c0,%eax
Code;  c0115db6 <IO_APIC_get_PCI_irq_vector+26/130>
   f:   31 ff                     xor    %edi,%edi
Code;  c0115db8 <IO_APIC_get_PCI_irq_vector+28/130>
  11:   89 04 24                  mov    %eax,(%esp,1)

 <0>Kernel panic: Attempted to kill init!                         

-------------------------------------------------------------------------------


OpenPGP public key:              http://krystal.dyndns.org:8080/key/compudj.gpg
Key fingerprint:     8CD5 52C3 8E3C 4140 715F  BA06 3F25 A8FE 3BAE 9A68 


