linux-kernel.vger.kernel.org archive mirror
* Re: Startup IPI (was: Re: test13-pre3)
@ 2000-12-19 18:49 Petr Vandrovec
  2000-12-19 20:36 ` ferret
  0 siblings, 1 reply; 15+ messages in thread
From: Petr Vandrovec @ 2000-12-19 18:49 UTC (permalink / raw)
  To: ferret; +Cc: Maciej W. Rozycki, Kernel Mailing List, mingo

On 18 Dec 00 at 21:59, ferret@phonewave.net wrote:
> 
> Pardon me for not fully grokking the issues here and possibly coming to a
> wrong conclusion, but this has to do with SMP systems crashing at APIC
> init time, just before penguin display (with fbcon at least)? If so, I
> have a board that does this with certain cache settings made in the BIOS.
> It's a 430HX chipset with two Pentium MMX 200s installed, *ancient* BIOS.
 
I'm using BIOS dated 19/07/2000, last week it was latest BIOS on Gigabyte
site for 6VXD7 (two PIII/800). I have not looked for updates today yet.

I tried to change C2P Concurrency & Master (en/dis), AGP Mode (1x/2x/4x),
Power mgmt - Display Activity (monitor/ignore), PNP OS (yes/no)
(24 combinations total), but any combination dies if there are read
accesses to videoram during startup. Today I finally dug out some
old ISA VGA (Realtek), plugged it in and - it dies too. So it does not 
depend on bus type.
                                                Best regards,
                                                    Petr Vandrovec
                                                    vandrove@vc.cvut.cz
                                             
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.kernel.org
Please read the FAQ at http://www.tux.org/lkml/


* Re: Startup IPI (was: Re: test13-pre3)
  2000-12-19 18:49 Startup IPI (was: Re: test13-pre3) Petr Vandrovec
@ 2000-12-19 20:36 ` ferret
  2000-12-20  3:37   ` ferret
  0 siblings, 1 reply; 15+ messages in thread
From: ferret @ 2000-12-19 20:36 UTC (permalink / raw)
  To: Petr Vandrovec; +Cc: Maciej W. Rozycki, Kernel Mailing List, mingo



On Tue, 19 Dec 2000, Petr Vandrovec wrote:

> On 18 Dec 00 at 21:59, ferret@phonewave.net wrote:
> > 
> > Pardon me for not fully grokking the issues here and possibly coming to a
> > wrong conclusion, but this has to do with SMP systems crashing at APIC
> > init time, just before penguin display (with fbcon at least)? If so, I
> > have a board that does this with certain cache settings made in the BIOS.
> > It's a 430HX chipset with two Pentium MMX 200s installed, *ancient* BIOS.
>  
> I'm using BIOS dated 19/07/2000, last week it was latest BIOS on Gigabyte
> site for 6VXD7 (two PIII/800). I have not looked for updates today yet.
> 
> I tried to change C2P Concurrency & Master (en/dis), AGP Mode (1x/2x/4x),
> Power mgmt - Display Activity (monitor/ignore), PNP OS (yes/no)
> (24 combinations total), but any combination dies if there are read
> accesses to videoram during startup. Today I finally dug out some
> old ISA VGA (Realtek), plugged it in and - it dies too. So it does not 
> depend on bus type.

Okay. Mine, as far as I can tell, only depends on the L2 cache being set
to '64MB' instead of '512MB' in the field 'L2 Cache Cacheable Size' under
'Chipset Features Setup' on my BIOS. This is unfortunately the latest BIOS
for this motherboard available. It's a TD5TH version 1.1

Hmmmm. Have you tried booting with a hercmono (if you can get your paws
on one, that is)?


Right after 'Freeing unused kernel memory...'
I get a kernel BUG at buffer.c:821 with this setting at 256MB, -test12
without fbcon. With fbcon it would appear to switch video mode and
freeze with a black screen with cursor at the bottom, at that point.

And then I get an oops dump in the swapper task. I'll try decoding it in a
little while, since I'll have to manually input it.



* Re: Startup IPI (was: Re: test13-pre3)
  2000-12-19 20:36 ` ferret
@ 2000-12-20  3:37   ` ferret
  0 siblings, 0 replies; 15+ messages in thread
From: ferret @ 2000-12-20  3:37 UTC (permalink / raw)
  To: Petr Vandrovec; +Cc: Maciej W. Rozycki, Kernel Mailing List, mingo



On Tue, 19 Dec 2000 ferret@phonewave.net wrote:
[snip of Petr's system info]

> Okay. Mine, as far as I can tell, only depends on the L2 cache being set
> to '64MB' instead of '512MB' in the field 'L2 Cache Cacheable Size' under
> 'Chipset Features Setup' on my BIOS. This is unfortunately the latest BIOS
> for this motherboard available. It's a TD5TH version 1.1
> 
> Hmmmm. Have you tried booting with a hercmono (if you can get your paws
> on one, that is)?
> 
> 
> Right after 'Freeing unused kernel memory...'
> I get a kernel BUG at buffer.c:821 with this setting at 256MB, -test12
> without fbcon. With fbcon it would appear to switch video mode and
> freeze with a black screen with cursor at the bottom, at that point.
> 
> And then I get an oops dump in the swapper task. I'll try decoding it in a
> little while, since I'll have to manually input it.

Here we go: I DID have to copy it onto paper and type it in after
rebooting.


>>EIP; c01354a6 <end_buffer_io_async+ea/12c>   <=====
Trace; c0217d72 <tvecs+3a1e/b81c>
Trace; c02180da <tvecs+3d86/b81c>
Trace; c0186e85 <end_that_request_first+65/c0>
Trace; c019e238 <ide_end_request+34/88>
Trace; c01a22b7 <read_intr+e7/120>
Trace; c019fb0e <ide_intr+12a/194>
Trace; c01a21d0 <read_intr+0/120>
Trace; c010c2d1 <handle_IRQ_event+59/84>
Trace; c010c4b8 <do_IRQ+a8/fc>
Trace; c0108d40 <default_idle+0/34>
Trace; c010ac00 <ret_from_intr+0/20>
Trace; c0108d40 <default_idle+0/34>
Trace; c0108dbc <cpu_idle+28/54>
Trace; c0108dd2 <cpu_idle+3e/54>
Trace; c0105000 <empty_bad_page+0/1000>
Trace; c01001cf <L6+0/2>
Code;  c01354a6 <end_buffer_io_async+ea/12c>
0000000000000000 <_EIP>:
Code;  c01354a6 <end_buffer_io_async+ea/12c>   <=====
   0:   0f 0b                     ud2a      <=====
Code;  c01354a8 <end_buffer_io_async+ec/12c>
   2:   83 c4 0c                  add    $0xc,%esp
Code;  c01354ab <end_buffer_io_async+ef/12c>
   5:   90                        nop
Code;  c01354ac <end_buffer_io_async+f0/12c>
   6:   8d 74 26 00               lea    0x0(%esi,1),%esi
Code;  c01354b0 <end_buffer_io_async+f4/12c>
   a:   8d 5e 28                  lea    0x28(%esi),%ebx
Code;  c01354b3 <end_buffer_io_async+f7/12c>
   d:   8d 46 2c                  lea    0x2c(%esi),%eax
Code;  c01354b6 <end_buffer_io_async+fa/12c>
  10:   39 46 2c                  cmp    %eax,0x2c(%esi)
Code;  c01354b9 <end_buffer_io_async+fd/12c>
  13:   74 00                     je     15 <_EIP+0x15> c01354bb <end_buffer_io_async+ff/12c>



* Re: Startup IPI (was: Re: test13-pre3)
  2000-12-20 20:29 Petr Vandrovec
@ 2000-12-21 11:54 ` Maciej W. Rozycki
  0 siblings, 0 replies; 15+ messages in thread
From: Maciej W. Rozycki @ 2000-12-21 11:54 UTC (permalink / raw)
  To: Petr Vandrovec; +Cc: Kernel Mailing List, mingo

On Wed, 20 Dec 2000, Petr Vandrovec wrote:

> /usr/bin/time says that program runs for 3.40 - 3.56secs, so after dividing

 Well, the test looks reasonable if the system load is low.  Still the
performance is surprisingly low -- after changing the transfer width to 16
bits I ran the test on my dual P5MMX system equipped with an old ISA VGA
card and I achieved 10.74ms for VGA RAM accesses and 586.6us for uncached
main memory accesses. 

> by 1000 I get 3.4ms... Maybe I should complain to VIA or to Matrox that
> it is piece of crap ?

 For VIA -- definitely.  I don't think Matrox is at fault, though. 

> My order was simple: no rambus memory, dual PIII at least on 800MHz
> and UDMA66. Yes, maybe I should buy ServerWorks instead of VIA, but 
> I hoped...

 At least ServerWorks claims they are willing to cooperate with us
although results seem to be questionable so far... 

  Maciej

-- 
+  Maciej W. Rozycki, Technical University of Gdansk, Poland   +
+--------------------------------------------------------------+
+        e-mail: macro@ds2.pg.gda.pl, PGP key available        +


* Re: Startup IPI (was: Re: test13-pre3)
@ 2000-12-20 20:29 Petr Vandrovec
  2000-12-21 11:54 ` Maciej W. Rozycki
  0 siblings, 1 reply; 15+ messages in thread
From: Petr Vandrovec @ 2000-12-20 20:29 UTC (permalink / raw)
  To: Maciej W. Rozycki; +Cc: Kernel Mailing List, mingo

On 20 Dec 00 at 19:52, Maciej W. Rozycki wrote:
> > it kills machine; only problem is that 0x1300 wr-rd cycles to VGA aperture
> > take 3.48ms, and this does not correspond with needed 200us udelay.
> 
>  Hmm, how do you calculate the time?  Assuming AGP4x runs at 133MHz and a
> read or write cycle lasts for a single clock tick (I don't know exact AGP
> specs -- please correct me if I'm wrong), I find 0x1300 cycles to finish
> in about 73usecs.  The loop execution overhead may double the result and
> it will still fit within 300usecs. 

It is easy:
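  #include <fcntl.h>
  #include <sys/mman.h>
  #include <unistd.h>

  int main(void)
  {
    int mfd;
    volatile unsigned long* memory;
    int i;

    mfd = open("/dev/mem", O_RDWR);
    if (mfd < 0)
      return 1;
    memory = mmap(0, 4096, PROT_READ|PROT_WRITE, MAP_SHARED, mfd, 0x000B8000);
    close(mfd);
    if (memory == MAP_FAILED)
      return 1;
    for (i = 0; i < 0x1300 * 1000; i++) {
      *memory = i;   /* one write cycle to the VGA text buffer ... */
      *memory;       /* ... followed by one read cycle */
    }
    munmap((void *)memory, 4096);
    return 0;
  }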

/usr/bin/time says that program runs for 3.40 - 3.56secs, so after dividing
by 1000 I get 3.4ms... Maybe I should complain to VIA or to Matrox that
it is piece of crap ?
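
The figure above comes from /usr/bin/time over the whole process. As a
minimal sketch, assuming a POSIX system with clock_gettime(), the loop
itself could be timed directly. The name time_wr_rd_loop and the choice to
pass the pointer in are illustrative only and not part of the original test:

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 199309L  /* for clock_gettime() */
#endif
#include <time.h>

/*
 * Run `iters` write-then-read cycles on *p and return the elapsed
 * wall-clock time in seconds. `p` may point at an mmap()ed page of
 * /dev/mem (as in the test above) or at ordinary memory for comparison.
 */
static inline double time_wr_rd_loop(volatile unsigned long *p, long iters)
{
    struct timespec t0, t1;
    long i;

    clock_gettime(CLOCK_MONOTONIC, &t0);
    for (i = 0; i < iters; i++) {
        *p = (unsigned long)i;  /* write cycle */
        (void)*p;               /* read cycle, kept because p is volatile */
    }
    clock_gettime(CLOCK_MONOTONIC, &t1);

    return (double)(t1.tv_sec - t0.tv_sec)
         + (double)(t1.tv_nsec - t0.tv_nsec) / 1e9;
}
```

Calling it once with the mapped VGA page and once with a plain variable
would separate the cost of the bus accesses from the loop overhead.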
  
> > Without VIA datasheet I cannot try to disable some PCI features to find
> > which one is culprit, so I'm sorry.
> 
>  But you may complain to the manufacturer and/or change hardware.  I'm
> still uncertain the delay should stay in...

My order was simple: no rambus memory, dual PIII at least on 800MHz
and UDMA66. Yes, maybe I should buy ServerWorks instead of VIA, but 
I hoped...
                                                    Best regards,
                                                        Petr Vandrovec
                                                        vandrove@vc.cvut.cz
                                                        

* Re: Startup IPI (was: Re: test13-pre3)
  2000-12-19 21:18 Petr Vandrovec
@ 2000-12-20 18:52 ` Maciej W. Rozycki
  0 siblings, 0 replies; 15+ messages in thread
From: Maciej W. Rozycki @ 2000-12-20 18:52 UTC (permalink / raw)
  To: Petr Vandrovec; +Cc: Kernel Mailing List, mingo

On Tue, 19 Dec 2000, Petr Vandrovec wrote:

> I did... So it uses 'xchg %eax,APIC_ICR' instead of 'movl %eax,APIC_ICR',
> yes (as verified in generated code...)? No change, still dies, as expected
> (do not forget that before it dies, it can do ~0x1300 write-read cycles

 I've forgotten indeed...

> from videomemory (AGP4x), so secondary CPU just does some thinking before

 This might be the time needed to deliver the IPI.  Remember that the
inter-APIC bus is serial and not that fast.

> it kills machine; only problem is that 0x1300 wr-rd cycles to VGA aperture
> take 3.48ms, and this does not correspond with needed 200us udelay.

 Hmm, how do you calculate the time?  Assuming AGP4x runs at 133MHz and a
read or write cycle lasts for a single clock tick (I don't know exact AGP
specs -- please correct me if I'm wrong), I find 0x1300 cycles to finish
in about 73usecs.  The loop execution overhead may double the result and
it will still fit within 300usecs. 
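
The 73 usec estimate can be reproduced with a few lines of arithmetic. This
is only a sketch of the simplifying assumption stated above (one bus clock
per access, 133MHz); the function name is made up for illustration and
nothing here describes real AGP timing:

```c
/*
 * Time in microseconds for `pairs` write-read pairs, assuming every
 * single access takes exactly one bus clock at `bus_mhz` MHz.
 * Each pair is two accesses, hence the factor of 2.
 */
static inline double wr_rd_pairs_usec(unsigned long pairs, double bus_mhz)
{
    return 2.0 * (double)pairs / bus_mhz;
}
```

With pairs = 0x1300 and bus_mhz = 133.0 this gives about 73 usecs, so the
reported 3.48ms is larger by a factor of roughly 48.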

> Maybe chipset decides to do something when second CPU cannot obtain
> bus access in 100000 pci cycles?).

 I guess a certain initial cycle from the AP confuses the chipset somehow.

> Do you (or anyone else) have code which can dump the MTRR registers of each
> CPU before the mtrr driver takes them over? At least the first CPU does not have
> any problem...

 A brief look at arch/i386/kernel/mtrr.c reveals the bootstrap CPU's
settings do not get changed.  As a result they may always be fetched from
the /proc filesystem.  For APs you probably need to tweak sources.
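
For the bootstrap CPU, reading the settings back is just a matter of
copying /proc/mtrr. A minimal user-space sketch, assuming the kernel was
built with MTRR support; the helper name dump_mtrr is hypothetical:

```c
#include <stdio.h>

/*
 * Copy every line of `path` (normally "/proc/mtrr") to `out`.
 * Returns the number of lines copied, or -1 if the file cannot be
 * opened, e.g. when the kernel has no MTRR support.
 */
static inline int dump_mtrr(const char *path, FILE *out)
{
    char line[256];
    int n = 0;
    FILE *f = fopen(path, "r");

    if (!f)
        return -1;
    while (fgets(line, sizeof line, f)) {
        fputs(line, out);
        n++;
    }
    fclose(f);
    return n;
}
```

dump_mtrr("/proc/mtrr", stdout) prints one line per variable-range
register in use. It says nothing about what the application processors
held before the driver synchronized them.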

> I even placed 'wbinvd' and 'wbinvd; cpuid' before sending startup IPI,
> but it does not matter. Secondary CPU just does not finish even first
> instruction when first CPU reads from videoram again and again.

 Well, the CPU obeys the writeback and the invalidation, but does the
chipset?

> Without VIA datasheet I cannot try to disable some PCI features to find
> which one is culprit, so I'm sorry.

 But you may complain to the manufacturer and/or change hardware.  I'm
still uncertain the delay should stay in...

  Maciej

-- 
+  Maciej W. Rozycki, Technical University of Gdansk, Poland   +
+--------------------------------------------------------------+
+        e-mail: macro@ds2.pg.gda.pl, PGP key available        +


* Re: Startup IPI (was: Re: test13-pre3)
@ 2000-12-19 21:18 Petr Vandrovec
  2000-12-20 18:52 ` Maciej W. Rozycki
  0 siblings, 1 reply; 15+ messages in thread
From: Petr Vandrovec @ 2000-12-19 21:18 UTC (permalink / raw)
  To: Maciej W. Rozycki; +Cc: Kernel Mailing List, mingo

On 19 Dec 00 at 19:30, Maciej W. Rozycki wrote:
> > When I replaced address with 0xC01B8000 (some cacheable memory), it worked
> > fine. When replaced with 0xC00C8000 (supposedly unused address, but maybe
> > it is just set as cacheable in chipset), it works too.
> 
>  Hmm, a read from an uncached location could result in sending delayed
> APIC writes to the bus in case of an incorrect MTRR setting for the APIC
> space.  Could you please disable CONFIG_X86_GOOD_APIC?  This will result
> in using locked cycles for APIC writes, i.e. immediate bus accesses.

I did... So it uses 'xchg %eax,APIC_ICR' instead of 'movl %eax,APIC_ICR',
yes (as verified in generated code...)? No change, still dies, as expected
(do not forget that before it dies, it can do ~0x1300 write-read cycles
from videomemory (AGP4x), so secondary CPU just does some thinking before 
it kills machine; only problem is that 0x1300 wr-rd cycles to VGA aperture
take 3.48ms, and this does not correspond with needed 200us udelay.
Maybe chipset decides to do something when second CPU cannot obtain
bus access in 100000 pci cycles?).
 
>  Please also check MTRR settings, especially for the APIC range.  They
> might need fixing. 

Do you (or anyone else) have code which can dump the MTRR registers of each
CPU before the mtrr driver takes them over? At least the first CPU does not have
any problem...
 
> > at the beginning of trampoline.S, and then boot with 'no-scroll', but
> > character in upper left corner did not change, so secondary CPU probably
> > did not even start code fetches. That's all I can say until
> > I put non-AGP card into the box (but I need AGP, so it is not real
> > option).
> 
>  An easier way to check an application processor is alive could be
> enabling the speaker -- after setting it up by the bootstrap CPU it only
> takes three instructions to set bits 0 and 1 of port 0x61 and the result
> is not volatile.  A LED diagnostic display would be better, but typical
> PCs don't have one, unfortunately.

Fortunately secondary CPU starts with AL & 3 == 0, so it is just
one 'outb %al,$0x61' instruction. When first CPU reads memory in loop,
it beeps and beeps and beeps. If first CPU does 'udelay(300);', it
works fine (I put mdelay(100) after enabling speaker, so I hear short
1000Hz beep during boot). So secondary CPU does not correctly execute
even first instruction. But it either locks the bus forever (it looks that
way because the ATX power-off button does not work anymore), or confuses first
CPU so much that it also cannot continue...
 
> > Yeah. Just do not read video memory when another CPU starts. I'll try
> > disabling cache on both CPUs, maybe it will make some difference, as
> > secondary CPU should start with caches disabled. But maybe that it is 
> > just broken AGP bus, and nothing else. But until I find what's really
> > broken on my hardware, I'd like to leave 'udelay(300)' in.
> 
>  If the problem is with write combining then disabling the cache won't
> help, I'm afraid.

Read loop reads one short from one constant address, so any write* should
not make any problem.
 
> > instead of string as soon as second CPU started (no, it did not race due 
> > to missing console_lock; before first printk() secondary CPU should fill 
> > whole screen with letter '2'. It did not).
                      ^ digit. I'm sorry ;-)
> 
>  I would still verify (i.e. with the speaker) that's really the second CPU
> causing the corruption. 

I even placed 'wbinvd' and 'wbinvd; cpuid' before sending startup IPI,
but it does not matter. Secondary CPU just does not finish even first
instruction when first CPU reads from videoram again and again.

Without VIA datasheet I cannot try to disable some PCI features to find
which one is culprit, so I'm sorry.
                                            Best regards,
                                                    Petr Vandrovec
                                                    vandrove@vc.cvut.cz
                                                    

* Re: Startup IPI (was: Re: test13-pre3)
  2000-12-19  0:33 Petr Vandrovec
  2000-12-18 23:51 ` Alan Cox
  2000-12-19  5:59 ` ferret
@ 2000-12-19 18:30 ` Maciej W. Rozycki
  2 siblings, 0 replies; 15+ messages in thread
From: Maciej W. Rozycki @ 2000-12-19 18:30 UTC (permalink / raw)
  To: Petr Vandrovec; +Cc: Kernel Mailing List, mingo

On Tue, 19 Dec 2000, Petr Vandrovec wrote:

> Uh. It took a couple of hours to find it. Just place
> 
> { int i; volatile unsigned short* p = (volatile unsigned short*)0xC00B8000;
>   for (i = 0; i < 6553600; i++) { *p; } }                    (**)
> 
> instead of udelay(300) and this loop does not finish. Same for 
> unsigned long* p. inb/outb(0x3C0) are ok. Writes are OK too. Only 
> simple fetches from videoram kill it.
> 
> When I replaced address with 0xC01B8000 (some cacheable memory), it worked
> fine. When replaced with 0xC00C8000 (supposedly unused address, but maybe
> it is just set as cacheable in chipset), it works too.

 Hmm, a read from an uncached location could result in sending delayed
APIC writes to the bus in case of an incorrect MTRR setting for the APIC
space.  Could you please disable CONFIG_X86_GOOD_APIC?  This will result
in using locked cycles for APIC writes, i.e. immediate bus accesses.

 Please also check MTRR settings, especially for the APIC range.  They
might need fixing. 

> at the beginning of trampoline.S, and then boot with 'no-scroll', but
> character in upper left corner did not change, so secondary CPU probably
> did not even start code fetches. That's all I can say until
> I put non-AGP card into the box (but I need AGP, so it is not real
> option).

 An easier way to check an application processor is alive could be
enabling the speaker -- after setting it up by the bootstrap CPU it only
takes three instructions to set bits 0 and 1 of port 0x61 and the result
is not volatile.  A LED diagnostic display would be better, but typical
PCs don't have one, unfortunately.
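
The values involved in the speaker trick can be sketched as below. The
port layout is the standard PC one (port 0x61 bit 0 gates PIT channel 2,
bit 1 routes its output to the speaker, and the PIT runs at 1193182Hz);
the helper names are hypothetical, and the actual port writes are left out
because they need ring 0 or ioperm():

```c
#include <stdint.h>

/* Input clock of the PC programmable interval timer, in Hz. */
#define PIT_HZ 1193182u

/*
 * Divisor to load into PIT channel 2 (command byte 0xB6 to port 0x43,
 * then low and high byte to port 0x42) for a tone of `hz` Hz.
 */
static inline uint16_t pit_divisor(unsigned hz)
{
    return (uint16_t)(PIT_HZ / hz);
}

/* New value for port 0x61 with the speaker switched on: set bits 0 and 1. */
static inline uint8_t speaker_on(uint8_t port61)
{
    return (uint8_t)(port61 | 0x03);
}

/* New value for port 0x61 with the speaker switched off: clear bits 0 and 1. */
static inline uint8_t speaker_off(uint8_t port61)
{
    return (uint8_t)(port61 & ~0x03);
}
```

The bootstrap CPU would program the divisor once, for example
pit_divisor(1000) for a 1000Hz tone; after that a single read-modify-write
of port 0x61 on either CPU switches the tone.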

> Yeah. Just do not read video memory when another CPU starts. I'll try
> disabling cache on both CPUs, maybe it will make some difference, as
> secondary CPU should start with caches disabled. But maybe that it is 
> just broken AGP bus, and nothing else. But until I find what's really
> broken on my hardware, I'd like to leave 'udelay(300)' in.

 If the problem is with write combining then disabling the cache won't
help, I'm afraid.

> (*) When I was calling directly 
> vt_console_print(NULL, "Message1\n", 9);
> vt_console_print(NULL, "Message2\n", 9);
> instead of printk, I got
> Message1
> Messag<0x..><0x..><0x00><0x80><0x..><0x80><0x..><0x80>...
> - wrong text with wrong length, so it probably started fetching garbage 
> instead of string as soon as second CPU started (no, it did not race due 
> to missing console_lock; before first printk() secondary CPU should fill 
> whole screen with letter '2'. It did not).

 I would still verify (i.e. with the speaker) that's really the second CPU
causing the corruption. 

-- 
+  Maciej W. Rozycki, Technical University of Gdansk, Poland   +
+--------------------------------------------------------------+
+        e-mail: macro@ds2.pg.gda.pl, PGP key available        +


* Re: Startup IPI (was: Re: test13-pre3)
  2000-12-19 18:03 Petr Vandrovec
@ 2000-12-19 18:27 ` Alan Cox
  0 siblings, 0 replies; 15+ messages in thread
From: Alan Cox @ 2000-12-19 18:27 UTC (permalink / raw)
  To: Petr Vandrovec; +Cc: Alan Cox, Maciej W. Rozycki, Kernel Mailing List, mingo

> > In the case where it boots does it also report mismatched MTRRs ??
> 
> Yes, it complains. But BIOS correctly reports x1/x2 depending on
> number of CPUs I plug into motherboard, so I believe that it did
> some initialization before it started loading the OS.

That may explain the hangs. Intel docs don't seem to guarantee what happens if 
the MTRRs don't match across CPU's.


* Re: Startup IPI (was: Re: test13-pre3)
@ 2000-12-19 18:03 Petr Vandrovec
  2000-12-19 18:27 ` Alan Cox
  0 siblings, 1 reply; 15+ messages in thread
From: Petr Vandrovec @ 2000-12-19 18:03 UTC (permalink / raw)
  To: Alan Cox; +Cc: Maciej W. Rozycki, Kernel Mailing List, mingo

On 18 Dec 00 at 23:51, Alan Cox wrote:

> > Yeah. Just do not read video memory when another CPU starts. I'll try
> > disabling cache on both CPUs, maybe it will make some difference, as
> > secondary CPU should start with caches disabled. But maybe that it is 
> > just broken AGP bus, and nothing else. But until I find what's really
> > broken on my hardware, I'd like to leave 'udelay(300)' in.
> 
> In the case where it boots does it also report mismatched MTRRs ??

Yes, it complains. But BIOS correctly reports x1/x2 depending on
number of CPUs I plug into motherboard, so I believe that it did
some initialization before it started loading the OS.

calibrating APIC timer ...
..... CPU clock speed is 797.0452 MHz.
..... host bus clock speed is 99.6305 MHz.
cpu: 0, clocks: 996305, slice: 332101
CPU0<T0:996304,T1:664192,D:11,S:332101,C:996305>
cpu: 1, clocks: 996305, slice: 332101
CPU1<T0:996304,T1:332096,D:6,S:332101,C:996305>
checking TSC synchronization across CPUs: passed.
Setting commenced=1, go go go
mtrr: your CPUs had inconsistent variable MTRR settings
mtrr: probably your BIOS does not setup all CPUs

                                            Best regards,
                                                Petr Vandrovec
                                                vandrove@vc.cvut.cz
                                                

* Re: Startup IPI (was: Re: test13-pre3)
  2000-12-19  0:33 Petr Vandrovec
  2000-12-18 23:51 ` Alan Cox
@ 2000-12-19  5:59 ` ferret
  2000-12-19 18:30 ` Maciej W. Rozycki
  2 siblings, 0 replies; 15+ messages in thread
From: ferret @ 2000-12-19  5:59 UTC (permalink / raw)
  To: Petr Vandrovec; +Cc: Maciej W. Rozycki, Kernel Mailing List, mingo


Pardon me for not fully grokking the issues here and possibly coming to a
wrong conclusion, but this has to do with SMP systems crashing at APIC
init time, just before penguin display (with fbcon at least)? If so, I
have a board that does this with certain cache settings made in the BIOS.
It's a 430HX chipset with two Pentium MMX 200s installed, *ancient* BIOS.

-- Ferret



* Re: Startup IPI (was: Re: test13-pre3)
@ 2000-12-19  0:33 Petr Vandrovec
  2000-12-18 23:51 ` Alan Cox
                   ` (2 more replies)
  0 siblings, 3 replies; 15+ messages in thread
From: Petr Vandrovec @ 2000-12-19  0:33 UTC (permalink / raw)
  To: Maciej W. Rozycki; +Cc: Kernel Mailing List, mingo

On 18 Dec 00 at 19:44, Maciej W. Rozycki wrote:
> > No, I'll try. It occurred with either AGP (Matrox G200/G400/G450) or
> > PCI (S3, CL5434) VGA adapter. I have not tried a real ISA VGA...
> 
>  Oops, I've forgotten there exist non-ISA display adapters. ;-)  Just try
> if accessing one bus or another changes the behaviour. 

Uh. It took a couple of hours to find it. Just place

{ int i; volatile unsigned short* p = (volatile unsigned short*)0xC00B8000;
  for (i = 0; i < 6553600; i++) { *p; } }                    (**)

instead of udelay(300) and this loop does not finish. Same for 
unsigned long* p. inb/outb(0x3C0) are ok. Writes are OK too. Only 
simple fetches from videoram kill it.

When I replaced address with 0xC01B8000 (some cacheable memory), it worked
fine. When replaced with 0xC00C8000 (supposedly unused address, but maybe
it is just set as cacheable in chipset), it works too.
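
For readers unfamiliar with the addresses: these are kernel virtual
addresses in the direct mapping, so the physical address is the virtual one
minus PAGE_OFFSET. A small sketch, assuming the default i386 PAGE_OFFSET of
0xC0000000 (it is a build-time constant and could differ); the helper name
is made up:

```c
#include <stdint.h>

/* Default i386 PAGE_OFFSET; an assumption, since it is configurable. */
#define PAGE_OFFSET 0xC0000000u

/* Physical address behind a kernel direct-mapped virtual address. */
static inline uint32_t kvirt_to_phys(uint32_t vaddr)
{
    return vaddr - PAGE_OFFSET;
}
```

Under that assumption 0xC00B8000 is physical 0xB8000 (the colour text-mode
VGA buffer), 0xC01B8000 is 0x1B8000 (ordinary RAM above 1MB) and
0xC00C8000 is 0xC8000 (inside the adapter ROM area).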

Symptoms of lockup are same as hangup in printk() without udelay(300), only 
problem is that 'vt_console_print' (*) does not do fetches from videoram, it 
does stores only...

Placing this loop before sending startup IPI, or just below udelay(300)
is OK (modulo that this loop takes so long that secondary CPU complains
about no callin received).

I even tried to add:

   mov $0xB800,%ax
   mov %ax,%ds
   movw %ax,0
   
at the beginning of trampoline.S, and then boot with 'no-scroll', but
character in upper left corner did not change, so secondary CPU probably
did not even start code fetches. That's all I can say until
I put non-AGP card into the box (but I need AGP, so it is not real option).
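
The three instructions above rely on real-mode addressing, which is what
the secondary CPU uses when it enters the trampoline: the physical address
is the segment shifted left by four bits plus the offset. A one-function
sketch (the name real_mode_phys is only for illustration):

```c
#include <stdint.h>

/* Real-mode physical address: segment * 16 + offset. */
static inline uint32_t real_mode_phys(uint16_t seg, uint16_t off)
{
    return ((uint32_t)seg << 4) + off;
}
```

real_mode_phys(0xB800, 0) is 0xB8000, i.e. the store lands in the first
character cell of the text screen, the upper left corner.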
 
> > and VT82C686 (rev 22) ISA bridge. I tried to request documentation
> > of 694X from VIA, but I did not heard from them. They have probably
> > some secrets hidden in their hardware...
> 
>  They want to keep the competition from being bug-compatible, it would
> seem...

Yeah. Just do not read video memory when another CPU starts. I'll try
disabling cache on both CPUs, maybe it will make some difference, as
secondary CPU should start with caches disabled. But maybe that it is 
just broken AGP bus, and nothing else. But until I find what's really
broken on my hardware, I'd like to leave 'udelay(300)' in.

(*) When I was calling directly 
vt_console_print(NULL, "Message1\n", 9);
vt_console_print(NULL, "Message2\n", 9);
instead of printk, I got
Message1
Messag<0x..><0x..><0x00><0x80><0x..><0x80><0x..><0x80>...
- wrong text with wrong length, so it probably started fetching garbage 
instead of string as soon as second CPU started (no, it did not race due 
to missing console_lock; before first printk() secondary CPU should fill 
whole screen with letter '2'. It did not).

(**) When I had '*p = i; *p' in loop, from visual inspection it was
dying in range i=0x1380-0x13FF (blue background, cyan letter with diacritics).
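
The colours mentioned in (**) follow from the standard VGA text-mode cell
layout: the low byte of each 16-bit word is the character code, the high
byte is the attribute (bits 0-3 foreground, bits 4-6 background, bit 7
blink or bright background depending on mode). A decoding sketch; the
struct and function names are hypothetical:

```c
#include <stdint.h>

struct vga_cell {
    uint8_t ch;     /* character code, low byte of the word */
    uint8_t fg;     /* foreground colour index, 0..15 */
    uint8_t bg;     /* background colour index, 0..7 */
    uint8_t blink;  /* attribute bit 7 */
};

/* Split one 16-bit text-mode word into character and attribute fields. */
static inline struct vga_cell vga_decode(uint16_t word)
{
    struct vga_cell c;

    c.ch    = (uint8_t)(word & 0xFF);
    c.fg    = (uint8_t)((word >> 8) & 0x0F);
    c.bg    = (uint8_t)((word >> 12) & 0x07);
    c.blink = (uint8_t)((word >> 15) & 0x01);
    return c;
}
```

For any word in 0x1380-0x13FF this yields background 1 (blue), foreground
3 (cyan) and a character code of 0x80 or above, which matches the cyan
accented letters on blue that were seen.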

End of guessing.
                                            Best regards,
                                                Petr Vandrovec
                                                vandrove@vc.cvut.cz
                                                

* Re: Startup IPI (was: Re: test13-pre3)
  2000-12-19  0:33 Petr Vandrovec
@ 2000-12-18 23:51 ` Alan Cox
  2000-12-19  5:59 ` ferret
  2000-12-19 18:30 ` Maciej W. Rozycki
  2 siblings, 0 replies; 15+ messages in thread
From: Alan Cox @ 2000-12-18 23:51 UTC (permalink / raw)
  To: Petr Vandrovec; +Cc: Maciej W. Rozycki, Kernel Mailing List, mingo

> Yeah. Just do not read video memory when another CPU starts. I'll try
> disabling cache on both CPUs, maybe it will make some difference, as
> secondary CPU should start with caches disabled. But maybe that it is 
> just broken AGP bus, and nothing else. But until I find what's really
> broken on my hardware, I'd like to leave 'udelay(300)' in.

In the case where it boots does it also report mismatched MTRRs ??


* Re: Startup IPI (was: Re: test13-pre3)
@ 2000-12-18 19:19 Petr Vandrovec
  2000-12-18 18:44 ` Maciej W. Rozycki
  0 siblings, 1 reply; 15+ messages in thread
From: Petr Vandrovec @ 2000-12-18 19:19 UTC (permalink / raw)
  To: Maciej W. Rozycki; +Cc: Kernel Mailing List, mingo

On 18 Dec 00 at 18:18, Maciej W. Rozycki wrote:
> On Mon, 18 Dec 2000, Petr Vandrovec wrote:
> 
> > No. Without udelay() before first printk() it just does not boot on my
> > motherboard. There were two choices: either remove all printk() from
> > these loops (define Dprintk to null), or add udelay(x), where x >= 200,
> > before first printk. I sent patch twice to linux-kernel, and to 
> > mingo@redhat.com, and nobody said anything against it.
> 
>  I see.  But are you sure this is the right fix?  You may be covering
> the real problem with this arbitrary delay.

It is possible. But it is hard to track, as it works with serial console,
and it is not possible to paint characters to VGA screen, as vgacon uses
hardware panning instead of scrolling :-( And if it dies, shift-pageup
apparently does not work... And filling whole 32KB with some char
does not work, as it changes timing too much...
 
> > analyzer (or if I should come with motherboard), I'm willing to continue
> > testing. But current idea is that inb/outb done by cursor positioning
> > code is incompatible with something else done in secondary CPU startup.
> 
>  Have you tried putting explicit display adapter (other ISA) I/O accesses
> after sending the IPI to see if they trigger the problem?  IPIs are

No, I'll try. It occurred with either AGP (Matrox G200/G400/G450) or
PCI (S3, CL5434) VGA adapter. I have not tried a real ISA VGA...

> > Without delay() both CPU die, and board does not react to anything except
> > hard reset anymore (and sometimes it does not react even to hard reset; look up
> > my messages from last week).
> 
>  Now THAT is weird.  It might mean a chipset bug.  Still no idea how an
> inter-APIC message might trigger it -- it completely bypasses MB

Yes. I could understand if I had to place bigger udelay() after INIT IPI,
as this can cause some specific PIII initialization and Intel says that
there should not be any MESI traffic during this init (at least I understand
it that way). But after startup IPI it should just start executing code...
I tried to put 'wbinvd' here and there, but it did not make any change,
only udelay() between startup IPI cmd and first printk() did.
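
The ordering described above (startup IPI, then a delay of at least 200
microseconds, then the first console output) can be sketched in user space.
This is only an illustration under stated assumptions: stub_send_startup_ipi(),
stub_udelay(), stub_console_write() and wake_secondary_cpu() are hypothetical
stand-ins that record events; they are not the kernel's functions and touch no
hardware.

```c
#include <stddef.h>

/* Events the sketch records, in the order they happen. */
enum boot_event { EV_STARTUP_IPI, EV_DELAY, EV_CONSOLE_IO };

#define MAX_EVENTS 8
#define STARTUP_SETTLE_US 200UL  /* x >= 200 was the value reported to work */

static enum boot_event event_log[MAX_EVENTS];
static size_t event_count;
static unsigned long delayed_us;

static void record(enum boot_event ev)
{
    if (event_count < MAX_EVENTS)
        event_log[event_count++] = ev;
}

/* Hypothetical stubs: they only log what real code would do. */
static void stub_send_startup_ipi(void)
{
    record(EV_STARTUP_IPI);
}

static void stub_udelay(unsigned long usecs)
{
    delayed_us += usecs;
    record(EV_DELAY);
}

static void stub_console_write(const char *msg)
{
    (void)msg;  /* a VGA console would do inb/outb for the cursor here */
    record(EV_CONSOLE_IO);
}

/* Workaround ordering: no console I/O until the delay has elapsed. */
static void wake_secondary_cpu(void)
{
    stub_send_startup_ipi();
    stub_udelay(STARTUP_SETTLE_US);
    stub_console_write("Startup point 1.\n");
}

/* Returns 1 if no console I/O was logged between an IPI and its delay. */
static int console_io_follows_delay(void)
{
    size_t i;
    int seen_ipi = 0, seen_delay = 0;

    for (i = 0; i < event_count; i++) {
        if (event_log[i] == EV_STARTUP_IPI) {
            seen_ipi = 1;
            seen_delay = 0;
        } else if (event_log[i] == EV_DELAY) {
            seen_delay = seen_ipi;
        } else if (seen_ipi && !seen_delay) {
            return 0;
        }
    }
    return 1;
}
```

The sketch says nothing about why the delay helps; it only pins down which
ordering was observed to boot.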

> chipset...  Hmm, maybe not...  Is your I/O APIC discrete (like Intel's
> 82093AA) or integrated?  It appears there are vendors manufacturing I/O
> APIC clones and this may imply new problems, sigh...

I have no idea. I know that the board has a VT82C694X (rev c4) host and PCI
bridge, and a VT82C686 (rev 22) ISA bridge. I tried to request documentation
for the 694X from VIA, but I have not heard back from them. They probably
have some secrets hidden in their hardware...
                                        Best regards,
                                            Petr Vandrovec
                                            vandrove@vc.cvut.cz
                                            


* Re: Startup IPI (was: Re: test13-pre3)
  2000-12-18 19:19 Petr Vandrovec
@ 2000-12-18 18:44 ` Maciej W. Rozycki
  0 siblings, 0 replies; 15+ messages in thread
From: Maciej W. Rozycki @ 2000-12-18 18:44 UTC (permalink / raw)
  To: Petr Vandrovec; +Cc: Kernel Mailing List, mingo

On Mon, 18 Dec 2000, Petr Vandrovec wrote:

> It is possible. But it is hard to track down, as it works with a serial
> console, and it is not possible to paint characters to the VGA screen, as
> vgacon uses hardware panning instead of scrolling :-( And if it dies,
> shift-pageup apparently does not work... And filling the whole 32KB with
> some char does not work, as it changes the timing too much...

 Just disable the problematic printk()s for making tests (you may just
undefine APIC_DEBUG in include/asm-i386/apic.h) -- we already know what is
going to be printed here. ;-)
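
A minimal user-space sketch of how a compile-time switch such as APIC_DEBUG
can make every Dprintk() call vanish. It is an illustration, not a copy of
include/asm-i386/apic.h: fake_printk(), dprintk_calls and boot_step() are
hypothetical names, and the message strings are only examples.

```c
#include <stdarg.h>
#include <stdio.h>

/* Number of debug messages actually emitted. */
static int dprintk_calls;

/* User-space stand-in for printk(); not the kernel function. */
static int fake_printk(const char *fmt, ...)
{
    va_list ap;
    int n;

    dprintk_calls++;
    va_start(ap, fmt);
    n = vprintf(fmt, ap);
    va_end(ap);
    return n;
}

/*
 * Build with -DAPIC_DEBUG to turn the messages on. Without it the call
 * sits behind "if (0)": the arguments are still compiled, but nothing is
 * evaluated or printed, so no console I/O can happen at these points.
 */
#ifdef APIC_DEBUG
#define Dprintk(...) fake_printk(__VA_ARGS__)
#else
#define Dprintk(...) do { if (0) fake_printk(__VA_ARGS__); } while (0)
#endif

/* Hypothetical boot step; returns how many messages were emitted so far. */
static int boot_step(void)
{
    Dprintk("Sending STARTUP #%d.\n", 1);
    Dprintk("After Startup.\n");
    return dprintk_calls;
}
```

Built without -DAPIC_DEBUG, boot_step() emits nothing, which is the test
configuration suggested above.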

> No, I'll try. It occurred with either AGP (Matrox G200/G400/G450) or
> PCI (S3, CL5434) VGA adapters. I have not tried a real ISA VGA...

 Oops, I'd forgotten that non-ISA display adapters exist. ;-)  Just check
whether accessing one bus or another changes the behaviour.

> Yes. I could understand if I had to place bigger udelay() after INIT IPI,
> as this can cause some specific PIII initialization and Intel says that
> there should not be any MESI traffic during this init (at least I understand

 Hmm, weird -- for integrated APICs an INIT IPI is about the same as
shutdown apart from the fact an NMI won't wake up a CPU (that might
actually be the local APIC not passing NMIs to the CPU in this case,
though). 

> it that way). But after startup IPI it should just start executing code...
> I tried to put 'wbinvd' here and there, but it did not make any change,
> only udelay() between startup IPI cmd and first printk() did.

 Hmm, a startup IPI is rather fast, so the code just after issuing it may
somehow interact with the application processor's trampoline.  But also try
disabling CONFIG_X86_GOOD_APIC (you may configure for classic Pentium,
for example), and see if that changes anything (it shouldn't, but who
knows...).

> I have no idea. I know that board has VT82C694X (rev c4) host and PCI bridge,

 Just look at the board and search for an I/O APIC chip. ;-) 

> and VT82C686 (rev 22) ISA bridge. I tried to request documentation
> for the 694X from VIA, but I have not heard back from them. They probably
> have some secrets hidden in their hardware...

 They want to keep the competition from being bug-compatible, it would
seem...

  Maciej

-- 
+  Maciej W. Rozycki, Technical University of Gdansk, Poland   +
+--------------------------------------------------------------+
+        e-mail: macro@ds2.pg.gda.pl, PGP key available        +



end of thread, other threads:[~2000-12-21 12:26 UTC | newest]

Thread overview: 15+ messages
2000-12-19 18:49 Startup IPI (was: Re: test13-pre3) Petr Vandrovec
2000-12-19 20:36 ` ferret
2000-12-20  3:37   ` ferret
  -- strict thread matches above, loose matches on Subject: below --
2000-12-20 20:29 Petr Vandrovec
2000-12-21 11:54 ` Maciej W. Rozycki
2000-12-19 21:18 Petr Vandrovec
2000-12-20 18:52 ` Maciej W. Rozycki
2000-12-19 18:03 Petr Vandrovec
2000-12-19 18:27 ` Alan Cox
2000-12-19  0:33 Petr Vandrovec
2000-12-18 23:51 ` Alan Cox
2000-12-19  5:59 ` ferret
2000-12-19 18:30 ` Maciej W. Rozycki
2000-12-18 19:19 Petr Vandrovec
2000-12-18 18:44 ` Maciej W. Rozycki
